Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

conv neural network - How to combine a spectrogram image with its human-labelled data so it can be processed with a CNN in Python?

I am working on my final project at university: pitch estimation from a song using a CNN.

The input to the CNN is a spectrogram of a song, generated by plt.specgram(), with size 334 x 217. The songs are taken from the MIR-QBSH dataset, with the following specification: 8 s duration, mono, 8 kHz sampling rate, 8-bit quantization, frame size = 256, overlap = 0, with the first frame starting at the first sample of the audio file.

This is one example of the spectrogram:
[Image: spectrogram example]

As far as I understand, I need labels (in my case, pitch labels) paired with each spectrogram so that the CNN can be trained on them. My label data contains 250 pitch labels per song. These pitch labels are in semitones (MIDI note numbers).
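
The figure of 250 labels is consistent with the audio specification: 8 s at 8 kHz is 64000 samples, and 64000 / 256 = 250 non-overlapping frames. The sketch below checks that arithmetic with NumPy only. It assumes one pitch label per frame, and the sine wave and the names `frame_signal`, `frames`, and `spectrum` are placeholders for illustration, not part of the original project.

```python
import numpy as np

SAMPLE_RATE = 8000   # Hz, from the dataset description
DURATION = 8         # seconds
FRAME_SIZE = 256     # samples per frame, overlap = 0


def frame_signal(signal, frame_size=FRAME_SIZE):
    """Split a 1-D signal into non-overlapping frames, starting at sample 0."""
    n_frames = len(signal) // frame_size
    return signal[:n_frames * frame_size].reshape(n_frames, frame_size)


# Placeholder audio: a 220 Hz sine standing in for a real recording.
t = np.arange(SAMPLE_RATE * DURATION) / SAMPLE_RATE
audio = np.sin(2 * np.pi * 220.0 * t)

frames = frame_signal(audio)
# Magnitude spectrum of each windowed frame: one row per frame.
spectrum = np.abs(np.fft.rfft(frames * np.hanning(FRAME_SIZE), axis=1))

print(frames.shape)    # (250, 256)
print(spectrum.shape)  # (250, 129)
```

If the assumption holds, each of the 250 time frames has exactly one pitch label, which is what makes a frame-by-frame pairing of spectrogram and labels possible.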

These are the pitch labels for the spectrogram above. I applied math.floor() to the pitch values from the original file to simplify the computation.

Pitch values:  [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 50, 50, 52, 52, 53, 54, 54, 53, 53, 53, 53, 54, 54, 54, 54, 54, 53, 0, 0, 54, 54, 54, 54, 54, 54, 53, 0, 0, 0, 0, 46, 46, 46, 47, 48, 48, 48, 48, 48, 49, 49, 49, 50, 50, 50, 50, 0, 0, 0, 50, 50, 50, 50, 50, 50, 50, 49, 0, 0, 51, 47, 47, 47, 47, 47, 47, 47, 47, 48, 49, 50, 0, 0, 0, 0, 0, 0, 0, 0, 0, 58, 57, 58, 58, 58, 58, 58, 57, 57, 57, 57, 57, 57, 57, 58, 58, 57, 57, 57, 57, 56, 55, 55, 56, 56, 56, 56, 56, 55, 56, 56, 56, 56, 55, 55, 55, 56, 56, 54, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 52, 52, 52, 53, 53, 54, 54, 0, 0, 54, 54, 54, 54, 54, 54, 54, 54, 0, 0, 0, 54, 54, 54, 54, 54, 53, 52, 51, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 49, 49, 49, 49, 50, 50, 50, 50, 49, 0, 0, 0, 50, 49, 49, 49, 49, 49, 50, 50, 49, 0, 0, 47, 47, 47, 48, 48, 48, 48, 0, 0, 0, 0, 0, 0, 0]
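
For readers unfamiliar with MIDI note numbers, the sketch below shows the flooring step and the standard conversion from a MIDI number to a frequency in Hz (A4 = MIDI 69 = 440 Hz). Treating 0 as "no pitch in this frame" is an assumption about the label files, and `raw_labels` holds made-up values, not data from the dataset.

```python
import math


def midi_to_hz(midi_number):
    """Convert a MIDI note number to a frequency in Hz (A4 = 69 = 440 Hz)."""
    if midi_number == 0:
        return 0.0  # assumption: 0 marks a frame with no pitch
    return 440.0 * 2.0 ** ((midi_number - 69) / 12.0)


raw_labels = [0.0, 48.37, 50.12, 57.0, 69.0]   # made-up values for illustration
labels = [math.floor(p) for p in raw_labels]   # same flooring as described above

print(labels)  # [0, 48, 50, 57, 69]
print([round(midi_to_hz(m), 2) for m in labels])
```

Flooring turns the labels into a small set of integer values, which is convenient if the pitch is later treated as a class rather than a continuous quantity.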

My question is: how should I combine the spectrogram and its pitch labels before they are processed by the CNN in Python?

Question from: https://stackoverflow.com/questions/65913473/how-to-combine-spectrogram-image-with-its-human-labelled-data-to-be-processed-wi


1 Reply


I have solved my problem. It works like this:

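    import os
    import time

    import cv2
    import numpy as np

    # image_path, label_path, resize_recolor_spectrogram() and
    # extract_pitch_label() are not shown in this snippet.
    image_data = []
    tm = time.time()  # start time (not used further in this snippet)
    for img_item in os.listdir(image_path):  # for every image in the folder
        img_array = cv2.imread(os.path.join(image_path, img_item))

        # convert the image to grayscale and resize it to 250 x 160
        spectrogram_preprocessing = resize_recolor_spectrogram(img_array)

        # make sure the preprocessed image is a NumPy array
        spectrogram_preprocessing = np.array(spectrogram_preprocessing)

        # read the pitch labels that belong to this spectrogram
        label = extract_pitch_label(os.path.join(label_path, img_item))

        # pair each image with its labels
        image_data.append([spectrogram_preprocessing, label])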
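
A list of [image, label] pairs like `image_data` usually has to be split into two arrays before it can be passed to a CNN. The following is a minimal sketch of that step, not the code from my project: `to_arrays` is a hypothetical helper, the placeholder images are assumed to be grayscale arrays of shape (160, 250), and each label vector is assumed to hold 250 values.

```python
import numpy as np


def to_arrays(image_data):
    """Turn a list of [image, label] pairs into two stacked arrays."""
    images = np.stack([np.asarray(img, dtype=np.float32) for img, _ in image_data])
    labels = np.stack([np.asarray(lab, dtype=np.float32) for _, lab in image_data])
    images = images / 255.0            # scale 8-bit pixel values to [0, 1]
    images = images[..., np.newaxis]   # add a channel axis: (N, H, W, 1)
    return images, labels


# Placeholder data: 4 random grayscale images with 250 labels each.
rng = np.random.default_rng(0)
demo = [[rng.integers(0, 256, size=(160, 250)),
         rng.integers(0, 60, size=250)] for _ in range(4)]

X, y = to_arrays(demo)
print(X.shape, y.shape)  # (4, 160, 250, 1) (4, 250)
```

The channels-last layout (N, H, W, 1) is what Keras-style convolution layers expect by default; a framework that expects channels first would need the new axis in position 1 instead.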
