Calling ffmpeg and manually parsing its stdout, as suggested in many posts about reading an MP3, is tedious (there are many corner cases, e.g. different numbers of channels), so here is a working solution using pydub (you need to run pip install pydub first; pydub itself relies on ffmpeg or libav being installed to decode MP3).

This code reads an MP3 into a numpy array and writes a numpy array to an MP3 file, with an API similar to scipy.io.wavfile.read/write:
    import numpy as np
    import pydub


    def read(f, normalized=False):
        """MP3 to numpy array"""
        a = pydub.AudioSegment.from_mp3(f)
        y = np.array(a.get_array_of_samples())
        if a.channels == 2:
            # stereo samples are interleaved (L, R, L, R, ...): one row per frame
            y = y.reshape((-1, 2))
        if normalized:
            # scale 16-bit integers to floats in [-1, 1)
            return a.frame_rate, np.float32(y) / 2**15
        return a.frame_rate, y


    def write(f, sr, x, normalized=False):
        """numpy array to MP3"""
        channels = 2 if (x.ndim == 2 and x.shape[1] == 2) else 1
        if normalized:  # normalized array: each item should be a float in [-1, 1)
            y = np.int16(x * 2**15)
        else:
            y = np.int16(x)
        # sample_width=2 means 2 bytes (16 bits) per sample
        song = pydub.AudioSegment(y.tobytes(), frame_rate=sr, sample_width=2, channels=channels)
        song.export(f, format="mp3", bitrate="320k")
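The reshape in read works because the decoded samples come back interleaved frame by frame (L, R, L, R, ...). Here is a minimal NumPy-only sketch of that layout. The helper names deinterleave and interleave are hypothetical (they are not part of pydub), and the sample values are just illustrative:

```python
import numpy as np

def deinterleave(samples, channels):
    """Turn a flat interleaved sample sequence into shape (n_frames, channels)."""
    y = np.asarray(samples)
    if channels == 1:
        return y  # mono stays one-dimensional
    return y.reshape((-1, channels))

def interleave(x):
    """Flatten an (n_frames, channels) array back to interleaved order."""
    return np.ascontiguousarray(x).reshape(-1)

# Three stereo frames stored as L, R, L, R, L, R
flat = np.array([-225, 707, -234, 782, -205, 755], dtype=np.int16)
stereo = deinterleave(flat, 2)
print(stereo.shape)   # (3, 2)
print(stereo[:, 0])   # left channel: [-225 -234 -205]
```

Passing the channel count instead of hard-coding 2 is one way the same idea might extend to files with more than two channels, though the functions above only handle mono and stereo.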
Notes:
- It only works for 16-bit files for now (24-bit WAV files are pretty common, but I've rarely seen 24-bit MP3 files. Do they exist?)
- normalized=True lets you work with a float array (each item in [-1, 1))
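The half-open interval [-1, 1) matters: 1.0 * 2**15 is 32768, which does not fit in a signed 16-bit integer. Below is a hedged sketch of the conversion in both directions. The helper names to_float and to_int16 are hypothetical, and the rounding and clipping are added safeguards assumed for illustration, not something the functions in this answer do:

```python
import numpy as np

def to_float(y):
    """int16 samples -> float32 in [-1, 1)."""
    return np.float32(y) / 2**15

def to_int16(x):
    """Float samples -> int16, rounding and clipping out-of-range values."""
    scaled = np.round(np.asarray(x, dtype=np.float64) * 2**15)
    # int16 covers [-32768, 32767], so +1.0 is clipped to 32767
    return np.clip(scaled, -2**15, 2**15 - 1).astype(np.int16)

y = np.array([-32768, -225, 0, 707, 32767], dtype=np.int16)
x = to_float(y)
print(x.min() >= -1.0, x.max() < 1.0)   # True True
print(np.array_equal(to_int16(x), y))   # True
print(to_int16([1.0]))                  # [32767]
```

Clipping trades a tiny amplitude error at full scale for not wrapping around to a large negative value, which would be audible as a click.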
Usage example:
    sr, x = read('test.mp3')
    print(x)

    # [[-225 707]
    #  [-234 782]
    #  [-205 755]
    #  ...,
    #  [ 303 89]
    #  [ 337 69]
    #  [ 274 89]]

    write('out2.mp3', sr, x)