Linux上没装matlab,没有audition,开源的audacity程序没法看语谱(找到看的方法了……),所以当时就简单写了一个,勉强能看。
基本原理是STFT,短时傅里叶变换,下面的程序思想和STFT类似,但是不完全一致,只是绘制了每一帧的幅度谱,效果看着还可以,语谱绘制使用matplotlib的imshow
函数,直接传入矩阵即可,但是注意,该函数绘制的时候按列绘制,我在存储每一帧的结果是按行存储的,所以需要转置。
另外,之前还做了预加重,毕竟不是提取特征,不需要,细节看代码(当初的写的有点不规范啊):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
| import scipy.io.wavfile as wav import numpy as np import matplotlib.pyplot as plt import sys import re
if len(sys.argv) != 2: print("Format error: <src wave>") sys.exit()
wavpath = sys.argv[1]
rate, samples = wav.read(wavpath)
m = re.match('(.*)/(.*)', wavpath)
wavname = m.group(2) print "process %s..." % wavname
wave = np.array(samples, dtype = "float")
frame_off = 160 frame_len = 400 spect_len = 512
frame_num = (wave.size - frame_len) / frame_off + 1
hamwindow = np.hamming(frame_len) spect = np.zeros((frame_num, spect_len / 2 + 1)) z = np.zeros(spect_len - frame_len)
for idx in range(frame_num): base = idx * frame_off frame = wave[base: base + frame_len] frame = np.append(frame * hamwindow, z) spect[idx:] = np.log10(np.abs(np.fft.rfft(frame)))
plt.title(wavname) plt.imshow(np.transpose(spect), origin="lower", cmap = "jet", aspect = "auto", interpolation = "none") plt.show()
|
给出一个绘制结果

完善:还可以为横纵坐标加上单位(时间和频率),时间信息如下
1 2 3 4
| xlocs = np.linspace(0, frame_num - 1, 5) frame_dur = 1 / float(rate) * frame_off plt.xticks(xlocs, ["%.02f" % l for l in (xlocs * frame_dur)]) plt.xlabel("time (s)")
|

频率信息待完善…