30 Jan

Matlab – Sound (& Speech) recognition using MFCC

A quick and easy way to recognize a sound.
This method can serve as a basis for a speech-to-text system, but I will not cover that subject here.

Downloading a good sound processing library

To make this work, you need to download the following libraries and add them to your Matlab path:
PLP and RASTA (and MFCC, and inversion) in Matlab
(More details: http://labrosa.ee.columbia.edu/matlab/rastamat/)
Dynamic Time Warp (DTW) in Matlab
(More details: http://labrosa.ee.columbia.edu/matlab/dtw/)
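If you are not sure the libraries ended up on your path, a small check like the one below can help. It is only a sketch: the folder names `rastamat` and `dtw` are my assumption about where you unpacked the downloads, so adapt them to your own setup. The function names it looks for (`rastaplp`, `simmx`, `dp`) are the ones used later in this post.

```matlab
% Hypothetical folder names: change them to wherever you unpacked the libraries.
libDirs = {fullfile(pwd, 'rastamat'), fullfile(pwd, 'dtw')};

% Add each folder to the path, but only if it really exists.
for k = 1:numel(libDirs)
    if exist(libDirs{k}, 'dir')
        addpath(libDirs{k});
    end
end

% Check that the functions used in this post can be found as files on the path.
needed  = {'rastaplp', 'simmx', 'dp'};
isFound = cellfun(@(f) exist(f, 'file') == 2, needed);
missing = needed(~isFound);

if isempty(missing)
    disp('All required functions were found.');
else
    disp('Missing functions:');
    disp(missing);
end
```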

Comparing two sounds

First of all, let's compare two sounds in wav format. Record simple sounds, for example 'yes' and 'no', two times each.

% Read the two recordings (on older Matlab releases without audioread, use wavread instead)
[aSound1, sampleRate1] = audioread('yes.wav');
[aSound2, sampleRate2] = audioread('yes2.wav');

specSound1 = wavToSpectr(aSound1, sampleRate1); % first convert the 1st sound to a spectrum
specSound2 = wavToSpectr(aSound2, sampleRate2); % then the 2nd sound
score = checkSpectr(specSound1, specSound2);    % then get the score: the smaller it is, the closer the two sounds

%% Compare two spectra
function score = checkSpectr(spec1, spec2)
% spec1 and spec2 are already magnitudes (abs is applied in wavToSpectr)
M = simmx(spec1, spec2); % similarity matrix between the frames of the two sounds

[p, q, D] = dp(1 - M); % dynamic programming on the cost matrix 1 - M
% p & q: the best path
% D: the cumulative cost matrix

score = D(end, end); % total cost of the best path
end

%% Wav to spectrum
function spec = wavToSpectr(d, sr)
% Calculate basic PLP cepstra and spectra
% (3rd argument 0 = no RASTA filtering, 4th argument = model order 15)
[cep, spec] = rastaplp(d, sr, 0, 15);
spec = abs(spec);
end
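To give an idea of what the similarity matrix and the dynamic programming step are doing, here is a tiny self-contained sketch on toy data. It is not the code of the DTW library: the cosine similarity and the three allowed steps (diagonal, down, right) are my own simplified assumptions, and the matrices `A`, `B` and `C` are made up for the illustration.

```matlab
% Toy "spectra": rows are frequency bins, columns are time frames.
A = [1 1 0 0;
     0 0 1 1];
B = [1 1 1 0 0 0;   % same pattern as A, but stretched in time
     0 0 0 1 1 1];
C = [0 0 1 1;       % the two halves of A swapped
     1 1 0 0];

% Cosine similarity between every column of X and every column of Y.
cosSim = @(X, Y) (X' * Y) ./ (sqrt(sum(X.^2, 1))' * sqrt(sum(Y.^2, 1)));

pairs  = {A, B; A, C};
scores = zeros(size(pairs, 1), 1);

for k = 1:size(pairs, 1)
    cost = 1 - cosSim(pairs{k, 1}, pairs{k, 2}); % local cost, 0 = identical frames
    [n, m] = size(cost);

    % Cumulative cost with one extra border row and column set to Inf.
    D = inf(n + 1, m + 1);
    D(1, 1) = 0;
    for i = 1:n
        for j = 1:m
            % Cheapest way to arrive: diagonal, from above, or from the left.
            D(i + 1, j + 1) = cost(i, j) + min([D(i, j), D(i, j + 1), D(i + 1, j)]);
        end
    end
    scores(k) = D(end, end); % total cost of the best alignment
end

disp(scores);
```

The stretched pair (`A` against `B`) gets a lower score than the mismatched pair (`A` against `C`), which is the behaviour we rely on when comparing two recordings of the same word spoken at different speeds.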

We can say that two sounds represent the same word(s) if the score is below a threshold (run several tests to find your own limit). Keep in mind that the volume and the quality of the recorded sounds matter a lot.
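Here is a rough sketch of both ideas: evening out the volume before comparing, and picking a threshold from a few test scores. Everything in it is an assumption for illustration: the peak normalization is just one simple way to reduce volume differences, the sine waves stand in for real recordings, and the score values are placeholder numbers, not measurements.

```matlab
% Stand-ins for two recordings of the same sound at different volumes.
t     = (0:799)' / 8000;
loud  = 0.9 * sin(2 * pi * 440 * t);
quiet = 0.1 * sin(2 * pi * 440 * t);

% Peak normalization: scale so that the largest absolute sample is 1.
% The eps guard avoids dividing by zero on a completely silent signal.
peakNorm = @(x) x ./ max(max(abs(x(:))), eps);
loudN  = peakNorm(loud);
quietN = peakNorm(quiet);

% After normalization the two signals are (almost) identical.
disp(max(abs(loudN - quietN)));

% Placeholder scores, as if collected by hand from a few test comparisons.
sameWordScores  = [2.1 2.8 3.0]; % e.g. 'yes' against another 'yes'
otherWordScores = [5.5 6.2 7.4]; % e.g. 'yes' against 'no'

% One simple choice: halfway between the worst match and the best mismatch.
threshold  = (max(sameWordScores) + min(otherWordScores)) / 2;
isSameWord = @(score) score < threshold;
```

With your own recordings, replace the placeholder scores with real ones and check that the two groups do not overlap. If they do, no single threshold will separate them.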

Treating sound in real time

To start this part of the tutorial, please first read a short article on creating a sound callback in Matlab to capture microphone sound.

Then, once your callback is created, we cut the sound into pieces depending on its level:

% Callback for the sound
function showSample(obj, event)
% get the samples that are currently available
y = peekdata(obj, obj.SamplesAvailable);

% clear the data we just read
flushdata(obj);

% if the data is not empty
if ~isempty(y)
...
    % Plot data
    plot(handles_global.axes, y); % if needed, display the data
...
    % if the sound is loud enough
    if max(abs(y(:))) > minSound
        tmpSound = [tmpSound; y]; % append the current chunk to the previous sound
        nullCounter = 0;
    else
        nullCounter = nullCounter + 1; % count the quiet chunks in a row
        if nullCounter > 10 % more than 10 quiet chunks: the sound is over
            % HERE YOU CAN TREAT DATA
            dbase{end + 1} = tmpSound; % for example, store the sound in a database
            tmpSound = []; % reset the sound
        else
            tmpSound = [tmpSound; y]; % otherwise keep the short silence inside the current sound
        end
    end
end % if ~isempty(y)
end % function showSample
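If you want to try the cutting logic without a microphone, you can replay a signal chunk by chunk, as in the sketch below. This is a variation and not the exact callback: the names `chunkLen` and `maxQuiet`, the synthetic signal and the small limit of 2 quiet chunks are my assumptions, and I also skip the silence while no sound has started, so that empty sounds are not stored.

```matlab
chunkLen = 100; % samples per simulated callback
minSound = 0.1; % level above which a chunk counts as sound
maxQuiet = 2;   % quiet chunks tolerated inside one sound

% Synthetic signal: silence, a burst, silence, a second burst, silence.
signal = [zeros(300, 1); 0.5 * ones(200, 1); zeros(400, 1); ...
          0.8 * ones(300, 1); zeros(400, 1)];

tmpSound    = [];
nullCounter = 0;
dbase       = {};

for first = 1:chunkLen:(numel(signal) - chunkLen + 1)
    y = signal(first:first + chunkLen - 1); % plays the role of peekdata

    if max(abs(y)) > minSound
        tmpSound    = [tmpSound; y]; % loud chunk: extend the current sound
        nullCounter = 0;
    elseif ~isempty(tmpSound)        % quiet chunk while a sound is in progress
        nullCounter = nullCounter + 1;
        if nullCounter > maxQuiet
            dbase{end + 1} = tmpSound; % the sound is over: store it
            tmpSound    = [];
            nullCounter = 0;
        else
            tmpSound = [tmpSound; y]; % keep the short silence in the sound
        end
    end
end

disp(numel(dbase)); % number of sounds that were cut out
```

Each stored sound still carries the few quiet chunks that were tolerated before the cut. Each entry of `dbase` can then be passed to `wavToSpectr` and compared with your reference recordings.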