Source-Filter Based Clustering for Monaural Blind Source Separation

DAFx 2009

Conference homepage

Abstract

In monaural blind audio source separation scenarios, a signal mixture is usually separated into more signals than there are active sources. It is therefore necessary to group the separated signals into the final source estimates. Traditionally, grouping methods are supervised and thus need a learning step on appropriate training data. In contrast, we discuss unsupervised clustering of the separated channels by Mel frequency cepstrum coefficients (MFCC). We show that replacing the decorrelation step of the MFCC computation with a non-negative matrix factorization (NMF) improves the separation quality significantly. The algorithms have been evaluated on a large test set consisting of melodies played with different instruments, vocals, speech, and noise.
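
The grouping step described in the abstract can be illustrated with a minimal sketch. It assumes the separated channels are additive, so that a source estimate is the sum of all channels assigned to the same cluster. The function name `group_channels`, the list-based signal representation, and the label format are hypothetical and are not taken from the paper or the Matlab code.

```python
def group_channels(channels, labels, n_sources):
    """Sum separated channels into source estimates.

    channels  : list of equally long sample lists (one per separated channel)
    labels    : cluster index (0 .. n_sources-1) for each channel
    n_sources : number of active sources (assumed to be known)
    """
    length = len(channels[0])
    sources = [[0.0] * length for _ in range(n_sources)]
    for channel, label in zip(channels, labels):
        for t, value in enumerate(channel):
            sources[label][t] += value
    return sources


# Three separated channels, two active sources:
# channels 0 and 2 are clustered into source 0, channel 1 into source 1.
channels = [[1, 0], [0, 1], [2, 2]]
labels = [0, 1, 0]
print(group_channels(channels, labels, n_sources=2))
# → [[3.0, 2.0], [0.0, 1.0]]
```

The clustering itself (by MFCC or NMF-based features) only has to deliver the `labels`; the grouping is then independent of the feature choice.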

Keywords:

Clustering, Monaural Blind Sound Source Separation, NMF, Audio

Paper: SpGn09a.pdf

Slides: Talk_DAFx09.pdf

Matlab-Code:

An example implementation is available under the GNU General Public License:
download

 

Sound Examples with 2 active Sources

Source 1  Source 2  Prand  PMFCC  PNMF,Div  PNMF,Euc   Pref
Bass      Guitar     4.53  20.30     13.59     13.59  20.30
Bass      Keyboard   1.61  14.37     14.37     14.37  14.45
Bass      Drums      1.67   1.76      1.76      1.76   3.15
Guitar    Keyboard   3.19   2.85      1.47      4.83   5.88
Guitar    Drums      1.53   8.34      8.34     18.60  19.14
Keyboard  Drums      4.32   8.71     15.87     15.88  15.92

Remarks:

  • Results are shown in dB.
  • Mixtures are created with a dynamic difference of 0 dB.
  • For such mixing scenarios, PNMF,Euc generally leads to good clustering results, as mentioned in the paper.
  • The mixture Bass Drums is separated well except for the bass drum, which is separated but clustered into the bass output. The very low SER can be explained by the high energy of the bass drum.
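
To make the dB values above easier to interpret, here is a minimal sketch of how such a figure could be computed. It assumes that SER denotes a signal-to-error ratio, i.e. the energy of a reference source divided by the energy of the estimation error. The function name `ser_db` and the time-domain formulation are illustrative assumptions; the paper may evaluate the ratio on a different signal representation.

```python
import math


def ser_db(reference, estimate):
    """Signal-to-error ratio in dB (assumed definition, see lead-in)."""
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error_energy == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal_energy / error_energy)


# An estimate whose amplitude error is 10 % of the reference
# yields an energy ratio of about 100, i.e. roughly 20 dB.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(round(ser_db(reference, estimate), 2))
```

Under this definition, a value near 0 dB means the error carries about as much energy as the source itself, which is consistent with the remark on the Bass Drums mixture.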

 

Sound Examples with 2 active Sources and dynamic differences

DD     Source   Prand  PMFCC  PNMF,Div  PNMF,Euc   Pref
0 dB   Piccolo   3.50   3.63      9.60      9.63  10.06
0 dB   Horn      3.53   3.89      9.71      9.71  10.15
10 dB  Piccolo   6.01   3.44     17.12      5.69  17.43
10 dB  Horn     -3.96  -6.55      7.17     -4.29   7.42

Remarks:

  • DD stands for the dynamic difference between the two input signals.
  • Results are shown in dB.
  • Sound files can be found here.
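  • For a dynamic difference of 0 dB, PNMF,Euc leads to equal or slightly better separation results than PNMF,Div (9.63 dB versus 9.60 dB for the piccolo, 9.71 dB for the horn in both cases).
  • For a dynamic difference of 10 dB, PNMF,Div is significantly better than PNMF,Euc (17.12 dB versus 5.69 dB for the piccolo, 7.17 dB versus -4.29 dB for the horn).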
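
A mixture with a given dynamic difference can be sketched as follows. This assumes that the dynamic difference is the energy ratio of the two input signals in dB, with the second signal attenuated relative to the first. The function name `mix_with_dynamic_difference` and this convention are illustrative assumptions, not taken from the paper or the Matlab code.

```python
import math


def energy(signal):
    return sum(x * x for x in signal)


def mix_with_dynamic_difference(s1, s2, dd_db):
    """Scale s2 so that s1 is dd_db dB stronger, then add both signals.

    Returns the mixture and the scaled second signal.
    """
    e1, e2 = energy(s1), energy(s2)
    if e1 == 0 or e2 == 0:
        raise ValueError("both input signals must have non-zero energy")
    # Choose the gain g such that 10*log10(e1 / (g**2 * e2)) == dd_db.
    gain = math.sqrt(e1 / (e2 * 10 ** (dd_db / 10)))
    scaled = [gain * x for x in s2]
    mixture = [a + b for a, b in zip(s1, scaled)]
    return mixture, scaled


s1 = [1.0, 0.0, -1.0, 0.0]
s2 = [0.0, 2.0, 0.0, -2.0]
mixture, scaled = mix_with_dynamic_difference(s1, s2, dd_db=10)
print(10 * math.log10(energy(s1) / energy(scaled)))  # approximately 10
```

With `dd_db=0` both signals enter the mixture with equal energy, which corresponds to the 0 dB rows of the table.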

 

Sound Examples with 3 active Sources: Bass, Harp, and Piccolo

Source   Prand  PMFCC  PNMF,Div  PNMF,Euc  PMFCC,Hier  PNMF,Div,Hier  PNMF,Euc,Hier   Pref
mean      2.83   7.98     20.60     20.62        1.92          20.57          20.62  20.95
Bass      4.94  18.75     18.69     18.69        2.66          18.72          18.69  18.85
Harp      1.35   2.52     17.84     17.93        3.13          17.75          17.93  18.68
Piccolo   2.22   2.67     25.26     25.25       -0.01          25.26          25.25  25.32

Remarks:

  • Results are shown in dB.
  • Sound files can be found here.

 

Sound Examples with 3 active Sources: Castanets, Violoncello, and Flute

Source       Prand  PMFCC  PNMF,Div  PNMF,Euc  PMFCC,Hier  PNMF,Div,Hier  PNMF,Euc,Hier   Pref
mean          0.30  10.66      3.74     11.04        2.32          10.77           6.09  11.25
Castanets    -0.28  14.10     10.89     14.93        5.46          14.31           5.46  14.94
Violoncello   1.74   8.63      0.04      8.85        1.43           8.65           8.64   9.20
Flute        -0.57   9.25      0.29      9.35        0.07           9.35           4.16   9.60

Remarks:

  • Results are shown in dB.
  • Sound files can be found here.
  • For PNMF,Div, the violoncello and the flute could not be separated (0.04 dB and 0.29 dB).
  • The hierarchical clustering PNMF,Div,Hier increases the separation quality (mean of 10.77 dB instead of 3.74 dB) by first separating an obvious source (the castanets). After that, the remaining channels are clustered again into the two other sources (violoncello and flute).
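
The two-stage idea in the last remark can be sketched as follows. This is a toy illustration under stated assumptions: each separated channel is described by a small feature vector, a basic 2-means replaces whatever clustering the paper uses, and the "obvious" source is taken to be the more compact of the first two clusters. The helper names (`two_means`, `scatter`, `hierarchical_three_sources`) and the selection rule are hypothetical.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def two_means(features, ids, iterations=20):
    """Split the channels `ids` (at least two) into two clusters."""
    # Deterministic start: the two channels that are farthest apart.
    pairs = [(i, j) for i in ids for j in ids if i < j]
    a, b = max(pairs, key=lambda p: dist(features[p[0]], features[p[1]]))
    centers = [features[a], features[b]]
    groups = [[a], [b]]
    for _ in range(iterations):
        new_groups = [[], []]
        for i in ids:
            nearer_b = dist(features[i], centers[1]) < dist(features[i], centers[0])
            new_groups[1 if nearer_b else 0].append(i)
        if new_groups == groups or not all(new_groups):
            break  # converged, or a cluster ran empty
        groups = new_groups
        centers = [centroid([features[i] for i in g]) for g in groups]
    return groups


def scatter(features, group):
    """Mean distance of a cluster's members to its centroid."""
    c = centroid([features[i] for i in group])
    return sum(dist(features[i], c) for i in group) / len(group)


def hierarchical_three_sources(features):
    first, second = two_means(features, list(range(len(features))))
    # Assumption: the more compact cluster is the "obvious" source.
    if scatter(features, first) <= scatter(features, second):
        obvious, rest = first, second
    else:
        obvious, rest = second, first
    # Second stage: cluster the remaining channels into two sources.
    return [obvious] + two_means(features, rest)


# Toy features: channels 0-1 lie far from the rest, 2-3 and 4-5 lie closer together.
features = [[10, 10], [10.2, 9.9], [0, 0], [0.1, 0.2], [3, 0], [3.1, 0.1]]
print(hierarchical_three_sources(features))
# → [[0, 1], [2, 3], [4, 5]]
```

The point of the second stage is that the two similar sources are clustered without the distant one, so their comparatively small difference is no longer dominated by the large one.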

 

(C) by Martin Spiertz - 07. September 2009 - spiertz@ient.rwth-aachen.de