Speech separation can be treated as a mask estimation problem, where interference-dominant portions are masked in a time-frequency representation of noisy speech. In supervised speech separation, a classifier is typically trained on a mixture set of speech and noise. It is important to use limited training data efficiently so that the classifier generalizes well. When target speech is severely interfered with by a nonstationary noise, a classifier tends to mistake noise patterns for speech patterns. Expanding the noise through proper perturbation during training exposes the classifier to a broader variety of noisy conditions, and hence may improve separation performance. In this study, we examine the effects of three noise perturbations on supervised speech separation at low signal-to-noise ratios (SNRs): noise rate, vocal tract length, and frequency perturbation. We evaluate speech separation performance in terms of classification accuracy, hit minus false-alarm rate, and short-time objective intelligibility (STOI). The experimental results show that frequency perturbation is the best of the three perturbations in terms of improved speech separation. In particular, we find that frequency perturbation is effective in reducing the error of misclassifying a noise pattern as a speech pattern.
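To make the hit minus false-alarm rate concrete, the following is a minimal sketch and not the implementation used in the study. It assumes the masks are binary and flattened into sequences of 0/1 over time-frequency units, with 1 marking a speech-dominant unit; the function name `hit_minus_fa` and the toy masks are hypothetical. The hit rate is the fraction of speech-dominant units labeled as speech, and the false-alarm rate is the fraction of noise-dominant units wrongly labeled as speech, which is the error type the abstract singles out.

```python
def hit_minus_fa(ideal_mask, estimated_mask):
    """Return (hit, false_alarm, hit - false_alarm) for two binary masks.

    Both masks are flat sequences of 0/1 over time-frequency units
    (an assumption for this sketch): 1 = speech-dominant, 0 = noise-dominant.
    """
    if len(ideal_mask) != len(estimated_mask):
        raise ValueError("masks must have the same number of units")

    pairs = list(zip(ideal_mask, estimated_mask))
    speech_units = sum(1 for target, _ in pairs if target == 1)
    noise_units = len(pairs) - speech_units

    # Speech-dominant units correctly labeled as speech.
    hits = sum(1 for target, est in pairs if target == 1 and est == 1)
    # Noise-dominant units wrongly labeled as speech.
    false_alarms = sum(1 for target, est in pairs if target == 0 and est == 1)

    hit = hits / speech_units if speech_units else 0.0
    fa = false_alarms / noise_units if noise_units else 0.0
    return hit, fa, hit - fa


# Toy example: 4 speech-dominant and 4 noise-dominant units.
ideal = [1, 1, 1, 1, 0, 0, 0, 0]
estimated = [1, 1, 1, 0, 1, 0, 0, 0]
hit, fa, score = hit_minus_fa(ideal, estimated)
print(hit, fa, score)  # 0.75 0.25 0.5
```

Unlike plain classification accuracy, this score is not inflated when one class dominates the mask, since a classifier that labels every unit as speech gets a hit rate of 1 but also a false-alarm rate of 1.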