Automatic Recognition of Ageing Speakers
The process of ageing causes changes to the voice over time. The automatic speaker recognition community has devoted significant research effort to improving performance in the presence of everyday variability. The influence of long-term variability due to vocal ageing, however, has received only marginal attention. In this thesis, the impact of vocal ageing on speaker verification and forensic speaker recognition is assessed, and novel methods are proposed to counteract its effect. The Trinity College Dublin Speaker Ageing (TCDSA) database, compiled for this study, is first introduced. Containing 26 speakers, with recordings spanning an age difference of between 28 and 58 years per speaker, it is the largest longitudinal speech database in the public domain. A Gaussian Mixture Model-Universal Background Model (GMM-UBM) speaker verification experiment demonstrates a progressive decline in the scores of genuine speakers as the age difference between training and testing increases. The scores of imposters, over the same period, are relatively stable. Consequently, verification error increases with age difference. A novel stacked classifier approach, exploiting an ageing-dependent decision threshold, is introduced, significantly reducing verification error rates at large age differences. A new model-based quality measure, Wnorm, is incorporated into the stacked classifier framework alongside ageing information, resulting in a further reduction in the baseline error. A second novel approach, eigenageing compensation, operates by determining the dominant directions of change in the models of ageing speakers and using this information to compensate for an age difference between training and testing samples. Eigenageing compensation results in a relative reduction in baseline error at large age differences that compares favourably to the stacked classifier approach.
A by-product of the eigenageing compensation method is shown to enable a promising new approach to automatic age estimation. Vocal ageing is of particular relevance to the forensic domain. An evaluation of five ageing Irish males is presented in a forensic automatic speaker recognition framework. Vocal ageing is shown to significantly weaken strength-of-evidence estimates, leading to cases of erroneous support for the different-speaker hypothesis within 10 years. Eigenageing compensation is shown to be suitable for the forensic domain, and is effective at reducing the impact of ageing. A listener test demonstrates that vocal ageing is detected with increasing accuracy as age difference increases, and is significantly more detectable in the voices of females than in those of males. The effect of ageing on the performance of an i-vector speaker verification framework is also evaluated. Compared to the GMM-UBM approach, the i-vector framework achieves lower absolute error rates. As ageing progresses, however, the performance of both systems degrades at the same rate, demonstrating that current inter-session compensation approaches are not sufficient for dealing with ageing variability. This demands that specific strategies, such as the proposed stacked classifier and eigenageing compensation methods, be adopted.
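
The idea of an ageing-dependent decision threshold can be illustrated with a minimal Python sketch. It is not the stacked classifier evaluated in this work: it assumes a hypothetical linear second-stage rule over the verification score and the age difference, and the class name `AgeingThreshold` and all weights are invented for illustration rather than learned from the TCDSA data. Under that assumption, the rule is equivalent to a score threshold that falls as the age difference grows, which offsets the decline in genuine-speaker scores while imposter scores stay roughly stable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeingThreshold:
    """Hypothetical linear second-stage rule on (score, age difference).

    All weights are illustrative assumptions, not values from the thesis.
    """
    w_score: float = 1.0   # weight on the first-stage verification score
    w_age: float = 0.05    # weight on the age difference, in years
    bias: float = -2.0     # offset of the linear rule

    def threshold(self, age_diff_years: float) -> float:
        # Score at which w_score*score + w_age*age_diff + bias crosses zero.
        return -(self.bias + self.w_age * age_diff_years) / self.w_score

    def accept(self, score: float, age_diff_years: float) -> bool:
        # Accept the identity claim if the score clears the age-adjusted threshold.
        return score > self.threshold(age_diff_years)


rule = AgeingThreshold()

# A fixed threshold corresponds to ignoring the age difference entirely.
fixed_threshold = rule.threshold(0.0)

# Made-up trials at a 30-year age difference: a genuine-speaker score that
# has drifted downwards, and an imposter score.
genuine_score, imposter_score, age_diff = 1.0, -0.5, 30.0

print("fixed threshold accepts genuine trial:", genuine_score > fixed_threshold)
print("age-dependent rule accepts genuine trial:", rule.accept(genuine_score, age_diff))
print("age-dependent rule accepts imposter trial:", rule.accept(imposter_score, age_diff))
```

In practice the weights of such a second stage would be trained on held-out trials with known age differences, and the relationship between threshold and age difference need not be linear.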
