A syllable-scale framework for language identification [An article from: Computer Speech & Language]

 Computer Speech Technology at Amazon
Nov 05, 2015
 
  • Pattern recognition
Amazon Price: $5.95 (as of unknown date). Product prices and availability are accurate as of the date/time indicated and are subject to change. Any price and availability information displayed on the Amazon site at the time of purchase will apply to the purchase of this product.

This digital document is a journal article from Computer Speech & Language, published by Elsevier. The article is delivered in HTML format and is available in your Amazon.com Media Library immediately after purchase. You can view it with any web browser.

Description:
Whilst several examples of segment-based approaches to language identification (LID) have been published, they have typically been conducted using only a small number of languages or varying feature sets, making it difficult to determine how segment length influences the accuracy of LID systems. In this study, phone triplets are used as crude approximations of a syllable-length sub-word segmental unit. The proposed pseudo-syllabic framework is then used for both qualitative and quantitative examination of the contributions made by acoustic, phonotactic and prosodic information sources, and is trialled in accordance with the NIST 1996 LID protocol. First, a series of experimental comparisons is conducted to examine the utility of segmental units for modelling short-term acoustic features. These comparisons cover language-specific Gaussian mixture models (GMMs), language-specific GMMs for each segmental unit, and finally language-specific hidden Markov models (HMMs) for each segment, the last in an attempt to better model the temporal evolution of acoustic features. In a second tier of experiments, the contribution of both broad- and fine-class phonotactic information, considered over an extended time frame, is contrasted with an implementation of the popular parallel phone recognition followed by language modelling (PPRLM) technique. Results indicate that this information can complement existing PPRLM systems to improve performance. The pseudo-syllabic framework is also used to model prosodic dynamics and is compared with an implementation of a recently published system, achieving comparable performance.
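To make the phone-triplet idea concrete, here is a minimal Python sketch built on assumptions of our own rather than the paper's method: a recognised phone string is cut into consecutive three-phone units, and each language is modelled by add-one-smoothed triplet frequencies. The names `phone_triplets`, `TripletLanguageModel` and `identify_language`, the non-overlapping segmentation, and the toy phone strings are all hypothetical; the paper's actual acoustic (GMM/HMM) and phonotactic models are considerably more elaborate.

```python
import math
from collections import Counter


def phone_triplets(phones, step=3):
    """Group a phone sequence into three-phone units.

    step=3 yields non-overlapping triplets (a crude pseudo-syllable
    segmentation, assumed here); step=1 yields overlapping triplets.
    A trailing remainder of fewer than three phones is dropped.
    """
    return [tuple(phones[i:i + 3]) for i in range(0, len(phones) - 2, step)]


class TripletLanguageModel:
    """Hypothetical per-language model: add-one-smoothed triplet frequencies."""

    def __init__(self, training_sequences, step=3):
        self.step = step
        self.counts = Counter()
        for seq in training_sequences:
            self.counts.update(phone_triplets(seq, step))
        self.total = sum(self.counts.values())

    def log_likelihood(self, phones, vocab_size):
        # Sum of log P(triplet); unseen triplets have count 0 but keep nonzero mass.
        return sum(
            math.log((self.counts[t] + 1) / (self.total + vocab_size))
            for t in phone_triplets(phones, self.step)
        )


def identify_language(phones, models):
    """Return the language whose triplet model scores the input highest."""
    vocab = set()
    for model in models.values():
        vocab.update(model.counts)
    vocab_size = len(vocab) + 1  # +1 reserves one slot for unseen triplets
    return max(models, key=lambda lang: models[lang].log_likelihood(phones, vocab_size))


# Toy usage: phones are written as single characters for brevity.
models = {
    "lang_a": TripletLanguageModel([list("abcabcabd")]),
    "lang_b": TripletLanguageModel([list("xyzxyzxya")]),
}
print(identify_language(list("abcabc"), models))  # lang_a
```

Passing `step=1` instead gives overlapping triplets, which is closer to a conventional phone-trigram model; the non-overlapping default is used here only because it mirrors the notion of one unit per pseudo-syllable.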

NIST and NFI-TNO evaluations of automatic speaker recognition [An article from: Computer Speech & Language]

 Computer Speech Technology at Amazon
Nov 03, 2015
 
Amazon Price: $5.95 (as of unknown date). Product prices and availability are accurate as of the date/time indicated and are subject to change. Any price and availability information displayed on the Amazon site at the time of purchase will apply to the purchase of this product.

This digital document is a journal article from Computer Speech & Language, published by Elsevier. The article is delivered in HTML format and is available in your Amazon.com Media Library immediately after purchase. You can view it with any web browser.

Description:
In recent years, several text-independent speaker recognition evaluation campaigns have taken place. This paper reports results from the 2004 NIST evaluation and the NFI-TNO forensic speaker recognition evaluation held in 2003, and reflects on the history of these evaluation campaigns. The effects of speech duration, training handsets, transmission type, and gender mix on the detection error trade-off (DET) curves are as expected. New results on the influence of language show an interesting dependence of the DET curves on speaker accent. We also report on a number of statistical analysis techniques recently introduced in the speaker recognition community, as well as a new application of analysis of deviance. These techniques are used to show that the two evaluations held in 2003, by NIST and NFI-TNO, posed statistically different levels of difficulty for the speaker recognition systems.
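DET curves plot miss probability against false-alarm probability as the decision threshold is swept, with both axes on a normal-deviate (probit) scale. The Python sketch below shows one way to compute those operating points from detection scores. The function names, the accept-if-score-is-at-least-the-threshold convention, and the toy scores are assumptions for illustration; this is not the NIST scoring software.

```python
from statistics import NormalDist


def det_points(target_scores, nontarget_scores):
    """Return (p_fa, p_miss) pairs, one per candidate threshold.

    A trial is accepted when score >= threshold. p_miss is the fraction of
    target trials rejected; p_fa is the fraction of non-target trials accepted.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    points = []
    for thr in thresholds:
        p_miss = sum(s < thr for s in target_scores) / len(target_scores)
        p_fa = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        points.append((p_fa, p_miss))
    return points


def probit(p, eps=1e-6):
    """Map a probability to the normal-deviate scale used on DET plot axes."""
    p = min(max(p, eps), 1 - eps)  # clip: the inverse CDF is undefined at 0 and 1
    return NormalDist().inv_cdf(p)


def approx_eer(points):
    """Crude equal error rate: the operating point where p_fa and p_miss are closest."""
    p_fa, p_miss = min(points, key=lambda pt: abs(pt[0] - pt[1]))
    return (p_fa + p_miss) / 2


target = [2.0, 1.5, 0.8, 0.3]       # same-speaker trial scores (toy values)
nontarget = [0.9, 0.2, -0.5, -1.0]  # different-speaker trial scores (toy values)
points = det_points(target, nontarget)
print(approx_eer(points))  # 0.25
for p_fa, p_miss in points:
    print(round(probit(p_fa), 3), round(probit(p_miss), 3))
```

The minimum-gap EER estimate is deliberately crude; with many trials one would usually interpolate between the two operating points that bracket the crossing.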