Title: A HYBRID METHOD WITH CONFUSION NETWORK FOR INDEXING SPOKEN DOCUMENTS IN E-LIBRARIES

Year of Publication: 2013
Page Numbers: 386-390
Authors: Bendib Issam, Laouar Mohamed Ridda
Conference Name: The Third International Conference on Digital Information Processing and Communications (ICDIPC2013) - United Arab Emirates

Abstract:


Advances in storage techniques and retrieval methods encourage the managers of E-libraries to integrate spoken-document resources into their systems. In practice, building such systems requires defining document indexing techniques. To this end, indexing the spoken content of audio recordings requires automatic speech recognition. Although spoken-word and audio retrieval have been well studied, significant limitations remain, especially given the multimedia resources available today. In this paper, we propose an indexing procedure for spoken documents in the field of E-libraries. Our method uses the word confusion network as a representation of alternative recognition candidates: it aligns mutually exclusive terms and gives the posterior probability of each term. The rank of the competing terms and their posterior probabilities are used to estimate term frequency for indexing. In parallel, we calculate the posterior probability of terms at their specific positions in the lattice generated with the transcription provided by a large-vocabulary continuous speech recognition system. Validation of this indexing and information retrieval approach is in progress for the field of E-libraries.
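
The abstract does not give the exact estimator, so the following Python sketch is only one plausible reading of how term frequency might be estimated from a word confusion network. It assumes the network is a list of slots, each holding competing (term, posterior) candidates, and sums posteriors per term as an expected count. The function name `estimate_term_frequencies`, the `rank_decay` parameter, and the toy data are hypothetical illustrations, not the authors' formulation.

```python
from collections import defaultdict


def estimate_term_frequencies(confusion_network, rank_decay=0.0):
    """Estimate soft term frequencies from a word confusion network.

    confusion_network: list of slots; each slot is a list of
        (term, posterior) pairs for mutually exclusive candidates.
    rank_decay: hypothetical knob (an assumption, not from the paper).
        0.0 gives the plain expected count (sum of posteriors);
        larger values discount lower-ranked candidates by rank ** rank_decay.
    """
    tf = defaultdict(float)
    for slot in confusion_network:
        # Rank the competing candidates in this slot by posterior, best first.
        ranked = sorted(slot, key=lambda cand: cand[1], reverse=True)
        for rank, (term, posterior) in enumerate(ranked, start=1):
            tf[term] += posterior / (rank ** rank_decay)
    return dict(tf)


# Toy confusion network with three slots (made-up posteriors).
cn = [
    [("digital", 0.7), ("visual", 0.3)],
    [("library", 0.6), ("libraries", 0.4)],
    [("digital", 0.55), ("vital", 0.45)],
]

tf = estimate_term_frequencies(cn)
tf_ranked = estimate_term_frequencies(cn, rank_decay=1.0)
```

With `rank_decay=0.0`, a term that appears as a candidate in several slots accumulates its posteriors, so an uncertain recognition still contributes a fractional count to the index rather than being dropped. How rank should actually enter the estimate is left open in the abstract; the division by rank here is purely illustrative.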