Speech recognition using deep neural networks trained with non-uniform frame-level cost functions

Becerra de la Rosa, Aldonso; De la Rosa Vargas, José Ismael; González Ramírez, Efrén; Pedroza Ramírez, Ángel David; Martínez, Juan Manuel; Escalante, Nivia

Please use this identifier to cite or link to this item: http://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/1894

Full metadata record

DC Field	Value	Language
dc.contributor	31249	es_ES
dc.contributor.other	https://orcid.org/0000-0002-7337-8974	-
dc.contributor.other	https://orcid.org/0000-0002-8060-6170	-
dc.coverage.spatial	Global	es_ES
dc.creator	Becerra de la Rosa, Aldonso	-
dc.creator	De la Rosa Vargas, José Ismael	-
dc.creator	González Ramírez, Efrén	-
dc.creator	Pedroza Ramírez, Ángel David	-
dc.creator	Martínez, Juan Manuel	-
dc.creator	Escalante, Nivia	-
dc.date.accessioned	2020-05-06T20:42:07Z	-
dc.date.available	2020-05-06T20:42:07Z	-
dc.date.issued	2017-11	-
dc.identifier	info:eu-repo/semantics/publishedVersion	es_ES
dc.identifier.issn	2573-0770	es_ES
dc.identifier.uri	http://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/1894	-
dc.identifier.uri	https://doi.org/10.48779/9ds7-t936	-
dc.description.abstract	This paper presents two new variations of the frame-level cost function for training a deep neural network, with the aim of achieving lower word error rates in speech recognition. The objective function minimized during training is a central design choice in machine learning, and its improvement is an ongoing effort. In the first proposed method, the conventional cross-entropy function is mapped to a non-uniform loss function based on its corresponding extropy (a complementary dual function), emphasizing the frames whose assignment to specific senones (tied-triphone states in a hidden Markov model) is ambiguous. The second proposal is a fusion of the proposed mapped cross-entropy and the boosted cross-entropy function, which emphasizes frames with low target posterior probability. The proposed approaches were evaluated on a custom mid-vocabulary, speaker-independent voice corpus. This dataset is used for recognition of digit strings and personal name lists in Spanish from the north-central part of Mexico on a connected-words phone dialing task. Relative word error rate improvements of 12.3% and 10.7% are obtained with the two proposed approaches, respectively, over the conventional, well-established cross-entropy objective function.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	IEEE	es_ES
dc.relation.uri	generalPublic	es_ES
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.source	Proc. of the IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC2017), at Ixtapa, Mexico, pp. 1-6, 2017.	es_ES
dc.subject.classification	ENGINEERING AND TECHNOLOGY [7]	es_ES
dc.subject.other	Speech recognition	es_ES
dc.subject.other	Deep neural network	es_ES
dc.subject.other	Deep Learning	es_ES
dc.title	Speech recognition using deep neural networks trained with non-uniform frame-level cost functions	es_ES
dc.type	info:eu-repo/semantics/conferencePaper	es_ES
Appears in Collections:	Documentos Académicos-- M. en Ciencias del Proc. de la Info.
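
The abstract above names three ingredients: a frame-level cross-entropy, the extropy of a posterior distribution, and a weighting that emphasizes frames with low target posterior probability. The following is a minimal NumPy sketch of those ingredients for illustration only. It is not the authors' formulation: the record does not give the exact mapped or fused cost functions, so the extropy is written in its standard form, the `(1 - p_target)` boosting weight is an assumption, and all function names are hypothetical.

```python
import numpy as np


def cross_entropy(posteriors, targets):
    """Per-frame cross-entropy: -log of the posterior of the target senone."""
    p_target = posteriors[np.arange(len(targets)), targets]
    return -np.log(p_target)


def extropy(posteriors):
    """Per-frame extropy in its standard form: -sum_i (1 - p_i) * log(1 - p_i)."""
    q = 1.0 - posteriors
    # Treat 0 * log(0) as 0 by replacing zero entries before taking the log.
    safe_q = np.where(q > 0, q, 1.0)
    return -np.sum(q * np.log(safe_q), axis=1)


def boosted_cross_entropy(posteriors, targets):
    """ASSUMED weighting: scale each frame's cross-entropy by (1 - p_target),
    so frames with low target posterior contribute more to the loss."""
    p_target = posteriors[np.arange(len(targets)), targets]
    return (1.0 - p_target) * -np.log(p_target)


def relative_improvement(baseline_wer, new_wer):
    """Relative word error rate improvement, in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Toy posteriors over three senones for two frames (illustrative values only):
# the first frame is confident, the second is ambiguous.
toy_posteriors = np.array([[0.90, 0.05, 0.05],
                           [0.40, 0.30, 0.30]])
toy_targets = np.array([0, 0])

frame_ce = cross_entropy(toy_posteriors, toy_targets)
frame_extropy = extropy(toy_posteriors)
frame_boosted = boosted_cross_entropy(toy_posteriors, toy_targets)
```

In this toy case the ambiguous second frame has both the larger extropy and the larger boosting weight, which is the kind of non-uniform emphasis the abstract describes. How the paper actually combines these quantities is not stated in this record.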

Files in This Item:

File	Description	Size	Format
72_Becerra_DelaRosa IEEEROPEC P1 2017.pdf	Becerra_DelaRosa IEEEROPEC P1 2017	373,94 kB	Adobe PDF

This item is licensed under a Creative Commons License