Please use this identifier to cite or link to this item: http://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/3435
Full metadata record
DC Field | Value | Language
dc.contributor | 31249 | en_US
dc.contributor.advisor | Escalante García Nivia I. | en_US
dc.contributor.advisor | Olvera González J. Ernesto | en_US
dc.contributor.other | https://orcid.org/0000-0002-7337-8974 | en_US
dc.coverage.spatial | Global | en_US
dc.creator | Velásquez Martínez, Emmanuel de J. | -
dc.creator | Becerra Sánchez, Aldonso | -
dc.creator | de la Rosa Vargas, José I. | -
dc.creator | González Ramírez, Efrén | -
dc.creator | Rodarte Rodríguez, Armando | -
dc.creator | Zepeda Valles, Gustavo | -
dc.date.accessioned | 2023-11-06T19:36:26Z | -
dc.date.available | 2023-11-06T19:36:26Z | -
dc.date.issued | 2023-10-22 | -
dc.identifier | info:eu-repo/semantics/acceptedVersion | en_US
dc.identifier.isbn | 979-8-3503-3688-7 | en_US
dc.identifier.uri | http://ricaxcan.uaz.edu.mx/jspui/handle/20.500.11845/3435 | -
dc.identifier.uri | http://dx.doi.org/10.48779/ricaxcan-266 | -
dc.description.abstract | Speech recognition is a common task in many everyday user systems; however, its effectiveness is limited in noisy environments such as moving vehicles, homes with ambient noise, and mobile phones. This work proposes combining deep learning techniques with domain adaptation and Wavelet Transform-based filtering to remove both stationary and non-stationary noise from speech signals in automatic speech recognition (ASR) and speaker identification tasks. It demonstrates how a deep neural network model with domain adaptation, using Optimal Transport, can be trained to mitigate different types of noise. Evaluations were based on Short-Term Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ). The Wavelet Transform (WT) was applied as a second filtering stage on the speech signal enhanced by the deep neural network, resulting in an average improvement of 20% in STOI and 9% in PESQ compared to the noisy signal. The process was evaluated on a pre-trained ASR system, achieving an overall decrease in word error rate (WER) of 14.24% and an average accuracy of 99% in speaker identification. Thus, the proposed approach provides a significant improvement in speech recognition performance by addressing the problem of noisy speech. | en_US
dc.language.iso | eng | en_US
dc.publisher | IEEE | en_US
dc.relation.isbasedon | UAZ-2022 38599 | en_US
dc.relation.uri | generalPublic | en_US
dc.rights | Attribution 3.0 United States | *
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/us/ | *
dc.source | IEEE International Autumn Meeting on Power, Electronics and Computing (Ixtapa, Méx.), México | en_US
dc.subject.classification | INGENIERIA Y TECNOLOGIA [7] | en_US
dc.subject.other | Deep Learning | en_US
dc.subject.other | Domain Adaptation | en_US
dc.subject.other | Filtering | en_US
dc.title | Combining Deep Learning with Domain Adaptation and Filtering Techniques for Speech Recognition in Noisy Environments | en_US
dc.type | info:eu-repo/semantics/conferenceProceedings | en_US
Appears in Collections: *Documentos Académicos* -- M. en Ciencias del Proc. de la Info.

Files in This Item:
File | Description | Size | Format
84_VelasquezE_DelaRosa IEEEROPEC 2023.pdf | VelasquezE_DelaRosa IEEEROPEC 2023 | 1,59 MB | Adobe PDF


This item is licensed under a Creative Commons License.