Managing Development of Speech Recognition Systems: Performance Issues

Karolina Kuligowska, Paweł Kisielewicz, Aleksandra Włodarz

Abstract


Speech recognition enables the transformation of spoken words and sentences into text in digital form. This technology has been the subject of numerous studies and commercial development for many years. The aim of this paper is to examine the performance issues of speech recognition and to inform the management of development in this field. Through an analysis of the performance limitations of speech recognition systems, we identified 11 main issues to overcome. These issues indicate the direction in which the development of speech recognition systems should be managed.


Keywords


speech recognition system; speech-to-text performance; STT development




DOI: http://dx.doi.org/10.17951/h.2018.52.2.71-78
Date of publication: 2018-07-27 07:47:42
Date of submission: 2018-03-17 22:51:24



Copyright (c) 2018 Karolina Kuligowska, Paweł Kisielewicz, Aleksandra Włodarz

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.