Nguyen Phuoc Thanh*, Nguyen Thanh Hoang, Hoang Ngoc Xuan Nguyen, Phan Huynh Thanh Binh, Vu Hoang Son Hai and Huynh Hieu Nhan

* Corresponding author (ngpthanh15@gmail.com)

Abstract

This study investigates optimization strategies for real-time sign language recognition (SLR) using the MediaPipe framework. We introduce a multi-modal approach that combines four Long Short-Term Memory (LSTM) models, each processing skeletal coordinates extracted by MediaPipe. Evaluations on established sign language datasets show that the multi-modal approach substantially improves recognition accuracy while preserving real-time performance. In comparisons with other MediaPipe-based models, it consistently achieved better results. A further strength of the approach is its flexibility: the LSTM layers can be adapted to different tasks and data types. Integrating the MediaPipe framework with real-time SLR markedly improves recognition accuracy, representing a meaningful advance in the field.

Keywords: LSTM, MediaPipe, How2Sign, Indian Sign Language, ISL
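
To illustrate the multi-modal design described in the abstract, the sketch below shows what a four-branch LSTM classifier over MediaPipe landmark streams might look like. It is a minimal illustration, not the paper's exact architecture: the stream split (pose, face, left hand, right hand from MediaPipe Holistic), the 30-frame window, the layer widths, the dropout rate, and the 50-sign vocabulary are all assumptions made for the example.

```python
# Illustrative sketch only (hypothetical sizes and names): a late-fusion
# classifier with one LSTM branch per MediaPipe Holistic landmark stream.
from tensorflow.keras import layers, Model

SEQ_LEN = 30       # assumed number of frames per sign clip
NUM_CLASSES = 50   # assumed sign vocabulary size

# Per-frame feature sizes from MediaPipe Holistic, flattened per landmark:
# pose has 33 points with (x, y, z, visibility); the face has 468 points
# and each hand 21 points with (x, y, z).
STREAMS = {
    "pose": 33 * 4,
    "face": 468 * 3,
    "left_hand": 21 * 3,
    "right_hand": 21 * 3,
}

def lstm_branch(name: str, feat_dim: int):
    """One LSTM tower over a single landmark stream."""
    inp = layers.Input(shape=(SEQ_LEN, feat_dim), name=name)
    x = layers.LSTM(64, return_sequences=True)(inp)
    x = layers.LSTM(64)(x)       # final hidden state summarizes the clip
    x = layers.Dropout(0.5)(x)   # regularization against overfitting
    return inp, x

inputs, towers = zip(*(lstm_branch(n, d) for n, d in STREAMS.items()))
fused = layers.Concatenate()(list(towers))  # late fusion of the four streams
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(fused)

model = Model(inputs=list(inputs), outputs=outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

At inference time, each video frame would be passed through MediaPipe Holistic, its landmark coordinates flattened into the four per-frame vectors above, and a sliding 30-frame buffer fed to the model, which keeps the pipeline compatible with real-time use.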

Article Details

References

Shi, L., Zhang, Y., Cheng, J., & Lu, H. (2020). Skeleton-based action recognition with multi-stream adaptive graph convolutional networks. IEEE Transactions on Image Processing, 29, 9532-9545.

Dardas, N. H., & Georganas, N. D. (2011). Real-time hand gesture detection and recognition using bag-of-features and support vector machine techniques. IEEE Transactions on Instrumentation and Measurement, 60(11), 3592-3607.

Velmathi, G., & Goyal, K. (2023). Indian Sign Language Recognition Using Mediapipe Holistic. arXiv preprint.
https://arxiv.org/abs/2304.10256

Lugaresi, C., Tang, J., Nash, H., McClanahan, C., Uboweja, E., Hays, M., Zhang, F., Chang, C.-L., Yong, M. G., Lee, J., Chang, W.-T., Hua, W., Georg, M., & Grundmann, M. (2019). MediaPipe: A Framework for Building Perception Pipelines. arXiv preprint.
https://arxiv.org/abs/1906.08172

Staudemeyer, R. C., & Morris, E. R. (2019). Understanding LSTM -- a tutorial into Long Short-Term Memory Recurrent Neural Networks. arXiv preprint.
https://arxiv.org/abs/1909.09586

Emmorey, K. (2001). Language, cognition, and the brain: Insights from sign language research. Psychology Press.

Huang, J., Zhou, W., Li, H., & Li, W. (2018). Attention-based 3D-CNNs for large-vocabulary sign language recognition. IEEE Transactions on Circuits and Systems for Video Technology, 29(9), 2822-2832.

Sofianos, T., Sampieri, A., Franco, L., & Galasso, F. (2021). Space-Time-Separable Graph Convolutional Network for Pose Forecasting. CoRR, abs/2110.04573.
https://arxiv.org/abs/2110.04573

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15(56), 1929-1958. http://jmlr.org/papers/v15/srivastava14a.html