Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Gesture recognition using a bioinspired learning architecture that integrates visual data with somatosensory data from stretchable sensors

Abstract

Gesture recognition using machine-learning methods is valuable in the development of advanced cybernetics, robotics and healthcare systems, and typically relies on images or videos. To improve recognition accuracy, such visual data can be combined with data from other sensors, but this approach, which is termed data fusion, is limited by the quality of the sensor data and the incompatibility of the datasets. Here, we report a bioinspired data fusion architecture that can perform human gesture recognition by integrating visual data with somatosensory data from skin-like stretchable strain sensors made from single-walled carbon nanotubes. The learning architecture uses a convolutional neural network for visual processing and then implements a sparse neural network for sensor data fusion and recognition at the feature level. Our approach can achieve a recognition accuracy of 100% and maintain recognition accuracy in non-ideal conditions where images are noisy and under- or over-exposed. We also show that our architecture can be used for robot navigation via hand gestures, with an error of 1.7% under normal illumination and 3.3% in the dark.

Access options

Rent or Buy article

Get time limited or full article access on ReadCube.

from$8.99

All prices are NET prices.

Fig. 1: Bioinspired somatosensory–visual associated learning framework.
Fig. 2: Characterization of stretchable strain sensor.
Fig. 3: Dataset preparation for BSV associated learning.
Fig. 4: BSV associated learning for classification.
Fig. 5: Precise HGR based on BSV for human–machine interaction.

Data availability

The data that support the plots within this paper and other findings of this study are available from the corresponding author upon reasonable request. The SV datasets used in this study are available at https://github.com/mirwang666-ime/Somato-visual-SV-dataset.

Code availability

The code that supports the plots within this paper and other findings of this study are available at https://github.com/mirwang666-ime/Somato-visual-SV-dataset. The code that supports the human–machine interaction experiment is available from the corresponding author upon reasonable request.

References

  1. 1.

    Yamada, T. et al. A stretchable carbon nanotube strain sensor for human-motion detection. Nat. Nanotechnol. 6, 296–301 (2011).

    Article  Google Scholar 

  2. 2.

    Amjadi, M., Kyung, K.-U., Park, I. & Sitti, M. Stretchable, skin-mountable and wearable strain sensors and their potential applications: a review. Adv. Funct. Mater. 26, 1678–1698 (2016).

    Article  Google Scholar 

  3. 3.

    Rautaray, S. S. & Agrawal, A. Vision based hand gesture recognition for human computer interaction: a survey. Artif. Intell. Rev. 43, 1–54 (2015).

    Article  Google Scholar 

  4. 4.

    Lim, S. et al. Transparent and stretchable interactive human machine interface based on patterned graphene heterostructures. Adv. Funct. Mater. 25, 375–383 (2015).

    Article  Google Scholar 

  5. 5.

    Pisharady, P. K., Vadakkepat, P. & Loh, A. P. Attention based detection and recognition of hand postures against complex backgrounds. Int. J. Comput. Vis. 101, 403–419 (2013).

    Article  Google Scholar 

  6. 6.

    Giese, M. A. & Poggio, T. Neural mechanisms for the recognition of biological movements. Nat. Rev. Neurosci. 4, 179–192 (2003).

    Article  Google Scholar 

  7. 7.

    Tan, X. & Triggs, B. Enhanced local texture feature sets for face recognition under difficult lighting conditions. IEEE Trans. Image Process. 19, 1635–1650 (2010).

    MathSciNet  Article  Google Scholar 

  8. 8.

    Liu, H., Ju, Z., Ji, X., Chan, C. S. & Khoury, M. Human Motion Sensing and Recognition (Springer, 2017).

  9. 9.

    Liu, K., Chen, C., Jafari, R. & Kehtarnavaz, N. Fusion of inertial and depth sensor data for robust hand gesture recognition. IEEE Sens. J. 14, 1898–1903 (2014).

    Article  Google Scholar 

  10. 10.

    Chen, C., Jafari, R. & Kehtarnavaz, N. A survey of depth and inertial sensor fusion for human action recognition. Multimed. Tools Appl. 76, 4405–4425 (2017).

    Article  Google Scholar 

  11. 11.

    Dawar, N., Ostadabbas, S. & Kehtarnavaz, N. Data augmentation in deep learning-based fusion of depth and inertial sensing for action recognition. IEEE Sens. Lett. 3, 7101004 (2019).

    Article  Google Scholar 

  12. 12.

    Kwolek, B. & Kepski, M. Improving fall detection by the use of depth sensor and accelerometer. Neurocomputing 168, 637–645 (2015).

    Article  Google Scholar 

  13. 13.

    Tang, D., Yusuf, B., Botzheim, J., Kubota, N. & Chan, C. S. A novel multimodal communication framework using robot partner for aging population. Expert Syst. Appl. 42, 4540–4555 (2015).

    Article  Google Scholar 

  14. 14.

    Wang, C., Wang, C., Huang, Z. & Xu, S. Materials and structures toward soft electronics. Adv. Mater. 30, 1801368 (2018).

    Article  Google Scholar 

  15. 15.

    Kim, D. H. et al. Dissolvable films of silk fibroin for ultrathin conformal bio-integrated electronics. Nat. Mater. 9, 511–517 (2010).

    Article  Google Scholar 

  16. 16.

    Ehatisham-Ul-Haq, M. et al. Robust human activity recognition using multimodal feature-level fusion. IEEE Access 7, 60736–60751 (2019).

    Article  Google Scholar 

  17. 17.

    Imran, J. & Raman, B. Evaluating fusion of RGB-D and inertial sensors for multimodal human action recognition. J. Amb. Intel. Hum. Comput. 11, 189–208 (2020).

    Article  Google Scholar 

  18. 18.

    Dawar, N. & Kehtarnavaz, N. Action detection and recognition in continuous action streams by deep learning-based sensing fusion. IEEE Sens. J. 18, 9660–9668 (2018).

    Article  Google Scholar 

  19. 19.

    LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

    Article  Google Scholar 

  20. 20.

    Wang, M., Wang, T., Cai, P. & Chen, X. Nanomaterials discovery and design through machine learning. Small Methods 3, 1900025 (2019).

    Article  Google Scholar 

  21. 21.

    Li, S.-Z., Yu, B., Wu, W., Su, S.-Z. & Ji, R.-R. Feature learning based on SAE–PCA network for human gesture recognition in RGBD images. Neurocomputing 151, 565–573 (2015).

    Article  Google Scholar 

  22. 22.

    Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).

    Article  Google Scholar 

  23. 23.

    Long, E. et al. An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nat. Biomed. Eng. 1, 0024 (2017).

    Article  Google Scholar 

  24. 24.

    Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

    Article  Google Scholar 

  25. 25.

    Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

    Article  Google Scholar 

  26. 26.

    Chandrasekaran, C., Lemus, L. & Ghazanfar, A. A. Dynamic faces speed up the onset of auditory cortical spiking responses during vocal detection. Proc. Natl Acad. Sci. USA 110, E4668–E4677 (2013).

    Article  Google Scholar 

  27. 27.

    Lakatos, P., Chen, C. M., O’Connell, M. N., Mills, A. & Schroeder, C. E. Neuronal oscillations and multisensory interaction in primary auditory cortex. Neuron 53, 279–292 (2007).

    Article  Google Scholar 

  28. 28.

    Henschke, J. U., Noesselt, T., Scheich, H. & Budinger, E. Possible anatomical pathways for short-latency multisensory integration processes in primary sensory cortices. Brain Struct. Funct. 220, 955–977 (2015).

    Article  Google Scholar 

  29. 29.

    Lee, A. K. C., Wallace, M. T., Coffin, A. B., Popper, A. N. & Fay, R. R. (eds) Multisensory Processes: The Auditory Perspective (Springer, 2019).

  30. 30.

    Bizley, J. K., Jones, G. P. & Town, S. M. Where are multisensory signals combined for perceptual decision-making? Curr. Opin. Neurobiol. 40, 31–37 (2016).

    Article  Google Scholar 

  31. 31.

    Ohyama, T. et al. A multilevel multimodal circuit enhances action selection in Drosophila. Nature 520, 633–639 (2015).

    Article  Google Scholar 

  32. 32.

    Bullmore, E. & Sporns, O. Complex brain networks: graph theoretical analysis of structural and functional systems. Nat. Rev. Neurosci. 10, 186–198 (2009).

    Article  Google Scholar 

  33. 33.

    Yamins, D. L. et al. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc. Natl Acad. Sci. USA 111, 8619–8624 (2014).

    Article  Google Scholar 

  34. 34.

    Gilbert, C. D. & Li, W. Top-down influences on visual processing. Nat. Rev. Neurosci. 14, 350–363 (2013).

    Article  Google Scholar 

  35. 35.

    Chortos, A., Liu, J. & Bao, Z. Pursuing prosthetic electronic skin. Nat. Mater. 15, 937–950 (2016).

    Article  Google Scholar 

  36. 36.

    Barbier, V. et al. Stable modification of PDMS surface properties by plasma polymerization: application to the formation of double emulsions in microfluidic systems. Langmuir 22, 5230–5232 (2006).

    Article  Google Scholar 

  37. 37.

    Bakarich, S. E. et al. Recovery from applied strain in interpenetrating polymer network hydrogels with ionic and covalent cross-links. Soft Matter 8, 9985–9988 (2012).

    Article  Google Scholar 

  38. 38.

    Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

    MATH  Google Scholar 

  39. 39.

    Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Proc. 25th International Conference on Neural Information Processing Systems Vol. 1, 1097–1105 (NIPS, 2012).

  40. 40.

    Polson, N. & Rockova, V. Posterior concentration for sparse deep learning. In Proc. 31st International Conference on Neural Information Processing Systems 930–941 (NIPS, 2018).

  41. 41.

    Le, X. & Wang, J. Robust pole assignment for synthesizing feedback control systems using recurrent neural networks. IEEE Trans. Neural Netw. Learn. Syst. 25, 383–393 (2013).

    Article  Google Scholar 

Download references

Acknowledgements

The project was supported by the Agency for Science, Technology and Research (A*STAR) under its Advanced Manufacturing and Engineering (AME) Programmatic Scheme (no. A18A1b0045), the National Research Foundation (NRF), Prime Minister’s office, Singapore, under its NRF Investigatorship (NRF-NRFI2017-07), Singapore Ministry of Education (MOE2017-T2-2-107) and the Australian Research Council (ARC) under Discovery Grant DP200100700. We thank all the volunteers for collecting data and also A.L. Chun for critical reading and editing of the manuscript.

Author information

Affiliations

Authors

Contributions

M.W. and X.C. designed the study. M.W. designed and characterized the strain sensor. M.W., T.W. and P.C. fabricated the PAA hydrogels. Z.Y. and M.W. carried out the machine learning algorithms and analysed the results. M.W., S.G. and Y.Z. collected the SV data. M.W. performed the human–machine interaction experiment. M.W. and X.C. wrote the paper and all authors provided feedback.

Corresponding author

Correspondence to Xiaodong Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Notes 1–3, Figs. 1–11, Tables 1–3 and refs. 1–7.

Reporting Summary

Supplementary Video 1

Conformable and adhesive stretchable strain sensor.

Supplementary Video 2

Comparison of the robot navigation using the BSV learning-based and visual-based hand gesture recognition under a normal illuminance of 431 lux.

Supplementary Video 3

Comparison of the robot navigation using the BSV learning-based and visual-based hand gesture recognition under a dark illuminance of 10 lux.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Wang, M., Yan, Z., Wang, T. et al. Gesture recognition using a bioinspired learning architecture that integrates visual data with somatosensory data from stretchable sensors. Nat Electron 3, 563–570 (2020). https://doi.org/10.1038/s41928-020-0422-z

Download citation

Further reading

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing