Letter | Published:

Learning the signatures of the human grasp using a scalable tactile glove

Abstract

Humans can feel, weigh and grasp diverse objects, and simultaneously infer their material properties while applying the right amount of force, a challenging set of tasks for a modern robot¹. Mechanoreceptor networks that provide sensory feedback and enable the dexterity of the human grasp² remain difficult to replicate in robots. Whereas computer-vision-based robot grasping strategies³,⁴,⁵ have progressed substantially with the abundance of visual data and emerging machine-learning tools, there are as yet no equivalent sensing platforms and large-scale datasets with which to probe the use of the tactile information that humans rely on when grasping objects. Studying the mechanics of how humans grasp objects will complement vision-based robotic object handling. Importantly, the inability to record and analyse tactile signals currently limits our understanding of the role of tactile information in the human grasp itself; for example, how tactile maps are used to identify objects and infer their properties is unknown⁶. Here we use a scalable tactile glove and deep convolutional neural networks to show that sensors uniformly distributed over the hand can be used to identify individual objects, estimate their weight and explore the typical tactile patterns that emerge while grasping objects. The sensor array (548 sensors) is assembled on a knitted glove, and consists of a piezoresistive film connected by a network of conductive thread electrodes that are passively probed. Using a low-cost (about US$10) scalable tactile glove sensor array, we record a large-scale tactile dataset with 135,000 frames, each covering the full hand, while interacting with 26 different objects. This set of interactions with different objects reveals the key correspondences between different regions of a human hand while it is manipulating objects. Insights from the tactile signatures of the human grasp, seen through the lens of an artificial analogue of the natural mechanoreceptor network, can thus aid the future design of prosthetics⁷, robot grasping tools and human–robot interactions¹,⁸,⁹,¹⁰.

Code availability

Custom code used in the current study is available from the corresponding author on request.

Data availability

Source data for key figures in the manuscript are included as interactive maps in Supplementary Data 1–3. Please load (and refresh) all ‘.html’ pages in Firefox or Chrome. The tactile datasets generated and analysed during this study are available from the corresponding author on request.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Bartolozzi, C., Natale, L., Nori, F. & Metta, G. Robots with a sense of touch. Nat. Mater. 15, 921–925 (2016).
  2. Johansson, R. & Flanagan, J. Coding and use of tactile signals from the fingertips in object manipulation tasks. Nat. Rev. Neurosci. 10, 345–359 (2009).
  3. Mahler, J., Matl, M., Satish, V., Danielczuk, M., DeRose, B., McKinley, S. & Goldberg, K. Learning ambidextrous robot grasping policies. Sci. Robot. 4, eaau4984 (2019).
  4. Levine, S., Finn, C., Darrell, T. & Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17, 1334–1373 (2016).
  5. Morrison, D., Corke, P. & Leitner, J. Closing the loop for robotic grasping: a real-time, generative grasp synthesis approach. In Proc. Robotics: Science and Systems https://doi.org/10.15607/RSS.2018.XIV.021 (RSS Foundation, 2018).
  6. Saal, H., Delhaye, B., Rayhaun, B. & Bensmaia, S. Simulating tactile signals from the whole hand with millisecond precision. Proc. Natl Acad. Sci. USA 114, E5693–E5702 (2017).
  7. Osborn, L. et al. Prosthesis with neuromorphic multilayered e-dermis perceives touch and pain. Sci. Robot. 3, eaat3818 (2018).
  8. Okamura, A. M., Smaby, N. & Cutkosky, M. R. An overview of dexterous manipulation. In Proc. IEEE International Conference on Robotics and Automation (ICRA’00) 255–262 https://doi.org/10.1109/ROBOT.2000.844067 (2000).
  9. Cannata, G., Maggiali, M., Metta, G. & Sandini, G. An embedded artificial skin for humanoid robots. In Proc. International Conference on Multisensor Fusion and Integration for Intelligent Systems 434–438 https://doi.org/10.1109/MFI.2008.4648033 (2008).
  10. Romano, J., Hsiao, K., Niemeyer, G., Chitta, S. & Kuchenbecker, K. Human-inspired robotic grasp control with tactile sensing. IEEE Trans. Robot. 27, 1067–1079 (2011).
  11. Marzke, M. Precision grips, hand morphology, and tools. Am. J. Phys. Anthropol. 102, 91–110 (1997).
  12. Niewoehner, W., Bergstrom, A., Eichele, D., Zuroff, M. & Clark, J. Manual dexterity in Neanderthals. Nature 422, 395 (2003).
  13. Feix, T., Kivell, T., Pouydebat, E. & Dollar, A. Estimating thumb-index finger precision grip and manipulation potential in extant and fossil primates. J. R. Soc. Interf. 12, https://doi.org/10.1098/rsif.2015.0176 (2015).
  14. Chortos, A., Liu, J. & Bao, Z. Pursuing prosthetic electronic skin. Nat. Mater. 15, 937–950 (2016).
  15. Li, R. et al. Localization and manipulation of small parts using GelSight tactile sensing. In Proc. International Conference Intelligent Robots and Systems 3988–3993 https://doi.org/10.1109/IROS.2014.6943123 (IEEE/RSJ, 2014).
  16. Yamaguchi, A. & Atkeson, C. G. Combining finger vision and optical tactile sensing: reducing and handling errors while cutting vegetables. In Proc. IEEE 16th International Conference on Humanoid Robots (Humanoids) 1045–1051 https://doi.org/10.1109/HUMANOIDS.2016.7803400 (IEEE-RAS, 2016).
  17. Wettels, N. & Loeb, G. E. Haptic feature extraction from a biomimetic tactile sensor: force, contact location and curvature. In Proc. International Conference on Robotics and Biomimetics 2471–2478 (IEEE, 2011).
  18. Park, J., Kim, M., Lee, Y., Lee, H. & Ko, H. Fingertip skin-inspired microstructured ferroelectric skins discriminate static/dynamic pressure and temperature stimuli. Sci. Adv. 1, e1500661 (2015).
  19. Yau, J., Kim, S., Thakur, P. & Bensmaia, S. Feeling form: the neural basis of haptic shape perception. J. Neurophysiol. 115, 631–642 (2016).
  20. Bachmann, T. Identification of spatially quantised tachistoscopic images of faces: how many pixels does it take to carry identity? Eur. J. Cogn. Psychol. 3, 87–103 (1991).
  21. Torralba, A., Fergus, R. & Freeman, W. 80 million tiny images: a large dataset for nonparametric object and scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 30, 1958–1970 (2008).
  22. D’Alessio, T. Measurement errors in the scanning of piezoresistive sensors arrays. Sens. Actuators A 72, 71–76 (1999).
  23. Ko, J., Bhullar, S., Cho, Y., Lee, P. & Byung-Guk Jun, M. Design and fabrication of auxetic stretchable force sensor for hand rehabilitation. Smart Mater. Struct. 24, 075027 (2015).
  24. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 770–778 https://doi.org/10.1109/CVPR.2016.90 (IEEE, 2016).
  25. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
  26. Brodie, E. & Ross, H. Sensorimotor mechanisms in weight discrimination. Percept. Psychophys. 36, 477–481 (1984).
  27. Napier, J. The prehensile movements of the human hand. J. Bone Joint Surg. Br. 38-B, 902–913 (1956).
  28. Lederman, S. & Klatzky, R. Hand movements: a window into haptic object recognition. Cognit. Psychol. 19, 342–368 (1987).
  29. Feix, T., Romero, J., Schmiedmayer, H., Dollar, A. & Kragic, D. The GRASP taxonomy of human grasp types. IEEE Trans. Hum. Mach. Syst. 46, 66–77 (2016).
  30. Simon, T., Joo, H., Matthews, I. & Sheikh, Y. Hand keypoint detection in single images using multiview bootstrapping. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 4645–4653 https://doi.org/10.1109/CVPR.2017.494 (IEEE, 2017).
  31. Lazzarini, R., Magni, R. & Dario, P. A tactile array sensor layered in an artificial skin. In Proc. IEEE International Conference on Intelligent Robots and Systems (Human Robot Interaction and Cooperative Robots) 114–119 https://doi.org/10.1109/IROS.1995.525871 (IEEE/RSJ, 1995).
  32. Newell, F., Ernst, M., Tjan, B. & Bülthoff, H. Viewpoint dependence in visual and haptic object recognition. Psychol. Sci. 12, 37–42 (2001).
  33. Higy, B., Ciliberto, C., Rosasco, L. & Natale, L. Combining sensory modalities and exploratory procedures to improve haptic object recognition in robotics. In Proc. 16th International Conference on Humanoid Robots (Humanoids) 117–124 https://doi.org/10.1109/HUMANOIDS.2016.7803263 (IEEE-RAS, 2016).
  34. Klatzky, R., Lederman, S. & Metzger, V. Identifying objects by touch: an “expert system”. Percept. Psychophys. 37, 299–302 (1985).
  35. Lederman, S. & Klatzky, R. Haptic perception: a tutorial. Atten. Percept. Psychophys. 71, 1439–1459 (2009).
  36. Kappassov, Z., Corrales, J. & Perdereau, V. Tactile sensing in dexterous robot hands. Robot. Auton. Syst. 74, 195–220 (2015).
  37. Gao, Y., Hendricks, L. A., Kuchenbecker, K. J. & Darrell, T. Deep learning for tactile understanding from visual and haptic data. In Proc. International Conference on Robotics and Automation (ICRA) 536–543 https://doi.org/10.1109/ICRA.2016.7487176 (IEEE, 2016).
  38. Meier, M., Walck, G., Haschke, R. & Ritter, H. J. Distinguishing sliding from slipping during object pushing. In Proc. IEEE Intelligent Robots and Systems (IROS) 5579–5584 https://doi.org/10.1109/IROS.2016.7759820 (2016).
  39. Baishya, S. S. & Bäuml, B. Robust material classification with a tactile skin using deep learning. In Proc. IEEE Intelligent Robots and Systems (IROS) 8–15 https://doi.org/10.1109/IROS.2016.7758088 (2016).
  40. Tompson, J., Goroshin, R., Jain, A., LeCun, Y. & Bregler, C. Efficient object localization using convolutional networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 648–656 https://doi.org/10.1109/CVPR.2015.7298664 (IEEE, 2015).
  41. Paszke, A. et al. Automatic differentiation in PyTorch. In Proc. 31st Conference on Neural Information Processing Systems (NIPS) 1–4 (2017).
  42. Bau, D., Zhou, B., Khosla, A., Oliva, A. & Torralba, A. Network dissection: quantifying interpretability of deep visual representations. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 3319–3327 https://doi.org/10.1109/CVPR.2017.354 (IEEE, 2017).
  43. Flanagan, J. & Bandomir, C. Coming to grips with weight perception: effects of grasp configuration on perceived heaviness. Percept. Psychophys. 62, 1204–1219 (2000).
  44. Maaten, L. V. D. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
  45. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).

Acknowledgements

S.S. thanks M. Baldo, V. Bulovic and J. Lang for their comments and discussions. P.K. and S.S. thank K. Myszkowski for discussions. We gratefully acknowledge support from the Toyota Research Institute.

Reviewer information

Nature thanks Giulia Pasquale and Alexander Schmitz for their contribution to the peer review of this work.

Author information

S.S. conceived the sensor and hardware designs, performed experiments, was involved in all aspects of the work and led the project. P.K. performed all data analysis with input from all authors. Y.L. performed network dissection. S.S. and P.K. generated the results. A.T. and W.M. supervised the work. All authors discussed ideas and results and contributed to the manuscript.

Competing interests

The authors declare no competing interests.

Correspondence to Subramanian Sundaram.

Extended data figures and tables

  1. Extended Data Figure 1 STAG images and readout circuit architecture.

    a, Image of the finished STAG just before the electrodes are insulated. b, Scan of the STAG. c, Electrical-grounding-based signal isolation circuit (based on ref. 22). The active row during readout is selected by grounding one of the 32 single-pole double throw (SPDT) switches. A 32:1 analog switch is used to select one of the 32 columns at a time. Here Rc is the charging resistor, Vref is the reference voltage, and Rg sets the amplifier gain. d, Fabricated printed circuit board that interfaces with the STAG. The two connectors shown on the top right and bottom are connected to the column and row electrodes of the sensor matrix. The charging resistors (Rc) are on the back of the printed circuit board.

  2. Extended Data Figure 2 Characteristics of the STAG sensing elements.

    a, The resistance of a single sensing element shows the linear working range (in logarithmic force units). The sensor is not sensitive below about 20 mN of force and saturates in response when a load exceeding 0.8 N is applied. b, Response of three separate sensors in the force range 20 mN to 0.5 N. The sensors show minimal hysteresis (17.5 ± 2.8%; see Supplementary Fig. 2). c, The sensor response after 10, 100 and 1,000 cycles of linear force ramps up to 0.5 N for three separate devices. The resistance measurements are shown in d over the entire set of cycles. e, Differential scanning calorimetry measurements of the FSF material show a two-polymer blend response with softening/melting temperatures of around 100 °C and 115.1 °C. f, Through-film resistance of an unloaded sensor after treating at different temperatures in a convection oven for 10 min. The film becomes insulating above about 80 °C.

  3. Extended Data Figure 3 Sensor architectures and regular 32 × 32 arrays.

    a, A simplified version of the sensor laminate architecture. b, The sensor is assembled by laminating an FSF along with orthogonal electrodes on each side, which are held in place and insulated by a layer of two-sided adhesive and a stretchable LDPE film (see Methods). c, Fixture used to assemble parallel electrodes. The individual electrodes can be threaded into the structure (like a needle) for assembling parallel electrodes with a spacing of 2.5 mm. d, Assembled version of the architecture shown in a. e, A regular 32 × 32 array version of the STAG based on the design in b.

  4. Extended Data Figure 4 Sample recordings of nine objects on regular 32 × 32 arrays on a flat surface.

    Nine different objects are manipulated on a regular sensor array (Extended Data Fig. 3d) placed on a flat surface. The resting patterns of these objects can be seen easily. Pressing the tactile array with sharp objects like a pen or the needles of a kiwano yields signals with single-sensor resolution.

  5. Extended Data Figure 5 Auxetic designs for stretchable sensor arrays.

    a, Standard auxetic design laser cut from the FSF. b, The actual design of the auxetic includes holes to route the electrodes (shown in red and blue) and slots that allow each square sensing island to rotate, enhancing the stretchability of the sensor array. c, Close-up of the fabricated array showing the conductive thread electrodes before insulation. d, A fully fabricated 10 × 10 array with an auxetic design. e, Auxetic patterning allows the sensor array to be folded, crushed and stretched easily with no damage. f, The array can also be stretched in multiple directions (see Supplementary Video 2).

  6. Extended Data Figure 6 Dataset objects.

    In total, 26 objects are used in our dataset; images of 24 objects are shown here. In addition to these objects, our dataset includes two cola cans (one empty can and one full can).

  7. Extended Data Figure 7 Confusion maps and learned convolution filters.

    a–h, The actual object and predicted object labels are shown in these confusion matrices for different networks, each taking 1 to 8 (or N) inputs where each input is obtained from a distinct cluster for N > 1 (approach shown in Fig. 2e; see Methods). These matrices correspond to the ‘clustering’ curve in Fig. 2b. Objects with similar shapes, sizes or weights are more likely to be confused with each other. For example, the empty can and full can are easily mistaken for each other when they are resting on the table. Likewise, lighter objects such as the safety glasses, plastic spoon, or the coin are more likely to be confused with each other or other objects. Large, heavy objects with distinct signatures such as the tea box have high detection accuracy across different numbers of inputs (N). i, Original first-layer convolution filters (3 × 3) learned by the network shown in Fig. 2a for N = 1 inputs. j, Visualization of the first-layer convolution filters of ResNet-18 trained on ImageNet.

  8. Extended Data Figure 8 Weight estimation examples and performance.

    a, Four representative examples from the weight estimation dataset, in which the objects are lifted using multi-finger grasps from the top (see Supplementary Video 6 for an example recording). b, The weight estimation performance is shown in terms of the mean absolute and relative errors (normalized to the weight of each object) in each weight interval. The relative error is analogous to the Weber fraction. We observe that the CNN outperforms the linear baseline with or without the hand pose signal removed. The overall errors of the two linear baselines are comparable.

  9. Extended Data Figure 9 Correspondence maps for six individual sensors using the decomposed hand pose signal.

    The hand pose signal decomposed from object interactions is used to collectively extract correlations between the sensors and the full hand (analogous to Fig. 3b where the decomposed object-related signal is used). The pixels at the fingertips show less structured correlations with the remaining fingers, unlike in Fig. 3b.

  10. Extended Data Figure 10 Hand pose signals from articulated hands.

    a, Images of the hand poses used in the hand pose dataset. The poses G1 to G7 are extracted from a recent grasp taxonomy. In the recordings, each pose is continuously articulated from the neutral empty hand pose. b, When the tactile data from this dataset is clustered using t-SNE, each distinct group represents a hand pose. Sample tactile maps are shown on the right. The corresponding samples are marked in red (see Supplementary Data 3). c, The hand pose signals can be classified with 89.4% accuracy (average of ten runs with 3,080 training frames and 1,256 distinct test frames) using the same CNN architecture shown in Fig. 2a. The confusion matrix elements denote how often each hand pose (column) is classified as one of the possible hand poses (rows). It shows that hand poses G1 and G6 are sometimes misidentified but the other hand poses are identified nearly perfectly.
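
The row-and-column scan described for the readout circuit in Extended Data Fig. 1c can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' acquisition code: `select_row`, `select_column` and `read_adc` are hypothetical stand-ins for the SPDT row switches, the 32:1 analog switch and the amplifier output, and `FakeArray` exists only so that the loop can run without hardware.

```python
from typing import Callable, List

N_ROWS = 32  # row electrodes, one SPDT switch each (Extended Data Fig. 1c)
N_COLS = 32  # column electrodes behind the 32:1 analog switch


def scan_frame(select_row: Callable[[int], None],
               select_column: Callable[[int], None],
               read_adc: Callable[[], float]) -> List[List[float]]:
    """Read one full frame by visiting every row/column crossing once."""
    frame = []
    for row in range(N_ROWS):
        select_row(row)  # ground this row; assumed to deselect the previous one
        values = []
        for col in range(N_COLS):
            select_column(col)  # route this column to the amplifier
            values.append(read_adc())
        frame.append(values)
    return frame


class FakeArray:
    """Hypothetical hardware stand-in: returns a known value per crossing."""

    def __init__(self) -> None:
        self.row = 0
        self.col = 0

    def select_row(self, row: int) -> None:
        self.row = row

    def select_column(self, col: int) -> None:
        self.col = col

    def read_adc(self) -> float:
        # Encode the crossing index so the scan order can be checked.
        return float(self.row * N_COLS + self.col)


array = FakeArray()
frame = scan_frame(array.select_row, array.select_column, array.read_adc)
```

Passing the three hardware operations in as callables keeps the scan order separate from any particular microcontroller or driver, so the same loop could be exercised against a simulated array, as here, or real electronics.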

Supplementary information

  1. Supplementary Information

    This file contains Supplementary Table 1, Supplementary References and Supplementary Figures 1–6

  2. Supplementary Data 1

    This zipped file contains the interactive version of the k-means clustering example in Fig. 2e

  3. Supplementary Data 2

    This zipped file contains ‘sensor-level.html’, ‘region-level.html’ and ‘finger-level.html’ which contain interactive maps of the correlations seen over the entire dataset – between the individual sensors, different hand regions, and fingers respectively. ‘sensor-level.html’ is the interactive version of the map in Fig. 3b

  4. Supplementary Data 3

    This zipped file contains an interactive map of the t-SNE clustered hand pose data shown in Extended Data Fig. 10b

  5. Video 1

    Video shows the bendability of the STAG and includes a demonstration of folding a paper plane while wearing the STAG

  6. Video 2

    Auxetic version of the sensor array with 10 × 10 elements (speed – 3x)

  7. Video 3

    Interaction from the STAG dataset – Mug (speed – 3x)

  8. Video 4

    Interaction from the STAG dataset – Cat [stone] (speed – 3x)

  9. Video 5

    Interaction from the STAG dataset – Safety glasses (speed – 3x)

  10. Video 6

    Example sequence of the dataset used for weight estimation – Multimeter (speed – 3x)
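
The sensor-level correlations described for Supplementary Data 2 can be illustrated with a small Python sketch. It assumes each frame is a flat list of sensor readings and uses the plain Pearson coefficient; the statistic and preprocessing actually used in the study are not stated in this text, so `pearson` and `correlation_map` are hypothetical helpers and the four-frame dataset is toy data.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long series; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_map(frames: List[List[float]], reference: int) -> List[float]:
    """Correlate one reference sensor with every sensor across all frames."""
    n_sensors = len(frames[0])
    ref_series = [frame[reference] for frame in frames]
    return [pearson(ref_series, [frame[s] for frame in frames])
            for s in range(n_sensors)]


# Toy data: 4 frames, 3 sensors. Sensor 1 rises with sensor 0, sensor 2 falls.
frames = [
    [0.0, 1.0, 3.0],
    [1.0, 3.0, 2.0],
    [2.0, 5.0, 1.0],
    [3.0, 7.0, 0.0],
]
corr = correlation_map(frames, reference=0)
```

Each entry of `corr` lies between -1 and 1, so the result can be drawn directly as a colour map over the hand layout, one map per reference sensor.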

Fig. 1: The STAG as a platform to learn from the human grasp.
Fig. 2: Identifying and weighing objects from tactile information.
Fig. 3: Cooperativity among regions of the hand during object manipulation and grasp.