
A refined robotic grasp detection network based on coarse-to-fine feature and residual attention

Published online by Cambridge University Press:  28 November 2024

Zhenwei Zhu, Saike Huang, Jialong Xie, Yue Meng, Chaoqun Wang and Fengyu Zhou*

Affiliation: School of Control Science and Engineering, Shandong University, Jinan, Shandong, China

*Corresponding author: Fengyu Zhou; Email: [email protected]

Abstract

Precise and efficient grasp detection is vital for robotic arms to execute stable grasping tasks in industrial and household applications. However, existing methods neither refine features at different scales nor attend to critical regions, resulting in coarse grasping rectangles. To address these issues, we propose a real-time coarse and fine granularity residual attention (CFRA) grasp detection network. First, to enable the network to detect objects of different sizes, we extract and fuse coarse- and fine-granularity features. Then, we refine these fused features with a feature refinement module, which enables the network to distinguish object features from background features effectively. Finally, we introduce a residual attention module that handles objects of different shapes adaptively, achieving refined grasp detection. We train and test on both the Cornell and Jacquard datasets, achieving detection accuracies of 98.7% and 94.2%, respectively. Moreover, the grasping success rate on a real-world UR3e robot reaches 98%. These results demonstrate the effectiveness and superiority of CFRA.
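The abstract describes two core ideas: fusing coarse- and fine-granularity feature maps, and a residual attention path that re-weights features while preserving an identity connection. The paper's actual layer definitions are not given here, so the sketch below is a minimal, hypothetical NumPy illustration of those two mechanisms only; the function names (`residual_attention`, `fuse_coarse_fine`) and the nearest-neighbour upsampling choice are assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Numerically plain logistic function used to form an attention mask."""
    return 1.0 / (1.0 + np.exp(-x))


def residual_attention(features, mask_logits):
    """Residual attention sketch (assumed form, not the paper's exact block):
    the sigmoid mask modulates the features, and the identity path is added
    back, so out = features * (1 + mask). Unattended regions pass unchanged."""
    mask = sigmoid(mask_logits)
    return features * (1.0 + mask)


def fuse_coarse_fine(fine, coarse):
    """Coarse-to-fine fusion sketch: nearest-neighbour upsample the
    half-resolution coarse map by 2x in each spatial dimension and add it
    element-wise to the fine map."""
    up = coarse.repeat(2, axis=0).repeat(2, axis=1)
    return fine + up
```

With zero mask logits, sigmoid gives 0.5 everywhere, so `residual_attention(np.ones((2, 2)), np.zeros((2, 2)))` scales the features by 1.5; a 2x2 coarse map fused into a 4x4 fine map yields a 4x4 result. A real network would learn the mask logits and fusion weights with convolutions rather than fix them as here.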

Type: Research Article

Copyright: © The Author(s), 2024. Published by Cambridge University Press


Footnotes

The first two authors contributed equally to this work.
