
A refined robotic grasp detection network based on coarse-to-fine feature and residual attention

Published online by Cambridge University Press:  28 November 2024

Zhenwei Zhu, Saike Huang, Jialong Xie, Yue Meng, Chaoqun Wang and Fengyu Zhou*

Affiliation: School of Control Science and Engineering, Shandong University, Jinan, Shandong, China

*Corresponding author: Fengyu Zhou; Email: [email protected]

Abstract

Precise and efficient grasp detection is vital for robotic arms to execute stable grasping tasks in industrial and household applications. However, existing methods neither refine features at different scales nor attend to critical regions, resulting in coarse grasping rectangles. To address these issues, we propose a real-time coarse and fine granularity residual attention (CFRA) grasp detection network. First, to enable the network to detect objects of different sizes, we extract and fuse coarse- and fine-granularity features. Then, we refine these fused features with a feature refinement module, which enables the network to distinguish object features from background features effectively. Finally, we introduce a residual attention module that handles objects of different shapes adaptively, achieving refined grasp detection. We train and test on both the Cornell and Jacquard datasets, achieving detection accuracies of 98.7% and 94.2%, respectively. Moreover, the grasping success rate on a real-world UR3e robot reaches 98%. These results demonstrate the effectiveness and superiority of CFRA.
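The abstract describes two core ideas: fusing coarse- and fine-granularity feature maps, and a residual attention path that re-weights features while preserving an identity connection. The paper's actual layer definitions are not given here, so the sketch below is a minimal, hypothetical NumPy illustration of those two mechanisms only; the function names (`residual_attention`, `fuse_coarse_fine`) and the nearest-neighbour upsampling choice are assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Numerically plain logistic function used to form an attention mask."""
    return 1.0 / (1.0 + np.exp(-x))


def residual_attention(features, mask_logits):
    """Residual attention sketch (assumed form, not the paper's exact block):
    the sigmoid mask modulates the features, and the identity path is added
    back, so out = features * (1 + mask). Unattended regions pass unchanged."""
    mask = sigmoid(mask_logits)
    return features * (1.0 + mask)


def fuse_coarse_fine(fine, coarse):
    """Coarse-to-fine fusion sketch: nearest-neighbour upsample the
    half-resolution coarse map by 2x in each spatial dimension and add it
    element-wise to the fine map."""
    up = coarse.repeat(2, axis=0).repeat(2, axis=1)
    return fine + up
```

With zero mask logits, sigmoid gives 0.5 everywhere, so `residual_attention(np.ones((2, 2)), np.zeros((2, 2)))` scales the features by 1.5; a 2x2 coarse map fused into a 4x4 fine map yields a 4x4 result. A real network would learn the mask logits and fusion weights with convolutions rather than fix them as here.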

Type: Research Article

Copyright: © The Author(s), 2024. Published by Cambridge University Press


Footnotes

The first two authors contributed equally to this work.
