
Immersive remote telerobotics: foveated unicasting and remote visualization for intuitive interaction

Published online by Cambridge University Press: 30 October 2024

Yonas T. Tefera*
Affiliation:
Advanced Robotics, Istituto Italiano di Tecnologia (IIT), Genova, Italy
Yaesol Kim
Affiliation:
Advanced Robotics, Istituto Italiano di Tecnologia (IIT), Genova, Italy
Sara Anastasi
Affiliation:
Istituto Nazionale per l’Assicurazione contro gli Infortuni sul Lavoro (INAIL), Rome, Italy
Paolo Fiorini
Affiliation:
Department of Computer Science, University of Verona, Verona, Italy
Darwin G. Caldwell
Affiliation:
Advanced Robotics, Istituto Italiano di Tecnologia (IIT), Genova, Italy
Nikhil Deshpande
Affiliation:
Advanced Robotics, Istituto Italiano di Tecnologia (IIT), Genova, Italy
*Corresponding author: Yonas Teodros Tefera; Email: [email protected]

Abstract

Precise and efficient performance in remote robotic teleoperation relies on intuitive interaction, which requires both accurate control actions and complete perception (visual, haptic, and other sensory feedback) of the remote environment. In immersive remote teleoperation especially, perceiving the remote environment in 3D gives operators improved situational awareness. Color and Depth (RGB-D) cameras capture remote environments as dense 3D point clouds for real-time visualization, but maintaining situational awareness demands fast, high-quality data transmission from acquisition through to virtual reality rendering. Dense point-cloud data, however, is vulnerable to network delays and bandwidth limitations, degrading the teleoperator's situational awareness. Understanding how the human eye works can help mitigate these challenges. This paper introduces a solution based on foveation: mimicking the human eye's focusing behavior by adaptively sampling and rendering dense point clouds for an intuitive remote teleoperation interface. The rendering provides high resolution in the user's central field of view, with resolution gradually reducing toward the periphery. Reducing detail in peripheral vision, however, risks losing information and increasing the user's cognitive load. This work investigates these advantages and drawbacks through an experimental study and describes the overall system, including its software, hardware, and communication framework. The results show significant improvements in latency and throughput, exceeding 60% and 40%, respectively, compared with state-of-the-art approaches. A user study reveals that the framework has minimal impact on the user's visual quality of experience while significantly reducing the error rate. Furthermore, a 50% reduction in task execution time highlights the benefits of the proposed framework for immersive remote telerobotics applications.
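To make the foveation idea concrete, the following is a minimal Python sketch (not the authors' implementation) of gaze-contingent point-cloud subsampling: points near the gaze direction are kept at full density, and the keep probability falls off linearly with angular eccentricity toward the periphery. The function name foveated_subsample and all parameter values (foveal radius, falloff limits) are illustrative assumptions rather than values from the paper.

import numpy as np

def foveated_subsample(points, gaze_dir, fovea_deg=5.0, periphery_deg=60.0,
                       min_keep=0.05, rng=None):
    """Gaze-contingent subsampling of an N x 3 point cloud (camera at origin).

    Points within fovea_deg of the gaze direction are kept at full density;
    the keep probability then falls off linearly with angular eccentricity,
    reaching min_keep at periphery_deg and beyond.
    """
    rng = rng or np.random.default_rng()
    gaze = gaze_dir / np.linalg.norm(gaze_dir)

    # Angular eccentricity of each point relative to the gaze ray.
    dirs = points / np.linalg.norm(points, axis=1, keepdims=True)
    ecc = np.degrees(np.arccos(np.clip(dirs @ gaze, -1.0, 1.0)))

    # Keep probability: 1.0 inside the fovea, linear falloff to min_keep.
    t = np.clip((ecc - fovea_deg) / (periphery_deg - fovea_deg), 0.0, 1.0)
    keep_prob = 1.0 - t * (1.0 - min_keep)

    return points[rng.random(len(points)) < keep_prob]

# Example: one million random points in front of the camera, gaze straight ahead.
pts = np.random.uniform(-1, 1, (1_000_000, 3)) + np.array([0.0, 0.0, 2.0])
sampled = foveated_subsample(pts, gaze_dir=np.array([0.0, 0.0, 1.0]))
print(f"kept {len(sampled)} of {len(pts)} points")

In a live system such as the one described here, the eccentricity would presumably be computed from the head-mounted display's eye tracker rather than a fixed gaze vector, and the subsampling would be applied on the transmission side so that peripheral regions consume less bandwidth before streaming and rendering.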

Type: Research Article
Copyright: © The Author(s), 2024. Published by Cambridge University Press

