MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics

¹The State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou, China
²School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, UK
³School of Informatics, University of Edinburgh, Edinburgh, UK
⁴Department of Mechanical and Aerospace Engineering, the Hong Kong University of Science and Technology, Hong Kong, China
*Equal Contribution

Abstract

This work presents MarineGym, a high-performance reinforcement learning (RL) platform designed specifically for underwater robotics. It addresses the limitations of existing underwater simulation environments in RL compatibility, training efficiency, and standardized benchmarking. MarineGym integrates our GPU-accelerated hydrodynamic plugin, built on Isaac Sim, and achieves a rollout speed of 250,000 frames per second on a single NVIDIA RTX 3060 GPU. It also provides five unmanned underwater vehicle (UUV) models, multiple propulsion systems, and a set of predefined tasks covering core underwater control challenges. In addition, its domain randomization (DR) toolkit allows simulation and task parameters to be adjusted flexibly during training to improve Sim2Real transfer. Benchmark experiments demonstrate that MarineGym improves training efficiency over existing platforms and supports robust policy adaptation under various perturbations. We expect this platform to drive further advances in RL research for underwater robotics. For more details about MarineGym and its applications, please visit our project page: https://marine-gym.com/.

Platform Overview

MarineGym features GPU-accelerated hydrodynamics that support over 8,000 parallel instances per GPU at a generation rate of 250,000 frames per second. It offers diverse UUV models, four propulsion systems, and a domain randomization toolkit for enhanced training and evaluation.
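To put the throughput figures in perspective, the short calculation below converts the aggregate rollout rate into a per-instance step rate and a data-collection time. It is a back-of-the-envelope sketch only: it assumes the 250,000 frames per second figure is reached with 8,000 parallel instances, and the 10-million-frame training budget is a hypothetical number chosen for illustration.

```python
# Figures quoted in the overview above.
FRAMES_PER_SECOND = 250_000  # aggregate rollout rate across all instances
NUM_ENVS = 8_000             # parallel instances (quoted as a lower bound)

# Hypothetical training budget, chosen only for illustration.
FRAME_BUDGET = 10_000_000


def steps_per_env_per_second(total_fps, num_envs):
    """Simulation steps each instance advances per wall-clock second."""
    return total_fps / num_envs


def seconds_to_collect(total_frames, total_fps):
    """Wall-clock seconds needed to gather `total_frames` of experience."""
    return total_frames / total_fps


per_env = steps_per_env_per_second(FRAMES_PER_SECOND, NUM_ENVS)
budget_seconds = seconds_to_collect(FRAME_BUDGET, FRAMES_PER_SECOND)

print(f"{per_env} steps per instance per second")    # 31.25
print(f"{budget_seconds} s to collect {FRAME_BUDGET} frames")  # 40.0
```

Under these assumptions each instance advances about 31 steps per second, and a 10-million-frame budget takes roughly 40 seconds of simulation time, excluding policy updates.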

Multitype UUV Models

MarineGym provides a diverse UUV model library with three typical configurations: multirotor, featuring multiple thrusters for full-degree-of-freedom control (e.g., BlueROV series for precision tasks); rudder-propeller, combining a main thruster and control rudders for efficient cruising (e.g., LAUV, iAUV); and tiltrotor, with tiltable propellers for seamless air-sea transitions (e.g., HAUV).
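The multirotor idea, where several fixed thrusters jointly produce any desired force and moment, can be illustrated with a simplified planar example. The sketch below sums individual thruster forces into a net surge force, sway force, and yaw moment. The four-thruster layout, its positions, and its 45-degree mounting angles are hypothetical values chosen for illustration; they are not taken from MarineGym or from any specific vehicle, and a full model would cover all six degrees of freedom.

```python
import math


def body_wrench(thrusters, forces):
    """Sum thruster forces into a planar wrench (Fx, Fy, Mz).

    Each thruster is (x, y, heading): its position in the body frame in
    metres and the direction of positive thrust in radians, measured from
    the body x-axis.
    """
    fx = fy = mz = 0.0
    for (x, y, heading), force in zip(thrusters, forces):
        tx = force * math.cos(heading)
        ty = force * math.sin(heading)
        fx += tx
        fy += ty
        mz += x * ty - y * tx  # z-component of r x F
    return fx, fy, mz


# Hypothetical vectored layout: four horizontal thrusters at 45 degrees.
ANGLE = math.pi / 4
LAYOUT = [
    (0.2, 0.1, -ANGLE),    # front, +y side
    (0.2, -0.1, ANGLE),    # front, -y side
    (-0.2, 0.1, ANGLE),    # rear, +y side
    (-0.2, -0.1, -ANGLE),  # rear, -y side
]

surge = body_wrench(LAYOUT, [1.0, 1.0, 1.0, 1.0])  # equal thrust: pure surge
yaw = body_wrench(LAYOUT, [-1.0, 1.0, -1.0, 1.0])  # alternating signs: pure yaw
print("surge command:", surge)
print("yaw command:  ", yaw)
```

With this layout, equal thrust on all four thrusters cancels the lateral force and the yaw moment, while alternating signs cancel both forces and leave only a yaw moment. This is the sense in which redundant, angled thrusters give independent control over each planar degree of freedom.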

Benchmarking on UUV Tasks

Experiments are conducted on five UUV models to systematically evaluate their performance across the three benchmark tasks. We present learning curves for each model and task under three environment settings: Standard (blue), Disturbance (orange), and Disturbance + Randomization (green). Each row represents a task (station keeping, trajectory tracking, and docking), while each column corresponds to a UUV model (BlueROV, BlueROV Heavy, HAUV, LAUV, iAUV).
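One way to think about the three environment settings is as progressively richer per-episode parameter sampling. The sketch below is a minimal illustration under stated assumptions, not MarineGym's DR toolkit: the function name `sample_episode_params`, the parameter names (`mass_scale`, `drag_scale`, `current`), and all numeric ranges are hypothetical.

```python
import random

# Hypothetical nominal values and ranges, chosen only for illustration.
NOMINAL = {"mass_scale": 1.0, "drag_scale": 1.0}
RANDOM_RANGES = {"mass_scale": (0.8, 1.2), "drag_scale": (0.7, 1.3)}
MAX_CURRENT = 0.5  # m/s, hypothetical bound on the water-current disturbance

SETTINGS = ("standard", "disturbance", "disturbance_randomization")


def sample_episode_params(setting, rng):
    """Return simulation parameters for one episode under a given setting."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting}")

    # Standard: nominal dynamics and still water.
    params = dict(NOMINAL)
    params["current"] = (0.0, 0.0, 0.0)

    # Disturbance: add a random water current, keep nominal dynamics.
    if setting != "standard":
        params["current"] = tuple(
            rng.uniform(-MAX_CURRENT, MAX_CURRENT) for _ in range(3)
        )

    # Disturbance + Randomization: also perturb the dynamics parameters.
    if setting == "disturbance_randomization":
        for name, (low, high) in RANDOM_RANGES.items():
            params[name] = rng.uniform(low, high)

    return params


rng = random.Random(0)  # seeded so that runs are reproducible
for setting in SETTINGS:
    print(setting, sample_episode_params(setting, rng))
```

Resampling at the start of every episode exposes the policy to a different vehicle and environment each time, which is the usual motivation for domain randomization in Sim2Real transfer.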

BibTeX

@online{chu_2025_MarineGymHighPerformanceReinforcement,
  title = {MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics},
  shorttitle = {MarineGym},
  author = {Chu, Shuguang and Huang, Zebin and Li, Yutong and Lin, Mingwei and Carlucho, Ignacio and Petillot, Yvan R. and Yang, Canjun},
  date = {2025-03-12},
  eprint = {2503.09203},
  eprinttype = {arXiv},
  eprintclass = {cs},
  doi = {10.48550/arXiv.2503.09203},
  pubstate = {prepublished}
}