Abstract

Snake robots, composed of sequentially connected joint actuators, have recently gained increasing attention in industrial applications such as life detection in narrow spaces. Such robots can navigate complex environments through the cooperation of multiple motors mounted along the backbone. However, controlling these robots in physically constrained environments is challenging, and conventional control strategies can be energy-inefficient or even fail to reach the destination. This work develops a snake locomotion gait policy for energy-efficient control via deep reinforcement learning (DRL). After establishing the environment model, we apply a physics-constrained online policy gradient method based on the proximal policy optimization (PPO) objective, with each joint motor parameterized by its angular velocity. The DRL agent learns the standard serpenoid curve at each timestep, and the policy is updated from the robot's observations and an estimate of its current state. The robot simulator and task environment are built on PyBullet. Compared with conventional control strategies, snake robots controlled by the trained PPO agent achieve faster movement and a more energy-efficient locomotion gait. This work demonstrates that DRL provides an energy-efficient solution for robot control.
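
To make the gait concrete, below is a minimal Python sketch of the serpenoid curve the agent learns to track, following Hirose's standard formulation phi_i(t) = alpha * sin(omega * t + i * beta) + gamma. The function names, default parameter values, and the commented training call (using the stable-baselines PPO2 implementation against a hypothetical SnakeRobotEnv) are illustrative assumptions, not the authors' released code.

import numpy as np

def serpenoid_joint_angles(t, n_joints, alpha=0.5, omega=2.0, beta=0.6, gamma=0.0):
    """Target joint angles for Hirose's serpenoid curve, discretized per joint:
    phi_i(t) = alpha * sin(omega * t + i * beta) + gamma.
    alpha: winding amplitude, omega: temporal frequency,
    beta: phase offset between adjacent joints, gamma: steering bias."""
    i = np.arange(n_joints)
    return alpha * np.sin(omega * t + i * beta) + gamma

def serpenoid_joint_velocities(t, n_joints, alpha=0.5, omega=2.0, beta=0.6):
    """Time derivative of the angles above. The paper parameterizes each joint
    motor by angular velocity, so velocity-level commands of this form are the
    natural reference for the learned policy."""
    i = np.arange(n_joints)
    return alpha * omega * np.cos(omega * t + i * beta)

if __name__ == "__main__":
    # Print target angles over one gait cycle for an 8-joint robot.
    for t in np.linspace(0.0, 2 * np.pi / 2.0, 5):
        print(np.round(serpenoid_joint_angles(t, n_joints=8), 3))

    # Hedged sketch of training: SnakeRobotEnv is a hypothetical Gym-style
    # PyBullet environment, not code from the paper.
    # from stable_baselines import PPO2
    # model = PPO2("MlpPolicy", SnakeRobotEnv(n_joints=8))
    # model.learn(total_timesteps=1_000_000)

Note that this reproduces only the scripted reference gait; the paper's contribution is using PPO to learn angular-velocity commands of this form that are faster and more energy-efficient than the hand-tuned baseline.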
