Abstract

Snake robots, composed of sequentially connected joint actuators, have recently gained increasing attention in industrial applications such as life detection in narrow spaces. Such robots can navigate complex environments through the cooperation of multiple motors mounted along the backbone. However, controlling these robots in physically constrained environments is challenging, and conventional control strategies can be energy-inefficient or even fail to reach the destination. This work develops a snake locomotion gait policy for energy-efficient control via deep reinforcement learning (DRL). After establishing the environment model, we apply a physics-constrained online policy gradient method based on the proximal policy optimization (PPO) objective, with each joint motor parameterized by its angular velocity. The DRL agent learns the standard serpenoid curve at each timestep, and the policy is updated from the robot's observations and an estimate of its current state. The robot simulator and task environment are built on PyBullet. Compared with conventional control strategies, snake robots controlled by the trained PPO agent achieve faster movement and a more energy-efficient locomotion gait. This work demonstrates that DRL provides an energy-efficient solution for robot control.
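
To make the gait concrete, below is a minimal Python sketch of the serpenoid curve the agent learns to track, following Hirose's standard formulation phi_i(t) = alpha * sin(omega * t + i * beta) + gamma. The function names, default parameter values, and the commented training call (using the stable-baselines PPO2 implementation against a hypothetical SnakeRobotEnv) are illustrative assumptions, not the authors' released code.

import numpy as np

def serpenoid_joint_angles(t, n_joints, alpha=0.5, omega=2.0, beta=0.6, gamma=0.0):
    """Target joint angles for Hirose's serpenoid curve, discretized per joint:
    phi_i(t) = alpha * sin(omega * t + i * beta) + gamma.
    alpha: winding amplitude, omega: temporal frequency,
    beta: phase offset between adjacent joints, gamma: steering bias."""
    i = np.arange(n_joints)
    return alpha * np.sin(omega * t + i * beta) + gamma

def serpenoid_joint_velocities(t, n_joints, alpha=0.5, omega=2.0, beta=0.6):
    """Time derivative of the angles above. The paper parameterizes each joint
    motor by angular velocity, so velocity-level commands of this form are the
    natural reference for the learned policy."""
    i = np.arange(n_joints)
    return alpha * omega * np.cos(omega * t + i * beta)

if __name__ == "__main__":
    # Print target angles over one gait cycle for an 8-joint robot.
    for t in np.linspace(0.0, 2 * np.pi / 2.0, 5):
        print(np.round(serpenoid_joint_angles(t, n_joints=8), 3))

    # Hedged sketch of training: SnakeRobotEnv is a hypothetical Gym-style
    # PyBullet environment, not code from the paper.
    # from stable_baselines import PPO2
    # model = PPO2("MlpPolicy", SnakeRobotEnv(n_joints=8))
    # model.learn(total_timesteps=1_000_000)

Note that this reproduces only the scripted reference gait; the paper's contribution is using PPO to learn angular-velocity commands of this form that are faster and more energy-efficient than the hand-tuned baseline.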
