王玉惠
Paper Publications
Truly proximal policy optimization

Affiliation of Author(s):College of Computer Science and Technology / College of Artificial Intelligence / College of Software

Journal:Conf. Uncertain. Artif. Intell., UAI

Abstract:Proximal policy optimization (PPO) is one of the most successful deep reinforcement learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from being fully understood. In this paper, we show that PPO can neither strictly restrict the probability ratio, as it attempts to do, nor enforce a well-defined trust region constraint, which means that it may still suffer from performance instability. To address this issue, we present an enhanced PPO method, named Trust Region-based PPO with Rollback (TR-PPO-RB). Our method makes two critical improvements: 1) it adopts a new clipping function that supports a rollback behavior to restrict the ratio between the new policy and the old one; 2) it replaces the triggering condition for clipping with a trust region-based one, which is theoretically justified by the trust region theorem. By adhering more truly to the "proximal" property, that is, restricting the policy within the trust region, the new algorithm appears to improve on the original PPO in both stability and sample efficiency. © 2019 Association For Uncertainty in Artificial Intelligence (AUAI). All rights reserved.
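
The two changes named in the abstract can be illustrated with a small per-sample sketch. The abstract does not give the exact functional form, so everything specific here is an assumption made for illustration: the function names `ppo_clip` and `rollback_surrogate`, the KL threshold `delta`, the rollback slope `alpha`, the clipping range `eps`, and the choice of a plain negative slope as the rollback. It should not be read as the paper's formulation of TR-PPO-RB.

```python
def ppo_clip(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample.

    Once the ratio leaves [1 - eps, 1 + eps] in the improving direction,
    the objective is flat, so nothing pulls the ratio back.
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


def rollback_surrogate(ratio, advantage, kl, delta=0.03, alpha=0.3):
    """Hypothetical trust-region trigger with rollback for a single sample.

    ASSUMPTION: this form is illustrative only, not taken from the paper.
    - Trigger: the KL divergence between old and new policy at this
      state reaches `delta` (instead of the ratio leaving a fixed range).
    - Rollback: when triggered while the surrogate is still improving,
      the slope in `ratio` becomes negative (-alpha), so gradient ascent
      moves the ratio back toward 1 rather than leaving it unchanged.
    """
    outside_region = kl >= delta
    still_improving = ratio * advantage >= advantage
    if outside_region and still_improving:
        return -alpha * ratio * advantage
    return ratio * advantage


if __name__ == "__main__":
    # Plain clipping is flat beyond the range; the rollback form decreases.
    print(ppo_clip(1.5, 1.0), ppo_clip(1.6, 1.0))
    print(rollback_surrogate(1.5, 1.0, kl=0.05),
          rollback_surrogate(1.6, 1.0, kl=0.05))
```

The contrast to look for is in the slope: beyond the clipping range the standard objective stops changing with the ratio, while the rollback variant actively penalizes further movement away from the old policy.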

Translation or Not:no

Date of Publication:2019-01-01

Co-author:Wang, Yuhui; He, Hao; Tan, Xiaoyang

Correspondence Author:Wang Yuhui

Personal information

Professor
Supervisor of Doctorate Candidates

Gender:Female

Education Level:Nanjing University of Aeronautics and Astronautics

Degree:Doctoral Degree in Engineering

School/Department:College of Automation Engineering

Discipline:Control Theory and Engineering. Control Science and Engineering

Business Address:Room 401, Building 4, College of Automation Engineering

Contact Information:025-84892301-8023


