2023 Asia Conference on Cognitive Engineering and Intelligent Interaction (CEII)

Abstract

Collecting training samples in the real world or in a simulation environment is usually time-consuming and inefficient. Model-based offline reinforcement learning (RL) offers a feasible solution by modeling the agent's interaction environment. However, previous methods usually ignore the temporal correlation between samples, which leads to large deviations in the estimated state transitions and degrades both learning efficiency and final performance. To address this issue, we propose GPTMORE, a Generative Pretrained Transformer for Model-based Offline REinforcement learning. Based on GPTMORE's accurate estimate of the state-transition dynamics, we perform Proximal Policy Optimization (PPO) for policy improvement. We instantiate GPTMORE on MuJoCo robot-control tasks and compare it with state-of-the-art (SOTA) offline RL methods. GPTMORE outperforms previous approaches on most tasks, demonstrating the effectiveness and robustness of the proposed method.
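The core idea described above — conditioning a sequence model on a window of past (state, action) pairs to predict the next state, then training a policy on rollouts from that learned model — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `HistoryDynamicsModel` (a linear map standing in for the pretrained transformer), `rollout`, and all dimensions are assumed names chosen for clarity.

```python
import numpy as np

class HistoryDynamicsModel:
    """Toy stand-in for a GPT-style dynamics model: predicts the next
    state from a window of past (state, action) pairs rather than only
    the most recent pair, capturing temporal correlation between samples."""
    def __init__(self, state_dim, action_dim, context_len=4, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = context_len * (state_dim + action_dim)
        # A random linear map stands in for the pretrained transformer.
        self.W = rng.normal(scale=0.1, size=(in_dim, state_dim))
        self.context_len = context_len
        self.state_dim = state_dim
        self.action_dim = action_dim

    def predict(self, history):
        """history: list of (state, action) pairs, most recent last."""
        window = history[-self.context_len:]
        # Left-pad with zeros when the episode is shorter than the context.
        pad = self.context_len - len(window)
        feats = [np.zeros(self.state_dim + self.action_dim)] * pad
        feats += [np.concatenate([s, a]) for s, a in window]
        return np.concatenate(feats) @ self.W

def rollout(model, policy, init_state, horizon):
    """Generate a synthetic trajectory inside the learned model --
    the transitions a PPO-style learner would then train on."""
    state, history, traj = init_state, [], []
    for _ in range(horizon):
        action = policy(state)
        history.append((state, action))
        next_state = model.predict(history)
        traj.append((state, action, next_state))
        state = next_state
    return traj
```

A policy-gradient learner such as PPO would consume `traj` in place of real environment transitions; the claimed benefit of the transformer's history conditioning is a lower-bias `predict` than one-step (Markov) dynamics models provide.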
