Open Access System for Information Sharing

Graduate School of Artificial Intelligence (인공지능대학원) 4. Theses_Master

Thesis


Relaxing Interference from Data Augmentation for Deep Reinforcement Learning

Title: Relaxing Interference from Data Augmentation for Deep Reinforcement Learning

Authors: 고병찬

Date Issued: 2021

Publisher: 포항공과대학교 (Pohang University of Science and Technology, POSTECH)

Abstract: We consider data augmentation as a technique for improving data efficiency and generalization performance in reinforcement learning (RL). Our empirical study on OpenAI Procgen shows that data augmentation can either help or interfere with training, depending on the environment. To maximize both data efficiency and generalization, we propose a two-part solution, time scheduling of augmentation and gradient modification, which maximizes the benefits of data augmentation while minimizing its interference. We find that the timing of augmentation is critical: to maximize test performance, an augmentation should be applied either throughout RL training or after RL training ends. More specifically, if the regularization imposed by an augmentation is helpful only at test time, it is better, in terms of sample and computational complexity, to postpone the augmentation until after training than to use it during training, since such an augmentation often disturbs the training process. Conversely, an augmentation whose regularization is useful in training needs to be used during the whole training period to fully realize its benefits for both generalization and data efficiency. Based on these findings, we propose a mechanism that fully exploits a set of augmentations: it identifies the augmentation (including no augmentation) that maximizes RL training performance, and then uses all the augmentations through network distillation to maximize test performance. In addition, to reduce the interference from data augmentation, we project the gradient of the regularization term used to apply data augmentation onto a normal vector of the direction of the training update. Our experiments empirically justify the proposed method in comparison with other automatic augmentation mechanisms.
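Korean abstract (in translation): In this thesis, we aim to improve data efficiency and generalization performance in reinforcement learning (RL) using data augmentation. Through an empirical study on OpenAI Procgen, we show that data augmentation can help or hinder training depending on the environment. We therefore seek to improve both data efficiency and generalization in RL by maximizing the benefits of data augmentation while minimizing its interference with training. To this end, we present two methods: data augmentation scheduling and gradient modification. Because the timing of augmentation matters, we find that, to improve test performance, one must choose whether to apply an augmentation during RL training or after RL training has ended. Specifically, if the regularization imposed by an augmentation helps only at test time, it is better to defer the augmentation until after training than to use it during training. Conversely, when the regularization is useful for training, the augmentation should be used throughout the entire training period to fully exploit its benefits for data efficiency as well as generalization. Building on these findings, we propose a mechanism that draws on a variety of augmentations, including the option of no augmentation, to identify which augmentation helps learning and when to apply it. In addition, to reduce the interference caused by data augmentation, we propose modifying the gradient by projecting the gradient of the regularization term used to apply data augmentation onto the normal vector of the training update direction. Through experiments, we aim to justify the proposed methods in comparison with other augmentation mechanisms.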
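
The gradient modification mentioned in the abstract can be illustrated with a minimal sketch. It assumes flat gradient vectors held in plain Python lists and removes from the augmentation-regularization gradient its component along the RL training gradient, so that what remains is orthogonal (normal) to the training update direction. The function names, the `weight` parameter, and the choice to always apply the projection are illustrative assumptions; the abstract does not say whether the thesis applies the projection unconditionally or only when the two gradients conflict.

```python
def project_onto_normal(g_reg, g_rl, eps=1e-12):
    """Return the part of g_reg that is orthogonal to g_rl.

    g_reg: gradient of the augmentation regularizer (hypothetical name)
    g_rl:  gradient of the RL training objective (hypothetical name)
    """
    dot = sum(a * b for a, b in zip(g_reg, g_rl))
    norm_sq = sum(b * b for b in g_rl)
    if norm_sq < eps:
        # No meaningful training direction: leave g_reg unchanged.
        return list(g_reg)
    scale = dot / norm_sq
    # Subtract the component of g_reg that lies along g_rl.
    return [a - scale * b for a, b in zip(g_reg, g_rl)]


def combined_update(g_rl, g_reg, weight=1.0):
    """Add the projected regularization gradient to the RL gradient.

    `weight` is an assumed regularization coefficient, not a value
    taken from the thesis.
    """
    g_perp = project_onto_normal(g_reg, g_rl)
    return [r + weight * p for r, p in zip(g_rl, g_perp)]


if __name__ == "__main__":
    g_rl = [1.0, 0.0]    # toy RL gradient
    g_reg = [-2.0, 3.0]  # toy regularization gradient that opposes g_rl
    print(project_onto_normal(g_reg, g_rl))
    print(combined_update(g_rl, g_reg))
```

In this toy case the regularization gradient points partly against the RL gradient. After projection, the combined update keeps the full RL component and adds only the orthogonal part of the regularizer, which is one way to read "reducing interference".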

URI: http://postech.dcollection.net/common/orgView/200000598648
https://oasis.postech.ac.kr/handle/2014.oak/117142

Article Type: Thesis

Files in This Item: There are no files associated with this item.
