Open Access System for Information Sharing

Department of Creative IT Engineering (창의IT융합공학과), Master's Theses

Thesis

Quantization-Aware Bayesian Policy for Deep Reinforcement Learning

Title: Quantization-Aware Bayesian Policy for Deep Reinforcement Learning

Authors: 길윤희

Date Issued: 2023

Publisher: Pohang University of Science and Technology (포항공과대학교)

Abstract: Combined with deep neural networks, recent deep reinforcement learning (RL) algorithms have proven successful in the continuous control of real-world systems. However, most deep RL algorithms assume they will run on devices with abundant computational resources, so running them on lightweight devices can incur excessive power consumption, memory usage, and inference time. To resolve this issue, this thesis proposes a quantization-aware Bayesian policy, which takes the quantization scheme into account while training and pruning its Bayesian policy network, together with methods that make the network compression algorithm better suited to RL policy networks. Experiments on multiple continuous control benchmark tasks show that the proposed algorithm trains and compresses deep RL policies without performance loss.
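The abstract's central idea is that the policy is trained while aware of how its weights will later be quantized. As a rough illustration only, the sketch below shows a generic symmetric uniform quantizer of the kind commonly used for "fake quantization" in quantization-aware training. It is an assumption for exposition, not the quantization scheme used in the thesis, and the helper name `fake_quantize` is hypothetical.

```python
def fake_quantize(weights: list[float], num_bits: int) -> list[float]:
    """Hypothetical helper: symmetric uniform quantization to `num_bits` bits.

    Returns dequantized floats, so the result can replace the full-precision
    weights in a forward pass ("fake quantization").
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2 for a signed grid")
    q_max = 2 ** (num_bits - 1) - 1          # largest integer level, e.g. 127 for 8 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0] * len(weights)
    scale = max_abs / q_max                  # spacing between adjacent grid levels
    quantized = []
    for w in weights:
        level = round(w / scale)             # nearest level (Python rounds ties to even)
        level = max(-q_max, min(q_max, level))  # clamp to the signed grid
        quantized.append(level * scale)
    return quantized


if __name__ == "__main__":
    w = [1.0, -0.9, 0.2, 0.0]
    print(fake_quantize(w, 2))   # prints [1.0, -1.0, 0.0, 0.0]
```

During quantization-aware training, the forward pass would typically use these quantized values while gradients are passed straight through to the full-precision weights (the straight-through estimator). With 8 bits, each weight in this sketch maps to one of 255 signed levels plus one shared scale.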
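This thesis proposes a Bayesian compression algorithm for deep reinforcement learning policy networks. Deep reinforcement learning performs remarkably well in fields such as games and robotics, but most algorithms are studied under the assumption that they are trained and run on devices with abundant computing resources. Systems such as mobile devices, however, often lack such resources, which can make reinforcement learning very difficult to use on them. Neural network compression algorithms are commonly used to bridge this gap in available resources, and research in the area is active, but most compression algorithms target networks from other branches of machine learning, namely supervised and unsupervised learning, which makes them difficult to apply directly to reinforcement learning networks. This thesis therefore proposes a Bayesian compression algorithm that compresses a reinforcement learning policy network while it is being trained, aiming to enable compression while preserving the policy network's performance. The proposed algorithm combines SVDPG, a reinforcement learning algorithm that sparsifies a Bayesian neural network during training, with UVNQ, a Bayesian neural network compression algorithm, so that policy network training, sparsification, and quantization are all possible; in addition, several modifications ensure that the compressed policy network preserves as much of its performance as possible. The proposed algorithm was evaluated on MuJoCo-based simulations from OpenAI Gym, and the results confirm that policy networks compressed with it achieve better performance than those compressed with other compression algorithms while using fewer parameters and fewer bits. This work is expected to allow reinforcement learning based control to be deployed on real systems without being limited by the performance of the computing hardware. Future work will unify policy network training, sparsification, and quantization into a single process, to preserve the compressed network's performance more reliably while achieving higher compression rates.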
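Bayesian sparsification of a network typically gives each weight a learned mean θ and variance σ², and removes weights whose noise dominates their value. The sketch below illustrates that general idea with the log α = log σ² − log θ² criterion from sparse variational dropout. The threshold of 3.0 (a value often used in that literature), the helper names `log_alpha` and `sparsify`, and the sample numbers are assumptions for illustration; this is not the SVDPG or UVNQ procedure used in the thesis.

```python
import math


def log_alpha(theta: float, log_sigma2: float) -> float:
    """Hypothetical helper: log of the noise-to-signal ratio alpha = sigma^2 / theta^2."""
    # A small epsilon keeps log() finite when the mean weight is exactly zero.
    return log_sigma2 - math.log(theta * theta + 1e-8)


def sparsify(thetas: list[float], log_sigma2s: list[float],
             threshold: float = 3.0) -> tuple[list[float], float]:
    """Hypothetical helper: zero out weights whose log alpha exceeds `threshold`.

    Returns the pruned mean weights and the fraction of weights removed.
    """
    if not thetas:
        return [], 0.0
    # True where the weight's signal dominates its noise and it should be kept.
    mask = [log_alpha(t, s) <= threshold for t, s in zip(thetas, log_sigma2s)]
    pruned = [t if keep else 0.0 for t, keep in zip(thetas, mask)]
    sparsity = 1.0 - sum(mask) / len(mask)
    return pruned, sparsity


if __name__ == "__main__":
    means = [0.8, 0.01, -0.5]
    log_vars = [-6.0, -4.0, 0.0]
    # The small, noisy second weight has log alpha of about 5.2 and is pruned.
    print(sparsify(means, log_vars))
```

Raising the threshold keeps more weights. In the approach the abstract describes, sparsification is combined with quantization of the policy network, which this sketch does not model.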

URI: http://postech.dcollection.net/common/orgView/200000661648
https://oasis.postech.ac.kr/handle/2014.oak/118229

Article Type: Thesis

Files in This Item: There are no files associated with this item.
