Open Access System for Information Sharing

Thesis

Clustering Based on Sentence Structure and Words in Statistical Machine Translation (통계기계번역에서 문장구조와 단어에 기반한 클러스터링)

Abstract: Clustering a training corpus by sentence type or by document genre is a technique for improving the translation quality of statistical machine translation (SMT) through domain-specific translation. However, no previous research has used sentence type and document genre information simultaneously. In this paper, we propose an integrated clustering method that classifies sentence type by syntactic structure similarity and document genre by word similarity. Both similarities are computed with the cosine measure and then interpolated. Using the interpolated similarity, we cluster the training corpus with the K-means algorithm. We then interpolate the domain-specific models built from the clusters with general models to improve the translation quality of the SMT system. On a Japanese-English patent translation corpus, this approach achieved a 14% relative improvement in translation quality over the previous approach.
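
The clustering step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the thesis's actual implementation: sentences are represented by part-of-speech tag counts (standing in for syntactic structure) and word counts, the interpolation weight `lam` is hypothetical, and the K-means variant assigns each sentence to the centroid with the highest interpolated cosine similarity, using a deterministic farthest-first initialization.

```python
import math
from collections import Counter


def unit(vec):
    """L2-normalise a sparse vector so that dot products are cosines."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}


def dot(a, b):
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def featurize(words, tags):
    """Return (syntactic vector, lexical vector).

    Assumption: POS-tag counts stand in for syntactic structure.
    """
    return unit(Counter(tags)), unit(Counter(words))


def similarity(x, y, lam):
    """Interpolate syntactic and lexical cosine similarity (lam is hypothetical)."""
    syn = dot(unit(x[0]), unit(y[0]))
    lex = dot(unit(x[1]), unit(y[1]))
    return lam * syn + (1.0 - lam) * lex


def centroid(members):
    """Sum the member vectors; direction is what matters for cosine."""
    syn, lex = {}, {}
    for s, w in members:
        for k, v in s.items():
            syn[k] = syn.get(k, 0.0) + v
        for k, v in w.items():
            lex[k] = lex.get(k, 0.0) + v
    return syn, lex


def kmeans(items, k, lam=0.5, iters=20):
    """K-means-style clustering under the interpolated similarity."""
    # Farthest-first initialisation: deterministic, chosen for this sketch.
    centers = [items[0]]
    while len(centers) < k:
        centers.append(
            min(items, key=lambda it: max(similarity(it, c, lam) for c in centers))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            max(range(k), key=lambda j: similarity(it, centers[j], lam))
            for it in items
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        for j in range(k):
            members = [it for it, lab in zip(items, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels


# Toy corpus (invented for illustration): two patent-style statements
# and two travel-style questions, each with hand-written POS tags.
corpus = [
    (["the", "device", "comprises", "a", "sensor"], ["DT", "NN", "VBZ", "DT", "NN"]),
    (["the", "sensor", "comprises", "a", "circuit"], ["DT", "NN", "VBZ", "DT", "NN"]),
    (["where", "is", "the", "station"], ["WRB", "VBZ", "DT", "NN"]),
    (["where", "is", "the", "hotel"], ["WRB", "VBZ", "DT", "NN"]),
]
items = [featurize(words, tags) for words, tags in corpus]
labels = kmeans(items, k=2, lam=0.5)
print(labels)
```

On this toy corpus the two statements end up in one cluster and the two questions in the other. In a setup like the one the abstract describes, each cluster would then be used to train a domain-specific model that is interpolated with a general model; that step is not shown here.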

URI: http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000000547198
https://oasis.postech.ac.kr/handle/2014.oak/607
