Open Access System for Information Sharing


Thesis

An Efficient Graph-Based Methodology for Translating Emerging Named Entities

Title
An Efficient Graph-Based Methodology for Translating Emerging Named Entities
Authors
김진한
Date Issued
2014
Publisher
Pohang University of Science and Technology (포항공과대학교)
Abstract
Named entities (NEs) normally refer to a range of concepts such as person names, location names, organization names, and product names. As large quantities of new named entities (or emerging named entities) appear every day in newspapers, websites, and TV programs, NE analysis is becoming increasingly important in the data mining and information retrieval communities. Information on NEs can be extracted from (a) structured sources such as databases and tables, (b) semi-structured sources such as knowledge bases (also referred to interchangeably as ontologies), or (c) unstructured sources such as text corpora. Among the many research topics related to NE analysis, such as ontology integration, named entity linking, and named entity translation, this dissertation addresses the problem of mining NE translations from comparable corpora, specifically English-Chinese NE translations. I observe that existing approaches use one or more of the following NE similarity metrics: entity name similarity, entity context similarity, and entity relationship similarity. Motivated by this observation, this dissertation proposes a new holistic approach that (1) combines all of these similarity types and (2) additionally considers a new similarity measure, relationship context similarity between pairs of NEs, which is the missing quadrant in the taxonomy of similarity metrics. I abstract the NE translation problem as the matching of two NE graphs extracted from the comparable corpora. Specifically, two monolingual NE graphs are first constructed from the comparable corpora to extract relationships between NEs. Entity name similarity and entity context similarity are then calculated for every pair of bilingual NEs to compute an initial pairwise NE similarity. A reinforcement method is then used to reflect relationship similarity and relationship context similarity between NEs.
I also discover corpus "latent" features that are lost in the graph extraction process and integrate them into the proposed framework, and I improve the relationship-based similarities by overcoming the asymmetry of comparable corpora and by considering other types of NEs. According to the experimental results, the proposed holistic graph-based approach and its enhancements are highly effective, and the proposed framework significantly outperforms previous state-of-the-art approaches.
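The pipeline outlined in the abstract (two monolingual NE graphs, an initial pairwise similarity, then reinforcement through relationships) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the dissertation's actual formulation: the update rule is a generic neighbor-propagation scheme, the names `reinforce`, `best_matches`, and `damping` are hypothetical, and the toy entities and initial scores are invented placeholders, not data from the thesis.

```python
from itertools import product


def reinforce(graph_a, graph_b, initial, damping=0.5, iterations=20):
    """Propagate similarity between neighboring entity pairs.

    graph_a, graph_b: dict mapping each entity to the set of related entities.
    initial: dict mapping (a, b) pairs to an initial similarity in [0, 1],
             standing in for combined name and context similarity.
    This is a generic propagation rule (an assumption), not the thesis's own.
    """
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for a, b in product(graph_a, graph_b):
            # Pairs formed by one neighbor of a and one neighbor of b.
            neighbor_pairs = [(u, v) for u in graph_a[a] for v in graph_b[b]]
            if neighbor_pairs:
                support = sum(scores.get(p, 0.0) for p in neighbor_pairs)
                support /= len(neighbor_pairs)
            else:
                support = 0.0
            # Blend relational support with the initial pairwise similarity.
            updated[(a, b)] = (damping * support
                               + (1 - damping) * initial.get((a, b), 0.0))
        scores = updated
    return scores


def best_matches(scores):
    """Pick, for each source entity, the target with the highest score."""
    best = {}
    for (a, b), score in scores.items():
        if a not in best or score > best[a][1]:
            best[a] = (b, score)
    return {a: b for a, (b, _) in best.items()}


# Toy graphs: en_A is related to en_B; zh_X is related to zh_Y.
graph_en = {"en_A": {"en_B"}, "en_B": {"en_A"}, "en_C": set()}
graph_zh = {"zh_X": {"zh_Y"}, "zh_Y": {"zh_X"}, "zh_Z": set()}

# Invented initial scores: en_B is ambiguous between zh_Y and zh_Z.
initial = {
    ("en_A", "zh_X"): 0.9,
    ("en_B", "zh_Y"): 0.3,
    ("en_B", "zh_Z"): 0.3,
    ("en_C", "zh_Z"): 0.5,
}

scores = reinforce(graph_en, graph_zh, initial)
matches = best_matches(scores)
```

In this toy setup, en_B starts with equal initial similarity to zh_Y and zh_Z. Because en_B's neighbor en_A matches zh_Y's neighbor zh_X strongly, the relational term lifts the (en_B, zh_Y) score above (en_B, zh_Z), which is the kind of disambiguation that relationship-based similarity is meant to provide.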
URI
http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001739807
http://oasis.postech.ac.kr/handle/2014.oak/2304
Article Type
Thesis
Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
