Graph-based Dependency Parsing with Phrasalization
- Constituency and dependency are complementary kinds of syntactic information, and both are essential to syntactic analysis. Dependency relations between words are commonly used in constituency parsing, and previous work has demonstrated that dependency information improves constituency parsing performance. Constituency information, by contrast, is seldom used in dependency parsing. In this thesis, we show that constituency information also helps to improve dependency parsing. Motivated by work on lexicalized PCFG parsing, we propose phrasalized dependency parsing. Phrasalization is the process of associating each head word with a phrase category. Since a vanilla dependency treebank contains no phrase nodes, we derive a phrase category for each word sequence that can be dominated by a head, and then associate the head with the derived category. Existing graph-based dependency parsers are mainly based on spans. Because the head is located at the left or right edge of a span, a span corresponds to only half a constituent. Compared with constituent-based algorithms, span-based algorithms parse efficiently, with complexity ranging from O(n^3) to O(n^4). In a span-based algorithm, a phrase is treated as two spans, and each span is processed independently of the other. It is therefore impossible to model phrases in previous span-based algorithms. In this thesis, we propose a new span-based dependency chart parsing algorithm that can operate on phrases through ternary-span combination while maintaining the computational efficiency of the original span-based algorithms. Additionally, the proposed algorithm lets us model the relations between the left and right dependents of a head, which existing algorithms ignore. We also propose a new dependency parsing model involving phrases, which derives parse trees based on the scores of dependency arcs and phrases.
With the new parsing model, we achieve better performance. The improvements are especially notable on long sentences: for sentences longer than 40 words, the improvement is over 1% on the Chinese data of CoNLL 2009.
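The idea of phrasalization described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the `HYPOTHETICAL_CATEGORY` table, and the rule of deriving a phrase category from the head's part-of-speech tag are all assumptions made for illustration, since the abstract does not state how categories are derived. The sketch assumes a well-formed projective tree, computes the word sequence dominated by each head, and shows how that phrase splits into the left and right half-spans that span-based algorithms process separately.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a head's POS tag to a phrase category.
# The abstract does not give the derivation rule; this table is an assumption.
HYPOTHETICAL_CATEGORY: Dict[str, str] = {"NN": "NP", "VV": "VP", "P": "PP", "JJ": "ADJP"}


def dominated_spans(heads: List[int]) -> List[Tuple[int, int]]:
    """Return the inclusive (left, right) yield of every word.

    heads[i] is the index of word i's head, or -1 for the root.
    Assumes a well-formed (acyclic) projective tree, so each yield is contiguous.
    """
    n = len(heads)
    left = list(range(n))
    right = list(range(n))
    for i in range(n):
        # Walk from word i up to the root, widening each ancestor's yield.
        h = heads[i]
        while h != -1:
            left[h] = min(left[h], i)
            right[h] = max(right[h], i)
            h = heads[h]
    return list(zip(left, right))


def phrasalize(words: List[str], tags: List[str], heads: List[int]) -> List[dict]:
    """Associate each head word with a phrase category and its two half-spans."""
    result = []
    for i, (l, r) in enumerate(dominated_spans(heads)):
        result.append({
            "head": words[i],
            "category": HYPOTHETICAL_CATEGORY.get(tags[i], "X"),  # "X" = unknown
            "phrase": (l, r),       # full word sequence dominated by the head
            "left_span": (l, i),    # half-constituent with the head at the right edge
            "right_span": (i, r),   # half-constituent with the head at the left edge
        })
    return result


if __name__ == "__main__":
    words = ["the", "cat", "chased", "a", "mouse"]
    tags = ["DT", "NN", "VV", "DT", "NN"]
    heads = [1, 2, -1, 4, 2]
    for entry in phrasalize(words, tags, heads):
        print(entry)
```

In this toy example the verb "chased" dominates the whole sentence, so its phrase covers words 0 through 4, while a span-based parser would see it only as the two halves (0, 2) and (2, 4). Scoring the phrase as a unit requires combining both halves with the head, which is the gap the ternary-span combination in the abstract is meant to address.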