Open Access System for Information Sharing

Thesis
Full metadata record
Files in This Item:
There are no files associated with this item.
DC Field | Value | Language
dc.contributor.author | Mu-Woong Lee | en_US
dc.date.accessioned | 2014-12-01T11:48:13Z | -
dc.date.available | 2014-12-01T11:48:13Z | -
dc.date.issued | 2012 | en_US
dc.identifier.other | OAK-2014-01107 | en_US
dc.identifier.uri | http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001385016 | en_US
dc.identifier.uri | https://oasis.postech.ac.kr/handle/2014.oak/1609 | -
dc.description | Doctor | en_US
dc.description.abstract | This research addresses the problem of supporting scalable code similarity search systems for large-scale software repositories. While there are commercial code search engines available, they treat software as text and often fail to find semantically related code. Meanwhile, existing tools for semantic code clone searches take a “post-mortem” approach, detecting clones “after” code development is completed, and hence fail to return results instantly. In clear contrast, the goal of this research is to combine the strengths of these two lines of existing research. To achieve this goal, an indexing structure on vector abstractions of code is proposed. This index utilizes dimension reduction techniques to efficiently deal with the vector abstractions, which are naturally high-dimensional. This search system is then integrated into real-world development sessions. Such integration suggests that, by posing every code segment as a query to the software code corpus, developers can instantly reference relevant code segments at the time of generation to enhance productivity. This integration scenario creates the need for efficient similarity searches with the following requirements. First, a developer session translates into a sequence of evolving queries that need to be efficiently supported. Second, the quality of the results needs to be controlled, e.g., dealing with licenses requires that there be no false negatives. To satisfy these requirements, a workload-aware striping framework for high-dimensional evolving queries is proposed. This framework can be used to boost most existing high-dimensional indexes. In addition, to further enhance the scalability of code search systems, a workload-balancing distributed indexing structure is proposed. The goal of existing efforts in distributed indexing has been the localization of queries to data residing at a small number of nodes (i.e., locality-preserving indexing) to minimize communication cost. However, considering that workloads often correlate with data locality, such indexing often generates hotspots. Hence, workload-balancing is proposed as an optimization goal, and a distributed index that evenly distributes the workload is presented. | en_US
dc.language | eng | en_US
dc.publisher | Pohang University of Science and Technology | en_US
dc.rights | BY_NC_ND | en_US
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/2.0/kr | en_US
dc.title | Scalable High-dimensional Index Design for Code Search Systems | en_US
dc.title.alternative | 코드검색시스템을 위한 고차원 색인기법 설계 (Design of High-dimensional Indexing Techniques for Code Search Systems) | en_US
dc.type | Thesis | en_US
dc.contributor.college | Graduate School, Division of Electrical and Computer Engineering | en_US
dc.date.degree | 2012-08 | en_US
dc.type.docType | Thesis | -
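
Illustrative note on the abstract: the record does not describe the thesis's index in detail, so the following is only a minimal sketch of one standard way that dimension reduction and a "no false negatives" guarantee can coexist, namely filter-and-refine range search. It assumes code segments are already available as numeric vectors, and it uses a plain coordinate projection as the reduction; the function names (`project`, `range_search`) and the toy data are hypothetical and are not taken from the thesis.

```python
import math

def project(vec, keep):
    """Hypothetical reduction: keep only the coordinates listed in `keep`."""
    return [vec[i] for i in keep]

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_search(corpus, query, radius, keep):
    """Return (ids within `radius` of `query`, number of candidates refined).

    A real index would precompute and index the reduced vectors; here they
    are recomputed on the fly to keep the sketch short.
    """
    q_red = project(query, keep)
    # Filter: dropping coordinates can only shrink a Euclidean distance,
    # so a vector pruned here cannot be a true match (no false negatives).
    candidates = [i for i, v in enumerate(corpus)
                  if dist(project(v, keep), q_red) <= radius]
    # Refine: verify the surviving candidates at full dimensionality.
    hits = [i for i in candidates if dist(corpus[i], query) <= radius]
    return hits, len(candidates)

# Toy "vector abstractions" of four code segments (made-up values).
corpus = [
    [3, 0, 1, 2],
    [3, 0, 1, 9],
    [0, 5, 0, 0],
    [3, 1, 1, 2],
]
query = [3, 0, 1, 3]
hits, refined = range_search(corpus, query, radius=1.5, keep=[0, 1])
print(hits, refined)
```

The filter step may admit false positives (vectors that look close only in the reduced space), which the refine step removes; what it cannot do is discard a true match, which is the property the abstract asks for in the licensing scenario.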

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
