Open Access System for Information Sharing


Department of Computer Science & Engineering (컴퓨터공학과) 3. Theses_Ph.D.

Thesis


Full metadata record

Files in This Item: There are no files associated with this item.

DC Field	Value	Language
dc.contributor.author	Mu-Woong Lee	en_US
dc.date.accessioned	2014-12-01T11:48:13Z	-
dc.date.available	2014-12-01T11:48:13Z	-
dc.date.issued	2012	en_US
dc.identifier.other	OAK-2014-01107	en_US
dc.identifier.uri	http://postech.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=000001385016	en_US
dc.identifier.uri	https://oasis.postech.ac.kr/handle/2014.oak/1609	-
dc.description	Doctor	en_US
dc.description.abstract	This research addresses the problem of supporting scalable code similarity search systems for large-scale software repositories. While there are commercial code search engines available, they treat software as text and often fail to find semantically related code. Meanwhile, existing tools for semantic code clone searches take a “post-mortem” approach involving the detection of clones “after” the code development is completed, and hence, fail to return the results instantly. In clear contrast, the goal of this research is to combine the strength of these two lines of existing research. To achieve this goal, an indexing structure on vector abstractions of code is proposed. This index utilizes dimension reduction techniques to efficiently deal with the vector abstractions, which are naturally high-dimensional. This search system is then integrated into real-world development sessions. Such integration suggests that, by posing every code segment as a query to the software code corpus, developers can instantly reference relevant code segments at the time of generation to enhance productivity. This integration scenario creates the need for efficient similarity searches with the following requirements. First, a developer session translates into a sequence of evolving queries that need to be efficiently supported. Second, the quality of the results needs to be controlled, e.g., dealing with licenses requires that there be no false negatives. To satisfy these requirements, a workload-aware striping framework for high-dimensional evolving queries is proposed. This framework can be used to boost most existing high-dimensional indexes. In addition, to further enhance the scalability of code search systems, a workload-balancing distributed indexing structure is proposed. 
The goal of existing efforts in distributed indexing has been the localization of queries to data residing at a small number of nodes (i.e., locality-preserving indexing) to minimize communication cost. However, considering that workloads often correlate with data locality, such indexing often generates hotspots. Hence, workload-balancing is proposed as an optimization goal, and a distributed index that evenly distributes the workload is presented.	en_US
dc.language	eng	en_US
dc.publisher	Pohang University of Science and Technology (POSTECH)	en_US
dc.rights	BY_NC_ND	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/2.0/kr	en_US
dc.title	Scalable High-dimensional Index Design for Code Search Systems	en_US
dc.title.alternative	코드검색시스템을 위한 고차원 색인기법 설계 (Design of a High-dimensional Indexing Technique for Code Search Systems)	en_US
dc.type	Thesis	en_US
dc.contributor.college	Graduate School, Division of Electrical and Computer Engineering	en_US
dc.date.degree	2012-08	en_US
dc.type.docType	Thesis	-
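
The abstract mentions an index that applies dimension reduction to high-dimensional vector abstractions of code while requiring that results contain no false negatives. The record does not describe the thesis's actual index, so the following is only a minimal sketch of the generic filter-and-refine idea, under stated assumptions: vectors are plain lists of floats, the reduction simply keeps the first k coordinates (so the reduced distance never exceeds the true distance), and the names project, range_search, and brute_force are hypothetical.

```python
import math
import random


def project(vec, k):
    """Assumed reduction: keep the first k coordinates.

    Dropping coordinates can only shrink Euclidean distance, so the
    reduced distance is a lower bound on the true distance.
    """
    return vec[:k]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def range_search(corpus, reduced, query, radius, k):
    """Return indices of all vectors within `radius` of `query`.

    Filter step: prune with the cheap reduced-space distance.
    Refine step: confirm survivors with the full distance.
    Because the reduced distance is a lower bound, pruning never
    discards a true match (no false negatives).
    """
    q_red = project(query, k)
    hits = []
    for i, vec in enumerate(corpus):
        if dist(reduced[i], q_red) > radius:
            continue  # true distance is at least this large: safe to skip
        if dist(vec, query) <= radius:
            hits.append(i)
    return hits


def brute_force(corpus, query, radius):
    """Reference answer computed without any pruning."""
    return [i for i, v in enumerate(corpus) if dist(v, query) <= radius]


# Synthetic data stands in for code vectors (an assumption, not thesis data).
random.seed(0)
DIM, K = 32, 4
corpus = [[random.random() for _ in range(DIM)] for _ in range(500)]
reduced = [project(v, K) for v in corpus]
query = corpus[0]
```

Calling range_search(corpus, reduced, query, 2.0, K) returns the same indices as brute_force(corpus, query, 2.0). A real index would replace the linear scan over reduced vectors with a spatial structure; the lower-bounding property is what keeps the pruning safe.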


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
