Offline Selective Data Deduplication for Primary Storage Systems (SCIE, SCOPUS)

Title
Offline Selective Data Deduplication for Primary Storage Systems
Authors
Park, S; Park, C
Date Issued
2016-02
Publisher
IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
Abstract
Data deduplication is a technology that eliminates redundant data to save storage space. Most previous studies on data deduplication target backup storage, where the deduplication ratio and throughput are important. However, data deduplication on primary storage has recently been receiving attention; in this case, I/O latency should be given the same weight as the deduplication ratio. Unfortunately, data deduplication increases sequential-read latency. When a file is created, the file system allocates physically contiguous blocks to keep sequential-read latency low. However, the data deduplication process rearranges the block mapping information to eliminate duplicate blocks. This rearrangement breaks the physical sequentiality of the blocks in a file, so a sequential-read request becomes slower because it behaves like a random-read operation. In this paper, we propose a selective data deduplication scheme for primary storage systems. The selective scheme can achieve a high deduplication ratio and low I/O latency by applying different data-chunking methods to files according to their access characteristics. In the proposed system, file accesses are characterized by the recent access time and the access frequency of each file. No chunking is applied to update-intensive files, since deduplicating them is of little use. For sequential-read-intensive files, we apply big chunking to preserve their sequentiality on the media. For random-read-intensive files, small chunking is used to increase the deduplication ratio. Experimental evaluation showed that the proposed method achieves up to 86% of an ideal deduplication ratio and 97% of the sequential-read performance of a native file system.
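The selection policy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The chunk sizes, the `FileProfile` fields, and the rule for classifying a file are all assumptions made for illustration, since the abstract does not give them. The sketch uses only access counts and ignores recent access time, and it uses fixed-size chunking with SHA-256 fingerprints as a stand-in for whatever chunking and hashing the paper uses.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

# Hypothetical chunk sizes; the abstract does not state the actual values.
BIG_CHUNK = 1 << 20    # 1 MiB, keeps long runs of a file contiguous
SMALL_CHUNK = 4 << 10  # 4 KiB, finds more duplicates


class Chunking(Enum):
    NONE = "none"
    BIG = "big"
    SMALL = "small"


@dataclass
class FileProfile:
    """Hypothetical per-file access counters gathered before the offline pass."""
    writes: int
    seq_reads: int
    rand_reads: int


def choose_chunking(p: FileProfile) -> Chunking:
    """Pick a chunking method from access counts (assumed classification rule)."""
    if p.writes > p.seq_reads + p.rand_reads:
        return Chunking.NONE   # update-intensive: skip deduplication
    if p.seq_reads >= p.rand_reads:
        return Chunking.BIG    # sequential-read-intensive: preserve layout
    return Chunking.SMALL      # random-read-intensive: maximize dedup ratio


def dedup(data: bytes, chunk_size: int, store: Dict[str, bytes]) -> List[str]:
    """Split data into fixed-size chunks and store each unique chunk once.

    Returns the list of fingerprints needed to rebuild the file.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # keep only the first copy
        recipe.append(fingerprint)
    return recipe


def process_file(data: bytes, profile: FileProfile,
                 store: Dict[str, bytes]) -> Optional[List[str]]:
    """Deduplicate one file according to its profile.

    Returns None when the file is left untouched (no chunking).
    """
    policy = choose_chunking(profile)
    if policy is Chunking.NONE:
        return None
    size = BIG_CHUNK if policy is Chunking.BIG else SMALL_CHUNK
    return dedup(data, size, store)
```

Calling `process_file` with a profile dominated by random reads splits the file into small chunks, so repeated 4 KiB regions collapse into a single stored chunk. A profile dominated by sequential reads produces far fewer, larger chunks, so fewer mappings are redirected and more of the file's on-disk order is preserved, at the cost of finding fewer duplicates.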
URI
https://oasis.postech.ac.kr/handle/2014.oak/37597
DOI
10.1587/transinf.2015EDP7034
ISSN
1745-1361
Article Type
Article
Citation
IEICE Transactions on Information and Systems, vol. E99-D, no. 2, pp. 370-382, 2016-02
Related Researcher

PARK, CHAN IK (박찬익)
Dept. of Computer Science & Engineering