Abstract
NAND flash-based solid-state drives (SSDs) are widely deployed in cloud data centers because of their high performance compared with hard disks, but the limited lifespan of flash memory makes them poorly suited to write-intensive applications. Deduplication reduces application write traffic and can therefore extend SSD lifespan. However, traditional deduplication schemes find duplicate data by computing fingerprints, a time-consuming process that can impair SSD write performance. Pre-hashing was proposed to reduce how often fingerprints must be computed, which improves the performance of SSDs with deduplication but lowers the deduplication rate. In this paper, we propose NF-Dedupe, a new deduplication scheme for flash-based SSDs that requires no fingerprint computation. NF-Dedupe determines whether a written page is a duplicate by comparing it byte by byte with its potential duplicate page read from the underlying flash chips, rather than by comparing fingerprints. Because flash memory offers high parallelism and low read latency, reading a page from a flash chip and comparing two pages byte by byte incurs lower overhead than computing a fingerprint. We evaluate NF-Dedupe through trace-driven simulations. Experimental results show that NF-Dedupe outperforms the other approaches, achieving deduplication rates ranging from 5.3% to 29.9% and improving write latency by up to 21%, with an average of 12%.
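The read-and-compare idea described above can be illustrated with a minimal Python sketch. It is not the NF-Dedupe implementation: the abstract does not say how the potential duplicate page is located, so the sketch assumes a hypothetical in-memory table keyed by a cheap CRC32 tag (`candidates`) as a stand-in. The class name `ToyDedupSSD`, the 4 KiB page size, and the list used to model flash are likewise assumptions, and garbage collection and reference counting are omitted.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size; not stated in the abstract


class ToyDedupSSD:
    """Toy model of duplicate detection by reading back and comparing pages."""

    def __init__(self):
        self.flash = []        # physical pages, indexed by physical page number
        self.candidates = {}   # cheap tag -> physical page number (assumed lookup)
        self.mapping = {}      # logical page number -> physical page number
        self.flash_writes = 0  # number of pages actually programmed

    def write(self, lpn, data):
        """Write one page; return True if it was deduplicated."""
        # Hypothetical candidate lookup: a cheap checksum, not a
        # cryptographic fingerprint. The real scheme may locate
        # candidates differently.
        tag = zlib.crc32(data)
        ppn = self.candidates.get(tag)
        # "Read" the candidate page and compare it byte by byte.
        if ppn is not None and self.flash[ppn] == data:
            self.mapping[lpn] = ppn  # share the existing physical page
            return True
        # No verified duplicate: program a new physical page.
        self.flash.append(bytes(data))
        self.flash_writes += 1
        new_ppn = len(self.flash) - 1
        self.candidates[tag] = new_ppn
        self.mapping[lpn] = new_ppn
        return False

    def read(self, lpn):
        return self.flash[self.mapping[lpn]]


ssd = ToyDedupSSD()
page = bytes([0xAB]) * PAGE_SIZE
first = ssd.write(0, page)   # False: first copy is programmed to flash
second = ssd.write(1, page)  # True: verified duplicate, no flash write
```

Because the final decision rests on a full byte comparison, a tag collision in this sketch can only cause a missed duplicate, never a false match.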