This thesis presents the area of data deduplication. It explains the basic concepts of data deduplication and the main engineering challenges faced when designing or using deduplication systems. The emphasis is on deduplication for storage media, i.e., data deduplication at the filesystem level. Several existing and substantially different deduplication filesystems are presented and compared. Special attention is given to Unix filesystems and to the development of userspace deduplication filesystems; one such filesystem is designed and implemented as part of this thesis. The implemented filesystem is compared, both conceptually and experimentally, with the open-source deduplication filesystem lessfs, and the architectural decisions of both filesystems are discussed.