In this thesis we presented the area of data deduplication and implemented an algorithm for object storage that eliminates duplicate chunks within stored objects. In the first part we presented the storage system as a tree-like structure of directories and files. We described the features of storage systems and simple ways of storing data on a medium. We examined in detail the properties of the distributed storage system Ceph, its components, and its operation. In the second part we presented deduplication as an important feature of modern storage systems. We surveyed deduplication techniques for centralized as well as distributed systems. In the last part we implemented an example deduplication technique along with a simple object storage system. Using the described techniques, we implemented detection of variable-length duplicate chunks within objects and added CLI tools for manipulating objects in the store.