Abstract
Of the several distinct kinds of syncing technology deployed in clouds today, none implements binary diffs for efficiency. Binary diffs are well established in the literature and can drastically reduce the volume of data transferred over the network. Since most cloud technologies are distributed and depend on intensive internetworking, binary diffs can offer a considerable efficiency boost. This paper proposes the DiffHub method for cloud syncs. Its performance is analyzed separately on real filesystems and on synthetic traces based on hotspot distributions. Results show that traffic volume can be reduced by one to two orders of magnitude, depending on conditions.