Low latency asynchronous database synchronization and data transformation using the replication log
by Vincent van Donselaar
Analytics firm Distimo offers a web-based product that allows mobile app developers to track the performance of their apps across all major app stores. The Distimo backend system uses web scraping techniques to retrieve the market data, which is stored in the backend master database: the data warehouse (DWH). A batch-oriented program periodically synchronizes relevant data to the frontend database that feeds the customer-facing web interface.
The batch-oriented design of the synchronization program imposes limitations: the metadata that must be calculated before and after each batch adds overhead and increases latency. The goal of this research is to streamline the synchronization process by moving to a continuous, replication-like solution that incorporates principles from the field of data warehousing. The binary transaction log of the master database feeds the synchronization program, which is also responsible for implicit data transformations such as aggregation and metadata generation. In contrast to traditional homogeneous database replication, this design allows synchronization across heterogeneous database schemas. The prototype demonstrates that a combination of replication and data warehousing techniques can offer an adequate solution for robust, low-latency data synchronization software.
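As an illustration only, the idea of feeding row-level changes from a replication log into an incrementally maintained aggregate could be sketched as follows. This is not the prototype's code: the `RowEvent` shape, the `DownloadTotals` class, and the column names `app_id`, `date`, `country` and `downloads` are all hypothetical, and decoding the binary log itself is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RowEvent:
    """One row-level change, as it might look after decoding a log entry.

    Hypothetical shape: an insert has only `after`, a delete has only
    `before`, and an update has both row images.
    """
    before: Optional[dict] = None
    after: Optional[dict] = None


class DownloadTotals:
    """Hypothetical frontend table: total downloads per (app_id, date).

    The source rows are per country, so the target schema differs from
    the source schema and is maintained by aggregation.
    """

    def __init__(self) -> None:
        self.totals = defaultdict(int)

    def apply(self, event: RowEvent) -> None:
        # Retract the old row image, then add the new one. This keeps the
        # aggregate current per event, without recomputing a whole batch.
        if event.before is not None:
            self._add(event.before, -1)
        if event.after is not None:
            self._add(event.after, +1)

    def _add(self, row: dict, sign: int) -> None:
        key = (row["app_id"], row["date"])
        self.totals[key] += sign * row["downloads"]


def replay(events) -> DownloadTotals:
    """Apply a stream of events in log order and return the aggregate."""
    target = DownloadTotals()
    for event in events:
        target.apply(event)
    return target
```

Replaying an insert of 10 downloads for one country and an insert of 5 for another, then updating the first row to 12 and deleting the second, leaves a total of 12 for that app and date. Each event touches only the aggregate rows it affects, which is what distinguishes this approach from recomputing the target per batch.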