Storage of data from multiple sources in different databases is the new trend in the world of business. The rate at which the databases undergo synchronization may vary periodically depending on time frame. But the task of synchronization can be challenging because it can be hard to get modified data in such real-time update especially in source and target databases.
To find added, modified, or removed data; full scanning of original and destination databases are straightforward methods to synchronize database. The next step will be to perform selected actions over rows such as insert source table records which are not present in destination database, delete missing records in source tables from the destination database, and update modified records.
The challenge is that the process of synchronization may exceed the required time frame. Considering two databases on SQL Server and MySQL, let us assume they have more than a million rows. The row-by-row method of analysis of data does not allow improvement of the 100 rows per second process of synchronization, thus the average performance is set as such. Nevertheless, it is impossible to run the process of synchronization on an hourly basis because every database considered above takes more than 2½ hours to run.
But there is a method of achieving real-time synchronization. Once the rows have been modified after the last run of the synchronization process, they can be synchronized. Such algorithm used to implements this method is called trigger-based or incremental synchronization. The steps involved are:
- Running the program for the first time results in a straightforward synchronization of the database, if necessary.
- For every table under synchronization, the triggers on the insert, update, and delete are generated. The job of the trigger is to insert information about modified rows into a ‘special service table’.
- In the second run, the service table provides information about the modified rows. However, the rows are updated in the destination database.
Using the MS SQL and MySQL databases from the example above, the next agenda is the estimation of the reduction of the duration of the synchronization process using the incremental method mentioned above. The advantage of this method is that it can process modified database without replication millions of rows. This function allows it to run near to real-time even though it processes via a row-by-row method. Yet, bulk processing method allows the synchronization performance to be increased.
The above-mentioned method can be implemented. This is because storage of modified data in a table is done in a simple sequence including insert, delete, or update. Owing to this, the performance of the synchronization process is increased about 10 times by bulk processing.
While the most efficient method to update data is incremental synchronization, the source and destination databases have a few restrictions tied to them.
Thus, trigger-based synchronization involves the creation of triggers and service tables with sufficient privileges for synchronized databases, and every synchronized table has a unique index or primary key.
Some software vendors offer special tools to automate incremental database synchronization. One of these companies Intelligent Converters developed tools for incremental synchronization between popular database management systems: Microsoft SQL, MySQL, Oracle and PostgreSQL.
Comments