Microsoft recently added a new tool to its Azure cloud platform that brings on premises even closer to the cloud--and, in the process, makes hybrid computing and migrating to the cloud much easier to implement for entities looking to develop a cloud migration strategy. Azure SQL Data Sync provides for bi-directional synchronization across multiple SQL databases hosted in Azure SQL Database and SQL Server on premises.
What is Data Sync?
Data Sync uses Sync Groups as the construct for synchronizing data among multiple databases and instances. It uses a hub-and-spoke architecture to address the synchronization process. In this design, one database in the Sync Group is designated as the “hub;” the other databases are considered “members.” Placing the hub database at the center of the action, imagine invisible spokes radiating and connecting to the individual member databases. Synchronization occurs between the hub and the members. (There is no member-to-member synchronization.)
Each Sync Group has four properties, defining:
- Sync Direction: can be set to either unidirectional (a.k.a. hub-to-member or member-to-hub) or bidirectional
- Sync Interval: how frequently the databases are synchronized.
- Sync Schema: the roster of databases involved in the synchronization process and assignment of “hub” or “member” status
- Conflict Resolution Policy: whether “Hub Wins” or the “Member Wins” in cases of conflict
There are some constraints on configuring the hub and member assignments. The hub database must be an Azure SQL database. Data Sync requires an additional Sync Database that hosts the necessary metadata that feeds the process of synchronizing databases along with an associated log. The Sync Database also must be an Azure SQL database and is required to be hosted in the same region as the designated hub database. There is more leniency around member databases. They can be either Azure SQL databases or SQL Server databases on premises. If a member database is SQL Server on premises, a local agent must be configured to aid in synchronization.
Data Sync uses triggers to log changes to data values. If designing bidirectional synchronization, the hub synchronizes changes to the member database(s) first. Then the member database(s) synchronizes any changes back to the hub. Any conflicts that arise are resolved based upon the Sync Group’s Conflict Resolution property: Hub Wins or Member Wins.
Azure Data Sync Use Cases
Data Sync allows for meeting needs of business cases where eventual consistency is required, including:
- Hybrid synchronization: Data Sync allows you to keep your cloud and on premises databases synchronized. For many enterprises, this would help establish their first foray into the cloud.
- Global distribution: Place data in regions closest to your customers to reduce latency and improve their experiences with your applications.
- Separation of workloads: Separate read-intensive operations such as analytics and reporting from the core write operations that alter or feed your data repositories. Separating your reads can lead to greatly improved performance for end user operations.
Caveats and Requirements
Microsoft guarantees that Data Sync will not cause data loss. However, since triggers--and not transactions--fuel this service, the members and hub achieve “eventual” consistency. This means that at any given point of time the individual databases in the Sync Group may not host identical values across all tables, but eventually they will. You could conceivably submit an identical query to the hub and one or more member databases and get different results. Eventual consistency is also referred to as being asynchronous. This differs from other “high availability” solutions such as Always On Availability Groups or Database Mirroring. In Availability Groups and Database Mirroring, you can configure the solution to be synchronous or asynchronous. In a synchronous configuration, a transaction is not committed (or “hardened”) on the primary copy of a database until it’s first committed/hardened on the secondary copy of the database. This ensures that both copies are transactionally consistent at all times.
Another issue resulting from use of triggers is performance. The triggers log changes in side tables within each database participating in this service offering. This means additional storage concerns as well as increased IO, as these tables store, and are read from, to facilitate synchronization.You may need to scale your service tier when implementing Data Sync. Be mindful of the timing of any scaling. Provisioning or de-provisioning during Sync Group creation, deletion or configuration may compound performance degradation.
There are also structural requirements for tables, column data types, database object names and other limitations. Consult the official Microsoft documentation for all technical specifications.
If this sounds like something that would bring you closer to cloud computing or solve a business need for global distribution or separation of critical workloads impacting performance, then there is an additional bit of good news: price. The Azure Data Sync service is free. There are all the regular fees for Azure activity and and resource-based usage, as well as applicable egress charges that may occur.