Synchronization algorithm for exchanging data in the “Client – Server” model via REST API

August 27, 2013

Many mobile applications that operate in the client – server model need to sync data with a central repository on the server.

If the server exposes its resources through a REST API, then all of the sync logic can be handled on the client side.
This logic supports bi-directional sync between a central server and multiple clients, transfers only incremental changes in each direction, and includes some conflict detection.

Each table participating in the sync process, on both the server and the client, should include two additional fields: “Ts” (Number) and “Deleted” (Boolean).

The Ts field is maintained by the server. For example, SQL Server automatically generates a unique, incrementing value for a rowversion column on each insert or update. The Ts field is used to determine whether one record in the table was modified more recently than another, so that only the incremental changes are downloaded. It also helps to identify new records created on the client, as they have no Ts value yet.
Please note: Ts in this algorithm is not a timestamp in the usual sense, i.e. an instant in time expressed as a date and time of day. Rather, Ts should be treated as an Update Sequence Number: a number attached to a data element that identifies the order in which the element was last modified. The Ts values within a table start at “1” and then increase monotonically every time a record is created, modified, or deleted.
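
As a rough illustration of Ts as an Update Sequence Number, here is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for the server table. The table name `book`, its columns, and all function names are assumptions made for this example; a real server would more likely rely on a built-in mechanism of its database.

```python
import sqlite3


def open_server_db():
    """Create an in-memory stand-in for one server table (hypothetical 'book')."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE book ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " ts INTEGER NOT NULL UNIQUE,"  # update sequence number, not a clock time
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def next_ts(db):
    """Return the next sequence number for the table: 1, 2, 3, ..."""
    (current,) = db.execute("SELECT ifnull(max(ts), 0) FROM book").fetchone()
    return current + 1


def insert_book(db, title):
    """Insert a record and stamp it with a fresh Ts."""
    with db:  # one transaction per change
        ts = next_ts(db)
        cur = db.execute("INSERT INTO book (title, ts) VALUES (?, ?)", (title, ts))
        return cur.lastrowid, ts


def update_book(db, book_id, title):
    """Modify a record; the change gets a new, higher Ts."""
    with db:
        ts = next_ts(db)
        db.execute("UPDATE book SET title = ?, ts = ? WHERE id = ?", (title, ts, book_id))
        return ts


def mark_deleted(db, book_id):
    """Flag a record as deleted; this also counts as a change and bumps Ts."""
    with db:
        ts = next_ts(db)
        db.execute("UPDATE book SET deleted = 1, ts = ? WHERE id = ?", (ts, book_id))
        return ts


def changes_since(db, last_ts):
    """Return every record changed after the Ts value a client already holds."""
    return db.execute(
        "SELECT id, title, ts, deleted FROM book WHERE ts > ? ORDER BY ts",
        (last_ts,),
    ).fetchall()
```

A client that already holds everything up to Ts 2 would call `changes_since(db, 2)` and receive only the records created, modified, or marked deleted after that point.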

The Deleted field marks records for deletion, so that clients remove such records from their local repositories, whereas clients performing an initial sync simply skip them during download.
Additionally, the client data storage should include a “Dirty” field (Boolean).
The Dirty field designates records that were changed locally. Once those records have been sent successfully to the server, the Dirty flag is reset.
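
The client-side bookkeeping described above might look like the following Python sketch, again using SQLite. The `book` table and the helper names are hypothetical; the only parts taken from the description are the Ts, Deleted, and Dirty fields, and the rule that locally created records have no Ts yet.

```python
import sqlite3


def open_local_db():
    """Create the client copy of a table (hypothetical 'book') with sync fields."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE book ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " ts INTEGER,"  # NULL until the server has assigned a value
        " deleted INTEGER NOT NULL DEFAULT 0,"
        " dirty INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def create_locally(db, title):
    """A record created on the client has no Ts and is dirty."""
    with db:
        cur = db.execute(
            "INSERT INTO book (title, ts, dirty) VALUES (?, NULL, 1)", (title,)
        )
        return cur.lastrowid


def edit_locally(db, book_id, title):
    """Any local change sets the Dirty flag."""
    with db:
        db.execute("UPDATE book SET title = ?, dirty = 1 WHERE id = ?", (title, book_id))


def mark_synced(db, book_id, server_ts):
    """After a successful upload, store the server's Ts and reset Dirty."""
    with db:
        db.execute(
            "UPDATE book SET ts = ?, dirty = 0 WHERE id = ?", (server_ts, book_id)
        )


def pending_inserts(db):
    """Dirty records without a Ts were created locally and never uploaded."""
    return db.execute(
        "SELECT id, title FROM book WHERE dirty = 1 AND ts IS NULL ORDER BY id"
    ).fetchall()


def pending_updates(db):
    """Dirty records with a Ts exist on the server and were changed locally."""
    return db.execute(
        "SELECT id, title, ts FROM book WHERE dirty = 1 AND ts IS NOT NULL ORDER BY id"
    ).fetchall()
```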

The sync process is initiated by the client, either manually (by pressing a Sync button), periodically, or through a combination of push notifications about server changes and client change triggers.
The sync process first downloads the server changes, then uploads the client changes.
The sync procedure processes the tables sequentially and in the correct order (where parent-child relationships exist) to avoid referential integrity conflicts.
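
One possible way to derive a safe table order is a topological sort over the parent-child relationships. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the table names are made up for illustration.

```python
from graphlib import TopologicalSorter


def sync_order(parents_by_table):
    """Return table names so that every parent table comes before its children.

    parents_by_table maps each table to the set of tables it references.
    A circular reference raises graphlib.CycleError.
    """
    return list(TopologicalSorter(parents_by_table).static_order())


# Hypothetical schema: a review references a book, a book references an author.
tables = {
    "author": set(),
    "book": {"author"},
    "review": {"book"},
}
order = sync_order(tables)
```

With this order, a `book` row is never processed before the `author` row it points to.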

Below is the sync algorithm:

1. Initialize the list of tables required to sync.

2. Get the latest Ts, e.g. using SQLite: “select ifnull(max(ts), 0) from Table” (the max() aggregate also returns a row, and therefore 0, when the table is still empty).
3. Call the relevant RESTful service method to get the server changes for the table as a collection, based on the provided latest timestamp (Ts).
4. Iterate through each record in the server changes collection:

     Check if the server record with the same id already exists in the local storage.

       If so, check if the existing local record is dirty.

         If so, call a conflict resolution function, then reset the dirty flag.

         Else, replace the local record with the server record and reset the dirty flag.

       If the local storage doesn’t have the server record, create it.

5. Get the list of local storage changes for inserts on the server.
6. Iterate through each record in the local changes for inserts:

     Insert the record on the server (using POST).
     Get the Ts for the inserted record from the response.
     Update the record in the local storage with the new Ts.
     Reset the dirty flag.

7. Get the list of local storage changes for updates on the server.
8. Iterate through each record in the local changes for updates:

     Update the record on the server (using PUT).
     Get the updated Ts for the updated record from the response.
     Update the record in the local storage with the new Ts.
     Reset the dirty flag.

9. Delete records where Deleted=true and Dirty=false.
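
The steps above can be sketched for a single table as follows. This is a simplified illustration, not a reference implementation: `InMemoryServer` is a hypothetical stand-in for the REST API, local storage is a plain dictionary, and record ids are assumed to be generated by the client (e.g. UUIDs) so that the same id is valid on both sides. It follows the steps as written and does not handle server changes that arrive while the upload is in progress.

```python
class InMemoryServer:
    """Hypothetical stand-in for the REST API of one table."""

    def __init__(self):
        self.rows = {}  # id -> {"id", "data", "ts", "deleted"}
        self.ts = 0     # table-wide update sequence number

    def _bump(self):
        self.ts += 1
        return self.ts

    def get_changes(self, since_ts):  # would be a GET with the latest Ts
        changed = [dict(r) for r in self.rows.values() if r["ts"] > since_ts]
        return sorted(changed, key=lambda r: r["ts"])

    def post(self, rec_id, data):  # would be a POST
        self.rows[rec_id] = {"id": rec_id, "data": data, "ts": self._bump(), "deleted": False}
        return dict(self.rows[rec_id])

    def put(self, rec_id, data):  # would be a PUT
        self.rows[rec_id].update(data=data, ts=self._bump())
        return dict(self.rows[rec_id])

    def delete(self, rec_id):  # soft delete: only flags the record
        self.rows[rec_id].update(deleted=True, ts=self._bump())


def sync_table(local, server, conflicts):
    """Sync one table. local maps id -> {"data", "ts", "deleted", "dirty"}."""
    # Latest Ts known locally; 0 when nothing has been downloaded yet.
    latest_ts = max((r["ts"] for r in local.values() if r["ts"] is not None), default=0)

    # Download phase: apply every server change newer than latest_ts.
    for remote in server.get_changes(latest_ts):
        mine = local.get(remote["id"])
        if mine is not None and mine["dirty"]:
            # Conflict: here the server version simply wins and the clash is logged.
            conflicts.append((remote["id"], mine["data"], remote["data"]))
        local[remote["id"]] = {
            "data": remote["data"],
            "ts": remote["ts"],
            "deleted": remote["deleted"],
            "dirty": False,
        }

    # Upload phase, inserts: dirty records that never received a Ts.
    for rec_id, rec in local.items():
        if rec["dirty"] and rec["ts"] is None:
            rec["ts"] = server.post(rec_id, rec["data"])["ts"]
            rec["dirty"] = False

    # Upload phase, updates: dirty records that already exist on the server.
    for rec_id, rec in local.items():
        if rec["dirty"] and rec["ts"] is not None:
            rec["ts"] = server.put(rec_id, rec["data"])["ts"]
            rec["dirty"] = False

    # Cleanup: drop records flagged as deleted that have no pending local change.
    for rec_id in [i for i, r in local.items() if r["deleted"] and not r["dirty"]]:
        del local[rec_id]
```

A client would call `sync_table` once per table, in parent-before-child order, passing the same `conflicts` list each time so that it can be shown to the user afterwards.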

For conflict resolution, the easiest approach is to overwrite the local records with the server version (server always wins) and to show the conflict log at the end of the sync.
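
A "server always wins" resolver with a conflict log could be as small as the Python sketch below. The `Conflict` record, the function names, and the log format are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    """One overwritten local change, kept so it can be reported to the user."""
    table: str
    record_id: str
    local_value: dict
    server_value: dict


def resolve_server_wins(table, record_id, local_rec, server_rec, conflict_log):
    """Keep the server version and remember what the local edit looked like."""
    conflict_log.append(Conflict(table, record_id, dict(local_rec), dict(server_rec)))
    return {**server_rec, "dirty": False}


def format_conflict_log(conflict_log):
    """Build the lines to show once the whole sync has finished."""
    if not conflict_log:
        return ["Sync finished without conflicts."]
    lines = [f"{len(conflict_log)} local change(s) were replaced by the server version:"]
    for c in conflict_log:
        lines.append(
            f"{c.table} {c.record_id}: local {c.local_value} -> server {c.server_value}"
        )
    return lines
```

Keeping the discarded local value in the log gives the user a chance to re-apply an edit by hand if the server version was not the one they wanted.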

Comments

Anonymous, 30 August 2013 at 13:56
Even if your conflict resolution will always take the server versions, you will likely end up having inconsistencies.

It seems you are assuming that the device has *exclusive* access to the database for the duration of the sync process on the client side.

If other users are allowed to make changes during the sync process, your algorithm will not work. The server would need to resolve conflicts itself, or simply refuse to make changes. Either way, your algorithm would need to take care of this. Implementing this correctly and reliably is one of the harder software problems.

Furthermore, your sync algorithm doesn't allow changes to the database other than through syncing. That is, a user cannot just update a particular record, since this would destroy your prerequisite about the "age" of the table.

There are other caveats, but you may try to find a solution for those problems mentioned first. It's going to be hard.

Christopher Chouputra, 29 November 2013 at 08:22
Changes to existing data on the server will not be detectable, and assuming you have a checksum for each row, it will be computationally heavy to iterate over every row of a large dataset.

khelll, 4 December 2013 at 11:41
There is a slight modification to this algorithm that makes syncing more reliable during the upload phase. The problem is that there might be a few modifications to the server records during the upload phase. Thus, getting the latest timestamp in step 2 will ignore those modifications, because the uploaded modifications (steps 5-8) will have higher timestamps for sure. I suggest storing the latest timestamp after the download phase, i.e. after step 4.

There will be redundant work, of course, for the documents already uploaded in the previous sync. But you guarantee that no data will be lost.

Tulga, 10 December 2013 at 08:41
I implemented the algorithm in Python with a few small improvements. Check it out here: https://github.com/melug/ClientServerSyncAlgorithm

Unknown, 8 May 2014 at 09:57
Hi Sergey,

It seems that deletion on the client side is not supported. I only see deletion in step 4, for records marked "deleted" by the server. Steps 5-8 only mention update and insert, not a possible delete operation by the client.
Is that right?

Thank you

Unknown, 24 August 2014 at 12:06
Hi,

are you using this algorithm in one of your production apps? If yes, do you have any issues with it that you would like to discuss?

Piyush Singh, 29 August 2014 at 06:34
Hello Sergey,

Thanks for sharing this beautiful algorithm with everybody.

I am curious to know whether we can merge steps 5-6 with steps 7-8, since we can identify the new records (timestamp is null) and the client-updated records (dirty is true) respectively. That would save a call to the server.

I am implementing this algorithm on Android in production. I would love to discuss the production issues (if any).

Unknown, 4 October 2014 at 20:36
We should not reset the dirty flag if the result of the conflict resolution differs from the remote side.

Unknown, 24 October 2014 at 22:49
Wouldn't a better conflict resolution be based on when the record was last updated? I.e. whichever is later: the server's last-modified stamp or the client's last-modified stamp?

Unknown, 27 October 2014 at 17:31
If client A's update comes after client B's, as a user I would expect my changes to be committed, not client B's. Ideally you want to check for updates from the server before doing any updates, but sometimes the network is not available, so client A will create a dirty update. Later (it could be the next day), when it can sync with the backend, his changes will be overridden by B's even though he made them afterwards. An extra client-side column called "lastUpdatedLocallyUTC", updated any time a commit is dirty (i.e. local only), could then be compared during conflict resolution. Nice article, btw.

Panos, 4 September 2015 at 07:23
Hi Sergey,

first of all, thanks for sharing your sync process!
I'm currently trying to implement your approach for a mobile app and am now faced with the problem that comparing local records with the id generated on the server side is not as easy as it sounds.
Consider the following situation:

A user can create new database records which are not publicly available but are always mapped to his/her exclusive account id.

Client/user A creates a new record in a table called, for example, 'Book'. Because this is the very first record in the local database, the autoincrement column will use the id 1. Now the sync process is triggered so that this change can be communicated to the server. Assuming that there are no changes to download, the client starts to upload the local changes to the server. On the server this new record is created in the appropriate table, where the autoincrement column also decides to use the id 1 (because it's the very first record on the server side).

Now client/user B also creates a new Book record. Because this is also the first record being created in the table 'Book', the id in this local database instance will also be 1. Client/user B now sends this local change to the server as well, but this time the id used for the new record in the table 'Book' on the server side will be 2.

Let's assume now that client/user B makes some changes to the previously created record with the local id 1. When the sync process is triggered again, the client will get in the response the record from the server side with the id 2 (because only this record belongs to this client/user; the record with id 1 belongs to client/user A).

In this case it is not really possible to check if the server record with the same id already exists in the local storage (see step 4 in your sync description), because on the client side we have the record with the id 1 and the record from the server side has id 2.

Any ideas and tips on how to solve this problem are really appreciated!

Unknown, 29 February 2016 at 12:02
Hello, can you provide a link to an example or a code sample? It would be a great help.

Sergey Kosik, 29 February 2016 at 20:23
Currently I don't have my own code sample.
However, there is an implementation of the sync algorithm by Tulga in Python: https://github.com/melug/ClientServerSyncAlgorithm .

I hope that helps.

Unknown, 15 April 2016 at 16:05
This is a good article. However, I have been reading up on sync situations, and when you rely on timestamps for syncing there are a lot of scenarios where it breaks down.

E.g. when the tables are being updated (but the records are NOT yet committed) while you are in the middle of a sync operation, you can actually miss the uncommitted changes. Refer to http://forums.mysql.com/read.php?27,243736,243736 for more details.

Timestamps only work if the server providing the timestamps is a single server. But what happens in a multi-server architecture? The timestamps / clock cycles will never be exactly the same. So how can the timestamps actually be compared? We could miss a lot of records to be synced even if there is only a second of difference between the two servers.

Anonymous, 1 August 2016 at 17:09
Thank you for this great algo.
In step 3, when you wrote "Time Stamp", do you mean the TS field?
Thank you.

Panos, 8 August 2016 at 13:23
What if a table contains records for different users (only some records are relevant to a given user)?
The Update Sequence Number will not be consecutive across one user's records, and there might be gaps in the Update Sequence Number for one user (for example 1, 2, ..., 10, 11, where 3 to 9 belong to another user).

Do you see any problem in that case?

ATUL, 23 August 2016 at 07:58
Thanks for the great article and lots of discussion.

Can anyone clear up my doubt: how can entries deleted on the server also be deleted on the client?