Editorials

Roll Your Own ACID Transaction

Making your own code follow the concepts of an ACID transaction is not all that fun. I’ve been playing with an in-memory engine where portions of the data in memory are modified frequently. It takes quite a while to start up a new instance of the engine if you read the objects from the source application. So, to speed up startup, we took the simple route of serializing the objects to disk after they are modified by the source application that creates them. In essence, we want the objects in memory to match the objects serialized to disk at all times, so we need an ACID-style transaction that enforces that the same object is in the memory instance and serialized to disk.

To make the process a little harder, the object source manager can pass one or more objects at a time. So our transaction now needs to cover multiple objects in memory and on disk.

The serialization process has two different actions. If an object being added to memory replaces an existing one, the old serialized version needs to be deleted from disk, and the new version needs to be serialized to disk. If it is a completely new object, I can simply serialize the new version to disk. If the object manager determines an object needs to be removed, it sends that call along with the adds and updates. In this case, only a delete is executed for the designated object, and the file system once again matches memory.
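As a rough illustration of the three cases above, here is a minimal Python sketch. The function name `plan_disk_actions`, the `on_disk` mapping, and the use of `None` to mean "remove this object" are all assumptions made for the example, not part of the engine described here.

```python
def plan_disk_actions(key, new_obj, on_disk):
    """Decide what has to happen on disk for one incoming change.

    key      -- hypothetical identifier of the object
    new_obj  -- the new version, or None when the object is being removed
    on_disk  -- dict mapping key -> file name currently serialized on disk

    Returns (files_to_delete, needs_write).
    """
    old_file = on_disk.get(key)
    if new_obj is None:
        # Removal: only a delete is needed (nothing to do if never saved).
        return ([old_file] if old_file is not None else [], False)
    if old_file is not None:
        # Update: delete the old serialized version, write the new one.
        return ([old_file], True)
    # Completely new object: just write it.
    return ([], True)
```

Calling `plan_disk_actions("a", obj, {"a": "a-1.bin"})` would report that `a-1.bin` should be deleted and a new file written, which is the update case.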

Using a two-phase commit technique similar to the one in T-SQL, we can ensure things stay synchronized. I maintain two collections of the objects that are to be modified. The first collection is the list of objects that have been added. The second collection is the list of objects to be deleted. I create a new in-memory object, the temporary working set, to hold the new objects and then begin to apply the changes.

If I update an existing object, I add the old file name to the objects-to-delete collection and serialize the new object to disk as a new file. The new file name is added to the objects-added collection. Once all individual objects have been processed successfully, or an error has been raised, I perform the second phase of the commit. If processing completed successfully, I delete all the files in the objects-to-delete collection and update the working memory set to point to the objects found in the temporary working set. Then the temporary working set collections are cleared out so that only the working memory set has pointers to the new set of objects.

If the merge process fails at any point, all files in the objects-added collection are deleted, and the temporary working set is cleared out for the next object migration.
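For readers who prefer code to prose, the process can be sketched roughly as follows. This is a simplified Python illustration, not the actual engine: the `ObjectStore` class, the use of `pickle`, the file naming scheme, and the convention that `None` means "remove" are all assumptions. It also does not try to survive a process crash between the two phases.

```python
import pickle
import uuid
from pathlib import Path


class ObjectStore:
    """Hypothetical store keeping a dict in memory and one file per object."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.objects = {}  # working memory set: key -> object
        self.files = {}    # key -> Path of the serialized copy

    def apply(self, changes):
        """changes maps key -> new object, or None to remove that key."""
        added = []                        # files written in phase one
        to_delete = []                    # old files to remove on success
        new_objects = dict(self.objects)  # temporary working set
        new_files = dict(self.files)
        try:
            # Phase one: write new files; existing files are left alone.
            for key, obj in changes.items():
                old_file = new_files.pop(key, None)
                new_objects.pop(key, None)
                if old_file is not None:
                    to_delete.append(old_file)
                if obj is not None:
                    path = self.directory / f"{key}-{uuid.uuid4().hex}.pkl"
                    added.append(path)  # tracked before writing for cleanup
                    with open(path, "wb") as fh:
                        pickle.dump(obj, fh)
                    new_objects[key] = obj
                    new_files[key] = path
        except Exception:
            # Failure: delete everything added; memory was never touched.
            for path in added:
                path.unlink(missing_ok=True)
            raise
        # Phase two: delete superseded files, then swap in the new set.
        for path in to_delete:
            path.unlink(missing_ok=True)
        self.objects, self.files = new_objects, new_files
```

A call such as `store.apply({"a": new_a, "b": None})` either updates `a` and removes `b` both in memory and on disk, or raises and leaves both exactly as they were.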

Using this two-phase commit ensures that the files serialized to disk accurately match the objects in memory.

If this is hard to follow, please leave a comment and I’ll add a flowchart.

Cheers,

Ben

BTW: Don’t forget to register for the Free Online Conference!