Editorials

Making a Graceful Fall

Recently I went through the exercise of error-proofing an application that called a number of micro-services. Sometimes a micro-service became saturated and non-responsive, leaving the work incomplete. All I really needed to do was wait for the saturation to subside, and then the work could continue successfully. This is a common problem with many different solutions.

The perfect solution is to optimize your service so that it can scale up or down with utilization at any time. You don't want to waste resources provisioning a service for rare peak periods of utilization. Holiday shopping is a common example of this kind of activity. Some applications are big enough to see variations in traffic over a 24-hour period. A mortgage search engine I wrote for a previous employer was such a service; it experienced large peaks and valleys throughout a typical weekday.

Rarely do we get the opportunity to optimize a service or micro-service to scale flexibly. Other times, we are using a service we purchased and have no ability to optimize. What do you do in those situations? This brings us to the solutions we have been using for years: writing self-healing client code. A good example comes from old versions of SQL Server, such as 4.21 or 6. An auction management OLTP system I worked on experienced "Deadlock Victim" errors on auction day, when the transaction load was heavy.

In this case, a deadlock occurred when two processes each held a lock on an object the other needed. The first process held a lock on an object the second needed in order to complete, and the second held a lock on an object the first needed. Under these circumstances, SQL Server selects one of the processes as the victim and terminates it to release the deadlock, because waiting longer will never resolve it. The client then receives an error message from SQL Server stating that its process was terminated to resolve the conflict.

We knew the system would experience deadlocks, and at the time there was little we could do to prevent them. Our solution was to create a query handler that simply resubmitted the query to SQL Server, up to a configurable number of attempts, whenever it failed with an error stating that the process was a deadlock victim.
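A minimal Python sketch of that kind of query handler might look like the following. The names `run_with_deadlock_retry` and `DeadlockVictimError` are hypothetical. A real handler would inspect whatever error its database driver raises (modern SQL Server reports a deadlock victim as error 1205) rather than a custom exception. The key point is that only the deadlock error triggers a retry; every other failure is passed straight back to the caller.

```python
class DeadlockVictimError(Exception):
    """Hypothetical stand-in for the driver error raised when the server
    chooses this session as a deadlock victim."""


def run_with_deadlock_retry(execute_query, max_attempts=3):
    """Call execute_query(); resubmit it if it fails as a deadlock victim.

    Any other exception propagates immediately. After max_attempts deadlock
    failures, the last DeadlockVictimError is re-raised to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return execute_query()
        except DeadlockVictimError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the failure


# Illustration: a fake query that loses two deadlocks, then succeeds.
calls = {"n": 0}

def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockVictimError("chosen as deadlock victim")
    return ["row1", "row2"]

print(run_with_deadlock_retry(flaky_query, max_attempts=5))  # ['row1', 'row2']
```

Because the victim's transaction has already been rolled back by the server, resubmitting the whole query is generally safe for this specific error. That is not true of every error, which is why the sketch retries only on the deadlock exception.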

So the world has come full circle once again. Today I am creating utilities that allow multiple attempts when calling micro-services that may time out or fail for some other reason. In today's world, many things can contribute to a client/server failure. Without something as powerful as a service bus, it falls to the developer to write fault-tolerant service calls that let the application fail gracefully and recover.
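For micro-service calls, the same idea applies with one addition. A saturated service needs time to recover, so the client should wait between attempts rather than retrying immediately. The sketch below shows one common way to do that, exponential backoff with random jitter. It is not necessarily how the utilities mentioned here are written. `call_with_retry` and `fetch_status` are hypothetical names, and the sketch treats Python's built-in `TimeoutError` and `ConnectionError` as the transient failures worth retrying.

```python
import random
import time


def call_with_retry(call, max_attempts=4, base_delay=0.5, max_delay=8.0,
                    retry_on=(TimeoutError, ConnectionError), sleep=time.sleep):
    """Invoke call(); on a transient failure, wait and try again.

    The wait ceiling doubles after each failed attempt (exponential backoff)
    and is capped at max_delay. The actual wait is a random value up to that
    ceiling ("full jitter"), so many clients recovering at once do not hit a
    saturated service in lockstep. Errors not listed in retry_on propagate
    immediately; the last transient error is re-raised once attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # give up and surface the failure to the caller
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))


# Illustration: a service call that times out twice before the service recovers.
attempts = []

def fetch_status():
    attempts.append(1)
    if len(attempts) <= 2:
        raise TimeoutError("service saturated")
    return {"status": "ok"}

print(call_with_retry(fetch_status, base_delay=0.01))  # {'status': 'ok'}
```

Passing `sleep` as a parameter is a design choice that keeps the helper easy to test without real delays. In production, the default `time.sleep` does the waiting. The retry budget (`max_attempts`, `max_delay`) bounds how long a caller can be held up before the failure is reported.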

Cheers,

Ben