Editorials

Where Should We Handle Null Values?

Where do you deal with null? The reality is that if you work with a relational database, you will most likely have to address the value, or non-value, of null. Some believe you should do everything you can to remove null from a database. For example, if an individual is working in the USA they must report their income to the Internal Revenue Service using a Social Security Number (SSN), at least in general. When you have a database table including an SSN, what do you do when the number is not known? You can use a different table with a one to zero or one relationship. But, when you outer join to that table, the query once again returns a null value. You can simply allow null in the table. Or, you can provide a real value for unknown.

If you decide to provide a value for when the SSN is not known your choice depends on the data type used to store the value. If the data type is numeric you can store a number that would never be a valid SSN such as 0 or 999999999. If you store the value as a string you can have an unknown value with many more options. I’ve seen 000-00-0000, Unknown, empty string, etc. The point is that we don’t know what the individual’s SSN is, not that they do or do not have one assigned to them.

In another example, we often join data using outer joins for the purpose of cross applying different data dimensions against a set that may or may not have a match. When a match does not exist, this is as significant as the data that does match. However, we often convert the null value into something more meaningful, often the value zero.

John shares in his comments to my editorial on Cartensions, https://sswug.org/bentaylor/editorials/multiple-results-from-queries/, that he prefers to address nulls at the database level. He likes to use stored procedures for these queries, resolving null values in the stored procedure. Then it doesn’t matter where the data is consumed. The null values are handled consistently in any consumer. His point is that many of the potential consumers of a query have different implementations for handling null. Resolving null at the database works for all consumers. John says he can use the same stored procedure results using PowerShell, SSIS, SSRS, and application code.

The same is true, even if you use dynamic SQL. You can still handle the null values in your dynamic queries.

Eilenblogger comments, “Generally it is a bad practice to format data (removing nulls, etc.) at the database.”

I’ve been thinking about how I use Null and DataTables, which was the basic issue that was being discussed. I work with data that is often not hosted by a relational database. However, in most of those cases, I don’t use a DataTable to work with the data. It will usually be objects or collections of objects. Clearly, if the data isn’t in a database, you can’t address null using a database query. I do sometimes have DataTables that work with Excel data or the like. But those data stores don’t tend to have null data.

Another place you may have to address null at a different level would be when you use auto generated ORM tools such as EF. At that point your entities will need to know a value can be null, and your code need to deal with a nullable data type. Using EF doesn’t preclude the use of a Stored Procedure or your own embedded query that would handle null in the query execution.

Where should we handle nulls? Is there a central place where this can be resolved? How about in a micro-service? There is value in consistency, and resolving null values consistently. If we don’t do it at the database level, then where do we handle null? Get into the conversation with your comment.

Cheers,

Ben