
Bad Data Summary

We had some good feedback on bad data. I’d like to summarize everyone’s input into a concise package.

Bad data is the result of many natural processes in an organization:
  • Data is developed separately in different business units, none of which holds a complete picture; when the data is brought together, the result can be duplication or loss.
  • When data is combined through a company acquisition or merger, it is difficult to establish an authoritative source for shared data, such as customer records.
  • We don’t always have the talent necessary to design storage systems that ensure clean data. This may be due to a limited budget or a lack of understanding of how important a data professional is as a member of the team.
  • We take shortcuts in our architecture to reduce the dissonance between application tiers, resulting in applications that aren’t well designed for any single tier. We can write them quickly, and they work for now, so that is considered good enough.
  • We follow industry trends toward unstructured data without understanding or accounting for the risks inherent in unstructured implementations.

David Eaton gave a really good example of one way bad data can evolve. Suppose we have a Boolean value to store: true or false. Instead of using a Boolean data type, however, the system stores “Y” or “N”. This practice may be convenient for presentation, but it requires a lot of extra work in multiple tiers to ensure that no value other than “Y” or “N” is ever instantiated. If you store the value in the database as CHAR(1), you have to add a check constraint to the column so that only the two acceptable values are allowed. By default, a CHAR(1) column accepts any single character, not just “Y” and “N”; in a single-byte character set that is up to 256 different values, so you must exclude all but two of them if you want the column to represent a Boolean. David’s solution: “Use a bool datatype on all tiers.”
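To make the extra work concrete, here is a minimal sketch using Python’s standard-library sqlite3 module. It assumes SQLite rather than whatever database the original discussion had in mind, and the table name `customer`, the column `is_active`, and the helper functions are all hypothetical. SQLite ignores the declared length of CHAR(1), so the check constraint is the only thing keeping other values out. The conversion helpers stand in for the translation code every tier has to repeat when the flag travels as “Y”/“N” instead of a real bool.

```python
import sqlite3
from typing import Optional


def make_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table whose Boolean flag is stored as CHAR(1)."""
    # SQLite does not enforce the length in CHAR(1), so the CHECK constraint
    # is what keeps values other than 'Y' and 'N' out of the column.
    conn.execute(
        """
        CREATE TABLE customer (
            id        INTEGER PRIMARY KEY,
            is_active CHAR(1) NOT NULL CHECK (is_active IN ('Y', 'N'))
        )
        """
    )


def try_insert(conn: sqlite3.Connection, flag: Optional[str]) -> bool:
    """Return True if the database accepted the flag, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO customer (is_active) VALUES (?)", (flag,))
        return True
    except sqlite3.IntegrityError:
        # Raised for both the CHECK and the NOT NULL violation.
        return False


# Hypothetical application-tier helpers: each tier that touches the column
# needs an equivalent of these to translate between 'Y'/'N' and a bool.
def to_bool(flag: str) -> bool:
    if flag == "Y":
        return True
    if flag == "N":
        return False
    raise ValueError(f"unexpected flag value: {flag!r}")


def from_bool(value: bool) -> str:
    return "Y" if value else "N"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_schema(conn)
    for candidate in ["Y", "N", "y", "Yes", None]:
        status = "accepted" if try_insert(conn, candidate) else "rejected"
        print(f"{candidate!r}: {status}")
    # 'Y' and 'N' are accepted; 'y', 'Yes' and None are rejected.
```

SQLite compares text with its case-sensitive BINARY collation by default, which is why “y” is rejected here; other databases may behave differently depending on the column’s collation. A tier that carries the flag as a native bool has only two possible values to begin with, which is the point of David’s suggestion.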

In conclusion, I want to echo Derrick’s comment, because I have had the same experience: “I’ve never known a project where this didn’t happen…”

Cheers,

Ben