All this sharing of data is moving ahead at such an incredible pace… but I have to wonder, are we missing a piece?
I’ve been spending a lot more time on data sharing (as in preventing it unless it’s something you’re supposed to be doing), and it got me thinking. What would it take for someone to seed incorrect information? What if some huge social network had a bunch of data to share and they either decided to be sneaky or were hacked… the net result could be a situation where the information that “got out” could really cause issues with other data systems.
I admit that, at times, part of me cheers this possibility. The thought that there would be a question mark as people used information obtained through potentially unknowing connections makes me grin a bit.
But move this inside your company or your clients. If you look at information they gather for their own use, where we’re not attaching moral judgments and are instead just trying to make sense of data… the data chain of custody (oh boy, Steve’s talking about this again) becomes an issue.
I’ve talked with several people recently who are working through data flows, and when we started talking about handling exceptions in the data, the idea of validation came up almost immediately. Not just in an “are the values in range and rational” manner, but is the data the type of thing we expect, and do we *know* it’s correct?
Those are hard questions to answer in the process of working through data streams. If you knew the data elements, you could verify them. But of course that’s sort of the point: you don’t know the data, and you need to collect and process it.
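For what it’s worth, the “in range and the type we expect” layer is the easy part to sketch. Here is a minimal Python example of that first pass. The field names (`age`, `email`), the limits, and the `FieldRule` and `validate_record` helpers are all made up for illustration, not taken from any particular system.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FieldRule:
    """What we expect of one field: its type and a sanity check on its value."""
    expected_type: type
    is_rational: Callable[[Any], bool]


# Hypothetical rules for an incoming record; real rules depend on your data.
RULES = {
    "age": FieldRule(int, lambda v: 0 <= v <= 120),
    "email": FieldRule(str, lambda v: "@" in v),
}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    for name, rule in RULES.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        # Strict type comparison so that, for example, True is not accepted as an int.
        if type(value) is not rule.expected_type:
            problems.append(
                f"{name}: expected {rule.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        elif not rule.is_rational(value):
            problems.append(f"{name}: value {value!r} is outside the rational range")
    return problems


if __name__ == "__main__":
    print(validate_record({"age": 34, "email": "someone@example.com"}))
    print(validate_record({"age": 430, "email": 5}))
```

Note what this does not do: a record can pass every one of these checks and still be wrong. It catches data that is implausible, not data that is plausible but false.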
It used to be that we’d ask how you know that a spreadsheet, and the information in it, hasn’t been tampered with or modified, or that the formulas that create derived values are actually correct. It’s pretty critical. The same applies to data sharing, especially when we consider data sharing between applications and possibly even outside a given company’s walls.
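One common way to approach the “has this been modified since it left the source” half of that question is a keyed hash (HMAC) that travels with the data. The sketch below uses Python’s standard `hmac` and `hashlib` modules; the function names, the sample payload, and the idea that sender and receiver already share a secret key are assumptions made for the example.

```python
import hashlib
import hmac


def sign_payload(payload: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 signature the sender ships alongside the data."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_untampered(payload: bytes, key: bytes, signature: str) -> bool:
    """Recompute the signature on receipt and compare in constant time."""
    expected = sign_payload(payload, key)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    # Hypothetical shared secret and payload, for illustration only.
    key = b"shared-secret-for-illustration"
    original = b"customer_id,balance\n1001,250.00\n"
    signature = sign_payload(original, key)

    modified = b"customer_id,balance\n1001,950.00\n"
    print(is_untampered(original, key, signature))
    print(is_untampered(modified, key, signature))
```

This only covers the custody part of the chain. It can tell you the file is the one the source sent; it says nothing about whether the source’s values or formulas were right in the first place.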
I’m not sure what the answer is. It could be data modeling for rational values. It could be comparing against other sources (this is similar to what’s done with verification and master data modeling), or something completely different. But consider carefully the sources and uses of the information. If you don’t have the checks in there, it’s possible you could be making decisions based on information that’s just flat wrong.