Editorials

SQL Without Indexes

Indexes are very important for SQL Server today. The larger your database grows, the more indexes matter to the performance of your SQL Server. However, this may not be the case as we move into the future. A good example supporting this prediction is the performance of systems like Bigtable, the engine behind your Google searches. Instead of growing ever larger systems on bigger and bigger hardware, Google designed a system built from smaller and smaller units of work that, through distribution of the workload, delivers performance on a massive scale.

There are many NoSQL engines out there with similar capabilities, working with different degrees of scale and performance. To date, most of them are not relational in nature, and do not have the efficiencies of relational storage. You will see some sharding implementations where relational databases are split across multiple servers, and the data is aggregated at a later time. But this is not the norm.

There are many factors converging that I believe will make distributed engines the norm, even in a relational world.

  • Network speeds are increasing dramatically. At least at a practical level, this lets data distributed across a network be accessed at speeds previously available only on the bus of a computer board.
  • RAM costs are falling so fast that my servers today have more memory than the hard disk capacity of only a few years ago.
  • Non-volatile memory, such as SSDs, is getting faster and lasting longer, and may actually outlast today's hard disks.
  • CPUs are becoming more capable of truly parallel processing, resulting in higher performance.
  • Frameworks and services are being developed and enhanced such that the distribution of processing is not really a problem for the programmer to solve; it comes out of the box, regardless of your application.

With these technologies converging, I would not be surprised to see the emergence of a true grid platform where programmers don’t know or care where their code resides. The work is distributed to the best available resource. When work is broken up and distributed to many machines for processing, the grid operating system knows how to gather the data back and return a consolidated result to the client. We are already seeing predecessors to this technology today. I don’t think it will be long until we see SQL engines taking advantage of distributed technology without all the effort it takes to make it work today. The distribution will simply be handled by the grid operating system. When that happens, and the data held by any particular unit is small, indexes could become a thing of the past, at least as we know them today.
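To make the idea concrete, here is a minimal sketch of that scatter and gather pattern in Python. It is an illustration only, not how any particular grid or SQL engine works: the SHARDS data, the scan_shard and scatter_gather names, and the use of threads to stand in for separate machines are all assumptions made for the example. Each shard is small enough that a plain scan, with no index, finds the matching rows.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for one node holding a small,
# unindexed slice of an orders table.
SHARDS = [
    [{"id": 1, "customer": "acme", "total": 120.0},
     {"id": 2, "customer": "zenith", "total": 40.0}],
    [{"id": 3, "customer": "acme", "total": 75.5}],
    [{"id": 4, "customer": "birch", "total": 10.0},
     {"id": 5, "customer": "acme", "total": 4.5}],
]


def scan_shard(rows, customer):
    """Full scan of one small shard; no index is consulted."""
    return [row for row in rows if row["customer"] == customer]


def scatter_gather(shards, customer):
    """Send the same predicate to every shard, then merge the partial results."""
    if not shards:
        return []
    # Threads stand in for remote machines in this sketch.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(scan_shard, shards, [customer] * len(shards)))
    merged = [row for part in partials for row in part]
    # Consolidate into one ordered result for the client.
    return sorted(merged, key=lambda row: row["id"])


if __name__ == "__main__":
    rows = scatter_gather(SHARDS, "acme")
    print([row["id"] for row in rows])
    print(sum(row["total"] for row in rows))
```

The caller asks one question and gets one consolidated answer; which shard held which row never surfaces. The trade-off is that every shard is touched on every query, which only pays off while each shard stays small.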

Cheers,

Ben