Editorials

TSQL Group By Extensions

There are extensions to the GROUP BY clause available in TSQL allowing you to do some really cool stuff. This capability is often forgotten, because we tend to think of doing these kinds of functions in a Data Warehouse, instead of in traditional SQL statements. Those extensions are CUBE, ROLLUP, and GROUPING SETS. There is a good explanation of these aggregate extensions found on MSDN titled, “Using GROUP BY with ROLLUP, CUBE, and GROUPING SETS”.

Let me summarize the contents. However, you will most likely want to review this article as their examples provide data you can see, helping you better understand the difference between each of the extensions, and how they differ.

As you know, the GROUP BY clause allows you to take multiple detail rows, and return only a single row for each unique instance of the columns contained in the results. For each unique instance, you may also return aggregate values, such as SUM, MIN, MAX, COUNT, STDEV. Using grouping extensions does not change this behavior directly. Instead, It breaks the groupings out in unique ways, resulting in additional groupings.

For our example, let’s use a sales table having two columns, SalesDate and SalesAmount.

Using a simple GROUP BY query you might see

SELECT
DatePart(yyyy, SalesDate) AS SalesYear
,DatePart(mm, SalesDate) AS SalesMonth
,DatePart(dd, SalesDate) AS SalesDay
,SUM(SalesAmount) AS Sales
GROUP BY
DatePart(yyyy, SalesDate)
,DatePart(mm, SalesDate)
,DatePart(dd, SalesDate)

Given this example you would see groupings for every distinct year, Month, Day. For each period, the sum of the sales for that day are aggregated as Sales.

Using a GROUP BY ROLLUP the results would be different. The query is nearly identical.

SELECT
DatePart(yyyy, SalesDate) AS SalesYear
,DatePart(mm, SalesDate) AS SalesMonth
,DatePart(dd, SalesDate) AS SalesDay
,SUM(SalesAmount) AS Sales
GROUP BY ROLLUP
DatePart(yyyy, SalesDate)
,DatePart(mm, SalesDate)
,DatePart(dd, SalesDate)


In this scenario you will see different groupings in your output. Instead you will see additional rows.

ROLLUP Groupings
Year Month Day Sales
Null null null #,###.00
Year null null #,###.00
Year Month null #,###.00
Year Month Day #,###.00

For each of distinct values for Year, Month, Day, there will be individual groupings.

Notice that in the previous example, there is a hierarchy in which data is returned. You get a totals row, the first one, which has no year, month or day. They you get a year row. However, you never get a month row without a year. You never get a day row without a month. What if you wanted every possible permutation, so you could get a total of all says that fall on the first day of the month, or the first month of the year, etc. All on one query you can roll up all of the possible permutations. This is done by using the CUBE extension instead of ROLLUP.

SELECT
DatePart(yyyy, SalesDate) AS SalesYear
,DatePart(mm, SalesDate) AS SalesMonth
,DatePart(dd, SalesDate) AS SalesDay
,SUM(SalesAmount) AS Sales
GROUP BY CUBE
DatePart(yyyy, SalesDate)
,DatePart(mm, SalesDate)
,DatePart(dd, SalesDate)



This query returns:


CUBE Groupings
Year Month Day Sales
null null null #,###.00
Year null null #,###.00
Year Month null #,###.00
Year null Day #,###.00
Year Month Day #,###.00
null Month null #,###.00
null Month Day #,###.00
null null Day #,###.00

Again, there will be a separate row for every unique combination of Year, Month, Day.

This is really powerful stuff, and you can do it using traditional SQL you use today.

Cheers,

Ben