Editorials

UNION Set Operations

We’ve been looking at some of the set based operations that are part of the SQL Language, and different ways to achieve the same results. Today we add to the discussion the UNION operation. There are two varieties of the UNION operator, UNION and UNION ALL.

The UNION operator takes two sets of data, combines the results, and then returns each unique record from the combined set. It is like selecting DISTINCT after combining the two sets, without having to specify DISTINCT. DISTINCT is already assumed in the query.

UNION ALL is different from the UNION operation. It does not have a DISTINCT operation over the combined results. So, if there are overlapping records in the two sets, or if there are overlapping records in either set, they will all be returned.

I’m going to use my Sequence table to demonstrate these operators. My sequence query returns the numbers 1 through five.

My first query selects five records with the number 1.


SELECT 1

FROM Sequence

WHERE Number BETWEEN 1 AND 5

If I take this query and combine it with another query with the same set using the UNION operation, the result is one row with the number 1.

SELECT 1

FROM Sequence

WHERE Number BETWEEN 1 AND 5


UNION


SELECT 1

FROM Sequence

WHERE Number BETWEEN 1 AND 5

The reason for this result is because the UNION first creates a set of 10 records with the number 1, then it applies DISTINCT or GROUP BY, resulting in a single record with the number 1.

If instead you used UNION ALL the result is now ten records with the number 1. This is because the GROIUP BY or DISTINCT operation is not performed.

If you really want to be sure this is how things work, run the query three times. Between the first and second query use UNION ALL, and between the second and third query use UNION. Running this final query you once again return only a single row with the number 1.

UNION and UNION ALL can be very useful tools. Use UNION ALL if you are sure you don’t need to remove duplicates as this will have much better performance. Use UNION if you must assure no duplicate records are returned.

Cheers,

Ben