Random record selection is a rather involved topic. You can use random number generators when creating data, and use the randomly created values for selection. You can select records at runtime, randomly assign a value, and select a few meeting your criteria. You can also purchase third party tools that optimize and automate the process for you. And much more.
Today I am presenting a simple solution that often is adequate. It uses a modulus on a sequential value. If the value Mod [test number] = 0, then the record is included in the output. My strategy determines how many rows are in a table, and, using a value for the number of records to return, creates a modulus that applied to all the records in the table, returns the desired number of records, evenly parsed throughout the whole set. Of course, on a very large table, this technique would be quite slow. Let’s start out with how I determine the Mod comparison value. I’m using the Adventureworks 2012 database for this example.
DECLARE @Records INT = 5
DECLARE @ModAmount INT =
(SELECT COUNT(1) FROM Production.Product) / (@Records + 2)
SELECT @Records as Records, @ModAmount as ModAmount
Once this query is done it returns
Records ModAmount
5 72
My products table has 504 records. By using a modulus of 72, it breaks the table up into 7 sections. I get the record that divides each of those sections. Since the first section starts at 1 it will be excluded. I cap the number of records returned to 5, excluding the last selection. This results in 5 records, evenly disbursed in the middle of the entire set.
Now that I have my segments, I run three different queries, demonstrating how you can get different random results from the same set, using different sequences contained in the table. Some of the sequences you have to generate yourself through the use of a ROW_NUMBER() function. The trick is to have a sequential number upon which you can apply your modulus comparison.
This first query does not require a ROW_NUMBER function. Instead it uses the incremental primary key column generated as records are inserted into the table. The problem with this method is that the records are not guaranteed to be sequential, as is the case when using ROW_NUMBER. However, the sequence already exists, and no processing ower is used to create a new one just for your query.
SELECT TOP (@Records) ProductId, ProductNumber, Name, ListPrice
FROM production.Product
WHERE ProductID % @ModAmount = 0
This second query uses the same Modulus comparison. This time it generates a sequential number based on the ProductNumber column. This is a string value, and a MOD operation can’t be applied to it. However, it can be sorted, and a number assigned to each unique instance. Then you can apply the MOD operation to the generated sequence.
;WITH Products
AS
(
SELECT ProductId, ProductNumber, Name, ListPrice,
Row_Number() OVER (PARTITION BY 1 order by ProductNumber) as RowId
FROM production.Product
)
SELECT TOP (@Records) ProductId, ProductNumber, Name, ListPrice, RowId
FROM Products
WHERE RowId % @ModAmount = 0
Like the previous query, the following query generates a sequence based on the ListPrice. This is a nice query because it gets records from a range of prices throughout the entire table. That’s a pretty good representation. It is also weighted in a fashion. If there are more prices at the low end, there will be more records with low prices. Then the MOD comparison returns more low price records than high priced ones.
;WITH Products
AS
(
SELECT ProductId, ProductNumber, Name, ListPrice,
Row_Number() OVER (PARTITION BY 1 order by ListPrice) as RowId
FROM production.Product
)
SELECT TOP (@Records) ProductId, ProductNumber, Name, ListPrice, RowId
FROM Products
WHERE RowId % @ModAmount = 0
Here are the results I had from these queries on my instance of AdventureWorks.
Product Id |
Product Number |
Name | List Price |
Row Id |
---|---|---|---|---|
420 | LI-7160 | Internal Lock Washer 8 | 0.00 | 0 |
504 |
RA-2345
|
Cup-Shaped Race | 0.00 | 0 |
756 | BK-R68R-44 | Road-450 Red, 44 | 1457.99 | 0 |
840 | FR-R92B-52 | HL Road Frame – Black, 52 | 1431.50 | 0 |
924 | FR-M21B-42 | LL Mountain Frame – Black, 42 | 249.79 | 0 |
Product Id | Product Number | Name | List Price | Row Id |
---|---|---|---|---|
749 | BK-R93R-62 | Road-150 Red, 62 | 3578.27 | 84 |
737 | FR-R38B-48 | LL Road Frame – Black, 48 | 337.22 | 168 |
359 | HJ-1213 | Thin-Jam Hex Nut 9 | 0.00 | 252 |
432 | LJ-5811 | Thin-Jam Lock Nut 13 | 0.00 | 336 |
907 | RB-9231 | Rear Brakes | 106.50 | 420 |
Product Id | Product Number | Name | List Price | Row Id |
---|---|---|---|---|
405 | LE-1201 | External Lock Washer 9 | 0.00 | 84 |
489 | MT-1000 | Metal Tread Plate | 0.00 | 168 |
715 | LJ-0192-L | Long-Sleeve Logo Jersey, L | 49.99 | 252 |
820 | FW-R820 | HL Road Front Wheel | 336 | 336 |
888 | FR-T98Y-50 | HL Touring Frame – Yellow, 50 | 420 | 420 |
If you wish to dig into more sophisticated methods of random record selection, I found a good sampling of different techniques here. https://www.mssqltips.com/sqlservertip/3157/different-ways-to-get-random-data-for-sql-server-data-sampling/
Cheers,
Ben