I’m trying to generate a random time between 8:00 AM and 8:00 PM for each row that is selected from a data set, however, I always get the same random value for each row – I want it to be different for each row.
Table schema & data:
╔══════╦════════════════╗ ║ ID ║ CREATED_DATE ║ ╠══════╬════════════════╣ ║ ID/1 ║ 26/04/2014 ║ ║ ID/2 ║ 26/04/2014 ║ ║ ID/3 ║ 26/04/2014 ║ ║ ID/4 ║ 26/04/2014 ║ ║ ID/5 ║ 26/04/2014 ║ ╚══════╩════════════════╝
Сurrent SQL statement:
SELECT [ID] , MyFunction.dbo.AddWorkDays(14, [CREATED_DATE]) AS [New Date] , CONVERT(VARCHAR, DATEADD(MILLISECOND, CAST(43200000 * RAND() AS INT), CONVERT(TIME, '08:00')), 114) AS [New Time] FROM [RandomTable]
Current results (same time for each row in the [New Time]
column):
╔══════╦════════════════╦════════════════╗ ║ ID ║ New Date ║ New Time ║ ╠══════╬════════════════╬════════════════╣ ║ ID/1 ║ 10/05/2014 ║ 09:41:43 ║ ║ ID/2 ║ 10/05/2014 ║ 09:41:43 ║ ║ ID/3 ║ 10/05/2014 ║ 09:41:43 ║ ║ ID/4 ║ 10/05/2014 ║ 09:41:43 ║ ║ ID/5 ║ 10/05/2014 ║ 09:41:43 ║ ╚══════╩════════════════╩════════════════╝
Desired results (different time for each row in the [New Time]
column):
╔══════╦════════════════╦════════════════╗ ║ ID ║ New Date ║ New Time ║ ╠══════╬════════════════╬════════════════╣ ║ ID/1 ║ 10/05/2014 ║ 09:41:43 ║ ║ ID/2 ║ 10/05/2014 ║ 15:05:23 ║ ║ ID/3 ║ 10/05/2014 ║ 10:01:05 ║ ║ ID/4 ║ 10/05/2014 ║ 19:32:45 ║ ║ ID/5 ║ 10/05/2014 ║ 08:43:15 ║ ╚══════╩════════════════╩════════════════╝
Any ideas on how to fix this? All of the above is just sample data – my real table has around 2800 records (not sure if that will make a difference to anyone’s suggestions).
Advertisement
Answer
Interpretation of Original Question:
The question states:
- Generate a random time between 8:00 AM and 8:00 PM (i.e. a 12-hour window)
- It should be different for each row (i.e. unique across all rows)
- The real table has around 2800 records
Now factor in the following points:
- Sample data shows only a single date
- There are 86,400 seconds in 24 hours, hence 43,200 seconds in 12 hours
There is some ambiguity in the following areas:
- What exactly is random within the context of “different for every row“, given that truly random values cannot be guaranteed to be different for every row. In fact, truly random numbers could theoretically be the same for every row. So is the emphasis on “random” or “different”? Or are we really talking about different but not sequentially ordered (to give the appearance of randomness without actually being random)?
- What if there are ever more than 2800 rows? What if there are 1 million rows?
- If there can be more than 43,200 rows, how to handle “different for each row” (since it is not possible to have unique across all rows)?
- Will the date ever vary? If so, are we really talking about “different for each row per date“?
- If “different for each row per date“:
- Can the times for each date follow the same, non-sequential pattern? Or does the pattern need to differ per each date?
- Will there ever be more than 43,200 rows for any particular date? If so, the times can only be unique per each set of 43,200 rows.
Given the information above, there are a few ways to interpret the request:
- Emphasis on “random”: Dates and number of rows don’t matter. Generate truly random times that are highly likely, but not guaranteed, to be unique using one of the three methods shown in the other answers:
- @notulysses:
RAND(CAST(NEWID() AS VARBINARY)) * 43200
- @Steve Ford:
ABS(CHECKSUM(NewId()) % 43201)
- @Vladimir Baranov :
CAST(43200000 * (CAST(CRYPT_GEN_RANDOM(4) as int) / 4294967295.0 + 0.5) as int)
- @notulysses:
- Emphasis on “different for each row”, always <= 43,200 rows: If the number of rows never exceeds the number of available seconds, it is easy to guarantee unique times across all rows, regardless of same or different dates, and appear to be randomly ordered.
- Emphasis on “different for each row”, could be > 43,200 rows: If the number of rows can exceed the number of available seconds, then it is not possible to guarantee uniqueness across all rows, but it would be possible to still guarantee uniqueness across rows of any particular date, provided that no particular date has > 43,200 rows.
Hence, I based my answer on the idea that:
- Even if the number of rows for the O.P. never exceeds 2800, it is more likely that most others who are encountering a similar need for randomness would have a larger data set to work with (i.e. there could easily be 1 million rows, for any number of dates: 1, 5000, etc.)
- Either the sample data is overly simplistic in using the same date for all 5 rows, or even if the date is the same for all rows in this particular case, in most other cases that is less likely to happen
- Uniqueness is to be favored over Randomness
- If there is a pattern to the “seemingly random” ordering of the seconds for each date, there should at least be a varying offset to the start of the sequence across the dates (when the dates are ordered sequentially) to give the appearance of randomness between any small grouping of dates.
Answer:
If the situation requires unique times, that cannot be guaranteed with any method of generating truly random values. I really like the use of CRYPT_GEN_RANDOM
by @Vladimir Baranov, but it is nearly impossible to get a unique set of values generated:
DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE); INSERT INTO @Table (Col1) SELECT CONVERT(BIGINT, CRYPT_GEN_RANDOM(4)) FROM [master].sys.objects so CROSS JOIN [master].sys.objects so2 CROSS JOIN [master].sys.objects so3; -- 753,571 rows
Increasing the random value to 8 bytes does seem to work:
DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE); INSERT INTO @Table (Col1) SELECT CONVERT(BIGINT, CRYPT_GEN_RANDOM(8)) FROM [master].sys.objects so CROSS JOIN [master].sys.objects so2 CROSS JOIN [master].sys.objects so3; -- 753,571 rows
Of course, if we are generating down to the second, then there are only 86,400 of those. Reducing the scope seems to help as the following does occasionally work:
DECLARE @Table TABLE (Col1 BIGINT NOT NULL UNIQUE); INSERT INTO @Table (Col1) SELECT TOP (86400) CONVERT(BIGINT, CRYPT_GEN_RANDOM(4)) FROM [master].sys.objects so CROSS JOIN [master].sys.objects so2 CROSS JOIN [master].sys.objects so3;
However, things get a bit trickier if the uniqueness needs per each day (which seems like a reasonable requirement of this type of project, as opposed to unique across all days). But a random number generator isn’t going to know to reset at each new day.
If it is acceptable to merely have the appearance of being random, then we can guarantee uniqueness per each date without:
- looping / cursor constructs
- saving already used values in a table
- using
RAND()
,NEWID()
, orCRYPT_GEN_RANDOM()
The following solution uses the concept of Modular Multiplicative Inverses (MMI) which I learned about in this answer: generate seemingly random unique numeric ID in SQL Server . Of course, that question did not have a tightly-defined range of values like we have here with only 86,400 of them per day. So, I used a range of 86400 (as “Modulo”) and tried a few “coprime” values (as “Integer”) in an online calculator to get their MMIs:
- 13 (MMI = 39877)
- 37 (MMI = 51373)
- 59 (MMI = 39539)
I use ROW_NUMBER()
in a CTE, partitioned (i.e. grouped) by CREATED_DATE
as a means of assigning each second of the day a value.
But, while the values generated for seconds 0, 1, 2, … and so on sequentially will appear random, across different days that particular second will map to the same value. So, the second CTE (named “WhichSecond”) shifts the starting point for each date by converting the date to an INT (which converts dates to a sequential offset from 1900-01-01) and then multiply by 101.
DECLARE @Data TABLE ( ID INT NOT NULL IDENTITY(1, 1), CREATED_DATE DATE NOT NULL ); INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05'); INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05'); INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05'); INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05'); INSERT INTO @Data (CREATED_DATE) VALUES ('2014-10-05'); INSERT INTO @Data (CREATED_DATE) VALUES ('2015-03-15'); INSERT INTO @Data (CREATED_DATE) VALUES ('2016-10-22'); INSERT INTO @Data (CREATED_DATE) VALUES ('2015-03-15'); ;WITH cte AS ( SELECT tmp.ID, CONVERT(DATETIME, tmp.CREATED_DATE) AS [CREATED_DATE], ROW_NUMBER() OVER (PARTITION BY tmp.CREATED_DATE ORDER BY (SELECT NULL)) AS [RowNum] FROM @Data tmp ), WhichSecond AS ( SELECT cte.ID, cte.CREATED_DATE, ((CONVERT(INT, cte.[CREATED_DATE]) - 29219) * 101) + cte.[RowNum] AS [ThisSecond] FROM cte ) SELECT parts.*, (parts.ThisSecond % 86400) AS [NormalizedSecond], -- wrap around to 0 when -- value goes above 86,400 ((parts.ThisSecond % 86400) * 39539) % 86400 AS [ActualSecond], DATEADD( SECOND, (((parts.ThisSecond % 86400) * 39539) % 86400), parts.CREATED_DATE ) AS [DateWithUniqueTime] FROM WhichSecond parts ORDER BY parts.ID;
Returns:
ID CREATED_DATE ThisSecond NormalizedSecond ActualSecond DateWithUniqueTime 1 2014-10-05 1282297 72697 11483 2014-10-05 03:11:23.000 2 2014-10-05 1282298 72698 51022 2014-10-05 14:10:22.000 3 2014-10-05 1282299 72699 4161 2014-10-05 01:09:21.000 4 2014-10-05 1282300 72700 43700 2014-10-05 12:08:20.000 5 2014-10-05 1282301 72701 83239 2014-10-05 23:07:19.000 6 2015-03-15 1298558 2558 52762 2015-03-15 14:39:22.000 7 2016-10-22 1357845 61845 83055 2016-10-22 23:04:15.000 8 2015-03-15 1298559 2559 5901 2015-03-15 01:38:21.000
If we want to only generate times between 8:00 AM and 8:00 PM, we only need to make a few minor adjustments:
- Change the range (as “Modulo”) from 86400 to half of it: 43200
- Recalculate the MMI (can use the same “coprime” values as “Integer”): 39539 (same as before)
- Add
28800
to the second parameter of theDATEADD
as an 8 hour offset
The result will be a change to just one line (since the others are diagnostic):
-- second parameter of the DATEADD() call 28800 + (((parts.ThisSecond % 43200) * 39539) % 43200)
Another means of shifting each day in a less predictable fashion would be to make use of RAND()
by passing in the INT form of CREATED_DATE
in the “WhichSecond” CTE. This would give a stable offset per each date since RAND(x)
will return the same value y
for the same value of x
passed in, but will return a different value y
for a different value of x
passed in. Meaning:
RAND(1) = y1
RAND(2) = y2
RAND(3) = y3
RAND(2) = y2
The second time RAND(2)
was called, it still returned the same value of y2
that it returned the first time it was called.
Hence, the “WhichSecond” CTE could be:
( SELECT cte.ID, cte.CREATED_DATE, (RAND(CONVERT(INT, cte.[CREATED_DATE])) * {some number}) + cte.[RowNum] AS [ThisSecond] FROM cte )