SQL TIP: Selecting Random Selection Of Rows From A Database Table

sql 0 comments suggest edit

I found a nice tip for selecting random rows from within a SQL Server 2000 database. Well actually, pseudorandom. Since my undergraduate thesis was on the topic of pseudorandom number generation, I might as well be precise. For some reason, my non-geek friends find it awfully funny when I mention pseudorandom numbers.

I digress. In order to select 10 records from some table at random, try this:

SELECT TOP 10 * FROM someTable ORDER By NEWID()

Now for my homework, I should find out just how random this is. There’s a whole slew of statistical tests I can run to gauge the randomness of pseudorandom number generator such as the Chi-square Test, Serial Correlation Coefficient, and 2-D Random Walk Test to name a few.

IMPORTANT: Please note that this will NOT work in SQL 7 on NT4 because the NEWID() function there generates sequential results (Bad SQL! Bad!).

UPDATE: My friend Erik referred me to this article that has an overview of several methods for random sampling. Thanks e.

Found a typo or error? Suggest an edit! If accepted, your contribution is listed automatically here.

Comments

avatar

65 responses

  1. Avatar for Justin Pitts
    Justin Pitts June 21st, 2004

    Elegant.

  2. Avatar for Adam Miller
    Adam Miller June 21st, 2004

    I believe the downside to this is the NEWID generates a unique identifier for every row in the table and then the result is returned. Therefore you may not want to use this option for tables that have a large amount of rows.

  3. Avatar for Haacked
    Haacked June 21st, 2004

    True. This isn't designed to be high performant for large data sets. I'm using it to select a set of random questions from a set of a few thousand. In that situation it performs quite handily.



    For really large tables, selecting a random result set becomes a bit tricky. The challenge is that you don't want to generate a random number (or a guid) for every row just to select a subset. So you have to generate a subset of random numbers and then somehow map those numbers to the table and select just those records.



    This is easier said than done.

  4. Avatar for HumanCompiler
    HumanCompiler June 21st, 2004

    I know everyone says that cursors are the devil, but I've done it before where you get a random number (between 1 and COUNT(*)) and open a cursor on the result set of a select statement and call FETCH ABSOLUTE and give it the random number. Seemed to work fine, just that you have to actually select every row, so the more rows, the more it will slow down (just like your example, except that you're not creating a new guid for every row, so i'd think it would be faster, but maybe not)

  5. Avatar for Haacked
    Haacked June 22nd, 2004

    Nice idea Eric. But how do you feel about yourself after collaborating with the devil like that? ;)



    I imagine it depends how FETCH ABSOLUTE works? Is it going to do some kind of iteration for each row you fetch or does it index the rows?



    It sounds like this approach could be faster for small selection of random rows from a really large table, but not for large selections from a large table.



    But then again, I'm merely speculating. I'd want to measure, measure, measure.



    Phil

  6. Avatar for .e.
    .e. June 22nd, 2004
  7. Avatar for Koba
    Koba June 25th, 2004

    I use a dartboard.

  8. Avatar for Mickey
    Mickey March 9th, 2005

    Issue - I doubt this will really be random - because the query engine in sql always selects the rows first and then orders it - so it will select the top 10 rows first from the table however is it ordered (usually by the PK) and then order is randomly - so you will most likely get the same set of records or near the same set of records randomly ordered :-/

  9. Avatar for Haacked
    Haacked March 9th, 2005

    Mickey, rather than doubt it, you should give it a try and check out the results.



    As a matter of fact, the query engine will order by newid() first. Then it will select the top 10. Because newid() generates random guids, the initial sort will shuffle the rows.

  10. Avatar for Haacked
    Haacked March 9th, 2005

    Besides, if you were correct and the SQL engine always selected rows before ordering them, you'd never be able to select the last 10 rows in the database.



    For example, suppose we have a table with a DateCreated column. How would you get the last 10 created records? Well the typical query is:



    SELECT TOP 10 * FROM someTable ORDER BY DateCreated DESC



    But that wouldn't work if the query engine selected the top 10 records ordered by the PK first.



    Fortunately, SQL does the ordering first. Thus this query will return the last 10 created records.

  11. Avatar for Sidharth
    Sidharth March 23rd, 2005

    Selectin randomly... it was the nightmare for me.



    Im satisfied with this code.100% . simply great

  12. Avatar for SCollege
    SCollege April 12th, 2005

    It does not seem to work well with SQL Server 2000, in that the records do not appear to be random. Repeatably executing the query always returns the same rows.

  13. Avatar for Up All Night
    Up All Night April 22nd, 2005

    I found the insights posted on this page to be helpful. I thought I would share mine. Still needs help.



    This is my task. I have a table of Realtors. When a new customer signs up for service, the app needs to check to see which Realtors will service the customer's particular zip code. Simple enough. Then randomly assign the customer one of the Realtors who services his zip code area. I then need to set selected_status to 1 on the Realtor's record. This works to remove him from the random pool until all agents have been assigned once.



    SELECT agent_id FROM `agent_info`

    WHERE selected_status <1 and agent_serv_zip like '%<?=$zipcode;?>%'

    ORDER BY rand( )

    LIMIT 1



    This works perfectly to return one random row that fits my original criteria.



    What I would like to find out is if there is a way to set the selected_status to 1 in the same query?



    After all 'selected_status' flags are > 0, I will reset them back to 0 in order to put the realtors back into the pool

  14. Avatar for TG
    TG April 25th, 2005

    This returns 2 random records for me. However, on occasion, the 2 records returned are identical. Does anybody have any thought for preventing this from happening?



    SELECT TOP 2 P.*, C.CatID FROM JEP_tblProducts P, JEP_LtblProductCategories C WHERE P.Image1<>'' AND C.ProductID = P.ProductID ORDER BY NEWID()



  15. Avatar for Krish
    Krish April 27th, 2005

    Hi friends



    I have a huge dataset of size of 350 MB...I want to select 10% of the rows out of (2.5 million records).I tried in sql server but its taking time and I am not getitng the result...is there any other way to do it...plz let me know...

  16. Avatar for Homeslice
    Homeslice May 11th, 2005

    Wow, this is exactly what I was looking for and it works perfectly for me. I created a user control in .NET for ad banners which are all stored in the database. I used the select statement "SELECT TOP 1 * from tbl_ad_banners ORDER BY NEWID()" to grab a random ad and display it in my footer control. You can see it at work at http://www.pimpify.name. Thanks for the code snippet!

  17. Avatar for Harsh
    Harsh May 22nd, 2005

    I think so it works well for a smaller data set for which I have tried. Thank you very much.

  18. Avatar for Berz
    Berz May 23rd, 2005

    Thanks for the information in this thread. I've got some big datasets and some smaller. I'm going to replace the small dataqueries with the NEWID() function.



    For the large ones (aprox 100.000 rows) I'm still using the old version wrapped in a loop till I get a result.. I think from this thread that NEWID() might be to heavy?



    SELECT galleryPk FROM gallery

    WHERE galleryPk <=

    (SELECT RAND()*(MAX(galleryPk)-MIN(galleryPk))+MIN(galleryPk) FROM gallery)

  19. Avatar for Agarash
    Agarash May 24th, 2005

    Asolutely great! But... it doesnt work with UNION!! is there a way around this? I have 2 queries i join with UNION



    thanks

  20. Avatar for JunkMan
    JunkMan June 2nd, 2005

    Agarash : create a view with the union and then another view to randomize the first one.....

  21. Avatar for Patty
    Patty June 16th, 2005

    Why does this even work at all?

    > SELECT TOP 1 FROM MyTable ORDER BY Rand()



    I thought "ORDER BY" had to mention either a "valid sort number column"... or

    the actual column name?



    How does it "sort" on column # 0.34898131415 or column # 983 at all ?????









  22. Avatar for Jay
    Jay June 22nd, 2006

    NewID() is not a good option. However there is another way which is faster and better.
    SELECT TOP 3 * FROM TableName ORDER BY RAND((1000 * AutoID) * DATEPART(millisecond, GETDATE()))
    Perhaps this would help you guys.
    Happy Coding!

  23. Avatar for Brett
    Brett July 7th, 2006

    The earlier examples have the benefit of running. Can you explain the "AutoID" function as used above in T-SQL? If you're referring to an IDENTITY column in the table in question ("TableName" in your example), (a) it doesn't return random records and (b) considering the potential size of someone's identity value, multiplying by 1000 can overflow the field type in question (especially if it's an INT). If there is an undocumented "AutoID" function it's not running on my SQL2k box.
    I've had good results with sorting by NEWID() myself, also given the constraints of a moderate data set.
    Happy Coding Yourself!

  24. Avatar for Sudhakar.D
    Sudhakar.D July 13th, 2006

    This is alright, but i wanted to get Top 10 rows from a Dataset and I dont want to get Top 10 rows from sql itslef, I dont want to reconnect to the database, once the dataset is received.
    Anybody has the clue, how to do it?

  25. Avatar for durai
    durai September 3rd, 2006

    hi to all.
    i want to retrieve some 60 questions from a database of 300, and the questions should be random, and the same order (or)pattern should not repeat for another retrieval of datas from the data base.tell me the ways of getting it, help me with codes or the commands and the logic behind it .It must be of easy understanding .pls make it soon as soon as possible.

  26. Avatar for ED
    ED December 3rd, 2006

    How would i add a line that doesnt include say the first 50 numbers in a database in this random query?

  27. Avatar for Haacked
    Haacked December 4th, 2006

    Just add a where clause to exclude those 50 records. For example, suppose your table has a primary key of ID, and you want to exclude the first 50 ids:
    ELECT TOP 10 * FROM someTable ORDER By NEWID() WHERE ID NOT IN (SELECT TOP 50 ID FROM SomeTable ORDER BY Id ASC)

  28. Avatar for ED
    ED December 4th, 2006

    What does newid() actually do?

  29. Avatar for ED
    ED December 4th, 2006

    When I run the following:
    SELECT TOP 10 columnname, columname
    FROM someTable ORDER By NEWID()
    WHERE ID NOT IN (SELECT TOP 50 ID FROM SomeTable ORDER BY Id ASC)
    I get the following errors:
    msg 156, level 15 , state 1, line 3
    incorrect syntax near the keyword 'where'
    msg 156, level 15 , state 1, line 3
    incorrect syntax near the keyword 'order'
    Any ideas? Cheers, ED.

  30. Avatar for Haacked
    Haacked December 4th, 2006

    NEWID() generates a random guid (globally unique identifier).

  31. Avatar for Haacked
    Haacked December 4th, 2006

    Well that was a rough sketch of the query. Also, I forgot to move the ORDER BY NEWID() to the end. The actual query depends on the columns involved in your table.
    SELECT TOP 10 column1name, column2ame
    FROM YourTable
    WHERE YourIdColumnName NOT IN (SELECT TOP 50 YourIdColumnName FROM YourTable ORDER BY YourIdColumnName ASC)
    ORDER By NEWID()
    YourColumnName is the column you used to figure out which records to skip.

  32. Avatar for Petros
    Petros May 15th, 2007

    I have a table that holds some banner paths. I also run a query on a page to randomly choose among those banners, and display the chosen one. However, I want to promote one of the banners, in a way that I would like to force it appear, let's say, every 10 page refreshes.
    The query I use to randomly choose a banner is this:
    SELECT location FROM random_logos ORDER BY RAND() LIMIT 1,0;
    How could I alter it so that it forcible return the promoted banner every X number of page refreshes for EVERY visitor of the page?
    I mean, even if I use another table to store the latest X (10 in my case) returned banners, I cannot be sure who refreshed the page for them. So, adding a check to that table will display the promoted banner on the 11th refresh to the user that happened to be the 11th visitor who refreshed the page. Then the process would start over.
    I have thought of many ways, but each of them comes to a dead end for one reason or another.

  33. Avatar for Lioness
    Lioness July 9th, 2007

    I've tried multiple ways to select random rows from a database table. I've tried using NewId() and RAND(). Neither of them work. Is there some other way to select a random row?

  34. Avatar for Aaron
    Aaron October 2nd, 2007

    works pefectly dude... you rock!

  35. Avatar for Mike
    Mike March 19th, 2008

    regarding the comment on unions and NewID()
    another way besides using the view is to create a new collumn in your select, such as
    SELECT bla1, bla2, NewID() as RandNum
    FROM Table
    WHERE whatever
    UNION
    SELECT bla1, bla2, NewID() as RandNum
    FROM Table
    WHERE whatever
    ORDER BY RandNum

  36. Avatar for Hamayun Khan
    Hamayun Khan April 27th, 2008

    work perfect. greate work for me. Thanks.

  37. Avatar for Gimo
    Gimo July 23rd, 2008

    To be "really random", you must have another null column where you can fill it if the corresponding row is selected. Note that the selection statement must include a where clause, on which it must select where the column is null.
    eg. SELECT TOP 10 column1 FROM someTable WHERE column2 IS NULL ORDER BY NEWID()
    Then you need to update the column2 setting it a value like 1.

  38. Avatar for mohan
    mohan August 3rd, 2008
  39. Avatar for HonG
    HonG August 13th, 2008

    I WANT TO KEY IN PERCENTAGE IN TEXT TO SELECT RANDOM LIKE EXAMPLE
    IF I HAVE 100 EMPLOYEE SO I WANT KEY IN TEXT LIKE 10% SO THE SHOULD HOW
    SELECT FILED
    FROM TABLE
    WHERE HERE SHOULD HOW TO PUT IF MY TEXT
    ORDER BY RND()

  40. Avatar for tarun darolia
    tarun darolia August 17th, 2008

    i have more than one details corresponding to a particular rollnumber.
    now i want to get these details in such a way that all details corresponding to a particular roll num should appear in one row all together.then what should i write in the query???

  41. Avatar for pan
    pan November 25th, 2008

    select top 10 * from sometable order by newid()
    It gives 7,8,9 number of records sometime WHY??????????????????????????

  42. Avatar for Great eBay Auctions
    Great eBay Auctions November 27th, 2008

    Great code. I needed to randomly display 5 links on the home page. Your code worked like a charm. Thanks!

  43. Avatar for Doug
    Doug December 1st, 2008

    The thing that most surprised me about this is that it works in subsonic:
    SubSonic.SqlQuery qry = Select.AllColumnsFrom<Gallery>().
    Top("6").OrderAsc("NEWID()");

  44. Avatar for NikX
    NikX February 22nd, 2009

    Amazing post..
    Very Very Helpful
    Thanks

  45. Avatar for Dmitriy Zasyatkin
    Dmitriy Zasyatkin March 4th, 2010

    Perfect. Thank you very much!
    I am really glad that this was number one on Google for "sql server how to choose a number of random records"

  46. Avatar for MVS
    MVS April 20th, 2010

    Thanks
    the function newid() works a lot better then the function rand()

  47. Avatar for David Morton
    David Morton May 20th, 2010

    I have been researching both the Rand() and NewId() methods and as stated above, NewId() is great for small sets of data but not so great with large datatables. However, knowing about your data should help! In my case, I counted the rows of data that would be return using my filters (where clauses) and then used that as a multiplier against Rand(). I knew what my desired sampling would be so here is what I did.
    -- **************************************************
    declare @threshold decimal(8,2)
    set @threshold = 9.99
    Declare @somedate char(8)
    set @somedate = '20100518'
    declare @rows int
    Select @rows=COUNT(*) from sometable
    where SOME_DATE = @somedate
    and P_C_CODE = '3'
    and C_A_DUE > @threshold
    and R_T_CODE = '1'
    and S_M_CODE != 'S'

    Declare @rowno int
    declare @i int
    set @i = 1
    -- This loop guarantees that we get unique rows in our sample.
    while @i <= 60 -- the number of samples I want.
    BEGIN
    set @rowno = cast(RAND() * @rows as int)
    if not exists(Select * from @t where RecNo = @rowno)
    BEGIN
    Insert into @t Values (@rowno)
    set @i = @i + 1
    END
    END
    Select
    ROW_NUMBER() over (Order by R_T_Code) as RowNum
    ,MyField1 as fld1
    , MyField2 as fld2
    into #ctedata
    from SomeTable
    where SOME_DATE = @somedate
    and P_C_CODE = '3'
    and C_A_DUE > @threshold
    and R_T_CODE = '1'
    and S_M_CODE != 'S'
    Select
    *
    from #ctedata a
    inner join
    @t b
    on a.RowNum = b.RecNo
    order by a.mbr_No

  48. Avatar for David Morton
    David Morton May 20th, 2010

    In my last post I left out the table @t
    Declare @t table
    (
    RecNo int NOT NULL PRIMARY KEY
    )
    That should answer that question.
    Thanks

  49. Avatar for Rakesh
    Rakesh July 15th, 2010

    it helped me........

  50. Avatar for Dany
    Dany January 28th, 2011
  51. Avatar for afeera
    afeera February 8th, 2011

    i m using this code for my project.. but the same record is repeating more than once.. i don wan that to happen..
    all the records must b chosen randomly but no record should be repeated.
    how do i implement that?
    i tried this query.. but some error..
    SELECT distinct(question) FROM questions where topic = '" + "Data Interpretation" + "' ORDER BY NEWID()
    error :
    ORDER BY items must appear in the select list if SELECT DISTINCT is specified.

    can anyone correct me..
    thanks in advance :)

  52. Avatar for click here
    click here June 8th, 2011

    I am having the same issue as afeera in that the same record is repeating multiple times. I don't see any replies to his/her post and I'm wondering if anybody could help us out. I really need this fixed for the project I'm working on. I'm not using either SQL7 or NT4 because of what Phil said, so I know this isn't the issue. Any help is appreciated.

  53. Avatar for Javier Callico
    Javier Callico July 7th, 2011

    Since SQL 2005, TOP can be used as a function a variable can be used as argument.
    DECLARE @Count INT
    SET @Count = 10
    SELECT TOP(@Count) * FROM someTable ORDER By NEWID()

  54. Avatar for Sachin
    Sachin August 11th, 2011

    Hi, I used NewID() to choose the rows randomly but problem is rows are repeting how can restrict that ?
    helpme pls
    Thanks in advance

  55. Avatar for rrrandom
    rrrandom October 5th, 2011

    I'm also having the restriction problem like Sachin. Can anyone help out with that?

  56. Avatar for Celbester
    Celbester November 11th, 2011

    Thank you so much for the inspiration!!

  57. Avatar for Gerhard
    Gerhard November 20th, 2011

    I love simplicity and you are proven to be a bloody genius. Well done!!!

  58. Avatar for kel
    kel December 6th, 2011

    can some one help me with an an issue?. i am designing a code with php, just like an expert system, a question will be asked from a database and an answer will be typed in a field provided, if the answer is submitted another question will be displayed, this will be done for like 13 time then get a final total of answer. all this will be done in the same page....HOW CAN I GET THIS DONE

  59. Avatar for harry
    harry January 2nd, 2012

    WOW

  60. Avatar for Arpita Rawal
    Arpita Rawal February 6th, 2012

    Hi, I have 1 question regarding query that how NEWID is getting generated ?

  61. Avatar for parveen
    parveen February 17th, 2012

    i want n rows randomly from a table where n will get by parameter can be different every time

  62. Avatar for EenOog
    EenOog May 24th, 2012

    Thanks guys works for me testedand after 30 - 40 tries not on sequence in the same order.
    thanks

  63. Avatar for coolansh63
    coolansh63 June 8th, 2012

    you may find this article useful:
    webgeek.elletis.com

  64. Avatar for dd
    dd January 18th, 2014

    thanks so simple

  65. Avatar for f13 Media!
    f13 Media! September 1st, 2017

    What about this? "SELECT * FROM someTable WHERE something = 'x' ORDER BY RAND() LIMIT 10"