Skip to content
Advertisement

Sampling a large number of rows from a table

I want to extract a roughly 5 million row sample from a table that will contain somewhere between 10 million and 20 million rows.

Due to the large number of rows, efficiency is key. As such, I am trying to avoid sorting the rows where possible, hence why I am avoiding the dbms_random.value solution that I have seen in similar questions.

I tried to do something like the following:

SELECT    *
FROM      full_table
SAMPLE    (CEIL(100 * 5000000 / (SELECT COUNT(*) FROM full_table)));

However, I can’t seem to do arithmetic in the SAMPLE clause (ORA-00933: SQL command not properly ended – I tried this with a simple SAMPLE(10/2) and still get the same thing).

Is this a reasonable approach, and, if so, how do I calculate the number of rows in the sample clause?

Advertisement

Answer

You could use PL/SQL with dynamic SQL like this:

declare
  cnt integer;
begin
  select count(*) into cnt from full_table;
  dbms_output.put_line(cnt);
  execute immediate
    'insert into target_table'
    ||' select * from full_tablesample (' || ceil(100 * 5000000/cnt) || ')';
end;
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement