sample.md 1.3 KB

TABLESAMPLE and SAMPLE

Building a random sample from the data source specified in FROM.

TABLESAMPLE is part of the SQL standard and works as follows:

  • The operating mode is specified:

    • BERNOULLI means "slowly, straightforwardly going through all the data, but in a truly random way".
    • SYSTEM uses knowledge about the physical data storage of data to avoid full data scans, but somewhat sacrificing randomness of the sample.

The data is split into sufficiently large blocks, and the whole data blocks are sampled. For applied calculations on sufficiently large tables, the result may well be consistent.

  • The size of the random sample is indicated as a percentage after the operating mode, in parentheses.
  • Optionally, it can be followed by the REPEATABLE keyword and an integer in parentheses to be used as a seed for a pseudorandom number generator.

SAMPLE is a shorter alias without sophisticated settings and sample size specified as a fraction. It currently corresponds to the BERNOULLI mode.

Examples

SELECT *
FROM my_table
TABLESAMPLE BERNOULLI(1.0) REPEATABLE(123); -- one percent of the table
SELECT *
FROM my_table
TABLESAMPLE SYSTEM(1.0); -- about one percent of the table
SELECT *
FROM my_table
SAMPLE 1.0 / 3; -- one-third of the table