Two part question:
What is Postgresql behavior for storing text/varchars in-row vs out-of-row? Am I correct in thinking that with default settings, all columns will always be stored in-row until the 2kB size is reached?
Do we have any control over the above behavior? Is there any way I can change the threshold for a specific column/table, or force a specific column to always be stored out-of-row?
I’ve read through PostGresql Toast documentation (http://www.postgresql.org/docs/8.3/static/storage-toast.html), but I don’t see any option for changing the thresholds (default seems to be 2kB-for-row) or for forcing a column to always store out-of-row (EXTERNAL only allows it, but doesn’t enforce it).
I’ve found documentation explaining how to do this on SQL Server (https://msdn.microsoft.com/en-us/library/ms173530.aspx), but don’t see anything similar for PostGresql.
If anyone’s interested in my motivation, I have a table that has a mix of short-consistent columns (IDs, timestamps etc), a column that is varchar(200), and a column that is text/varchar(max), which can be extremely large in length. I currently have both varchars stored in a separate table, just to allow efficient storage/lookups/scanning on the short-consistent columns.
This is a pain however, because I constantly have to do joins to read all the data. I would really like to store all the above fields in the same table, and tell Postgresql to force-store the 2 VARCHARs out-of-row, always.
Advertisement
Answer
Edited Answer
For first part of the question: you are correct (see for instance this).
For the second part of the question: the standard way of storing columns is to compress variable length text fields if their size is over 2KB, and eventually store them into a separate area, called “TOAST table”.
You can give a “hint” to the system on how to store a field by using the following command for your columns:
ALTER TABLE YourTable ALTER COLUMN YourColumn SET STORAGE (PLAIN | EXTENDED | EXTERNAL | MAIN)
From the manual:
SET STORAGE
This form sets the storage mode for a column. This controls whether this column is held inline or in a secondary TOAST table, and whether the data should be compressed or not.
PLAIN
must be used for fixed-length values such as integer and is inline, uncompressed.MAIN
is for inline, compressible data.EXTERNAL
is for external, uncompressed data, andEXTENDED
is for external, compressed data.EXTENDED
is the default for most data types that support non-PLAIN storage. Use ofEXTERNAL
will make substring operations on very large text and bytea values run faster, at the penalty of increased storage space. Note thatSET STORAGE
doesn’t itself change anything in the table, it just sets the strategy to be pursued during future table updates. See Section 59.2 for more information.
Since the manual is not completely explicit on this point, this is my interpretation: the final decision about how to store the field is left in any case to the system, given the following constraints:
- No field can be stored such that the total size of a row is over 8KB
- No field is stored out-of-row if its size is less then the
TOAST_TUPLE_THRESHOLD
- After satisfying the previous
constraints, the system tries to satisfy the
SET STORAGE
strategy specified by the user. If no storage strategy is specified, each TOAST-able field is automatically declaredEXTENDED
.
Under these assumption, the only way to be sure that all the values of a column are stored out-of-row is to recompile the system with a value of TOAST_TUPLE_THRESHOLD
less then the minumum size of any value of the column.