Composite type storage overhead

Started by Laiszner Tamásover 6 years ago8 messagesgeneral

t.laiszner@outlook.com

over 6 years ago

Hey there,

I am currently exploring the options to utilize 128-bit numeric primary keys. One of the options I am looking at is to store them as composites of two 64-bit integers.

The documentation page on composite types does not tell too much about the internal storage, so I've made my own experiment:

CREATE TYPE entity_id AS
(
high bigint,
low bigint
);

CREATE TABLE composite_test
(
entity_id entity_id NOT NULL,
CONSTRAINT composite_test_pkey PRIMARY KEY (entity_id)
)

INSERT INTO composite_test (entity_id) VALUES (ROW(0, 0));

Now, as I am really interested in keeping the indexes compact, tried pageinspect to find out what's going on internally:

SELECT * FROM bt_page_items(get_raw_page('composite_test_pkey', 1));

It seems wrapping the values into a composite type has a pretty significant storage overhead, as the index entry has a total size of 48 bytes, end the data look like this:

4b ff ff ff ff fa 40 00 00 ff ff ff ff 00 00 02 00 00 00 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

For comparison, when simply using a composite primary key of two columns, each index entry has a length of only 24 bytes - a 100% overhead from wrapping the values in a composite type.

Now, I understand there might be valid reasons to store a structure header alongside the plain data - e. g. to store version information so when the type is altered there is no need to rebuild the whole table.

However, I also think this should be highlighted in the documentation. (If it already is I apologise.)

Also, I would like ask if there is a way to instruct the storage engine to omit the housekeeping information and simply store the plain data, even if it comes with drawbacks.

I would highly appreciate any comments or additional information on this topic.

Best regards,
Tamas

Rob Sargent

robjsargent@gmail.com

over 6 years ago

In reply to: Laiszner Tamás (#1)

Re: Composite type storage overhead

On Oct 23, 2019, at 1:32 PM, Laiszner Tamás <t.laiszner@outlook.com> wrote:

Hey there,

I am currently exploring the options to utilize 128-bit numeric primary keys. One of the options I am looking at is to store them as composites of two 64-bit integers.

The documentation page on composite types does not tell too much about the internal storage, so I've made my own experiment:

CREATE TYPE entity_id AS
(
high bigint,
low bigint
);

CREATE TABLE composite_test
(
entity_id entity_id NOT NULL,
CONSTRAINT composite_test_pkey PRIMARY KEY (entity_id)
)

INSERT INTO composite_test (entity_id) VALUES (ROW(0, 0));

Now, as I am really interested in keeping the indexes compact, tried pageinspect to find out what's going on internally:

SELECT * FROM bt_page_items(get_raw_page('composite_test_pkey', 1));

It seems wrapping the values into a composite type has a pretty significant storage overhead, as the index entry has a total size of 48 bytes, end the data look like this:

4b ff ff ff ff fa 40 00 00 ff ff ff ff 00 00 02 00 00 00 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

For comparison, when simply using a composite primary key of two columns, each index entry has a length of only 24 bytes - a 100% overhead from wrapping the values in a composite type.

Now, I understand there might be valid reasons to store a structure header alongside the plain data - e. g. to store version information so when the type is altered there is no need to rebuild the whole table.

However, I also think this should be highlighted in the documentation. (If it already is I apologise.)

Also, I would like ask if there is a way to instruct the storage engine to omit the housekeeping information and simply store the plain data, even if it comes with drawbacks.

I would highly appreciate any comments or additional information on this topic.

Best regards,
Tamas

Why not use UUID type?

Laiszner Tamás

t.laiszner@outlook.com

over 6 years ago

In reply to: Rob Sargent (#2)

Re: Composite type storage overhead

That's an absolutely reasonable suggestion. I am still in the exploration phase so while this solution is not completely ruled out, I have some concerns about it:

1.
Although it does not enforce, but the UUID type kind of suggests a specific interpretation of the data. Of course the documentation says you are free to use any algorithm to generate the values, but there are quite a few standard UUID types and we are not planning to use any of them.
2.
The serialization format is different than needed by the application and, while once again this is not a hard technical barrier, that might cause slight additional complexity and confusion.
3.
The value is logically defined as a 128-bit integer, that is in itself a compound value split into a few "bit groups". Extracting these parts can be done by simple (and supposedly efficient) bitwise operators when stored as integer, but becomes much more cumbersome with UUID, I guess.

________________________________
Feladó: Rob Sargent <robjsargent@gmail.com>
Elküldve: 2019. október 23., szerda 22:58
Címzett: Laiszner Tamás <t.laiszner@outlook.com>
Másolatot kap: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Tárgy: Re: Composite type storage overhead

On Oct 23, 2019, at 1:32 PM, Laiszner Tamás <t.laiszner@outlook.com<mailto:t.laiszner@outlook.com>> wrote: