Memory-comparable Serialization of Data Types

Started by Shichao Jin · almost 6 years ago · 7 messages
#1Shichao Jin
jsc0218@gmail.com

Hi Postgres Developers,

We are currently integrating the LSM-tree based storage engine RocksDB into
Postgres. I am wondering whether there is any function that serializes data
types in a memory-comparable format, similar to MySQL and MariaDB. With that
kind of function, we could store the serialized format directly in the
storage engine and compare values at the storage engine level, instead of
deserializing data and comparing at the upper level. I know PostgreSQL is
moving towards supporting pluggable storage engines, so I think this feature
would be particularly useful.

Best,
Shichao

#2Peter Geoghegan
pg@bowt.ie
In reply to: Shichao Jin (#1)
Re: Memory-comparable Serialization of Data Types

On Tue, Feb 11, 2020 at 11:53 AM Shichao Jin <jsc0218@gmail.com> wrote:

We are currently integrating the LSM-tree based storage engine RocksDB into Postgres. I am wondering whether there is any function that serializes data types in a memory-comparable format, similar to MySQL and MariaDB. With that kind of function, we could store the serialized format directly in the storage engine and compare values at the storage engine level, instead of deserializing data and comparing at the upper level.

Do you mean a format that allows index comparisons to be performed using a
memcmp() rather than per-datatype comparison code?

--
Peter Geoghegan

#3Shichao Jin
jsc0218@gmail.com
In reply to: Peter Geoghegan (#2)
Re: Memory-comparable Serialization of Data Types

Yes, this is exactly what I mean.


#4Peter Geoghegan
pg@bowt.ie
In reply to: Shichao Jin (#3)
Re: Memory-comparable Serialization of Data Types

On Tue, Feb 11, 2020 at 12:19 PM Shichao Jin <jsc0218@gmail.com> wrote:

Yes, this is exactly what I mean.

PostgreSQL doesn't have this capability. It might make sense to have
it for some specific data structures, such as tuples on internal
B-Tree pages -- these merely guide index scans, so some information
loss may be acceptable there compared to the native/base
representation. However, that would only be faster because memcmp() is
generally faster than the underlying datatype's native comparator, not
because comparisons have to take place in "the upper levels". There is
some indirection/overhead involved in using SQL-callable operators,
but not that much.

Note that such a representation has to lose information in at least
some cases. For example, case-insensitive collations would have to
lose information about the original case used (or store the original
alongside the conditioned binary string). Note also that a "one pass"
representation that we can just memcmp() will have to be significantly
larger in some cases, especially when collatable text is used. A
strxfrm() blob is typically about 3.3x larger than the original string
IIRC.

--
Peter Geoghegan

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Geoghegan (#4)
Re: Memory-comparable Serialization of Data Types

On 2020-Feb-11, Peter Geoghegan wrote:

On Tue, Feb 11, 2020 at 12:19 PM Shichao Jin <jsc0218@gmail.com> wrote:

Yes, this is exactly what I mean.

PostgreSQL doesn't have this capability. It might make sense to have
it for some specific data structures,

I think adding that would be too much of a burden, both for the project
itself and for third-party type definitions; I think we'd rather rely on
calling the BTORDER_PROC btree support function for the type.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Peter Geoghegan
pg@bowt.ie
In reply to: Alvaro Herrera (#5)
Re: Memory-comparable Serialization of Data Types

On Tue, Feb 11, 2020 at 1:40 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

I think adding that would be too much of a burden, both for the project
itself as for third-party type definitions; I think we'd rather rely on
calling the BTORDER_PROC btree support function for the type.

An operator class would still need to provide a BTORDER_PROC. What I
describe would be an optional capability. This is something that I
have referred to as key normalization in the past:

https://wiki.postgresql.org/wiki/Key_normalization

I think that it would only make sense as an enabler of multiple
optimizations -- not just the memcmp()/strcmp() thing. A common
strcmp()'able binary string format can be used in many different ways.
Note that this has nothing to do with changing the representation used
by the vast majority of all tuples -- just the pivot tuples, which are
mostly located in internal pages. They only make up less than 1% of
all pages in almost all cases.

I intend to prototype this technique within the next year. It's
possible that it isn't worth the trouble, but there is only one way to
find out. I might just work on the "abbreviated keys in internal
pages" thing, for example. Though you really need some kind of prefix
compression to make that effective.

--
Peter Geoghegan

#7Shichao Jin
jsc0218@gmail.com
In reply to: Peter Geoghegan (#6)
Re: Memory-comparable Serialization of Data Types

Thank you both for your feedback. Yes, as Peter indicated, we indeed use
that technique for comparisons in the index, and we will now try passing
a comparator to the storage engine, as Alvaro suggested.

Best,
Shichao


--
Shichao Jin
PhD Student at University of Waterloo, Canada
e-mail: jsc0218@gmail.com
homepage: http://sites.google.com/site/csshichaojin/