pgsql: Compress GIN posting lists, for smaller index size.

Started by Heikki Linnakangas almost 12 years ago (12 messages)
#1 Heikki Linnakangas
heikki.linnakangas@iki.fi

Compress GIN posting lists, for smaller index size.

GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.
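As a rough illustration of the varbyte idea described above, here is a minimal sketch. It is not the code from ginpostinglist.c: the function names are hypothetical, and the input is modeled as a sorted array of 64-bit integers (how the real code maps an ItemPointer to an integer is not shown in this thread). Each value is stored as the delta from its predecessor, using 7 payload bits per byte with the high bit as a continuation marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one unsigned value as a varbyte: 7 payload bits per byte,
 * least-significant group first. The high bit is set on every byte
 * except the last. Returns the number of bytes written (at most 10).
 */
size_t
varbyte_encode(uint64_t val, unsigned char *out)
{
    size_t n = 0;

    while (val > 0x7F)
    {
        out[n++] = (unsigned char) (0x80 | (val & 0x7F));
        val >>= 7;
    }
    out[n++] = (unsigned char) val;
    return n;
}

/* Decode one varbyte value starting at "in"; returns bytes consumed. */
size_t
varbyte_decode(const unsigned char *in, uint64_t *val)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t n = 0;

    for (;;)
    {
        unsigned char c = in[n++];

        result |= (uint64_t) (c & 0x7F) << shift;
        if ((c & 0x80) == 0)
            break;
        shift += 7;
    }
    *val = result;
    return n;
}

/* Encode a sorted list as deltas from the previous item (hypothetical helper). */
size_t
encode_sorted_list(const uint64_t *items, size_t nitems, unsigned char *out)
{
    uint64_t prev = 0;
    size_t len = 0;

    for (size_t i = 0; i < nitems; i++)
    {
        assert(items[i] >= prev);   /* input must be sorted ascending */
        len += varbyte_encode(items[i] - prev, out + len);
        prev = items[i];
    }
    return len;
}

/* Reverse of encode_sorted_list; returns the number of items decoded. */
size_t
decode_sorted_list(const unsigned char *in, size_t len, uint64_t *items)
{
    uint64_t prev = 0;
    size_t pos = 0;
    size_t nitems = 0;

    while (pos < len)
    {
        uint64_t delta;

        pos += varbyte_decode(in + pos, &delta);
        prev += delta;
        items[nitems++] = prev;
    }
    return nitems;
}
```

With the sorted input {1000, 1001, 1005, 1200}, this sketch produces 6 bytes, whereas four 6-byte ItemPointers stored as a plain array would take 24 bytes. Small gaps between neighbouring items are what make the delta encoding pay off.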

To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.

This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
version 9.4 will therefore have version number 2 in the metapage, while old
pg_upgraded indexes will have version 1. The code treats them the same, but
it might come in handy in the future, if we want to drop support for the
uncompressed format.

Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/36a35c550ac114caa423bcbe339d3515db0cd957

Modified Files
--------------
contrib/pgstattuple/expected/pgstattuple.out | 2 +-
src/backend/access/gin/README | 123 ++-
src/backend/access/gin/ginbtree.c | 73 +-
src/backend/access/gin/gindatapage.c | 1450 ++++++++++++++++++++------
src/backend/access/gin/ginentrypage.c | 134 ++-
src/backend/access/gin/ginfast.c | 2 +-
src/backend/access/gin/ginget.c | 117 ++-
src/backend/access/gin/gininsert.c | 67 +-
src/backend/access/gin/ginpostinglist.c | 386 ++++++-
src/backend/access/gin/ginvacuum.c | 232 +++--
src/backend/access/gin/ginxlog.c | 184 ++--
src/backend/access/rmgrdesc/gindesc.c | 45 +-
src/include/access/gin_private.h | 212 +++-
13 files changed, 2309 insertions(+), 718 deletions(-)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#2 Fujii Masao
masao.fujii@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: pgsql: Compress GIN posting lists, for smaller index size.

On Thu, Jan 23, 2014 at 2:28 AM, Heikki Linnakangas
<heikki.linnakangas@iki.fi> wrote:

Compress GIN posting lists, for smaller index size.

GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.

To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.

This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
version 9.4 will therefore have version number 2 in the metapage, while old
pg_upgraded indexes will have version 1. The code treats them the same, but
it might come in handy in the future, if we want to drop support for the
uncompressed format.

I failed to compile HEAD because, ISTM, of this patch.

ginvacuum.c:34: error: redefinition of typedef 'GinVacuumState'
../../../../src/include/access/gin_private.h:715: error: previous
declaration of 'GinVacuumState' was here
make[4]: *** [ginvacuum.o] Error 1
make[3]: *** [gin-recursive] Error 2
make[2]: *** [access-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....

$ uname -a
Darwin test.local 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23
16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64

Regards,

--
Fujii Masao


#3 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Fujii Masao (#2)
Re: pgsql: Compress GIN posting lists, for smaller index size.

On 01/22/2014 07:44 PM, Fujii Masao wrote:

ginvacuum.c:34: error: redefinition of typedef 'GinVacuumState'
../../../../src/include/access/gin_private.h:715: error: previous
declaration of 'GinVacuumState' was here
make[4]: *** [ginvacuum.o] Error 1
make[3]: *** [gin-recursive] Error 2
make[2]: *** [access-recursive] Error 2
make[2]: *** Waiting for unfinished jobs....

Hmm, my compiler (gcc 4.8) was happy with that, but that seems to be in
the minority. Fixed, thanks!

- Heikki


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Fujii Masao (#2)
Re: pgsql: Compress GIN posting lists, for smaller index size.

Fujii Masao <masao.fujii@gmail.com> writes:

On Thu, Jan 23, 2014 at 2:28 AM, Heikki Linnakangas
<heikki.linnakangas@iki.fi> wrote:

Compress GIN posting lists, for smaller index size.

I failed to compile HEAD because, ISTM, of this patch.

It looks like some but not all buildfarm members are seeing the same
error. Perhaps a platform- or build-option-specific issue?

regards, tom lane


#5 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Tom Lane (#4)
Re: pgsql: Compress GIN posting lists, for smaller index size.

On 01/22/2014 07:58 PM, Tom Lane wrote:

Fujii Masao <masao.fujii@gmail.com> writes:

On Thu, Jan 23, 2014 at 2:28 AM, Heikki Linnakangas
<heikki.linnakangas@iki.fi> wrote:

Compress GIN posting lists, for smaller index size.

I failed to compile HEAD because, ISTM, of this patch.

It looks like some but not all buildfarm members are seeing the same
error. Perhaps a platform- or build-option-specific issue?

clang says I was using a C11 feature:

ginvacuum.c:34:3: warning: redefinition of typedef 'GinVacuumState' is a C11 feature [-Wtypedef-redefinition]
} GinVacuumState;
  ^
../../../../src/include/access/gin_private.h:715:31: note: previous definition is here
typedef struct GinVacuumState GinVacuumState;
                              ^

Anyway, fixed now..
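The thread does not say how the fix was made. One common way to stay compatible with pre-C11 compilers is to declare the typedef exactly once (in the header) and define the struct by its tag alone in the .c file. The sketch below uses hypothetical names (DemoVacuumState and its fields), not the real GinVacuumState definition.

```c
#include <assert.h>
#include <stddef.h>

/* --- What a shared header would contain (hypothetical names) --- */

/* Forward typedef, declared exactly once. */
typedef struct DemoVacuumState DemoVacuumState;

/* --- What the .c file would contain --- */

/*
 * Define the struct by its tag only. Writing
 *     typedef struct DemoVacuumState { ... } DemoVacuumState;
 * here would repeat the typedef, which is only permitted since C11
 * and is rejected or warned about by older or stricter compilers.
 */
struct DemoVacuumState
{
    int pages_visited;
    int items_removed;
};

/* Reset the state; the fields are placeholders for illustration. */
void
demo_vacuum_init(DemoVacuumState *st)
{
    assert(st != NULL);
    st->pages_visited = 0;
    st->items_removed = 0;
}
```

Code that only includes the header can pass DemoVacuumState pointers around without seeing the struct body, while the .c file that owns the definition can access the fields.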

- Heikki


#6 Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Heikki Linnakangas (#1)
Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

On 01/22/2014 06:28 PM, Heikki Linnakangas wrote:

Compress GIN posting lists, for smaller index size.

GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.

To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.

This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
version 9.4 will therefore have version number 2 in the metapage, while old
pg_upgraded indexes will have version 1. The code treats them the same, but
it might come in handy in the future, if we want to drop support for the
uncompressed format.

Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.

it seems that this commit made spoonbill an unhappy animal:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2014-01-23%2000%3A00%3A04

Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Stefan Kaltenbrunner (#6)
Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

On 01/23/2014 09:18 PM, Stefan Kaltenbrunner wrote:

On 01/22/2014 06:28 PM, Heikki Linnakangas wrote:

Compress GIN posting lists, for smaller index size.

GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.

To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.

This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with
version 9.4 will therefore have version number 2 in the metapage, while old
pg_upgraded indexes will have version 1. The code treats them the same, but
it might come in handy in the future, if we want to drop support for the
uncompressed format.

Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.

it seems that this commit made spoonbill an unhappy animal:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2014-01-23%2000%3A00%3A04

Hmm, all the Sparcs. Some kind of an alignment issue, perhaps? I will
investigate..

- Heikki



#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#7)
Re: Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 01/23/2014 09:18 PM, Stefan Kaltenbrunner wrote:

it seems that this commit made spoonbill an unhappy animal:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2014-01-23%2000%3A00%3A04

Hmm, all the Sparcs. Some kind of an alignment issue, perhaps? I will
investigate..

My HPUX box, which is also picky about alignment, is unhappy as well.
It's crashing here:

ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0)
at ginpostinglist.c:263
263 return ginPostingListDecodeAllSegments(plist,
(gdb) bt
#0 ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0)
at ginpostinglist.c:263
#1 0x205308 in ginReadTuple (ginstate=0xc39efac1, attnum=48864,
itup=0x7b03bee0, nitems=0x403ee9a4) at ginentrypage.c:170
#2 0x21074c in startScanEntry (ginstate=0x403ec3ac, entry=0x403ee970)
at ginget.c:463
#3 0x21086c in startScan (scan=0xc39efac1) at ginget.c:493
#4 0x212c14 in gingetbitmap (fcinfo=0xc39efac1) at ginget.c:1531
#5 0x5ffc50 in FunctionCall2Coll (flinfo=0xc39efac1, collation=2063843040,
arg1=2063843040, arg2=1077864868) at fmgr.c:1323
#6 0x24ee5c in index_getbitmap (scan=0x40163878, bitmap=0x403ee620)
at indexam.c:649
#7 0x3b9430 in MultiExecBitmapIndexScan (node=0x40163768)
at nodeBitmapIndexscan.c:89
#8 0x3a5a3c in MultiExecProcNode (node=0x40163768) at execProcnode.c:562
#9 0x3b8610 in BitmapHeapNext (node=0x401628f0) at nodeBitmapHeapscan.c:104
#10 0x3ae5b0 in ExecScan (node=0x401628f0,
accessMtd=0x4001a2c2 <DINFINITY+3802>,
recheckMtd=0x4001a2ca <DINFINITY+3810>) at execScan.c:82
#11 0x3b8e9c in ExecBitmapHeapScan (node=0xc39efac1)
at nodeBitmapHeapscan.c:441
#12 0x3a56e0 in ExecProcNode (node=0x401628f0) at execProcnode.c:414
...

(gdb) p debug_query_string
$1 = 0x4006d4a8 "SELECT * FROM array_index_op_test WHERE i <@ '{38,34,32,89}' ORDER BY seqno;"

The problem appears to be due to the misaligned "plist" pointer
(0xc39efac1 here).

regards, tom lane


#9 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Tom Lane (#8)
Re: Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

On 01/23/2014 10:37 PM, Tom Lane wrote:

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 01/23/2014 09:18 PM, Stefan Kaltenbrunner wrote:

it seems that this commit made spoonbill an unhappy animal:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2014-01-23%2000%3A00%3A04

Hmm, all the Sparcs. Some kind of an alignment issue, perhaps? I will
investigate..

My HPUX box, which is also picky about alignment, is unhappy as well.
It's crashing here:

ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0)
at ginpostinglist.c:263
263 return ginPostingListDecodeAllSegments(plist,
(gdb) bt
#0 ginPostingListDecode (plist=0xc39efac1, ndecoded=0x7b03bee0)
at ginpostinglist.c:263
#1 0x205308 in ginReadTuple (ginstate=0xc39efac1, attnum=48864,
itup=0x7b03bee0, nitems=0x403ee9a4) at ginentrypage.c:170
#2 0x21074c in startScanEntry (ginstate=0x403ec3ac, entry=0x403ee970)
at ginget.c:463
#3 0x21086c in startScan (scan=0xc39efac1) at ginget.c:493
#4 0x212c14 in gingetbitmap (fcinfo=0xc39efac1) at ginget.c:1531
#5 0x5ffc50 in FunctionCall2Coll (flinfo=0xc39efac1, collation=2063843040,
arg1=2063843040, arg2=1077864868) at fmgr.c:1323
#6 0x24ee5c in index_getbitmap (scan=0x40163878, bitmap=0x403ee620)
at indexam.c:649
#7 0x3b9430 in MultiExecBitmapIndexScan (node=0x40163768)
at nodeBitmapIndexscan.c:89
#8 0x3a5a3c in MultiExecProcNode (node=0x40163768) at execProcnode.c:562
#9 0x3b8610 in BitmapHeapNext (node=0x401628f0) at nodeBitmapHeapscan.c:104
#10 0x3ae5b0 in ExecScan (node=0x401628f0,
accessMtd=0x4001a2c2 <DINFINITY+3802>,
recheckMtd=0x4001a2ca <DINFINITY+3810>) at execScan.c:82
#11 0x3b8e9c in ExecBitmapHeapScan (node=0xc39efac1)
at nodeBitmapHeapscan.c:441
#12 0x3a56e0 in ExecProcNode (node=0x401628f0) at execProcnode.c:414
...

(gdb) p debug_query_string
$1 = 0x4006d4a8 "SELECT * FROM array_index_op_test WHERE i <@ '{38,34,32,89}' ORDER BY seqno;"

The problem appears to be due to the misaligned "plist" pointer
(0xc39efac1 here).

Ah, thanks! Looks like I removed a SHORTALIGN from ginFormTuple that was
in fact very much necessary.. Fixed now, let's see if that pacifies the
sparcs.
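To make the alignment point concrete, here is a small sketch of what a round-up-to-alignment step does and why the odd address in the backtrace is a problem. The helpers align_up and is_aligned are hypothetical stand-ins written for this note; they are not the SHORTALIGN macro itself, and the assumption that the posting list needs 2-byte alignment is inferred from the SHORTALIGN name rather than stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to the next multiple of align (align must be a power of two). */
size_t
align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + (align - 1)) & ~(align - 1);
}

/* Return nonzero if addr is a multiple of align (a power of two). */
int
is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}
```

For example, align_up(13, 2) yields 14, so data placed after a 13-byte key would start on an even offset. The pointer 0xc39efac1 from the backtrace fails is_aligned(addr, 2), which strict-alignment platforms such as the Sparc and HP machines mentioned here turn into a crash, while x86 tolerates it.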

- Heikki


#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#9)
Re: Re: [COMMITTERS] pgsql: Compress GIN posting lists, for smaller index size.

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

On 01/23/2014 10:37 PM, Tom Lane wrote:

The problem appears to be due to the misaligned "plist" pointer
(0xc39efac1 here).

Ah, thanks! Looks like I removed a SHORTALIGN from ginFormTuple that was
in fact very much necessary.. Fixed now, let's see if that pacifies the
sparcs.

My HPPA box is happy again, anyway. Thanks.

regards, tom lane


#11 Andres Freund
andres@2ndquadrant.com
In reply to: Heikki Linnakangas (#1)
Re: pgsql: Compress GIN posting lists, for smaller index size.

Hi,
On 2014-01-22 17:28:48 +0000, Heikki Linnakangas wrote:

Compress GIN posting lists, for smaller index size.

GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.

To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.

A new version of clang complains:

/home/andres/src/postgresql/src/backend/access/gin/ginvacuum.c:512:34: warning: signed shift result (0x80000000) sets the sign bit of the
shift expression's type ('int') and becomes negative [-Wshift-sign-overflow]
uncompressed = (ItemPointer) GinGetPosting(itup);
^~~~~~~~~~~~~~~~~~~
/home/andres/src/postgresql/src/include/access/gin_private.h:226:59: note: expanded from macro 'GinGetPosting'
#define GinGetPosting(itup) ((Pointer) ((char*)(itup) + GinGetPostingOffset(itup)))
^~~~~~~~~~~~~~~~~~~~~~~~~
/home/andres/src/postgresql/src/include/access/gin_private.h:224:85: note: expanded from macro 'GinGetPostingOffset'
#define GinGetPostingOffset(itup) (GinItemPointerGetBlockNumber(&(itup)->t_tid) & (~GIN_ITUP_COMPRESSED))
^~~~~~~~~~~~~~~~~~~
/home/andres/src/postgresql/src/include/access/gin_private.h:223:33: note: expanded from macro 'GIN_ITUP_COMPRESSED'
#define GIN_ITUP_COMPRESSED (1 << 31)

As far as I understand the code it should rather be
#define GIN_ITUP_COMPRESSED (1U << 31)

Is that right?
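For background on the warning: with a 32-bit int, 1 << 31 shifts a bit into the sign position, and the result is not representable in int, which the C standard leaves undefined. Using an unsigned constant avoids that. The sketch below uses hypothetical names (DEMO_COMPRESSED_FLAG and the demo_ helpers) and a plain uint32_t word rather than the real GinGetPostingOffset macros, and it assumes unsigned int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Top bit of a 32-bit word used as a flag; 1U keeps the shift unsigned. */
#define DEMO_COMPRESSED_FLAG (1U << 31)

/* Pack an offset and a flag into one word (hypothetical layout). */
uint32_t
demo_pack(uint32_t offset, int compressed)
{
    assert((offset & DEMO_COMPRESSED_FLAG) == 0);   /* offset must fit in 31 bits */
    return compressed ? (offset | DEMO_COMPRESSED_FLAG) : offset;
}

/* Extract the offset by masking the flag bit off. */
uint32_t
demo_get_offset(uint32_t word)
{
    return word & ~DEMO_COMPRESSED_FLAG;
}

/* Test the flag bit. */
int
demo_is_compressed(uint32_t word)
{
    return (word & DEMO_COMPRESSED_FLAG) != 0;
}
```

Here ~DEMO_COMPRESSED_FLAG is the unsigned value 0x7FFFFFFF, so the mask in demo_get_offset clears only the flag bit and involves no signed arithmetic.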

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#12 Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Andres Freund (#11)
Re: pgsql: Compress GIN posting lists, for smaller index size.

On 09/02/2014 02:11 PM, Andres Freund wrote:

Hi,
On 2014-01-22 17:28:48 +0000, Heikki Linnakangas wrote:

Compress GIN posting lists, for smaller index size.

GIN posting lists are now encoded using varbyte-encoding, which allows them
to fit in much smaller space than the straight ItemPointer array format used
before. The new encoding is used for both the lists stored in-line in entry
tree items, and in posting tree leaf pages.

To maintain backwards-compatibility and keep pg_upgrade working, the code
can still read old-style pages and tuples. Posting tree leaf pages in the
new format are flagged with GIN_COMPRESSED flag, to distinguish old and new
format pages. Likewise, entry tree tuples in the new format have a
GIN_ITUP_COMPRESSED flag set in a bit that was previously unused.

A new version of clang complains:

/home/andres/src/postgresql/src/backend/access/gin/ginvacuum.c:512:34: warning: signed shift result (0x80000000) sets the sign bit of the
shift expression's type ('int') and becomes negative [-Wshift-sign-overflow]
uncompressed = (ItemPointer) GinGetPosting(itup);
^~~~~~~~~~~~~~~~~~~
/home/andres/src/postgresql/src/include/access/gin_private.h:226:59: note: expanded from macro 'GinGetPosting'
#define GinGetPosting(itup) ((Pointer) ((char*)(itup) + GinGetPostingOffset(itup)))
^~~~~~~~~~~~~~~~~~~~~~~~~
/home/andres/src/postgresql/src/include/access/gin_private.h:224:85: note: expanded from macro 'GinGetPostingOffset'
#define GinGetPostingOffset(itup) (GinItemPointerGetBlockNumber(&(itup)->t_tid) & (~GIN_ITUP_COMPRESSED))
^~~~~~~~~~~~~~~~~~~
/home/andres/src/postgresql/src/include/access/gin_private.h:223:33: note: expanded from macro 'GIN_ITUP_COMPRESSED'
#define GIN_ITUP_COMPRESSED (1 << 31)

As far as I understand the code it should rather be
#define GIN_ITUP_COMPRESSED (1U << 31)

Is that right?

Yep. Fixed, thanks.

- Heikki
