Collation-aware comparisons in GIN opclasses

Started by Alexander Korotkovover 11 years ago15 messages

aekorotkov@gmail.com

over 11 years ago

Hackers,

some GIN opclasses uses collation-aware comparisons while they don't need
to do especially collation-aware comparison. Examples are text[] and hstore
opclasses. Depending on collation this may make them a much slower.

See example.

# show lc_collate ;
lc_collate
─────────────
ru_RU.UTF-8
(1 row)

# create table test as (select array_agg(i::text) from
generate_series(1,1000000) i group by (i-1)/10);
SELECT 100000

# create index test_idx on test using gin(array_agg);
CREATE INDEX
Time: *26930,423 ms*

# create index test_idx2 on test using gin(array_agg collate "C");
CREATE INDEX
Time: *5143,682 ms*

Index creation with collation "ru_RU.UTF-8" is 5 times slower while
collation has absolutely no effect on index functionality.

However, we can just replace comparison function for those opclasses
because it would break binary compatibility for pg_upgrade. I see following
solution:

1. Rename such opclasses and make them not default.
2. Create new default opclasses with bitwise comparison functions.
3. Write recommendation to re-create indexes with default opclasses into
documentation.

------
With best regards,
Alexander Korotkov.

Peter Geoghegan

pg@heroku.com

over 11 years ago

In reply to: Alexander Korotkov (#1)

Re: Collation-aware comparisons in GIN opclasses

On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

some GIN opclasses uses collation-aware comparisons while they don't need to
do especially collation-aware comparison. Examples are text[] and hstore
opclasses. Depending on collation this may make them a much slower.

I'm glad that I saw how pointless this was with the jsonb GIN default
opclass during development.

Rename such opclasses and make them not default.
Create new default opclasses with bitwise comparison functions.
Write recommendation to re-create indexes with default opclasses into
documentation.

I certainly think this should be fixed if at all possible, but I'm not
sure about this plan. Can we really rename an opclass without
consequence, including having that respected across pg_upgrade?

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alexander Korotkov

aekorotkov@gmail.com

over 11 years ago

In reply to: Peter Geoghegan (#2)

Re: Collation-aware comparisons in GIN opclasses

On Mon, Sep 15, 2014 at 10:51 PM, Peter Geoghegan <pg@heroku.com> wrote:

On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

some GIN opclasses uses collation-aware comparisons while they don't

need to

do especially collation-aware comparison. Examples are text[] and hstore
opclasses. Depending on collation this may make them a much slower.

I'm glad that I saw how pointless this was with the jsonb GIN default
opclass during development.

Rename such opclasses and make them not default.
Create new default opclasses with bitwise comparison functions.
Write recommendation to re-create indexes with default opclasses into
documentation.

I certainly think this should be fixed if at all possible, but I'm not
sure about this plan. Can we really rename an opclass without
consequence, including having that respected across pg_upgrade?

Just rename doesn't seem to be safe. Since pg_upgrade uses pg_dump, all
indexes are linked to opclasses using their names. Therefore existed
indexes will be linked to new opclasses. It's likely we need to execute SQL
script renaming opclasses before pg_upgrade. Another option is to don't
rename old opclasses, just create new default opclasses with new names.
Bruce, what is your opinion about pg_upgrade?
Contrib opclasses would be safe to rename using migration script.

------
With best regards,
Alexander Korotkov.

Tom Lane

tgl@sss.pgh.pa.us

over 11 years ago

In reply to: Peter Geoghegan (#2)

Re: Collation-aware comparisons in GIN opclasses

Peter Geoghegan <pg@heroku.com> writes:

On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

Rename such opclasses and make them not default.
Create new default opclasses with bitwise comparison functions.
Write recommendation to re-create indexes with default opclasses into
documentation.

I certainly think this should be fixed if at all possible, but I'm not
sure about this plan. Can we really rename an opclass without
consequence, including having that respected across pg_upgrade?

No. And we don't know how to change the default opclass without
breaking things, either. See previous discussions about how we
might fix the totally-broken default gist opclass that btree_gist
creates for the inet type [1]/messages/by-id/CAE2gYzyUESd188j0b290Gf16502H9B-LwNRS3rFi1SwDb9Qcgw@mail.gmail.com. The motivation for getting rid of that
is *way* stronger than "it might be slow", but there's no apparent
way to make something else be the default without creating havoc.

regards, tom lane

[1]: /messages/by-id/CAE2gYzyUESd188j0b290Gf16502H9B-LwNRS3rFi1SwDb9Qcgw@mail.gmail.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Peter Geoghegan

pg@heroku.com

over 11 years ago

In reply to: Tom Lane (#4)

Re: Collation-aware comparisons in GIN opclasses

On Mon, Sep 15, 2014 at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

No. And we don't know how to change the default opclass without
breaking things, either.

Is there a page on the Wiki along the lines of "things that we would
like to change if ever there is a substantial change in on-disk format
that will break pg_upgrade"? ISTM that we should be intelligently
saving those some place, just as Redhat presumably save up
ABI-breakage over many years for the next major release of RHEL.
Alexander's complaint is a good example of such a change, IMV. Isn't
it more or less expected that the day will come when we'll make a
clean break?

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alexander Korotkov

aekorotkov@gmail.com

over 11 years ago

In reply to: Tom Lane (#4)

Re: Collation-aware comparisons in GIN opclasses

On Mon, Sep 15, 2014 at 11:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Geoghegan <pg@heroku.com> writes:

On Mon, Sep 15, 2014 at 8:28 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:

Rename such opclasses and make them not default.
Create new default opclasses with bitwise comparison functions.
Write recommendation to re-create indexes with default opclasses into
documentation.

I certainly think this should be fixed if at all possible, but I'm not
sure about this plan. Can we really rename an opclass without
consequence, including having that respected across pg_upgrade?

No. And we don't know how to change the default opclass without
breaking things, either. See previous discussions about how we
might fix the totally-broken default gist opclass that btree_gist
creates for the inet type [1]. The motivation for getting rid of that
is *way* stronger than "it might be slow", but there's no apparent
way to make something else be the default without creating havoc.

I've read thread about gist opclass for inet type. But that case is more
difficult because conflict is between builtin opclass and contrib opclass.
This case seems to be much simpler: we need to change builtin opclass to
builtin opclass and contrib opclass to contrib opclass. I realized that
it's problematic to rename builtin opclass due to pg_upgrade. However, it
seems still possible to create new opclass and make it default.

------
With best regards,
Alexander Korotkov.

Emre Hasegeli

emre@hasegeli.com

over 11 years ago

In reply to: Tom Lane (#4)

Re: Collation-aware comparisons in GIN opclasses

No. And we don't know how to change the default opclass without
breaking things, either. See previous discussions about how we
might fix the totally-broken default gist opclass that btree_gist
creates for the inet type [1]. The motivation for getting rid of that
is *way* stronger than "it might be slow", but there's no apparent
way to make something else be the default without creating havoc.

Inet case was not the same. We tried to replace the default opclass
in contrib with another one in core. It did not work because
pg_dump --binary-upgrade dumps the objects of the extension which
cannot be restored when there is a default opclass for the same
data type.

Changing the default opclasses should work if we make
pg_dump --binary-upgrade dump the default opclasses with indexes
and exclusion constraints. I think it makes sense to do so in
--binary-upgrade mode. I can try to come with a patch for this.

I cannot see a way to rename opclasses in core. I think we can live
with default opclasses which are not named as type_ops.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alexander Korotkov

aekorotkov@gmail.com

over 11 years ago

In reply to: Emre Hasegeli (#7)

Re: Collation-aware comparisons in GIN opclasses

On Tue, Sep 16, 2014 at 11:29 AM, Emre Hasegeli <emre@hasegeli.com> wrote:

Changing the default opclasses should work if we make
pg_dump --binary-upgrade dump the default opclasses with indexes
and exclusion constraints. I think it makes sense to do so in
--binary-upgrade mode. I can try to come with a patch for this.

Can you explain it a bit more detail? I didn't get it.

------
With best regards,
Alexander Korotkov.

Emre Hasegeli

emre@hasegeli.com

over 11 years ago

In reply to: Alexander Korotkov (#8)

Re: Collation-aware comparisons in GIN opclasses

Changing the default opclasses should work if we make
pg_dump --binary-upgrade dump the default opclasses with indexes
and exclusion constraints. I think it makes sense to do so in
--binary-upgrade mode. I can try to come with a patch for this.

Can you explain it a bit more detail? I didn't get it.

pg_upgrade uses pg_dump --binary-upgrade to dump the schema of
the old database. Now, it generates CREATE INDEX statements without
explicit opclass if opclass is the default. We can change pg_dump
to generate the statements with opclass even if opclass is the default
in --binary-upgrade mode.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Alexander Korotkov

aekorotkov@gmail.com

over 11 years ago

In reply to: Emre Hasegeli (#9)

Re: Collation-aware comparisons in GIN opclasses

On Tue, Sep 16, 2014 at 12:14 PM, Emre Hasegeli <emre@hasegeli.com> wrote:

Changing the default opclasses should work if we make
pg_dump --binary-upgrade dump the default opclasses with indexes
and exclusion constraints. I think it makes sense to do so in
--binary-upgrade mode. I can try to come with a patch for this.

Can you explain it a bit more detail? I didn't get it.

pg_upgrade uses pg_dump --binary-upgrade to dump the schema of
the old database. Now, it generates CREATE INDEX statements without
explicit opclass if opclass is the default. We can change pg_dump
to generate the statements with opclass even if opclass is the default
in --binary-upgrade mode.

Thanks, I get it. I checked pg_dump implementation. It appears to be not as
easy as it could be. pg_dump doesn't form index definition by itself. It
calls pg_get_indexdef function. This function have no option to dump names
of default opclasses. Since we can't change behaviour of old postgres
version, we have to make pg_dump form index definition by itself.

------
With best regards,
Alexander Korotkov.

#11

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Alexander Korotkov (#10)

Re: Collation-aware comparisons in GIN opclasses

On Tue, Sep 16, 2014 at 06:56:24PM +0400, Alexander Korotkov wrote:

On Tue, Sep 16, 2014 at 12:14 PM, Emre Hasegeli <emre@hasegeli.com> wrote:

Changing the default opclasses should work if we make
pg_dump --binary-upgrade dump the default opclasses with indexes
and exclusion constraints.ï¿½ I think it makes sense to do so in
--binary-upgrade mode.ï¿½ I can try to come with a patch for this.

Can you explain it a bit more detail? I didn't get it.

pg_upgrade uses pg_dump --binary-upgrade to dump the schema of
the old database.ï¿½ Now, it generates CREATE INDEX statements without
explicit opclass if opclass is the default.ï¿½ We can change pg_dump
to generate the statements with opclass even if opclass is the default
in --binary-upgrade mode.

Thanks, I get it. I checked pg_dump implementation. It appears to be not as
easy as it could be. pg_dump doesn't form index definition by itself. It calls
pg_get_indexdef function. This function have no option to dump names of default
opclasses. Since we can't change behaviour of old postgres version, we have to
make pg_dump form index definition by itself.

Well, the server is also operating in binary-upgrade mode, so you could
have the server-side function pg_get_indexdef() behave differently for
pg_upgrade.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Bruce Momjian

bruce@momjian.us

over 11 years ago

In reply to: Peter Geoghegan (#5)

Re: Collation-aware comparisons in GIN opclasses

On Mon, Sep 15, 2014 at 03:42:20PM -0700, Peter Geoghegan wrote:

On Mon, Sep 15, 2014 at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

No. And we don't know how to change the default opclass without
breaking things, either.

Is there a page on the Wiki along the lines of "things that we would
like to change if ever there is a substantial change in on-disk format
that will break pg_upgrade"? ISTM that we should be intelligently
saving those some place, just as Redhat presumably save up
ABI-breakage over many years for the next major release of RHEL.
Alexander's complaint is a good example of such a change, IMV. Isn't
it more or less expected that the day will come when we'll make a
clean break?

It is on the TODO page under pg_upgrade:

Desired changes that would prevent upgrades with pg_upgrade

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Heikki Linnakangas

hlinnakangas@vmware.com

over 11 years ago

In reply to: Alexander Korotkov (#1)

1 attachment(s)

Re: Collation-aware comparisons in GIN opclasses

On 09/15/2014 06:28 PM, Alexander Korotkov wrote:

Hackers,

some GIN opclasses uses collation-aware comparisons while they don't need
to do especially collation-aware comparison. Examples are text[] and hstore
opclasses.

Hmm. It would be nice to use the index for inequality searches, at least
on text[]. We don't support that currently, but it would require
collation-awareness.

Depending on collation this may make them a much slower.

See example.

# show lc_collate ;
lc_collate
─────────────
ru_RU.UTF-8
(1 row)

# create table test as (select array_agg(i::text) from
generate_series(1,1000000) i group by (i-1)/10);
SELECT 100000

# create index test_idx on test using gin(array_agg);
CREATE INDEX
Time: *26930,423 ms*

# create index test_idx2 on test using gin(array_agg collate "C");
CREATE INDEX
Time: *5143,682 ms*

Index creation with collation "ru_RU.UTF-8" is 5 times slower while
collation has absolutely no effect on index functionality.

It occurs to me that practically all of those comparisons happen when we
populate the red-black Tree, during the index build. The purpose of the
red-black tree is to collect identical keys together, but there is
actually no requirement that the order of the red-black tree matches the
order of the index. It also isn't strictly required that it recognizes
equal keys as equal. The only requirement is that it doesn't incorrectly
put two keys that are equal according to the compare-function, into two
different nodes.

We could therefore use plain memcmp() to compare the Datums while
building the red-black tree. Keys that are bit-wise equal are surely
considered as equal by the compare-function. That makes the index build
a lot faster. With the attached quick patch:

postgres=# create index test_idx on test using gin(array_agg );
CREATE INDEX
Time: 880.620 ms

This is on my laptop. Without the patch, that takes about 4.7 seconds
with the C locale, so this is much faster than even using the C locale.

- Heikki

Attachments:

gin-red-black-memcmp-1.patchtext/x-diff; name=gin-red-black-memcmp-1.patchDownload

diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c
index 3af0271..02d8600 100644
--- a/src/backend/access/gin/ginbulk.c
+++ b/src/backend/access/gin/ginbulk.c
@@ -22,6 +22,13 @@
 #define DEF_NENTRY	2048		/* GinEntryAccumulator allocation quantum */
 #define DEF_NPTR	5			/* ItemPointer initial allocation quantum */
 
+static int fastCompareEntries(GinState *ginstate, OffsetNumber attnum,
+				  Datum a, GinNullCategory categorya,
+				  Datum b, GinNullCategory categoryb);
+static int fastCompareAttEntries(GinState *ginstate,
+					 OffsetNumber attnuma, Datum a, GinNullCategory categorya,
+					 OffsetNumber attnumb, Datum b, GinNullCategory categoryb);
+static int fastCmpDatums(int16 attlen, bool attbyval, Datum a, Datum b);
 
 /* Combiner function for rbtree.c */
 static void
@@ -67,9 +74,104 @@ cmpEntryAccumulator(const RBNode *a, const RBNode *b, void *arg)
 	const GinEntryAccumulator *eb = (const GinEntryAccumulator *) b;
 	BuildAccumulator *accum = (BuildAccumulator *) arg;
 
-	return ginCompareAttEntries(accum->ginstate,
-								ea->attnum, ea->key, ea->category,
-								eb->attnum, eb->key, eb->category);
+	return fastCompareAttEntries(accum->ginstate,
+								 ea->attnum, ea->key, ea->category,
+								 eb->attnum, eb->key, eb->category);
+}
+
+
+/*
+ * Compare two keys of the same index column.
+ *
+ * This is like ginCompareEntries, but uses memcmp() for the comparison instead
+ * of the real compare function.
+ */
+static int
+fastCompareEntries(GinState *ginstate, OffsetNumber attnum,
+				  Datum a, GinNullCategory categorya,
+				  Datum b, GinNullCategory categoryb)
+{
+	/* if not of same null category, sort by that first */
+	if (categorya != categoryb)
+		return (categorya < categoryb) ? -1 : 1;
+
+	/* all null items in same category are equal */
+	if (categorya != GIN_CAT_NORM_KEY)
+		return 0;
+
+	/* both not null, so compare the Datums */
+	return fastCmpDatums(ginstate->origTupdesc->attrs[attnum - 1]->attlen,
+						 ginstate->origTupdesc->attrs[attnum - 1]->attbyval,
+						 a, b);
+}
+
+
+/*
+ * Compare two keys of possibly different index columns.
+ *
+ * This is like ginCompareAttEntries, but uses memcmp() for the comparison
+ * instead of the real compare function.
+ */
+static int
+fastCompareAttEntries(GinState *ginstate,
+					 OffsetNumber attnuma, Datum a, GinNullCategory categorya,
+					 OffsetNumber attnumb, Datum b, GinNullCategory categoryb)
+{
+	/* attribute number is the first sort key */
+	if (attnuma != attnumb)
+		return (attnuma < attnumb) ? -1 : 1;
+
+	return fastCompareEntries(ginstate, attnuma, a, categorya, b, categoryb);
+}
+
+/*
+ * Compare two Datums. This is like datumIsEqual, but returns -1, 0, or 1,
+ * rather than just a boolean.
+ *
+ * XXX: This assumes that the arguments are not toasted. Is that safe?
+ */
+static int
+fastCmpDatums(int16 typLen, bool typByVal, Datum value1, Datum value2)
+{
+	int			res;
+
+	if (typByVal)
+	{
+		/*
+		 * just compare the two datums. NOTE: just comparing "len" bytes will
+		 * not do the work, because we do not know how these bytes are aligned
+		 * inside the "Datum".  We assume instead that any given datatype is
+		 * consistent about how it fills extraneous bits in the Datum.
+		 */
+		if (value1 > value2)
+			return 1;
+		else if (value1 < value2)
+			return -1;
+		else
+			return 0;
+	}
+	else
+	{
+		Size		size1,
+					size2;
+		char	   *s1,
+				   *s2;
+
+		/*
+		 * Compare the bytes pointed by the pointers stored in the datums.
+		 */
+		size1 = datumGetSize(value1, typByVal, typLen);
+		size2 = datumGetSize(value2, typByVal, typLen);
+		if (size1 > size2)
+			return 1;
+		else if (size1 < size2)
+			return -1;
+
+		s1 = (char *) DatumGetPointer(value1);
+		s2 = (char *) DatumGetPointer(value2);
+		res = memcmp(s1, s2, size1);
+	}
+	return res;
 }
 
 /* Allocator function for rbtree.c */

#14

Oleg Bartunov

obartunov@gmail.com

over 11 years ago

In reply to: Heikki Linnakangas (#13)

Re: Collation-aware comparisons in GIN opclasses

On Mon, Sep 29, 2014 at 11:48 AM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

On 09/15/2014 06:28 PM, Alexander Korotkov wrote:

Hackers,

some GIN opclasses uses collation-aware comparisons while they don't need
to do especially collation-aware comparison. Examples are text[] and
hstore
opclasses.

Hmm. It would be nice to use the index for inequality searches, at least
on text[]. We don't support that currently, but it would require
collation-awareness.

Depending on collation this may make them a much slower.

See example.

# show lc_collate ;
lc_collate
─────────────
ru_RU.UTF-8
(1 row)

# create table test as (select array_agg(i::text) from
generate_series(1,1000000) i group by (i-1)/10);
SELECT 100000

# create index test_idx on test using gin(array_agg);
CREATE INDEX
Time: *26930,423 ms*

# create index test_idx2 on test using gin(array_agg collate "C");
CREATE INDEX
Time: *5143,682 ms*

Index creation with collation "ru_RU.UTF-8" is 5 times slower while
collation has absolutely no effect on index functionality.

It occurs to me that practically all of those comparisons happen when we
populate the red-black Tree, during the index build. The purpose of the
red-black tree is to collect identical keys together, but there is actually
no requirement that the order of the red-black tree matches the order of
the index. It also isn't strictly required that it recognizes equal keys as
equal. The only requirement is that it doesn't incorrectly put two keys
that are equal according to the compare-function, into two different nodes.

Good point, Heikki. I experienced several times this problem, fixed it with
C-locale and forgot again. Now, it's time to fix !

We could therefore use plain memcmp() to compare the Datums while building
the red-black tree. Keys that are bit-wise equal are surely considered as
equal by the compare-function. That makes the index build a lot faster.
With the attached quick patch:

postgres=# create index test_idx on test using gin(array_agg );
CREATE INDEX
Time: 880.620 ms

This is on my laptop. Without the patch, that takes about 4.7 seconds with
the C locale, so this is much faster than even using the C locale.

Hmm, on my MBA I got
17277.734 (patch) vs 39151.562 for ru_RU.UTF-8 and
6131.929 (patch) vs 6131.929 for C

Not much :(

Show quoted text

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Bruce Momjian

bruce@momjian.us

almost 11 years ago

In reply to: Bruce Momjian (#12)

Re: Collation-aware comparisons in GIN opclasses

On Sun, Sep 28, 2014 at 10:33:33PM -0400, Bruce Momjian wrote:

On Mon, Sep 15, 2014 at 03:42:20PM -0700, Peter Geoghegan wrote:

On Mon, Sep 15, 2014 at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

No. And we don't know how to change the default opclass without
breaking things, either.

Is there a page on the Wiki along the lines of "things that we would
like to change if ever there is a substantial change in on-disk format
that will break pg_upgrade"? ISTM that we should be intelligently
saving those some place, just as Redhat presumably save up
ABI-breakage over many years for the next major release of RHEL.
Alexander's complaint is a good example of such a change, IMV. Isn't
it more or less expected that the day will come when we'll make a
clean break?

It is on the TODO page under pg_upgrade:

Desired changes that would prevent upgrades with pg_upgrade

Item added to TODO list.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers