pgsql: Allow opclasses to provide tri-valued GIN consistent functions.
Allow opclasses to provide tri-valued GIN consistent functions.
With the GIN "fast scan" feature, GIN can skip items without fetching all
the keys for them, if it can prove that they don't match regardless of
those keys. So far, it has done the proving by calling the boolean
consistent function with all combinations of TRUE/FALSE for the unfetched
keys, but since that's O(n^2), it becomes unfeasible with more than a few
keys. We can avoid calling consistent with all the combinations, if we can
tell the operator class implementation directly which keys are unknown.
This commit includes a triConsistent function for the built-in array and
tsvector opclasses.
Alexander Korotkov, with some changes by me.
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/c5608ea26a1f51998ad3cf987c3f0bda643c87a8
Modified Files
--------------
doc/src/sgml/gin.sgml | 54 ++++++++++++++--
doc/src/sgml/xindex.sgml | 13 +++-
src/backend/access/gin/ginarrayproc.c | 84 ++++++++++++++++++++++++
src/backend/access/gin/ginlogic.c | 68 +++++++++++++++++--
src/backend/access/gin/ginutil.c | 28 +++++++-
src/backend/utils/adt/tsginidx.c | 104 +++++++++++++++++++++++++++++-
src/include/access/gin.h | 18 +++++-
src/include/access/gin_private.h | 15 +----
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_am.h | 2 +-
src/include/catalog/pg_amproc.h | 31 +++++++++
src/include/catalog/pg_proc.h | 4 ++
src/include/tsearch/ts_utils.h | 1 +
src/include/utils/builtins.h | 1 +
src/test/regress/expected/opr_sanity.out | 77 +++++++++++-----------
src/test/regress/sql/opr_sanity.sql | 70 +++++++++++---------
16 files changed, 469 insertions(+), 103 deletions(-)
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 12 March 2014 15:52, Heikki Linnakangas <heikki.linnakangas@iki.fi> wrote:
Allow opclasses to provide tri-valued GIN consistent functions.
With the GIN "fast scan" feature, GIN can skip items without fetching all
the keys for them, if it can prove that they don't match regardless of
those keys. So far, it has done the proving by calling the boolean
consistent function with all combinations of TRUE/FALSE for the unfetched
keys, but since that's O(n^2), it becomes unfeasible with more than a few
keys. We can avoid calling consistent with all the combinations, if we can
tell the operator class implementation directly which keys are unknown.This commit includes a triConsistent function for the built-in array and
tsvector opclasses.Alexander Korotkov, with some changes by me.
Patch to fix a couple typos attached.
--
Thom
Attachments:
patch_me_maybe.patchtext/x-patch; charset=US-ASCII; name=patch_me_maybe.patchDownload+2-2
On 03/13/2014 02:55 PM, Thom Brown wrote:
Patch to fix a couple typos attached.
Thanks, fixed.
- Heikki
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
Hi,
On 2014-03-12 15:52:58 +0000, Heikki Linnakangas wrote:
Allow opclasses to provide tri-valued GIN consistent functions.
clang warns me about a mistake here:
In file included from /home/andres/src/postgresql/src/backend/access/gin/ginxlog.c:16:
In file included from /home/andres/src/postgresql/src/include/access/gin_private.h:14:
/home/andres/src/postgresql/src/include/access/gin.h:57:3: warning: no previous extern declaration for
non-static variable 'GinLogicValueEnum'
[-Wmissing-variable-declarations]
} GinLogicValueEnum;
I think the GinLogicValueEnum is supposed to be an enum's name, not a
variable name, right?
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
typo-gin-logic-value.patchtext/x-patch; charset=us-asciiDownload+2-2
Andres Freund <andres@2ndquadrant.com> writes:
I think the GinLogicValueEnum is supposed to be an enum's name, not a
variable name, right?
I think the whole thing is too cute by half. Why isn't it just
typedef enum GinLogicValue
{
GIN_FALSE = 0, /* item is present / matches */
GIN_TRUE = 1, /* item is not present / does not match */
GIN_MAYBE = 2 /* don't know if item is present / don't know if
* matches */
} GinLogicValue;
instead of thinking that we are smarter than the compiler about how
to store the enum? For that matter, those explicit specifications of
the enum tag numeric values seem unnecessary and bad style IMV.
regards, tom lane
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 2014-03-21 17:37:35 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
I think the GinLogicValueEnum is supposed to be an enum's name, not a
variable name, right?I think the whole thing is too cute by half. Why isn't it just
typedef enum GinLogicValue
{
GIN_FALSE = 0, /* item is present / matches */
GIN_TRUE = 1, /* item is not present / does not match */
GIN_MAYBE = 2 /* don't know if item is present / don't know if
* matches */
} GinLogicValue;instead of thinking that we are smarter than the compiler about how
to store the enum?
It seems to be a memory only type, so using anything but the raw enum
type seems odd. If it were ondisk alignment stuff could make it
advantageous, but this way...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
I wrote:
I think the whole thing is too cute by half. Why isn't it just
typedef enum GinLogicValue
{
GIN_FALSE = 0, /* item is present / matches */
GIN_TRUE = 1, /* item is not present / does not match */
GIN_MAYBE = 2 /* don't know if item is present / don't know if
* matches */
} GinLogicValue;
... and now that I look at it a second time: *please* tell me the comments
on the first two values should be swapped. The API isn't really "false
means a match", is it?
regards, tom lane
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 03/21/2014 10:58 PM, Tom Lane wrote:
I wrote:
I think the whole thing is too cute by half. Why isn't it just
typedef enum GinLogicValue
{
GIN_FALSE = 0, /* item is present / matches */
GIN_TRUE = 1, /* item is not present / does not match */
GIN_MAYBE = 2 /* don't know if item is present / don't know if
* matches */
} GinLogicValue;... and now that I look at it a second time: *please* tell me the comments
on the first two values should be swapped. The API isn't really "false
means a match", is it?
*facepalm*. Yes, the comments are backwards...
- Heikki
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 03/21/2014 10:47 PM, Andres Freund wrote:
On 2014-03-21 17:37:35 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
I think the GinLogicValueEnum is supposed to be an enum's name, not a
variable name, right?I think the whole thing is too cute by half. Why isn't it just
typedef enum GinLogicValue
{
GIN_FALSE = 0, /* item is present / matches */
GIN_TRUE = 1, /* item is not present / does not match */
GIN_MAYBE = 2 /* don't know if item is present / don't know if
* matches */
} GinLogicValue;instead of thinking that we are smarter than the compiler about how
to store the enum?It seems to be a memory only type, so using anything but the raw enum
type seems odd. If it were ondisk alignment stuff could make it
advantageous, but this way...
That enum is used in the "check" arrays that are passed around GIN, with
one element per index term being searched. I'd really like to keep it
small, because the can be hundreds of elements long, and if the compiler
decides to store it as a 4- or 8-byte value instead of one byte, it
starts to add up.
Besides memory usage, it's convenient that an array of GinLogicValues is
compatible with an array of booleans, as long as there are no GIN_MAYBE
values in it.
I committed your original fix to make it an enum type, like it was
supposed to be. Thanks!
- Heikki
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 03/21/2014 06:43 PM, Heikki Linnakangas wrote:
On 03/21/2014 10:47 PM, Andres Freund wrote:
On 2014-03-21 17:37:35 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
I think the GinLogicValueEnum is supposed to be an enum's name, not a
variable name, right?I think the whole thing is too cute by half. Why isn't it just
typedef enum GinLogicValue
{
GIN_FALSE = 0, /* item is present / matches */
GIN_TRUE = 1, /* item is not present / does not
match */
GIN_MAYBE = 2 /* don't know if item is present /
don't know if
* matches */
} GinLogicValue;instead of thinking that we are smarter than the compiler about how
to store the enum?It seems to be a memory only type, so using anything but the raw enum
type seems odd. If it were ondisk alignment stuff could make it
advantageous, but this way...That enum is used in the "check" arrays that are passed around GIN,
with one element per index term being searched. I'd really like to
keep it small, because the can be hundreds of elements long, and if
the compiler decides to store it as a 4- or 8-byte value instead of
one byte, it starts to add up.Besides memory usage, it's convenient that an array of GinLogicValues
is compatible with an array of booleans, as long as there are no
GIN_MAYBE values in it.I committed your original fix to make it an enum type, like it was
supposed to be. Thanks!
I don't think there's any guarantee it's only going to be one byte. The
ANSI C standard says:
Each enumerated type shall be compatible with char, a signed integer
type, or an unsigned integer type. The choice of type is
implementation-defined. (6.7.2.2 Enumerationspecifiers)
cheers
andrew
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 03/22/2014 03:33 PM, Andrew Dunstan wrote:
On 03/21/2014 06:43 PM, Heikki Linnakangas wrote:
On 03/21/2014 10:47 PM, Andres Freund wrote:
On 2014-03-21 17:37:35 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
I think the GinLogicValueEnum is supposed to be an enum's name, not a
variable name, right?I think the whole thing is too cute by half. Why isn't it just
typedef enum GinLogicValue
{
GIN_FALSE = 0, /* item is present / matches */
GIN_TRUE = 1, /* item is not present / does not
match */
GIN_MAYBE = 2 /* don't know if item is present /
don't know if
* matches */
} GinLogicValue;instead of thinking that we are smarter than the compiler about how
to store the enum?It seems to be a memory only type, so using anything but the raw enum
type seems odd. If it were ondisk alignment stuff could make it
advantageous, but this way...That enum is used in the "check" arrays that are passed around GIN,
with one element per index term being searched. I'd really like to
keep it small, because the can be hundreds of elements long, and if
the compiler decides to store it as a 4- or 8-byte value instead of
one byte, it starts to add up.Besides memory usage, it's convenient that an array of GinLogicValues
is compatible with an array of booleans, as long as there are no
GIN_MAYBE values in it.I committed your original fix to make it an enum type, like it was
supposed to be. Thanks!I don't think there's any guarantee it's only going to be one byte. The
ANSI C standard says:Each enumerated type shall be compatible with char, a signed integer
type, or an unsigned integer type. The choice of type is
implementation-defined. (6.7.2.2 Enumerationspecifiers)
Here's the code from gin.h (after fixing the issues Andres and Tom
pointed out):
enum GinLogicValueEnum
{
GIN_FALSE = 0, /* item is not present / does not match */
GIN_TRUE = 1, /* item is present / matches */
GIN_MAYBE = 2 /* don't know if item is present / don't know if
* matches */
};typedef char GinLogicValue;
The reason for the typedef is precisely that an enum is not guaranteed
to be one byte. Tom suggested getting rid of the typedef, but it's
needed to make sure it's stored as one byte.
I'll go add a comment to it, explaining why it's needed.
- Heikki
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 03/23/2014 09:22 AM, Heikki Linnakangas wrote:
Here's the code from gin.h (after fixing the issues Andres and Tom
pointed out):enum GinLogicValueEnum
{
GIN_FALSE = 0, /* item is not present / does
not match */
GIN_TRUE = 1, /* item is present / matches */
GIN_MAYBE = 2 /* don't know if item is
present / don't know if
* matches */
};typedef char GinLogicValue;
The reason for the typedef is precisely that an enum is not guaranteed
to be one byte. Tom suggested getting rid of the typedef, but it's
needed to make sure it's stored as one byte.I'll go add a comment to it, explaining why it's needed.
Ak, OK, I probably misread your previous email.
cheers
andrew
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On Sun, Mar 23, 2014 at 1:22 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:
The reason for the typedef is precisely that an enum is not guaranteed to
be one byte. Tom suggested getting rid of the typedef, but it's needed to
make sure it's stored as one byte.I'll go add a comment to it, explaining why it's needed.
If we're not using the enum type perhaps it would be better to make it a
bunch of #defines? The main advantage of enum types is that the debugger
knows what values are legal and can decode them for you.
That said, I guess there's not much reason to go and change things now
either.
--
greg
On 03/23/2014 06:07 PM, Greg Stark wrote:
On Sun, Mar 23, 2014 at 1:22 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:
The reason for the typedef is precisely that an enum is not guaranteed to
be one byte. Tom suggested getting rid of the typedef, but it's needed to
make sure it's stored as one byte.I'll go add a comment to it, explaining why it's needed.
If we're not using the enum type perhaps it would be better to make it a
bunch of #defines? The main advantage of enum types is that the debugger
knows what values are legal and can decode them for you.
Yeah, I think you're right. Changed it to use #defines, renamed the
typedef to GinTernaryValue as that's more descriptive, and added a brief
comment mentioning that a GinTernaryValue is compatible with a boolean.
- Heikki
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 2014-03-31 10:31:00 +0300, Heikki Linnakangas wrote:
On 03/23/2014 06:07 PM, Greg Stark wrote:
On Sun, Mar 23, 2014 at 1:22 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:
The reason for the typedef is precisely that an enum is not guaranteed to
be one byte. Tom suggested getting rid of the typedef, but it's needed to
make sure it's stored as one byte.I'll go add a comment to it, explaining why it's needed.
If we're not using the enum type perhaps it would be better to make it a
bunch of #defines? The main advantage of enum types is that the debugger
knows what values are legal and can decode them for you.Yeah, I think you're right. Changed it to use #defines, renamed the typedef
to GinTernaryValue as that's more descriptive, and added a brief comment
mentioning that a GinTernaryValue is compatible with a boolean.
It doesn't matter much in this case, but in general I find the reasoning
here less than convincing. For one, debuggers have a much easier time
providing the symbolic name (e.g. p WHATEVER_ENUM_VAL) than purely
defines. Also, it's perfectly possible to switch to the enum in
switch()es, and benefit from compiler warnings. Using defines prohibits
that...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers
On 03/31/2014 12:25 PM, Andres Freund wrote:
On 2014-03-31 10:31:00 +0300, Heikki Linnakangas wrote:
On 03/23/2014 06:07 PM, Greg Stark wrote:
On Sun, Mar 23, 2014 at 1:22 PM, Heikki Linnakangas <hlinnakangas@vmware.com
wrote:
The reason for the typedef is precisely that an enum is not guaranteed to
be one byte. Tom suggested getting rid of the typedef, but it's needed to
make sure it's stored as one byte.I'll go add a comment to it, explaining why it's needed.
If we're not using the enum type perhaps it would be better to make it a
bunch of #defines? The main advantage of enum types is that the debugger
knows what values are legal and can decode them for you.Yeah, I think you're right. Changed it to use #defines, renamed the typedef
to GinTernaryValue as that's more descriptive, and added a brief comment
mentioning that a GinTernaryValue is compatible with a boolean.It doesn't matter much in this case, but in general I find the reasoning
here less than convincing. For one, debuggers have a much easier time
providing the symbolic name (e.g. p WHATEVER_ENUM_VAL) than purely
defines. Also, it's perfectly possible to switch to the enum in
switch()es, and benefit from compiler warnings. Using defines prohibits
that...
Yeah, I could've gone either way on this case. The enum wasn't actually
used anywhere, the typedef was, so debugger wouldn't have been able use
the enum values. Same for switch warnings, unless you explicitly cast to
the enum type. It seems pretty unlikely that you would remember to do
the cast, but forget to handle one of the three values. If there was any
chance that we would add more values to the enum in the future, a cast
in a switch would be useful to catch places where we forgot to handle
the new value, but we're not going to add more values to this enum.
- Heikki
--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers