Review: B-Tree emulation for GIN

Started by Ibrar Ahmedover 17 years ago35 messageshackers
Jump to latest
#1Ibrar Ahmed
ibrar.ahmad@gmail.com

Hi Teodor Sigaev,

I am getting server crash in contrib regression. May be I am doing something
wrong here. Regression diff is attached.

BTW these queries work fine outside the regression.

--
Ibrar Ahmed
EnterpriseDB http://www.enterprisedb.com

Attachments:

regression.diffsapplication/octet-stream; name=regression.diffsDownload+1042-1042
#2Teodor Sigaev
teodor@sigaev.ru
In reply to: Ibrar Ahmed (#1)
Re: Review: B-Tree emulation for GIN

will see, may be it's needed to update the patch

Ibrar Ahmed wrote:

Hi Teodor Sigaev,

I am getting server crash in contrib regression. May be I am doing
something wrong here. Regression diff is attached.

BTW these queries work fine outside the regression.

--
Ibrar Ahmed
EnterpriseDB http://www.enterprisedb.com

------------------------------------------------------------------------

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#3Teodor Sigaev
teodor@sigaev.ru
In reply to: Ibrar Ahmed (#1)
Re: Review: B-Tree emulation for GIN

Updated patch.

Ibrar Ahmed wrote:

Hi Teodor Sigaev,

I am getting server crash in contrib regression. May be I am doing
something wrong here. Regression diff is attached.

BTW these queries work fine outside the regression.

--
Ibrar Ahmed
EnterpriseDB http://www.enterprisedb.com

------------------------------------------------------------------------

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

gin_extra-0.2.gzapplication/x-tar; name=gin_extra-0.2.gzDownload
#4Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Teodor Sigaev (#3)
Re: Review: B-Tree emulation for GIN

Thanks,

On Fri, Dec 19, 2008 at 3:26 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:

Updated patch.

Ibrar Ahmed wrote:

Hi Teodor Sigaev,

I am getting server crash in contrib regression. May be I am doing
something wrong here. Regression diff is attached.

BTW these queries work fine outside the regression.

--
Ibrar Ahmed
EnterpriseDB http://www.enterprisedb.com

------------------------------------------------------------------------

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW:
http://www.sigaev.ru/

--
Ibrar Ahmed
EnterpriseDB http://www.enterprisedb.com

#5Jeff Davis
pgsql@j-davis.com
In reply to: Teodor Sigaev (#3)
Re: Review: B-Tree emulation for GIN

On Fri, 2008-12-19 at 13:26 +0300, Teodor Sigaev wrote:

Updated patch.

I have merged this with HEAD, written a brief document (which is mostly
a copy of the btree-gist page), added the module to the contrib
Makefile, and made some very minor changes. Patch attached.

A couple comments:

1. Right now, to implement "less than" you need to start at the
beginning of the index and scan until you reach the supplied query
datum. This is because GIN doesn't support backwards scans
( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01146.php ).

Unfortunately, that means numeric is not supportable for btree-gin
because there is no left-most value from which to start the scan. Do you
see an easy workaround to support numeric?

2. Why do you create a shell type "int32" from btree_gin.sql? I tried
doing a search-and-replace to use "int4" instead, and it seems to work
fine (and eliminates the warnings). I left it with int32 in my version
of the patch because I thought you may have some reason for using it.

Regards,
Jeff Davis

Attachments:

btree-gin.20081228.patch.gzapplication/x-gzip; name=btree-gin.20081228.patch.gzDownload
#6Jeff Davis
pgsql@j-davis.com
In reply to: Jeff Davis (#5)
Re: Review: B-Tree emulation for GIN

On Sun, 2008-12-28 at 23:41 -0800, Jeff Davis wrote:

2. Why do you create a shell type "int32" from btree_gin.sql? I tried
doing a search-and-replace to use "int4" instead, and it seems to work
fine (and eliminates the warnings). I left it with int32 in my version
of the patch because I thought you may have some reason for using it.

Attached an updated patch with this change.

Regards,
Jeff Davis

Attachments:

btree-gin.20090111.patch.gzapplication/x-gzip; name=btree-gin.20090111.patch.gzDownload
#7Teodor Sigaev
teodor@sigaev.ru
In reply to: Jeff Davis (#5)
Re: Review: B-Tree emulation for GIN

I have merged this with HEAD, written a brief document (which is mostly
a copy of the btree-gist page), added the module to the contrib
Makefile, and made some very minor changes. Patch attached.

Thank you

A couple comments:

1. Right now, to implement "less than" you need to start at the
beginning of the index and scan until you reach the supplied query
datum. This is because GIN doesn't support backwards scans
( http://archives.postgresql.org/pgsql-hackers/2008-07/msg01146.php ).

Unfortunately, that means numeric is not supportable for btree-gin
because there is no left-most value from which to start the scan. Do you
see an easy workaround to support numeric?

Hmm, let we use minimal varlena struct (with size equal to VARHDRSZ) as
left-most (and fake) value. It is never used for any actual argument except
compare function, so, new compare function is provided. New version of patch is
attached and it based on on your patch (btree-gin.20090111.patch.gz).

2. Why do you create a shell type "int32" from btree_gin.sql? I tried
doing a search-and-replace to use "int4" instead, and it seems to work
fine (and eliminates the warnings). I left it with int32 in my version
of the patch because I thought you may have some reason for using it.

My mistake, thank you for fix
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

btree_gin-0.7.gzapplication/x-tar; name=btree_gin-0.7.gzDownload
#8Jeff Davis
pgsql@j-davis.com
In reply to: Teodor Sigaev (#7)
Re: Review: B-Tree emulation for GIN

On Fri, 2009-01-16 at 17:42 +0300, Teodor Sigaev wrote:

Unfortunately, that means numeric is not supportable for btree-gin
because there is no left-most value from which to start the scan. Do you
see an easy workaround to support numeric?

Hmm, let we use minimal varlena struct (with size equal to VARHDRSZ) as
left-most (and fake) value. It is never used for any actual argument except
compare function, so, new compare function is provided. New version of patch is
attached and it based on on your patch (btree-gin.20090111.patch.gz).

Looks good to me. I updated the CommitFest wiki to point to this patch.

I like the fact that this patch does not modify the numeric ADT. It
still relies on the fact that the numeric type will never make use of
the minimal varlena struct, however. I bring this up in case someone
sees it as a problem.

Thanks,
Jeff Davis

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Jeff Davis (#8)
Re: Review: B-Tree emulation for GIN

Jeff Davis escribi�:

I like the fact that this patch does not modify the numeric ADT. It
still relies on the fact that the numeric type will never make use of
the minimal varlena struct, however. I bring this up in case someone
sees it as a problem.

Greg Stark was working on a patch to make certain values use the short
representation.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#10Teodor Sigaev
teodor@sigaev.ru
In reply to: Alvaro Herrera (#9)
Re: Review: B-Tree emulation for GIN

still relies on the fact that the numeric type will never make use of
the minimal varlena struct, however. I bring this up in case someone
sees it as a problem.

Greg Stark was working on a patch to make certain values use the short
representation.

Fake value contains only VARHDRSZ bytes.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#8)
Re: Review: B-Tree emulation for GIN

Jeff Davis <pgsql@j-davis.com> writes:

I like the fact that this patch does not modify the numeric ADT. It
still relies on the fact that the numeric type will never make use of
the minimal varlena struct, however. I bring this up in case someone
sees it as a problem.

I'm pretty certain I recall Greg Stark recommending that we adopt
something like that as the standard numeric representation of zero.
It didn't get done yet, but it might happen someday.

regards, tom lane

#12Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#11)
Re: Review: B-Tree emulation for GIN

On Mon, 2009-01-19 at 11:35 -0500, Tom Lane wrote:

Jeff Davis <pgsql@j-davis.com> writes:

I like the fact that this patch does not modify the numeric ADT. It
still relies on the fact that the numeric type will never make use of
the minimal varlena struct, however. I bring this up in case someone
sees it as a problem.

I'm pretty certain I recall Greg Stark recommending that we adopt
something like that as the standard numeric representation of zero.
It didn't get done yet, but it might happen someday.

Then we should use the previous version of the patch here:

http://archives.postgresql.org/message-id/1231709713.25019.129.camel@jdavis

Was there any talk of supporting a +/- infinity in numeric? If we did
that, it would allow numeric to be supported for btree-gin.

Regards,
Jeff Davis

#13Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#11)
Re: Review: B-Tree emulation for GIN

I'm pretty certain I recall Greg Stark recommending that we adopt
something like that as the standard numeric representation of zero.
It didn't get done yet, but it might happen someday.

Changes:
- use NULL as left-most value. It's safe because NULL numeric value
cannot be an argument for any function except gin_numeric_cmp and it
cannot be returned in regular SQL query.
- fix uninstall script for support numeric type.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

btree_gin-0.8.gzapplication/x-tar; name=btree_gin-0.8.gzDownload
#14Jeff Davis
pgsql@j-davis.com
In reply to: Teodor Sigaev (#13)
Re: Review: B-Tree emulation for GIN

On Mon, 2009-01-19 at 20:15 +0300, Teodor Sigaev wrote:

Changes:
- use NULL as left-most value. It's safe because NULL numeric value
cannot be an argument for any function except gin_numeric_cmp and it
cannot be returned in regular SQL query.

gin_numeric_cmp() can be called from regular SQL. I missed this before,
but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
v0.7 at least).

I know you mean a C NULL, not a SQL NULL, but it reminded me to test SQL
NULL.

And how does GIN handle SQL NULL values in the column? Does it index
them at all, or just ignore them?

Regards,
Jeff Davis

#15Teodor Sigaev
teodor@sigaev.ru
In reply to: Jeff Davis (#14)
Re: Review: B-Tree emulation for GIN

gin_numeric_cmp() can be called from regular SQL. I missed this before,
but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
v0.7 at least).

Fixed, gin_numeric_cmp is marked as strict.

And how does GIN handle SQL NULL values in the column? Does it index
them at all, or just ignore them?

SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
C NULL: NULL-numeric could be returned only by gin_extract_query_numeric which
cannot be called by user directly because of internal type of argument.
GIN doesn't do anything with values returned by gin_extract_query_numeric except
providing they as an argument for comparing functions.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

Attachments:

btree_gin-0.9.gzapplication/x-tar; name=btree_gin-0.9.gzDownload
#16Jeff Davis
pgsql@j-davis.com
In reply to: Teodor Sigaev (#15)
Re: Review: B-Tree emulation for GIN

On Mon, 2009-01-19 at 21:41 +0300, Teodor Sigaev wrote:

And how does GIN handle SQL NULL values in the column? Does it index
them at all, or just ignore them?

SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
C NULL: NULL-numeric could be returned only by gin_extract_query_numeric which
cannot be called by user directly because of internal type of argument.
GIN doesn't do anything with values returned by gin_extract_query_numeric except
providing they as an argument for comparing functions.

Ok, I understand now. I will look at this later.

Regards,
Jeff Davis

#17Jeff Davis
pgsql@j-davis.com
In reply to: Teodor Sigaev (#15)
Re: Review: B-Tree emulation for GIN

On Mon, 2009-01-19 at 21:41 +0300, Teodor Sigaev wrote:

gin_numeric_cmp() can be called from regular SQL. I missed this before,
but that function will segfault if you call gin_numeric_cmp(NULL, 1) (in
v0.7 at least).

Fixed, gin_numeric_cmp is marked as strict.

And how does GIN handle SQL NULL values in the column? Does it index
them at all, or just ignore them?

SQL NULL: GIN doesn't support it (amindexnulls/amsearchnulls == false)
C NULL: NULL-numeric could be returned only by gin_extract_query_numeric which
cannot be called by user directly because of internal type of argument.
GIN doesn't do anything with values returned by gin_extract_query_numeric except
providing they as an argument for comparing functions.

Ok, looks good. I updated the wiki to show this as the latest version of
the patch.

Thanks,
Jeff Davis

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#15)
Re: Review: B-Tree emulation for GIN

Looked at this a bit ... do you think it's really a good idea to remove
the strategy number argument of comparePartial? The argument given in
the docs for it is that it might be needed to determine when to end the
scan, and that still seems plausible to me.

The description of extractQuery's extra_data parameter seems confusing
too. AFAICS it is incorrect, or at least misleading, to describe it as
void ** extra_data[]; it is really void ***extra_data, because there is
only one object there not an array.

regards, tom lane

#19Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#18)
Re: Review: B-Tree emulation for GIN

Looked at this a bit ... do you think it's really a good idea to remove
the strategy number argument of comparePartial? The argument given in
the docs for it is that it might be needed to determine when to end the
scan, and that still seems plausible to me.

Strategy number is not removed, it's replaced by pointer to opclass-specific
data on per key basis. Actually, partial match feature is new for 8.4 and there
is no any compatibility problem. None of existing opclasses (except
contrib/btree_gin) uses strategy number in comparePartial.

The description of extractQuery's extra_data parameter seems confusing
too. AFAICS it is incorrect, or at least misleading, to describe it as
void ** extra_data[]; it is really void ***extra_data, because there is
only one object there not an array.

It's really array, see fillScanKey() in ginscan.c, line 61 of patched file.
extractQuery could provide data for each returned key.

--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/

#20Jeff Davis
pgsql@j-davis.com
In reply to: Teodor Sigaev (#19)
Re: Review: B-Tree emulation for GIN

On Wed, 2009-02-04 at 20:22 +0300, Teodor Sigaev wrote:

The description of extractQuery's extra_data parameter seems confusing
too. AFAICS it is incorrect, or at least misleading, to describe it as
void ** extra_data[]; it is really void ***extra_data, because there is
only one object there not an array.

It's really array, see fillScanKey() in ginscan.c, line 61 of patched file.
extractQuery could provide data for each returned key.

If I understand correctly, the extra_data parameter to extractQuery()
should mean "a pointer to an array of void*". "void **extra_data[]"
might be misinterpreted as "an array of void**".

Would "void *(*extra_data)[]" be more descriptive? I'm not an expert on
style questions like this, but it seems to match the parameter lists for
comparePartial() and consistent().

"void ***extra_data" seems reasonable, too.

Regards,
Jeff Davis

#21Jeff Davis
pgsql@j-davis.com
In reply to: Teodor Sigaev (#15)
#22Teodor Sigaev
teodor@sigaev.ru
In reply to: Jeff Davis (#21)
#23Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Teodor Sigaev (#22)
#24Teodor Sigaev
teodor@sigaev.ru
In reply to: Heikki Linnakangas (#23)
#25Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Teodor Sigaev (#24)
#26Teodor Sigaev
teodor@sigaev.ru
In reply to: Heikki Linnakangas (#25)
#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#19)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#27)
#29Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Ibrar Ahmed (#1)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#26)
#32Jeff Davis
pgsql@j-davis.com
In reply to: Tom Lane (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Davis (#32)
#34Teodor Sigaev
teodor@sigaev.ru
In reply to: Tom Lane (#31)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Teodor Sigaev (#34)