Add a filed to PageHeaderData

Started by Soroosh Sardariover 11 years ago22 messages
#1Soroosh Sardari
soroosh.sardari@gmail.com

Dear Hackers

I wanted to add a char array with length of 20 to PageHeaderData in
include/storage/bufpage.h.
Surprisingly regression test failed on rangetypes test!

The diff of resulted and expected file is :

*** 968,974 ****
select count(*) from test_range_spgist where ir -|- int4range(100,500);
count
-------
! 5
(1 row)

  -- now check same queries using a bulk-loaded index
--- 968,974 ----
  select count(*) from test_range_spgist where ir -|- int4range(100,500);
   count
  -------
!      2
  (1 row)

-- now check same queries using a bulk-loaded index

======================================================================

Any help appreciated.

Soroosh Sardari

#2Soroosh Sardari
soroosh.sardari@gmail.com
In reply to: Soroosh Sardari (#1)
Re: Add a filed to PageHeaderData

On Mon, Jun 23, 2014 at 10:23 AM, Soroosh Sardari <soroosh.sardari@gmail.com

wrote:

Dear Hackers

I wanted to add a char array with length of 20 to PageHeaderData in
include/storage/bufpage.h.
Surprisingly regression test failed on rangetypes test!

The diff of resulted and expected file is :

*** 968,974 ****
select count(*) from test_range_spgist where ir -|- int4range(100,500);
count
-------
! 5
(1 row)

-- now check same queries using a bulk-loaded index
--- 968,974 ----
select count(*) from test_range_spgist where ir -|- int4range(100,500);
count
-------
!      2
(1 row)

-- now check same queries using a bulk-loaded index

======================================================================

Any help appreciated.

Soroosh Sardari

Is there any rule for adding a field to PageHeaderData?

#3Greg Stark
stark@mit.edu
In reply to: Soroosh Sardari (#2)
Re: Add a filed to PageHeaderData

On Tue, Jun 24, 2014 at 12:02 AM, Soroosh Sardari
<soroosh.sardari@gmail.com> wrote:

Is there any rule for adding a field to PageHeaderData?

Not really. It's a pretty internal thing, not something we expect
people to be doing all the time.

The only rule I can think of is that you should bump some version
numbers such as the page format version and probably the catalog
version. But that's probably irrelevant to your problem. It sounds
like you have a bug in your code but you haven't posted enough
information to say much more.

--
greg

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Andres Freund
andres@2ndquadrant.com
In reply to: Greg Stark (#3)
Re: Add a filed to PageHeaderData

On 2014-06-24 01:58:32 -0700, Greg Stark wrote:

On Tue, Jun 24, 2014 at 12:02 AM, Soroosh Sardari
<soroosh.sardari@gmail.com> wrote:

Is there any rule for adding a field to PageHeaderData?

Not really. It's a pretty internal thing, not something we expect
people to be doing all the time.

I'd actually say that 99% of the things that need it are not going to
happen because we don't want to break on disk compatibility.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Greg Stark (#3)
Re: Add a filed to PageHeaderData

On Tue, Jun 24, 2014 at 2:28 PM, Greg Stark <stark@mit.edu> wrote:

On Tue, Jun 24, 2014 at 12:02 AM, Soroosh Sardari
<soroosh.sardari@gmail.com> wrote:

Is there any rule for adding a field to PageHeaderData?

Not really. It's a pretty internal thing, not something we expect
people to be doing all the time.

The only rule I can think of is that you should bump some version
numbers such as the page format version and probably the catalog
version. But that's probably irrelevant to your problem. It sounds
like you have a bug in your code but you haven't posted enough
information to say much more.

Out of curiosity, I actually tried adding a char[20] field in the page
header because just like you I thought this should be completely internal,
as long as the field is added before the pd_linp[] field. But I get the
same failure that OP is reporting. I wonder if its a bug in gist index
build, though I could not spot anything at the first glance. FWIW changing
the char[] from 20 to 22 or 24 does not cause any failure in rangetypes
test. So I am thinking its some alignment issue (mine is a 64 bit build)

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

#6Soroosh Sardari
soroosh.sardari@gmail.com
In reply to: Pavan Deolasee (#5)
Re: Add a filed to PageHeaderData

On Tue, Jun 24, 2014 at 1:34 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
wrote:

On Tue, Jun 24, 2014 at 2:28 PM, Greg Stark <stark@mit.edu> wrote:

On Tue, Jun 24, 2014 at 12:02 AM, Soroosh Sardari
<soroosh.sardari@gmail.com> wrote:

Is there any rule for adding a field to PageHeaderData?

Not really. It's a pretty internal thing, not something we expect
people to be doing all the time.

The only rule I can think of is that you should bump some version
numbers such as the page format version and probably the catalog
version. But that's probably irrelevant to your problem. It sounds
like you have a bug in your code but you haven't posted enough
information to say much more.

Out of curiosity, I actually tried adding a char[20] field in the page
header because just like you I thought this should be completely internal,
as long as the field is added before the pd_linp[] field. But I get the
same failure that OP is reporting. I wonder if its a bug in gist index
build, though I could not spot anything at the first glance. FWIW changing
the char[] from 20 to 22 or 24 does not cause any failure in rangetypes
test. So I am thinking its some alignment issue (mine is a 64 bit build)

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

I check this problem with a virgin source code of postgresql-9.3.2. So the
bug is not for my codes.
As Pavan said, may be some alignment issues cause this problem.
By the way, following code has two different output and it is weird.

drop table if exists test_range_spgist;
create table test_range_spgist(ir int4range);
create index test_range_spgist_idx on test_range_spgist using spgist (ir);
insert into test_range_spgist select int4range(g, g+10) from
generate_series(1,590) g;

SET enable_seqscan = t;
SET enable_indexscan = f;
SET enable_bitmapscan = f;

select * from test_range_spgist where ir -|- int4range(100,500);

SET enable_seqscan = f;
SET enable_indexscan = t;
SET enable_bitmapscan = f;

select * from test_range_spgist where ir -|- int4range(100,500);

Regards,
Soroosh

#7Abhijit Menon-Sen
ams@2ndquadrant.com
In reply to: Soroosh Sardari (#6)
Re: Add a filed to PageHeaderData

At 2014-06-24 14:21:24 +0430, soroosh.sardari@gmail.com wrote:

By the way, following code has two different output and it is weird.

I get the same output from both queries with both 9.3.4 and HEAD:

ir
-----------
[90,100)
[500,510)
(2 rows)

If you're reporting a problem, please make some effort to provide enough
details to reproduce it. From your mail I could guess that you tried it
on 9.3.2, but please try not to make people guess.

-- Abhijit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Kevin Grittner
kgrittn@ymail.com
In reply to: Soroosh Sardari (#6)
Re: Add a filed to PageHeaderData

Soroosh Sardari <soroosh.sardari@gmail.com> wrote:

I check this problem with a virgin source code of
postgresql-9.3.2. So the bug is not for my codes.

By the way, following code has two different output and it is
weird.

I can confirm that I see the difference in 9.3.2, and that I don't
see the difference in 9.3.4.  Upgrade.

http://www.postgresql.org/support/versioning/

There's really no point in reporting a possible bug on a version
with known bugs which have already had fixes published.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9geohas
lists@hasibether.at
In reply to: Kevin Grittner (#8)
Hooks Docu - list of hooks

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi to all!

I am searching for a documentation of hooks in PG, all i found was a
presentation in the wiki and some modules from 2ndQuadrant and petere on
github. The last three weeks i was reading the source code to get some
information.

Is there a list of possible hooks, or maybe a little docu or overview?
Especially hooks to catch Insert, Update and Delete Stmts and SubQuerys.

It would help a lot to finish / write a log into Tables Module.

Please excuse my mail, if there was a similar question on the list, i
subscribed today and a simple search in the archive showd no results.

regards

geohas

PS: I've an excuse for my bad english - i'am austrian ;)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTqVetAAoJEJFGMlQe7wR/ae0H/Rkt0G5d6kspgWhPyN/aIWIS
wTYKDdxDWt+EeyuCg7SWx/UxJLW22wnWKxmLjvfkT+/ibkCv5qmYRLMOh+cvH0O9
AimWP7fZX+VpYSfpmm/SuvuwUM3OQiM3iwU6MIpu4XfrulAD3F94/aafNp3D2jBK
Fz/J/Sjmr9LN/YBuE99i6asUJG669m4ISsmMpNwXPAh3wv+A3sN0dhvDCFJ11iCL
hIXqktMpm60iI5sIQUPUjgSTHFTj3aGuKtX3OCWPM4CHoaHwDNtq1klHeuiLSb3y
enjMW4tvTWtPw8DIkEgpatn8gsJvXVIjfsZPiTsp8HbN2evhkYxsgfV89R8usRU=
=vA51
-----END PGP SIGNATURE-----

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Soroosh Sardari
soroosh.sardari@gmail.com
In reply to: Kevin Grittner (#8)
Re: Add a filed to PageHeaderData

On Tue, Jun 24, 2014 at 2:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Soroosh Sardari <soroosh.sardari@gmail.com> wrote:

I check this problem with a virgin source code of
postgresql-9.3.2. So the bug is not for my codes.

By the way, following code has two different output and it is
weird.

I can confirm that I see the difference in 9.3.2, and that I don't
see the difference in 9.3.4. Upgrade.

http://www.postgresql.org/support/versioning/

There's really no point in reporting a possible bug on a version
with known bugs which have already had fixes published.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

wow, it's arch-dependent.
in the 32-bit compiled of PG9.3.2 the code has same output and in 64-bit
binary of same code output is different!!

The problem is not about the sql code I posted in the last email. Problem
could be different in any architecture,
In 32-bit or 64-bit architecture adding a char array of length 20 to
PageHeaderData cause error in regression test.

Regards,
Soroosh

#11Andres Freund
andres@2ndquadrant.com
In reply to: Soroosh Sardari (#10)
Re: Add a filed to PageHeaderData

On 2014-06-24 15:23:54 +0430, Soroosh Sardari wrote:

On Tue, Jun 24, 2014 at 2:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Soroosh Sardari <soroosh.sardari@gmail.com> wrote:

I check this problem with a virgin source code of
postgresql-9.3.2. So the bug is not for my codes.

By the way, following code has two different output and it is
weird.

I can confirm that I see the difference in 9.3.2, and that I don't
see the difference in 9.3.4. Upgrade.

http://www.postgresql.org/support/versioning/

There's really no point in reporting a possible bug on a version
with known bugs which have already had fixes published.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

wow, it's arch-dependent.
in the 32-bit compiled of PG9.3.2 the code has same output and in 64-bit
binary of same code output is different!!

The problem is not about the sql code I posted in the last email. Problem
could be different in any architecture,
In 32-bit or 64-bit architecture adding a char array of length 20 to
PageHeaderData cause error in regression test.

You likely didn't adapt SizeOfPageHederData.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Abhijit Menon-Sen
ams@2ndquadrant.com
In reply to: geohas (#9)
Re: Hooks Docu - list of hooks

At 2014-06-24 12:49:17 +0200, lists@hasibether.at wrote:

Is there a list of possible hooks, or maybe a little docu or overview?

The best I found was "git grep _hook_type" and then read the code to
understand when and why the hook was called.

Especially hooks to catch Insert, Update and Delete Stmts and
SubQuerys.

It would help a lot to finish / write a log into Tables Module.

Look at how pgaudit does it: https://github.com/2ndQuadrant/pgaudit

The code has comments about how the various available hooks are used.
(I was planning to implement a bgwriter that wrote log messages to a
table, which sounds a bit like what you want to do.)

-- Abhijit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13geohas
lists@hasibether.at
In reply to: Abhijit Menon-Sen (#12)
Re: Hooks Docu - list of hooks

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24/06/14 12:59, Abhijit Menon-Sen wrote:

At 2014-06-24 12:49:17 +0200, lists@hasibether.at wrote:

Is there a list of possible hooks, or maybe a little docu or overview?

The best I found was "git grep _hook_type" and then read the code to
understand when and why the hook was called.

Especially hooks to catch Insert, Update and Delete Stmts and
SubQuerys.

It would help a lot to finish / write a log into Tables Module.

Look at how pgaudit does it: https://github.com/2ndQuadrant/pgaudit

I already tried pgaudit ;), one of the best examples, it helped me much.

The code has comments about how the various available hooks are used.
(I was planning to implement a bgwriter that wrote log messages to a
table, which sounds a bit like what you want to do.)

The module i'm thinking of, working on, is a bit inspired from pgaudit
and petere's pg_trashcan.
It should copy every created table in a "shadow"-schema with extra
columns for record on / record off and Userid (this is already working ;)).
In case of a drop statement it should rename the table in the shadow
schema XXX-droped-Date.

Now i am trying to catch the planned Stmts, ...
It should work without triggers - because the shadow schema should only
be visible for user postgres.

regards
geohas

-- Abhijit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTqWQ4AAoJEJFGMlQe7wR/8CEIAJihWVGc//dDHGF9lDtMo3Ds
v1Xhd5U9n1tLL/Cx0/cqnslKctdfSCY2I/ptjNSDFO8U/YdUjNdPf4nYvxn0gjKR
n8VuC61BDr6qHFQvlJE7GLv2hs2GCxFM5dEgnV7foJjT18C/VgnSRFulJzxU87EZ
8uKG53+CM9ERDa5P9py9jyvrJJvIAXk9AAfevU9g+jimwK9OntwkC7ZfyVWEDwfr
x7LDyrzhge/EIco01pzJSimuVd0BPvTQ8V7XUTpy25xS+D8968wE8eRBaMWXH0b2
KR5lju+sz+SyVQKildcyExOEQWN3PgVmST5USAy9cAzPIuic+yR+qsa5H2VRTFI=
=ZYct
-----END PGP SIGNATURE-----

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Soroosh Sardari
soroosh.sardari@gmail.com
In reply to: Andres Freund (#11)
Re: Add a filed to PageHeaderData

On Tue, Jun 24, 2014 at 3:27 PM, Andres Freund <andres@2ndquadrant.com>
wrote:

On 2014-06-24 15:23:54 +0430, Soroosh Sardari wrote:

On Tue, Jun 24, 2014 at 2:40 PM, Kevin Grittner <kgrittn@ymail.com>

wrote:

Soroosh Sardari <soroosh.sardari@gmail.com> wrote:

I check this problem with a virgin source code of
postgresql-9.3.2. So the bug is not for my codes.

By the way, following code has two different output and it is
weird.

I can confirm that I see the difference in 9.3.2, and that I don't
see the difference in 9.3.4. Upgrade.

http://www.postgresql.org/support/versioning/

There's really no point in reporting a possible bug on a version
with known bugs which have already had fixes published.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

wow, it's arch-dependent.
in the 32-bit compiled of PG9.3.2 the code has same output and in 64-bit
binary of same code output is different!!

The problem is not about the sql code I posted in the last email. Problem
could be different in any architecture,
In 32-bit or 64-bit architecture adding a char array of length 20 to
PageHeaderData cause error in regression test.

You likely didn't adapt SizeOfPageHederData.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#define SizeOfPageHeaderData (offsetof(PageHeaderData, pd_linp))

I think ,the macro does not need any change!

#15Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Kevin Grittner (#8)
1 attachment(s)
Re: Add a filed to PageHeaderData

On Tue, Jun 24, 2014 at 3:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Soroosh Sardari <soroosh.sardari@gmail.com> wrote:

I check this problem with a virgin source code of
postgresql-9.3.2. So the bug is not for my codes.

By the way, following code has two different output and it is
weird.

I can confirm that I see the difference in 9.3.2, and that I don't
see the difference in 9.3.4. Upgrade.

http://www.postgresql.org/support/versioning/

There's really no point in reporting a possible bug on a version
with known bugs which have already had fixes published.

FWIW I can reproduce this on HEAD with the attached patch. I could
reproduce this on a 64-bit Ubuntu as well as 64-bit Mac OSX. Very confusing
it is because I tried with various values for N in char[N] array and it
fails for N=20. Other values I tried are 4, 12, 22, 24 and the test passes
for all of them. The logic for trying other values is to see if pd_linp[]
starting on un-aligned boundary can trigger the issue. But there seem to be
no correlation.

postgres=# select version();

PostgreSQL 9.5devel on x86_64-apple-darwin13.2.0, compiled by Apple LLVM
version 5.1 (clang-503.0.38) (based on LLVM 3.4svn), 64-bit

postgres=# -- test SP-GiST index that's been built incrementally

postgres=# create table test_range_spgist(ir int4range);
postgres=# create index test_range_spgist_idx on test_range_spgist using
spgist (ir);
postgres=# insert into test_range_spgist select int4range(g, g+10) from
generate_series(1,586) g;
INSERT 0 586

postgres=# SET enable_seqscan = t;
postgres=# SET enable_indexscan = f;
postgres=# SET enable_bitmapscan = f;

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
ir
-----------
[90,100)
[500,510)
(2 rows)

postgres=# SET enable_seqscan = f;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
ir
-----------
[90,100)
[500,510)
(2 rows)

At this point, both rows are visible via index scan as well as seq scan.

postgres=# insert into test_range_spgist select int4range(g, g+10) from
generate_series(587,587) g;
INSERT 0 1

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
ir
----------
[90,100)
(1 row)

Ouch. The second row somehow disappeared.

postgres=# SET enable_seqscan = t;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
ir
-----------
[90,100)
[500,510)
(2 rows)

So the last INSERT suddenly makes one row disappear via the index scan
though its still reachable via seq scan. I tried looking at the SP-Gist
code but clearly I don't understand it a whole lot to figure out the issue,
if one exists.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Attachments:

page-header-padding.patchapplication/octet-stream; name=page-header-padding.patchDownload
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index d96e375..82cc592 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -156,6 +156,7 @@ typedef struct PageHeaderData
 	LocationIndex pd_special;	/* offset to start of special space */
 	uint16		pd_pagesize_version;
 	TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
+	char		pd_padding[20];
 	ItemIdData	pd_linp[1];		/* beginning of line pointer array */
 } PageHeaderData;
 
#16Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Pavan Deolasee (#15)
Bug in spg_range_quad_inner_consistent for adjacent operator (was Re: Add a filed to PageHeaderData)

On 06/24/2014 08:48 PM, Pavan Deolasee wrote:

FWIW I can reproduce this on HEAD with the attached patch. I could
reproduce this on a 64-bit Ubuntu as well as 64-bit Mac OSX. Very confusing
it is because I tried with various values for N in char[N] array and it
fails for N=20. Other values I tried are 4, 12, 22, 24 and the test passes
for all of them. The logic for trying other values is to see if pd_linp[]
starting on un-aligned boundary can trigger the issue. But there seem to be
no correlation.

postgres=# select version();

PostgreSQL 9.5devel on x86_64-apple-darwin13.2.0, compiled by Apple LLVM
version 5.1 (clang-503.0.38) (based on LLVM 3.4svn), 64-bit

postgres=# -- test SP-GiST index that's been built incrementally

postgres=# create table test_range_spgist(ir int4range);
postgres=# create index test_range_spgist_idx on test_range_spgist using
spgist (ir);
postgres=# insert into test_range_spgist select int4range(g, g+10) from
generate_series(1,586) g;
INSERT 0 586

postgres=# SET enable_seqscan = t;
postgres=# SET enable_indexscan = f;
postgres=# SET enable_bitmapscan = f;

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
ir
-----------
[90,100)
[500,510)
(2 rows)

postgres=# SET enable_seqscan = f;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
ir
-----------
[90,100)
[500,510)
(2 rows)

At this point, both rows are visible via index scan as well as seq scan.

postgres=# insert into test_range_spgist select int4range(g, g+10) from
generate_series(587,587) g;
INSERT 0 1

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
ir
----------
[90,100)
(1 row)

Ouch. The second row somehow disappeared.

postgres=# SET enable_seqscan = t;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
ir
-----------
[90,100)
[500,510)
(2 rows)

So the last INSERT suddenly makes one row disappear via the index scan
though its still reachable via seq scan. I tried looking at the SP-Gist
code but clearly I don't understand it a whole lot to figure out the issue,
if one exists.

Yeah, I can reproduce this. It doesn't seem to be related to the padding
or alignment at all. The padding just happens to move tuples around so
that [500, 510) is picked as an SP-GiST inner node.

The real bug is in spg_range_quad_inner_consistent(), for the adjacent
operator. Things go wrong when:

The scan key is [100, 500)
The prev centroid is [500, 510)
The current centroid is [544, 554).

The row that should match but isn't returned, [500, 510) is equal to the
previous centroid. It's in quadrant 3 from the current centroid, but
spg_range_quad_inner_consistent() incorrectly concludes that it doesn't
need to scan that quadrant.

The function compares the scan key's upper bound with the the previous
centroid's lower bound and the current centroid's lower bound:

/*
* Check if upper bound of argument is not in a
* quadrant we visited in the previous step.
*/
cmp1 = range_cmp_bounds(typcache, &upper, &prevLower);
cmp2 = range_cmp_bounds(typcache, &centroidLower,
&prevLower);
if ((cmp2 < 0 && cmp1 > 0) || (cmp2 > 0 && cmp1 < 0))
which2 = 0;

The idea is that if the scan key's upper bound doesn't fall between the
prev and current centroid's lower bounds, there is no match.

* * *
PL X CL

X = scan key's upper bound: 500)
PL = prev centroid's lower bound [500
CL = current centroid's lower bound [500

This is wrong. X < PL, but it's still nevertheless adjacent to it.

I'll take a closer look tomorrow...

(The "if (which2) ..." block after the code I quoted above also looks
wrong - it seems to be comparing the argument's lower bound when it
should be comparing the upper bound according to the comment. )

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Heikki Linnakangas (#16)
2 attachment(s)
Re: Bug in spg_range_quad_inner_consistent for adjacent operator (was Re: Add a filed to PageHeaderData)

On 06/24/2014 11:22 PM, Heikki Linnakangas wrote:

The real bug is in spg_range_quad_inner_consistent(), for the adjacent
operator. Things go wrong when:

The scan key is [100, 500)
The prev centroid is [500, 510)
The current centroid is [544, 554).

The row that should match but isn't returned, [500, 510) is equal to the
previous centroid. It's in quadrant 3 from the current centroid, but
spg_range_quad_inner_consistent() incorrectly concludes that it doesn't
need to scan that quadrant.

The function compares the scan key's upper bound with the the previous
centroid's lower bound and the current centroid's lower bound:

/*
* Check if upper bound of argument is not in a
* quadrant we visited in the previous step.
*/
cmp1 = range_cmp_bounds(typcache, &upper, &prevLower);
cmp2 = range_cmp_bounds(typcache, &centroidLower,
&prevLower);
if ((cmp2 < 0 && cmp1 > 0) || (cmp2 > 0 && cmp1 < 0))
which2 = 0;

The idea is that if the scan key's upper bound doesn't fall between the
prev and current centroid's lower bounds, there is no match.

* * *
PL X CL

X = scan key's upper bound: 500)
PL = prev centroid's lower bound [500
CL = current centroid's lower bound [500

This is wrong. X < PL, but it's still nevertheless adjacent to it.

I'll take a closer look tomorrow...

(The "if (which2) ..." block after the code I quoted above also looks
wrong - it seems to be comparing the argument's lower bound when it
should be comparing the upper bound according to the comment. )

I came up with the attached. There were several bugs:

* The "if (which2) { ... }" block was broken. It compared the
argument's lower bound against centroid's upper bound, while it was
supposed to compare the argument's upper bound against the centroid's
lower bound (the comment was right). Also, it clear bits in the "which1"
variable, while it was supposed to clear bits in "which2". ISTM it was
copy-pasted from the if (which1) { ... }" block just before it, but not
modified.

* If the argument's upper bound was equal to the centroid's lower bound,
we descended to both halves (= all quadrants). That's unnecessary.
Imagine that the argument is (x, 500), and the centroid is (500, y), so
that the bounds are equal. The adjacent ranges that we're trying to find
would be of form [500, z), which are to the right of the centroid. There
is no need to visit the left quadrants. This won't lead to incorrect
query results, but slows down queries unnecessarily.

* In the case that argument's lower bound is adjacent to the centroid's
upper bound, we also don't need to visit all quadrants. Per similar
reasoning as previous point.

* The code where we compare the previous centroid with the current
centroid should match the code where we compare the current centroid
with the argument. The point of that code is to redo the calculation
done in the previous level, to see if we were supposed to traverse left
or right (or up or down), and if we actually did. If we moved in the
different direction, then we know there are no matches for bound.

Those could be fixed with quite small adjustments, but I think the code
needs some refactoring. It's complicated as it is, it's very difficult
to understand all the cases and comparisons. Case in point: the patch
was written by Alexander, reviewed by Jeff, and committed by me, and we
all missed the above bugs. So, I propose the attached.

I also wrote the attached little white-box testing tool for this. It
allows you to call spg_range_quad_inner_consistent with the adjacent
strategy, and pass the exact argument, centroid and prev centroid ranges
you want. It prints out the result of which quadrants to visit.

- Heikki

Attachments:

spgist-adjacent-fix-2.patchtext/x-diff; name=spgist-adjacent-fix-2.patchDownload
diff --git a/src/backend/utils/adt/rangetypes_spgist.c b/src/backend/utils/adt/rangetypes_spgist.c
index a55cffa..1b83941 100644
--- a/src/backend/utils/adt/rangetypes_spgist.c
+++ b/src/backend/utils/adt/rangetypes_spgist.c
@@ -54,6 +54,12 @@ static int16 getQuadrant(TypeCacheEntry *typcache, RangeType *centroid,
 			RangeType *tst);
 static int	bound_cmp(const void *a, const void *b, void *arg);
 
+static int adjacent_inner_consistent(TypeCacheEntry *typcache,
+						  RangeBound *arg, RangeBound *centroid,
+						  RangeBound *prev);
+static int adjacent_cmp_bounds(TypeCacheEntry *typcache, RangeBound *arg,
+					RangeBound *centroid);
+
 /*
  * SP-GiST 'config' interface function.
  */
@@ -441,6 +447,11 @@ spg_range_quad_inner_consistent(PG_FUNCTION_ARGS)
 			bool		empty;
 			RangeType  *range = NULL;
 
+			RangeType  *prevCentroid = NULL;
+			RangeBound	prevLower,
+						prevUpper;
+			bool		prevEmpty;
+
 			/* Restrictions on range bounds according to scan strategy */
 			RangeBound *minLower = NULL,
 					   *maxLower = NULL,
@@ -550,109 +561,53 @@ spg_range_quad_inner_consistent(PG_FUNCTION_ARGS)
 						break;	/* Skip to strictEmpty check. */
 
 					/*
-					 * which1 is bitmask for possibility to be adjacent with
-					 * lower bound of argument. which2 is bitmask for
-					 * possibility to be adjacent with upper bound of
-					 * argument.
-					 */
-					which1 = which2 = (1 << 1) | (1 << 2) | (1 << 3) | (1 << 4);
-
-					/*
 					 * Previously selected quadrant could exclude possibility
 					 * for lower or upper bounds to be adjacent. Deserialize
 					 * previous centroid range if present for checking this.
 					 */
 					if (in->reconstructedValue != (Datum) 0)
 					{
-						RangeType  *prevCentroid;
-						RangeBound	prevLower,
-									prevUpper;
-						bool		prevEmpty;
-						int			cmp1,
-									cmp2;
-
 						prevCentroid = DatumGetRangeType(in->reconstructedValue);
 						range_deserialize(typcache, prevCentroid,
 										  &prevLower, &prevUpper, &prevEmpty);
-
-						/*
-						 * Check if lower bound of argument is not in a
-						 * quadrant we visited in the previous step.
-						 */
-						cmp1 = range_cmp_bounds(typcache, &lower, &prevUpper);
-						cmp2 = range_cmp_bounds(typcache, &centroidUpper,
-												&prevUpper);
-						if ((cmp2 < 0 && cmp1 > 0) || (cmp2 > 0 && cmp1 < 0))
-							which1 = 0;
-
-						/*
-						 * Check if upper bound of argument is not in a
-						 * quadrant we visited in the previous step.
-						 */
-						cmp1 = range_cmp_bounds(typcache, &upper, &prevLower);
-						cmp2 = range_cmp_bounds(typcache, &centroidLower,
-												&prevLower);
-						if ((cmp2 < 0 && cmp1 > 0) || (cmp2 > 0 && cmp1 < 0))
-							which2 = 0;
 					}
 
-					if (which1)
-					{
-						/*
-						 * For a range's upper bound to be adjacent to the
-						 * argument's lower bound, it will be found along the
-						 * line adjacent to (and just below) Y=lower.
-						 * Therefore, if the argument's lower bound is less
-						 * than the centroid's upper bound, the line falls in
-						 * quadrants 2 and 3; if greater, the line falls in
-						 * quadrants 1 and 4.
-						 *
-						 * The above is true even when the argument's lower
-						 * bound is greater and adjacent to the centroid's
-						 * upper bound. If the argument's lower bound is
-						 * greater than the centroid's upper bound, then the
-						 * lowest value that an adjacent range could have is
-						 * that of the centroid's upper bound, which still
-						 * falls in quadrants 1 and 4.
-						 *
-						 * In the edge case, where the argument's lower bound
-						 * is equal to the cetroid's upper bound, there may be
-						 * adjacent ranges in any quadrant.
-						 */
-						cmp = range_cmp_bounds(typcache, &lower,
-											   &centroidUpper);
-						if (cmp < 0)
-							which1 &= (1 << 2) | (1 << 3);
-						else if (cmp > 0)
-							which1 &= (1 << 1) | (1 << 4);
-					}
+					/*
+					 * For a range's upper bound to be adjacent to the
+					 * argument's lower bound, it will be found along the line
+					 * adjacent to (and just below) Y=lower. Therefore, if the
+					 * argument's lower bound is less than the centroid's
+					 * upper bound, the line falls in quadrants 2 and 3; if
+					 * greater, the line falls in quadrants 1 and 4. (see
+					 * adjacent_cmp_bounds for description of edge cases).
+					 */
+					cmp = adjacent_inner_consistent(typcache, &lower,
+													&centroidUpper,
+											prevCentroid ? &prevUpper : NULL);
+					if (cmp > 0)
+						which1 = (1 << 1) | (1 << 4);
+					else if (cmp < 0)
+						which1 = (1 << 2) | (1 << 3);
+					else
+						which1 = 0;
 
-					if (which2)
-					{
-						/*
-						 * For a range's lower bound to be adjacent to the
-						 * argument's upper bound, it will be found along the
-						 * line adjacent to (and just right of) X=upper.
-						 * Therefore, if the argument's upper bound is less
-						 * than (and not adjacent to) the centroid's upper
-						 * bound, the line falls in quadrants 3 and 4; if
-						 * greater or equal to, the line falls in quadrants 1
-						 * and 2.
-						 *
-						 * The edge case is when the argument's upper bound is
-						 * less than and adjacent to the centroid's lower
-						 * bound. In that case, adjacent ranges may be in any
-						 * quadrant.
-						 */
-						cmp = range_cmp_bounds(typcache, &lower,
-											   &centroidUpper);
-						if (cmp < 0 &&
-							!bounds_adjacent(typcache, upper, centroidLower))
-							which1 &= (1 << 3) | (1 << 4);
-						else if (cmp > 0)
-							which1 &= (1 << 1) | (1 << 2);
-					}
+					/*
+					 * Also search for ranges's adjacent to argument's upper
+					 * bound. They will be found along the line adjacent to
+					 * (and just right of) X=upper, which falls in quadrants
+					 * 3 and 4, or 1 and 2.
+					 */
+					cmp = adjacent_inner_consistent(typcache, &upper,
+													&centroidLower,
+											prevCentroid ? &prevLower : NULL);
+					if (cmp > 0)
+						which2 = (1 << 1) | (1 << 2);
+					else if (cmp < 0)
+						which2 = (1 << 3) | (1 << 4);
+					else
+						which2 = 0;
 
+					/* We must chase down ranges adjacent to either bound. */
 					which &= which1 | which2;
 
 					needPrevious = true;
@@ -808,6 +763,146 @@ spg_range_quad_inner_consistent(PG_FUNCTION_ARGS)
 }
 
 /*
+ * adjacent_cmp_bounds
+ *
+ * Given an argument and centroid bound, this function determines if any
+ * bounds that are adjacent to the argument are smaller than, or greater than
+ * or equal to centroid. For brevity, we call the arg < centroid "left", and
+ * arg >= centroid case "right". This corresponds to how the quadrants are
+ * arranged, if you imagine that "left" is equivalent to "down" and "right"
+ * is equivalent to "up".
+ *
+ * For the "left" case, returns -1, and for the "right" case, returns 1.
+ */
+static int
+adjacent_cmp_bounds(TypeCacheEntry *typcache, RangeBound *arg,
+					RangeBound *centroid)
+{
+	int			cmp;
+
+	Assert(arg->lower != centroid->lower);
+
+	cmp = range_cmp_bounds(typcache, arg,  centroid);
+
+	if (centroid->lower)
+	{
+		/*------
+		 * The argument is an upper bound, we are searching for adjacent lower
+		 * bounds. A matching adjacent lower bound must be *larger* than the
+		 * argument, but only just.
+		 *
+		 * The following table illustrates the desired result with a fixed
+		 * argument bound, and different centroids. The CMP column shows
+		 * the value of 'cmp' variable, and ADJ shows whether the argument
+		 * and centroid are adjacent, per bounds_adjacent(). (N) means we
+		 * don't need to check for that case, because it's implied by CMP.
+		 * With the argument range [..., 500), the adjacent range we're
+		 * searching for is [500, ...):
+		 *
+		 *  ARGUMENT   CENTROID     CMP   ADJ
+		 *  [..., 500) [498, ...)    >    (N)   [500, ...) is to the right
+		 *  [..., 500) [499, ...)    =    (N)   [500, ...) is to the right
+		 *  [..., 500) [500, ...)    <     Y    [500, ...) is to the right
+		 *  [..., 500) [501, ...)    <     N    [500, ...) is to the left
+		 *
+		 * So, we must search left when the argument is smaller than, and not
+		 * adjacent, to the centroid. Otherwise search right.
+		 *------
+		 */
+		if (cmp < 0 && !bounds_adjacent(typcache, *arg, *centroid))
+			return -1;
+		else
+			return 1;
+	}
+	else
+	{
+		/*------
+		 * The argument is a lower bound, we are searching for adjacent upper
+		 * bounds. A matching adjacent upper bound must be *smaller* than the
+		 * argument, but only just.
+		 *
+		 *  ARGUMENT   CENTROID     CMP   ADJ
+		 *  [500, ...) [..., 499)    >    (N)   [..., 500) is to the right
+		 *  [500, ...) [..., 500)    >    (Y)   [..., 500) is to the right
+		 *  [500, ...) [..., 501)    =    (N)   [..., 500) is to the left
+		 *  [500, ...) [..., 502)    <    (N)   [..., 500) is to the left
+		 *
+		 * We must search left when the argument is smaller than or equal to
+		 * the centroid. Otherwise search right. We don't need to check
+		 * whether the argument is adjacent with the centroid, because it
+		 * doesn't matter.
+		 *------
+		 */
+		if (cmp <= 0)
+			return -1;
+		else
+			return 1;
+	}
+}
+
+/*----------
+ * adjacent_inner_consistent
+ *
+ * Like adjacent_cmp_bounds, but also takes into account the previous
+ * level's centroid. We might've traversed left (or right) at the previous
+ * node, in search for ranges adjacent to the other bound, even though we
+ * already ruled out the possibility for any matches in that direction for
+ * this bound. By comparing the argument with the previous centroid, and
+ * the previous centroid with the current centroid, we can determine which
+ * direction we should've moved in at previous level, and which direction we
+ * actually moved.
+ *
+ * If there can be any matches to the left, returns -1. If to the right,
+ * returns 1. If there can be no matches below this centroid, because we
+ * already ruled them out at the previous level, returns 0.
+ *
+ * XXX: Comparing just the previous and current level isn't foolproof; we
+ * might still search some branches unnecessarily. For example, imagine that
+ * we are searching for value 15, and we traverse the following centroids
+ * (only considering one bound for the moment):
+ *
+ * Level 1: 20
+ * Level 2: 50
+ * Level 3: 25
+ *
+ * At this point, previous centroid is 50, current centroid is 25, and the
+ * target value is to the left. But because we already moved right from
+ * centroid 20 to 50 in the first level, there cannot be any values < 20 in
+ * the current branch. But we don't know that just by looking at the previous
+ * and current centroid, so we traverse left, unnecessarily. The reason we are
+ * down this branch is that we're searching for matches with the *other*
+ * bound. If we kept track of which bound we are searching for explicitly,
+ * instead of deducing that from the previous and current centroid, we could
+ * avoid some unnecessary work.
+ *----------
+ */
+static int
+adjacent_inner_consistent(TypeCacheEntry *typcache, RangeBound *arg,
+						  RangeBound *centroid, RangeBound *prev)
+{
+	if (prev)
+	{
+		int			prevcmp;
+		int			cmp;
+
+		/*
+		 * Which direction were we supposed to traverse at previous level,
+		 * left or right?
+		 */
+		prevcmp = adjacent_cmp_bounds(typcache, arg, prev);
+
+		/* and which direction did we actually go? */
+		cmp = range_cmp_bounds(typcache, centroid, prev);
+
+		/* if the two don't agree, there's nothing to see here */
+		if ((prevcmp < 0 && cmp >= 0) || (prevcmp > 0 && cmp < 0))
+			return 0;
+	}
+
+	return adjacent_cmp_bounds(typcache, arg, centroid);
+}
+
+/*
  * SP-GiST consistent function for leaf nodes: check leaf value against query
  * using corresponding function.
  */
spgist_whiteboxtest.tar.gzapplication/gzip; name=spgist_whiteboxtest.tar.gzDownload
#18Soroosh Sardari
soroosh.sardari@gmail.com
In reply to: Pavan Deolasee (#15)
Re: Add a filed to PageHeaderData

On Tue, Jun 24, 2014 at 10:18 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
wrote:

On Tue, Jun 24, 2014 at 3:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:

Soroosh Sardari <soroosh.sardari@gmail.com> wrote:

I check this problem with a virgin source code of
postgresql-9.3.2. So the bug is not for my codes.

By the way, following code has two different output and it is
weird.

I can confirm that I see the difference in 9.3.2, and that I don't
see the difference in 9.3.4. Upgrade.

http://www.postgresql.org/support/versioning/

There's really no point in reporting a possible bug on a version
with known bugs which have already had fixes published.

FWIW I can reproduce this on HEAD with the attached patch. I could
reproduce this on a 64-bit Ubuntu as well as 64-bit Mac OSX. Very confusing
it is because I tried with various values for N in char[N] array and it
fails for N=20. Other values I tried are 4, 12, 22, 24 and the test passes
for all of them. The logic for trying other values is to see if pd_linp[]
starting on un-aligned boundary can trigger the issue. But there seem to be
no correlation.

postgres=# select version();

PostgreSQL 9.5devel on x86_64-apple-darwin13.2.0, compiled by Apple LLVM
version 5.1 (clang-503.0.38) (based on LLVM 3.4svn), 64-bit

postgres=# -- test SP-GiST index that's been built incrementally

postgres=# create table test_range_spgist(ir int4range);
postgres=# create index test_range_spgist_idx on test_range_spgist using
spgist (ir);
postgres=# insert into test_range_spgist select int4range(g, g+10) from
generate_series(1,586) g;
INSERT 0 586

postgres=# SET enable_seqscan = t;
postgres=# SET enable_indexscan = f;
postgres=# SET enable_bitmapscan = f;

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);

ir
-----------
[90,100)
[500,510)
(2 rows)

postgres=# SET enable_seqscan = f;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);

ir
-----------
[90,100)
[500,510)
(2 rows)

At this point, both rows are visible via index scan as well as seq scan.

postgres=# insert into test_range_spgist select int4range(g, g+10) from
generate_series(587,587) g;
INSERT 0 1

postgres=# select * from test_range_spgist where ir -|- int4range(100,500);
ir
----------
[90,100)
(1 row)

Ouch. The second row somehow disappeared.

postgres=# SET enable_seqscan = t;
postgres=# select * from test_range_spgist where ir -|- int4range(100,500);

ir
-----------
[90,100)
[500,510)
(2 rows)

So the last INSERT suddenly makes one row disappear via the index scan
though its still reachable via seq scan. I tried looking at the SP-Gist
code but clearly I don't understand it a whole lot to figure out the issue,
if one exists.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

Is there any plug in to examine each page of spgist index?
Unfortunately pageinspect only work for btree index.

#19Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Heikki Linnakangas (#17)
Re: Bug in spg_range_quad_inner_consistent for adjacent operator (was Re: Add a filed to PageHeaderData)

On Wed, Jun 25, 2014 at 10:39 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

I came up with the attached. There were several bugs:

I tested for the original bug report and patch definitely fixes that. I
don't feel qualified enough with SP-Gist to really comment on the other
bugs you reported and presumably fixed in the patch.

Thanks,
Pavan

#20Pavan Deolasee
pavan.deolasee@gmail.com
In reply to: Pavan Deolasee (#19)
Re: Bug in spg_range_quad_inner_consistent for adjacent operator (was Re: Add a filed to PageHeaderData)

On Wed, Jul 2, 2014 at 11:11 AM, Pavan Deolasee <pavan.deolasee@gmail.com>
wrote:

On Wed, Jun 25, 2014 at 10:39 PM, Heikki Linnakangas <
hlinnakangas@vmware.com> wrote:

I came up with the attached. There were several bugs:

I tested for the original bug report and patch definitely fixes that. I
don't feel qualified enough with SP-Gist to really comment on the other
bugs you reported and presumably fixed in the patch.

Heikki, did you get chance to commit your patch? IMHO we should get the bug
fix in before minor releases next week. My apologies if you've already
committed it and I've missed the commit message.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

#21Michael Paquier
michael.paquier@gmail.com
In reply to: Pavan Deolasee (#20)
Re: Bug in spg_range_quad_inner_consistent for adjacent operator (was Re: Add a filed to PageHeaderData)

On Wed, Jul 16, 2014 at 1:34 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

Heikki, did you get chance to commit your patch? IMHO we should get the bug
fix in before minor releases next week. My apologies if you've already
committed it and I've missed the commit message.

FWIW, this patch has not been committed yet. I am not seeing any
recent update on src/backend/utils/adt/rangetypes_spgist.c.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Heikki Linnakangas
hlinnakangas@vmware.com
In reply to: Michael Paquier (#21)
Re: Bug in spg_range_quad_inner_consistent for adjacent operator (was Re: Add a filed to PageHeaderData)

On 07/16/2014 08:30 AM, Michael Paquier wrote:

On Wed, Jul 16, 2014 at 1:34 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

Heikki, did you get chance to commit your patch? IMHO we should get the bug
fix in before minor releases next week. My apologies if you've already
committed it and I've missed the commit message.

FWIW, this patch has not been committed yet. I am not seeing any
recent update on src/backend/utils/adt/rangetypes_spgist.c.

Thanks for the reminder, committed now.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers