CLUSTER loses nulls (was Re: [ADMIN] Still a bug in the VACUUM)

Started by Tom Laneabout 23 years ago5 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

[ repost, as original message of 28-Feb seems to have gotten lost ]

I said:

I was able to reproduce a problem as follows: run the tsearch regression
test, then do "cluster wowidx on test_txtidx". This appears to lose
one row:

Ahh ... it took me way too long to realize what was happening. The
problem is simply that GiST indexes do not index nulls (at least not in
the first column of an index). So if you CLUSTER, you lose any rows
that contain NULLs in the indexed column --- they're not in the index,
so they're not seen by the indexscan that copies the data over to the
new table.

Having CLUSTER lose data is obviously not acceptable :-(. I can see two
possible solutions:

* Make CLUSTER error out if the target index is not of an 'amindexnulls'
index AM. This would amount to restricting CLUSTER to b-trees, which is
annoying.

* If the index is not amindexnulls and the first target column is not
marked attnotnull, make an extra seqscan pass over the source table to
look for rows containing nulls. Copy these rows separately. This would
work but adds a good deal of overhead.

Approach #2 is even worse for functional indexes --- attnotnull is not
helpful. We'd have to actually evaluate the function at every single
row to see if it yields NULL there. Yech.

It occurs to me also that the same kind of pitfall exists for partial
indexes: cluster on a partial index, you lose. However, I don't have
a problem with simply refusing to cluster on partial indexes.

Comments? Any other ideas out there?

regards, tom lane

#2Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Tom Lane (#1)
Re: CLUSTER loses nulls (was Re: [ADMIN] Still a bug in

* Make CLUSTER error out if the target index is not of an 'amindexnulls'
index AM. This would amount to restricting CLUSTER to b-trees, which is
annoying.

I think this solution is fine - we just need to fix GiST to index nulls
one day :)

It occurs to me also that the same kind of pitfall exists for partial
indexes: cluster on a partial index, you lose. However, I don't have
a problem with simply refusing to cluster on partial indexes.

No problem with that...

Comments? Any other ideas out there?

Is there any conceivable gain in doing a CLUSTER over a tsearch index?
Surely it's basically randomly accessed?

Chris

#3Rod Taylor
rbt@rbt.ca
In reply to: Tom Lane (#1)
Re: CLUSTER loses nulls (was Re: [ADMIN] Still a bug in the VACUUM)

* Make CLUSTER error out if the target index is not of an 'amindexnulls'
index AM. This would amount to restricting CLUSTER to b-trees, which is
annoying.

I'm willing to do this:

- Reject on partial or functional indexes
- Reject when first columns attnotnull is false and amindexnulls is
false.

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Christopher Kings-Lynne (#2)
Re: CLUSTER loses nulls (was Re: [ADMIN] Still a bug in the VACUUM)

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:

* Make CLUSTER error out if the target index is not of an 'amindexnulls'
index AM. This would amount to restricting CLUSTER to b-trees, which is
annoying.

I think this solution is fine - we just need to fix GiST to index nulls
one day :)

Yeah, I think this is the appropriate stopgap for 7.3.* in any case.
If someone wants to improve it later, fine...

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rod Taylor (#3)
Re: CLUSTER loses nulls (was Re: [ADMIN] Still a bug in the VACUUM)

Rod Taylor <rbt@rbt.ca> writes:

I'm willing to do this:

I can handle it (was already working on it, in fact). I had hoped to
find someone who might want to do the more extensive fix, but erroring
out is easy enough.

- Reject on partial or functional indexes
- Reject when first columns attnotnull is false and amindexnulls is
false.

Actually I think it's:

- Reject all partial indexes
- If not amindexnulls, reject all functional indexes, also plain indexes
when first columns attnotnull is false

regards, tom lane