BUG #14155: bloom index error with unlogged table

Started by 德哥almost 10 years ago22 messageshackersbugs
Jump to latest
#1德哥
digoal@126.com
hackersbugs

The following bug has been logged on the website:

Bug reference: 14155
Logged by: Zhou Digoal
Email address: digoal@126.com
PostgreSQL version: 9.6beta1
Operating system: CentOS 6.x x64
Description:

postgres=# create table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: access method "bloom" does not exist
postgres=# create extension bloom;
CREATE EXTENSION
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
CREATE INDEX
postgres=# insert into u_tbl values (1);
INSERT 0 1
postgres=# insert into u_tbl values (2);
INSERT 0 1
postgres=# insert into u_tbl select generate_series(1,100000);
INSERT 0 100000
postgres=# explain select * from u_tbl where id=1;
QUERY PLAN

-----------------------------------------------------------------------------------
Gather (cost=0.00..0.00 rows=0 width=0)
Workers Planned: 1
Single Copy: true
-> Bitmap Heap Scan on u_tbl (cost=1045.38..1397.44 rows=565 width=4)
Recheck Cond: (id = 1)
-> Bitmap Index Scan on idx_u_tbl (cost=0.00..1045.24 rows=565
width=0)
Index Cond: (id = 1)
(7 rows)

postgres=# select * from u_tbl where id=1;
id
----
1
1
(2 rows)

postgres=# drop table u_tbl ;
DROP TABLE
postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data
postgres=# \set VERBOSITY verbose
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: XX000: index "idx_u_tbl" already contains data
LOCATION: blbuildempty, blinsert.c:164
postgres=# drop table u_tbl ;
DROP TABLE
postgres=# create temp table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
CREATE INDEX
postgres=# insert into u_tbl select generate_series(1,100000);
INSERT 0 100000
postgres=# explain select * from u_tbl where id=1;
QUERY PLAN

-----------------------------------------------------------------------------
Bitmap Heap Scan on u_tbl (cost=1045.38..1397.44 rows=565 width=4)
Recheck Cond: (id = 1)
-> Bitmap Index Scan on idx_u_tbl (cost=0.00..1045.24 rows=565
width=0)
Index Cond: (id = 1)
(4 rows)

postgres=# select * from u_tbl where id=1;
id
----
1
(1 row)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: 德哥 (#1)
hackersbugs
Re: BUG #14155: bloom index error with unlogged table

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#2)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

Moving my griping to -hackers only

On Tue, May 24, 2016 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

​I'll tack on my own gripe here, just because.

It doesn't give me a lot of confidence in what was committed when the
summary sentence for the module says:

"
bloom is a module which implements an index access method. It comes as an
example of custom access methods and generic WAL records usage. But it is
also useful in itself.
​"​

​Honestly, as a user I couldn't care less that bloom is "an example custom
access method"​. I want to know what it does and that it does so reliably,
and has a easy-to-use interface. I complained earlier about its lack of
direct support for the boolean type. Teodor's response on the thread
wasn't particularly encouraging:

/messages/by-id/5718A59D.4090706@sigaev.ru

I also see that the following -hacker thread didn't get resolved:

/messages/by-id/CAKFQuwYkrepEsELFWb0B85daT466LELY8Ao-oKPWaQPwTmgGUg@mail.gmail.com

I would not be surprised to see additional problems crop up in the module.
Tom's characterization above just reinforces that.

David J.

#4Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#2)
hackersbugs
Re: BUG #14155: bloom index error with unlogged table

On Tue, May 24, 2016 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

To be honest, I began hacking around that to address the issue, though
I don't have yet something to post... So I guess that there is nothing
I need to do? Or do you need a patch?
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#4)
hackersbugs
Re: BUG #14155: bloom index error with unlogged table

Michael Paquier <michael.paquier@gmail.com> writes:

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

To be honest, I began hacking around that to address the issue, though
I don't have yet something to post... So I guess that there is nothing
I need to do? Or do you need a patch?

Nah, I just pushed something ... but if you'd like to go over the
documentation thread David was complaining about and produce a cleaned-up
docs patch, that would be pretty helpful.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#4)
hackersbugs
Re: BUG #14155: bloom index error with unlogged table

On Tue, May 24, 2016 at 6:04 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, May 24, 2016 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

To be honest, I began hacking around that to address the issue, though
I don't have yet something to post... So I guess that there is nothing
I need to do? Or do you need a patch?

Or too late...
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Jeff Janes
jeff.janes@gmail.com
In reply to: David G. Johnston (#3)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

On May 24, 2016 5:27 PM, "David G. Johnston" <david.g.johnston@gmail.com>
wrote:

Moving my griping to -hackers only

On Tue, May 24, 2016 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

​I'll tack on my own gripe here, just because.

It doesn't give me a lot of confidence in what was committed when the

summary sentence for the module says:

"
bloom is a module which implements an index access method. It comes as an

example of custom access methods and generic WAL records usage. But it is
also useful in itself.

​"​

​Honestly, as a user I couldn't care less that bloom is "an example

custom access method"​. I want to know what it does and that it does so
reliably, and has a easy-to-use interface. I complained earlier about its
lack of direct support for the boolean type. Teodor's response on the
thread wasn't particularly encouraging:

Given what a Bloom filter is/does, I'm having a hard time seeing how it
makes much sense to support the boolean type.

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.

If that were done it would be easier to fix the documentation to be more
understandable.

On the positive side, I've done extensive crash-recovery testing (not with
unlogged tables, obviously) and that part seems solid.

Cheers,

Jeff

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#7)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

Jeff Janes <jeff.janes@gmail.com> writes:

Given what a Bloom filter is/does, I'm having a hard time seeing how it
makes much sense to support the boolean type.

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.

If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#5)
hackersbugs
Re: BUG #14155: bloom index error with unlogged table

On Tue, May 24, 2016 at 6:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

To be honest, I began hacking around that to address the issue, though
I don't have yet something to post... So I guess that there is nothing
I need to do? Or do you need a patch?

Nah, I just pushed something ... but if you'd like to go over the
documentation thread David was complaining about and produce a cleaned-up
docs patch, that would be pretty helpful.

OK. I'll get a look at this thread and see how the documentation can
be improved by replying there, though there is already a patch I
see...
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

I wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.
If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

There were no objections to this, but also no action. Attached is a draft
patch ... any complaints?

regards, tom lane

Attachments:

bloom-express-length-in-bits.patchtext/x-diff; charset=us-ascii; name=bloom-express-length-in-bits.patchDownload+123-153
#11Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#10)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Fri, Jun 3, 2016 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.
If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

There were no objections to this, but also no action. Attached is a draft
patch ... any complaints?

None. This looks rather sane to me.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Michael Paquier
michael@paquier.xyz
In reply to: Michael Paquier (#11)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Fri, Jun 3, 2016 at 3:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Jun 3, 2016 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.
If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

There were no objections to this, but also no action. Attached is a draft
patch ... any complaints?

None. This looks rather sane to me.

Actually, the docs could be more polished. "Limitation" should be
changed to "Limitations", and "Opclass interface" to "Operator Class
Interface". The current wording "Seqscan is slow" is not clear, this
should mention a sequential scan instead. And it is not that slow,
just slower than the heap index scan of bloom..
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#12)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

Michael Paquier <michael.paquier@gmail.com> writes:

Actually, the docs could be more polished.

I think the docs could stand to be rewritten from scratch ;-). But
upthread there was an offer to work on them if we made the code behavior
saner. I've done the latter part, I don't want to do the former.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Jeff Janes
jeff.janes@gmail.com
In reply to: Tom Lane (#10)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Thu, Jun 2, 2016 at 9:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.
If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

There were no objections to this, but also no action. Attached is a draft
patch ... any complaints?

One thing from the commit-message:

"On-disk, we can still store it in words, so as to not break on-disk
compatibility with beta1."

Hasn't that ship already sailed?

from beta1 to HEAD:

The database cluster was initialized with CATALOG_VERSION_NO
201605051, but the server was compiled with CATALOG_VERSION_NO
201605191.

Or is the concern about intra-version pg_upgrade rather than direct
on-disk compatibility?

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeff Janes (#14)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

Jeff Janes <jeff.janes@gmail.com> writes:

One thing from the commit-message:
"On-disk, we can still store it in words, so as to not break on-disk
compatibility with beta1."

Hasn't that ship already sailed?

No.

Or is the concern about intra-version pg_upgrade rather than direct
on-disk compatibility?

This. You can pg_upgrade across a catversion bump, but not across
changes in user table or index contents.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#13)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Fri, Jun 3, 2016 at 11:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

Actually, the docs could be more polished.

I think the docs could stand to be rewritten from scratch ;-). But
upthread there was an offer to work on them if we made the code behavior
saner. I've done the latter part, I don't want to do the former.

I have finally given a shot at improving the docs with the attached.
Comments are welcome.
--
Michael

Attachments:

bloom-docs.patchtext/x-diff; charset=US-ASCII; name=bloom-docs.patchDownload+96-99
#17David G. Johnston
david.g.johnston@gmail.com
In reply to: Michael Paquier (#16)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Tue, Jun 7, 2016 at 1:35 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Fri, Jun 3, 2016 at 11:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

Actually, the docs could be more polished.

I think the docs could stand to be rewritten from scratch ;-). But
upthread there was an offer to work on them if we made the code behavior
saner. I've done the latter part, I don't want to do the former.

I have finally given a shot at improving the docs with the attached.
Comments are welcome.

​Looks good. Thanks!​

​Some minor word-smithing​ related stuff and one definitional concern:

​"of all indexed attributes and so it can report false positives" -> of all
indexed attributes and as such is prone to reporting false positives;

​"in the set, however" -> "in the set although"

​"one only needs a single bloom index (default 80, maximum 4096)" -> ​the
default seems like it would be better placed in the first paragraph of the
intro where "whose size in calculated in bits" is mentioned; or better yet
dropped altogether since the parameters section covers the defaults.

*** "to the number of the column for" - the examples imply that each
parameter refers to columns by name, not number.

"a bloom index representing first the advantage to be more" - this intro
to the example needs some work. maybe: "Here is a more complete example of
index definition and usage, as well as a comparison with the equivalent
btree index. The bloom index is considerably smaller as well as performs
better than the btree index.

---As an aside, is a multi-column index really a fair comparison here?

---Leaving a sequential scan explain analyze in place should be considered.

​"The Bloom opclass interface" -> The Bloom opclass interface requires a
hash function for the indexing datatype and an equality operator for
searching. The example...(drop the simple conclusion the word the equality
operator part better).

​"are implemented with the module" - are supplied by this module. (side
question, for 10.0 how about we call these extensions instead of modules?)

David J.

#18David G. Johnston
david.g.johnston@gmail.com
In reply to: Michael Paquier (#16)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Tue, Jun 7, 2016 at 1:35 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Fri, Jun 3, 2016 at 11:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

Actually, the docs could be more polished.

I think the docs could stand to be rewritten from scratch ;-). But
upthread there was an offer to work on them if we made the code behavior
saner. I've done the latter part, I don't want to do the former.

I have finally given a shot at improving the docs with the attached.
Comments are welcome.

​It would be nice to give guidance on selecting a bit size for columns and
a signature length​. Yes, Wikipedia covers the topic but to get the reader
started some discussion of the relevant trade-offs when using larger
numbers than the default would be nice. I don't suspect using smaller the
default values is apt to be worthwhile...

David J.

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#18)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Tue, Jun 7, 2016 at 1:35 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

I have finally given a shot at improving the docs with the attached.
Comments are welcome.

[ assorted comments ]

I adopted most of David's suggestions, whacked it around a bit further
myself, and committed. See what you think.

​It would be nice to give guidance on selecting a bit size for columns and
a signature length​. Yes, Wikipedia covers the topic but to get the reader
started some discussion of the relevant trade-offs when using larger
numbers than the default would be nice. I don't suspect using smaller the
default values is apt to be worthwhile...

Agreed, but I didn't want to write such text myself. There's room for
further improvement here. I did add a note in the main example about
what happens with a non-default signature length, but that hardly
constitutes guidance.

BTW, it seemed to me while generating the example that the planner's
costing for bloom index searches was unduly pessimistic; maybe there's
work to do there?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Lane (#19)
hackersbugs
Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Tue, Jun 7, 2016 at 12:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Tue, Jun 7, 2016 at 1:35 AM, Michael Paquier <

michael.paquier@gmail.com>

wrote:

I have finally given a shot at improving the docs with the attached.
Comments are welcome.

[ assorted comments ]

I adopted most of David's suggestions, whacked it around a bit further
myself, and committed. See what you think.

Looks good though I'm waiting for the website to update.

Do I understand the process correctly? The current 9.6 docs reflect what
was committed and branched as beta1. Ongoing work is done against master
(devel docs). When beta2 is released it is branched from the current
master; not beta1. At that point the current devel docs will become the
new 9.6 docs.

​Thanks!​

​David J.

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#20)
hackersbugs
#22Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#19)
hackersbugs