BUG #14155: bloom index error with unlogged table

Started by Nonameover 9 years ago22 messages

digoal@126.com

over 9 years ago

The following bug has been logged on the website:

Bug reference: 14155
Logged by: Zhou Digoal
Email address: digoal@126.com
PostgreSQL version: 9.6beta1
Operating system: CentOS 6.x x64
Description:

postgres=# create table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: access method "bloom" does not exist
postgres=# create extension bloom;
CREATE EXTENSION
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
CREATE INDEX
postgres=# insert into u_tbl values (1);
INSERT 0 1
postgres=# insert into u_tbl values (2);
INSERT 0 1
postgres=# insert into u_tbl select generate_series(1,100000);
INSERT 0 100000
postgres=# explain select * from u_tbl where id=1;
QUERY PLAN

-----------------------------------------------------------------------------------
Gather (cost=0.00..0.00 rows=0 width=0)
Workers Planned: 1
Single Copy: true
-> Bitmap Heap Scan on u_tbl (cost=1045.38..1397.44 rows=565 width=4)
Recheck Cond: (id = 1)
-> Bitmap Index Scan on idx_u_tbl (cost=0.00..1045.24 rows=565
width=0)
Index Cond: (id = 1)
(7 rows)

postgres=# select * from u_tbl where id=1;
id
----
1
1
(2 rows)

postgres=# drop table u_tbl ;
DROP TABLE
postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data
postgres=# \set VERBOSITY verbose
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: XX000: index "idx_u_tbl" already contains data
LOCATION: blbuildempty, blinsert.c:164
postgres=# drop table u_tbl ;
DROP TABLE
postgres=# create temp table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
CREATE INDEX
postgres=# insert into u_tbl select generate_series(1,100000);
INSERT 0 100000
postgres=# explain select * from u_tbl where id=1;
QUERY PLAN

-----------------------------------------------------------------------------
Bitmap Heap Scan on u_tbl (cost=1045.38..1397.44 rows=565 width=4)
Recheck Cond: (id = 1)
-> Bitmap Index Scan on idx_u_tbl (cost=0.00..1045.24 rows=565
width=0)
Index Cond: (id = 1)
(4 rows)

postgres=# select * from u_tbl where id=1;
id
----
1
(1 row)

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Noname (#1)

Re: BUG #14155: bloom index error with unlogged table

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Tom Lane (#2)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

Moving my griping to -hackers only

On Tue, May 24, 2016 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

I'll tack on my own gripe here, just because.

It doesn't give me a lot of confidence in what was committed when the
summary sentence for the module says:

"
bloom is a module which implements an index access method. It comes as an
example of custom access methods and generic WAL records usage. But it is
also useful in itself.
"

Honestly, as a user I couldn't care less that bloom is "an example custom
access method". I want to know what it does and that it does so reliably,
and has a easy-to-use interface. I complained earlier about its lack of
direct support for the boolean type. Teodor's response on the thread
wasn't particularly encouraging:

/messages/by-id/5718A59D.4090706@sigaev.ru

I also see that the following -hacker thread didn't get resolved:

/messages/by-id/CAKFQuwYkrepEsELFWb0B85daT466LELY8Ao-oKPWaQPwTmgGUg@mail.gmail.com

I would not be surprised to see additional problems crop up in the module.
Tom's characterization above just reinforces that.

David J.

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Tom Lane (#2)

Re: BUG #14155: bloom index error with unlogged table

On Tue, May 24, 2016 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

To be honest, I began hacking around that to address the issue, though
I don't have yet something to post... So I guess that there is nothing
I need to do? Or do you need a patch?
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Michael Paquier (#4)

Re: BUG #14155: bloom index error with unlogged table

Michael Paquier <michael.paquier@gmail.com> writes:

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

To be honest, I began hacking around that to address the issue, though
I don't have yet something to post... So I guess that there is nothing
I need to do? Or do you need a patch?

Nah, I just pushed something ... but if you'd like to go over the
documentation thread David was complaining about and produce a cleaned-up
docs patch, that would be pretty helpful.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Michael Paquier (#4)

Re: BUG #14155: bloom index error with unlogged table

On Tue, May 24, 2016 at 6:04 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, May 24, 2016 at 5:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

To be honest, I began hacking around that to address the issue, though
I don't have yet something to post... So I guess that there is nothing
I need to do? Or do you need a patch?

Or too late...
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Jeff Janes

jeff.janes@gmail.com

over 9 years ago

In reply to: David G. Johnston (#3)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

On May 24, 2016 5:27 PM, "David G. Johnston" <david.g.johnston@gmail.com>
wrote:

Moving my griping to -hackers only

On Tue, May 24, 2016 at 8:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

digoal@126.com writes:

postgres=# create unlogged table u_tbl (id int);
CREATE TABLE
postgres=# create index idx_u_tbl on u_tbl using bloom (id);
ERROR: index "idx_u_tbl" already contains data

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

I'll tack on my own gripe here, just because.

It doesn't give me a lot of confidence in what was committed when the

summary sentence for the module says:

"
bloom is a module which implements an index access method. It comes as an

example of custom access methods and generic WAL records usage. But it is
also useful in itself.

"

Honestly, as a user I couldn't care less that bloom is "an example

custom access method". I want to know what it does and that it does so
reliably, and has a easy-to-use interface. I complained earlier about its
lack of direct support for the boolean type. Teodor's response on the
thread wasn't particularly encouraging:

Given what a Bloom filter is/does, I'm having a hard time seeing how it
makes much sense to support the boolean type.

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.

If that were done it would be easier to fix the documentation to be more
understandable.

On the positive side, I've done extensive crash-recovery testing (not with
unlogged tables, obviously) and that part seems solid.

Cheers,

Jeff

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Jeff Janes (#7)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

Jeff Janes <jeff.janes@gmail.com> writes:

Given what a Bloom filter is/does, I'm having a hard time seeing how it
makes much sense to support the boolean type.

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.

If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Tom Lane (#5)

Re: BUG #14155: bloom index error with unlogged table

On Tue, May 24, 2016 at 6:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

Yeah, it looks like nobody ever tested bloom's unlogged-index support;
it doesn't work or even come very close to working. Will fix, thanks
for the report!

To be honest, I began hacking around that to address the issue, though
I don't have yet something to post... So I guess that there is nothing
I need to do? Or do you need a patch?

Nah, I just pushed something ... but if you'd like to go over the
documentation thread David was complaining about and produce a cleaned-up
docs patch, that would be pretty helpful.

OK. I'll get a look at this thread and see how the documentation can
be improved by replying there, though there is already a patch I
see...
--
Michael

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Tom Lane (#8)

1 attachment(s)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

I wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.
If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

There were no objections to this, but also no action. Attached is a draft
patch ... any complaints?

regards, tom lane

Attachments:

bloom-express-length-in-bits.patchtext/x-diff; charset=us-ascii; name=bloom-express-length-in-bits.patchDownload

diff --git a/contrib/bloom/bloom.h b/contrib/bloom/bloom.h
index c21eebf..a9daf63 100644
*** a/contrib/bloom/bloom.h
--- b/contrib/bloom/bloom.h
*************** typedef BloomPageOpaqueData *BloomPageOp
*** 79,96 ****
  #define BLOOM_HEAD_BLKNO		(1)		/* first data page */
  
  /*
!  * Maximum of bloom signature length in uint16. Actual value
!  * is 512 bytes
   */
! #define MAX_BLOOM_LENGTH		(256)
  
  /* Bloom index options */
  typedef struct BloomOptions
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
! 	int			bloomLength;	/* length of signature in uint16 */
! 	int			bitSize[INDEX_MAX_KEYS];		/* signature bits per index
! 												 * key */
  }	BloomOptions;
  
  /*
--- 79,108 ----
  #define BLOOM_HEAD_BLKNO		(1)		/* first data page */
  
  /*
!  * We store Bloom signatures as arrays of uint16 words.
   */
! typedef uint16 BloomSignatureWord;
! 
! #define SIGNWORDBITS ((int) (BITS_PER_BYTE * sizeof(BloomSignatureWord)))
! 
! /*
!  * Default and maximum Bloom signature length in bits.
!  */
! #define DEFAULT_BLOOM_LENGTH	(5 * SIGNWORDBITS)
! #define MAX_BLOOM_LENGTH		(256 * SIGNWORDBITS)
! 
! /*
!  * Default and maximum signature bits generated per index key.
!  */
! #define DEFAULT_BLOOM_BITS		2
! #define MAX_BLOOM_BITS			(MAX_BLOOM_LENGTH - 1)
  
  /* Bloom index options */
  typedef struct BloomOptions
  {
  	int32		vl_len_;		/* varlena header (do not touch directly!) */
! 	int			bloomLength;	/* length of signature in words (not bits!) */
! 	int			bitSize[INDEX_MAX_KEYS];	/* signature bits per index key */
  }	BloomOptions;
  
  /*
*************** typedef struct BloomState
*** 143,154 ****
  /*
   * Tuples are very different from all other relations
   */
- typedef uint16 SignType;
- 
  typedef struct BloomTuple
  {
  	ItemPointerData heapPtr;
! 	SignType	sign[FLEXIBLE_ARRAY_MEMBER];
  }	BloomTuple;
  
  #define BLOOMTUPLEHDRSZ offsetof(BloomTuple, sign)
--- 155,164 ----
  /*
   * Tuples are very different from all other relations
   */
  typedef struct BloomTuple
  {
  	ItemPointerData heapPtr;
! 	BloomSignatureWord	sign[FLEXIBLE_ARRAY_MEMBER];
  }	BloomTuple;
  
  #define BLOOMTUPLEHDRSZ offsetof(BloomTuple, sign)
*************** typedef struct BloomTuple
*** 156,162 ****
  /* Opaque data structure for bloom index scan */
  typedef struct BloomScanOpaqueData
  {
! 	SignType   *sign;			/* Scan signature */
  	BloomState	state;
  }	BloomScanOpaqueData;
  
--- 166,172 ----
  /* Opaque data structure for bloom index scan */
  typedef struct BloomScanOpaqueData
  {
! 	BloomSignatureWord   *sign;			/* Scan signature */
  	BloomState	state;
  }	BloomScanOpaqueData;
  
*************** extern void BloomFillMetapage(Relation i
*** 170,176 ****
  extern void BloomInitMetapage(Relation index);
  extern void BloomInitPage(Page page, uint16 flags);
  extern Buffer BloomNewBuffer(Relation index);
! extern void signValue(BloomState * state, SignType * sign, Datum value, int attno);
  extern BloomTuple *BloomFormTuple(BloomState * state, ItemPointer iptr, Datum *values, bool *isnull);
  extern bool BloomPageAddItem(BloomState * state, Page page, BloomTuple * tuple);
  
--- 180,186 ----
  extern void BloomInitMetapage(Relation index);
  extern void BloomInitPage(Page page, uint16 flags);
  extern Buffer BloomNewBuffer(Relation index);
! extern void signValue(BloomState * state, BloomSignatureWord * sign, Datum value, int attno);
  extern BloomTuple *BloomFormTuple(BloomState * state, ItemPointer iptr, Datum *values, bool *isnull);
  extern bool BloomPageAddItem(BloomState * state, Page page, BloomTuple * tuple);
  
diff --git a/contrib/bloom/blscan.c b/contrib/bloom/blscan.c
index aebf32a..0c954dc 100644
*** a/contrib/bloom/blscan.c
--- b/contrib/bloom/blscan.c
*************** blgetbitmap(IndexScanDesc scan, TIDBitma
*** 93,99 ****
  		/* New search: have to calculate search signature */
  		ScanKey		skey = scan->keyData;
  
! 		so->sign = palloc0(sizeof(SignType) * so->state.opts.bloomLength);
  
  		for (i = 0; i < scan->numberOfKeys; i++)
  		{
--- 93,99 ----
  		/* New search: have to calculate search signature */
  		ScanKey		skey = scan->keyData;
  
! 		so->sign = palloc0(sizeof(BloomSignatureWord) * so->state.opts.bloomLength);
  
  		for (i = 0; i < scan->numberOfKeys; i++)
  		{
diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
index 4a5b343..8f13476 100644
*** a/contrib/bloom/blutils.c
--- b/contrib/bloom/blutils.c
***************
*** 27,49 ****
  
  #include "bloom.h"
  
! /* Signature dealing macros */
! #define BITSIGNTYPE (BITS_PER_BYTE * sizeof(SignType))
! #define GETWORD(x,i) ( *( (SignType*)(x) + (int)( (i) / BITSIGNTYPE ) ) )
! #define CLRBIT(x,i)   GETWORD(x,i) &= ~( 0x01 << ( (i) % BITSIGNTYPE ) )
! #define SETBIT(x,i)   GETWORD(x,i) |=  ( 0x01 << ( (i) % BITSIGNTYPE ) )
! #define GETBIT(x,i) ( (GETWORD(x,i) >> ( (i) % BITSIGNTYPE )) & 0x01 )
  
  PG_FUNCTION_INFO_V1(blhandler);
  
! /* Kind of relation optioms for bloom index */
  static relopt_kind bl_relopt_kind;
  
  static int32 myRand(void);
  static void mySrand(uint32 seed);
  
  /*
!  * Module initialize function: initilized relation options.
   */
  void
  _PG_init(void)
--- 27,52 ----
  
  #include "bloom.h"
  
! /* Signature dealing macros - note i is assumed to be of type int */
! #define GETWORD(x,i) ( *( (BloomSignatureWord *)(x) + ( (i) / SIGNWORDBITS ) ) )
! #define CLRBIT(x,i)   GETWORD(x,i) &= ~( 0x01 << ( (i) % SIGNWORDBITS ) )
! #define SETBIT(x,i)   GETWORD(x,i) |=  ( 0x01 << ( (i) % SIGNWORDBITS ) )
! #define GETBIT(x,i) ( (GETWORD(x,i) >> ( (i) % SIGNWORDBITS )) & 0x01 )
  
  PG_FUNCTION_INFO_V1(blhandler);
  
! /* Kind of relation options for bloom index */
  static relopt_kind bl_relopt_kind;
+ /* parse table for fillRelOptions */
+ static relopt_parse_elt bl_relopt_tab[INDEX_MAX_KEYS + 1];
  
  static int32 myRand(void);
  static void mySrand(uint32 seed);
  
  /*
!  * Module initialize function: initialize info about Bloom relation options.
!  *
!  * Note: keep this in sync with makeDefaultBloomOptions().
   */
  void
  _PG_init(void)
*************** _PG_init(void)
*** 53,70 ****
  
  	bl_relopt_kind = add_reloption_kind();
  
  	add_int_reloption(bl_relopt_kind, "length",
! 					  "Length of signature in uint16 type", 5, 1, 256);
  
  	for (i = 0; i < INDEX_MAX_KEYS; i++)
  	{
! 		snprintf(buf, 16, "col%d", i + 1);
  		add_int_reloption(bl_relopt_kind, buf,
! 					  "Number of bits for corresponding column", 2, 1, 2048);
  	}
  }
  
  /*
   * Bloom handler function: return IndexAmRoutine with access method parameters
   * and callbacks.
   */
--- 56,101 ----
  
  	bl_relopt_kind = add_reloption_kind();
  
+ 	/* Option for length of signature */
  	add_int_reloption(bl_relopt_kind, "length",
! 					  "Length of signature in bits",
! 					  DEFAULT_BLOOM_LENGTH, 1, MAX_BLOOM_LENGTH);
! 	bl_relopt_tab[0].optname = "length";
! 	bl_relopt_tab[0].opttype = RELOPT_TYPE_INT;
! 	bl_relopt_tab[0].offset = offsetof(BloomOptions, bloomLength);
  
+ 	/* Number of bits for each possible index column: col1, col2, ... */
  	for (i = 0; i < INDEX_MAX_KEYS; i++)
  	{
! 		snprintf(buf, sizeof(buf), "col%d", i + 1);
  		add_int_reloption(bl_relopt_kind, buf,
! 						  "Number of bits for corresponding column",
! 						  DEFAULT_BLOOM_BITS, 1, MAX_BLOOM_BITS);
! 		bl_relopt_tab[i + 1].optname = MemoryContextStrdup(TopMemoryContext,
! 														   buf);
! 		bl_relopt_tab[i + 1].opttype = RELOPT_TYPE_INT;
! 		bl_relopt_tab[i + 1].offset = offsetof(BloomOptions, bitSize[i]);
  	}
  }
  
  /*
+  * Construct a default set of Bloom options.
+  */
+ static BloomOptions *
+ makeDefaultBloomOptions(void)
+ {
+ 	BloomOptions *opts;
+ 	int				i;
+ 
+ 	opts = (BloomOptions *) palloc0(sizeof(BloomOptions));
+ 	opts->bloomLength = (DEFAULT_BLOOM_LENGTH + SIGNWORDBITS - 1) / SIGNWORDBITS;
+ 	for (i = 0; i < INDEX_MAX_KEYS; i++)
+ 		opts->bitSize[i] = DEFAULT_BLOOM_BITS;
+ 	SET_VARSIZE(opts, sizeof(BloomOptions));
+ 	return opts;
+ }
+ 
+ /*
   * Bloom handler function: return IndexAmRoutine with access method parameters
   * and callbacks.
   */
*************** initBloomState(BloomState *state, Relati
*** 157,163 ****
  
  	memcpy(&state->opts, index->rd_amcache, sizeof(state->opts));
  	state->sizeOfBloomTuple = BLOOMTUPLEHDRSZ +
! 		sizeof(SignType) * state->opts.bloomLength;
  }
  
  /*
--- 188,194 ----
  
  	memcpy(&state->opts, index->rd_amcache, sizeof(state->opts));
  	state->sizeOfBloomTuple = BLOOMTUPLEHDRSZ +
! 		sizeof(BloomSignatureWord) * state->opts.bloomLength;
  }
  
  /*
*************** mySrand(uint32 seed)
*** 208,214 ****
   * Add bits of given value to the signature.
   */
  void
! signValue(BloomState *state, SignType *sign, Datum value, int attno)
  {
  	uint32		hashVal;
  	int			nBit,
--- 239,245 ----
   * Add bits of given value to the signature.
   */
  void
! signValue(BloomState *state, BloomSignatureWord *sign, Datum value, int attno)
  {
  	uint32		hashVal;
  	int			nBit,
*************** signValue(BloomState *state, SignType *s
*** 231,238 ****
  
  	for (j = 0; j < state->opts.bitSize[attno]; j++)
  	{
! 		/* prevent mutiple evaluation */
! 		nBit = myRand() % (state->opts.bloomLength * BITSIGNTYPE);
  		SETBIT(sign, nBit);
  	}
  }
--- 262,269 ----
  
  	for (j = 0; j < state->opts.bitSize[attno]; j++)
  	{
! 		/* prevent multiple evaluation in SETBIT macro */
! 		nBit = myRand() % (state->opts.bloomLength * SIGNWORDBITS);
  		SETBIT(sign, nBit);
  	}
  }
*************** BloomInitPage(Page page, uint16 flags)
*** 362,400 ****
  }
  
  /*
-  * Adjust options of bloom index.
-  *
-  * This must produce default options when *opts is initially all-zero.
-  */
- static void
- adjustBloomOptions(BloomOptions *opts)
- {
- 	int				i;
- 
- 	/* Default length of bloom filter is 5 of 16-bit integers */
- 	if (opts->bloomLength <= 0)
- 		opts->bloomLength = 5;
- 	else if (opts->bloomLength > MAX_BLOOM_LENGTH)
- 		ereport(ERROR,
- 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- 				 errmsg("length of bloom signature (%d) is greater than maximum %d",
- 						opts->bloomLength, MAX_BLOOM_LENGTH)));
- 
- 	/* Check signature length */
- 	for (i = 0; i < INDEX_MAX_KEYS; i++)
- 	{
- 		/*
- 		 * Zero and negative number of bits is meaningless.  Also setting
- 		 * more bits than signature have seems useless.  Replace both cases
- 		 * with 2 bits default.
- 		 */
- 		if (opts->bitSize[i] <= 0
- 			|| opts->bitSize[i] >= opts->bloomLength * sizeof(SignType) * BITS_PER_BYTE)
- 			opts->bitSize[i] = 2;
- 	}
- }
- 
- /*
   * Fill in metapage for bloom index.
   */
  void
--- 393,398 ----
*************** BloomFillMetapage(Relation index, Page m
*** 405,418 ****
  
  	/*
  	 * Choose the index's options.  If reloptions have been assigned, use
! 	 * those, otherwise create default options by applying adjustBloomOptions
! 	 * to a zeroed chunk of memory.  We apply adjustBloomOptions to existing
! 	 * reloptions too, just out of paranoia; they should be valid already.
  	 */
  	opts = (BloomOptions *) index->rd_options;
  	if (!opts)
! 		opts = (BloomOptions *) palloc0(sizeof(BloomOptions));
! 	adjustBloomOptions(opts);
  
  	/*
  	 * Initialize contents of meta page, including a copy of the options,
--- 403,413 ----
  
  	/*
  	 * Choose the index's options.  If reloptions have been assigned, use
! 	 * those, otherwise create default options.
  	 */
  	opts = (BloomOptions *) index->rd_options;
  	if (!opts)
! 		opts = makeDefaultBloomOptions();
  
  	/*
  	 * Initialize contents of meta page, including a copy of the options,
*************** bloptions(Datum reloptions, bool validat
*** 462,491 ****
  	relopt_value *options;
  	int			numoptions;
  	BloomOptions *rdopts;
- 	relopt_parse_elt tab[INDEX_MAX_KEYS + 1];
- 	int			i;
- 	char		buf[16];
- 
- 	/* Option for length of signature */
- 	tab[0].optname = "length";
- 	tab[0].opttype = RELOPT_TYPE_INT;
- 	tab[0].offset = offsetof(BloomOptions, bloomLength);
- 
- 	/* Number of bits for each of possible columns: col1, col2, ... */
- 	for (i = 0; i < INDEX_MAX_KEYS; i++)
- 	{
- 		snprintf(buf, sizeof(buf), "col%d", i + 1);
- 		tab[i + 1].optname = pstrdup(buf);
- 		tab[i + 1].opttype = RELOPT_TYPE_INT;
- 		tab[i + 1].offset = offsetof(BloomOptions, bitSize[i]);
- 	}
  
  	options = parseRelOptions(reloptions, validate, bl_relopt_kind, &numoptions);
  	rdopts = allocateReloptStruct(sizeof(BloomOptions), options, numoptions);
  	fillRelOptions((void *) rdopts, sizeof(BloomOptions), options, numoptions,
! 				   validate, tab, INDEX_MAX_KEYS + 1);
  
! 	adjustBloomOptions(rdopts);
  
  	return (bytea *) rdopts;
  }
--- 457,471 ----
  	relopt_value *options;
  	int			numoptions;
  	BloomOptions *rdopts;
  
+ 	/* Parse the user-given reloptions */
  	options = parseRelOptions(reloptions, validate, bl_relopt_kind, &numoptions);
  	rdopts = allocateReloptStruct(sizeof(BloomOptions), options, numoptions);
  	fillRelOptions((void *) rdopts, sizeof(BloomOptions), options, numoptions,
! 				   validate, bl_relopt_tab, lengthof(bl_relopt_tab));
  
! 	/* Convert signature length to words, rounding up */
! 	rdopts->bloomLength = (rdopts->bloomLength + SIGNWORDBITS - 1) / SIGNWORDBITS;
  
  	return (bytea *) rdopts;
  }
diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml
index 49cb066..abf30e7 100644
*** a/doc/src/sgml/bloom.sgml
--- b/doc/src/sgml/bloom.sgml
***************
*** 8,15 ****
   </indexterm>
  
   <para>
!   <literal>bloom</> is a module which implements an index access method.  It comes
!   as an example of custom access methods and generic WAL records usage.  But it
    is also useful in itself.
   </para>
  
--- 8,15 ----
   </indexterm>
  
   <para>
!   <literal>bloom</> is a module that implements an index access method.  It comes
!   as an example of custom access methods and generic WAL record usage.  But it
    is also useful in itself.
   </para>
  
***************
*** 22,29 ****
     allows fast exclusion of non-candidate tuples via signatures.
     Since a signature is a lossy representation of all indexed attributes,
     search results must be rechecked using heap information.
!    The user can specify signature length (in uint16, default is 5) and the
!    number of bits, which can be set per attribute (1 < colN < 2048).
    </para>
  
    <para>
--- 22,30 ----
     allows fast exclusion of non-candidate tuples via signatures.
     Since a signature is a lossy representation of all indexed attributes,
     search results must be rechecked using heap information.
!    The user can specify signature length in bits (default 80, maximum 4096)
!    and the number of bits generated for each index column (default 2,
!    maximum 4095).
    </para>
  
    <para>
***************
*** 51,64 ****
      <term><literal>length</></term>
      <listitem>
       <para>
!       Length of signature in uint16 type values
       </para>
      </listitem>
     </varlistentry>
     </variablelist>
     <variablelist>
     <varlistentry>
!     <term><literal>col1 &mdash; col16</></term>
      <listitem>
       <para>
        Number of bits for corresponding column
--- 52,65 ----
      <term><literal>length</></term>
      <listitem>
       <para>
!       Length of signature in bits
       </para>
      </listitem>
     </varlistentry>
     </variablelist>
     <variablelist>
     <varlistentry>
!     <term><literal>col1 &mdash; col32</></term>
      <listitem>
       <para>
        Number of bits for corresponding column
***************
*** 77,88 ****
  
  <programlisting>
  CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
!        WITH (length=5, col1=2, col2=2, col3=4);
  </programlisting>
  
    <para>
     Here, we created a bloom index with a signature length of 80 bits,
!    and attributes i1 and i2 mapped to 2 bits, and attribute i3 to 4 bits.
    </para>
  
    <para>
--- 78,89 ----
  
  <programlisting>
  CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
!        WITH (length=80, col1=2, col2=2, col3=4);
  </programlisting>
  
    <para>
     Here, we created a bloom index with a signature length of 80 bits,
!    and attributes i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits.
    </para>
  
    <para>

#11

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Tom Lane (#10)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Fri, Jun 3, 2016 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.
If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

There were no objections to this, but also no action. Attached is a draft
patch ... any complaints?

None. This looks rather sane to me.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Michael Paquier (#11)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Fri, Jun 3, 2016 at 3:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Jun 3, 2016 at 1:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.
If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

There were no objections to this, but also no action. Attached is a draft
patch ... any complaints?

None. This looks rather sane to me.

Actually, the docs could be more polished. "Limitation" should be
changed to "Limitations", and "Opclass interface" to "Operator Class
Interface". The current wording "Seqscan is slow" is not clear, this
should mention a sequential scan instead. And it is not that slow,
just slower than the heap index scan of bloom..
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Michael Paquier (#12)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

Michael Paquier <michael.paquier@gmail.com> writes:

Actually, the docs could be more polished.

I think the docs could stand to be rewritten from scratch ;-). But
upthread there was an offer to work on them if we made the code behavior
saner. I've done the latter part, I don't want to do the former.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Jeff Janes

jeff.janes@gmail.com

over 9 years ago

In reply to: Tom Lane (#10)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Thu, Jun 2, 2016 at 9:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Jeff Janes <jeff.janes@gmail.com> writes:

My biggest gripe with it at the moment is that the signature size should be
expressed in bits, and then internally rounded up to a multiple of 16,
rather than having it be expressed in 'uint16'.
If that were done it would be easier to fix the documentation to be more
understandable.

+1 ... that sort of definition seems much more future-proof, too.
IMO it's not too late to change this. (We probably don't want to change
the on-disk representation of the reloptions, but we could convert from
bits to words in bloptions().)

There were no objections to this, but also no action. Attached is a draft
patch ... any complaints?

One thing from the commit-message:

"On-disk, we can still store it in words, so as to not break on-disk
compatibility with beta1."

Hasn't that ship already sailed?

from beta1 to HEAD:

The database cluster was initialized with CATALOG_VERSION_NO
201605051, but the server was compiled with CATALOG_VERSION_NO
201605191.

Or is the concern about intra-version pg_upgrade rather than direct
on-disk compatibility?

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Jeff Janes (#14)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

Jeff Janes <jeff.janes@gmail.com> writes:

One thing from the commit-message:
"On-disk, we can still store it in words, so as to not break on-disk
compatibility with beta1."

Hasn't that ship already sailed?

No.

Or is the concern about intra-version pg_upgrade rather than direct
on-disk compatibility?

This. You can pg_upgrade across a catversion bump, but not across
changes in user table or index contents.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Tom Lane (#13)

1 attachment(s)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Fri, Jun 3, 2016 at 11:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

Actually, the docs could be more polished.

I think the docs could stand to be rewritten from scratch ;-). But
upthread there was an offer to work on them if we made the code behavior
saner. I've done the latter part, I don't want to do the former.

I have finally given a shot at improving the docs with the attached.
Comments are welcome.
--
Michael

Attachments:

bloom-docs.patchtext/x-diff; charset=US-ASCII; name=bloom-docs.patchDownload

diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml
index 8667763..c1e204f 100644
--- a/doc/src/sgml/bloom.sgml
+++ b/doc/src/sgml/bloom.sgml
@@ -8,43 +8,38 @@
  </indexterm>
 
  <para>
-  <literal>bloom</> is a module that implements an index access method.  It comes
-  as an example of custom access methods and generic WAL record usage.  But it
-  is also useful in itself.
+  <literal>bloom</> provides an index access method usable as a
+  <ulink url="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filter</ulink>.
  </para>
 
- <sect2>
-  <title>Introduction</title>
-
-  <para>
-   The implementation of a
-   <ulink url="http://en.wikipedia.org/wiki/Bloom_filter">Bloom filter</ulink>
-   allows fast exclusion of non-candidate tuples via signatures.
-   Since a signature is a lossy representation of all indexed attributes,
-   search results must be rechecked using heap information.
-   The user can specify signature length in bits (default 80, maximum 4096)
-   and the number of bits generated for each index column (default 2,
-   maximum 4095).
-  </para>
+ <para>
+  A bloom filter is a space-efficient data structure that is used to test
+  whether an element is a member of a set.  In the case of an index access
+  method, it allows fast exclusion of non-candidate tuples via signatures
+  whose size is calculated in bits and is controllable at index creation.
+ </para>
 
-  <para>
-   This index is useful if a table has many attributes and queries include
-   arbitrary combinations of them.  A traditional <literal>btree</> index is
-   faster than a bloom index, but it can require many indexes to support all
-   possible queries where one needs only a single bloom index.  A Bloom index
-   supports only equality comparison.  Since it's a signature file, and not a
-   tree, it always must be read fully, but sequentially, so that index search
-   performance is constant and doesn't depend on a query.
-  </para>
- </sect2>
+ <para>
+  A signature is a lossy representation of all indexed attributes and
+  so it can report false positives (it is reported that an element exists in
+  the set, however it does not), so search results must be rechecked using
+  heap information.
+ </para>
+ 
+ <para>
+  This type of index is useful if a table has many attributes and queries
+  include arbitrary combinations of them.  A traditional <literal>btree</>
+  index is faster than a bloom index, but it can require many indexes to
+  support all possible queries where one needs only a single bloom index
+  (default 80, maximum 4096).
+ </para>
 
  <sect2>
   <title>Parameters</title>
 
   <para>
-   <literal>bloom</> indexes accept the following parameters in the
-   <literal>WITH</>
-   clause.
+   A <literal>bloom</> index accepts the following parameters in its
+   <literal>WITH</> clause.
   </para>
 
    <variablelist>
@@ -52,7 +47,8 @@
     <term><literal>length</></term>
     <listitem>
      <para>
-      Length of signature in bits
+      Length of signature in bits. Default is <literal>80</> and maximum is
+      <literal>4096</>.
      </para>
     </listitem>
    </varlistentry>
@@ -62,7 +58,9 @@
     <term><literal>col1 &mdash; col32</></term>
     <listitem>
      <para>
-      Number of bits generated for each index column
+      Number of bits generated for each index column. Each parameter refers
+      to the number of the column for the relation on which an index is
+      defined.
      </para>
     </listitem>
    </varlistentry>
@@ -82,91 +80,88 @@ CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
 </programlisting>
 
   <para>
-   Here, we created a bloom index with a signature length of 80 bits,
-   and attributes i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits.
+   Here, a bloom index is created with a signature length of 80 bits, and
+   attributes i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits.
   </para>
 
   <para>
-   Here is a fuller example of index definition and usage:
+   Here is a more complete example of index definition and usage, a bloom
+   index  representing first the advantage to be more space-efficient
+   than an equivalent btree index:
   </para>
 
 <programlisting>
-CREATE TABLE tbloom AS
-SELECT
-    random()::int as i1,
-    random()::int as i2,
-    random()::int as i3,
-    random()::int as i4,
-    random()::int as i5,
-    random()::int as i6,
-    random()::int as i7,
-    random()::int as i8,
-    random()::int as i9,
-    random()::int as i10,
-    random()::int as i11,
-    random()::int as i12,
-    random()::int as i13
-FROM
-    generate_series(1,1000);
-CREATE INDEX bloomidx ON tbloom USING
-             bloom (i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12);
-SELECT pg_relation_size('bloomidx');
-CREATE index btree_idx ON tbloom(i1,i2,i3,i4,i5,i6,i7,i8,i9,i10,i11,i12);
-SELECT pg_relation_size('btree_idx');
-</programlisting>
-
-<programlisting>
-=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 20 AND i10 = 15;
-                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------
- Bitmap Heap Scan on tbloom  (cost=1.50..5.52 rows=1 width=52) (actual time=0.057..0.057 rows=0 loops=1)
-   Recheck Cond: ((i2 = 20) AND (i10 = 15))
-   ->  Bitmap Index Scan on bloomidx  (cost=0.00..1.50 rows=1 width=0) (actual time=0.041..0.041 rows=9 loops=1)
-         Index Cond: ((i2 = 20) AND (i10 = 15))
- Total runtime: 0.081 ms
-(5 rows)
+=# CREATE TABLE tbloom AS
+   SELECT
+     (random() * 1000000)::int as i1,
+     (random() * 1000000)::int as i2,
+     (random() * 1000000)::int as i3,
+     (random() * 1000000)::int as i4,
+     (random() * 1000000)::int as i5,
+     (random() * 1000000)::int as i6
+   FROM
+  generate_series(1,10000000);
+SELECT 10000000
+=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
+CREATE INDEX
+=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
+ pg_size_pretty
+----------------
+ 153 MB
+(1 row)
+=# CREATE index btree_idx ON tbloom(i1,i2,i3,i4,i5,i6);
+CREATE INDEX
+=# SELECT pg_size_pretty(pg_relation_size('btree_idx'));
+ pg_size_pretty
+----------------
+ 387 MB
+(1 row)
 </programlisting>
 
   <para>
-   Seqscan is slow.
+   A sequential scan takes a long time when concatenating multiple quals
+   with <literal>AND</literal>, in which case btree indexes are usually
+   selected by the planner:
   </para>
-
 <programlisting>
-=# SET enable_bitmapscan = off;
-=# SET enable_indexscan = off;
-=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 20 AND i10 = 15;
-                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
- Seq Scan on tbloom  (cost=0.00..25.00 rows=1 width=52) (actual time=0.162..0.162 rows=0 loops=1)
-   Filter: ((i2 = 20) AND (i10 = 15))
- Total runtime: 0.181 ms
-(3 rows)
+ =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
+                                                           QUERY PLAN
+---------------------------------------------------------------------------------------------------------------------------------
+ Index Only Scan using btree_idx on tbloom  (cost=0.56..298315.96 rows=1 width=24) (actual time=669.284..669.284 rows=0 loops=1)
+   Index Cond: ((i2 = 898732) AND (i5 = 123451))
+   Heap Fetches: 0
+ Planning time: 0.159 ms
+ Execution time: 669.330 ms
+ (5 rows)
 </programlisting>
 
- <para>
-  A btree index will be not used for this query.
- </para>
+  <para>
+   Bloom is better than btree in handling this kind of qual accumulation:
+  </para>
 
 <programlisting>
-=# DROP INDEX bloomidx;
-=# CREATE INDEX btree_idx ON tbloom(i1, i2, i3, i4, i5, i6, i7, i8, i9, i10, i11, i12);
-=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 20 AND i10 = 15;
-                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
- Seq Scan on tbloom (cost=0.00..25.00 rows=1 width=52) (actual time=0.210..0.210 rows=0 loops=1)
-   Filter: ((i2 = 20) AND (i10 = 15))
- Total runtime: 0.250 ms
-(3 rows)
+ =# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
+                                                         QUERY PLAN
+-----------------------------------------------------------------------------------------------------------------------------
+ Bitmap Heap Scan on tbloom  (cost=178436.49..178440.51 rows=1 width=24) (actual time=125.279..125.279 rows=0 loops=1)
+   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
+   Rows Removed by Index Recheck: 2391
+   Heap Blocks: exact=2340
+   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178436.49 rows=1 width=0) (actual time=109.270..109.270 rows=2391 loops=1)
+         Index Cond: ((i2 = 898732) AND (i5 = 123451))
+ Planning time: 0.490 ms
+ Execution time: 125.377 ms
+(8 rows)
 </programlisting>
  </sect2>
 
  <sect2>
-  <title>Opclass interface</title>
+  <title>Operator Class Interface</title>
 
   <para>
-   The Bloom opclass interface is simple.  It requires 1 supporting function:
-   a hash function for the indexing datatype.  It provides 1 search operator:
-   the equality operator.  The example below shows <literal>opclass</>
+   The Bloom opclass interface is simple as it requires one hash
+   function for the indexing datatype and it needs one equality operator
+   as search operator. The example below shows <literal>opclass</>
    definition for <literal>text</> datatype.
   </para>
 
@@ -179,22 +174,21 @@ DEFAULT FOR TYPE text USING bloom AS
  </sect2>
 
  <sect2>
-  <title>Limitation</title>
+  <title>Limitations</title>
   <para>
-
    <itemizedlist>
     <listitem>
      <para>
-      For now, only opclasses for <literal>int4</>, <literal>text</> come
-      with the module.  However, users may define more of them.
+      Only operator classes for <literal>int4</> and <literal>text</> are
+      implemented with the module.
      </para>
     </listitem>
 
     <listitem>
      <para>
-      Only the <literal>=</literal> operator is supported for search at the
-      moment.  But it's possible to add support for arrays with contains and
-      intersection operations in the future.
+      Only the <literal>=</literal> operator is supported for search.  But
+      it is possible to add support for arrays with union and intersection
+      operations in the future.
      </para>
     </listitem>
    </itemizedlist>

#17

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Michael Paquier (#16)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Tue, Jun 7, 2016 at 1:35 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Fri, Jun 3, 2016 at 11:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

Actually, the docs could be more polished.

I think the docs could stand to be rewritten from scratch ;-). But
upthread there was an offer to work on them if we made the code behavior
saner. I've done the latter part, I don't want to do the former.

I have finally given a shot at improving the docs with the attached.
Comments are welcome.

Looks good. Thanks!

Some minor word-smithing related stuff and one definitional concern:

"of all indexed attributes and so it can report false positives" -> of all
indexed attributes and as such is prone to reporting false positives;

"in the set, however" -> "in the set although"

"one only needs a single bloom index (default 80, maximum 4096)" -> the
default seems like it would be better placed in the first paragraph of the
intro where "whose size in calculated in bits" is mentioned; or better yet
dropped altogether since the parameters section covers the defaults.

*** "to the number of the column for" - the examples imply that each
parameter refers to columns by name, not number.

"a bloom index representing first the advantage to be more" - this intro
to the example needs some work. maybe: "Here is a more complete example of
index definition and usage, as well as a comparison with the equivalent
btree index. The bloom index is considerably smaller as well as performs
better than the btree index.

---As an aside, is a multi-column index really a fair comparison here?

---Leaving a sequential scan explain analyze in place should be considered.

"The Bloom opclass interface" -> The Bloom opclass interface requires a
hash function for the indexing datatype and an equality operator for
searching. The example...(drop the simple conclusion the word the equality
operator part better).

"are implemented with the module" - are supplied by this module. (side
question, for 10.0 how about we call these extensions instead of modules?)

David J.

#18

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Michael Paquier (#16)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Tue, Jun 7, 2016 at 1:35 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Fri, Jun 3, 2016 at 11:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

Actually, the docs could be more polished.

I think the docs could stand to be rewritten from scratch ;-). But
upthread there was an offer to work on them if we made the code behavior
saner. I've done the latter part, I don't want to do the former.

I have finally given a shot at improving the docs with the attached.
Comments are welcome.

It would be nice to give guidance on selecting a bit size for columns and
a signature length. Yes, Wikipedia covers the topic but to get the reader
started some discussion of the relevant trade-offs when using larger
numbers than the default would be nice. I don't suspect using smaller the
default values is apt to be worthwhile...

David J.

#19

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: David G. Johnston (#18)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Tue, Jun 7, 2016 at 1:35 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

I have finally given a shot at improving the docs with the attached.
Comments are welcome.

[ assorted comments ]

I adopted most of David's suggestions, whacked it around a bit further
myself, and committed. See what you think.

It would be nice to give guidance on selecting a bit size for columns and
a signature length. Yes, Wikipedia covers the topic but to get the reader
started some discussion of the relevant trade-offs when using larger
numbers than the default would be nice. I don't suspect using smaller the
default values is apt to be worthwhile...

Agreed, but I didn't want to write such text myself. There's room for
further improvement here. I did add a note in the main example about
what happens with a non-default signature length, but that hardly
constitutes guidance.

BTW, it seemed to me while generating the example that the planner's
costing for bloom index searches was unduly pessimistic; maybe there's
work to do there?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Tom Lane (#19)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Tue, Jun 7, 2016 at 12:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Tue, Jun 7, 2016 at 1:35 AM, Michael Paquier <

michael.paquier@gmail.com>

wrote:

I have finally given a shot at improving the docs with the attached.
Comments are welcome.

[ assorted comments ]

I adopted most of David's suggestions, whacked it around a bit further
myself, and committed. See what you think.

Looks good though I'm waiting for the website to update.

Do I understand the process correctly? The current 9.6 docs reflect what
was committed and branched as beta1. Ongoing work is done against master
(devel docs). When beta2 is released it is branched from the current
master; not beta1. At that point the current devel docs will become the
new 9.6 docs.

Thanks!

David J.

#21

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: David G. Johnston (#20)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

"David G. Johnston" <david.g.johnston@gmail.com> writes:

Do I understand the process correctly? The current 9.6 docs reflect what
was committed and branched as beta1. Ongoing work is done against master
(devel docs). When beta2 is released it is branched from the current
master; not beta1. At that point the current devel docs will become the
new 9.6 docs.

There is no beta1 branch. 9.6 is still the master branch, and beta2 will
be stamped on master, as will at least the next couple of betas. We will
branch off REL9_6_STABLE whenever we're ready to open the tree for
9.7^H^H^H10 development, which likely won't be till September. (The
point here is to try to keep people focused on stabilizing 9.6 for
awhile longer yet. Last year we opened new development in the summer,
and that had a detrimental effect on getting 9.5 out the door.)

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Tom Lane (#19)

Re: [BUGS] BUG #14155: bloom index error with unlogged table

On Wed, Jun 8, 2016 at 1:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Tue, Jun 7, 2016 at 1:35 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

I have finally given a shot at improving the docs with the attached.
Comments are welcome.

[ assorted comments ]

I adopted most of David's suggestions, whacked it around a bit further
myself, and committed. See what you think.

That looks better, thanks.

It would be nice to give guidance on selecting a bit size for columns and
a signature length. Yes, Wikipedia covers the topic but to get the reader
started some discussion of the relevant trade-offs when using larger
numbers than the default would be nice. I don't suspect using smaller the
default values is apt to be worthwhile...

Agreed, but I didn't want to write such text myself. There's room for
further improvement here. I did add a note in the main example about
what happens with a non-default signature length, but that hardly
constitutes guidance.

BTW, it seemed to me while generating the example that the planner's
costing for bloom index searches was unduly pessimistic; maybe there's
work to do there?

I wanted them to do so to prove that index rechecks are necessary as
false positives can be returned when scanning the index. We could add
an extra example with an index that has a longer signature size... I
am not sure that's worth the complication though.

I am marking this item as closed, in my view things are looking far better.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers