Typos/Questions in bloom documentation

Started by David G. Johnston over 9 years ago, 7 messages
#1 David G. Johnston
david.g.johnston@gmail.com

http://www.postgresql.org/docs/devel/static/bloom.html

F.4.3 Examples

Claims that the signature length is 80 bits - shouldn't it be 8?

Also, is it OK to link to wikipedia in our documentation? (the link to
bloom filter in the second paragraph)

F.4.4 "Opclass interface"

The "I" should be capitalized in a proper title

F.4.5 Limitation

Should be plural

Other:

The lack of a boolean built-in seems odd. Can that be added easily? If
not could a user do it themselves without resorting to C code?

Recent post on -performance inspires the last question.

/messages/by-id/CANcrS5pR1P1Tj=e-RQQ=FF3WPAy_fyruS0YJer-+iJHxR1JAiA@mail.gmail.com

David J.

#2 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David G. Johnston (#1)
1 attachment(s)
Re: Typos/Questions in bloom documentation

On 2016/04/21 6:51, David G. Johnston wrote:
> http://www.postgresql.org/docs/devel/static/bloom.html
>
> F.4.3 Examples
>
> Claims that the signature length is 80 bits - shouldn't it be 8?

In F.4.1. Introduction:

... The user can specify signature length (in uint16, default is 5)

So, it seems right to me.

> Also, is it OK to link to wikipedia in our documentation? (the link to
> bloom filter in the second paragraph)

grep wikipedia doc reveals at least some hits:

doc/src/sgml/release.sgml:26
doc/src/sgml/isn.sgml:361
doc/src/sgml/isn.sgml:367
doc/src/sgml/isn.sgml:369
doc/src/sgml/textsearch.sgml:2774
doc/src/sgml/bloom.sgml:21
doc/src/sgml/monitoring.sgml:2728
doc/src/sgml/pgcrypto.sgml:1289
doc/src/sgml/pgcrypto.sgml:1351

And then some:

doc/src/sgml/acronyms.sgml:16
doc/src/sgml/acronyms.sgml:26
doc/src/sgml/acronyms.sgml:35
doc/src/sgml/acronyms.sgml:54
...

> F.4.4 "Opclass interface"
>
> The "I" should be capitalized in a proper title
>
> F.4.5 Limitation
>
> Should be plural

Attached is a patch for these fixes.

Thanks,
Amit

Attachments:

bloom-doc-typos.patch (text/x-diff)
diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml
index 7349095..d0cf317 100644
--- a/doc/src/sgml/bloom.sgml
+++ b/doc/src/sgml/bloom.sgml
@@ -160,7 +160,7 @@ SELECT pg_relation_size('btree_idx');
  </sect2>
 
  <sect2>
-  <title>Opclass interface</title>
+  <title>Opclass Interface</title>
 
   <para>
    The Bloom opclass interface is simple.  It requires 1 supporting function:
@@ -178,7 +178,7 @@ DEFAULT FOR TYPE text USING bloom AS
  </sect2>
 
  <sect2>
-  <title>Limitation</title>
+  <title>Limitations</title>
   <para>
 
    <itemizedlist>
#3 David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Langote (#2)
Re: Typos/Questions in bloom documentation

On Wednesday, April 20, 2016, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2016/04/21 6:51, David G. Johnston wrote:
>> http://www.postgresql.org/docs/devel/static/bloom.html
>>
>> F.4.3 Examples
>>
>> Claims that the signature length is 80 bits - shouldn't it be 8?
>
> In F.4.1. Introduction:
>
> ... The user can specify signature length (in uint16, default is 5)
>
> So, it seems right to me.

Great. Maybe you can consider re-wording it so others can understand. I
have no clue how 80 bits is determined. The phrase you quote is obtuse to
the casual user as well. If that means 16x5=80 irrespective of columns, it
is not clear.

This may be a function of this not being considered user-space code but
something to exercise tests. But if we are going to publish it as an
extension, it seems worth helping people decide when and how to use
it. The docs as written fail to do that, and reading the Wikipedia page
doesn't cut it either.

David J.

#4 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: David G. Johnston (#3)
1 attachment(s)
Re: Typos/Questions in bloom documentation

On 2016/04/21 11:19, David G. Johnston wrote:
> On Wednesday, April 20, 2016, Amit Langote wrote:
>> On 2016/04/21 6:51, David G. Johnston wrote:
>>> http://www.postgresql.org/docs/devel/static/bloom.html
>>>
>>> F.4.3 Examples
>>>
>>> Claims that the signature length is 80 bits - shouldn't it be 8?
>>
>> In F.4.1. Introduction:
>>
>> ... The user can specify signature length (in uint16, default is 5)
>>
>> So, it seems right to me.
>
> Great. Maybe you can consider re-wording it so others can understand. I
> have no clue how 80 bits is determined. The phrase you quote is obtuse to
> the casual user as well. If that means 16x5=80 irrespective of columns, it
> is not clear.

I agree it's unclear. Does the following make it any better (updated
patch attached):

-   The user can specify signature length (in uint16, default is 5) and the
-   number of bits, which can be set per attribute (1 < colN < 2048).
+   The user can specify signature length in units of 16 bits (default is 5)
+   and the number of bits per indexed attribute.
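
The arithmetic behind the 80-bit figure can be sketched in a few lines of Python. This is only an illustration of the unit conversion, assuming (as the quoted Introduction text says) that the length setting counts 16-bit units and defaults to 5. The function and constant names are made up for this example and are not part of the bloom module.

```python
# Hypothetical names, for illustration only; not part of the bloom module.
BITS_PER_UNIT = 16        # one uint16 word
DEFAULT_LENGTH_UNITS = 5  # default "length" quoted from F.4.1


def signature_bits(length_units: int = DEFAULT_LENGTH_UNITS) -> int:
    """Return the total signature size in bits for a given length setting."""
    return length_units * BITS_PER_UNIT


print(signature_bits())  # 5 * 16 = 80
```

Under that reading, the 80 bits in F.4.3 is simply the default length expressed in bits, regardless of how many columns are indexed.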

By the way, I am now slightly confused as well about the per-column bits
assignment:

In F.4.1. Introduction:

... and the number of bits, which can be set per attribute (1 < colN < 2048).

And then in F.4.2. Parameters:

bloom indexes accept the following parameters in the WITH clause.

length
Length of signature in uint16 type values

col1 — col16
Number of bits for corresponding column

Which is it: col1 - col2048 or col1 - col16? Or are they different things
altogether?
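
One way to read the two parameters is that length fixes the size of the signature, while each colN setting says how many bits a column's value turns on inside it. On that reading the per-column bits are not added to the signature length. The generic Bloom-signature sketch below illustrates that reading only. It is not the bloom module's code: the SHA-256 hashing, the function names, and the example bit counts are assumptions chosen for illustration, and it does not settle the col16 versus 2048 question.

```python
import hashlib

BITS_PER_UNIT = 16  # length is assumed to be counted in 16-bit units


def make_signature(values, bits_per_column, length_units=5):
    """Build a Bloom-style signature as an int bitmask.

    values: one entry per indexed column; None means the column is not
            constrained (useful when building a query signature).
    bits_per_column: how many bits each column's value sets (the colN idea).
    """
    total_bits = length_units * BITS_PER_UNIT
    signature = 0
    for col_no, (value, nbits) in enumerate(zip(values, bits_per_column)):
        if value is None:
            continue
        for k in range(nbits):
            # Assumption: any deterministic hash works for the illustration.
            digest = hashlib.sha256(f"{col_no}:{k}:{value}".encode()).digest()
            position = int.from_bytes(digest[:8], "big") % total_bits
            signature |= 1 << position
    return signature


def might_match(row_signature, query_signature):
    """True if every bit of the query is set in the row (may be a false positive)."""
    return (row_signature & query_signature) == query_signature


row = make_signature([1, 2, 3], bits_per_column=[2, 2, 4])
query = make_signature([None, 2, None], bits_per_column=[2, 2, 4])
print(might_match(row, query))
```

Because a match on the signature can be a false positive, the real row still has to be rechecked, which is what the Introduction means by a lossy representation.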

Thanks,
Amit

Attachments:

bloom-doc-typos-reword.patch (text/x-diff)
diff --git a/doc/src/sgml/bloom.sgml b/doc/src/sgml/bloom.sgml
index 7349095..ff0bf76 100644
--- a/doc/src/sgml/bloom.sgml
+++ b/doc/src/sgml/bloom.sgml
@@ -22,8 +22,8 @@
    allows fast exclusion of non-candidate tuples via signatures.
    Since a signature is a lossy representation of all indexed attributes, 
    search results must be rechecked using heap information. 
-   The user can specify signature length (in uint16, default is 5) and the
-   number of bits, which can be set per attribute (1 < colN < 2048).
+   The user can specify signature length in units of 16 bits (default is 5)
+   and the number of bits per indexed attribute.
   </para>
 
   <para>
@@ -51,7 +51,7 @@
     <term><literal>length</></term>
     <listitem>
      <para>
-      Length of signature in uint16 type values
+      Length of signature in units of 16 bits
      </para>
     </listitem>
    </varlistentry>
@@ -160,7 +160,7 @@ SELECT pg_relation_size('btree_idx');
  </sect2>
 
  <sect2>
-  <title>Opclass interface</title>
+  <title>Opclass Interface</title>
 
   <para>
    The Bloom opclass interface is simple.  It requires 1 supporting function:
@@ -178,7 +178,7 @@ DEFAULT FOR TYPE text USING bloom AS
  </sect2>
 
  <sect2>
-  <title>Limitation</title>
+  <title>Limitations</title>
   <para>
 
    <itemizedlist>
#5 David G. Johnston
david.g.johnston@gmail.com
In reply to: Amit Langote (#4)
Re: Typos/Questions in bloom documentation

On Wed, Apr 20, 2016 at 9:18 PM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> On 2016/04/21 11:19, David G. Johnston wrote:
>> On Wednesday, April 20, 2016, Amit Langote wrote:
>>> On 2016/04/21 6:51, David G. Johnston wrote:
>>>> http://www.postgresql.org/docs/devel/static/bloom.html
>>>>
>>>> F.4.3 Examples
>>>>
>>>> Claims that the signature length is 80 bits - shouldn't it be 8?
>>>
>>> In F.4.1. Introduction:
>>>
>>> ... The user can specify signature length (in uint16, default is 5)
>>>
>>> So, it seems right to me.
>>
>> Great. Maybe you can consider re-wording it so others can understand. I
>> have no clue how 80 bits is determined. The phrase you quote is obtuse to
>> the casual user as well. If that means 16x5=80 irrespective of columns, it
>> is not clear.
>
> I agree it's unclear. Does the following make it any better (updated
> patch attached):
>
> -   The user can specify signature length (in uint16, default is 5) and the
> -   number of bits, which can be set per attribute (1 < colN < 2048).
> +   The user can specify signature length in units of 16 bits (default is 5)
> +   and the number of bits per indexed attribute.

Better. The "and" is confusing. Is the signature length the sum of 16x5
+ (bits per indexed attribute)?

> By the way, I am now slightly confused as well about the per-column bits
> assignment:
>
> In F.4.1. Introduction:
>
> ... and the number of bits, which can be set per attribute (1 < colN < 2048).
>
> And then in F.4.2. Parameters:
>
> bloom indexes accept the following parameters in the WITH clause.
>
> length
> Length of signature in uint16 type values

How about: "Number of 16-bit units to use for the signature"

> col1 — col16
> Number of bits for corresponding column
>
> Which is it: col1 - col2048 or col1 - col16? Or are they different things
> altogether?

Good question...

David J.

#6 Michael Paquier
michael.paquier@gmail.com
In reply to: David G. Johnston (#5)
Re: Typos/Questions in bloom documentation

On Fri, Apr 22, 2016 at 1:25 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Wed, Apr 20, 2016 at 9:18 PM, Amit Langote
> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> I agree it's unclear. Does the following make it any better (updated
>> patch attached):

I have sent a patch to rework the docs here:
/messages/by-id/CAB7nPqQB8dcFmY1uodmiJOSZdhBFOx-us-uW6rfYrzhpEiBR2g@mail.gmail.com
This may interest people here.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7 Amit Langote
Langote_Amit_f8@lab.ntt.co.jp
In reply to: Michael Paquier (#6)
Re: Typos/Questions in bloom documentation

On 2016/06/07 14:41, Michael Paquier wrote:
> On Fri, Apr 22, 2016 at 1:25 AM, David G. Johnston
> <david.g.johnston@gmail.com> wrote:
>> On Wed, Apr 20, 2016 at 9:18 PM, Amit Langote
>> <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>>> I agree it's unclear. Does the following make it any better (updated
>>> patch attached):
>
> I have sent a patch to rework the docs here:
> /messages/by-id/CAB7nPqQB8dcFmY1uodmiJOSZdhBFOx-us-uW6rfYrzhpEiBR2g@mail.gmail.com
> This may interest people here.

Thanks, Michael.

Regards,
Amit
