Word-smithing doc changes

Started by Greg Starkover 14 years ago13 messages
#1Greg Stark
stark@mit.edu

I think this commit was ill-advised:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372

     In a concurrent index build, the index is actually entered into the
     system catalogs in one transaction, then the two table scans occur in a
-    second and third transaction.
+    second and third transaction.  All active transactions at the time the
+    second table scan starts, not just ones that already involve the table,
+    have the potential to block the concurrent index creation until they
+    finish.  When checking for transactions that could still use the original
+    index, concurrent index creation advances through potentially interfering
+    older transactions one at a time, obtaining shared locks on their virtual
+    transaction identifiers to wait for them to complete.

Seems way to implementation-specific and detailed for a user to make
heads or tails of. Except in the sections talking about locking
internals we don't talk about "shared locks on virtual transactions
identifiers" we just talk about waiting for a transaction to complete.
And looping over the transactions one by one is purely an
implementation detail and uninteresting to users. Also it uses
ill-defined terms like "active transactions", "potentially interfering
older transactions", and "original index" -- from the user's point of
view there's only one index and it just isn't completely built yet.

Are we not yet in string-freeze though? I'll go ahead and edit it if
people don't mind. I'm curious to see the original complaint though.

--
greg

#2Robert Haas
robertmhaas@gmail.com
In reply to: Greg Stark (#1)
Re: Word-smithing doc changes

On Sat, Jun 25, 2011 at 9:01 PM, Greg Stark <stark@mit.edu> wrote:

I think this commit was ill-advised:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372

    In a concurrent index build, the index is actually entered into the
    system catalogs in one transaction, then the two table scans occur in a
-    second and third transaction.
+    second and third transaction.  All active transactions at the time the
+    second table scan starts, not just ones that already involve the table,
+    have the potential to block the concurrent index creation until they
+    finish.  When checking for transactions that could still use the original
+    index, concurrent index creation advances through potentially interfering
+    older transactions one at a time, obtaining shared locks on their virtual
+    transaction identifiers to wait for them to complete.

Seems way to implementation-specific and detailed for a user to make
heads or tails of. Except in the sections talking about locking
internals we don't talk about "shared locks on virtual transactions
identifiers" we just talk about waiting for a transaction to complete.
And looping over the transactions one by one is purely an
implementation detail and uninteresting to users. Also it uses
ill-defined terms like "active transactions", "potentially interfering
older transactions", and  "original index" -- from the user's point of
view there's only one index and it just isn't completely built yet.

Are we not yet in string-freeze though? I'll go ahead and edit it if
people don't mind. I'm curious to see the original complaint though.

We don't have a string freeze, and certainly not for the
documentation, so if you'd like to wordsmith some more, have at it.
But it would probably be best to post your revised version and solicit
feedback before committing, since there was quite a bit of discussion
about that change before it was made. (Sorry, don't have the pointer
at the moment...)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Alvaro Herrera
alvherre@commandprompt.com
In reply to: Greg Stark (#1)
Re: Word-smithing doc changes

Excerpts from Greg Stark's message of sáb jun 25 21:01:36 -0400 2011:

I think this commit was ill-advised:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372

Seems way to implementation-specific and detailed for a user to make
heads or tails of. Except in the sections talking about locking
internals we don't talk about "shared locks on virtual transactions
identifiers" we just talk about waiting for a transaction to complete.

Hmm, right.

And looping over the transactions one by one is purely an
implementation detail and uninteresting to users. Also it uses
ill-defined terms like "active transactions", "potentially interfering
older transactions", and "original index" -- from the user's point of
view there's only one index and it just isn't completely built yet.

Wow, that's a lot of mistakes for a single paragraph, sorry about that.

Are we not yet in string-freeze though? I'll go ahead and edit it if
people don't mind. I'm curious to see the original complaint though.

I don't -- please go ahead.

Original complaint in Message-id 4DDB64CB.7070109@2ndQuadrant.com

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#4Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#3)
1 attachment(s)
Re: Word-smithing doc changes

Alvaro Herrera wrote:

Excerpts from Greg Stark's message of s��b jun 25 21:01:36 -0400 2011:

I think this commit was ill-advised:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372

Seems way to implementation-specific and detailed for a user to make
heads or tails of. Except in the sections talking about locking
internals we don't talk about "shared locks on virtual transactions
identifiers" we just talk about waiting for a transaction to complete.

Hmm, right.

And looping over the transactions one by one is purely an
implementation detail and uninteresting to users. Also it uses
ill-defined terms like "active transactions", "potentially interfering
older transactions", and "original index" -- from the user's point of
view there's only one index and it just isn't completely built yet.

Wow, that's a lot of mistakes for a single paragraph, sorry about that.

Are we not yet in string-freeze though? I'll go ahead and edit it if
people don't mind. I'm curious to see the original complaint though.

I don't -- please go ahead.

Original complaint in Message-id 4DDB64CB.7070109@2ndQuadrant.com

I have simplified the concurrent index build docs with the attached
patch.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

/rtmp/concurrenttext/x-diffDownload
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index e8a7caf..7391a5f
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*************** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
*** 400,413 ****
  
     <para>
      In a concurrent index build, the index is actually entered into the
!     system catalogs in one transaction, then the two table scans occur in a
!     second and third transaction.  All active transactions at the time the
!     second table scan starts, not just ones that already involve the table,
!     have the potential to block the concurrent index creation until they
!     finish.  When checking for transactions that could still use the original
!     index, concurrent index creation advances through potentially interfering
!     older transactions one at a time, obtaining shared locks on their virtual
!     transaction identifiers to wait for them to complete.
     </para>
  
     <para>
--- 400,409 ----
  
     <para>
      In a concurrent index build, the index is actually entered into the
!     system catalogs in one transaction; two table scans then occur in a
!     second and third transaction.  The concurrent index build will not
!     complete until all running transactions can see the new index
!     definition.
     </para>
  
     <para>
#5Greg Smith
greg@2ndQuadrant.com
In reply to: Bruce Momjian (#4)
Re: Word-smithing doc changes

Excerpts from Greg Stark's message of s�b jun 25 21:01:36 -0400 2011:

I think this commit was ill-advised:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372
Seems way to implementation-specific and detailed for a user to make
heads or tails of. Except in the sections talking about locking
internals we don't talk about "shared locks on virtual transactions
identifiers" we just talk about waiting for a transaction to complete.

Looks like I missed this when it passed by before, and looks like Greg
Stark may have missed the message on pgsql-docs that kicked this all
off:
http://archives.postgresql.org/message-id/4DDB64CB.7070109@2ndQuadrant.com

I will happily accept that the description there may have suffered from
me not using all of the terms optimally, and that the resulting commit
could be improved. Some more feedback to get the description correct
and useful would be much appreciated.

What I cannot agree with is that idea that the implementation details I
suggested documenting should not be. There are extremely user-hostile
things that can happen here, and that are unique to this command.
Saying "this is too complicated for users to make heads or tails of" may
very well be true in many cases, but I think it's not giving PostgreSQL
users very much credit. And when problems with this happen, and I
wouldn't have spent any time on this if they didn't, right now the only
way to make heads or tails of it is to read the source code.

If the code was simple, quick, and had no failure modes, it would be
fine to not describe it. This is complicated, the run time cannot be
bounded, and it can ripple to nasty lock queue issues--at some
impossible to predict future time, long after you start the creation. I
don't have a good idea how to unload the potential foot gun. The best I
could think of after being shot with it was describing how it fires.

And looping over the transactions one by one is purely an
implementation detail and uninteresting to users.

That particular suggestion came from me having a painful session I
didn't want anyone else to ever go through again. By the end of that,
this implementation detail felt like the most important missing piece of
PostgreSQL documentation in the world to me--I'm too busy to send in doc
patches describing things that I haven't been shot by.

To provide some more context, the server I ran into this on always has
at least 2 reports that take 10 to 16 hours to run active. I happily
kicked off a concurrent index build on a heavily bloated 1GB table whose
indexes are typically >5GB, which I expect to take a few minutes given
the small size involved. Six hours later, when I come back and discover
it's still not done, I find a single lock waiting for a transaction to
finish. Since I'm used to multiple locks forming into a tree
structure, and I only see one, I expect I'm OK once that's done. Fine;
I estimate how much time that report has left and leave for a bit.

Four hours later, I come back. That original transaction lock is gone.
Now it's created a new one I didn't expect, moving onto the second
oldest report active. I actually have six more hours to go still. This
locking pattern is unique to this command, and if I'd had the slightest
idea that it worked this way I'd have approached the rebuild differently.

#6Robert Haas
robertmhaas@gmail.com
In reply to: Greg Smith (#5)
Re: Word-smithing doc changes

On Wed, Nov 30, 2011 at 3:02 AM, Greg Smith <greg@2ndquadrant.com> wrote:

I will happily accept that the description there may have suffered from me
not using all of the terms optimally, and that the resulting commit could be
improved.  Some more feedback to get the description correct and useful
would be much appreciated.

What I cannot agree with is that idea that the implementation details I
suggested documenting should not be.  There are extremely user-hostile
things that can happen here, and that are unique to this command.  Saying
"this is too complicated for users to make heads or tails of" may very well
be true in many cases, but I think it's not giving PostgreSQL users very
much credit.  And when problems with this happen, and I wouldn't have spent
any time on this if they didn't, right now the only way to make heads or
tails of it is to read the source code.

+1.

If we only document approximately how it works, then that's less work,
but it's also less useful. Greg's attempt to document *exactly* how
it works was kind of klunky, but I think that can and should be
improved, not replaced with wording that's more vague and therefore
easier to write.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Greg Stark
stark@mit.edu
In reply to: Greg Smith (#5)
Re: Word-smithing doc changes

On Wed, Nov 30, 2011 at 8:02 AM, Greg Smith <greg@2ndquadrant.com> wrote:

Except in the sections talking about locking
internals we don't talk about "shared locks on virtual transactions
identifiers" we just talk about waiting for a transaction to complete.

What I cannot agree with is that idea that the implementation details I
suggested documenting should not be.

What I'm suggesting is translating things like "shared locks on
virtual transaction identifiers" into what that means for users.
Namely saying something like "waiting for a transaction to complete".

Given your confusion it's clear that we have to explain that it will
wait one by one for each transaction that was started before the index
was created to finish. I don't think we need to explain how that's
implemented. If we do it should be set aside in some way, something
like "(see virtual transaction id locks in <href...>)".

I just want to keep in mind that the reader here is trying to
understand how to use create index concurrently, not understand how
Postgres's locking infrastructure works in general.

--
greg

#8Greg Smith
greg@2ndQuadrant.com
In reply to: Greg Stark (#7)
Re: Word-smithing doc changes

On 11/30/2011 10:20 AM, Greg Stark wrote:

Given your confusion it's clear that we have to explain that it will
wait one by one for each transaction that was started before the index
was created to finish.

When the index was created is a fuzzy thing though. It looked to me
like it makes this check at the start of Phase 2. If I read "when the
index was created" in the manual, I would assume that meant "the instant
at which CREATE INDEX CONCURRENTLY started". I don't think that's
actually the case though; it's actually delayed to the beginning of
Phase 2 start, which can easily be hours later for big indexes. Please
correct me if I'm reading that wrong.

I don't think we need to explain how that's
implemented. If we do it should be set aside in some way, something
like "(see virtual transaction id locks in<href...>)".

Fair enough. There is this wording in the pg_locks documentation:
"When one transaction finds it necessary to wait specifically for
another transaction, it does so by attempting to acquire share lock on
the other transaction ID (either virtual or permanent ID depending on
the situation). That will succeed only when the other transaction
terminates and releases its locks."

Linking to that instead of trying to duplicate it is a good idea.
Another good, related idea might be to expand the main "Concurrency
Control" chapter to include this information. When I re-read "Explicit
Locking" again:
http://developer.postgresql.org/pgdocs/postgres/explicit-locking.html it
strikes me it would be nice to add "Transaction Locks" as an full entry
there. The trivia around how those work is really kind of buried in the
pg_locks section.

I just want to keep in mind that the reader here is trying to
understand how to use create index concurrently, not understand how
Postgres's locking infrastructure works in general.

To the extent they can be ignorant of the locking infrastructure, that's
true. This operation is complicated enough that I don't think we can
hide too many of the details from the users, while still letting them
assess the risk and potential duration accurately.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#9Bruce Momjian
bruce@momjian.us
In reply to: Greg Smith (#8)
1 attachment(s)
Re: Word-smithing doc changes

On Wed, Nov 30, 2011 at 09:47:40PM -0800, Greg Smith wrote:

On 11/30/2011 10:20 AM, Greg Stark wrote:

Given your confusion it's clear that we have to explain that it will
wait one by one for each transaction that was started before the index
was created to finish.

When the index was created is a fuzzy thing though. It looked to me
like it makes this check at the start of Phase 2. If I read "when
the index was created" in the manual, I would assume that meant "the
instant at which CREATE INDEX CONCURRENTLY started". I don't think
that's actually the case though; it's actually delayed to the
beginning of Phase 2 start, which can easily be hours later for big
indexes. Please correct me if I'm reading that wrong.

I don't think we need to explain how that's
implemented. If we do it should be set aside in some way, something
like "(see virtual transaction id locks in<href...>)".

Fair enough. There is this wording in the pg_locks documentation:
"When one transaction finds it necessary to wait specifically for
another transaction, it does so by attempting to acquire share lock
on the other transaction ID (either virtual or permanent ID
depending on the situation). That will succeed only when the other
transaction terminates and releases its locks."

Linking to that instead of trying to duplicate it is a good idea.
Another good, related idea might be to expand the main "Concurrency
Control" chapter to include this information. When I re-read
"Explicit Locking" again:
http://developer.postgresql.org/pgdocs/postgres/explicit-locking.html
it strikes me it would be nice to add "Transaction Locks" as an full
entry there. The trivia around how those work is really kind of
buried in the pg_locks section.

I just want to keep in mind that the reader here is trying to
understand how to use create index concurrently, not understand how
Postgres's locking infrastructure works in general.

To the extent they can be ignorant of the locking infrastructure,
that's true. This operation is complicated enough that I don't
think we can hide too many of the details from the users, while
still letting them assess the risk and potential duration
accurately.

The concurrent index documentation under discussion above was never
updated, so I took a stab at it, attached.

Greg, I looked at adding a mention of the virtual transaction wait to
the "explicit-locking" section as you suggested, and found those were
all user-visible locking, while this is internal locking. I did find a
clear description of transaction id locking in the pg_locks system view
docs, so I just referenced that.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

concurrent.difftext/x-diff; charset=us-asciiDownload
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index 2ab6470..fe2debf
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*************** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
*** 395,408 ****
  
     <para>
      In a concurrent index build, the index is actually entered into the
!     system catalogs in one transaction, then the two table scans occur in a
!     second and third transaction.  All active transactions at the time the
!     second table scan starts, not just ones that already involve the table,
!     have the potential to block the concurrent index creation until they
!     finish.  When checking for transactions that could still use the original
!     index, concurrent index creation advances through potentially interfering
!     older transactions one at a time, obtaining shared locks on their virtual
!     transaction identifiers to wait for them to complete.
     </para>
  
     <para>
--- 395,407 ----
  
     <para>
      In a concurrent index build, the index is actually entered into the
!     system catalogs in one transaction, then the two table scans occur in
!     second and third transactions.  Any transaction active when the
!     second table scan starts, not just those that have already referenced
!     the table, can block concurrent index creation until it completes.
!     When waiting for all old transactions, concurrent index creation
!     serially waits for each old transaction to complete, as outlined in
!     section <xref linkend="view-pg-locks">.
     </para>
  
     <para>
#10Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#9)
1 attachment(s)
Re: Word-smithing doc changes

On Fri, Aug 3, 2012 at 12:26:56AM -0400, Bruce Momjian wrote:

The concurrent index documentation under discussion above was never
updated, so I took a stab at it, attached.

Greg, I looked at adding a mention of the virtual transaction wait to
the "explicit-locking" section as you suggested, and found those were
all user-visible locking, while this is internal locking. I did find a
clear description of transaction id locking in the pg_locks system view
docs, so I just referenced that.

I found a way to clarify the wording further; patch attached.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachments:

concurrent.difftext/x-diff; charset=us-asciiDownload
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
new file mode 100644
index 2ab6470..17b433a
*** a/doc/src/sgml/ref/create_index.sgml
--- b/doc/src/sgml/ref/create_index.sgml
*************** CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
*** 394,408 ****
     </para>
  
     <para>
!     In a concurrent index build, the index is actually entered into the
!     system catalogs in one transaction, then the two table scans occur in a
!     second and third transaction.  All active transactions at the time the
!     second table scan starts, not just ones that already involve the table,
!     have the potential to block the concurrent index creation until they
!     finish.  When checking for transactions that could still use the original
!     index, concurrent index creation advances through potentially interfering
!     older transactions one at a time, obtaining shared locks on their virtual
!     transaction identifiers to wait for them to complete.
     </para>
  
     <para>
--- 394,407 ----
     </para>
  
     <para>
!     In a concurrent index build, the index is actually entered into
!     the system catalogs in one transaction, then two table scans occur in
!     two more transactions.  Any transaction active when the second table
!     scan starts can block concurrent index creation until it completes,
!     even transactions that only reference the table after the second table
!     scan starts.   Concurrent index creation serially waits for each old
!     transaction to complete using the method outlined in section <xref
!     linkend="view-pg-locks">.
     </para>
  
     <para>
#11Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Bruce Momjian (#10)
Re: Word-smithing doc changes

Excerpts from Bruce Momjian's message of vie ago 03 09:59:36 -0400 2012:

On Fri, Aug 3, 2012 at 12:26:56AM -0400, Bruce Momjian wrote:

The concurrent index documentation under discussion above was never
updated, so I took a stab at it, attached.

Greg, I looked at adding a mention of the virtual transaction wait to
the "explicit-locking" section as you suggested, and found those were
all user-visible locking, while this is internal locking. I did find a
clear description of transaction id locking in the pg_locks system view
docs, so I just referenced that.

I found a way to clarify the wording further; patch attached.

Looks sane to me.

Are we backpatching this to 9.1? I no longer remember if the original
wording is there or just in 9.2.

--
Álvaro Herrera <alvherre@alvh.no-ip.org>

#12Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#11)
Re: Word-smithing doc changes

On Fri, Aug 3, 2012 at 12:55:30PM -0400, �lvaro Herrera wrote:

Excerpts from Bruce Momjian's message of vie ago 03 09:59:36 -0400 2012:

On Fri, Aug 3, 2012 at 12:26:56AM -0400, Bruce Momjian wrote:

The concurrent index documentation under discussion above was never
updated, so I took a stab at it, attached.

Greg, I looked at adding a mention of the virtual transaction wait to
the "explicit-locking" section as you suggested, and found those were
all user-visible locking, while this is internal locking. I did find a
clear description of transaction id locking in the pg_locks system view
docs, so I just referenced that.

I found a way to clarify the wording further; patch attached.

Looks sane to me.

Are we backpatching this to 9.1? I no longer remember if the original
wording is there or just in 9.2.

I wasn't planning to, but will do as you suggest for 9.1.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#13Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#12)
Re: Word-smithing doc changes

On Fri, Aug 3, 2012 at 01:23:53PM -0400, Bruce Momjian wrote:

On Fri, Aug 3, 2012 at 12:55:30PM -0400, �lvaro Herrera wrote:

Excerpts from Bruce Momjian's message of vie ago 03 09:59:36 -0400 2012:

On Fri, Aug 3, 2012 at 12:26:56AM -0400, Bruce Momjian wrote:

The concurrent index documentation under discussion above was never
updated, so I took a stab at it, attached.

Greg, I looked at adding a mention of the virtual transaction wait to
the "explicit-locking" section as you suggested, and found those were
all user-visible locking, while this is internal locking. I did find a
clear description of transaction id locking in the pg_locks system view
docs, so I just referenced that.

I found a way to clarify the wording further; patch attached.

Looks sane to me.

Are we backpatching this to 9.1? I no longer remember if the original
wording is there or just in 9.2.

I wasn't planning to, but will do as you suggest for 9.1.

Done.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +