Why does CREATE INDEX CONCURRENTLY need two scans?

Started by Joshua Maabout 11 years ago5 messagesgeneral

josh@benchling.com

about 11 years ago

Hi all,

I was curious about why CONCURRENTLY needs two scans to complete - from the
documentation on HOT (access/heap/README.HOT), it looks like the process is:

1) insert pg_index entry, wait for relevant in-progress txns to finish
(before marking index open for inserts, so HOT updates won't write
incorrect index entries)
2) build index in 1st snapshot, mark index open for inserts
3) in 2nd snapshot, validate index and insert missing tuples since first
snapshot, mark index valid for searches

Why are two scans necessary? What would break if it did something like the
following?

1) insert pg_index entry, wait for relevant txns to finish, mark index open
for inserts
2) build index in a single snapshot, mark index valid for searches

Wouldn't new inserts update the index correctly? Between the snapshot and
index-updating txns afterwards, wouldn't all updates be covered?

To be clear, I'm not trying to suggest any changes, just wondering what's
missing from my mental model. :)

Thanks!
Josh
ᐧ

Michael Paquier

michael@paquier.xyz

about 11 years ago

In reply to: Joshua Ma (#1)

Re: Why does CREATE INDEX CONCURRENTLY need two scans?

On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh@benchling.com> wrote:

Hi all,

I was curious about why CONCURRENTLY needs two scans to complete - from
the documentation on HOT (access/heap/README.HOT), it looks like the
process is:

1) insert pg_index entry, wait for relevant in-progress txns to finish
(before marking index open for inserts, so HOT updates won't write
incorrect index entries)
2) build index in 1st snapshot, mark index open for inserts
3) in 2nd snapshot, validate index and insert missing tuples since first
snapshot, mark index valid for searches

Why are two scans necessary? What would break if it did something like the
following?

1) insert pg_index entry, wait for relevant txns to finish, mark index
open for inserts

2) build index in a single snapshot, mark index valid for searches

Wouldn't new inserts update the index correctly? Between the snapshot and
index-updating txns afterwards, wouldn't all updates be covered?

When an index is built with index_build, are included in the index only the
tuples seen at the start of the first scan. A second scan is needed to add
in the index entries for the tuples that have been inserted into the table
during the build phase.
--
Michael

josh@benchling.com

about 11 years ago

In reply to: Michael Paquier (#2)

Re: Why does CREATE INDEX CONCURRENTLY need two scans?

Hi Michael,

Isn't that also true during the 2nd scan? I'm assuming new inserts during
the 2nd scan properly update the index, so couldn't the same mechanism
update the index during the 1st scan?

I guess I'm confused because, if you assume a pathological case where all
the data gets inserted after the 1st snapshot, the 1st scan wouldn't pick
anything up and the 3-step process I had earlier becomes identical to the
2-step process. Is there something special about index_build?

- Josh

On Tue, Mar 31, 2015 at 7:08 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

Show quoted text

On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh@benchling.com> wrote:

Hi all,

I was curious about why CONCURRENTLY needs two scans to complete - from
the documentation on HOT (access/heap/README.HOT), it looks like the
process is:

1) insert pg_index entry, wait for relevant in-progress txns to finish
(before marking index open for inserts, so HOT updates won't write
incorrect index entries)
2) build index in 1st snapshot, mark index open for inserts
3) in 2nd snapshot, validate index and insert missing tuples since first
snapshot, mark index valid for searches

Why are two scans necessary? What would break if it did something like
the following?

1) insert pg_index entry, wait for relevant txns to finish, mark index
open for inserts

2) build index in a single snapshot, mark index valid for searches

Wouldn't new inserts update the index correctly? Between the snapshot and
index-updating txns afterwards, wouldn't all updates be covered?

When an index is built with index_build, are included in the index only
the tuples seen at the start of the first scan. A second scan is needed to
add in the index entries for the tuples that have been inserted into the
table during the build phase.
--
Michael

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Michael Paquier (#2)

Re: Why does CREATE INDEX CONCURRENTLY need two scans?

Michael Paquier <michael.paquier@gmail.com> writes:

On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh@benchling.com> wrote:

Why are two scans necessary? What would break if it did something like the
following?

1) insert pg_index entry, wait for relevant txns to finish, mark index
open for inserts

2) build index in a single snapshot, mark index valid for searches

Wouldn't new inserts update the index correctly? Between the snapshot and
index-updating txns afterwards, wouldn't all updates be covered?

When an index is built with index_build, are included in the index only the
tuples seen at the start of the first scan. A second scan is needed to add
in the index entries for the tuples that have been inserted into the table
during the build phase.

More to the point: Joshua's design supposes that retail insertions into
an index can happen in parallel with index build. Or in other words,
that index build consists of instantaneously creating an empty-but-valid
index file and then doing a lot of ordinary inserts into it. That's a
possible design, but it's not very efficient, and most of our index AMs
don't do it that way. btree, for instance, starts by sorting all the
entries and creating the leaf-level pages. Then it builds the upper tree
levels. It doesn't have a complete tree that could support retail
insertions until the very end. Moreover, most of the work is done in
storage that's local to the backend running CREATE INDEX, and isn't
accessible to other processes at all.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

josh@benchling.com

about 11 years ago

In reply to: Tom Lane (#4)

Re: Why does CREATE INDEX CONCURRENTLY need two scans?

Ah, that's exactly what I was looking for. Thanks everyone for the
responses!

- Josh
ᐧ

On Tue, Mar 31, 2015 at 8:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Michael Paquier <michael.paquier@gmail.com> writes:

On Wed, Apr 1, 2015 at 9:43 AM, Joshua Ma <josh@benchling.com> wrote:

Why are two scans necessary? What would break if it did something like

the

following?

1) insert pg_index entry, wait for relevant txns to finish, mark index
open for inserts

2) build index in a single snapshot, mark index valid for searches

Wouldn't new inserts update the index correctly? Between the snapshot

and

index-updating txns afterwards, wouldn't all updates be covered?

When an index is built with index_build, are included in the index only

the

tuples seen at the start of the first scan. A second scan is needed to

add

in the index entries for the tuples that have been inserted into the

table

during the build phase.

More to the point: Joshua's design supposes that retail insertions into
an index can happen in parallel with index build. Or in other words,
that index build consists of instantaneously creating an empty-but-valid
index file and then doing a lot of ordinary inserts into it. That's a
possible design, but it's not very efficient, and most of our index AMs
don't do it that way. btree, for instance, starts by sorting all the
entries and creating the leaf-level pages. Then it builds the upper tree
levels. It doesn't have a complete tree that could support retail
insertions until the very end. Moreover, most of the work is done in
storage that's local to the backend running CREATE INDEX, and isn't
accessible to other processes at all.

regards, tom lane