Question: pg_class attributes and race conditions ?

Started by Pavan Deolaseeabout 19 years ago41 messageshackers

pavan.deolasee@gmail.com

about 19 years ago

What is the safest way to access/modify the pg_class attribute
and still avoid any race conditions with the other backends ?

A specific example is: To solve the CREATE INDEX problem with
HOT, I am thinking of adding (along with other things) a pg_class
boolean attribute, say hot_update_enable. All backends are
supposed to check this attribute before they perform an UPDATE.
The attribute would usually be available in relation->rd_rel

My understanding is that the backend which sets this attribute
must first acquire a lock on the heap relation of sufficient
strength so as to ensure that there are no concurrent UPDATErs,
update the pg_class row and then release the lock on the relation.
This would ensure that no backend has a stale "Relation"
pointer with stale value of hot_update_enable.

Also, should I use heap_inplace_update() rather than
simple_heap_update() because I want other backends to see the
change immediately irrespective of their snapshot ?

Is this a fair analysis ? Are there any rules I must follow
to avoid any deadlock and race conditions. I know we should
not be requesting a higher grade lock while holding a
lower grade lock, but are there any other restrictions/best
practices ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

alvherre@2ndquadrant.com

about 19 years ago

In reply to: Pavan Deolasee (#1)

Re: Question: pg_class attributes and race conditions ?

Pavan Deolasee wrote:

What is the safest way to access/modify the pg_class attribute
and still avoid any race conditions with the other backends ?

A specific example is: To solve the CREATE INDEX problem with
HOT, I am thinking of adding (along with other things) a pg_class
boolean attribute, say hot_update_enable. All backends are
supposed to check this attribute before they perform an UPDATE.
The attribute would usually be available in relation->rd_rel

My understanding is that the backend which sets this attribute
must first acquire a lock on the heap relation of sufficient
strength so as to ensure that there are no concurrent UPDATErs,
update the pg_class row and then release the lock on the relation.
This would ensure that no backend has a stale "Relation"
pointer with stale value of hot_update_enable.

FWIW this is pretty much the same I wanted to do with setting
relfrozenxid to FrozenTransactionId. To this end I wrote a patch to add
a catalog pg_ntclass (later renamed to pg_class_nt), which was
ultimately rejected for reasons I don't remember at the time. Maybe it
would be illuminating to investigate that -- please see the archives.

(I still think it would be good to have a pg_class_nt catalog, so it's
not a dead idea).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

tgl@sss.pgh.pa.us

about 19 years ago

In reply to: Pavan Deolasee (#1)

Re: Question: pg_class attributes and race conditions ?

"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:

My understanding is that the backend which sets this attribute
must first acquire a lock on the heap relation of sufficient
strength so as to ensure that there are no concurrent UPDATErs,
update the pg_class row and then release the lock on the relation.

In what context are you proposing to do that, and won't this
high-strength lock in itself lead to deadlocks?

The whole thing sounds exceedingly ugly anyway --- for example
what happens if the backend doing the CREATE INDEX fails and
is therefore unable to clear the flag again?

regards, tom lane

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Tom Lane (#3)

Re: Question: pg_class attributes and race conditions ?

Tom Lane wrote:

In what context are you proposing to do that, and won't this
high-strength lock in itself lead to deadlocks?

The whole thing sounds exceedingly ugly anyway --- for example
what happens if the backend doing the CREATE INDEX fails and
is therefore unable to clear the flag again?

Let me state the problem and a vague solution I am thinking of.
I would appreciate comments and suggestions.

The major known issue left with HOT is support for
CREATE INDEX and CREATE INDEX CONCURRENTLY. The
problem is with HEAP_ONLY tuples in the heap which do not have index
entries in the existing indexes. When we build a new index, some or all
of the HEAP_ONLY tuples may need index entries in the new index.
It would be very ugly if we try to keep the existing indexes
without index entries for those tuples. A clean solution
would be to add index entries for the HEAP_ONLY tuples in
the existing indexes and break all the HOT-chains.

I would leave the details, but rather explain what I have in
mind at high level. Any help to fill in the details or any
suggestions to do things differently would immensely help.

This is what I have in mind:

In the context of CREATE INDEX [CONCURRENTLY],

We first disable HOT-updates on the table. This would ensure
that no new HOT tuples are added while we CHILL the heap.
(How do we do this ?)

We then start scanning the heap and start building the new
index. If a HEAP_ONLY tuple is found which needs to be
indexed, we mark the tuple with a CHILL_IN_PROGRESS flag
and insert index entries into all the existing indexes.
(The buffer is exclusively locked and the operation is WAL
logged).

We do this until entire heap is scanned. At this point, we
would have inserted missing index entries for the HEAP_ONLY
tuples. Till this point, we don't use the direct index
entries to fetch the HEAP_ONLY tuples to avoid duplicate
fetches of the same tuple.

We now wait for all the concurrent index scans to end and
then disable HOT-chain following logic to fetch tuples.
(How do we do this ?)

At this point, all index scans would ONLY use the direct
path from the index to fetch tuples. The HOT-chains are
not followed to avoid duplicate fetches of the same tuple.

A second pass over the heap is now required to clear the
CHILL_IN_PROGRESS, HEAP_ONLY and HEAP_HOT_UPDATED flags.

At the end of this step, all the indexes and the table are
in sync. Once again we need to ensure that there are no
concurrent index scans in progress and then enable HOT-fetch.
Also, HOT-updates can be turned on.

If CREATE INDEX crashes, VACUUM is required to clear the
CHILL_IN_PROGRESS flags and the corresponding index entries
are removed. Since VACUUM runs mutually exclusive to CREATE
INDEX, we don't need any special mechanism to handle race
conditions between them.

There are some other details like running multiple CREATE
INDEX in parallel and still be able to CHILL the table
safely. May be one of them needs to act as the chiller
and others wait for it finish successfully.

Any thoughts on the overall approach ? Any suggestions to
simplify things or any alternate designs ? Can something
as simple as CHILLing the table holding VACUUM FULL
strength lock be acceptable ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

tgl@sss.pgh.pa.us

about 19 years ago

In reply to: Pavan Deolasee (#4)

Re: Question: pg_class attributes and race conditions ?

"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:

Any thoughts on the overall approach ?

Fragile and full of race conditions :-(. I thought from the beginning
that CREATE INDEX might be a showstopper for the whole HOT concept,
and it's starting to look like that's the case.

I think what we need to get away from is the assumption that HOT-ness
for one index is the same as HOT-ness for all. What if we only applied
HOT to primary-key indexes, so that there was certainly not more than
one index per table that the property applies to?

regards, tom lane

Joshua D. Drake

jd@commandprompt.com

about 19 years ago

In reply to: Tom Lane (#5)

Re: Question: pg_class attributes and race conditions ?

Tom Lane wrote:

"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:

Any thoughts on the overall approach ?

Fragile and full of race conditions :-(. I thought from the beginning
that CREATE INDEX might be a showstopper for the whole HOT concept,
and it's starting to look like that's the case.

I think what we need to get away from is the assumption that HOT-ness
for one index is the same as HOT-ness for all. What if we only applied
HOT to primary-key indexes, so that there was certainly not more than
one index per table that the property applies to?

Just to throw my two bits in here :). If we do that, how does that
effect the idea that most people in the web world use (id serial primary
key), even though that is not what they are searching on?

More specifically, does HOT help conditions where a composite comes into
play (session_id,last_active) ... which would be a more heavily updated
index than just the primary key.

Sincerely,

Joshua D. Drake

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

about 19 years ago

In reply to: Tom Lane (#5)

Re: Question: pg_class attributes and race conditions ?

Tom Lane wrote:

What if we only applied
HOT to primary-key indexes, so that there was certainly not more than
one index per table that the property applies to?

The main objective of HOT is to enable retail vacuum of HOT-updated
tuples. Doing the above would make it useless for that purpose, at least
when there's more than one index on the table. Granted, there's a lot of
tables with just one index out there, but it's a big limitation
nevertheless.

An extension of that idea, though is to store a flag per index in the
HOT-updated tuple. We would then need a mapping between bits in the
tuple header to indexes, for example as a new column in pg_index.

Let's assume that we'd use one byte in the tuple header to store
HOT-flags. That's enough to store the HOT-status for 8 indexes. A set
bit means that the index corresponding that bit doesn't have an index
pointer in it.

When you CREATE INDEX, assign a bit for the new index that's not
currently in use. When you scan the table to build the index, clear that
bit for every tuple if set and insert index entry as usual.

DROP INDEX wouldn't need to scan the heap to clear the flags, because we
clear them on CREATE INDEX when necessary.

If you run out of bits in the header, IOW have more than 8 indexes on a
table, indexes unlucky enough to not have a bit assigned to them
wouldn't be HOT-updateable.

This would also enable us to skip index inserts for those indexes whose
key columns are not updated, and do the index inserts as usual for the
rest. The limitation that you can only retail vacuum HOT-updated tuples
when none of the indexed keys were changed remains, but we've accepted
that already.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Tom Lane (#5)

Re: Question: pg_class attributes and race conditions ?

Tom Lane wrote:

"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:

Any thoughts on the overall approach ?

Fragile and full of race conditions :-(.

Yes, it looks a bit complex. But IMHO we can get around that.
Do you have any ideas in mind about doing that ?

I thought from the beginning
that CREATE INDEX might be a showstopper for the whole HOT concept,
and it's starting to look like that's the case.

I remember you raised this concern very early, but I am hopeful
that we would be able to solve this. Would it be acceptable
to have a simple (though not the best) solution for this release
and then improve later on ? As I mentioned earlier, one option
is to CHILL the table, if required, holding AccessExclusive lock,
just like VACUUM FULL. I am assuming here that CREATE INDEX is
not such a common activity, isn't that true ?

I think what we need to get away from is the assumption that HOT-ness
for one index is the same as HOT-ness for all. What if we only applied
HOT to primary-key indexes, so that there was certainly not more than
one index per table that the property applies to?

I think that will take away the ability to reuse HEAP_ONLY tuples
without vacuuming the heap and index.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

simon@2ndQuadrant.com

about 19 years ago

In reply to: Pavan Deolasee (#4)

Re: Question: pg_class attributes and race conditions ?

On Fri, 2007-03-16 at 21:56 +0530, Pavan Deolasee wrote:

Any thoughts on the overall approach ? Any suggestions to
simplify things or any alternate designs ?

Well your design is very different from what we discussed, so I think I
should post my proposed design alongside this, for further discussion.

- - -

Proposed solutions for CREATE INDEX and CREATE INDEX CONCURRENTLY.

CREATE INDEX CONCURRENTLY has no feature changes to work with HOT.

CREATE INDEX works normally, except when HOT tuples are found, in which
case special processing occurs requiring additional locking. The index
build occurs in a single scan, as now.

CREATE INDEX CONCURRENTLY
-------------------------
We perform no chilling during the initial scan. We index the tuple
identified by SnapshotNow, but we take the root tuple's htid, not the
htid of the tuple version being indexed. We assume that the tuple
version indexed will be the root of the HOT chain by the time the index
is complete.

Currently, we wait until all pre-existing transactions exit before we
allow this to COMMIT. With HOT, we simply move the wait so it occurs
*before* the second scan, then we can prune the HOT chains as we pass
through the heap on the second scan. There will be no pre-existing HOT
tuples and so no chilling is required.

CREATE INDEX
------------

We add a field, xchill, to pg_class that stores TransactionIds. This is
set to InvalidTransactionId if no index has been built yet. The concept
of "main indexer" is introduced, so we allow multiple concurrent index
builds, but only one of these can chill tuples at a time.

1. In IndexBuildHeapScan, as we pass through the table:

a) if we find any any HOT rows, we check xchill and do one of steps
(i-iii). Until this point, it hasn't been important whether we are the
main or a secondary indexer.

i) if xchill is InvalidTransactionId or is committed then we attempt to
become main indexer immediately, following these steps before we
continue building the index (1b)

-- If the table is temp, or if we created the table then we immediately
become the main indexer, so return immediately. If the table being
indexed is already visible to everybody, then:
-- Update pg_class entry for the table, setting the xchill field for the
table to the builder's Xid. (Use heap_inplace_update for this, which is
OK, whether we commit or abort).
-- acquire AccessExclusiveLock on all existing indexes (not the heap)

ii) If xchill is in-progress we wait for that transaction to complete,
then do either step i) or iii). We cannot continue building our index
until the other transaction commits because we cannot yet see the other
index, yet we have to insert into it in order to correctly chill a tuple
to allow *our* index to be built.

iii) if xchill is aborted we abort also, saying that a VACUUM is needed.

b) If we get here then we are the main indexer and can chill tuples. As
we move through the scan we chill all HOT tuples, mark them
HEAP_CHILL_IN_PROGRESS, write WAL for this and insert index entries for
them in all existing indexes, as well as this one. Then remove
CHILL_IN_PROGRESS flags, without writing WAL.

c) release locks on indexes, before end of transaction

2. If we crash or a transaction abort occurs:
- we cannot prune a HEAP_ONLY_TUPLE that points to a tuple with
HEAP_CHILL_IN_PROGRESS.

- VACUUM must be used to clean up after an aborted index build and needs
some additional code to allow this to occur.

3. Concurrent index builds are allowed. If we are not the main indexer,
then we can attempt to build an index, but any scan that sees a HOT
tuple will block and wait for the main index builder to complete before
it proceeds.

4. When an indexscan reads the table, if it finds a
HEAP_CHILL_IN_PROGRESS tuple it may or may not be valid. Concurrent
index scans and tuple chilling can mean that an index scan find the same
tuple twice, by different routes, if a CREATE INDEX crashed. To avoid
this an IndexScan will only find a tuple visible if it came across a
HEAP_CHILL_IN_PROGRESS tuple using an indirect route, i.e. it followed
the path from root->HOT tuple.

In this design, CREATE INDEX does have a deadlock risk when it is used
within a transaction *and* the index is being built on a publicly
visible table (i.e. not just-built and not temp). IMHO that risk is
acceptable, since if users are worried about concurrent access to a
table during CREATE INDEX they can use CREATE INDEX CONCURRENTLY.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

bruce@momjian.us

about 19 years ago

In reply to: Joshua D. Drake (#6)

Re: Question: pg_class attributes and race conditions ?

"Joshua D. Drake" <jd@commandprompt.com> writes:

Just to throw my two bits in here :). If we do that, how does that
effect the idea that most people in the web world use (id serial primary
key), even though that is not what they are searching on?

"affect". But I think you're right that generally you'll have two indexes.

More specifically, does HOT help conditions where a composite comes into
play (session_id,last_active) ... which would be a more heavily updated
index than just the primary key.

Well if you're indexing a column that you're updating then you've already
failed your saving throw.

The case we're trying to deal with is when you're updating columns that
*aren't* indexed and therefore really don't need redundant index pointers for
each tuple version with identical to the old versions. Especially since those
index pointers are what's preventing us from vacuuming the old tuple versions.

If you are updating an index key then there's no question you're going to need
vacuum to clean out your index.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Heikki Linnakangas (#7)

Re: Question: pg_class attributes and race conditions ?

Heikki Linnakangas wrote:

Tom Lane wrote:

What if we only applied
HOT to primary-key indexes, so that there was certainly not more than
one index per table that the property applies to?

The main objective of HOT is to enable retail vacuum of HOT-updated
tuples. Doing the above would make it useless for that purpose,
at least when there's more than one index on the table. Granted,
there's a lot of tables with just one index out there, but it's a
big limitation nevertheless.

Agree.

An extension of that idea, though is to store a flag per index in
the HOT-updated tuple. We would then need a mapping between bits in
the tuple header to indexes, for example as a new column in pg_index.

I like the idea. The major objection would be that it adds a byte
to the tuple header which when considered along with the null
bitmap, may actually make the header 8 bytes larger in the
worst case.

Also, I am also worried about the additional complexity introduced
with this. We can and should work on this idea, I am wondering
whether it would be too much to do before the feature freeze.

I am personally inclined towards doing something simpler to
tackle the CREATE INDEX issue at the moment. But if that is not
acceptable and/or you or anyone else is willing help me on this,
we can work on a better solution.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

simon@2ndQuadrant.com

about 19 years ago

In reply to: Tom Lane (#5)

Re: Question: pg_class attributes and race conditions ?

On Fri, 2007-03-16 at 12:40 -0400, Tom Lane wrote:

"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:

Any thoughts on the overall approach ?

Fragile and full of race conditions :-(. I thought from the beginning
that CREATE INDEX might be a showstopper for the whole HOT concept,
and it's starting to look like that's the case.

Seems like we can fix all but some strange CREATE INDEX use cases. Since
we have CREATE INDEX CONCURRENTLY, seems like HOT is a showstopper for
the whole CREATE INDEX concept.

I think what we need to get away from is the assumption that HOT-ness
for one index is the same as HOT-ness for all.

Sounds interesting. I'd not considered that before.

What if we only applied
HOT to primary-key indexes, so that there was certainly not more than
one index per table that the property applies to?

On its own, I don't think this is a sufficiently wide use-case.

Perhaps we should do this PLUS make HOT-semantics optional for each
additional index. i.e. HOT is always enforced on primary indexes and
optionally on other indexes (but not by default).

If you accept the HOT option on an index, you then accept the additional
issues surrounding chilling tuples. Bear in mind that there aren't any
at all if you use CREATE INDEX CONCURRENTLY and many other cases.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Joshua D. Drake

jd@commandprompt.com

about 19 years ago

In reply to: Bruce Momjian (#10)

Re: Question: pg_class attributes and race conditions ?

Gregory Stark wrote:

"Joshua D. Drake" <jd@commandprompt.com> writes:

Just to throw my two bits in here :). If we do that, how does that
effect the idea that most people in the web world use (id serial primary
key), even though that is not what they are searching on?

"affect". But I think you're right that generally you'll have two indexes.

More specifically, does HOT help conditions where a composite comes into
play (session_id,last_active) ... which would be a more heavily updated
index than just the primary key.

Well if you're indexing a column that you're updating then you've already
failed your saving throw.

Just for everyone who missed this. Greg Stark obviously spends his time
(or at some time) playing|ed D&D. I have an Epic level Sorcerer, how
about you Greg? ;)

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

alvherre@2ndquadrant.com

about 19 years ago

In reply to: Simon Riggs (#12)

Re: Question: pg_class attributes and race conditions ?

Simon Riggs wrote:

What if we only applied
HOT to primary-key indexes, so that there was certainly not more than
one index per table that the property applies to?

On its own, I don't think this is a sufficiently wide use-case.

Perhaps we should do this PLUS make HOT-semantics optional for each
additional index. i.e. HOT is always enforced on primary indexes and
optionally on other indexes (but not by default).

Here's is a very simple, low-tech idea. How about checking whether the
new index requires chilling tuples; if it does, then elog(ERROR) until
all the indexes have been manually chilled, which would be done with an
"ALTER INDEX ... CHILL" command or something like that. Only when all
indexes are known chilled, you can create another one, and then the user
can "hotify" indexes as appropriate.

(Disclaimer: I haven't followed the HOT design closely to know if this
makes enough sense)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

simon@2ndQuadrant.com

about 19 years ago

In reply to: Alvaro Herrera (#14)

Re: Question: pg_class attributes and race conditions ?

On Fri, 2007-03-16 at 16:59 -0400, Alvaro Herrera wrote:

Here's is a very simple, low-tech idea. How about checking whether the
new index requires chilling tuples; if it does, then elog(ERROR) until
all the indexes have been manually chilled, which would be done with an
"ALTER INDEX ... CHILL" command or something like that. Only when all
indexes are known chilled, you can create another one, and then the user
can "hotify" indexes as appropriate.

Well, I've spent two weeks searching for a design that does CREATE INDEX
without changing existing functionality. What's been proposed is very
close, but not exact.

CREATE INDEX CONCURRENTLY can work, so we're just discussing the other
increasingly edgy cases.

I agree some kind of compromise on CREATE INDEX seems to be required if
we want HOT without some drastic reductions in function. I'm happy to go
for low tech approaches, or anything really. Simple is good, so we can
hit feature freeze.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Simon Riggs (#9)

CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)

How do we move forward with the CREATE INDEX issue with
HOT ? There are quite a few suggestions and objections.
Can we please discuss and decide on the plan ? I am very
comfortable with the current state of HOT, the results
are encouraging and I hope this issue does not become
a showstopper.

Here is what different people have suggested:

1. Simon and I posted couple of designs which include prior
suggestions from Heikki, Tom, Hannu and others. I believe
these designs are not very distinct and if done correctly
would help us keep the current behavior of CREATE INDEX
and CREATE INDEX CONCURRENTLY same. There are concerns
about race conditions and deadlock issues though. What
is a general feeling at this point ? Are these issues very
hard to address ? At a high level, IMO we need some or all
of these things to make either of these designs work:

- ability to enable/disable HOT-updates on the table
- ability to enable/disable HOT-fetches on the table
- ability to wait for concurrent index scans to finish
- ability to allow only one backend to CHILL the table

How hard would these be without causing any race conditions
and deadlocks ?

2. Heikki suggested an approach where we add a byte
to tuple header and track HOT-ness of different indexes.
The idea looks good but had a downside of increasing tuple
header and complexity.

3. Alvaro suggested to have a ALTER TABLE .. SET CHILL kind
of syntax to chill the table. I would like to extend that
a bit further:

Can we have a mechanism to explicitely ON/OFF HOT-updates
on tables ? By default, HOT is disabled. If user wants to
take advantage of HOT, he/she should do something like:

ALTER TABLE test SET HOT ON;

While creating an index, if a HEAP_ONLY tuple is found,
CREATE INDEX [CONCURRENTLY] fails with an error and the
user needs to SET HOT OFF and then try again. While turning
HOT off, the entire table is CHILLed, holding AccessExclusive
lock on the table. Once the new index is created, user
can turn HOT on again.

This infrastructure would ensure that the current
behavioral expectations of CREATE INDEX [CONCURRENTLY]
don't suddenly change unless user explicitely turns
HOT on. In that case, we can very well assume that the
user is aware of the advantages/disadvantages of HOT
and also cost associated with creating a new index on
HOT-updated tables. This would also allow users
experiment with HOT and turn it off if there is no gain
in their environment.

Any thoughts on the these approaches ? What is feasible
and acceptable ?

I am more inclined towards the third approach for this
release and improve things later on. Is this acceptable ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

simon@2ndQuadrant.com

about 19 years ago

In reply to: Pavan Deolasee (#16)

Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)

On Sat, 2007-03-17 at 12:43 +0530, Pavan Deolasee wrote:

How do we move forward with the CREATE INDEX issue with
HOT ? There are quite a few suggestions and objections.
Can we please discuss and decide on the plan ? I am very
comfortable with the current state of HOT, the results
are encouraging and I hope this issue does not become
a showstopper.

My suggested plan would be:

First, we should make CREATE INDEX CONCURRENTLY work, since we have a
design to do that. Many other things are acceptable if that works fully
and correctly with HOT, if it cannot, we have problems and that is a
showstopper.

Second, I can't put my finger on it exactly, but I think having a DDL
command to chill a table, as Alvaro suggests, sounds like a very wise
utility to include, whatever we do later. That does have the useful
by-product of making CREATE INDEX work without change, but I would not
want to rely on it in the longer term.

While those are happening we can explore the other possibilities in more
detail, but they do seem like safe bets to me.

Here is what different people have suggested:

1. Simon and I posted couple of designs which include prior
suggestions from Heikki, Tom, Hannu and others. I believe
these designs are not very distinct and if done correctly
would help us keep the current behavior of CREATE INDEX
and CREATE INDEX CONCURRENTLY same.

Agreed. Apologies for confusing things by posting the second design,
that wasn't very helpful or polite.

There are concerns
about race conditions and deadlock issues though. What
is a general feeling at this point ? Are these issues very
hard to address ? At a high level, IMO we need some or all
of these things to make either of these designs work:

- ability to enable/disable HOT-updates on the table
- ability to enable/disable HOT-fetches on the table
- ability to wait for concurrent index scans to finish
- ability to allow only one backend to CHILL the table

That seems like a good summary of what we need. I would add only that
these need not be explicit user-level commands, just internal
capabilities.

How hard would these be without causing any race conditions
and deadlocks ?

We should be able to take advantage of two things:

- We only need to chill some of the tuples, in some cases.

- Chilling only requires us to lock the indexes, which would be more
practical if the index locking were more transitory.

I note that IndexScans hold the locks they have on an index until
transaction end, even after the IndexScans and BitmapIndexScans
complete. ISTM we could relax that locking; that seems OK since we
earlier discussed removing locks completely on indexes. The lock on the
heap would not be upgraded or changed.

2. Heikki suggested an approach where we add a byte
to tuple header and track HOT-ness of different indexes.
The idea looks good but had a downside of increasing tuple
header and complexity.

I'm very comfortable with the idea that HOT can be turned on/off for a
table. That gives us a workaround to bugs. Previously, changing things
like WITHOUT OIDS was done over two releases, so I'd suggest the same
thing here. Add the option now, disabled, then look to make it the
default option in the next release. We can override that with the
default_use_hot parameter for those that feel bold, at least initially.
I know Bruce has been long opposed to the idea of a table-level switch,
which is why we've been trying so hard to avoid it. So we should add his
-1 to this idea from the start.

Right now, ideas around (2) sound like too much complexity. Maybe
there's a good idea there waiting to break out, so we should pursue that
also - but I'm not sure you can wait for that to happen.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

tgl@sss.pgh.pa.us

about 19 years ago

In reply to: Pavan Deolasee (#16)

Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)

"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:

While creating an index, if a HEAP_ONLY tuple is found,
CREATE INDEX [CONCURRENTLY] fails with an error and the
user needs to SET HOT OFF and then try again. While turning
HOT off, the entire table is CHILLed, holding AccessExclusive
lock on the table. Once the new index is created, user
can turn HOT on again.

It hardly seems acceptable to require exclusive lock to chill a table.
In production situations, knowing that you'd have to do that to do
index maintenance on a large table would probably scare you off of
ever enabling the feature at all. Last year we were getting beaten up
about how it wasn't acceptable for CREATE INDEX to lock out writes
for a long time; how is it suddenly acceptable to need to lock out
both reads and writes for a long time before you can even think
about creating an index?

regards, tom lane

simon@2ndQuadrant.com

about 19 years ago

In reply to: Tom Lane (#18)

Re: CREATE INDEX and HOT (was Question: pg_classattributes and race conditions ?)

On Sat, 2007-03-17 at 11:45 -0400, Tom Lane wrote:

"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:

While creating an index, if a HEAP_ONLY tuple is found,
CREATE INDEX [CONCURRENTLY] fails with an error and the
user needs to SET HOT OFF and then try again. While turning
HOT off, the entire table is CHILLed, holding AccessExclusive
lock on the table. Once the new index is created, user
can turn HOT on again.

It hardly seems acceptable to require exclusive lock to chill a table.
In production situations, knowing that you'd have to do that to do
index maintenance on a large table would probably scare you off of
ever enabling the feature at all. Last year we were getting beaten up
about how it wasn't acceptable for CREATE INDEX to lock out writes
for a long time; how is it suddenly acceptable to need to lock out
both reads and writes for a long time before you can even think
about creating an index?

I agree with you: It isn't acceptable for us to contemplate an
AccessExclusiveLock before we can build any index.

We *must* make CREATE INDEX CONCURRENTLY work with HOT. The good news is
I think we can without significant difficulty.

The problems are with CREATE INDEX, in some cases. I regret that I did
not see those difficulties until recently, which is why I was concerned
that we spent time on VACUUM FULL rather than this issue. I'm glad now
that you both pressed ahead and solved that though.

As a result of the issues, I think Pavan is playing safe, to make sure
there is *an* option, so that we can build upwards from there. The
proposal is pragmatism only, while we discuss other approaches.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Tom Lane (#18)

Re: CREATE INDEX and HOT (was Question: pg_class attributes and race conditions ?)

Tom Lane wrote:

"Pavan Deolasee" <pavan.deolasee@enterprisedb.com> writes:

While creating an index, if a HEAP_ONLY tuple is found,
CREATE INDEX [CONCURRENTLY] fails with an error and the
user needs to SET HOT OFF and then try again. While turning
HOT off, the entire table is CHILLed, holding AccessExclusive
lock on the table. Once the new index is created, user
can turn HOT on again.

It hardly seems acceptable to require exclusive lock to chill a table.
In production situations, knowing that you'd have to do that to do
index maintenance on a large table would probably scare you off of
ever enabling the feature at all. Last year we were getting beaten up
about how it wasn't acceptable for CREATE INDEX to lock out writes
for a long time; how is it suddenly acceptable to need to lock out
both reads and writes for a long time before you can even think
about creating an index?

Yeah, I agree. I was proposing this as a stop-gap solution though.
Something which would help us solve the problem without changing
the current behavior for non-HOT tables.

So what do you suggest ? Do you feel that there is no way we can
solve the problem ?

ISTM that if we run CHILL as a seperate transaction (just like
VACUUM), we should be able to CHILL the table with
ShareUpdateExclusiveLock. Running it as a seperate transaction
would reduce the risk of causing deadlocks. Is that a fair
assessment ?

If we need to CHILL with ShareUpdateExclusiveLock, IMHO we
would again be back to something similar to the first approach.
I know you said its fragile and full of race conditions, but
do you think we can handle it better if we have a seperate
DDL command, running within its own transaction ?

The algorithm would look like:

1. Disable HOT-updates
2. CHILL the table by inserting appropriate index entries and
marking tuples CHILL_IN_PROGRESS
3. Establish a point when there are no open index scans
4. Disable HOT-fetches
5. Scan the heap again, reset CHILL_IN_PROGRESS, HEAP_ONLY
and HOT_UPDATED flags
6. Establish a point when there are no open index scans
7. Enable HOT-fetches
8. Enable HOT-updates

I need help to do the step 1,3,4,6,7 and 8 (well all :-))
in a deadlock and race condition free mannner. Any
suggestions ? Where do we keep the global state about
HOT-updates/HOT-fetches ? If we keep it in pg_class, a
crash of the CHILL command or the server may leave the
pg_class row in a stale state. That does not look like
a problem though. In the worst case, we might not be
able to do HOT-updates without manual intervention.

Again comments, suggestions ? I really appreciate
everyone's time and patience. Help is what I need to
solve this problem.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Simon Riggs (#19)

simon@2ndQuadrant.com

about 19 years ago

In reply to: Pavan Deolasee (#20)

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Simon Riggs (#22)

simon@2ndQuadrant.com

about 19 years ago

In reply to: Pavan Deolasee (#23)

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Simon Riggs (#19)

simon@2ndQuadrant.com

about 19 years ago

In reply to: Pavan Deolasee (#25)

simon@2ndQuadrant.com

about 19 years ago

In reply to: Simon Riggs (#26)

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

about 19 years ago

In reply to: Pavan Deolasee (#16)

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Heikki Linnakangas (#28)

pavan.deolasee@gmail.com

about 19 years ago

In reply to: Simon Riggs (#26)

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

about 19 years ago

In reply to: Pavan Deolasee (#29)

simon@2ndQuadrant.com

about 19 years ago

In reply to: Pavan Deolasee (#30)

simon@2ndQuadrant.com

about 19 years ago

In reply to: Heikki Linnakangas (#31)

hannu@tm.ee

about 19 years ago

In reply to: Simon Riggs (#33)

Heikki Linnakangas

heikki.linnakangas@enterprisedb.com

about 19 years ago

In reply to: Hannu Krosing (#34)

mmoncure@gmail.com

about 19 years ago

In reply to: Simon Riggs (#17)

simon@2ndQuadrant.com

about 19 years ago

In reply to: Merlin Moncure (#36)

mmoncure@gmail.com

about 19 years ago

In reply to: Pavan Deolasee (#25)

bruce@momjian.us

about 19 years ago

In reply to: Heikki Linnakangas (#7)

Jim.Nasby@BlueTreble.com

about 19 years ago

In reply to: Simon Riggs (#33)

mr-russ@pws.com.au

about 19 years ago

In reply to: Jim Nasby (#40)