CREATE INDEX and HOT - revised design
There are few things I realized over the weekend while going
through the code:
1. It looks like a bad idea to use ALTER TABLE .. to chill a table
becuase ALTER TABLE takes AccessExclusive lock on the table.
But it would still be a good idea to have ALTER TABLE .. to turn
HOT-updates ON/OFF.
2. May be I was too paranoid about deadlocks. CREATE INDEX upgrade
its lock anyways and is prone to deadlock. So as long as we don't
create new deadlock scenarios, we should be fine.
3. CREATE INDEX CONCURRENTLY must be run as a seperate transaction.
So its should be acceptable if we run CHILL as a seperate transaction.
4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and
thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and
VACUUM FULL. We can do the same for CHILL to avoid any unnecessary
race conditions between all of these.
So here is my plan. Let me know your suggestions/comments/objections.
Changes to pg_class and new DDLs:
---------------------------------
We add two boolean attributes to pg_class: hot_update and hot_fetch.
We introduce two DDLs to turn HOT on/off. Note that the DDL
itself does not CHILL the table, but only affects the
subsequent UPDATEs.
postgres=# ALTER TABLE <tblname> ENABLE HOT;
postgres=# ALTER TABLE <tblname> DISABLE HOT;
These DDLs would acquire AccessExclusive lock on the table and
set hot_update to true/false using simple_heap_update()
CREATE INDEX [CONCURRENTLY]:
----------------------------
If a HEAP_ONLY tuple is found, error out with a HINT to run
CHILL on the table and then retry.
If CHILL_IN_PROGRESS tuple is found, VACUUM is required on
the table.
CHILL utility:
--------------
We introduce a new command to chill a table. The syntax for the
same could be:
postgres=# CHILL [VERBOSE] <tblname>;
UPDATE/INSERT/SELECT would work while the table is being chilled.
But CREATE INDEX [CONCURRENTLY], VACUUM [FULL] and another CHILL
would be locked out. As a side-effect, HOT-updates are turned off on the
table and explicit ALTER TABLE ENABLE HOT is required to turn
HOT-updates on again.
Here is the algoirthm to CHILL table.
1. Check if CHILL is running inside a transaction block, error
out if so.
2. Start a new transaction
3. Acquire ShareUpdateExclusiveLock on the relation. This would
allow concurrent SELECT/INSERT/UPDATE, but lock out concurrent
CHILL, VACUUM [FULL] and CREATE INDEX [CONCURRENTLY]
4. Set hot_update to false and update pg_class using
simple_heap_update()
5. Acquire ShareUpdateExclusiveLock for the entire session.
6. Commit the transaction
7. Start a new transaction
8. Wait for all transactions in the current snapshot to finish.
This would ensure that there are no HOT-updates possible further
9. Start the first heap scan, CHILL HEAP_ONLY tuples by inserting
appropriate index entries and setting CHILL_IN_PROGRESS flag.
WAL log the operation
10. If a CHILL_IN_PROGRESS is already set on the tuple, we can't
be sure whether the corresponding index entry already exists
or not. One option is to error out and force VACUUM on the table.
Alternatively, the index_insert can be enhanced to check if a
the same entry already exists.
11. When the entire heap is chilled, set hot_fetch to false
and update pg_class using simple_heap_update()
12. Commit the transaction
13. Start a new transaction
14. Wait for all transactions in the current snapshot to finish.
This would ensure that all the subsequent index scans would
only use direct path from the index.
15. Start second heap scan. Reset CHILL_IN_PROGRESS, HEAP_ONLY
and HOT_UPDATED flags. WAL log the operations. We may not need
this, but we can revisit this later to optimize WAL logging.
16. When the second scan is complete, set hot_fetch to
true and update pg_class using simple_heap_update(). There are
no HOT_UPDATED tuples in the heap at this moment, but we should
reset the state neverthless.
17. Commit the transaction.
If the CHILL command crashes before completing the operation,
we might be left with hot_update/hot_fetch turned OFF. Administrative
command is needed to turn them ON again. But there won't be any
correctness problems in the meantime.
The uncleaned tuples left with CHILL_IN_PROGRESS flags would
require VACUUM for cleanup.
Index Fetch:
------------
If hot_fetch is true, we ignore direct paths from the
index to HEAP_ONLY tuples
If hot_fetch is false, we ignore HOT_UPDATED flags
and only use direct paths from the index.
VACUUM [FULL]:
--------------
If a CHILL_IN_PROGRESS flag found, collect that tuple for
index removal irrespective of whether the tuple is DEAD
or not. If the tuple is not DEAD, only the CHILL_IN_PROGRESS
flag is reset in the second pass.
Is the plan acceptable ? If there are no objections to the
algorithms or the behavior in general, I would start working
on this with a target of feature freeze.
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
I have read the HOT discussion and wanted to give my input. The major
issue is that CREATE INDEX might require a HOT chain to be split apart
if one of the new indexed columns changed in the HOT chain.
As for the outline below, there is no way we are going to add new ALTER
TABLE and CHILL commands to make this work. Can you imagine what kind
of warts we would have in the system if we added such things every time
we got stuck? And once the warts are in, it is hard to educate people
once they are removed. We need to keep going until we have a solution
that is as user-invisible as possible. While I understand the
frustration that we have not discussed this enough, I don't want us
rushing to a solution either until it has been totally thought through.
HOT is not a feature only a few people are going to want to use ---
everyone will want it, and if the user process is cumbersome, we will get
never-ending questions about how to make it work.
Let's all think about this for the next few days.
---------------------------------------------------------------------------
Pavan Deolasee wrote:
There are few things I realized over the weekend while going
through the code:1. It looks like a bad idea to use ALTER TABLE .. to chill a table
becuase ALTER TABLE takes AccessExclusive lock on the table.
But it would still be a good idea to have ALTER TABLE .. to turn
HOT-updates ON/OFF.2. May be I was too paranoid about deadlocks. CREATE INDEX upgrade
its lock anyways and is prone to deadlock. So as long as we don't
create new deadlock scenarios, we should be fine.3. CREATE INDEX CONCURRENTLY must be run as a seperate transaction.
So its should be acceptable if we run CHILL as a seperate transaction.4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and
thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and
VACUUM FULL. We can do the same for CHILL to avoid any unnecessary
race conditions between all of these.So here is my plan. Let me know your suggestions/comments/objections.
Changes to pg_class and new DDLs:
---------------------------------We add two boolean attributes to pg_class: hot_update and hot_fetch.
We introduce two DDLs to turn HOT on/off. Note that the DDL
itself does not CHILL the table, but only affects the
subsequent UPDATEs.postgres=# ALTER TABLE <tblname> ENABLE HOT;
postgres=# ALTER TABLE <tblname> DISABLE HOT;These DDLs would acquire AccessExclusive lock on the table and
set hot_update to true/false using simple_heap_update()CREATE INDEX [CONCURRENTLY]:
----------------------------If a HEAP_ONLY tuple is found, error out with a HINT to run
CHILL on the table and then retry.If CHILL_IN_PROGRESS tuple is found, VACUUM is required on
the table.CHILL utility:
--------------We introduce a new command to chill a table. The syntax for the
same could be:postgres=# CHILL [VERBOSE] <tblname>;
UPDATE/INSERT/SELECT would work while the table is being chilled.
But CREATE INDEX [CONCURRENTLY], VACUUM [FULL] and another CHILL
would be locked out. As a side-effect, HOT-updates are turned off on the
table and explicit ALTER TABLE ENABLE HOT is required to turn
HOT-updates on again.Here is the algoirthm to CHILL table.
1. Check if CHILL is running inside a transaction block, error
out if so.2. Start a new transaction
3. Acquire ShareUpdateExclusiveLock on the relation. This would
allow concurrent SELECT/INSERT/UPDATE, but lock out concurrent
CHILL, VACUUM [FULL] and CREATE INDEX [CONCURRENTLY]4. Set hot_update to false and update pg_class using
simple_heap_update()5. Acquire ShareUpdateExclusiveLock for the entire session.
6. Commit the transaction
7. Start a new transaction
8. Wait for all transactions in the current snapshot to finish.
This would ensure that there are no HOT-updates possible further9. Start the first heap scan, CHILL HEAP_ONLY tuples by inserting
appropriate index entries and setting CHILL_IN_PROGRESS flag.
WAL log the operation10. If a CHILL_IN_PROGRESS is already set on the tuple, we can't
be sure whether the corresponding index entry already exists
or not. One option is to error out and force VACUUM on the table.
Alternatively, the index_insert can be enhanced to check if a
the same entry already exists.11. When the entire heap is chilled, set hot_fetch to false
and update pg_class using simple_heap_update()12. Commit the transaction
13. Start a new transaction
14. Wait for all transactions in the current snapshot to finish.
This would ensure that all the subsequent index scans would
only use direct path from the index.15. Start second heap scan. Reset CHILL_IN_PROGRESS, HEAP_ONLY
and HOT_UPDATED flags. WAL log the operations. We may not need
this, but we can revisit this later to optimize WAL logging.16. When the second scan is complete, set hot_fetch to
true and update pg_class using simple_heap_update(). There are
no HOT_UPDATED tuples in the heap at this moment, but we should
reset the state neverthless.17. Commit the transaction.
If the CHILL command crashes before completing the operation,
we might be left with hot_update/hot_fetch turned OFF. Administrative
command is needed to turn them ON again. But there won't be any
correctness problems in the meantime.The uncleaned tuples left with CHILL_IN_PROGRESS flags would
require VACUUM for cleanup.Index Fetch:
------------If hot_fetch is true, we ignore direct paths from the
index to HEAP_ONLY tuplesIf hot_fetch is false, we ignore HOT_UPDATED flags
and only use direct paths from the index.VACUUM [FULL]:
--------------If a CHILL_IN_PROGRESS flag found, collect that tuple for
index removal irrespective of whether the tuple is DEAD
or not. If the tuple is not DEAD, only the CHILL_IN_PROGRESS
flag is reset in the second pass.Is the plan acceptable ? If there are no objections to the
algorithms or the behavior in general, I would start working
on this with a target of feature freeze.Thanks,
Pavan--
EnterpriseDB http://www.enterprisedb.com
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
I have read the HOT discussion and wanted to give my input. The major
issue is that CREATE INDEX might require a HOT chain to be split apart
if one of the new indexed columns changed in the HOT chain.
To expand a little more, the problem is that when you split those HOT
chains, you have to insert new entries into the _existing_ indexes,
causing problems for concurrent sequential scans.
I have a new idea. There has been a lot of focus on trying to tag each
tuple to indicate whether it is part of a HOT chain for individual
indexes, the idea being that some indexes will follow the HOT chain, and
some will not.
A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index. The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains. Then doing lookups using that index, the new index
does not follow HOT chains. We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created. Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.
A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it. The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created. Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index. Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.
I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.
OK, so there is my first idea to do this cleanly. Comments?
---------------------------------------------------------------------------
As for the outline below, there is no way we are going to add new ALTER
TABLE and CHILL commands to make this work. Can you imagine what kind
of warts we would have in the system if we added such things every time
we got stuck? And once the warts are in, it is hard to educate people
once they are removed. We need to keep going until we have a solution
that is as user-invisible as possible. While I understand the
frustration that we have not discussed this enough, I don't want us
rushing to a solution either until it has been totally thought through.HOT is not a feature only a few people are going to want to use ---
everyone will want it, and if the user process is cumbersome, we will get
never-ending questions about how to make it work.Let's all think about this for the next few days.
---------------------------------------------------------------------------
Pavan Deolasee wrote:
There are few things I realized over the weekend while going
through the code:1. It looks like a bad idea to use ALTER TABLE .. to chill a table
becuase ALTER TABLE takes AccessExclusive lock on the table.
But it would still be a good idea to have ALTER TABLE .. to turn
HOT-updates ON/OFF.2. May be I was too paranoid about deadlocks. CREATE INDEX upgrade
its lock anyways and is prone to deadlock. So as long as we don't
create new deadlock scenarios, we should be fine.3. CREATE INDEX CONCURRENTLY must be run as a seperate transaction.
So its should be acceptable if we run CHILL as a seperate transaction.4. CREATE INDEX CONCURRENTLY runs with ShareUpdateExclusiveLock and
thus locks out concurrent CREATE INDEX [CONCURRENTLY], VACUUM and
VACUUM FULL. We can do the same for CHILL to avoid any unnecessary
race conditions between all of these.So here is my plan. Let me know your suggestions/comments/objections.
Changes to pg_class and new DDLs:
---------------------------------We add two boolean attributes to pg_class: hot_update and hot_fetch.
We introduce two DDLs to turn HOT on/off. Note that the DDL
itself does not CHILL the table, but only affects the
subsequent UPDATEs.postgres=# ALTER TABLE <tblname> ENABLE HOT;
postgres=# ALTER TABLE <tblname> DISABLE HOT;These DDLs would acquire AccessExclusive lock on the table and
set hot_update to true/false using simple_heap_update()CREATE INDEX [CONCURRENTLY]:
----------------------------If a HEAP_ONLY tuple is found, error out with a HINT to run
CHILL on the table and then retry.If CHILL_IN_PROGRESS tuple is found, VACUUM is required on
the table.CHILL utility:
--------------We introduce a new command to chill a table. The syntax for the
same could be:postgres=# CHILL [VERBOSE] <tblname>;
UPDATE/INSERT/SELECT would work while the table is being chilled.
But CREATE INDEX [CONCURRENTLY], VACUUM [FULL] and another CHILL
would be locked out. As a side-effect, HOT-updates are turned off on the
table and explicit ALTER TABLE ENABLE HOT is required to turn
HOT-updates on again.Here is the algoirthm to CHILL table.
1. Check if CHILL is running inside a transaction block, error
out if so.2. Start a new transaction
3. Acquire ShareUpdateExclusiveLock on the relation. This would
allow concurrent SELECT/INSERT/UPDATE, but lock out concurrent
CHILL, VACUUM [FULL] and CREATE INDEX [CONCURRENTLY]4. Set hot_update to false and update pg_class using
simple_heap_update()5. Acquire ShareUpdateExclusiveLock for the entire session.
6. Commit the transaction
7. Start a new transaction
8. Wait for all transactions in the current snapshot to finish.
This would ensure that there are no HOT-updates possible further9. Start the first heap scan, CHILL HEAP_ONLY tuples by inserting
appropriate index entries and setting CHILL_IN_PROGRESS flag.
WAL log the operation10. If a CHILL_IN_PROGRESS is already set on the tuple, we can't
be sure whether the corresponding index entry already exists
or not. One option is to error out and force VACUUM on the table.
Alternatively, the index_insert can be enhanced to check if a
the same entry already exists.11. When the entire heap is chilled, set hot_fetch to false
and update pg_class using simple_heap_update()12. Commit the transaction
13. Start a new transaction
14. Wait for all transactions in the current snapshot to finish.
This would ensure that all the subsequent index scans would
only use direct path from the index.15. Start second heap scan. Reset CHILL_IN_PROGRESS, HEAP_ONLY
and HOT_UPDATED flags. WAL log the operations. We may not need
this, but we can revisit this later to optimize WAL logging.16. When the second scan is complete, set hot_fetch to
true and update pg_class using simple_heap_update(). There are
no HOT_UPDATED tuples in the heap at this moment, but we should
reset the state neverthless.17. Commit the transaction.
If the CHILL command crashes before completing the operation,
we might be left with hot_update/hot_fetch turned OFF. Administrative
command is needed to turn them ON again. But there won't be any
correctness problems in the meantime.The uncleaned tuples left with CHILL_IN_PROGRESS flags would
require VACUUM for cleanup.Index Fetch:
------------If hot_fetch is true, we ignore direct paths from the
index to HEAP_ONLY tuplesIf hot_fetch is false, we ignore HOT_UPDATED flags
and only use direct paths from the index.VACUUM [FULL]:
--------------If a CHILL_IN_PROGRESS flag found, collect that tuple for
index removal irrespective of whether the tuple is DEAD
or not. If the tuple is not DEAD, only the CHILL_IN_PROGRESS
flag is reset in the second pass.Is the plan acceptable ? If there are no objections to the
algorithms or the behavior in general, I would start working
on this with a target of feature freeze.Thanks,
Pavan--
EnterpriseDB http://www.enterprisedb.com
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com+ If your life is a hard drive, Christ can be your backup. +
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index. The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains. Then doing lookups using that index, the new index
does not follow HOT chains. We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created. Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.
When exactly would all HOT chains be dead? AFAICS, that would be after
the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run
to prune and pointer-swing all HOT chains.
Would we have to wait after setting the new forbid_hot_updates-flag in
pg_class, to make sure everyone sees the change? What if CREATE INDEX
crashes, would we need a vacuum to reset the flag?
A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it. The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created. Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index. Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.
What xid would you place in pg_index? Xid of the transaction running
CREATE INDEX, ReadNewTransactionId() or what?
How does that work if you have a transaction that begins before CREATE
INDEX, and updates something after CREATE INDEX?
I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.
I don't think you can freeze the xid-column, we went through a similar
discussion on pg_class.relfrozenxid. But you can move it forward to
oldest xmin.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Wed, 2007-03-21 at 10:47 -0400, Bruce Momjian wrote:
A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index. The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains. Then doing lookups using that index, the new index
does not follow HOT chains. We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created. Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.
"enable HOT chain following" would require us to hold an
AccessExclusiveLock on the index.
We know that solution exists, the question is: at what point would we
ever request that lock? Or would we just wait until that lock is next
taken before enabling it, giving the user no control over when its
taken? A separate DDL command would be effectively the same as what
Pavan has recently suggested.
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Sorry, I was a bit too quick to respond. I didn't understand at first
how this differs from Pavan's/Simon's proposals.
Let me answer my own questions.
Heikki Linnakangas wrote:
Bruce Momjian wrote:
A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index. The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains. Then doing lookups using that index, the new index
does not follow HOT chains. We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created. Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.When exactly would all HOT chains be dead? AFAICS, that would be after
the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run
to prune and pointer-swing all HOT chains.
I still think that's true.
Would we have to wait after setting the new forbid_hot_updates-flag in
pg_class, to make sure everyone sees the change? What if CREATE INDEX
crashes, would we need a vacuum to reset the flag?
You wouldn't need to do any extra waits to set the forbid_hot_updates
flag, CREATE INDEX locks the table and already sends a relcache
invalidations to make the new index visible. CREATE INDEX CONCURRENTLY
waits already.
A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it. The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created. Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index. Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.What xid would you place in pg_index? Xid of the transaction running
CREATE INDEX, ReadNewTransactionId() or what?
Apparently ReadNewTransactionId to make sure there's no existing tuples
with an xmax smaller than that.
How does that work if you have a transaction that begins before CREATE
INDEX, and updates something after CREATE INDEX?
You actually explained that above...
The HOT_UPDATED flag on a tuple would basically mean that all indexes
with xid < xmax doesn't contain an index pointer for the tuple, and all
others do. When inserting new updated tuples, we'd also need to maintain
that invariant.
I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.I don't think you can freeze the xid-column, we went through a similar
discussion on pg_class.relfrozenxid. But you can move it forward to
oldest xmin.
You could actually "freeze" the column, because unlike relfrozenid we
never need to unfreeze it.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
"Bruce Momjian" <bruce@momjian.us> writes:
We also add a boolean to pg_class to indicate no new HOT chains should be
created and set that to false once the new index is created.
Since we have all the index info in the relcache we could just skim through
all the indexes when we build the relcache and decide then whether we're
allowed to do HOT updates. That avoids problems if we crash while HOT updates
are disabled.
I think we need to think harder about exactly what test we would perform
against this xid to determine the two relevant tests,
a) whether to prohibit HOT updates (because the index is "too new")
b) whether to ignore HOT update chains when we use the index (because it's
"too new" and any HOT update chains predate it).
I fear it may imply that we have to keep performing cold updates until the
first vacuum after the xid expires.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
Bruce Momjian wrote:
A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index. The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains. Then doing lookups using that index, the new index
does not follow HOT chains. We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created. Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.When exactly would all HOT chains be dead? AFAICS, that would be after
the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run
to prune and pointer-swing all HOT chains.
Yes, I think so. We might need to have pg_autovacuum involved in
checking pg_index and doing a vacuum when needed, but that is only for
the boolean idea; the xid idea doesn't need it, I think.
Would we have to wait after setting the new forbid_hot_updates-flag in
pg_class, to make sure everyone sees the change? What if CREATE INDEX
crashes, would we need a vacuum to reset the flag?
Uh, I am thinking we would just set it at the start of CREATE INDEX, but
actually if we do the pg_index xid idea, we don't need to tell them
anything --- once they see the new index in pg_index, they will
automatically know whether to include the new index in the HOT tests
based on its own xid. Right now, if a new index is created, existing
backends know they need to insert into it, so it would be the same
mechanism, except they have to check the pg_index xid field.
If we use the xid idea, and CREATE INDEX crashes, there is no problem
because all the backend just keep going and never see the new pg_index
xid entry.
A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it. The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created. Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index. Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.What xid would you place in pg_index? Xid of the transaction running
CREATE INDEX, ReadNewTransactionId() or what?
I would say the current xid counter value at time of index creation
completion, so currently running transactions are unaffected, but newly
started transactions create HOT chains that include the new index, and
we just cleanup the old hot chains as the currently-running transactions
complete.
How does that work if you have a transaction that begins before CREATE
INDEX, and updates something after CREATE INDEX?
OK, uses the old rules of always inserting into the new index, even if
it is creating a HOT chain. The bottom line is that we would now have
chains with different indexes involved, and the HOT head xmax compared
to pg_index tells us which indexes are involved in the chain.
I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.I don't think you can freeze the xid-column, we went through a similar
discussion on pg_class.relfrozenxid. But you can move it forward to
oldest xmin.
Sounds good.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Simon Riggs wrote:
On Wed, 2007-03-21 at 10:47 -0400, Bruce Momjian wrote:
A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index. The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains. Then doing lookups using that index, the new index
does not follow HOT chains. We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created. Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created."enable HOT chain following" would require us to hold an
AccessExclusiveLock on the index.
OK.
We know that solution exists, the question is: at what point would we
ever request that lock? Or would we just wait until that lock is next
taken before enabling it, giving the user no control over when its
taken? A separate DDL command would be effectively the same as what
Pavan has recently suggested.
We could just request a lock and if it fails, try again later; it just
delays HOT updates, which is not a big problem. Allowing the user to do
it via a separate command seems to add nothing. The only thing I could
see would be adding an option to CREATE INDEX which waits and then does
it.
However, given recent comments, I think the xid idea, while more
complicated, is better because it has fewer restrictions on when things
happen.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
Bruce Momjian wrote:
I have read the HOT discussion and wanted to give my input. The major
issue is that CREATE INDEX might require a HOT chain to be split apart
if one of the new indexed columns changed in the HOT chain.To expand a little more, the problem is that when you split those HOT
chains, you have to insert new entries into the _existing_ indexes,
causing problems for concurrent sequential scans.
You mean index scans, right ? Sequential scans don't change with HOT.
I have a new idea. There has been a lot of focus on trying to tag each
tuple to indicate whether it is part of a HOT chain for individual
indexes, the idea being that some indexes will follow the HOT chain, and
some will not.
In general I would like to preserve the HOT properties at the end of
creation of new index. All index should point to the same root tuple
if pointing to a HOT-chain. The things might become otherwise messy
with the line pointer redirection, reuse of LP_DELETEd tuples and
other things that we have put in.
Disabling HOT-updates using pg_class attribute has same issue
with my earlier proposal. What do we do if the backend crashes before
it can enable it again ? Who would reset the flag ? We could have lived
without DDL and CHILL command if we would have answers for
these questions.
A different idea here:
Would it be acceptable to force CREATE INDEX to be run outside
a transaction block just like CREATE INDEX CONCURRENTLY ? If thats
acceptable, we can do the following:
CREATE INDEX:
Since CREATE INDEX locks out table for UPDATEs, we just need
to wait enough before we start the heap scan so that when we do
heap scan, all HOT-chains can be pruned to a single tuple (with or
without the redirected line pointer). So when the new index is built,
we first prune the entire page of HOT-chains and insert the TID of
the root tuple into the new index. IOW we need to wait for all
transactions in the snapshot after acquiring ShareLock but before
we start heap scan.
CREATE INDEX CONCURRENTLY:
Simon has already posted a design that would work with CREATE
INDEX CONCURRENTLY. I think we need to tweak it a bit so
that we can handle the HOT-updated tuples after then first heap
scan, but before the index is visible to all transactions. Once the
new index is visible, the heap_update() code path would take care
of avoiding HOT-updates if the column on which new index is being
built is updated.
It seems much simpler to me do something like this. But important
question is whether the restriction that CREATE INDEX can not
be run in a transaction block is acceptable ?
Comments ?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Ah, sounds like you have the idea clearly now. Great.
---------------------------------------------------------------------------
Heikki Linnakangas wrote:
Sorry, I was a bit too quick to respond. I didn't understand at first
how this differs from Pavan's/Simon's proposals.Let me answer my own questions.
Heikki Linnakangas wrote:
Bruce Momjian wrote:
A different idea is to flag the _index_ as using HOT for the table or
not, using a boolean in pg_index. The idea is that when a new index is
created, it has its HOT boolean set to false and indexes all tuples and
ignores HOT chains. Then doing lookups using that index, the new index
does not follow HOT chains. We also add a boolean to pg_class to
indicate no new HOT chains should be created and set that to false once
the new index is created. Then, at some later time when all HOT chains
are dead, we can enable HOT chain following for the new index and allow
new HOT chains to be created.When exactly would all HOT chains be dead? AFAICS, that would be after
the xid of CREATE INDEX gets older than oldest xmin, and VACUUM is run
to prune and pointer-swing all HOT chains.I still think that's true.
Would we have to wait after setting the new forbid_hot_updates-flag in
pg_class, to make sure everyone sees the change? What if CREATE INDEX
crashes, would we need a vacuum to reset the flag?You wouldn't need to do any extra waits to set the forbid_hot_updates
flag, CREATE INDEX locks the table and already sends a relcache
invalidations to make the new index visible. CREATE INDEX CONCURRENTLY
waits already.A more sophisticated idea would be to place an xid, rather than a
boolean, in pg_index to indicate which chains were created after the
index was created to control whether the index should follow that HOT
chain, or ignore it. The xmax of the head of the HOT chain can be used
as an indicator of when the chain was created. Transactions started
before the pg_index xid could continue following the old rules and
insert into the _new_ index for HOT chain additions, and new
transactions would create HOT chains that could skip adding to the new
index. Cleanup of the hybrid HOT chains (some indexes take part, some
do not) would be more complex.What xid would you place in pg_index? Xid of the transaction running
CREATE INDEX, ReadNewTransactionId() or what?Apparently ReadNewTransactionId to make sure there's no existing tuples
with an xmax smaller than that.How does that work if you have a transaction that begins before CREATE
INDEX, and updates something after CREATE INDEX?You actually explained that above...
The HOT_UPDATED flag on a tuple would basically mean that all indexes
with xid < xmax doesn't contain an index pointer for the tuple, and all
others do. When inserting new updated tuples, we'd also need to maintain
that invariant.I know we have xid wrap-around, but I think the VACUUM FREEZE could
handle it by freezing the pg_index xid column value when it does the
table.I don't think you can freeze the xid-column, we went through a similar
discussion on pg_class.relfrozenxid. But you can move it forward to
oldest xmin.You could actually "freeze" the column, because unlike relfrozenid we
never need to unfreeze it.--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
As stated in later email, I think we should focus on the xid idea
because it is more flexible.
---------------------------------------------------------------------------
Gregory Stark wrote:
"Bruce Momjian" <bruce@momjian.us> writes:
We also add a boolean to pg_class to indicate no new HOT chains should be
created and set that to false once the new index is created.Since we have all the index info in the relcache we could just skim through
all the indexes when we build the relcache and decide then whether we're
allowed to do HOT updates. That avoids problems if we crash while HOT updates
are disabled.I think we need to think harder about exactly what test we would perform
against this xid to determine the two relevant tests,a) whether to prohibit HOT updates (because the index is "too new")
b) whether to ignore HOT update chains when we use the index (because it's
"too new" and any HOT update chains predate it).I fear it may imply that we have to keep performing cold updates until the
first vacuum after the xid expires.--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Pavan Deolasee wrote:
On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
Bruce Momjian wrote:
I have read the HOT discussion and wanted to give my input. The major
issue is that CREATE INDEX might require a HOT chain to be split apart
if one of the new indexed columns changed in the HOT chain.To expand a little more, the problem is that when you split those HOT
chains, you have to insert new entries into the _existing_ indexes,
causing problems for concurrent sequential scans.You mean index scans, right ? Sequential scans don't change with HOT.
Sorry, yes, index scans.
I have a new idea. There has been a lot of focus on trying to tag each
tuple to indicate whether it is part of a HOT chain for individual
indexes, the idea being that some indexes will follow the HOT chain, and
some will not.In general I would like to preserve the HOT properties at the end of
creation of new index. All index should point to the same root tuple
if pointing to a HOT-chain. The things might become otherwise messy
with the line pointer redirection, reuse of LP_DELETEd tuples and
other things that we have put in.Disabling HOT-updates using pg_class attribute has same issue
with my earlier proposal. What do we do if the backend crashes before
it can enable it again ? Who would reset the flag ? We could have lived
without DDL and CHILL command if we would have answers for
these questions.
Let's focus on the xid idea, as stated in earlier emails. It has fewer
restrictions.
A different idea here:
Would it be acceptable to force CREATE INDEX to be run outside
a transaction block just like CREATE INDEX CONCURRENTLY ? If thats
acceptable, we can do the following:
Yea, that is possible, but not ideal.
CREATE INDEX:
Since CREATE INDEX locks out table for UPDATEs, we just need
to wait enough before we start the heap scan so that when we do
heap scan, all HOT-chains can be pruned to a single tuple (with or
without the redirected line pointer). So when the new index is built,
we first prune the entire page of HOT-chains and insert the TID of
the root tuple into the new index. IOW we need to wait for all
transactions in the snapshot after acquiring ShareLock but before
we start heap scan.
I am worried that will require CREATE INDEX to wait for a long time.
CREATE INDEX CONCURRENTLY:
Simon has already posted a design that would work with CREATE
INDEX CONCURRENTLY. I think we need to tweak it a bit so
that we can handle the HOT-updated tuples after then first heap
scan, but before the index is visible to all transactions. Once the
new index is visible, the heap_update() code path would take care
of avoiding HOT-updates if the column on which new index is being
built is updated.It seems much simpler to me do something like this. But important
question is whether the restriction that CREATE INDEX can not
be run in a transaction block is acceptable ?
Is the pg_index xid idea too complex? It seems to give you the
per-tuple index bit, without the restrictions.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
"Bruce Momjian" <bruce@momjian.us> writes:
As stated in later email, I think we should focus on the xid idea
because it is more flexible.
Sorry if I was unclear. I agree, my comment and questions are all predicated
on the assumption that we would go with xids.
---------------------------------------------------------------------------
Gregory Stark wrote:
"Bruce Momjian" <bruce@momjian.us> writes:
We also add a boolean to pg_class to indicate no new HOT chains should be
created and set that to false once the new index is created.Since we have all the index info in the relcache we could just skim through
all the indexes when we build the relcache and decide then whether we're
allowed to do HOT updates. That avoids problems if we crash while HOT updates
are disabled.I think we need to think harder about exactly what test we would perform
against this xid to determine the two relevant tests,a) whether to prohibit HOT updates (because the index is "too new")
b) whether to ignore HOT update chains when we use the index (because it's
"too new" and any HOT update chains predate it).I fear it may imply that we have to keep performing cold updates until the
first vacuum after the xid expires.--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com+ If your life is a hard drive, Christ can be your backup. +
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
I am worried that will require CREATE INDEX to wait for a long time.
Not unless there are long running transactions. We are not waiting
for the lock, but only for the current transactions to finish.
Is the pg_index xid idea too complex? It seems to give you the
per-tuple index bit, without the restrictions.
How do we handle HEAP_ONLY tuple cleanup ? If I understood
the proposal correctly, at the end of the create index, a HEAP_ONLY
tuple may have pointer from the new index, isn't it ?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com
Gregory Stark wrote:
"Bruce Momjian" <bruce@momjian.us> writes:
As stated in later email, I think we should focus on the xid idea
because it is more flexible.Sorry if I was unclear. I agree, my comment and questions are all predicated
on the assumption that we would go with xids.
OK, replies below.
Gregory Stark wrote:
"Bruce Momjian" <bruce@momjian.us> writes:
We also add a boolean to pg_class to indicate no new HOT chains should be
created and set that to false once the new index is created.Since we have all the index info in the relcache we could just skim through
all the indexes when we build the relcache and decide then whether we're
allowed to do HOT updates. That avoids problems if we crash while HOT updates
are disabled.
Uh, with xid, we don't disable HOT updates at all --- we just control,
via xids, which indexes are part of which HOT update chain.
I think we need to think harder about exactly what test we would perform
against this xid to determine the two relevant tests,a) whether to prohibit HOT updates (because the index is "too new")
Never.
b) whether to ignore HOT update chains when we use the index (because it's
"too new" and any HOT update chains predate it).
We have to look at the xmax of the HOT chain head and compare that to
the index xid.
I fear it may imply that we have to keep performing cold updates until the
first vacuum after the xid expires.
Yes, effectively, that is true, because once we have the new index in
place, we can't reuse the old HOT chains --- we need VACUUM at some
point, but new HOT chains can be created.
Actually, since we have a _write_ lock on the table, there are no table
changes during index creation, so we only need to worry about changes
after the index is created.
After the index is created, old HOT chains have to be cleaned up via
VACUUM because we have the new index pointing into the HOT chain.
Existing transactions can't create new HOT chains because they would
have to insert into the new index (because of their xid), so they might
as well not use HOT. New transactions can create new HOT chains because
their xid marks them as including the new index in the index group.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Pavan Deolasee wrote:
On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
I am worried that will require CREATE INDEX to wait for a long time.
Not unless there are long running transactions. We are not waiting
for the lock, but only for the current transactions to finish.
Waiting for all transactions might take a while, no?
Is the pg_index xid idea too complex? It seems to give you the
per-tuple index bit, without the restrictions.How do we handle HEAP_ONLY tuple cleanup ? If I understood
the proposal correctly, at the end of the create index, a HEAP_ONLY
tuple may have pointer from the new index, isn't it ?
Right. You would need vacuum to clean up the HEAP_ONLY tuples. I just
sent an email about those deails. We might have autovacuum check
pg_index and do it automatically.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Effectively, my idea is not to chill/break the HOT chains during index
creation, but rather to abandon them and wait for VACUUM to clean them
up.
My idea is much closer to the idea of a bit per index on every tuple,
except the tuple xmax and pg_index xid replace them.
---------------------------------------------------------------------------
Bruce Momjian wrote:
Pavan Deolasee wrote:
On 3/21/07, Bruce Momjian <bruce@momjian.us> wrote:
I am worried that will require CREATE INDEX to wait for a long time.
Not unless there are long running transactions. We are not waiting
for the lock, but only for the current transactions to finish.Waiting for all transactions might take a while, no?
Is the pg_index xid idea too complex? It seems to give you the
per-tuple index bit, without the restrictions.How do we handle HEAP_ONLY tuple cleanup ? If I understood
the proposal correctly, at the end of the create index, a HEAP_ONLY
tuple may have pointer from the new index, isn't it ?Right. You would need vacuum to clean up the HEAP_ONLY tuples. I just
sent an email about those deails. We might have autovacuum check
pg_index and do it automatically.--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com+ If your life is a hard drive, Christ can be your backup. +
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On 3/21/07, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
It seems much simpler to me do something like this. But important
question is whether the restriction that CREATE INDEX can not
be run in a transaction block is acceptable ?
yikes -- this is huge, huge price to pay, IMHO. Think about DDL that
implies index creation such as adding unique constraint to a table,
many postgresql users (including me) take advantage of that in update
systems to production databases.
merlin
On 3/21/07, Merlin Moncure <mmoncure@gmail.com> wrote:
On 3/21/07, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
It seems much simpler to me do something like this. But important
question is whether the restriction that CREATE INDEX can not
be run in a transaction block is acceptable ?yikes -- this is huge, huge price to pay, IMHO. Think about DDL that
implies index creation such as adding unique constraint to a table,
many postgresql users (including me) take advantage of that in update
systems to production databases.
I didn't understand that quite well. How does it help to run CREATE
INDEX inside a transaction ?
Thanks,
Pavan
--
EnterpriseDB http://www.enterprisedb.com