Improve the concurency of vacuum full table and select statement on the same relation

Started by Jinyuover 10 years ago9 messages
#1Jinyu
call_jinyu@126.com

Now vacuum full table takes an AccessExclusiveLock on relation at start and select statement takes an AccessShareLock on relation. So 'vacuum full table' blocks select statement on the same table until it is committed and select statement block 'vacuum full table' until it is finished. The concurency is very very bad.

Proposal: vacuum full table takes an ExclusiveLock on relation instead of AccessExclusiveLock at start. It can' block select statement before call function "finish_heap_swap". and select statement is safe because vacuum full table copys tuples from old relation to new relation before calling function "finish_heap_swap". But it must take an AccessExclusiveLock on relation when call function "finish_heap_swap" in order to block select statement on the same relation.

This solution can improve the concurency. the following shows the reasons.
1. The Function 'copy_heap_data' which copys tuples from old relation to new relation takes most elapsed time of vacuum full table. And it takes an ExclusiveLock on relation when call function "copy_heap_data". So select statement on the same relation can't be blocked in the most elapsed time of vacuum full table.
2. The elapsed time of "finish_heap_swap" is very short, So the blocking time window is very short.

This proposal can also improve the concurency of cluster table and select statement. Because the execution steps of cluster table is similar to vacuum full table. The select statement is safe before cluster table call function "finish_heap_swap".

Please let me know if I miss something.

Jinyu Zhang
thanks

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jinyu (#1)
Re: Improve the concurency of vacuum full table and select statement on the same relation

Jinyu <call_jinyu@126.com> writes:

Proposal: vacuum full table takes an ExclusiveLock on relation instead of AccessExclusiveLock at start. It can' block select statement before call function "finish_heap_swap". and select statement is safe because vacuum full table copys tuples from old relation to new relation before calling function "finish_heap_swap". But it must take an AccessExclusiveLock on relation when call function "finish_heap_swap" in order to block select statement on the same relation.

This solution can improve the concurency. the following shows the reasons.

What it's more likely to do is cause the vacuum full to fail altogether,
after doing a lot of work. Lock upgrade is a bad thing because it tends
to result in deadlocks.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Jinyu
call_jinyu@126.com
In reply to: Tom Lane (#2)
Re: Improve the concurency of vacuum full table and select statement on the same relation

Get it now, thanks.
The lock upgrade(from ExclusiveLock to AccessExclusiveLock) may result in deadlocks iff one transaction first takes an AccessShareLock and then takes a lock(lockmode > AccessShareLock) on the same relation.

The single SQL statement can't take an AccessShareLock and then takes a lock(lockmode > AccessShareLock) on the same relation.
In fact, there is lock upgrade in transaction block which includes multiple query.like this transaction block "start transaction; select query; DML/DDL", it is from AccessShareLock to lockmode > AccessShareLock.

Now there may be deadlocks when run multiple transaction blocks even if no vacuum full. The some transaction will report error to break deadlock once deadlocks happen. So the vacuum full table may failed after doing lots of work in some time.

In some scenes, there are not explicit transaction block (no lock upgrade from AccessShareLock to lockmode > AccessShareLock), the deadlocks rarely happens. perhaps we can provide an option for vacuum full to let user choose whether "cluster/vacuum full" block select statement for very short time.

Are there other solutions to improve the concurency of vacuum full/cluster and select statement on the same relation?

Jinyu Zhang,
thanks

At 2015-10-10 23:34:41, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Jinyu <call_jinyu@126.com> writes:

Proposal: vacuum full table takes an ExclusiveLock on relation instead of AccessExclusiveLock at start. It can' block select statement before call function "finish_heap_swap". and select statement is safe because vacuum full table copys tuples from old relation to new relation before calling function "finish_heap_swap". But it must take an AccessExclusiveLock on relation when call function "finish_heap_swap" in order to block select statement on the same relation.

This solution can improve the concurency. the following shows the reasons.

What it's more likely to do is cause the vacuum full to fail altogether,
after doing a lot of work. Lock upgrade is a bad thing because it tends
to result in deadlocks.

regards, tom lane

#4Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jinyu (#3)
Re: Improve the concurency of vacuum full table and select statement on the same relation

On 10/11/15 6:55 AM, Jinyu wrote:

Are there other solutions to improve the concurency of vacuum
full/cluster and select statement on the same relation?

ISTM that if we were going to put effort into this it makes more sense
to pull pg_repack into core. BTW, it's approach to this is to summarily
kill anything that attempts DDL on a table being repacked.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Jinyu
call_jinyu@126.com
In reply to: Jim Nasby (#4)
Re: Improve the concurency of vacuum full table and select statement on the same relation

it's approach to this is to summarily kill anything that attempts DDL on a table being repacked.

Why? I am confused with it. Could you please explain this?

Jinyu Zhang
thanks

At 2015-10-12 23:46:12, "Jim Nasby" <Jim.Nasby@BlueTreble.com> wrote:

Show quoted text

On 10/11/15 6:55 AM, Jinyu wrote:

Are there other solutions to improve the concurency of vacuum
full/cluster and select statement on the same relation?

ISTM that if we were going to put effort into this it makes more sense
to pull pg_repack into core. BTW, it's approach to this is to summarily
kill anything that attempts DDL on a table being repacked.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

#6Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jinyu (#5)
Re: Improve the concurency of vacuum full table and select statement on the same relation

On 10/13/15 10:18 AM, Jinyu wrote:

At 2015-10-12 23:46:12, "Jim Nasby" <Jim.Nasby@BlueTreble.com> wrote:

On 10/11/15 6:55 AM, Jinyu wrote:

Are there other solutions to improve the concurency of vacuum
full/cluster and select statement on the same relation?

ISTM that if we were going to put effort into this it makes more sense
to pull pg_repack into core. BTW, it's approach to this is to summarily
kill anything that attempts DDL on a table being repacked.

Please don't top-post, it leads to confusion.

it's approach to this is to summarily kill anything that attempts DDL

on a table being repacked.
Why? I am confused with it. Could you please explain this?

It's just how the authors of pg_repack decided to handle it. It seems
pretty reasonable, since you probably don't want an errant DDL statement
to cause the rollback of hours or days of pg_repack work.

Ultimately, I don't think you'll find many people interested in working
on this, because the whole goal is to never need VACUUM FULL or
pg_repack. If you're clustering just for the sake of clustering, that
has it's own set of difficulties that should be addressed.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#6)
Re: Improve the concurency of vacuum full table and select statement on the same relation

On Thu, Oct 15, 2015 at 8:28 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

It's just how the authors of pg_repack decided to handle it. It seems pretty
reasonable, since you probably don't want an errant DDL statement to cause
the rollback of hours or days of pg_repack work.

Ultimately, I don't think you'll find many people interested in working on
this, because the whole goal is to never need VACUUM FULL or pg_repack. If
you're clustering just for the sake of clustering, that has it's own set of
difficulties that should be addressed.

I think the topic of online table reorganization is a pretty important
one, actually. That is a need that we have had for a long time,
creates serious operational problems for users, and it's also a need
that is not going away. I think the chances of eliminating that need
completely, even if we rearchitected or heap storage, are nil.

I think the bigger issue is that it's a very hard problem to solve.
pg_repack is one approach, but I've heard more than one person say
that, as C-3PO said about the asteroid, it may not be entirely stable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Robert Haas (#7)
Re: Improve the concurency of vacuum full table and select statement on the same relation

On 10/16/15 10:04 AM, Robert Haas wrote:

On Thu, Oct 15, 2015 at 8:28 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

It's just how the authors of pg_repack decided to handle it. It seems pretty
reasonable, since you probably don't want an errant DDL statement to cause
the rollback of hours or days of pg_repack work.

Ultimately, I don't think you'll find many people interested in working on
this, because the whole goal is to never need VACUUM FULL or pg_repack. If
you're clustering just for the sake of clustering, that has it's own set of
difficulties that should be addressed.

I think the topic of online table reorganization is a pretty important
one, actually. That is a need that we have had for a long time,
creates serious operational problems for users, and it's also a need
that is not going away. I think the chances of eliminating that need
completely, even if we rearchitected or heap storage, are nil.

Yeah, which is why I made the comment about CLUSTER.

I think the bigger issue is that it's a very hard problem to solve.

ISTM nothing can be done here until there's some way to influence how
pages get pulled from the FSM (IE: pull from one of the first X pages
with free space). Maybe having some way to expose that would be enough
of a start.

pg_repack is one approach, but I've heard more than one person say
that, as C-3PO said about the asteroid, it may not be entirely stable.

I looked at it recently, and it seems to be under active development.
But I agree it'd be better if we could handle this internally.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Jinyu
call_jinyu@126.com
In reply to: Robert Haas (#7)
Re: Improve the concurency of vacuum full table and select statement on the same relation

The lock upgrade for vacuum full table tends to cause deadlock with other lock upgrade transaction which is from AccessShareLock to lockmode > AccessShareLock. Tom Lane's concern is that it will cause vacuum full failed after do a lot of work.
But If we can always let other transaction failed to break deadlock instead of vacuum full table, how about this lock upgrade solution?
For example: we can enlarge the 'DeadlockTimeout' for vacuum full table transaction to avoid deadlock check.

Jinyu Zhang
regards

At 2015-10-16 23:04:51, "Robert Haas" <robertmhaas@gmail.com> wrote:

Show quoted text

On Thu, Oct 15, 2015 at 8:28 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

It's just how the authors of pg_repack decided to handle it. It seems pretty
reasonable, since you probably don't want an errant DDL statement to cause
the rollback of hours or days of pg_repack work.

Ultimately, I don't think you'll find many people interested in working on
this, because the whole goal is to never need VACUUM FULL or pg_repack. If
you're clustering just for the sake of clustering, that has it's own set of
difficulties that should be addressed.

I think the topic of online table reorganization is a pretty important
one, actually. That is a need that we have had for a long time,
creates serious operational problems for users, and it's also a need
that is not going away. I think the chances of eliminating that need
completely, even if we rearchitected or heap storage, are nil.

I think the bigger issue is that it's a very hard problem to solve.
pg_repack is one approach, but I've heard more than one person say
that, as C-3PO said about the asteroid, it may not be entirely stable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company