Missing CONCURRENT VACUUM (Was: Release notes for 8.1)

Started by Hannu Krosingover 20 years ago19 messageshackers
Jump to latest
#1Hannu Krosing
hannu@tm.ee

Once more:

I would like to get at least some answer, why my patch for enabling
concurrent VACUUM was left out from 8.1.

It was submitted well in time, and there was only minimal amount of
discussion of an earlier patch,and AFAIK I addressed all the issues
raised there.

I really hate to have to start maintaining my own patches for "24/7,
OLTP-enabled" postgreSQL, at least unless I know that there is a good
reason why the mainline postgreSQL can't have these.

--
Hannu Krosing <hannu@skype.net>

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#1)
Re: Missing CONCURRENT VACUUM (Was: Release notes for 8.1)

Hannu Krosing <hannu@skype.net> writes:

Once more:
I would like to get at least some answer, why my patch for enabling
concurrent VACUUM was left out from 8.1.

You did not respond to this:
http://archives.postgresql.org/pgsql-patches/2005-08/msg00238.php

regards, tom lane

#3Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: Missing CONCURRENT VACUUM (Was: Release notes for 8.1)

Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

Once more:
I would like to get at least some answer, why my patch for enabling
concurrent VACUUM was left out from 8.1.

You did not respond to this:
http://archives.postgresql.org/pgsql-patches/2005-08/msg00238.php

Yep, that's why I put it in the 8.2 queue.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#4Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#2)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On T, 2005-08-16 at 18:26 -0400, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

Once more:
I would like to get at least some answer, why my patch for enabling
concurrent VACUUM was left out from 8.1.

You did not respond to this:
http://archives.postgresql.org/pgsql-patches/2005-08/msg00238.php

Somehow this did not reach me :(

I'll answer this here:

Bruce Momjian <pgman ( at ) candle ( dot ) pha ( dot ) pa ( dot ) us> writes:

Is there any particular reason for not putting it in 8.1 ?

I thought there was still uncertainty about the patch. Is there?

Considerable uncertainty, in my mind. What we've got here is some
pretty fundamental hacking on the transaction visibility logic, and
neither Hannu nor anyone else has produced a convincing argument
that it's correct. "It hasn't failed yet in my usage" isn't enough
to give me a good feeling about it.

Agreed.

Some specific concerns:

* Given that VACUUM ANALYZE does create new output tuples stamped with
its xid, I'm unclear on what happens in pg_statistic with this code in
place.

Actually any VACUUM, not only VACUUM ANALYSE, updates pg_class at the end.
That's why I exclude only one of the transactions of the VACUUM command, and that
transaction does not create any new tuples, it only removes old ones.

It seems entirely possible that someone might conclude the
analyze tuples are from a crashed transaction and mark them invalid
before the analyze can commit (notice TransactionIdIsInProgress does not
bother looking in PGPROC when the tuple xmin is less than RecentXmin).

Once more, only 2nd transaction of LAZY VACUUM is affected, and that one does
only (heap scan + clean indexes + clean heap) and _only_ removes old tuples.

* If the vacuum xact is older than what others think is the global xmin,
it could have problems with other vacuums removing tuples it should
still be able to see (presumably only in the system catalogs, so maybe
this isn't an issue, but I'm unsure).

The cleanup transaction does no lookups in system catalogs.

A related scenario that I don't
think can be dismissed is someone truncating off part of pg_subtrans or
pg_multixact that the vacuum still needs.

In my patch I specifically exclude TruncateSUBTRANS from using the
inVacuum flag

At the time I originally submitted my patch, GetOldestXmin was only used
in VACUUM and CREATE INDEX, others had other means of getting oldest
Xmin, and these were not affected by my patch. When I reworked it before
last submit, i changed the only nwe use (in xlog.c, line 5165) to use a
new version of GetOldestXmin with an extra flag tu tell it to NOT exlude
transactions running vacuum.

The transaction running the heap scan/cleanup part of the vacuum command
only sets a new isVacuum flag, and this is only used by GetOldestXmin
functions. Other means of getting OldestXmin still get exactly the old
behaviour. GetOldestXmin() does exclude xmin's with isVacuum set only
when called by other VACUUMS. this is controlled by the (new) second
argument of GetOldestXmin().

So I think that both your concerns expressed here are _already_
addressed by the latest patch in:

http://archives.postgresql.org/pgsql-patches/2005-07/msg00086.php

Please check the actual patch and advise if anything is still missing.

--
Hannu Krosing <hannu@skype.net>

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#4)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

Hannu Krosing <hannu@skype.net> writes:

On T, 2005-08-16 at 18:26 -0400, Tom Lane wrote:

Some specific concerns:

* Given that VACUUM ANALYZE does create new output tuples stamped with
its xid, I'm unclear on what happens in pg_statistic with this code in
place.

Actually any VACUUM, not only VACUUM ANALYSE, updates pg_class at the end.
That's why I exclude only one of the transactions of the VACUUM command, and that
transaction does not create any new tuples, it only removes old ones.

vac_update_relstats isn't the issue because it doesn't create a new
tuple. I was concerned about ANALYZE --- but since that's done in a
separate transaction that's not marked inVacuum, it's not at risk. So
OK, that's all right.

* If the vacuum xact is older than what others think is the global xmin,
it could have problems with other vacuums removing tuples it should
still be able to see (presumably only in the system catalogs, so maybe
this isn't an issue, but I'm unsure).

The cleanup transaction does no lookups in system catalogs.

Oh? It certainly has to open relations and indexes. I think that all
of that stuff may be done with SnapshotNow, rather than an xmin-related
snap, but it's still nervous-making.

A related scenario that I don't
think can be dismissed is someone truncating off part of pg_subtrans or
pg_multixact that the vacuum still needs.

In my patch I specifically exclude TruncateSUBTRANS from using the
inVacuum flag

You missed vac_truncate_clog, though.

So I think that both your concerns expressed here are _already_
addressed by the latest patch in:
http://archives.postgresql.org/pgsql-patches/2005-07/msg00086.php

I have to admit that in my earlier message, I was looking at the version
of the patch that Bruce had on his patch page --- which I now see was not
the latest. The idea of making GetOldestXmin only conditionally ignore
vacuums certainly makes it a lot safer.

Please check the actual patch and advise if anything is still missing.

There's still a fair amount of breakage in this patch --- eg, in the
VACUUM FULL case it manages to invoke *both* full_vacuum_rel and
lazy_vacuum_rel --- but I think it can probably be made to work.
I'll take another pass at it.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#4)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

Hannu Krosing <hannu@skype.net> writes:

Please check the actual patch and advise if anything is still missing.

While testing this I realized that it does not in fact work as
advertised. It will only exclude long-running VACUUMs from other
VACUUMs' OldestXmin if *all* the transactions in the system are lazy
VACUUMs. If there is even one regular transaction in the system,
that transaction will include the VACUUMs in its MyProc->xmin, and
thence GetOldestXmin will have to include them in its result.

AFAICS the only way to fix this would be to exclude inVacuum
transactions from GetSnapshotData's calculations as well. That
makes the patch far more invasive, and I'm not confident I can work
out all the implications. (In particular, the consequences for
TransactionIdIsInProgress look bad. I don't think we want a VACUUM
to be seen as not-in-progress.)

So I'm bouncing this patch again...

regards, tom lane

#7Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#6)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

Please check the actual patch and advise if anything is still missing.

While testing this I realized that it does not in fact work as
advertised. It will only exclude long-running VACUUMs from other
VACUUMs' OldestXmin if *all* the transactions in the system are lazy
VACUUMs. If there is even one regular transaction in the system,
that transaction will include the VACUUMs in its MyProc->xmin, and
thence GetOldestXmin will have to include them in its result.

Only if these regular transactions are running in SERIALIZABLE isolation
level, else MyProc->xmin is not set inside GetSnapshotData. As my
transactions mostly run in READ COMMITTED mode I did not care about
those and later forgot about it.

AFAICS the only way to fix this would be to exclude inVacuum
transactions from GetSnapshotData's calculations as well. That
makes the patch far more invasive, and I'm not confident I can work
out all the implications. (In particular, the consequences for
TransactionIdIsInProgress look bad. I don't think we want a VACUUM
to be seen as not-in-progress.)

So I'm bouncing this patch again...

Please don't.

Even with current functionality it is part of solving the problem of
being able to vacuum small tables while vacuuming big ones at the same
time. In a scenario, where I use it, there are lot of OLTP (30-50ms)
transactions. These are run in READ COMMITTED mode, but even if I needed
them to be in SERIALIZABLE mode (or my reasoning about MyProc->xmin is
wrong), I can find (or force if needed) a few ms window where no other
transaction is running to start my VACUUM LAZY in a mode it can actually
clean up the tables.

This is still much better than not being able to do it at all. I'm ready
to write the documentation explaining it all in detail to those really
needing it.

--
Hannu Krosing <hannu@skype.net>

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#7)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

Hannu Krosing <hannu@skype.net> writes:

On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:

While testing this I realized that it does not in fact work as
advertised. It will only exclude long-running VACUUMs from other
VACUUMs' OldestXmin if *all* the transactions in the system are lazy
VACUUMs. If there is even one regular transaction in the system,
that transaction will include the VACUUMs in its MyProc->xmin, and
thence GetOldestXmin will have to include them in its result.

Only if these regular transactions are running in SERIALIZABLE isolation
level, else MyProc->xmin is not set inside GetSnapshotData.

Better read the code again. The first snap in *any* transaction sets
MyProc->xmin.

regards, tom lane

#9Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#8)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On K, 2005-08-17 at 16:45 -0400, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:

While testing this I realized that it does not in fact work as
advertised. It will only exclude long-running VACUUMs from other
VACUUMs' OldestXmin if *all* the transactions in the system are lazy
VACUUMs. If there is even one regular transaction in the system,
that transaction will include the VACUUMs in its MyProc->xmin, and
thence GetOldestXmin will have to include them in its result.

Only if these regular transactions are running in SERIALIZABLE isolation
level, else MyProc->xmin is not set inside GetSnapshotData.

Better read the code again. The first snap in *any* transaction sets
MyProc->xmin.

Can't find the place :(

Could you point to the file / function that does this.

--
Hannu Krosing <hannu@skype.net>

#10Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#9)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On K, 2005-08-17 at 23:58 +0300, Hannu Krosing wrote:

On K, 2005-08-17 at 16:45 -0400, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:

While testing this I realized that it does not in fact work as
advertised. It will only exclude long-running VACUUMs from other
VACUUMs' OldestXmin if *all* the transactions in the system are lazy
VACUUMs. If there is even one regular transaction in the system,
that transaction will include the VACUUMs in its MyProc->xmin, and
thence GetOldestXmin will have to include them in its result.

Only if these regular transactions are running in SERIALIZABLE isolation
level, else MyProc->xmin is not set inside GetSnapshotData.

Better read the code again. The first snap in *any* transaction sets
MyProc->xmin.

Can't find the place :(

Could you point to the file / function that does this.

Nevermind. Found it - GetTransactionSnapshot() always gets a
serializable snapshot first.

--
Hannu Krosing <hannu@tm.ee>

#11Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#1)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On E, 2005-08-22 at 00:36 +0300, Hannu Krosing wrote:

On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:

That
makes the patch far more invasive, and I'm not confident I can work
out all the implications. (In particular, the consequences for
TransactionIdIsInProgress look bad. I don't think we want a VACUUM
to be seen as not-in-progress.)

The way the attached patch does it should hopefully not have these
implications.

So I'm bouncing this patch again...

Please check the attached patch and apply if possible.

I also think this is a good time for this change as some people are
already doing extensive concurrent vacuum testing due to the t_ctid
chain fix, so this one would get all the testing for free.

This patch is against CVS checkout only a few hours old.

Alas, I just noticed that it still runs *both* full_vacuum_rel and
lazy_vacuum_rel due to missing {}.

And probably misses vac_truncate_clog use of inVacuum (have to check
that).

Sorry for confusion.

--
Hannu Krosing <hannu@tm.ee>

#12Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#1)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On E, 2005-08-22 at 00:36 +0300, Hannu Krosing wrote:

On K, 2005-08-17 at 14:48 -0400, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

Please check the actual patch and advise if anything is still missing.

While testing this I realized that it does not in fact work as
advertised. It will only exclude long-running VACUUMs from other
VACUUMs' OldestXmin if *all* the transactions in the system are lazy
VACUUMs. If there is even one regular transaction in the system,
that transaction will include the VACUUMs in its MyProc->xmin, and
thence GetOldestXmin will have to include them in its result.

Thanks for spotting it. I guess in my case it was performing more or
less as expected due to my load fluctuating between 0 and 300 concurrent
transaction each minute, so the vacuum hit the 0 point often enough to
be useful. I am running it in a loop, with 15 sec sleep between each
vacuum.

AFAICS the only way to fix this would be to exclude inVacuum
transactions from GetSnapshotData's calculations as well.

I fixed this in a more local way by adding an extra "xmin" to proc for
transactions where inVacuum is false (proc->nonInVacuumXmin) which is
calculated together with proc->xmin.

Somehow this still did not fix the issue of getting the xmin of long
vacuum through concurrent transactions.

My initial testing was wrong.

Probably you should ignore the whole thing until I figure out why this
does not work ...

--
Hannu Krosing <hannu@tm.ee>

#13Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#12)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On E, 2005-08-22 at 01:14 +0300, Hannu Krosing wrote:

I fixed this in a more local way by adding an extra "xmin" to proc for
transactions where inVacuum is false (proc->nonInVacuumXmin) which is
calculated together with proc->xmin.

Somehow this still did not fix the issue of getting the xmin of long
vacuum through concurrent transactions.

Actually it seems to work, if the "other" transaction actually does
something. It does not work when the only thing the other transactions
has doen is BEGIN WORK;

My initial testing was wrong.

And my second testing was wrong too!

I tested in the following way (

create bigtable (i serial primary key, t text);
-- filled bigtable with 8 M rows
create tinytable(i serial primary key, t text);
insert into tinytable select * from bigtable limit 50;

opened 3 psql connections and did the following

1) vacuum bigtable;
2) update tinytable set t = random(); -- this creates dead tuples in
-- parallel with vacuum bigtable
3) begin;
2) vacuum verbose tinytable; -- cant free tuples
3) select 1;
2) vacuum verbose tinytable; -- can free the tuples

+ 30 sec

1) vacuum bigtable finishes

initially I just did begin and thought that concurrent transactions are
still blocking.

When I added a notice inside GetSnapshotData, I saw that it was not
called by plain BEGIN WORK;

Probably nonInVacuumXmin needs more care, i.e. initialising and setting
it outside GetSnapshotData, at trx start and/or end. I'm too sleepy now
to investigate further (it's 2:10 am here).

Probably you should ignore the whole thing until I figure out why this
does not work ...

Please try to apply the patch, even if dangling BEGIN WORK; still causes
problems - I'll fix this tomorrow.

I'd really like to see this in 8.1.

--
Hannu Krosing <hannu@tm.ee>

#14Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#13)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On E, 2005-08-22 at 02:14 +0300, Hannu Krosing wrote:

Probably nonInVacuumXmin needs more care, i.e. initialising and setting
it outside GetSnapshotData, at trx start and/or end. I'm too sleepy now
to investigate further (it's 2:10 am here).

The attached patch works now as advertized so that concurrent non-vacuum
transactions don't interfere with vacuums OldestXmin calculations.

I fixed also the "VACUUM FULL case it manages to invoke *both*
full_vacuum_rel and lazy_vacuum_rel " bug.

But I could not find the breakage (from your Aug 17 email) with

In my patch I specifically exclude TruncateSUBTRANS from using the
inVacuum flag

You missed vac_truncate_clog, though.

Has somethoing changed in vac_truncate_clog or am I just too
lazy/stupid ?

--
Hannu Krosing <hannu@skype.net>

Attachments:

vacuum-patch-8.1-2005.08.22-2.difftext/x-patch; charset=ISO-8859-15; name=vacuum-patch-8.1-2005.08.22-2.diffDownload+130-88
#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#13)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

Hannu Krosing <hannu@tm.ee> writes:

Please try to apply the patch, even if dangling BEGIN WORK; still causes
problems - I'll fix this tomorrow.

No. A patch that you yourself have so little confidence in, in a
fundamental part of the system?

This will be lucky if it gets into 8.2, after considerably more review
than I have time to give it now. It clearly needs such review, and we
are way too late with the beta already.

regards, tom lane

#16Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#15)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On E, 2005-08-22 at 10:03 -0400, Tom Lane wrote:

Hannu Krosing <hannu@tm.ee> writes:

Please try to apply the patch, even if dangling BEGIN WORK; still causes
problems - I'll fix this tomorrow.

No. A patch that you yourself have so little confidence in, in a
fundamental part of the system?

Yeah, I know I sound flaky. Never should I submit patches when
terminally tired :( . I had just recently read something about starting
beta on monday and wanted to get it in before that.

I had quite much confidence in it not breaking anything (at least not
anything more than what the previous crippled patch did), I had just to
fix the parts regarding initialising nonInVacuumXmin, which is done in
todays patch.

This will be lucky if it gets into 8.2, after considerably more review
than I have time to give it now. It clearly needs such review, and we
are way too late with the beta already.

When I started with the patch a long time ago, GetOldestXmin was used in
less places than it is now. Since then more functionality seems to have
started using it without paying attention that it is used for two
logically different things.

I hope that some care will be taken in future to distinguish between
"real" transactions (those that modify data and whose actions can be
rollbacked) and vacuum (and maybe some other maintenance tasks) where
transaction is used as a programming convenience without any real
transaction-like behaviour present.

Getting my todays patch in would be a way to make sure this distinction
will get more attention.

--
Hannu Krosing <hannu@tm.ee>

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hannu Krosing (#14)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

Hannu Krosing <hannu@skype.net> writes:

But I could not find the breakage (from your Aug 17 email) with

You missed vac_truncate_clog, though.

That was fixed (and documented), along with some other problems,
in the modified patch I sent back to you:

http://archives.postgresql.org/pgsql-patches/2005-08/msg00260.php

AFAICT you ignored that ...

regards, tom lane

#18Hannu Krosing
hannu@tm.ee
In reply to: Tom Lane (#17)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On E, 2005-08-22 at 14:05 -0400, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

But I could not find the breakage (from your Aug 17 email) with

You missed vac_truncate_clog, though.

That was fixed (and documented), along with some other problems,
in the modified patch I sent back to you:

http://archives.postgresql.org/pgsql-patches/2005-08/msg00260.php

AFAICT you ignored that ...

This is the second mail you refer to that I have not received, neithet
through list nor directly.

I must make a serious inquiry into my mailhost to see what is going on.

Or start looking for your answers in exclusively in archives :(

Thanks. I'll check that.

--
Hannu Krosing <hannu@skype.net>

#19Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#18)
Re: Missing CONCURRENT VACUUM (Was: Release notes for

On E, 2005-08-22 at 23:17 +0300, Hannu Krosing wrote:

On E, 2005-08-22 at 14:05 -0400, Tom Lane wrote:

Hannu Krosing <hannu@skype.net> writes:

But I could not find the breakage (from your Aug 17 email) with

You missed vac_truncate_clog, though.

That was fixed (and documented), along with some other problems,
in the modified patch I sent back to you:

http://archives.postgresql.org/pgsql-patches/2005-08/msg00260.php

AFAICT you ignored that ...

I must make a serious inquiry into my mailhost to see what is going on.

Again I answered too quickly. I see it now.

Your answer was to pgsql-patches list and to my terminally spam-infested
old address hannu@tm.ee, whereas I expected it for some reason as a
reply to my hannu@skype.net address and/or pgsql-hackers list from/to
which the last mails were sent

Sorry, must look more aggressively in future :)

--
Hannu Krosing <hannu@tm.ee>