Wrong return code in vacuumdb when multiple jobs are used
Hi,
While reading the vacuumdb code, I just noticed that it can return 0
even if an error happens when -j is used, if the errors occur on the
last batch of commands.
For instance:
Session 1:
alter database postgres set lock_timeout = 1;
begin;
lock table pg_extension;
Session 2:
$ vacuumdb -d postgres -t pg_extension -t pg_extension
vacuumdb: vacuuming database "postgres"
vacuumdb: error: vacuuming of table "pg_catalog.pg_extension" in
database "postgres" failed: ERROR: canceling statement due to lock
timeout
$ echo $?
1
$ vacuumdb -d postgres -t pg_extension -t pg_extension -j2
vacuumdb: vacuuming database "postgres"
vacuumdb: error: vacuuming of database "postgres" failed: ERROR:
canceling statement due to lock timeout
$ echo $?
0
but with a third table in the list, the error is caught while
dispatching commands rather than in the final batch, so the exit code
is correct:
$ vacuumdb -d postgres -t pg_extension -t pg_extension -t pg_extension -j2
vacuumdb: vacuuming database "postgres"
vacuumdb: error: vacuuming of database "postgres" failed: ERROR:
canceling statement due to lock timeout
$ echo $?
1
This behavior has existed since 9.5. Trivial patch attached. I'm not
sure that a TAP test is required here, so I didn't add one. I'll be
happy to add one if needed.
Attachment: fix_parallel_vacuumdb_rc-v1.diff (text/x-patch)
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index f15e1ad8f1..e9da74c3ba 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -712,7 +712,10 @@ vacuum_one_database(const char *dbname, vacuumingOptions *vacopts,
for (j = 0; j < concurrentCons; j++)
{
if (!GetQueryResult((slots + j)->connection, progname))
+ {
+ failed = true;
goto finish;
+ }
}
}
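
To make the failure mode concrete, here is a minimal standalone sketch
of the pattern the patch restores. This is illustrative only, not
vacuumdb's actual code: run_query() merely stands in for
GetQueryResult(), and the last slot is hardwired to fail, mimicking a
lock timeout on the final batch of commands. The point is that any
error while draining that batch has to set the failed flag before
jumping to the cleanup label, because the exit status is derived from
that flag:

/*
 * Illustrative sketch only, not vacuumdb's source.  run_query()
 * stands in for GetQueryResult(); slot 2 is hardwired to fail,
 * like a lock timeout hitting the last batch of commands.
 */
#include <stdbool.h>
#include <stdio.h>

static bool
run_query(int slot)
{
    return slot != 2;           /* pretend slot 2's command failed */
}

int
main(void)
{
    const int   concurrentCons = 3;
    bool        failed = false;
    int         j;

    for (j = 0; j < concurrentCons; j++)
    {
        if (!run_query(j))
        {
            failed = true;      /* the line the patch adds, in spirit */
            goto finish;        /* still bail out early to clean up */
        }
    }

finish:
    /* disconnecting slots and freeing memory would happen here */
    if (failed)
    {
        fprintf(stderr, "error: batch failed\n");
        return 1;               /* the shell now sees a non-zero status */
    }
    return 0;
}

Without the failed = true line, the early goto silently turns the
error into a zero exit status, which is exactly what the -j2 example
above shows.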
On Sat, May 04, 2019 at 10:35:23AM +0200, Julien Rouhaud wrote:
> While reading the vacuumdb code, I just noticed that it can return 0
> even if an error happens when -j is used, if the errors occur on the
> last batch of commands.

Yes, I agree that this is wrong. GetIdleSlot() is much more careful
about that than vacuum_one_database(), so your patch looks good at a
quick glance.
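
For contrast, and paraphrasing from memory rather than quoting the
actual source, the dispatch path already propagates the error to the
exit status when acquiring a slot fails:

/* Rough paraphrase of the dispatch-side handling, for contrast. */
free_slot = GetIdleSlot(slots, concurrentCons, progname);
if (!free_slot)
{
    failed = true;          /* error reflected in the exit status... */
    goto finish;
}

...whereas the final GetQueryResult() loop jumped to finish without
setting the flag, which is what your patch corrects.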
> This behavior has existed since 9.5. Trivial patch attached. I'm not
> sure that a TAP test is required here, so I didn't add one. I'll be
> happy to add one if needed.

You could make that reliable by taking a lock on a table using a
two-phase transaction; your test case from upthread won't fly, as we
have no facility in PostgresNode.pm to keep a session's state around
using psql. FWIW, I am not convinced that this is a case worth
bothering with, so no tests is fine.
--
Michael
On Sat, May 4, 2019 at 11:15 AM Michael Paquier <michael@paquier.xyz> wrote:
>> I'm not sure that a TAP test is required here, so I didn't add one.
>> I'll be happy to add one if needed.
>
> You could make that reliable by taking a lock on a table using a
> two-phase transaction; your test case from upthread won't fly, as we
> have no facility in PostgresNode.pm to keep a session's state around
> using psql. FWIW, I am not convinced that this is a case worth
> bothering with, so no tests is fine.

Yes, adding a test for this case looked like it would require a lot
of creativity with the TAP infrastructure; that's the main reason why
I didn't add one. 2PC is a good idea though.
On Sat, May 4, 2019 at 2:45 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, May 04, 2019 at 10:35:23AM +0200, Julien Rouhaud wrote:
>> While reading the vacuumdb code, I just noticed that it can return 0
>> even if an error happens when -j is used, if the errors occur on the
>> last batch of commands.
>
> Yes, I agree that this is wrong. GetIdleSlot() is much more careful
> about that than vacuum_one_database(), so your patch looks good at a
> quick glance.

The fix looks good to me as well.
The fix looks good to me as well.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, May 04, 2019 at 04:34:59PM +0530, Amit Kapila wrote:
> The fix looks good to me as well.

We are very close to the next minor release, so it may not be that
wise to commit a fix for this issue now, as we should have a couple
of clean buildfarm runs. Are there any objections to waiting until
after the release? Or would folks prefer that this be fixed before
the release?
--
Michael
On Sat, May 4, 2019 at 2:17 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, May 04, 2019 at 04:34:59PM +0530, Amit Kapila wrote:
>> The fix looks good to me as well.
>
> We are very close to the next minor release, so it may not be that
> wise to commit a fix for this issue now, as we should have a couple
> of clean buildfarm runs.

Agreed.

> Are there any objections to waiting until after the release? Or
> would folks prefer that this be fixed before the release?

No objection from me. It's been broken since its introduction in 9.5
and has never been noticed, so it can wait until the next release.
Should I register the patch in the next commitfest to keep track of
it?
On Sat, May 04, 2019 at 02:28:48PM +0200, Julien Rouhaud wrote:
> No objection from me. It's been broken since its introduction in 9.5
> and has never been noticed, so it can wait until the next release.
> Should I register the patch in the next commitfest to keep track of
> it?

No need to. I have put it on my agenda to take an extra look next
week and potentially commit it after the release.
--
Michael
On Sat, May 4, 2019 at 2:41 PM Michael Paquier <michael@paquier.xyz> wrote:
>> No objection from me. It's been broken since its introduction in 9.5
>> and has never been noticed, so it can wait until the next release.
>> Should I register the patch in the next commitfest to keep track of
>> it?
>
> No need to. I have put it on my agenda to take an extra look next
> week and potentially commit it after the release.

Ok, thanks!
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Sat, May 4, 2019 at 2:17 PM Michael Paquier <michael@paquier.xyz> wrote:
>> We are very close to the next minor release, so it may not be that
>> wise to commit a fix for this issue now, as we should have a couple
>> of clean buildfarm runs.
>
> Agreed.

+1, waiting till after the minor releases are tagged seems wisest.
We can still push it before 12beta1, so it will get tested in the
beta period.
regards, tom lane
On Sat, May 04, 2019 at 11:48:53AM -0400, Tom Lane wrote:
> +1, waiting till after the minor releases are tagged seems wisest.
> We can still push it before 12beta1, so it will get tested in the
> beta period.

The new minor releases have been tagged, so committed.
--
Michael
On Thu, May 9, 2019 at 3:32 AM Michael Paquier <michael@paquier.xyz> wrote:
>> +1, waiting till after the minor releases are tagged seems wisest.
>> We can still push it before 12beta1, so it will get tested in the
>> beta period.
>
> The new minor releases have been tagged, so committed.

Thanks a lot!