BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

Started by Thomas Eckestad over 13 years ago | 5 messages | bugs
#1 Thomas Eckestad
thomas.eckestad@gmail.com

The following bug has been logged on the website:

Bug reference: 7634
Logged by: Thomas
Email address: thomas.eckestad@gmail.com
PostgreSQL version: 9.1.6
Operating system: Linux
Description:

Hi,

We are using a Postgres server dedicated to unit testing, i.e. for testing
our code that interacts with the database. Each unit test may create, use,
and then drop one or more test databases. Running the complete test suite
creates and drops a lot of databases (>100).

After a couple of days or weeks of frequent unit test activity, DROP DATABASE
eventually triggers errors of the following form:

2012-05-08 08:53:02.512 CEST> LOG: statement: DROP DATABASE IF EXISTS "HEAD_test_migrate_group_data_10010018668"
2012-05-08 08:53:02.512 CEST> ERROR: could not open file "global/12693": No such file or directory
2012-05-08 08:53:02.512 CEST> STATEMENT: DROP DATABASE IF EXISTS "HEAD_test_migrate_group_data_10010018668"

For now we handle this situation by automatically performing a complete
reinstall of the test database server when we detect the error. So we have a
satisfactory workaround in place.

We are using PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc
(GCC) 4.3.4, 64-bit.

Best regards,
Thomas
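
The report says the test setup detects this error automatically but does not show how. A minimal sketch of one way to do it, assuming the server log is available as lines of text in the format quoted above; the function name `missing_global_files` and the regular expression are illustrative and not taken from the reporter's setup:

```python
import re

# Matches the path in errors such as:
#   ERROR: could not open file "global/12693": No such file or directory
# Only the part up to the closing quote is needed, so a wrapped log line
# still matches.
MISSING_FILE_RE = re.compile(r'ERROR:\s+could not open file "(global/\d+)"')


def missing_global_files(log_lines):
    """Return the distinct global/ paths reported missing, in first-seen order."""
    seen = []
    for line in log_lines:
        match = MISSING_FILE_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

Collecting the distinct paths rather than just a yes/no flag would also record which file went missing on each occurrence.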

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Eckestad (#1)
Re: BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

thomas.eckestad@gmail.com writes:

> After a couple of days or weeks of frequent unit test activity, DROP DATABASE
> eventually triggers errors of the following form:
>
> 2012-05-08 08:53:02.512 CEST> LOG: statement: DROP DATABASE IF EXISTS "HEAD_test_migrate_group_data_10010018668"
> 2012-05-08 08:53:02.512 CEST> ERROR: could not open file "global/12693": No such file or directory
> 2012-05-08 08:53:02.512 CEST> STATEMENT: DROP DATABASE IF EXISTS "HEAD_test_migrate_group_data_10010018668"

That is extremely peculiar --- AFAICS, 9.1 should never assign a
relfilenode of 12693. (OIDs assigned by initdb don't get past about
11900 in that version, and OIDs assigned after normal postmaster start
should always be above 16384.) Is it always exactly "global/12693"
that's complained of? Could you monitor the contents of $PGDATA/global
and see if the set of filenames present changes while you're running
these tests?

regards, tom lane
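
The monitoring suggested in this message could be sketched roughly as follows: take a snapshot of the file names in $PGDATA/global before and after a test run and compare them. The function names are hypothetical, and the 16384 threshold is simply the value quoted in the message; this is an illustration, not a tool from the thread.

```python
import os

# Threshold quoted in the message above: OIDs assigned after normal
# postmaster start are expected to be at or above this value.
FIRST_NORMAL_OID = 16384


def snapshot(global_dir):
    """Return the set of file names currently present in global_dir."""
    return set(os.listdir(global_dir))


def diff_snapshots(before, after):
    """Return (added, removed) file names between two snapshots, sorted."""
    return sorted(after - before), sorted(before - after)


def created_before_normal_start(name):
    """True if name begins with a number below the 16384 threshold.

    Fork and segment suffixes such as "_fsm" or ".1" are ignored;
    non-numeric names such as "pg_control" return False.
    """
    digits = name.split("_")[0].split(".")[0]
    return digits.isdigit() and int(digits) < FIRST_NORMAL_OID
```

Calling `snapshot()` on the global directory before and after the test suite and passing both results to `diff_snapshots()` shows whether the set of file names changed during the run.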

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Eckestad (#1)
Re: BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

thomas.eckestad@gmail.com writes:

> After a couple of days or weeks of frequent unit test activity, DROP DATABASE
> eventually triggers errors of the following form:
>
> 2012-05-08 08:53:02.512 CEST> LOG: statement: DROP DATABASE IF EXISTS "HEAD_test_migrate_group_data_10010018668"
> 2012-05-08 08:53:02.512 CEST> ERROR: could not open file "global/12693": No such file or directory
> 2012-05-08 08:53:02.512 CEST> STATEMENT: DROP DATABASE IF EXISTS "HEAD_test_migrate_group_data_10010018668"

FWIW, I ran about 40000 cycles of CREATE/DROP DATABASE on 9.1 branch tip
without seeing anything odd. So it's fairly clear that there's
something you've not mentioned that's necessary to trigger this.

regards, tom lane
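
The message does not say how the 40000 cycles were driven. One possible way to produce a comparable workload is to generate the statements and feed them to a client such as psql; the sketch below only builds the SQL text, and the database name prefix `cycle_test_` is an invented placeholder:

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_drop_cycles(count, prefix="cycle_test_"):
    """Yield CREATE DATABASE / DROP DATABASE statements for `count` cycles."""
    for i in range(count):
        db = quote_ident(f"{prefix}{i}")
        yield f"CREATE DATABASE {db};"
        yield f"DROP DATABASE IF EXISTS {db};"
```

Printing the statements, for example with `print("\n".join(create_drop_cycles(40000)))`, gives a script that can be piped into a client session. A real test suite also creates and uses objects inside each database, which this sketch deliberately leaves out.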

#4 Thomas Eckestad
thomas.eckestad@gmail.com
In reply to: Tom Lane (#2)
Re: BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

2012/11/1 Tom Lane <tgl@sss.pgh.pa.us>

> That is extremely peculiar --- AFAICS, 9.1 should never assign a
> relfilenode of 12693. (OIDs assigned by initdb don't get past about
> 11900 in that version, and OIDs assigned after normal postmaster start
> should always be above 16384.) Is it always exactly "global/12693"
> that's complained of? Could you monitor the contents of $PGDATA/global
> and see if the set of filenames present changes while you're running
> these tests?
>
> regards, tom lane

No, it is not always global/12693. A few days ago it was global/12589 that
got lost.

I am afraid that I cannot guarantee that the example I posted
(global/12693) was triggered with version 9.1.6. It might have been on 9.0.x
or 9.1.x, if that makes a difference. I am sure, though, that global/12589 was
triggered using 9.1.5 (we upgraded to 9.1.6 just a few days ago).

Sorry for the version confusion.

I will monitor global/ and try to trigger the bug and get back to you next
week.

Regards,
Thomas

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Eckestad (#4)
Re: BUG #7634: Missing files in global/ after a lot of CREATE DATABASE / DROP DATABASE

Thomas Eckestad <thomas.eckestad@gmail.com> writes:

> 2012/11/1 Tom Lane <tgl@sss.pgh.pa.us>
>
>> That is extremely peculiar --- AFAICS, 9.1 should never assign a
>> relfilenode of 12693.
>
> I am afraid that I cannot guarantee that the example I posted
> (global/12693) was triggered with version 9.1.6. It might have been on 9.0.x
> or 9.1.x, if that makes a difference. I am sure, though, that global/12589 was
> triggered using 9.1.5 (we upgraded to 9.1.6 just a few days ago).

I realized that these numbers are actually quite a lot more
platform-specific than I'd been thinking, since in 9.1 they will vary
depending on how many OIDs got consumed for pg_collation entries,
and that will depend not only on your operating system but also on how many
locales you've seen fit to install. So I'm probably wrong to have
guessed that this might represent a mistaken access to a relfilenode
that never should have existed.

What I'd suggest doing is monitoring the output of this query:

select relname, pg_relation_filenode(oid) from pg_class where relisshared;

which will tell you what filenames *ought* to be present in
$PGDATA/global, and then when something goes missing it'll be possible
to figure out which table or index it was. That might provide at least
the first clue what's wrong.

regards, tom lane
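
To act on the suggestion above, the query output can be compared against the files actually present in $PGDATA/global. The sketch below assumes the query was run with psql in unaligned, tuples-only mode (`psql -At`), which prints one `relname|filenode` pair per line. The function names are hypothetical, and only the main relation file is checked, not forks or extra segments.

```python
import os


def parse_shared_relations(psql_output):
    """Map filenode (as a string) to relname from 'relname|filenode' lines."""
    expected = {}
    for line in psql_output.splitlines():
        line = line.strip()
        if not line:
            continue
        relname, _, filenode = line.rpartition("|")
        # Skip lines without a separator or without a numeric filenode.
        if relname and filenode.isdigit():
            expected[filenode] = relname
    return expected


def missing_relations(expected, global_dir):
    """Return {filenode: relname} for expected files absent from global_dir."""
    present = set(os.listdir(global_dir))
    return {node: rel for node, rel in expected.items() if node not in present}
```

Saving the query output while the cluster is still healthy and running `missing_relations()` once the error appears would name the shared table or index whose file has disappeared.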