Error while altering an inheritance hierarchy in mid-query

Started by Bob Lunneyabout 16 years ago3 messagesbugs
Jump to latest
#1Bob Lunney
bob_lunney@yahoo.com

My database implements partitioning using inheritance and constraint exclusion, as described in the fine manual. Each partition holds data for a single day, and there are well over 300 child partitions.

The exception mentioned below notwithstanding, the technique works exceedingly well and is a real life saver when you have half a billion rows and climbing.

While a query was running on the partition for today a maintenance job was run that summarizes data in a different partition using the technique described in section 5.9.3. - "Managing Partitions" of the manual, with a twist. Basically, it goes like this:

1. A select into query is run which summarizes the data from a partition into a table outside the inheritance hierarchy, which is then indexed.
2. Then
a. a transaction is begun,
b. the original partition is dropped,
c. the new table renamed to the original partition's name,
d. the new table's unique index is renamed,
e. the appropriate check constraint is added,
f. select privilege is granted, and
g. the transaction is committed.

So far so good, the maintenance job works fine and is quite speedy. The problem occurs when a select query is started prior to the sequence of events above, those steps complete, and an error is thrown saying "PGRES_FATAL_ERROR - ERROR: could not open relation with OID 64412". Of course, seconds later the query runs fine, and there is no object with OID 64412 in the database.

It seems to me that removing and inserting partitions into the inheritance hierarchy may not be a consistent and/or isolated action with respect to queries that may be using the inheritance hierarchy. To be fair, this sequence of events doesn't normally happen in daily processing, but the behavior is not what I expected, nor is it mentioned in the manual. Am I missing or misinterpreting something?

Thanks in advance for your advice!

Bob Lunney

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bob Lunney (#1)
Re: Error while altering an inheritance hierarchy in mid-query

Bob Lunney <bob_lunney@yahoo.com> writes:

1. A select into query is run which summarizes the data from a partition into a table outside the inheritance hierarchy, which is then indexed.
2. Then
a. a transaction is begun,
b. the original partition is dropped,
c. the new table renamed to the original partition's name,
d. the new table's unique index is renamed,
e. the appropriate check constraint is added,
f. select privilege is granted, and
g. the transaction is committed.

I'd suggest taking an exclusive lock on the inheritance hierarchy's
parent table between 2a and 2b. The "could not open relation with OID
nnn" error is to be expected when a table is dropped just as a query
is in the act of trying to open it, which is what could happen here if
a query on the parent table runs concurrently with the DROP.
You're also at risk that a concurrent query might see both or neither
of the old and new versions of the partition, leading to bogus answers.
Both of these things would be fixed if incoming queries are blocked
while trying to open the parent table, rather than while iterating
through the list of inheritance children for it.

regards, tom lane

#3Bob Lunney
bob_lunney@yahoo.com
In reply to: Tom Lane (#2)
Re: Error while altering an inheritance hierarchy in mid-query

So I did miss something! Thanks for the brilliant explanation and simple solution, Tom.

Bob

--- On Thu, 3/18/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text

From: Tom Lane <tgl@sss.pgh.pa.us>
Subject: Re: [BUGS] Error while altering an inheritance hierarchy in mid-query
To: "Bob Lunney" <bob_lunney@yahoo.com>
Cc: pgsql-bugs@postgresql.org
Date: Thursday, March 18, 2010, 3:26 PM
Bob Lunney <bob_lunney@yahoo.com>
writes:

1.  A select into query is run which summarizes

the data from a partition into a table outside the
inheritance hierarchy, which is then indexed.

2.  Then
   a.  a transaction is begun,
   b.  the original partition is

dropped,

   c.  the new table renamed to the

original partition's name,

   d.  the new table's unique index

is renamed,

   e.  the appropriate check

constraint is added,

   f.  select privilege is granted,

and

   g.  the transaction is

committed.

I'd suggest taking an exclusive lock on the inheritance
hierarchy's
parent table between 2a and 2b.  The "could not open
relation with OID
nnn" error is to be expected when a table is dropped just
as a query
is in the act of trying to open it, which is what could
happen here if
a query on the parent table runs concurrently with the
DROP.
You're also at risk that a concurrent query might see both
or neither
of the old and new versions of the partition, leading to
bogus answers.
Both of these things would be fixed if incoming queries are
blocked
while trying to open the parent table, rather than while
iterating
through the list of inheritance children for it.

           
regards, tom lane