invalid tid errors in latest 7.3.4 stable.
Hello folks,
Stumbled across an odd problem while cleaning data out of a database. I am
getting these "invalid tid" errors. I tried the upgrade from 7.3.2 to 7.3.4.
I tried a dumpall/initdb/restore... nada. Nothing really useful is coming
from the logs either, even though logging is cranked up. If anyone can
suggest a method to track down the cause of the following dialog with the db,
I would greatly appreciate it. If you need any more info, please just ask.
Thank you in advance.
-Wade
version
---------------------------------------------------------------------
PostgreSQL 7.3.4 on i386-unknown-freebsd4.6, compiled by GCC 2.95.4
(-STABLE cvs from today)
dropsites=# begin;
BEGIN
dropsites=# delete from te_users where id = 954;
WARNING: Error occurred while executing PL/pgSQL function c_delete_categories
WARNING: line 14 at SQL statement
ERROR: heap_mark4update: (am)invalid tid
dropsites=# rollback;
ROLLBACK
Table "public.te_users"
Column | Type | Modifiers
-------------------+-----------------------------+-----------------------------------------------------
id | integer | not null default
nextval('"te_users_id_seq"'::text)
username | text | not null
password | text |
reseller | integer | not null default 0
directory | text |
contact | integer |
creation_date | timestamp with time zone | default now()
active | boolean | not null default 'f'
domain | integer | not null default 0
has_domain | boolean | not null default 'f'
tutorial_type | integer | default -1
tutorial_step | integer | default -1
license_agreement | boolean | default 'f'
use_header | integer | default 0
promo | boolean | not null default 'f'
last_billed | timestamp without time zone | default now()
Indexes: primary_fk primary key btree (username, "domain"),
te_users_id_key unique btree (id),
te_users_username_lower_idx btree (lower(username))
dropsites=# \d c_categories
Table "public.c_categories"
Column | Type | Modifiers
-------------+---------+--------------------------------------------------------------
id | integer | not null default
nextval('public.c_categories_id_seq'::text)
category | integer | not null default 0
userid | integer | not null
form | integer | not null
name | text |
description | text |
lft | integer |
rgt | integer |
level | integer |
parentid | integer |
Indexes: c_categories_id_key unique btree (id)
Foreign Key constraints:
  $1 FOREIGN KEY (userid) REFERENCES te_users(id) ON UPDATE NO ACTION ON DELETE CASCADE,
  $2 FOREIGN KEY (form) REFERENCES c_forms(id) ON UPDATE NO ACTION ON DELETE CASCADE,
  c_categories_fk FOREIGN KEY (parentid) REFERENCES c_categories(id) ON UPDATE NO ACTION ON DELETE SET DEFAULT,
  c_categories_cat_fk FOREIGN KEY (category) REFERENCES c_categories(id) ON UPDATE NO ACTION ON DELETE NO ACTION
--- Source of c_delete_categories ---
CREATE OR REPLACE FUNCTION c_delete_categories() returns TRIGGER AS '
begin
IF c_category_mutex() THEN
-- delete entry
DELETE FROM c_categories WHERE ID = old.id;
IF (old.rgt - old.lft) > 1 THEN
-- update children
UPDATE c_categories SET ParentID = old.parentid WHERE ParentID =
old.id;
UPDATE c_categories SET lft = lft - 1, rgt = rgt - 1, level = level - 1
WHERE lft > old.lft AND lft < old.rgt;
END IF;
-- remove extra space
UPDATE c_categories SET lft = lft - 2 WHERE lft > old.rgt;
UPDATE c_categories SET rgt = rgt - 2 WHERE rgt > old.rgt;
PERFORM c_category_clear_mutex();
return NULL;
else
return old;
END IF;
end;
' language 'plpgsql';
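The trigger above is a nested-set delete: direct children of the removed row are re-parented, descendants shift down one level, and the two tree slots the deleted row occupied are reclaimed. Here is a minimal Python sketch of the same interval arithmetic (the function and dictionary layout are mine, chosen to mirror the columns; this is an illustration, not the production code):

```python
# Sketch of the nested-set delete done by c_delete_categories.
# nodes: {id: {"lft", "rgt", "level", "parentid"}}; a leaf spans width 1.

def delete_node(nodes, node_id):
    old = nodes.pop(node_id)
    if old["rgt"] - old["lft"] > 1:                 # the node had children
        for n in nodes.values():
            if n["parentid"] == node_id:            # re-parent direct children
                n["parentid"] = old["parentid"]
            if old["lft"] < n["lft"] < old["rgt"]:  # shift descendants up a level
                n["lft"] -= 1
                n["rgt"] -= 1
                n["level"] -= 1
    for n in nodes.values():                        # close the 2-unit gap
        if n["lft"] > old["rgt"]:
            n["lft"] -= 2
        if n["rgt"] > old["rgt"]:
            n["rgt"] -= 2
    return nodes
```

As in the SQL, the gap-closing updates run unconditionally; only the re-parenting and descendant shift are guarded by "the node had children".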
--- source of c_category_mutex ---
CREATE OR REPLACE FUNCTION c_category_mutex() returns BOOL AS '
DECLARE
mutex_count integer;
BEGIN
SELECT INTO mutex_count COUNT(*) FROM pg_class c, pg_attribute a
WHERE a.attrelid = c.oid
AND c.relname = ''___c_category_mutex___''
AND a.attname = ''___c_category_mutex___''
AND pg_catalog.pg_table_is_visible ( c.oid );
IF mutex_count > 0 THEN
RETURN ''f'';
ELSE
CREATE TEMP TABLE ___c_category_mutex___ (___c_category_mutex___ INT2);
RETURN ''t'';
END IF;
END;' LANGUAGE 'plpgsql';
--- source of c_category_clear_mutex ---
CREATE OR REPLACE FUNCTION c_category_clear_mutex() returns BOOL AS '
DECLARE
mutex_count INT4;
BEGIN
SELECT INTO mutex_count COUNT(*) FROM pg_class c, pg_attribute a
WHERE a.attrelid = c.oid
AND c.relname = ''___c_category_mutex___''
AND a.attname = ''___c_category_mutex___''
AND pg_catalog.pg_table_is_visible ( c.oid );
IF mutex_count > 0 THEN
DROP TABLE ___c_category_mutex___;
RETURN ''t'';
ELSE
RETURN ''f'';
END IF;
END;' LANGUAGE 'plpgsql';
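The two functions above use the existence of a session-local temp table as an advisory mutex: acquisition succeeds only if the marker table is absent, and release drops it. A toy Python analogue of the pattern (names mirror the SQL; purely illustrative):

```python
# Toy analogue of the temp-table mutex: the "lock" is the existence of a
# per-session marker, modelled here as membership in a set.

session_temp_tables = set()

def c_category_mutex():
    """Acquire: fails if the marker already exists in this session."""
    if "___c_category_mutex___" in session_temp_tables:
        return False
    session_temp_tables.add("___c_category_mutex___")
    return True

def c_category_clear_mutex():
    """Release: drops the marker if present."""
    if "___c_category_mutex___" not in session_temp_tables:
        return False
    session_temp_tables.discard("___c_category_mutex___")
    return True
```

Like the SQL original, this only guards against re-entry within one session; each backend has its own temp-table namespace, so it says nothing about concurrent sessions.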
--
Wade Klaver
Wavefire Technologies Corporation
GPG Public Key at http://archeron.wavefire.com
/"\ ASCII Ribbon Campaign .
\ / - NO HTML/RTF in e-mail .
X - NO Word docs in e-mail .
/ \ -----------------------------------------------------------------
Wade Klaver <archeron@wavefire.com> writes:
Stumbled across an odd problem while cleaning data out of a database. I am
getting these "invalid tid" errors. I tried the upgrade from 7.3.2 to
7.3.4.
Hm. We fixed something with a similar symptom as of 7.3.3:
http://archives.postgresql.org/pgsql-hackers/2003-03/msg01099.php
If you are still seeing it in 7.3.4 then maybe there's another related
problem. Could you work up a self-contained test case?
regards, tom lane
Hello Tom,
In trying to come up with a test scenario, I loaded this db into a 7.4 db
and got a similar message. It shows up as follows:
dropsites=> begin;
BEGIN
dropsites=> delete from te_users where reseller = 21;
ERROR: attempted to mark4update invisible tuple
CONTEXT: PL/pgSQL function "c_delete_categories" line 14 at SQL statement
dropsites=>
Is this the same message using the new error reporting framework?
-Wade
On September 23, 2003 09:44 pm, Tom Lane wrote:
Wade Klaver <archeron@wavefire.com> writes:
Stumbled across an odd problem while cleaning data out of a database. I
am getting these "invalid tid" errors. I tried the upgrade from 7.3.2 to
7.3.4.
Hm. We fixed something with a similar symptom as of 7.3.3:
http://archives.postgresql.org/pgsql-hackers/2003-03/msg01099.php
If you are still seeing it in 7.3.4 then maybe there's another related
problem. Could you work up a self-contained test case?
regards, tom lane
Wade Klaver <archeron@wavefire.com> writes:
In trying to come up with a test scenario, I loaded this db into a 7.4 db
and got a similar message. It shows up as follows:
ERROR: attempted to mark4update invisible tuple
CONTEXT: PL/pgSQL function "c_delete_categories" line 14 at SQL statement
Is this the same message using the new error reporting framework?
Yes, I believe so.
regards, tom lane
Wade Klaver <archeron@wavefire.com> writes:
OK, I set you up a login on arch.wavefire.com
Okay, what I find is this sequence of events:
1. delete from te_users where id = 954;
2. The ON DELETE CASCADE RI constraint propagates this to a delete of
some row(s) in c_categories.
3. That fires the c_delete_categories BEFORE DELETE trigger.
4. That does several things including
UPDATE c_categories SET lft = lft - 2 WHERE lft > old.rgt;
5. This update command suffers a Halloween problem, namely trying to
update rows it's already updated.
Why does it do that, you ask? Because ReferentialIntegritySnapshotOverride
is true, since we are inside the ON DELETE CASCADE RI trigger and
haven't yet returned from any trigger. So instead of using the correct
snapshot for the UPDATE command, tqual.c mistakenly uses SnapshotNow
rules. We have successfully executed a select or two inside the trigger
function already, so CurrentCommandId is greater than the command ID
associated with the UPDATE command, making the updated rows visible.
Oops.
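Tom's sequence of events can be modelled with a toy command-counter simulation (hypothetical Python, not PostgreSQL internals): new tuple versions are stamped with the UPDATE's own command id, but the buggy SnapshotNow path checks visibility against the already-advanced CurrentCommandId, so the UPDATE re-finds its own output rows.

```python
# Toy model of the Halloween problem under SnapshotNow rules.
# A tuple is {"cmin": creating command id, "cmax": deleting command id or None}.

def run_update(table, update_cid, visibility_cid, limit=10):
    """Scan the table, replacing every visible live tuple with a new version.
    Correct behavior: visibility_cid == update_cid, so the new versions
    (cmin == update_cid) are invisible to the scan.  The bug: intervening
    CommandCounterIncrements left visibility_cid > update_cid, so the scan
    chases its own output rows until the safety limit stops it."""
    touched = 0
    i = 0
    while i < len(table) and touched <= limit:
        tup = table[i]
        if tup["cmax"] is None and tup["cmin"] < visibility_cid:
            tup["cmax"] = update_cid                          # old version dies
            table.append({"cmin": update_cid, "cmax": None})  # new version born
            touched += 1
        i += 1
    return touched
```

With the snapshot taken correctly, one pre-existing row yields one update; with the stale-override rules, each new version is immediately visible to the same scan and the update runs away.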
I think this is proof of something I've felt since day one, namely that
a global ReferentialIntegritySnapshotOverride flag is an unusable hack.
How can we get rid of it? Why did we need it in the first place?
(I suspect the proper answer for "how can we get rid of it" will be to
extend the Executor API so that the RI functions can tell the executor
to use SnapshotNow as es_snapshot, instead of a standard query snapshot.
But I'm wondering why we have to do this at all.)
regards, tom lane
On Wed, 24 Sep 2003, Tom Lane wrote:
Wade Klaver <archeron@wavefire.com> writes:
OK, I set you up a login on arch.wavefire.com
Okay, what I find is this sequence of events:
1. delete from te_users where id = 954;
2. The ON DELETE CASCADE RI constraint propagates this to a delete of
some row(s) in c_categories.
3. That fires the c_delete_categories BEFORE DELETE trigger.
4. That does several things including
UPDATE c_categories SET lft = lft - 2 WHERE lft > old.rgt;
5. This update command suffers a Halloween problem, namely trying to
update rows it's already updated.
Why does it do that, you ask? Because ReferentialIntegritySnapshotOverride
is true, since we are inside the ON DELETE CASCADE RI trigger and
haven't yet returned from any trigger. So instead of using the correct
snapshot for the UPDATE command, tqual.c mistakenly uses SnapshotNow
rules. We have successfully executed a select or two inside the trigger
function already, so CurrentCommandId is greater than the command ID
associated with the UPDATE command, making the updated rows visible.
Oops.
I think this is proof of something I've felt since day one, namely that
a global ReferentialIntegritySnapshotOverride flag is an unusable hack.
How can we get rid of it? Why did we need it in the first place?
(I suspect the proper answer for "how can we get rid of it" will be to
extend the Executor API so that the RI functions can tell the executor
to use SnapshotNow as es_snapshot, instead of a standard query snapshot.
But I'm wondering why we have to do this at all.)
I think if you have something like:
create table test1 (id int primary key, otherid int references test1);
insert into test1 values (4,4);
T1: begin;
T1: set transaction isolation level serializable;
T1: select * from test1;
T2: begin;
T2: insert into test1 values (5,4);
T2: end;
T1: delete from test1 where id=4;
-- I think the standard snapshot rules would mean that the row 5,4 would
be hidden from the select in the trigger, which means that the delete
would be allowed, where it should fail since that'd leave an orphaned
child row.
Or at least I've commented out the updates to
ReferentialIntegritySnapshotOverride in my local ri_triggers.c and see
behavior consistent with that (that the delete succeeds and the child
row is orphaned).
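Stephan's point can be illustrated with a toy xid-snapshot model (hypothetical Python; the xids and helper names are mine): under the transaction's serializable snapshot, the RI check cannot see T2's committed child row, so the delete would be wrongly allowed, while SnapshotNow-style rules catch it.

```python
# Toy model: RI check under a transaction snapshot vs. "see all committed".
# A child row is {"otherid": referenced parent id, "xmin": inserting xid}.

child_rows = [{"otherid": 4, "xmin": 150}]   # inserted and committed by T2

def delete_ok_snapshot(parent_id, snapshot_xmax):
    """RI check through T1's snapshot: rows from xids >= snapshot_xmax
    (transactions that started after the snapshot) are invisible."""
    return not any(r["otherid"] == parent_id and r["xmin"] < snapshot_xmax
                   for r in child_rows)

def delete_ok_now(parent_id, committed_xids):
    """RI check under SnapshotNow-like rules: sees every committed row."""
    return not any(r["otherid"] == parent_id and r["xmin"] in committed_xids
                   for r in child_rows)
```

With T1's snapshot horizon at xid 100 and T2's insert at xid 150, the snapshot-based check misses the child and approves the delete, orphaning the row, exactly the behavior Stephan saw with the override commented out.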
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
On Wed, 24 Sep 2003, Tom Lane wrote:
But I'm wondering why we have to do this at all.)
I think if you have something like:
create table test1 (id int primary key, otherid int references test1);
insert into test1 values (4,4);
T1: begin;
T1: set transaction isolation level serializable;
T1: select * from test1;
T2: begin;
T2: insert into test1 values (5,4);
T2: end;
T1: delete from test1 where id=4;
-- I think the standard snapshot rules would mean that the row 5,4 would
be hidden from the select in the trigger, which means that the delete
would be allowed, where it should fail since that'd leave an orphaned
child row.
Ah, I see. And the reason there's no race condition with SnapshotNow is
that T2 took a SELECT FOR UPDATE lock on the id=4 row, so the DELETE
couldn't succeed until T2 commits.
Okay, I'll work out some extension of the APIs to let us propagate the
snapshot request down through SPI and into the Executor, rather than
using a global variable for it. (Unless someone has a better idea...)
regards, tom lane
I said:
Okay, I'll work out some extension of the APIs to let us propagate the
snapshot request down through SPI and into the Executor, rather than
using a global variable for it. (Unless someone has a better idea...)
I've committed the attached patch into CVS HEAD. I am now wondering
whether to back-patch it to the 7.3 branch or not. It's a bit larger
than I would have liked, and really needs more testing before being
shoved into a stable branch.
The simplest test case I was able to generate for Wade's bug is this:
-----------
create table t1 (f1 int primary key);
create table t2 (f1 int references t1 on delete cascade);
create table t3 (f1 int);
create or replace function t2del() returns trigger as '
begin
update t3 set f1 = f1 + 1;
return old;
end' language plpgsql;
create trigger t2del before delete on t2 for each row
execute procedure t2del();
create or replace function t3upd() returns trigger as '
begin
perform count(*) from t3;
return new;
end' language plpgsql;
create trigger t3upd before update on t3 for each row
execute procedure t3upd();
insert into t1 values(1);
insert into t2 values(1);
insert into t3 values(1);
delete from t1;
-----------
Until this commit, CVS HEAD generated
ERROR: attempted to mark4update invisible tuple
CONTEXT: PL/pgSQL function "t2del" line 2 at SQL statement
7.3 branch generates a different spelling of the same error:
WARNING: Error occurred while executing PL/pgSQL function t2del
WARNING: line 2 at SQL statement
ERROR: heap_mark4update: (am)invalid tid
AFAICT you need a minimum of two levels of triggers invoked by an RI
trigger to make this happen, so it may be a corner case best left
unfixed in the 7.3 branch.
Opinions anyone?
regards, tom lane
Hello,
Naturally, as I found this problem in a production database running 7.3.4, a
back-patch to 7.3 would be desirable. Even if a patch were just made available
and not committed to -STABLE, that would do. I would also then be able to
test such a critter on our development server for a future back-port if that
ever comes to pass.
-Wade
I've committed the attached patch into CVS HEAD. I am now wondering
whether to back-patch it to the 7.3 branch or not. It's a bit larger
than I would have liked, and really needs more testing before being
shoved into a stable branch.
AFAICT you need a minimum of two levels of triggers invoked by an RI
trigger to make this happen, so it may be a corner case best left
unfixed in the 7.3 branch.
Opinions anyone?
regards, tom lane
Okay, I'll work out some extension of the APIs to let us propagate the
snapshot request down through SPI and into the Executor, rather than
using a global variable for it. (Unless someone has a better idea...)
Just when you thought it was safe to go back in the water ...
Chris Kratz sent me the attached example, which fails in 7.3 and (still)
fails in CVS HEAD too.
It appears that the failure mode goes like this: "DELETE FROM activity"
cascades via ON DELETE CASCADE to a delete in qry_column_list. The RI
trigger's delete query fires the RULE, and so must execute "UPDATE
report_objects". The compilation of report_objects' plpgsql trigger
advances the CommandCounter, creating the potential for Halloween
problems when SnapshotNow is used to fetch values. In particular the
UPDATE sees its own output rows as valid source rows.
As far as the "DELETE FROM qry_column_list" goes, I think the solution
is that fetching rows can't use pure SnapshotNow after all. What we
need is to create a fresh QuerySnapshot that shows all transactions
committed-to-date as committed, and saves the current CommandCounter as
the criterion for locally created rows. Unlike SnapshotNow, this would
mean that transactions committed just after we take the new snapshot
would not be seen as committed. This should be okay AFAICS --- once we
reach the RI triggers, all transactions we need to worry about should be
committed. (If not, surely there's a race condition anyway.) Also note
that an RI query would *not* see the effects of actions it indirectly
triggers. This should be okay, because if they do anything that
requires RI validation, they should cause additional RI trigger firings
to be queued for attention later.
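The "fresh QuerySnapshot" Tom describes can be sketched as a visibility rule (toy Python, field names mine): transactions committed when the snapshot is taken are visible, and locally created rows are visible only if their command id precedes the saved command counter, so the snapshot does not drift as CurrentCommandId advances.

```python
# Toy model of the proposed fresh QuerySnapshot for RI queries.

def fresh_ri_snapshot(committed_xids, current_cid):
    """Freeze 'everything committed to date' plus the command counter."""
    return {"committed": frozenset(committed_xids), "curcid": current_cid}

def visible(row, snap, my_xid):
    if row["xmin"] == my_xid:                # created in our own transaction:
        return row["cmin"] < snap["curcid"]  # frozen command-counter test
    return row["xmin"] in snap["committed"]  # committed-to-date test
```

Unlike pure SnapshotNow, a row our own later UPDATE creates (cmin at or past the saved counter) stays invisible, which is what prevents the Halloween re-visit; and a transaction committing just after the snapshot is taken is, as Tom notes, not seen.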
But Chris' example raises a host of other questions in my mind. Should
we apply this forcibly-updated QuerySnapshot to actions that are
indirectly triggered by RI queries? In CVS tip, SnapshotNow rules are
in fact used for the UPDATE that's generated by the RULE, because it's
part of the generated plan for the DELETE. But any queries executed
inside triggers fired as a result of all this would use the pre-existing
QuerySnapshot, and hence could see a worldview completely inconsistent
with the rows they are being fired for :-(. It's worse in 7.3, because
the first trigger exit would revert ReferentialIntegritySnapshotOverride
to false, meaning you wouldn't even be using the same snapshot rules
throughout the UPDATE/DELETE :-( :-(
I am inclined to think now that the right solution is for the RI
triggers to update the global QuerySnapshot to current time when they
start, and then revert it to what it had been before exiting. (And that
code had better be in the RI triggers themselves, *not* in the generic
trigger-calling code.) This would ensure that actions taken indirectly
as a result of RI behavior would see a consistent worldview.
The main argument I can see against this is that it would be a really
big wart on the behavior of SERIALIZABLE transactions. Instead of
saying "in a SERIALIZABLE transaction, you only see the effects of
transactions committed before your transaction started", we'd have to
add a footnote "except in actions taken as a result of RI-generated
queries", which sure complicates matters from a logical point of view.
(In READ COMMITTED mode, on the other hand, it's no big deal; we are
effectively just decreeing that a new command starts before the RI
triggers run.)
Comments? Anyone have a better idea?
Anyway, on to Chris' example. Load the attached script into a database
that has plpgsql already created, and then do
DELETE FROM Activity WHERE ActivityID = 16739;
You'll get
ERROR: attempted to mark4update invisible tuple
(or the equivalent 7.3 message). This is reproducible so long as you
start a fresh session each time you attempt the DELETE. If you try the
DELETE again in the same session, it will succeed, because the trigger
function is already compiled and so no CommandCounterIncrement occurs at
the critical instant. (It might be possible to make the behavior stable
by adding some non-SELECT query inside the trigger function to force
a CommandCounterIncrement to occur anyway. I haven't tried though.)
regards, tom lane
On Friday 26 September 2003 19:50, Tom Lane wrote:
Anyway, on to Chris' example. Load the attached script into a database
that has plpgsql already created, and then do
DELETE FROM Activity WHERE ActivityID = 16739;
You'll get
ERROR: attempted to mark4update invisible tuple
(or the equivalent 7.3 message). This is reproducible so long as you
start a fresh session each time you attempt the DELETE. If you try the
DELETE again in the same session, it will succeed, because the trigger
function is already compiled and so no CommandCounterIncrement occurs at
the critical instant. (It might be possible to make the behavior stable
by adding some non-SELECT query inside the trigger function to force
a CommandCounterIncrement to occur anyway. I haven't tried though.)
I didn't exactly understand the explanation, but this last paragraph is
a bit interesting.
If the trigger function is precompiled, the error would not be
reproducible and it would work correctly, right?
Can we precompile such RI triggers on postmaster startup? Could that be a
workaround?
Just a thought..
Shridhar
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
If the trigger function is precompiled, the error would not be reproducible
and it will work correctly, right?
Only because the trigger in the example doesn't issue any queries of its
own. If it did, it would cause CommandCounterIncrement(s) anyway.
Can we precompile such RI triggers on postmaster startup? Could that be a
workaround?
I've thought for some time that it's a bad idea that there is an extra
CCI done when compiling a plpgsql function, because it creates
inconsistencies of behavior. But getting rid of it does not fix the
fundamental issues here at all, it merely means that this particular
drastically-oversimplified example wouldn't happen to fail.
(IIRC, the extra CCI is actually in spi.c, not in plpgsql, so removing
it could potentially break other code; thus I've hesitated to do it.)
regards, tom lane
On Fri, 26 Sep 2003, Tom Lane wrote:
Okay, I'll work out some extension of the APIs to let us propagate the
snapshot request down through SPI and into the Executor, rather than
using a global variable for it. (Unless someone has a better idea...)
Just when you thought it was safe to go back in the water ...
Chris Kratz sent me the attached example, which fails in 7.3 and (still)
fails in CVS HEAD too.
As far as the "DELETE FROM qry_column_list" goes, I think the solution
is that fetching rows can't use pure SnapshotNow after all. What we
need is to create a fresh QuerySnapshot that shows all transactions
committed-to-date as committed, and saves the current CommandCounter as
the criterion for locally created rows. Unlike SnapshotNow, this would
mean that transactions committed just after we take the new snapshot
would not be seen as committed. This should be okay AFAICS --- once we
reach the RI triggers, all transactions we need to worry about should be
committed. (If not, surely there's a race condition anyway.) Also note
That should be true. By the time the triggers have started running,
anything that might be conflicting shouldn't be able to commit until after
the transaction the trigger is in (since we already have locks on rows
they'd need to be able to lock).
that an RI query would *not* see the effects of actions it indirectly
triggers. This should be okay, because if they do anything that
requires RI validation, they should cause additional RI trigger firings
to be queued for attention later.
I feel a vague uneasiness about this, but can't think of a counterexample;
it's probably just breakfast acting up.
But Chris' example raises a host of other questions in my mind. Should
we apply this forcibly-updated QuerySnapshot to actions that are
indirectly triggered by RI queries? In CVS tip, SnapshotNow rules are
in fact used for the UPDATE that's generated by the RULE, because it's
part of the generated plan for the DELETE. But any queries executed
inside triggers fired as a result of all this would use the pre-existing
QuerySnapshot, and hence could see a worldview completely inconsistent
with the rows they are being fired for :-(. It's worse in 7.3, because
the first trigger exit would revert ReferentialIntegritySnapshotOverride
to false, meaning you wouldn't even be using the same snapshot rules
throughout the UPDATE/DELETE :-( :-(
I am inclined to think now that the right solution is for the RI
triggers to update the global QuerySnapshot to current time when they
start, and then revert it to what it had been before exiting. (And that
code had better be in the RI triggers themselves, *not* in the generic
trigger-calling code.) This would ensure that actions taken indirectly
as a result of RI behavior would see a consistent worldview.
The main argument I can see against this is that it would be a really
big wart on the behavior of SERIALIZABLE transactions. Instead of
saying "in a SERIALIZABLE transaction, you only see the effects of
transactions committed before your transaction started", we'd have to
add a footnote "except in actions taken as a result of RI-generated
queries", which sure complicates matters from a logical point of view.
(In READ COMMITTED mode, on the other hand, it's no big deal; we are
effectively just decreeing that a new command starts before the RI
triggers run.)
Given the two suggestions above, I think I'd vote for the latter. It seems
to break the idea of serializable, but it seems like it'd be easier for
people to work with than the former where you might not be able to even
see the row you're working upon in its own triggers.
I think that, theoretically, in serializable mode the cases where the
difference between the snapshot for this statement and the standard
snapshot for the transaction is noticeable are probably serialization
failures: that case arises when a transaction that started after us (and
has already committed) has done something the constraint's search
condition would need to see, so we'd potentially be opening up the
possibility of phantoms. If we have to get the same rows as a previous
read for the same search condition, but the row in question was already
committed by a transaction that started after this one, we no longer
have the freedom to pretend that the transactions ran in the other order.
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
I think theoretically in serializable the cases where the difference
between the snapshot from this statement and the standard snapshot for the
transaction are noticeable we probably have a serialization failure
Hmm, that is a good point. It would be cleaner to throw a "can't
serialize" failure than have the RI triggers run under a different
snapshot. I am not sure if we can implement that behavior easily,
though. Can you think of a way to detect whether there's an RI conflict
against a later-started transaction?
regards, tom lane
On Fri, 26 Sep 2003, Tom Lane wrote:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
I think theoretically in serializable the cases where the difference
between the snapshot from this statement and the standard snapshot for the
transaction are noticeable we probably have a serialization failure
Hmm, that is a good point. It would be cleaner to throw a "can't
serialize" failure than have the RI triggers run under a different
snapshot. I am not sure if we can implement that behavior easily,
though. Can you think of a way to detect whether there's an RI conflict
against a later-started transaction?
Not a complete one yet. :(
I think treating as an error the case of a row that matches the
constraint's search condition (on either check or action) but is
committed and invisible to our snapshot (which, for read committed, is
taken at some point after the original row modification that triggered
this) may cover the basic cases, but I don't feel confident that I'm not
missing some trigger/rule case.
Tom Lane wrote:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
I think theoretically in serializable the cases where the difference
between the snapshot from this statement and the standard snapshot for the
transaction are noticeable we probably have a serialization failure
Hmm, that is a good point. It would be cleaner to throw a "can't
serialize" failure than have the RI triggers run under a different
snapshot. I am not sure if we can implement that behavior easily,
though. Can you think of a way to detect whether there's an RI conflict
against a later-started transaction?
Just some thoughts on this that, of course, could be wrong. So please
don't be too hard on me if I'm full of it. :-)
By "a later-started transaction" I assume you mean a later-started
transaction that commits before yours does?
I don't see how RI is any different from dealing with straight SQL
in this regard. The effect of RI is to read/write/delete rows from a
related table that you otherwise wouldn't read or modify, and that means
that the RI mechanism needs to be treated in exactly the same way that
the equivalent SELECT/UPDATE/DELETE would be.
So the question I have is: what would PG do in the case that you SELECT
the same row(s) that the RI triggers are reading implicitly? For
instance, suppose we have two tables:
CREATE TABLE corps (id integer PRIMARY KEY, name varchar(32));
CREATE TABLE widgets (id integer PRIMARY KEY, name varchar(32),
corpid integer REFERENCES corps(id) ON DELETE CASCADE);
When, within a transaction, I do:
INSERT INTO widgets (id, name, corpid) VALUES (1, 'cpu', 3);
the RI mechanism will automatically check to make sure that the value
3 is in the id column of the corps table. Put another way, it will do
an implicit "SELECT id FROM corps WHERE id = 3", right? So suppose
that for the purposes of testing the serialization code I remove the RI
triggers and then actually do the following:
SELECT id FROM corps WHERE id = 3;
INSERT INTO widgets (id, name, corpid) VALUES (1, 'cpu', 3);
If my transaction is serializable then clearly, when another transaction
does
UPDATE corps SET id = 4 WHERE id = 3;
and commits before my transaction commits, either the updating
transaction is in violation of serializability rules or the inserting
transaction is. Serialization is maintained if either of those
transactions aborts with a serialization error.
But note that whether or not RI is involved should be entirely
irrelevant. What matters is what rows each transaction sees and
modifies. How the row gets looked at doesn't matter; the only thing
that matters is that the row *does* get looked at.
The important thing here is that the effect of the RI mechanism MUST be
the same as if the equivalent manual SQL statements were executed within
the same transaction. If it's not, then the RI mechanism is broken and
needs to be fixed at that level.
But if PG exhibits exactly the same bug this thread refers to regardless
of whether a row is examined/modified via directly issued SQL or via
the RI mechanism then the problem lies not within the RI code at all,
but within the serialization code.
I just hope I'm not merely stating the obvious here...
--
Kevin Brown kevin@sysexperts.com
On Fri, 26 Sep 2003, Kevin Brown wrote:
Tom Lane wrote:
...
But if PG exhibits exactly the same bug this thread refers to regardless
of whether a row is examined/modified via directly issued SQL or via
the RI mechanism then the problem lies not within the RI code at all,
but within the serialization code.
It's actually a different problem from the original one at this point. If
one just switches to using whatever snapshot is in place for the
transaction, you run into the problem that our serializable isolation mode
isn't entirely serializable and therefore isn't sufficient to guarantee
that the constraint is satisfied.
The case at hand (with *'s marking the RI queries), assuming pk already
has an id=1 row, would be:
T1: begin;
T1: set transaction isolation level serializable;
T1 ... (something that does a select, not necessarily on either pk or fk)
T2: begin;
T2: insert into fk values (1);
T2*:select * from pk where id=1 for update;
T2: commit;
T1: delete from pk where id=1;
T1*:select * from fk where id=1 for update;
T1: commit;
If you want to treat the serial execution as T1 followed by T2, then
T2* would have to show no rows for pk and T2 rolls back.
If you want to treat the order as T2,T1, then T1* would have to see the
row that T2 inserted and T1 rolls back.
Right now, you won't get that, you'll get T2* showing 1 row and T1*
showing 0 rows.
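That interleaving can be simulated with a toy snapshot-isolation model. This is purely illustrative (the `DB` class and its methods are invented for this sketch, not PostgreSQL code): each transaction copies the committed state at begin, and the FOR UPDATE selects are treated as plain reads, which reproduces the 1-row/0-row outcome.

```python
# Toy snapshot-isolation model (illustrative only, not PostgreSQL internals).
# Each transaction sees a private snapshot taken at begin; SELECT ... FOR
# UPDATE behaves here as a plain read, as in the scenario above.

class DB:
    def __init__(self):
        self.committed = {"pk": {1}, "fk": set()}  # pk has id=1, fk is empty

    def begin(self):
        # snapshot the committed state at transaction start
        return {"snapshot": {k: set(v) for k, v in self.committed.items()},
                "writes": []}

    def insert(self, t, table, row):
        t["snapshot"][table].add(row)
        t["writes"].append(("insert", table, row))

    def delete(self, t, table, row):
        t["snapshot"][table].discard(row)
        t["writes"].append(("delete", table, row))

    def select(self, t, table, row):
        return [row] if row in t["snapshot"][table] else []

    def commit(self, t):
        for op, table, row in t["writes"]:
            (self.committed[table].add if op == "insert"
             else self.committed[table].discard)(row)

db = DB()
t1 = db.begin()                      # T1: begin (serializable)
db.select(t1, "pk", 1)               # T1: placeholder select
t2 = db.begin()                      # T2: begin
db.insert(t2, "fk", 1)               # T2: insert into fk values (1)
t2_star = db.select(t2, "pk", 1)     # T2*: sees the pk row (1 row)
db.commit(t2)                        # T2: commit
db.delete(t1, "pk", 1)               # T1: delete from pk where id=1
t1_star = db.select(t1, "fk", 1)     # T1*: sees no fk rows (0 rows)
print(len(t2_star), len(t1_star))    # 1 0: impossible under either serial order
```

Both selects succeed against their private snapshots, which is exactly the outcome no serial ordering of T1 and T2 can produce.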
Stephan Szabo wrote:
The case at hand (with *'s on the ri queries) assuming pk already
has an id=1 row would be.
T1: begin;
T1: set transaction isolation level serializable;
T1 ... (something that does a select, not necessarily on either pk or fk)
T2: begin;
T2: insert into fk values (1);
T2*:select * from pk where id=1 for update;
T2: commit;
T1: delete from pk where id=1;
T1*:select * from fk where id=1 for update;
T1: commit;

If you want to treat the serial execution as T1 followed by T2, then
T2* would have to show no rows for pk and T2 rolls back.

If you want to treat the order as T2,T1, then T1* would have to see the
row that T2 inserted and T1 rolls back.

Right now, you won't get that, you'll get T2* showing 1 row and T1*
showing 0 rows.
Does it also behave this way *without* any actual foreign key
constraints in place? In other words, if you perform the RI queries
explicitly?
If so, then the problem is with the serialization code. Sounds like
that's pretty much what you're saying.
The problem in the scenario you described should be solved if we mark any
rows that are selected with the "for update" option (either implicitly,
as with RI triggers, or explicitly) as having been modified by the
selecting transaction, the equivalent of (in the case of T2*) "update pk
set id=id where id=1" but without firing any of the ON MODIFY triggers.
A rollback would, of course, not have any effect on the data in those
rows since there weren't any real changes. This "fix" won't work,
of course, if the serialization code is so broken that it doesn't work
properly even in the face of updates (something I'd find hard to believe).
--
Kevin Brown kevin@sysexperts.com
On Fri, 26 Sep 2003, Kevin Brown wrote:
Stephan Szabo wrote:
The case at hand (with *'s on the ri queries) assuming pk already
has an id=1 row would be.
T1: begin;
T1: set transaction isolation level serializable;
T1 ... (something that does a select, not necessarily on either pk or fk)
T2: begin;
T2: insert into fk values (1);
T2*:select * from pk where id=1 for update;
T2: commit;
T1: delete from pk where id=1;
T1*:select * from fk where id=1 for update;
T1: commit;

If you want to treat the serial execution as T1 followed by T2, then
T2* would have to show no rows for pk and T2 rolls back.

If you want to treat the order as T2,T1, then T1* would have to see the
row that T2 inserted and T1 rolls back.

Right now, you won't get that, you'll get T2* showing 1 row and T1*
showing 0 rows.

Does it also behave this way *without* any actual foreign key
constraints in place? In other words, if you perform the RI queries
explicitly?
Yeah, that wasn't clear from what I wrote, but that was from two psql
sessions with non-RI-constraint tables (except that I used a real
select in place of the placeholder).
The problem in the scenario you described should be solved if we mark any
rows that are selected with the "for update" option (either implicitly,
as with RI triggers, or explicitly) as having been modified by the
selecting transaction, the equivalent of (in the case of T2*) "update pk
set id=id where id=1" but without firing any of the ON MODIFY triggers.
A rollback would, of course, not have any effect on the data in those
rows since there weren't any real changes. This "fix" won't work,
of course, if the serialization code is so broken that it doesn't work
properly even in the face of updates (something I'd find hard to believe).
That fixes the case above, which will fix the RI constraints for now
(although they really have to stop using FOR UPDATE eventually), but it
doesn't really solve the serialization problem, since the problem still
exists AFAICS without FOR UPDATE. Without the FOR UPDATE, you still have
T2* getting 1 row and T1* getting 0, which can't happen for either
ordering of the transactions. It gets worse if that placeholder select at
the beginning of T1 was, say, select * from fk where id=1, because SQL
tells us that the later select can't see a different set of rows than the
earlier one, so T2 shouldn't be allowed to commit before T1.
Stephan Szabo wrote:
The problem in the scenario you described should be solved if we mark any
rows that are selected with the "for update" option (either implicitly,
as with RI triggers, or explicitly) as having been modified by the
selecting transaction, the equivalent of (in the case of T2*) "update pk
set id=id where id=1" but without firing any of the ON MODIFY triggers.
A rollback would, of course, not have any effect on the data in those
rows since there weren't any real changes. This "fix" won't work,
of course, if the serialization code is so broken that it doesn't work
properly even in the face of updates (something I'd find hard to believe).

That fixes the case above, which will fix the RI constraints for now
(although they really have to stop using FOR UPDATE eventually), but it
doesn't really solve the serialization problem, since the problem still
exists AFAICS without FOR UPDATE. Without the FOR UPDATE, you still have
T2* getting 1 row and T1* getting 0, which can't happen for either
ordering of the transactions. It gets worse if that placeholder select at
the beginning of T1 was, say, select * from fk where id=1, because SQL
tells us that the later select can't see a different set of rows than the
earlier one, so T2 shouldn't be allowed to commit before T1.
That's what I was afraid of, and what I figured serialization really
meant: what you see is a snapshot of the database as it was at
transaction start time.
I can't think of any good way to implement proper serialization without
destroying a serialized transaction's read performance, because it
seems to me that the only way to properly implement serialization is to
somehow record on-disk all the rows a serializable transaction visits,
which means that a serializable transaction is going to be *much* slower
than a read-committed transaction. You have to mark such rows because
other transactions (even read-committed transactions) have to abort
if they attempt to modify such a row, and the list of such rows can
grow far too large to record it in shared memory. Worse, you have to
maintain a dynamic list of serializable transactions that have seen the
row and remove a transaction from the list once it commits or rolls back,
because the only time a transaction needs to care about this when changing
a row is when there's a currently-running transaction that's seen it.
We could use the MVCC mechanism to implement it: duplicate the row being
examined and assign the reader's transaction ID to the duplicate just
as if it had modified the row. But you also have to somehow flag the
duplicate as being there as a result of a serializable read, so that
other serializable transactions that try to modify the row after the one
in question has committed won't themselves throw a serialization error
(because without the flag they'd think they were attempting to read a
row that had been modified by someone else during their lifetime).
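A rough sketch of that flagged-duplicate idea, with made-up names (`xid`, `read_only`) standing in for whatever the real tuple-header fields would be. A writer treats a read-only version as a conflict only while its reader is still running, and treats any real update version by another transaction as a conflict outright:

```python
# Illustrative sketch of "duplicate the row with a read flag" (all names
# hypothetical, not PostgreSQL tuple headers). A serializable reader leaves
# a version stamped with its xid but flagged read_only; writers then
# distinguish read markers from real updates.

class Version:
    def __init__(self, xid, read_only=False):
        self.xid = xid
        self.read_only = read_only

def may_update(row_versions, writer_xid, running_xids):
    """True if writer_xid may update without a serialization error."""
    for v in row_versions:
        if v.xid == writer_xid:
            continue                     # our own version is never a conflict
        if v.read_only:
            # a serializable read: conflicts only while that reader is active
            if v.xid in running_xids:
                return False
        else:
            # a real update by some other transaction: always a conflict
            return False
    return True

# xid 7 read the row serializably, leaving a read_only version:
versions = [Version(xid=7, read_only=True)]
print(may_update(versions, writer_xid=9, running_xids=set()))   # reader done
print(may_update(versions, writer_xid=9, running_xids={7}))     # reader live
```

The flag is what keeps a later serializable writer from mistaking the reader's marker version for a concurrent modification, which is the false-positive error the paragraph above is guarding against.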
The other situation you have to deal with is when you have two
transactions, 1 and 2, that start and commit in that order but which
have overlapping times of execution. If transaction 1 modifies a row
after transaction 2 starts, then commits before transaction 2 reads it,
transaction 2 has to be able to detect that and throw a serialization
error.
The way around that problem is to assign a commit ID to each transaction
at commit time. The commit ID is just the transaction ID that will be
assigned to the next transaction that runs. It might make sense for
assignment of commit IDs to increment the transaction ID counter the
way assignment of a transaction ID does. Anyway, if a serializable
transaction reads a row that has a commit ID greater than the reader's
transaction ID, it throws a serialization error. It's probably sufficient
to store the commit ID along with the transaction ID of the committer
in the transaction log as well as in shared memory, so that the commit
ID can be quickly looked up from the transaction ID.
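The commit-ID scheme might look something like this toy model (the counter, table, and function names are all invented for illustration): transaction IDs and commit IDs come from one shared counter, and a serializable reader errors out when it reads a row whose writer's commit ID exceeds the reader's own transaction ID.

```python
# Toy model of the commit-ID idea (illustrative only). Commit IDs are drawn
# from the same counter as transaction IDs; commit_ids plays the role of the
# shared-memory txn-id -> commit-id lookup table.

class SerializationError(Exception):
    pass

class IdCounter:
    def __init__(self):
        self.next_id = 1
    def take(self):
        i = self.next_id
        self.next_id += 1
        return i

counter = IdCounter()
commit_ids = {}                    # txn id -> commit ID

def begin():
    return counter.take()          # transaction ID

def commit(txn_id):
    commit_ids[txn_id] = counter.take()   # commit ID from the same counter

def read_row(reader_txn_id, row_writer_txn_id):
    # error if the row's writer committed after the reader's txn ID was taken
    cid = commit_ids.get(row_writer_txn_id)
    if cid is not None and cid > reader_txn_id:
        raise SerializationError("row committed after this transaction began")

t1 = begin()        # txn ID 1
t2 = begin()        # txn ID 2
commit(t2)          # t2 gets commit ID 3, which is > t1's txn ID 1
try:
    read_row(t1, t2)
except SerializationError:
    print("serialization error for T1")
```

Because the two counters are really one, "commit ID greater than my transaction ID" is exactly "committed after I started", which is the condition the serializable reader needs to detect.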
Maybe there's a better way around all this, but I certainly can't think
of one at the moment. :-(
--
Kevin Brown kevin@sysexperts.com