Do I need serializable for this query?

Started by William Garrisonalmost 19 years ago7 messagesgeneral
Jump to latest
#1William Garrison
postgres@mobydisk.com

I have a table that keeps running totals. It is possible that this
would get called twice simultaneously for the same UserID. Do I need to
put this in a serializable transaction? Since this gets called often,
could it be a performance problem if I make it serializable?

CREATE FUNCTION UpdateTotals(IN UserID int,
IN AddToCount int,
IN AddToSize bigint)
RETURNS void AS $$
BEGIN
UPDATE
Totals
SET
TotalSize = TotalSize + $2,
TotalCount = TotalCount + $3
WHERE
UserID = $1;
END IF;
END
$$ LANGUAGE 'plpgsql' VOLATILE;

#2Florian Pflug
fgp@phlo.org
In reply to: William Garrison (#1)
Re: Do I need serializable for this query?

William Garrison wrote:

I have a table that keeps running totals. It is possible that this
would get called twice simultaneously for the same UserID. Do I need to
put this in a serializable transaction? Since this gets called often,
could it be a performance problem if I make it serializable?

CREATE FUNCTION UpdateTotals(IN UserID int,
IN AddToCount int,
IN AddToSize bigint)
RETURNS void AS $$
BEGIN
UPDATE
Totals
SET
TotalSize = TotalSize + $2,
TotalCount = TotalCount + $3
WHERE
UserID = $1;
END IF;
END
$$ LANGUAGE 'plpgsql' VOLATILE;

Where is that stray "END IF;" comming from?
Anyway, this should be safe, and work even more
reliably in read-committed mode than in serializable
mode. In serializeable mode, if the Total of the same
user is updated by two transactions simultanously,
you'll get a SerializationError. Read-Committed mode
OTOH will make sure that it uses the latest version of
the tuple for calculating the new values.

Usually, you need serializable mode if you do updates
based on the results of previous selects.

greetings, Florian Pflug

#3Jeff Davis
pgsql@j-davis.com
In reply to: William Garrison (#1)
Re: Do I need serializable for this query?

On Tue, 2007-04-10 at 14:45 -0400, William Garrison wrote:

I have a table that keeps running totals. It is possible that this
would get called twice simultaneously for the same UserID. Do I need to
put this in a serializable transaction? Since this gets called often,
could it be a performance problem if I make it serializable?

CREATE FUNCTION UpdateTotals(IN UserID int,
IN AddToCount int,
IN AddToSize bigint)
RETURNS void AS $$
BEGIN
UPDATE
Totals
SET
TotalSize = TotalSize + $2,
TotalCount = TotalCount + $3
WHERE
UserID = $1;
END IF;
END
$$ LANGUAGE 'plpgsql' VOLATILE;

You only have one command to execute; you don't need to do anything
special at all. If two UPDATEs happen to the same tuple at the same
time, one will block waiting for the other to commit.

Regards,
Jeff Davis

#4Peter Eisentraut
peter_e@gmx.net
In reply to: William Garrison (#1)
Re: Do I need serializable for this query?

William Garrison wrote:

I have a table that keeps running totals. It is possible that this
would get called twice simultaneously for the same UserID. Do I need
to put this in a serializable transaction?

Transaction isolation is only a question of interest if you have more
than one statement in a transaction.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#5Michael Fuhr
mike@fuhr.org
In reply to: Peter Eisentraut (#4)
Re: Do I need serializable for this query?

On Tue, Apr 10, 2007 at 10:52:11PM +0200, Peter Eisentraut wrote:

William Garrison wrote:

I have a table that keeps running totals. It is possible that this
would get called twice simultaneously for the same UserID. Do I need
to put this in a serializable transaction?

Transaction isolation is only a question of interest if you have more
than one statement in a transaction.

Not true; the isolation level is also relevant if you're doing
concurrent updates or deletes (although, as the documentation
mentions, PostgreSQL's serializable isolation level doesn't guarantee
true serializability).

create table test (id integer primary key, t text);
insert into test (id, t) values (1, 'a');

T1: begin isolation level serializable;
T2: begin isolation level serializable;
T1: update test set t = 'b' where id = 1;
T2: update test set t = 'c' where id = 1; -- blocks
T1: commit;
T2: ERROR: could not serialize access due to concurrent update

--
Michael Fuhr

#6William Garrison
postgres@mobydisk.com
In reply to: Florian Pflug (#2)
Do I need serializable for this query? -- Part 2

This is part 2 of my "serializable" question from earlier. Thanks to
everyone who answered the first part. (Hopefully there will be no part 3)

When a user adds a record, I update a table with totals. But when a
user deletes data, I subtract the totals. That presents a more
interesting situation since it is theoretically possible to add a record
and delete one at the same time:

For simplicity sake, the two tables are something like:
item(itemid int, data...)
history(itemid int, versionid int, data...)

-- This deletes the item and all the history records for the item
CREATE FUNCTION DeleteItem(IN _UserID int,
IN _ItemID int)
RETURNS void AS $$
DECLARE
mycount int;
total bigint;
BEGIN
-- Cotal and total of data to be deleted
SELECT
COUNT(*), COALESCE(SUM(clientsize),0)
INTO
mycount,total
FROM
history
WHERE
userid= $1 AND itemid=$2;

-- Potential problem point #1

-- Call the stored proc to update the count and total
-- This is the one from my last email
PERFORM sbp_UpdateTotals($1,-mycount,-total);

-- Potential problem point #2

-- Delete the records
DELETE FROM history WHERE userid = $1 AND itemid=$2;
DELETE FROM items WHERE userid = $1 AND itemid=$2;
END
$$ LANGUAGE 'plpgsql' VOLATILE;

Suppose the user adds another record to the history table with the same
itemid as the one being deleted. If they do this at one of the two
"problem points" then that history record will be deleted, but it will
not be subtracted from the totals. Am I understanding this properly?

In my system, this is highly unlikely, and probably not even something
we care about very much. But I still want to know what to do. It seems
to me I could a few things to fix this:

1) Make this serializable
- I'm not even sure this will help, since adding a new history record
won't change any records that this touches. Would PostgreSQL even
realize that adding a history record would have changed the results of
the select?
2) Table lock
3) Make the first select statement store the PK(itemid,versionid) of the
history records and then only delete those records at the end.
4) Somehow calculate the count and total during the delete statement
5) Compare the # of records deleted with the # of records selected and
throw/rollback if they mismatch.

I'm curious which one I should do, and if my solutions are all valid.

Also, I'm under the impression that I don't need to do any sort of
begin/end transaction stuff in here, since PostgreSql does that
implicitly with any stored procedure. Is that correct?

Thanks to anyone who can assist.

Florian G. Pflug wrote:

Show quoted text

William Garrison wrote:

I have a table that keeps running totals. It is possible that this
would get called twice simultaneously for the same UserID. Do I need
to put this in a serializable transaction? Since this gets called
often, could it be a performance problem if I make it serializable?

CREATE FUNCTION UpdateTotals(IN UserID int,
IN AddToCount int,
IN AddToSize bigint)
RETURNS void AS $$
BEGIN
UPDATE
Totals
SET
TotalSize = TotalSize + $2,
TotalCount = TotalCount + $3
WHERE
UserID = $1;
END IF;
END
$$ LANGUAGE 'plpgsql' VOLATILE;

Where is that stray "END IF;" comming from?
Anyway, this should be safe, and work even more
reliably in read-committed mode than in serializable
mode. In serializeable mode, if the Total of the same
user is updated by two transactions simultanously,
you'll get a SerializationError. Read-Committed mode
OTOH will make sure that it uses the latest version of
the tuple for calculating the new values.

Usually, you need serializable mode if you do updates
based on the results of previous selects.

greetings, Florian Pflug

#7William Garrison
postgres@mobydisk.com
In reply to: William Garrison (#6)
Re: Do I need serializable for this query? -- Part 2

Should I just use a trigger to update these totals? That way concurrent
deletes/updates would be guaranteed to update the totals. The only down
side is that deleting 10 records would result in 10 updates to the
totals. But deleting is rare enough that it probably isn't a problem.

William Garrison wrote:

Show quoted text

This is part 2 of my "serializable" question from earlier. Thanks to
everyone who answered the first part. (Hopefully there will be no part 3)

When a user adds a record, I update a table with totals. But when a
user deletes data, I subtract the totals. That presents a more
interesting situation since it is theoretically possible to add a record
and delete one at the same time:

For simplicity sake, the two tables are something like:
item(itemid int, data...)
history(itemid int, versionid int, data...)

-- This deletes the item and all the history records for the item
CREATE FUNCTION DeleteItem(IN _UserID int,
IN _ItemID int)
RETURNS void AS $$
DECLARE
mycount int;
total bigint;
BEGIN
-- Cotal and total of data to be deleted
SELECT
COUNT(*), COALESCE(SUM(clientsize),0)
INTO
mycount,total
FROM
history
WHERE
userid= $1 AND itemid=$2;

-- Potential problem point #1

-- Call the stored proc to update the count and total
-- This is the one from my last email
PERFORM sbp_UpdateTotals($1,-mycount,-total);

-- Potential problem point #2

-- Delete the records
DELETE FROM history WHERE userid = $1 AND itemid=$2;
DELETE FROM items WHERE userid = $1 AND itemid=$2;
END
$$ LANGUAGE 'plpgsql' VOLATILE;

Suppose the user adds another record to the history table with the same
itemid as the one being deleted. If they do this at one of the two
"problem points" then that history record will be deleted, but it will
not be subtracted from the totals. Am I understanding this properly?

In my system, this is highly unlikely, and probably not even something
we care about very much. But I still want to know what to do. It seems
to me I could a few things to fix this:

1) Make this serializable
- I'm not even sure this will help, since adding a new history record
won't change any records that this touches. Would PostgreSQL even
realize that adding a history record would have changed the results of
the select?
2) Table lock
3) Make the first select statement store the PK(itemid,versionid) of the
history records and then only delete those records at the end.
4) Somehow calculate the count and total during the delete statement
5) Compare the # of records deleted with the # of records selected and
throw/rollback if they mismatch.

I'm curious which one I should do, and if my solutions are all valid.

Also, I'm under the impression that I don't need to do any sort of
begin/end transaction stuff in here, since PostgreSql does that
implicitly with any stored procedure. Is that correct?

Thanks to anyone who can assist.

Florian G. Pflug wrote:

William Garrison wrote:

I have a table that keeps running totals. It is possible that this
would get called twice simultaneously for the same UserID. Do I need
to put this in a serializable transaction? Since this gets called
often, could it be a performance problem if I make it serializable?

CREATE FUNCTION UpdateTotals(IN UserID int,
IN AddToCount int,
IN AddToSize bigint)
RETURNS void AS $$
BEGIN
UPDATE
Totals
SET
TotalSize = TotalSize + $2,
TotalCount = TotalCount + $3
WHERE
UserID = $1;
END IF;
END
$$ LANGUAGE 'plpgsql' VOLATILE;

Where is that stray "END IF;" comming from?
Anyway, this should be safe, and work even more
reliably in read-committed mode than in serializable
mode. In serializeable mode, if the Total of the same
user is updated by two transactions simultanously,
you'll get a SerializationError. Read-Committed mode
OTOH will make sure that it uses the latest version of
the tuple for calculating the new values.

Usually, you need serializable mode if you do updates
based on the results of previous selects.

greetings, Florian Pflug

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match