SQL design pattern for a delta trigger?
IS there such a thing? I can be the first to consider
this.
What I am aiming for is a solution with a pair of
coupled tables, one of which represents state through
time while the other represents transactions, or deltas,
on the state. The state table has a single value field
(a floating point number, or rather one such number per
id), and each state record has a time interval for which
it is valid: a start time and an end time.
What I am after is a situation where the moment a
record is inserted in the deltas table, a trigger
function first looks to see if the id provided
presently has a state in the state table. If not,
then it creates one. Then, the end date for that
state record gets set to the current time and a new
record is inserted with the new state (computed by
applying the delta to the value in the previous record
for the state), the current date as the start date and
null for the end date.
This seems like an obvious thing to try, but I am
floundering a little and am therefore wondering if
anyone has seen an SQL design pattern that covers
this, and a URL where I can find such a discussion
and, better, an example. The first concern is to
ensure that every record inserted into the deltas
table is immediately reflected in the state table, and
the second is that the history of state can be
reconstructed from a suitable query on the state
table.
I can do this easily in client code, but isn't this
the sort of thing best suited to living within the
database itself?
Thanks
Ted
--- Ted Byers <r.ted.byers@rogers.com> wrote:
IS there such a thing? I can be the first to
consider
this.
OOPS. The mind is faster than the fingers. That
should have been "I can NOT be the first to consider
this.
Ted
On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:
[snip]
What you want to do here for handling the update v. insert is called
an "UPSERT". Basically, what you do is run the update as if the row
exists and catch the exception that is thrown if it doesn't at which
point you insert the record with the end date = now(). After that
you can proceed normally with creating the new record with start date
= now() and end date = NULL.
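A minimal sketch of the close-out-or-create step Erik describes, for a placeholder "state" table (Ted's actual layout appears later in the thread). One caveat: in PL/pgSQL an UPDATE that matches no rows does not actually raise an exception; it simply leaves FOUND false, so a common variant tests FOUND rather than trapping an error:

```sql
-- Sketch only: "state" and its columns (id, value, start_time,
-- end_time) are placeholder names, not Ted's real schema.
UPDATE state SET end_time = now()
 WHERE id = NEW.id
   AND end_time IS NULL;
IF NOT FOUND THEN
    -- No open state row for this id yet: create the first one
    -- directly from the delta.
    INSERT INTO state (id, value, start_time, end_time)
    VALUES (NEW.id, NEW.delta, now(), NULL);
END IF;
```

The new open row carrying the applied delta would then be inserted as described above.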
Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)
Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
--- Erik Jones <erik@myemma.com> wrote:
On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:
[snip]
What you want to do here for handling the update v. insert is called
an "UPSERT". Basically, what you do is run the update as if the row
exists and catch the exception that is thrown if it doesn't, at which
point you insert the record with the end date = now(). After that
you can proceed normally with creating the new record with start date
= now() and end date = NULL.
Thanks, Erik. Do you know of a URL where this is
discussed, or where I can find an example? None of my
books discuss this, and my search using Google has so
far produced only noise.
Thanks again.
Ted
Ted Byers wrote:
--- Erik Jones <erik@myemma.com> wrote:
[snip]
Thanks, Erik. Do you know of a URL where this is
discussed, or where I can find an example? None of my
books discuss this, and my search using Google has so
far produced only noise.
You can do this with a conditional. Something like the following should
work.
IF
NOT (a query matching your data returns rows)
THEN
INSERT (your new data)
AFAIK, the developers are working on implementing the {MERGE, UPDATE OR
ON FAILURE INSERT, UPSERT} statement. Until then, you have to build
your upsert manually.
Colin
On Dec 7, 2007, at 6:29 AM, Ted Byers wrote:
--- Erik Jones <erik@myemma.com> wrote:
[snip]
Thanks, Erik. Do you know of a URL where this is
discussed, or where I can find an example? None of my
books discuss this, and my search using Google has so
far produced only noise.
http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-UPSERT-EXAMPLE might be a good place to start.
Cheers,
Steve
On Dec 7, 2007, at 8:29 AM, Ted Byers wrote:
--- Erik Jones <erik@myemma.com> wrote:
[snip]
Thanks, Erik. Do you know of a URL where this is
discussed, or where I can find an example? None of my
books discuss this, and my search using Google has so
far produced only noise. Thanks again.
The basic pseudo-code (not really SQL) is something like:
TRY:
run UPDATE
EXCEPT not found:
make INSERT
If you can give me some table layouts I can probably work out a
better example for you.
Erik Jones
So two design patterns for a makeshift UPSERT have been presented -
one is to check beforehand, and only insert if the item isn't present
already, the other is to do the insert blindly and let PG check for
you, and catch any exceptions.
I'm also wondering what people's ideas are for a sort of BULK
UPSERT. I often find myself inserting the results of a SELECT and
wanting a similar check for already existing rows. The idiom I've
stumbled upon looks like this:
insert into foo (x, y, z)
select a, b, c from bar join bax ...
EXCEPT
select x, y, z from foo;
Namely, I subtract from the results to be inserted any rows that are
already present in the target table. This can actually even be used
for UPSERTing a single row, and has the virtue of being pure SQL, but
I've wondered about its efficiency. One alternative would be to
iterate over the SELECT result with a procedural language, and do a
series of UPSERTS, but that seems unlikely to be as efficient for a
large result set. Any comments about the relative merits of these or
other alternatives?
Thanks.
- John Burger
MITRE
On Dec 8, 2007, at 7:54 AM, John D. Burger wrote:
So two design patterns for a makeshift UPSERT have been presented -
one is to check beforehand, and only insert if the item isn't
present already
... which will give the wrong results if there are any concurrent
updates ...
, the other is to do the insert blindly and let PG check for you,
and catch any exceptions.
I'm also wondering what people's ideas are for a sort of BULK
UPSERT. I often find myself inserting the results of a SELECT and
wanting a similar check for already existing rows. The idiom I've
stumbled upon looks like this:
insert into foo (x, y, z)
select a, b, c from bar join bax ...
EXCEPT
select x, y, z from foo;
Namely, I subtract from the results to be inserted any rows that
are already present in the target table.
This can actually even be used for UPSERTing a single row, and has
the virtue of being pure SQL, but I've wondered about its efficiency.
Worry more about its correctness. Doing entirely the wrong thing,
quickly, isn't always what you want. If there's any concurrency
involved at all, this is likely to do the wrong thing.
One alternative would be to iterate over the SELECT result with a
procedural language, and do a series of UPSERTS, but that seems
unlikely to be as efficient for a large result set.
Just take the idiom that's been pointed out in the documentation and
wrap a loop around it.
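In PL/pgSQL, that loop could look roughly like the following. This is a sketch only: upsert_foo is a hypothetical helper wrapping the per-row upsert idiom from the docs, and the join is abbreviated exactly as in John's example:

```sql
-- Sketch only: upsert_foo(x, y, z) is a hypothetical function built
-- on the documented UPDATE-then-INSERT retry loop; the query itself
-- is abbreviated as in the original message.
DECLARE
    r RECORD;
BEGIN
    FOR r IN SELECT a, b, c FROM bar JOIN bax ... LOOP
        PERFORM upsert_foo(r.a, r.b, r.c);
    END LOOP;
END
```

Whether this beats the pure-SQL EXCEPT approach for a large result set is exactly the efficiency question raised above; the loop is correct under concurrency, while the EXCEPT form is not.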
Cheers,
Steve
Thanks all. I tried the appended code in a trigger
function, but postgresql won't take it.
It complains that assets.quantity is not a scalar.
However, the WHERE clause in that select statement
guarantees that at most one record will be returned.
An open position on a given kind of asset is
represented by null in the end_valid_time field, and
the combination of asset_type_id, portfolio_id and
end_valid_time is certain to be unique, if there is a
record for that asset type in that portfolio at all.
I thought I'd try checking for an open position first
because the manual indicated that exception handling
is quite expensive. But I must have missed something,
because it doesn't like how I tried to define my
trigger function.
I have four sequences, one each for four tables. Two
of the tables are just lookup tables, for asset types
and portfolios; trivial for a test case, with only an
autoincrementing integer primary key and a "name".
The other two are the ones of interest. Assets is
treated as read only as far as the user is concerned.
The user's data in the assets table is mediated
through transactions inserted (and NEVER deleted or
updated) into the transactions table. Assets has the
minimal suite of columns (an autoincrementing integer
primary key, asset_type_id and portfolio_id, both
integers, quantity, which is a floating point number,
and two dates: start_valid_time and end_valid_time).
Transactions has only a transaction_id, portfolio_id,
asset_type_id, quantity and transaction_date. There
are of course foreign keys connecting the assets and
transactions tables to the lookup tables, and a
composite index on assets to make looking up records
based on portfolio_id, asset_type_id and
end_valid_time as quick as possible. It couldn't be
simpler, conceptually! Yet I must have missed
something, because PostgreSQL won't accept the
function body I show below.
If I can't get this working quickly, I may just resort
to creating a stored procedure that takes the
transaction details as arguments and processes both
tables appropriately without relying on a trigger.
:-(
Thanks for everyone's help.
Ted
===========================================
DECLARE
id BIGINT;
q DOUBLE PRECISION;
BEGIN
SELECT assets.id INTO id, assets.quantity INTO q
FROM assets
WHERE assets.asset_type_id = NEW.asset_type_id
AND assets.portfolio_id = NEW.portfolio_id
AND assets.end_valid_time IS NULL;
IF (id IS NULL) THEN
INSERT INTO assets (asset_type_id,
portfolio_id,quantity,start_valid_stime,end_valid_time)
VALUES (NEW.asset_type_id,NEW.portfolio_id,
NEW.quantity, NEW.transaction_date,NULL);
ELSE
UPDATE assets SET end_valid_time =
NEW.transaction_date WHERE id = id;
INSERT INTO assets (asset_type_id,
portfolio_id,quantity,start_valid_stime,end_valid_time)
VALUES (NEW.asset_type_id,NEW.portfolio_id, q +
NEW.quantity, NEW.transaction_date,NULL);
END
END
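For reference, a sketch of how this body might be repaired, assuming it is meant as a PL/pgSQL trigger function. The likely problems: PL/pgSQL takes a single INTO clause with a comma-separated target list (not one INTO per column, which would explain the "not a scalar" complaint); the local variable named id shadows the column, making WHERE id = id compare the variable to itself; start_valid_stime looks like a typo for start_valid_time; and the block is missing END IF, statement terminators, and the RETURN required of a row-level trigger:

```sql
-- Sketch only: table and column names are taken from the message
-- above; start_valid_stime is assumed to be a typo.
CREATE OR REPLACE FUNCTION apply_transaction() RETURNS trigger AS $$
DECLARE
    open_id BIGINT;        -- renamed from "id" to avoid shadowing assets.id
    q DOUBLE PRECISION;
BEGIN
    -- One INTO clause with a comma-separated target list.
    SELECT a.id, a.quantity INTO open_id, q
      FROM assets a
     WHERE a.asset_type_id = NEW.asset_type_id
       AND a.portfolio_id  = NEW.portfolio_id
       AND a.end_valid_time IS NULL;

    IF open_id IS NULL THEN
        INSERT INTO assets (asset_type_id, portfolio_id, quantity,
                            start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id,
                NEW.quantity, NEW.transaction_date, NULL);
    ELSE
        -- Qualify the column so it is not confused with a variable.
        UPDATE assets SET end_valid_time = NEW.transaction_date
         WHERE assets.id = open_id;
        INSERT INTO assets (asset_type_id, portfolio_id, quantity,
                            start_valid_time, end_valid_time)
        VALUES (NEW.asset_type_id, NEW.portfolio_id,
                q + NEW.quantity, NEW.transaction_date, NULL);
    END IF;

    RETURN NEW;  -- required for a row-level trigger function
END
$$ LANGUAGE plpgsql;
```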
On Dec 7, 2007, at 11:42 AM, Colin Wetherbee wrote:
You can do this with a conditional. Something like the following
should work.
IF
NOT (a query matching your data returns rows)
THEN
INSERT (your new data)
There exists a race condition here unless you've locked your tables.
Vivek Khera wrote:
On Dec 7, 2007, at 11:42 AM, Colin Wetherbee wrote:
You can do this with a conditional. Something like the following
should work.
IF
NOT (a query matching your data returns rows)
THEN
INSERT (your new data)
There exists a race condition here unless you've locked your tables.
Yes, clearly. In the context of the thread, I was assuming my algorithm
would be implemented as an atomic transaction.
For what it's worth, the real algorithm would be as follows. I hadn't
had enough coffee yet, and I forgot the UPDATE bit.
IF
(a query matching your old data returns rows)
THEN
UPDATE with your new data
ELSE
INSERT your new data
Colin
On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
For what it's worth, the real algorithm would be as follows. I
hadn't had enough coffee yet, and I forgot the UPDATE bit.
IF
(a query matching your old data returns rows)
THEN
UPDATE with your new data
ELSE
INSERT your new data
Still exists race condition. Your race comes from testing existence,
then creating/modifying data afterwards. You need to make the test/
set atomic else you have race.
--- Vivek Khera <khera@kcilink.com> wrote:
On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
For what it's worth, the real algorithm would be as follows. I
hadn't had enough coffee yet, and I forgot the UPDATE bit.
IF
(a query matching your old data returns rows)
THEN
UPDATE with your new data
ELSE
INSERT your new data
Still exists race condition. Your race comes from testing existence,
then creating/modifying data afterwards. You need to make the test/
set atomic else you have race.
Yes, but how do you do that in a stored function or
procedure, or in a trigger? It would be obvious to me
if I were writing this in C++ or Java, but how do you
do it using SQL in an RDBMS?
I saw something about table locks, but that doesn't
seem wise, WRT performance.
The classic example of a race condition, involving a
bank account, was used in the manual to introduce the
idea of a transaction, but we can't use a transaction
in a trigger, can we?
It is one thing to point out a race condition, but a
pointer to a solution that would work in the context
of the problem at hand would be useful and
appreciated.
Thanks all.
Ted
On Dec 10, 2007, at 4:48 PM, Ted Byers wrote:
--- Vivek Khera <khera@kcilink.com> wrote:
[snip]
Yes, but how do you do that in a stored function or
procedure, or in a trigger? It would be obvious to me
if I were writing this in C++ or Java, but how do you
do it using SQL in an RDBMS?
In a stored procedure you'd just execute the UPDATE and then check
the FOUND variable to see if it found a row to update:
UPDATE table_name SET foo='bar' WHERE id=5;
IF NOT FOUND THEN
INSERT INTO table_name (id, foo) VALUES (5, 'bar');
END IF;
Erik Jones
Thanks Erik
In a stored procedure you'd just execute the UPDATE and then check
the FOUND variable to see if it found a row to update:
UPDATE table_name SET foo='bar' WHERE id=5;
IF NOT FOUND THEN
INSERT INTO table_name (id, foo) VALUES (5, 'bar');
END IF;
To be clear, if I understand you correctly, with your
example, if there is no record where id=5, nothing
happens except FOUND is set to false? Can I, then,
declare a variable prior to your update statement, and
then modify your update statement so that the value in
a particular field on the row where id=5 can be
captured? Bearing in mind this is to be in a row
level trigger after an insert into table_name,
something like:
DECLARE q DOUBLE;
UPDATE table_name
SET foo='bar',
q = table_name.quantity
WHERE id=5;
And then follow that with something like:
IF FOUND THEN
INSERT INTO another_table (baz,quantity)
VALUES (foo,q+NEW.quantity);
ELSE
INSERT INTO another_table (baz,quantity)
VALUES (foo,NEW.quantity);
END IF
Thanks again,
Ted
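For what it's worth, an assignment inside SET, as in the sketch above, isn't valid PL/pgSQL. Since PostgreSQL 8.2 the usual way to capture a value from the row being updated is UPDATE ... RETURNING ... INTO. A sketch reusing the hypothetical names from the message above (table_name, another_table, foo, baz, quantity are not a tested schema):

```sql
-- Sketch only: names are the hypothetical ones from the message above.
-- Since SET does not touch quantity, RETURNING yields its existing value.
DECLARE
    q DOUBLE PRECISION;
BEGIN
    UPDATE table_name
       SET foo = 'bar'
     WHERE id = 5
    RETURNING table_name.quantity INTO q;

    IF FOUND THEN
        INSERT INTO another_table (baz, quantity)
        VALUES ('bar', q + NEW.quantity);
    ELSE
        INSERT INTO another_table (baz, quantity)
        VALUES ('bar', NEW.quantity);
    END IF;
END
```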
--- On Mon, 12/10/07, Ted Byers <r.ted.byers@rogers.com> wrote:
but how do you
do it using SQL in an RDBMS?
I believe that there is an ANSI SQL command "MERGE" that is yet to be implemented into PostgreSQL.
Regards,
Richard Broersma Jr.
Richard Broersma Jr wrote:
--- On Mon, 12/10/07, Ted Byers <r.ted.byers@rogers.com> wrote:
but how do you do it using SQL in an RDBMS?
I believe that there is an ANSI SQL command "MERGE" that is yet to be implemented into PostgreSQL.
IIRC the standard's definition of MERGE is still subject to the race
condition :-) It seems mostly defined for OLAP, and assumes rather
static data.
--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"La tristeza es un muro entre dos jardines" (Khalil Gibran)
Vivek Khera wrote:
On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
For what it's worth, the real algorithm would be as follows. I
hadn't had enough coffee yet, and I forgot the UPDATE bit.
IF (a query matching your old data returns rows) THEN UPDATE with
your new data ELSE INSERT your new data
Still exists race condition. Your race comes from testing existence,
then creating/modifying data afterwards. You need to make the
test/set atomic else you have race.
I guess when I wrote that the algorithm would have to be implemented in
an atomic manner, it fell on deaf ears.
That said, perhaps implementing a good MERGE would be not such a bad
idea for PostgreSQL 8.4.
Colin
On 12/10/07, Colin Wetherbee <cww@denterprises.org> wrote:
Vivek Khera wrote:
On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:
IF (a query matching your old data returns rows) THEN UPDATE with
your new data ELSE INSERT your new data
Still exists race condition. Your race comes from testing existence,
then creating/modifying data afterwards. You need to make the
test/set atomic else you have race.
I guess when I wrote that the algorithm would have to be implemented in
an atomic manner, it fell on deaf ears.
The problem is that there isn't a good atomic method for that order of
operations, short of locking the entire table first. A concurrent
transaction might insert a row after your test but before your own
INSERT. Even a SERIALIZABLE transaction won't help, as PostgreSQL
doesn't implement predicate locking.
That's why the example in the docs is a loop with result checking on
both operations, and requires a UNIQUE constraint to work correctly.
If high concurrency isn't a concern, table locking is the simpler approach.
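For reference, the documented idiom has roughly this shape (adapted from the PLPGSQL-UPSERT-EXAMPLE in the manual; the db table and merge_db name come from that example, and the loop depends on a UNIQUE or PRIMARY KEY constraint on the key column):

```sql
-- Adapted from the PostgreSQL docs' upsert example.
-- Assumes: CREATE TABLE db (a INT PRIMARY KEY, b TEXT);
CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS $$
BEGIN
    LOOP
        -- First try to update the key.
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        -- Not there, so try to insert the key.  If someone else
        -- inserts the same key concurrently, we get a unique-key
        -- failure and loop back to retry the UPDATE.
        BEGIN
            INSERT INTO db (a, b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Do nothing; fall through and try the UPDATE again.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

The loop plus the unique constraint is what closes the race window that the test-then-set approaches leave open.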