SQL design pattern for a delta trigger?

Started by Ted Byersover 18 years ago20 messagesgeneral
Jump to latest
#1Ted Byers
r.ted.byers@rogers.com

IS there such a thing? I can be the first to consider
this.

What I am aiming for is a solution with a couple
coupled tables, one of which represents state through
time and the other represents transactions or deltas
on the state. With one field (a floating point
number) in the state table (or should I say a number
for each id field), it obviously has a time interval
for which it is valid: a start time and an end time.

What I am after is a situation where the moment a
record is inserted in the deltas table, a trigger
function first looks to see if the id provided
presently has a state in the state table. If not,
then it creates one. Then, the end date for that
state record gets set to the current time and a new
record is inserted with the new state (computed by
applying the delta to the value in the previous record
for the state), the current date as the start date and
null for the end date.

This seems like an obvious thing to try, but I am
floundering a little and am therefore wondering if
anyone has seen an SQL design pattern that talks about
this, and an url where I can see such a discussion
and, better, an example. The first concern is to
ensure that every record inserted into the deltas
table is immediately reflected in the state table, and
the second is that the history of state can be
reconstructed from a suitable query on the state
table.

I can do this easily in client code, but isn't this
the sort of thing best suited to living within the
database itself?

Thanks

Ted

#2Ted Byers
r.ted.byers@rogers.com
In reply to: Ted Byers (#1)
Re: SQL design pattern for a delta trigger?
--- Ted Byers <r.ted.byers@rogers.com> wrote:

IS there such a thing? I can be the first to
consider
this.

OOPS. The mind is faster than the fingers. That
should have been "I can NOT be the first to consider
this.

Ted

#3Erik Jones
erik@myemma.com
In reply to: Ted Byers (#1)
Re: SQL design pattern for a delta trigger?

On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:

IS there such a thing? I can be the first to consider
this.

What I am aiming for is a solution with a couple
coupled tables, one of which represents state through
time and the other represents transactions or deltas
on the state. With one field (a floating point
number) in the state table (or should I say a number
for each id field), it obviously has a time interval
for which it is valid: a start time and an end time.

What I am after is a situation where the moment a
record is inserted in the deltas table, a trigger
function first looks to see if the id provided
presently has a state in the state table. If not,
then it creates one. Then, the end date for that
state record gets set to the current time and a new
record is inserted with the new state (computed by
applying the delta to the value in the previous record
for the state), the current date as the start date and
null for the end date.

This seems like an obvious thing to try, but I am
floundering a little and am therefore wondering if
anyone has seen an SQL design pattern that talks about
this, and an url where I can see such a discussion
and, better, an example. The first concern is to
ensure that every record inserted into the deltas
table is immediately reflected in the state table, and
the second is that the history of state can be
reconstructed from a suitable query on the state
table.

I can do this easily in client code, but isn't this
the sort of thing best suited to living within the
database itself?

What you want to do here for handling the update v. insert is called
an "UPSERT". Basically, what you do is run the update as if the row
exists and catch the exception that is thrown if it doesn't at which
point you insert the record with the end date = now(). After that
you can proceed normally with creating the new record with start date
= now() and end date = NULL.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#4Ted Byers
r.ted.byers@rogers.com
In reply to: Erik Jones (#3)
Re: SQL design pattern for a delta trigger?
--- Erik Jones <erik@myemma.com> wrote:

On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:

[snip]
What you want to do here for handling the update v.
insert is called
an "UPSERT". Basically, what you do is run the
update as if the row
exists and catch the exception that is thrown if it
doesn't at which
point you insert the record with the end date =
now(). After that
you can proceed normally with creating the new
record with start date
= now() and end date = NULL.

Thanks Eric. Do you know of an URL where this is
discussed or where I can find an example. None of my
books discuss this, and my search using google has so
far produced only noise.

Thanks again.

Ted

#5Colin Wetherbee
cww@denterprises.org
In reply to: Ted Byers (#4)
Re: SQL design pattern for a delta trigger?

Ted Byers wrote:

--- Erik Jones <erik@myemma.com> wrote:

On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:

[snip]
What you want to do here for handling the update v.
insert is called
an "UPSERT". Basically, what you do is run the
update as if the row
exists and catch the exception that is thrown if it
doesn't at which
point you insert the record with the end date =
now(). After that
you can proceed normally with creating the new
record with start date
= now() and end date = NULL.

Thanks Eric. Do you know of an URL where this is
discussed or where I can find an example. None of my
books discuss this, and my search using google has so
far produced only noise.

You can do this with a conditional. Something like the following should
work.

IF
NOT (a query matching your data returns rows)
THEN
INSERT (your new data)

AFAIK, the developers are working on implementing the {MERGE, UPDATE OR
ON FAILURE INSERT, UPSERT} statement. Until then, you have to build
your upsert manually.

Colin

#6Steve Atkins
steve@blighty.com
In reply to: Ted Byers (#4)
Re: SQL design pattern for a delta trigger?

On Dec 7, 2007, at 6:29 AM, Ted Byers wrote:

--- Erik Jones <erik@myemma.com> wrote:

On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:

[snip]
What you want to do here for handling the update v.
insert is called
an "UPSERT". Basically, what you do is run the
update as if the row
exists and catch the exception that is thrown if it
doesn't at which
point you insert the record with the end date =
now(). After that
you can proceed normally with creating the new
record with start date
= now() and end date = NULL.

Thanks Eric. Do you know of an URL where this is
discussed or where I can find an example. None of my
books discuss this, and my search using google has so
far produced only noise.

http://www.postgresql.org/docs/current/static/plpgsql-control-
structures.html#PLPGSQL-UPSERT-EXAMPLE might be a good place to start.

Cheers,
Steve

#7Erik Jones
erik@myemma.com
In reply to: Ted Byers (#4)
Re: SQL design pattern for a delta trigger?

On Dec 7, 2007, at 8:29 AM, Ted Byers wrote:

--- Erik Jones <erik@myemma.com> wrote:

On Dec 6, 2007, at 2:36 PM, Ted Byers wrote:

[snip]
What you want to do here for handling the update v.
insert is called
an "UPSERT". Basically, what you do is run the
update as if the row
exists and catch the exception that is thrown if it
doesn't at which
point you insert the record with the end date =
now(). After that
you can proceed normally with creating the new
record with start date
= now() and end date = NULL.

Thanks Eric. Do you know of an URL where this is
discussed or where I can find an example. None of my
books discuss this, and my search using google has so
far produced only noise.

Thanks again.

The basic pseudo-code (not really SQL) is something like:

TRY:
run UPDATE
EXCEPT not found:
make INSERT

If you can give me some table layouts I can probably work out a
better example for you.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#8John D. Burger
john@mitre.org
In reply to: Erik Jones (#3)
Re: SQL design pattern for a delta trigger?

So two design patterns for a makeshift UPSERT have been presented -
one is to check beforehand, and only insert if the item isn't present
already, the other is to do the insert blindly and let PG check for
you, and catch any exceptions.

I'm also wondering what people's ideas are for a sort of BULK
UPSERT. I often find myself inserting the results of a SELECT and
wanting a similar check for already existing rows. The idiom I've
stumbled upon looks like this:

insert into foo (x, y, z)
select a, b, c from bar join bax ...
EXCEPT
select x, y, z from foo;

Namely, I subtract from the results to be inserted any rows that are
already present in the target table. This can actually even be used
for UPSERTing a single row, and has the virtue of being pure SQL, but
I've wondered about its efficiency. One alternative would be to
iterate over the SELECT result with a procedural language, and do a
series of UPSERTS, but that seems unlikely to be as efficient for a
large result set. Any comments about the relative merits of these or
other alternatives?

Thanks.

- John Burger
MITRE

#9Steve Atkins
steve@blighty.com
In reply to: John D. Burger (#8)
Re: SQL design pattern for a delta trigger?

On Dec 8, 2007, at 7:54 AM, John D. Burger wrote:

So two design patterns for a makeshift UPSERT have been presented -
one is to check beforehand, and only insert if the item isn't
present already

... which will give the wrong results if there's any concurrent
updates...

, the other is to do the insert blindly and let PG check for you,
and catch any exceptions.

I'm also wondering what people's ideas are for a sort of BULK
UPSERT. I often find myself inserting the results of a SELECT and
wanting a similar check for already existing rows. The idiom I've
stumbled upon looks like this:

insert into foo (x, y, z)
select a, b, c from bar join bax ...
EXCEPT
select x, y, z from foo;

Namely, I subtract from the results to be inserted any rows that
are already present in the target table.

This can actually even be used for UPSERTing a single row, and has
the virtue of being pure SQL, but I've wondered about its efficiency.

Worry more about it's correctness. Doing entirely the wrong thing,
quickly, isn't always what you want. If there's any concurrency
involved at all, this is likely to do the wrong thing.

One alternative would be to iterate over the SELECT result with a
procedural language, and do a series of UPSERTS, but that seems
unlikely to be as efficient for a large result set.

Just take the idiom that's been pointed out in the documentation and
wrap a loop around it.

Cheers,
Steve

#10Ted Byers
r.ted.byers@rogers.com
In reply to: Steve Atkins (#9)
Re: SQL design pattern for a delta trigger?

Thanks all. I tried the appended code in a trigger
function, but postgresql won't take it.

It complains that assets.quantity is not a scalar.
However, the WHERE clause in that select statement
guarantees that at most only one record will be
returned. An open position on a given kind of asset
is represented by null in the end_valid_time field,
and the combination of asset_type_id, portfolio_id and
end_valid_time is certain to be unique, if there is a
record for that asset type in that porfolio at all.

I thought I'd try checking for an open position first
because the manual indicated that exception handling
is quite expensive. But I must have missed something,
because it doesn't like how I tried to define my
trigger function.

I have four sequences, one each for four tables. Two
of the tables are just look up tables, for asset types
and portfolios; trivial for test case with only an
autoincrementing integer primary key and a "name".
The other two are the ones of interest. Assets is
treated as read only as far as the user is concerned.
The user's data in the assets table is mediated
through transactions inserted (and NEVER deleted or
updated), into the transactions table. Assets has the
minimal suite of columns (autoincrementing integer
primary key, asset_typeID, portfolio_id, all integers,
quantity with is a floating point number and two
dates: start_valid_time and end_valid_time).
Transactions has only a transaction_id, portfolio_id,
asset_type_id, quantity and transaction_date. There
are of course foreign keys connectin the assets and
transactions tables to the lookup tables, and a
composite index on assets to make looking up records
based on portfolio_id, asset_id and end_valid_time as
quick as possible. It couldn't be simpler,
conceptually! yet I must have missed something, cause
postgresql won't accept the function body I show
below.

If I can't get this working quickly, I may just resort
to creating a stored procedure that takes the
transaction details as arguments and processes both
tables appropriately without relying on a trigger.
:-(

Thanks for everyone's help.

Ted

===========================================
DECLARE
id BIGINT;
q DOUBLE PRECISION;
BEGIN
SELECT assets.id INTO id, assets.quantity INTO q
FROM assets
WHERE assets.asset_type_id = NEW.asset_type_id
AND assets.portfolio_id = NEW.portfolio_id
AND assets.end_valid_time IS NULL;
IF (id IS NULL) THEN
INSERT INTO assets (asset_type_id,
portfolio_id,quantity,start_valid_stime,end_valid_time)
VALUES (NEW.asset_type_id,NEW.portfolio_id,
NEW.quantity, NEW.transaction_date,NULL);
ELSE
UPDATE assets SET end_valid_time =
NEW.transaction_date WHERE id = id;
INSERT INTO assets (asset_type_id,
portfolio_id,quantity,start_valid_stime,end_valid_time)
VALUES (NEW.asset_type_id,NEW.portfolio_id, q +
NEW.quantity, NEW.transaction_date,NULL);
END
END

#11Vick Khera
vivek@khera.org
In reply to: Colin Wetherbee (#5)
Re: SQL design pattern for a delta trigger?

On Dec 7, 2007, at 11:42 AM, Colin Wetherbee wrote:

You can do this with a conditional. Something like the following
should work.

IF
NOT (a query matching your data returns rows)
THEN
INSERT (your new data)

There exists a race condition here unless you've locked your tables.

#12Colin Wetherbee
cww@denterprises.org
In reply to: Vick Khera (#11)
Re: SQL design pattern for a delta trigger?

Vivek Khera wrote:

On Dec 7, 2007, at 11:42 AM, Colin Wetherbee wrote:

You can do this with a conditional. Something like the following
should work.

IF
NOT (a query matching your data returns rows)
THEN
INSERT (your new data)

There exists a race condition here unless you've locked your tables.

Yes, clearly. In the context of the thread, I was assuming my algorithm
would be implemented as an atomic transaction.

For what it's worth, the real algorithm would be as follows. I hadn't
had enough coffee yet, and I forgot the UPDATE bit.

IF
(a query matching your old data returns rows)
THEN
UPDATE with your new data
ELSE
INSERT your new data

Colin

#13Vivek Khera
khera@kcilink.com
In reply to: Colin Wetherbee (#12)
Re: SQL design pattern for a delta trigger?

On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:

For what it's worth, the real algorithm would be as follows. I
hadn't had enough coffee yet, and I forgot the UPDATE bit.

IF
(a query matching your old data returns rows)
THEN
UPDATE with your new data
ELSE
INSERT your new data

Still exists race condition. Your race comes from testing existence,
then creating/modifying data afterwards. You need to make the test/
set atomic else you have race.

#14Ted Byers
r.ted.byers@rogers.com
In reply to: Vivek Khera (#13)
Re: SQL design pattern for a delta trigger?
--- Vivek Khera <khera@kcilink.com> wrote:

On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:

For what it's worth, the real algorithm would be

as follows. I

hadn't had enough coffee yet, and I forgot the

UPDATE bit.

IF
(a query matching your old data returns rows)
THEN
UPDATE with your new data
ELSE
INSERT your new data

Still exists race condition. Your race comes from
testing existence,
then creating/modifying data afterwards. You need
to make the test/
set atomic else you have race.

Yes, but how do you do that in a stored function or
procedure or in a trigger. It would be obvious to me
if I were writing this in C++ or Java, but how do you
do it using SQL in an RDBMS?

I saw something about table locks, but that doesn't
seem wise, WRT performance.

The classic example of a race condition, involving a
bank account, was used in the manual to introduce the
idea of a transaction, but we can't use a transaction
in a trigger, can we?

It is one thing to point out a race condition, but a
pointer to a solution that would work in the context
of the problem at hand would be useful and
appreciated.

Thanks all.

Ted

#15Erik Jones
erik@myemma.com
In reply to: Ted Byers (#14)
Re: SQL design pattern for a delta trigger?

On Dec 10, 2007, at 4:48 PM, Ted Byers wrote:

--- Vivek Khera <khera@kcilink.com> wrote:

On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:

For what it's worth, the real algorithm would be

as follows. I

hadn't had enough coffee yet, and I forgot the

UPDATE bit.

IF
(a query matching your old data returns rows)
THEN
UPDATE with your new data
ELSE
INSERT your new data

Still exists race condition. Your race comes from
testing existence,
then creating/modifying data afterwards. You need
to make the test/
set atomic else you have race.

Yes, but how do you do that in a stored function or
procedure or in a trigger. It would be obvious to me
if I were writing this in C++ or Java, but how do you
do it using SQL in an RDBMS?

I saw something about table locks, but that doesn't
seem wise, WRT performance.

The classic example of a race condition, involving a
bank account, was used in the manual to introduce the
idea of a transaction, but we can't use a transaction
in a trigger, can we?

It is one thing to point out a race condition, but a
pointer to a solution that would work in the context
of the problem at hand would be useful and
appreciated.

Thanks all.

In a stored procedure you'd just execute the UPDATE and then check
the FOUND variable to see if it found a row to update:

UPDATE table_name SET foo='bar' WHERE id=5;

IF NOT FOUND THEN
INSERT INTO table_name (id, foo) VALUES (5, 'bar');
END IF;

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

#16Ted Byers
r.ted.byers@rogers.com
In reply to: Erik Jones (#15)
Re: SQL design pattern for a delta trigger?

Thanks Erik

In a stored procedure you'd just execute the UPDATE
and then check
the FOUND variable to see if it found a row to
update:

UPDATE table_name SET foo='bar' WHERE id=5;

IF NOT FOUND THEN
INSERT INTO table_name (id, foo) VALUES (5, 'bar');
END IF;

To be clear, if I understand you correctly, with your
example, if there is no record where id=5, nothing
happens except FOUND is set to false? Can I, then,
declare a variable prior to your update statement, and
then modify your update statement so that the value in
a particular field on the row where id=5 can be
captured? Bearing in mind this is to be in a row
level trigger after an insert into table_name,
something like:

DECLARE q DOUBLE;
UPDATE table_name
SET foo='bar',
q = table_name.quantity
WHERE id=5;

And then follow that with something like:
IF FOUND THEN
INSERT INTO another_table (baz,quantity)
VALUES (foo,q+NEW.quantity);
ELSE
INSERT INTO another_table (baz,quantity)
VALUES (foo,NEW.quantity);
END IF

Thanks again,

Ted

#17Richard Broersma Jr
rabroersma@yahoo.com
In reply to: Ted Byers (#14)
Re: SQL design pattern for a delta trigger?
--- On Mon, 12/10/07, Ted Byers <r.ted.byers@rogers.com> wrote:

but how do you
do it using SQL in an RDBMS?

I believe that there is an ANSI SQL command "MERGE" that is yet to be implemented into PostgreSQL.

Regards,
Richard Broersma Jr.

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Richard Broersma Jr (#17)
Re: SQL design pattern for a delta trigger?

Richard Broersma Jr wrote:

--- On Mon, 12/10/07, Ted Byers <r.ted.byers@rogers.com> wrote:

but how do you
do it using SQL in an RDBMS?

I believe that there is an ANSI SQL command "MERGE" that is yet to be implemented into PostgreSQL.

IIRC the standard's definition of MERGE is still subject to the race
condition :-) It seems mostly defined for OLAP, and assumes rather
static data.

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"La tristeza es un muro entre dos jardines" (Khalil Gibran)

#19Colin Wetherbee
cww@denterprises.org
In reply to: Vivek Khera (#13)
Re: SQL design pattern for a delta trigger?

Vivek Khera wrote:

On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:

For what it's worth, the real algorithm would be as follows. I
hadn't had enough coffee yet, and I forgot the UPDATE bit.

IF (a query matching your old data returns rows) THEN UPDATE with
your new data ELSE INSERT your new data

Still exists race condition. Your race comes from testing existence,
then creating/modifying data afterwards. You need to make the
test/set atomic else you have race.

I guess when I wrote that the algorithm would have to be implemented in
an atomic manner, it fell on deaf ears.

That said, perhaps implementing a good MERGE would be not such a bad
idea for PostgreSQL 8.4.

Colin

#20Trevor Talbot
quension@gmail.com
In reply to: Colin Wetherbee (#19)
Re: SQL design pattern for a delta trigger?

On 12/10/07, Colin Wetherbee <cww@denterprises.org> wrote:

Vivek Khera wrote:

On Dec 10, 2007, at 5:04 PM, Colin Wetherbee wrote:

IF (a query matching your old data returns rows) THEN UPDATE with
your new data ELSE INSERT your new data

Still exists race condition. Your race comes from testing existence,
then creating/modifying data afterwards. You need to make the
test/set atomic else you have race.

I guess when I wrote that the algorithm would have to be implemented in
an atomic manner, it fell on deaf ears.

The problem is that there isn't a good atomic method for that order of
operations, short of locking the entire table first. A concurrent
transaction might insert a row after your test but before your own
INSERT. Even a SERIALIZABLE transaction won't help, as PostgreSQL
doesn't implement predicate locking.

That's why the example in the docs is a loop with result checking on
both operations, and requires a UNIQUE constraint to work correctly.

If high concurrency isn't a concern, table locking is the simpler approach.