Implementing "thick"/"fat" databases
I've been following a few blogs
(http://database-programmer.blogspot.com/,
http://thehelsinkideclaration.blogspot.com/) that make a very compelling
argument, in my opinion, to move as much business/transactional logic as
possible into the database, so that client applications become little
more than moving data into and out of the database using a well-defined
API, most commonly (but not necessarily) through the use of stored
procedures. Although there seems to be a lot of discussion out there of
the reasons why one might want to do this, I'm really at a loss for
finding good, concrete examples of how to do it. Consequently, I'm
hoping that somebody can share their experience(s), or point me to some
examples, of doing this with PostgreSQL. I'd consider myself fairly
well-versed in using the various features of PostgreSQL to enforce data
integrity, but much less so for implementing transactional logic.
To focus on a more concrete example, let's consider adding a financial
transaction to the database. The "traditional" way to do this, with the
business logic in the application layer, leaves us with two steps:
insert the transaction "header", then insert the line items:
BEGIN;
INSERT INTO transaction (id, date, description)
VALUES (1, CURRENT_DATE, 'Transaction 1');
INSERT INTO line_item (transaction_id, account_id, amount)
VALUES (1, 1, 50), (1, 2, -50);
END;
Now if we start moving this logic to the database, we'd have something
like:
BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_item(1, 1, 50);
SELECT create_line_item(1, 2, -50);
END;
But we've actually taken a step back, since we're making a round-trip to
the database for each line item. That could be resolved by doing:
BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_item(transaction_id, account_id, amount)
FROM (VALUES (1, 1, 50), (1, 2, -50))
AS line_item (transaction_id, account_id, amount);
END;
Better, but still not good, since we're invoking the function for each
individual line item, which ultimately means separate INSERTs for each
one. What we'd want is something like:
BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_items(((1, 1, 50), (1, 2, -50)));
END;
But this still falls short, since we're still basically managing the
transaction in the application layer. The holy grail, so to speak, would
be:
SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50),
(2, -50)));
Perhaps I just need to spend more time digging through the
documentation, but I really have no idea how to do something like this,
or if it's even possible. I'm really hoping someone can provide an
example, point me to some resources, or even just share their real-world
experience of doing something like this. It would be very much
appreciated.
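[Editor's note: one way to get close to this "holy grail" call in PostgreSQL is an array of a composite type. This is only a sketch; the type name line_item_input and the function body are illustrative, not from the thread:]

```sql
-- Illustrative composite type for one line item.
CREATE TYPE line_item_input AS (account_id int, amount numeric);

-- Single call that creates the header and all line items atomically.
CREATE FUNCTION create_transaction
    (p_id int, p_date date, p_desc text, p_items line_item_input[])
RETURNS int LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO transaction (id, date, description)
        VALUES (p_id, p_date, p_desc);
    -- unnest() expands the composite array into rows for a single INSERT.
    INSERT INTO line_item (transaction_id, account_id, amount)
        SELECT p_id, i.account_id, i.amount
        FROM unnest(p_items) AS i;
    RETURN p_id;
END;
$$;

-- One round trip, one statement, implicitly one transaction:
SELECT create_transaction(1, current_date, 'Transaction 1',
    ARRAY[(1, 50)::line_item_input, (2, -50)::line_item_input]);
```

Since a single statement runs in its own transaction by default, this also removes the need for an explicit BEGIN/END in the application.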
Thanks.
Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
Karl Nack wrote:
I've been following a few blogs
(http://database-programmer.blogspot.com/,
http://thehelsinkideclaration.blogspot.com/) that make a very compelling
argument, in my opinion, to move as much business/transactional logic as
possible into the database, so that client applications become little
more than moving data into and out of the database using a well-defined
API, most commonly (but not necessarily) through the use of stored
procedures.
I strongly agree with that design philosophy. One principle is that the buck
stops with the database and that regardless of what the application does, any
business logic should be enforced by the database itself. Another principle is
to treat the database like a code library, where the tables are its internal
variables and its public API is stored procedures. Using stored procedures
means you can interact with the database from your application in the same way
your application interacts with itself, meaning with parameterized routine calls.
<snip>
To focus on a more concrete example, let's consider adding a financial
transaction to the database. The "traditional" way to do this, with the
business logic in the application layer, leaves us with two steps:
insert the transaction "header", then insert the line items:

BEGIN;
INSERT INTO transaction (id, date, description)
VALUES (1, CURRENT_DATE, 'Transaction 1');
INSERT INTO line_item (transaction_id, account_id, amount)
VALUES (1, 1, 50), (1, 2, -50);
END;
<snip>
Anything intended to be a single transaction can be a single stored procedure.
The code is something like this (out of my head; in PostgreSQL it would
be roughly the following PL/pgSQL, with "when" and "desc" renamed since
they are reserved words):

CREATE FUNCTION financial_trans
    (trans_id int, trans_date date, descr text,
     dest_acct int, src_acct int, amt numeric)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO transaction (id, date, description)
        VALUES (trans_id, trans_date, descr);
    INSERT INTO line_item (transaction_id, account_id, amount)
        VALUES (trans_id, dest_acct, amt), (trans_id, src_acct, -amt);
END;
$$;

SELECT financial_trans( 1, CURRENT_DATE, 'Transaction 1', 1, 2, 50 );
But this still falls short, since we're still basically managing the
transaction in the application layer. The holy grail, so to speak, would
be:

SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50),
(2, -50)));
Well, not quite, because specifying the number "50" twice would be ridiculous
for such a non-generic function; you can calculate the "-50" from it in the
function.
Perhaps I just need to spend more time digging through the
documentation, but I really have no idea how to do something like this,
or if it's even possible. I'm really hoping someone can provide an
example, point me to some resources, or even just share their real-world
experience of doing something like this. It would be very much
appreciated.
A general rule of thumb, however you would design a routine in a normal
programming language, try to do it that way in PL/PgSQL, assuming that PL/PgSQL
is a competent language, and then tweak to match what you actually can do.
-- Darren Duncan
On 07/22/11 4:11 PM, Darren Duncan wrote:
Karl Nack wrote:
I've been following a few blogs
(http://database-programmer.blogspot.com/,
http://thehelsinkideclaration.blogspot.com/) that make a very compelling
argument, in my opinion, to move as much business/transactional logic as
possible into the database, so that client applications become little
more than moving data into and out of the database using a well-defined
API, most commonly (but not necessarily) through the use of stored
procedures.

I strongly agree with that design philosophy. One principle is that
the buck stops with the database and that regardless of what the
application does, any business logic should be enforced by the
database itself. Another principle is to treat the database like a
code library, where the tables are its internal variables and its
public API is stored procedures. Using stored procedures means you
can interact with the database from your application in the same way
your application interacts with itself, meaning with parameterized
routine calls.
the alternative 'modern' architecture is to implement the business logic
in a webservices engine that sits in front of the database, and only use
stored procedures for things that get significant performance boost
where that is needed to meet your performance goals. Only this
business logic is allowed to directly query the operational database.
The business logic in this middle tier still relies on the database
server for data integrity and such. The presentation layer is
implemented either in a conventional client application or in a
webserver (not to be confused with the webservices).... so you have
user -> browser -> webserver/presentation layer -> webservices/business
logic -> database
The main rationale for this sort of design pattern is that large complex
business logic implemented in SQL stored procedures can be rather
difficult to develop and maintain
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
John R Pierce wrote:
On 07/22/11 4:11 PM, Darren Duncan wrote:
Karl Nack wrote:
I've been following a few blogs
(http://database-programmer.blogspot.com/,
http://thehelsinkideclaration.blogspot.com/) that make a very compelling
argument, in my opinion, to move as much business/transactional logic as
possible into the database, so that client applications become little
more than moving data into and out of the database using a well-defined
API, most commonly (but not necessarily) through the use of stored
procedures.

I strongly agree with that design philosophy. One principle is that
the buck stops with the database and that regardless of what the
application does, any business logic should be enforced by the
database itself. Another principle is to treat the database like a
code library, where the tables are its internal variables and its
public API is stored procedures. Using stored procedures means you
can interact with the database from your application in the same way
your application interacts with itself, meaning with parameterized
routine calls.

the alternative 'modern' architecture is to implement the business logic
in a webservices engine that sits in front of the database, and only use
stored procedures for things that get significant performance boost
where that is needed to meet your performance goals.. Only this
business logic is allowed to directly query the operational database.
The business logic in this middle tier still relies on the database
server for data integrity and such. The presentation layer is
implemented either in a conventional client application or in a
webserver (not to be confused with the webservices).... so you have
user -> browser -> webserver/presentation layer -> webservices/business
logic -> database

The main rationale for this sort of design pattern is that large complex
business logic implemented in SQL stored procedures can be rather
difficult to develop and maintain
I should clarify that the primary thing I support, with respect to putting it in
the database, is the business rules/constraints, because the buck stops there.
It should not be possible for any database user lacking in data-definition
privileges to circumvent any of the business rules. So one can not circumvent
by using a generic SQL shell, for example.
As for the rest, yes I agree with you that this doesn't have to actually be in
the database, though from a standpoint of good design principles, all of the
business logic should still be in one place, next to if not in the database, and
that all database access should go through the business logic layer.
All logic that is not specific to an application should go in a logic layer, so
it is shared by multiple applications whether web or command-line or whatever,
and so then the application is largely just a user interface.
In other words, thinking in the Model-View-Controller paradigm, the Model should
be fat and the Controller should be thin.
-- Darren Duncan
On 23/07/11 12:05, John R Pierce wrote:
On 07/22/11 4:11 PM, Darren Duncan wrote:
Karl Nack wrote:
I've been following a few blogs
(http://database-programmer.blogspot.com/,
http://thehelsinkideclaration.blogspot.com/) that make a very
compelling
argument, in my opinion, to move as much business/transactional
logic as
possible into the database, so that client applications become little
more than moving data into and out of the database using a well-defined
API, most commonly (but not necessarily) through the use of stored
procedures.

I strongly agree with that design philosophy. One principle is that
the buck stops with the database and that regardless of what the
application does, any business logic should be enforced by the
database itself. Another principle is to treat the database like a
code library, where the tables are its internal variables and its
public API is stored procedures. Using stored procedures means you
can interact with the database from your application in the same way
your application interacts with itself, meaning with parameterized
routine calls.

the alternative 'modern' architecture is to implement the business
logic in a webservices engine that sits in front of the database, and
only use stored procedures for things that get significant performance
boost where that is needed to meet your performance goals.. Only this
business logic is allowed to directly query the operational database.
The business logic in this middle tier still relies on the database
server for data integrity and such. The presentation layer is
implemented either in a conventional client application or in a
webserver (not to be confused with the webservices).... so you have
user -> browser -> webserver/presentation layer ->
webservices/business logic -> database

The main rationale for this sort of design pattern is that large
complex business logic implemented in SQL stored procedures can be
rather difficult to develop and maintain
I was thinking similar thoughts, but you not only beat me to it, you
made some good points I had not thought of!
The only thing I can think of adding: is that it would be good to lock
down the database so that only the middleware can access it, everything
else accesses the database via the middleware.
Cheers,
Gavin
On 22 Jul 2011, at 21:15, Karl Nack wrote:
But this still falls short, since we're still basically managing the
transaction in the application layer.
The problem you're facing here is that database statements work with records, while your example has a need to handle a set of (different types of) records in one go.
The holy grail, so to speak, would be:
SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50),
(2, -50)));
Now imagine someone not familiar with your design reading this query...
To start with, they're going to assume this query SELECTs data, while it actually inserts it. Secondly, it's impossible to see what the different data-values are supposed to represent without looking up the function - and quite possibly, its implementation. They're going to wonder what (1,50) and (2, -50) mean, what kind of date current_date gets assigned to, etc.
Having to write queries like these is even worse, even if you designed the function. You'll be looking at your own documentation a lot while writing these.
It would seem to me that the API you would provide for business logic like this should provide the users of said API with enough context to create valid statements. For example, you could use XML to describe the data (I'm no fan of XML, but it does suit a need here and allows validation of the provided data), especially as Postgres has XML parsing functionality.
Or you could use a more sophisticated procedural language (plpython or plphp, for example) that's capable of marshalling and unmarshalling data structures to strings and vice versa (eg. '{foo:1,bar:2}').
You would still have a SELECT statement that INSERTs data, which is semantically a bad thing to do IMHO.
Perhaps the better solution is (as others mentioned already) to move the data interpretation to a (web)service/application server and have that perform the actual database operations.
With that in mind, you would put business logic ("process an invoice") into an "application server", while you put data integrity logic ("don't allow transactions with no line_items") into the database.
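[Editor's note: a rough sketch of the XML variant Alban describes, using PostgreSQL's built-in xpath() function. The element and attribute names here are invented for illustration only:]

```sql
-- Hypothetical XML-based API: the whole transaction arrives as one
-- self-describing document, which the function parses with xpath().
CREATE FUNCTION create_transaction(txn xml)
RETURNS void LANGUAGE plpgsql AS $$
DECLARE
    t_id int := (xpath('/transaction/@id', txn))[1]::text::int;
BEGIN
    INSERT INTO transaction (id, date, description)
    VALUES (t_id,
            (xpath('/transaction/@date', txn))[1]::text::date,
            (xpath('/transaction/@description', txn))[1]::text);
    -- Each <line_item> child becomes one row.
    INSERT INTO line_item (transaction_id, account_id, amount)
    SELECT t_id,
           (xpath('/line_item/@account_id', li))[1]::text::int,
           (xpath('/line_item/@amount', li))[1]::text::numeric
    FROM unnest(xpath('/transaction/line_item', txn)) AS li;
END;
$$;

SELECT create_transaction(
  '<transaction id="1" date="2011-07-23" description="Transaction 1">
     <line_item account_id="1" amount="50"/>
     <line_item account_id="2" amount="-50"/>
   </transaction>');
```

As Alban notes, this buys self-description and validatability at the cost of an intermediate format.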
Alban Hertroys
--
The scale of a problem often equals the size of an ego.
On Fri, Jul 22, 2011 at 12:15 PM, Karl Nack <karlnack@futurityinc.com> wrote:
I've been following a few blogs
(http://database-programmer.blogspot.com/,
http://thehelsinkideclaration.blogspot.com/) that make a very compelling
argument, in my opinion, to move as much business/transactional logic as
possible into the database, so that client applications become little
more than moving data into and out of the database using a well-defined
API, most commonly (but not necessarily) through the use of stored
procedures.
There are costs and benefits associated with this. The major argument
against is that if you have business processes that may frequently or
quickly change at the requirement level, a heavier-weight process
might not work so well.
On the other hand if your database is being used by more than one
application, or if you want it to be used by one or more application,
then it ideally can provide a way to consistently enforce business
logic and security across multiple applications. Another benefit is
that depending on your architecture, you might be able to place a
single API to call such procedures, generate parameterized query
strings, and then pass those through, reducing the possibility of one
type of SQL injection. Note however, you have to worry about other
forms of SQL injection inside your stored procs so this isn't a magic
bullet even if it helps.
So I think it boils down to how much consistency and stability you
want and where you want it.
Although there seems to be a lot of discussion out there of
the reasons why one might want to do this, I'm really at a loss for
finding good, concrete examples of how to do it. Consequently, I'm
hoping that somebody can share their experience(s), or point me to some
examples, of doing this with PostgreSQL. I'd consider myself fairly
well-versed in using the various features of PostgreSQL to enforce data
integrity, but much less so for implementing transactional logic.
In LedgerSMB, we take this a step further by making the procedures
into discoverable interfaces, so the application logic itself is a
sort of thin glue between a UI layer and the database procedure layer.
One thing I would suggest is to try to keep API calls as atomic as
possible. You want to enforce consistency and so you need to have all
relevant inputs passed to the function. See below for a suggested
change to your API.
To focus on a more concrete example, let's consider adding a financial
transaction to the database. The "traditional" way to do this, with the
business logic in the application layer, leaves us with two steps:
insert the transaction "header", then insert the line items:

BEGIN;
INSERT INTO transaction (id, date, description)
VALUES (1, CURRENT_DATE, 'Transaction 1');
INSERT INTO line_item (transaction_id, account_id, amount)
VALUES (1, 1, 50), (1, 2, -50);
END;
Now if we start moving this logic to the database, we'd have something
like:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_item(1, 1, 50);
SELECT create_line_item(1, 2, -50);
END;
Now, if you are doing double-entry bookkeeping this doesn't provide
enough consistency, IMO. You can't check inside the function to
ensure that the transaction is balanced. It would be better to:
BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1',
    '{{1, 1, 50},{1, 2, -50}}');
COMMIT;
Now for the application, you can create an API that is semantically
clearer. But PostgreSQL doesn't provide an easy way of calling
procedures of this sort out of select/update/insert statements and
select is the only way to do this.
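[Editor's note: a minimal sketch of the single-call API Chris suggests, here taking a two-dimensional numeric array of {account_id, amount} pairs (a slight variation on his three-column format, since the transaction id is already a separate argument). All names are illustrative:]

```sql
-- One call creates the whole transaction and enforces balance.
CREATE FUNCTION create_transaction
    (p_id int, p_date date, p_desc text, p_items numeric[])
RETURNS int LANGUAGE plpgsql AS $$
DECLARE
    i int;
    t_sum numeric := 0;
BEGIN
    -- First pass: refuse unbalanced input before touching any table.
    FOR i IN array_lower(p_items, 1) .. array_upper(p_items, 1) LOOP
        t_sum := t_sum + p_items[i][2];
    END LOOP;
    IF t_sum <> 0 THEN
        RAISE EXCEPTION 'unbalanced transaction: sum is %', t_sum;
    END IF;
    INSERT INTO transaction (id, date, description)
        VALUES (p_id, p_date, p_desc);
    -- Second pass: insert the line items.
    FOR i IN array_lower(p_items, 1) .. array_upper(p_items, 1) LOOP
        INSERT INTO line_item (transaction_id, account_id, amount)
            VALUES (p_id, p_items[i][1]::int, p_items[i][2]);
    END LOOP;
    RETURN p_id;
END;
$$;

SELECT create_transaction(1, current_date, 'Transaction 1',
    '{{1, 50}, {2, -50}}');
```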
Best Wishes,
Chris Travers
Now, if you are doing double-entry bookkeeping this doesn't provide
enough consistency, IMO. You can't check inside the function to
ensure that the transaction is balanced. It would be better to:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1',
    '{{1, 1, 50},{1, 2, -50}}');
COMMIT;

Now for the application, you can create an API that is semantically
clearer. But PostgreSQL doesn't provide an easy way of calling
procedures of this sort out of select/update/insert statements and
select is the only way to do this.
One option to consider is restricting final tables but making staging tables available. You use normal inserts to build up the staging table and then validate and transfer the data to the final table using a function. In the example you can build a work-in-process transaction however you see fit but the final real transaction creation process would first confirm that the entry balances before copying the records to the transaction table and cleaning up the work-in-process table.
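[Editor's note: a sketch of the staging-table approach David describes. Clients get ordinary INSERT access to the staging table only; the post function validates and moves the rows. Table and function names are invented:]

```sql
-- Work-in-process rows; freely writable by clients.
CREATE TABLE line_item_staging
    (transaction_id int, account_id int, amount numeric);

-- Only this function can populate the real, restricted tables.
CREATE FUNCTION post_transaction(p_id int, p_date date, p_desc text)
RETURNS void LANGUAGE plpgsql AS $$
BEGIN
    -- Refuse to post an unbalanced entry.
    IF (SELECT sum(amount) FROM line_item_staging
        WHERE transaction_id = p_id) <> 0 THEN
        RAISE EXCEPTION 'transaction % does not balance', p_id;
    END IF;
    INSERT INTO transaction (id, date, description)
        VALUES (p_id, p_date, p_desc);
    INSERT INTO line_item (transaction_id, account_id, amount)
        SELECT transaction_id, account_id, amount
        FROM line_item_staging
        WHERE transaction_id = p_id;
    -- Clean up the work-in-process rows.
    DELETE FROM line_item_staging WHERE transaction_id = p_id;
END;
$$;
```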
David J.
Now for the application, you can create an API that is semantically
clearer. But PostgreSQL doesn't provide an easy way of calling
procedures of this sort out of select/update/insert statements and
select is the only way to do this.
A semantically accurate way to run "procedures" is:
DO $$ BEGIN PERFORM function(); END $$;
It would be nice if you could use PERFORM outside of plpgsql...
Now, this is not standard SQL but it does convey the desired semantics. That said, proper function naming can convey similar semantic information as well.
David J.
On Sat, Jul 23, 2011 at 1:32 PM, David Johnston <polobo@yahoo.com> wrote:
Now for the application, you can create an API that is semantically
clearer. But PostgreSQL doesn't provide an easy way of calling
procedures of this sort out of select/update/insert statements and
select is the only way to do this.

A semantically accurate way to run "procedures" is:
DO $$ BEGIN PERFORM function(); END $$;
It would be nice if you could use PERFORM outside of plpgsql...
Now, this is not standard SQL but it does convey the desired semantics. That said, proper function naming can convey similar semantic information as well.
The other option (one we try to follow in LedgerSMB) is to always make
the procedure return useful information. So you are essentially
calling a procedure and selecting the result for further use by your
application.
Best Wishes,
Chris Travers
SELECT create_transaction(1, current_date, 'Transaction 1', ((1,
50), (2, -50)));

Well, not quite, because specifying the number "50" twice would be
ridiculous for such a non-generic function; you can calculate the
"-50" from it in the function.
Not if there were more than two line-items per transaction. A paycheck
is a good example, where generally one or more income accounts are
credited and multiple tax accounts as well as one or more asset accounts
are debited. Ensuring that all the line-items add up to 0 would be one
of the data integrity rules implemented in the database (though most
likely checked in the application layer as well).
A general rule of thumb, however you would design a routine in a
normal programming language, try to do it that way in PL/PgSQL,
assuming that PL/PgSQL is a competent language, and then tweak to
match what you actually can do.
In the language I'm most familiar with, PHP, I could do this with an
associative array:
$transaction = array(
'id' => 1,
'date' => date('Y-m-d'),
'description' => 'Transaction 1',
'line_items' => array(
array('account_id' => 1, 'amount' => 50),
array('account_id' => 2, 'amount' => -50),
),
);
From which I can easily build the appropriate SQL statements. This would
be very similar in Python. I wonder if this would be achievable in
PL/PGSQL, maybe through the use of composite types and/or domains?
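[Editor's note: yes, roughly. Assuming a create_transaction function along the lines discussed upthread, the PHP associative array maps naturally onto ordinary parameters plus an array of a composite type. The type name line_item_t is invented for illustration:]

```sql
-- Composite type standing in for each 'line_items' element.
CREATE TYPE line_item_t AS (account_id int, amount numeric);

-- The whole PHP structure becomes one parameterized call:
SELECT create_transaction(
    1,                    -- 'id'
    current_date,         -- 'date'
    'Transaction 1',      -- 'description'
    ARRAY[ROW(1, 50)::line_item_t,   -- 'line_items'
          ROW(2, -50)::line_item_t]
);
```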
Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
For example, you could use XML to describe the
data (I'm no fan of XML, but it does suit a need here and allows
validation of the provided data), especially as Postgres has XML parsing
functionality.
Would you go so far as to suggest making the entire parameter an XML
statement, something like:
CREATE FUNCTION create_transaction (txn xml) ...
I don't know, it seems like with converting everything to and from XML,
we really haven't made things any better, just added the complexity of
adding an intermediate technology. Also, I'm not a fan of XML either,
and for the most part have managed to steer relatively clear of it so
far. Getting pulled into that black hole is something I'd rather avoid,
if possible....
Or you could use a more sophisticated procedural language (plpython or
plphp, for example) that's capable of marshalling and unmarshalling data
structures to strings and vice versa (eg. '{foo:1,bar:2}').
I haven't looked into the other procedural languages besides PGSQL,
although I am familiar with Python in general. Perhaps that may be the
way to go ...
You would still have a SELECT statement that INSERTs data, which is
semantically a bad thing to do IMHO.
True, although others have suggested always returning something useful
from the function. For example, it's more likely the transaction id
would be assigned using a sequence inside the database. The function
could then return that id after successfully creating the transaction.
Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
In LedgerSMB, we take this a step further by making the procedures
into discoverable interfaces, so the application logic itself is a
sort of thin glue between a UI layer and the database procedure layer.
One thing I would suggest is to try to keep API calls as atomic as
possible. You want to enforce consistency and so you need to have all
relevant inputs passed to the function. See below for a suggested
change to your API.
Would you say LedgerSMB follows the thick/fat database principle? If so,
I shall have to spend some time with your source code. I'd be very
curious to see how you handle this.
BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_item(1, 1, 50);
SELECT create_line_item(1, 1, -50);
END;

Now, if you are doing double-entry bookkeeping this doesn't provide
enough consistency, IMO. You can't check inside the function to
ensure that the transaction is balanced.
Yes, but I'd implement the constraint "all transactions must balance" as
a trigger that fires when the transaction is complete. This would
enforce data integrity regardless of whether or not the database API is
used, which I think is also important.
Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
Karl Nack wrote:
SELECT create_transaction(1, current_date, 'Transaction 1', ((1,
50), (2, -50)));

Well, not quite, because specifying the number "50" twice would be
ridiculous for such a non-generic function; you can calculate the
"-50" from it in the function.

Not if there were more than two line-items per transaction. A paycheck
is a good example, where generally one or more income accounts are
credited and multiple tax accounts as well as one or more asset accounts
are debited. Ensuring that all the line-items add up to 0 would be one
of the data integrity rules implemented in the database (though most
likely checked in the application layer as well).
It would help if you clarified your intent with another example.
I was assuming here that we were dealing with a double-entry accounting system
where every line item in one account had a corresponding line item in another
account of equal magnitude, and so all line items were in pairs, because you
showed what looked like 1 pair, hence specifying the "50" once makes sense.
So are you wanting the 1 function to take a set of line-item pairs, or are you
wanting to say do +50 in one account and -25 in each of 2 different accounts, as
a total of 3 line items? If the former, I would still just specify the "50"
once and have a set of {from, to, amount} triples as an argument, which would be
relation/rowset-typed. If the latter, then you would specify the "50" twice.
A general rule of thumb, however you would design a routine in a
normal programming language, try to do it that way in PL/PgSQL,
assuming that PL/PgSQL is a competent language, and then tweak to
match what you actually can do.

In the language I'm most familiar with, PHP, I could do this with an
associative array:

$transaction = array(
'id' => 1,
'date' => date('Y-m-d'),
'description' => 'Transaction 1',
'line_items' => array(
array('account_id' => 1, 'amount' => 50),
array('account_id' => 2, 'amount' => -50),
),
);

From which I can easily build the appropriate SQL statements. This would
be very similar in Python. I wonder if this would be achievable in
PL/PGSQL, maybe through the use of composite types and/or domains?
Yes, you could. PL/PgSQL supports relation/rowset-typed arguments (declared as
"TABLE OF <rowtype>" or something like that).
Or alternately you could use temporary staging tables as quasi-arguments rather
than using an actual argument.
-- Darren Duncan
Now, if you are doing double-entry bookkeeping this doesn't provide
enough consistency, IMO. You can't check inside the function to
ensure that the transaction is balanced.

One option to consider is restricting final tables but making staging
tables available.
I would implement this using triggers.
First, add a boolean "is_balanced" column to the transaction table,
along with the following trigger:
create or replace function check_txn_balance()
returns trigger
language plpgsql
as $$
declare
_amt numeric;
begin
if 'UPDATE' = TG_OP and new.is_balanced then
return null;
end if;
select sum(amt)
into _amt
from line_item
where txn_id = new.id;
if _amt <> 0 then
raise exception 'unbalanced transaction';
end if;
update txn
set is_balanced = true
where id = new.id;
return null;
end;
$$;
create constraint trigger check_txn_balance
after insert or update on txn
deferrable initially deferred
for each row execute procedure check_txn_balance();
Then, whenever we add, remove, or update a line item, unbalance the
parent transaction, which triggers the balance check:
create or replace function unbalance_txn()
returns trigger
language plpgsql
as $$
begin
if 'UPDATE' = TG_OP then
if (new.txn_id, new.amt) = (old.txn_id, old.amt) then
return null;
end if;
end if;
if TG_OP in ('INSERT', 'UPDATE') then
update txn
set is_balanced = false
where (id, is_balanced) = (new.txn_id, true);
end if;
if TG_OP in ('DELETE', 'UPDATE') then
update txn
set is_balanced = false
where (id, is_balanced) = (old.txn_id, true);
end if;
return null;
end;
$$;
create trigger unbalance_txn
after insert or delete or update on line_item
for each row execute procedure unbalance_txn();
At least, this seems to be a fairly efficient and foolproof way to do it
to me.
Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007
I was assuming here that we were dealing with a double-entry accounting system where every line item in one account had a corresponding line item in another account of equal magnitude, and so all line items were in pairs, because you showed what looked like 1 pair, hence specifying the "50" once makes sense.
So are you wanting the 1 function to take a set of line-item pairs, or are you wanting to say do +50 in one account and -25 in each of 2 different accounts, as a total of 3 line items? If the former, I would still just specify the "50" once and have a set of {from, to, amount} triples as an argument, which would be relation/rowset-typed. If the latter, then you would specify the "50" twice.
A little OT but all double-entry accounting requires is that the debits and credits of an entry - when totaled - equal zero (are equal); there is no requirement pertaining to each item having an equal but opposite counter-part. Nor are multiple accounts required; some ad-hoc entries simply "re-class" money within a single account so that different "controls" and/or amounts remain.
Ideally what you would want is a type called "accounting detail entry" which represents a credit/debit. For the function you can either make any "negative" amounts credits OR pass in two arrays - one for debits and one for credits. Either way you'd pass in an array of this "accounting detail entry" type along with information such as accounting date and journal number. To avoid using a type I would make a "create_entry" function that you'd use to create the detail items on a staging table, attached to a specific ID, and then create the final entry by calling the original function with user/date/journal/etc information and reference the detail records via the ID.
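[Editor's note: one possible reading of David's "accounting detail entry" suggestion, taking separate debit and credit arrays. All names here are invented, and the header/detail inserts are left as a placeholder since the target tables depend on the schema:]

```sql
-- Detail entry type: one debit or credit against an account.
CREATE TYPE acct_detail_entry AS (account_id int, amount numeric);

CREATE FUNCTION create_entry
    (p_date date, p_journal int,
     p_debits acct_detail_entry[], p_credits acct_detail_entry[])
RETURNS void LANGUAGE plpgsql AS $$
DECLARE
    t_debits  numeric := (SELECT sum(d.amount) FROM unnest(p_debits)  d);
    t_credits numeric := (SELECT sum(c.amount) FROM unnest(p_credits) c);
BEGIN
    -- Double-entry rule: total debits must equal total credits.
    IF t_debits IS DISTINCT FROM t_credits THEN
        RAISE EXCEPTION 'entry does not balance: % vs %',
            t_debits, t_credits;
    END IF;
    -- insert header and detail rows for the schema at hand
END;
$$;
```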
David J.
On Sat, Jul 23, 2011 at 3:51 PM, Karl Nack <karlnack@futurityinc.com> wrote:
In LedgerSMB, we take this a step further by making the procedures
into discoverable interfaces, so the application logic itself is a
sort of thin glue between a UI layer and the database procedure layer.
One thing I would suggest is to try to keep API calls as atomic as
possible. You want to enforce consistency and so you need to have all
relevant inputs passed to the function. See below for a suggested
change to your API.

Would you say LedgerSMB follows the thick/fat database principle? If so,
I shall have to spend some time with your source code. I'd be very
curious to see how you handle this.
The current svn trunk (to be 1.3) does. Older code in trunk or 1.2
follows the "survival" principle (we inherited one heck of a codebase
when we forked) :-).
BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_item(1, 1, 50);
SELECT create_line_item(1, 1, -50);
END;

Now, if you are doing double-entry bookkeeping this doesn't provide
enough consistency, IMO. You can't check inside the function to
ensure that the transaction is balanced.

Yes, but I'd implement the constraint "all transactions must balance" as
a trigger that fires when the transaction is complete. This would
enforce data integrity regardless of whether or not the database API is
used, which I think is also important.
That's problematic to do in PostgreSQL because statement-level
triggers don't have access to statement args, and I don't believe they
can be deferred. Note the GL stuff is not on the thick db system
yet, but we are working on it (for 1.4).
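For what it's worth, one way to approximate the balance check in PostgreSQL is a row-level CONSTRAINT TRIGGER declared DEFERRABLE INITIALLY DEFERRED, so the check runs at commit time rather than per statement. A minimal sketch, assuming the transaction/line_item schema from the original post (it does not cover DELETEs, which would need a similar trigger using OLD):

```sql
-- Re-check the affected transaction's balance; deferred to commit.
CREATE OR REPLACE FUNCTION check_transaction_balanced() RETURNS trigger AS
$$
BEGIN
    IF (SELECT coalesce(sum(amount), 0)
          FROM line_item
         WHERE transaction_id = NEW.transaction_id) <> 0 THEN
        RAISE EXCEPTION 'transaction % does not balance', NEW.transaction_id;
    END IF;
    RETURN NULL;  -- AFTER triggers ignore the return value
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER line_item_balanced
    AFTER INSERT OR UPDATE ON line_item
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW EXECUTE PROCEDURE check_transaction_balanced();
```

Because the trigger is deferred, each statement inside the transaction may leave the books temporarily unbalanced; only an unbalanced state at COMMIT raises the exception.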
Here's an example:
CREATE OR REPLACE FUNCTION payment_bulk_post
(in_transactions numeric[], in_batch_id int, in_source text, in_total numeric,
 in_ar_ap_accno text, in_cash_accno text,
 in_payment_date date, in_account_class int, in_payment_type int,
 in_exchangerate numeric, in_curr text)
RETURNS int AS
$$
DECLARE
    out_count int;
    t_voucher_id int;
    t_trans_id int;
    t_amount numeric;
    t_ar_ap_id int;
    t_cash_id int;
    t_currs text[];
    t_exchangerate numeric;
BEGIN
    IF in_batch_id IS NULL THEN
        -- t_voucher_id := NULL;
        RAISE EXCEPTION 'Bulk Post Must be from Batch!';
    ELSE
        INSERT INTO voucher (batch_id, batch_class, trans_id)
        VALUES (in_batch_id,
                (SELECT batch_class_id FROM batch WHERE id = in_batch_id),
                in_transactions[1][1]);
        t_voucher_id := currval('voucher_id_seq');
    END IF;

    SELECT string_to_array(value, ':') INTO t_currs
      FROM defaults
     WHERE setting_key = 'curr';

    -- The default currency is the first element; PostgreSQL arrays are
    -- 1-based, so this must be t_currs[1], not t_currs[0].
    IF (in_curr IS NULL OR in_curr = t_currs[1]) THEN
        t_exchangerate := 1;
    ELSE
        t_exchangerate := in_exchangerate;
    END IF;

    CREATE TEMPORARY TABLE bulk_payments_in (id int, amount numeric);

    SELECT id INTO t_ar_ap_id FROM chart WHERE accno = in_ar_ap_accno;
    SELECT id INTO t_cash_id FROM chart WHERE accno = in_cash_accno;

    -- Stage the (invoice id, amount) pairs from the array.
    FOR out_count IN
        array_lower(in_transactions, 1) ..
        array_upper(in_transactions, 1)
    LOOP
        EXECUTE $E$
            INSERT INTO bulk_payments_in (id, amount)
            VALUES ($E$ || quote_literal(in_transactions[out_count][1])
                || $E$, $E$
                || quote_literal(in_transactions[out_count][2])
                || $E$)$E$;
    END LOOP;

    -- Cash side of each payment.
    EXECUTE $E$
        INSERT INTO acc_trans
               (trans_id, chart_id, amount, approved, voucher_id, transdate,
                source, payment_type)
        SELECT id,
               CASE WHEN $E$ || quote_literal(in_account_class) || $E$ = 1
                    THEN $E$ || t_cash_id || $E$
                    WHEN $E$ || quote_literal(in_account_class) || $E$ = 2
                    THEN $E$ || t_ar_ap_id || $E$
                    ELSE -1 END,
               amount * $E$ || quote_literal(t_exchangerate) || $E$,
               CASE WHEN $E$ || t_voucher_id || $E$ IS NULL THEN true
                    ELSE false END,
               $E$ || t_voucher_id || $E$,
               $E$ || quote_literal(in_payment_date) || $E$,
               $E$ || COALESCE(quote_literal(in_source), 'NULL') || $E$,
               $E$ || COALESCE(quote_literal(in_payment_type), 'NULL') || $E$
          FROM bulk_payments_in WHERE amount <> 0 $E$;

    -- AR/AP side, with the opposite sign.
    EXECUTE $E$
        INSERT INTO acc_trans
               (trans_id, chart_id, amount, approved, voucher_id, transdate,
                source, payment_type)
        SELECT id,
               CASE WHEN $E$ || quote_literal(in_account_class) || $E$ = 1
                    THEN $E$ || t_ar_ap_id || $E$
                    WHEN $E$ || quote_literal(in_account_class) || $E$ = 2
                    THEN $E$ || t_cash_id || $E$
                    ELSE -1 END,
               amount * -1 * $E$ || quote_literal(t_exchangerate) || $E$,
               CASE WHEN $E$ || t_voucher_id || $E$ IS NULL THEN true
                    ELSE false END,
               $E$ || t_voucher_id || $E$,
               $E$ || quote_literal(in_payment_date) || $E$,
               $E$ || COALESCE(quote_literal(in_source), 'NULL') || $E$,
               $E$ || COALESCE(quote_literal(in_payment_type), 'NULL') || $E$
          FROM bulk_payments_in WHERE amount <> 0 $E$;

    IF in_account_class = 1 THEN
        EXECUTE $E$
            UPDATE ap
               SET paid = paid + (SELECT amount FROM bulk_payments_in b
                                   WHERE b.id = ap.id)
             WHERE id IN (SELECT id FROM bulk_payments_in) $E$;
    ELSE
        EXECUTE $E$
            UPDATE ar
               SET paid = paid + (SELECT amount FROM bulk_payments_in b
                                   WHERE b.id = ar.id)
             WHERE id IN (SELECT id FROM bulk_payments_in) $E$;
    END IF;

    EXECUTE $E$ DROP TABLE bulk_payments_in $E$;
    PERFORM unlock_all();
    RETURN out_count;
END;
$$ LANGUAGE plpgsql;
in_transactions is an n-by-2 array of numeric values. The first column is an integer representation of the invoice id to be paid; the second is the numeric amount to be paid on that invoice. The temporary table proved necessary because of cache misses when trying to loop through the array when hundreds of invoices were paid to one vendor. The code here is still far from ideal; in the future we will probably just query against the array using generate_series.
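The set-based alternative alluded to here might look something like this inside the function body (a sketch, not the actual LedgerSMB code): the loop and per-row EXECUTE are replaced by a single INSERT ... SELECT that indexes into the array.

```sql
-- Populate the staging table from the n-by-2 array in one statement.
INSERT INTO bulk_payments_in (id, amount)
SELECT in_transactions[i][1]::int,
       in_transactions[i][2]
  FROM generate_series(array_lower(in_transactions, 1),
                       array_upper(in_transactions, 1)) AS g(i);
```

Since the values never leave SQL, the quote_literal() string-building in the loop becomes unnecessary as well.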
Hope this helps.
Chris Travers
CREATE OR REPLACE FUNCTION payment_bulk_post
(in_transactions numeric[], in_batch_id int, in_source text, in_total numeric,
in_ar_ap_accno text, in_cash_accno text,
in_payment_date date, in_account_class int, in_payment_type int,
in_exchangerate numeric, in_curr text)
Oh and as a disclaimer, this was tacked onto a database schema which
a) I did not design and b) is patently insane. That part of the
database schema is being redesigned for 1.4.
Those wonderful disclaimers when having to post code against databases
designed by.... well better say nothing at all >:-D
Best Wishes,
Chris Travers
I was thinking similar thoughts, but you not only beat me to it, you made some good points I had not thought of!

The only thing I can think of adding is that it would be good to lock down the database so that only the middleware can access it; everything else accesses the database via the middleware.
In general, I am not convinced that middleware is inherently more
maintainable than in-db procedures.
But the fundamental question is: is this a one-application database? If it is, you can make the middleware that application, lock the db down so only the middleware can use it, etc.
But what if it isn't? What if we want to support a variety of applications against the same relational database? This has to be fairly commonplace.
My experience is that it is often helpful to maintain several levels of stable, public APIs: at the table level where possible (as attachment points for triggers), stored procedure APIs for actually inserting data into the relevant areas while enforcing appropriate business logic, and so forth.
One of the things we are doing in LedgerSMB is to make the stored procedures discoverable, so the argument names (and eventually the return types) have meaning the application can use in building calls to the procedure. This eases one important maintenance point, because arguments are automatically picked up by the application and, as long as best practices in coding are followed, are handled sanely. (The interface will be extended in the future so that return types determine the class, and the input arguments determine whether we are talking about a presumed object property or a presumed application-specified argument.) Theoretically, we should be able to build objects in various languages, picking up methods and properties from the Pg system catalogs, but we haven't gotten that far yet with code generation.
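As a rough illustration of what such discovery can build on (this is not the LedgerSMB query), the system catalogs already expose everything needed to reconstruct a procedure's calling signature:

```sql
-- List a stored procedure's argument names/types and return type,
-- as an application could when generating calls dynamically.
SELECT p.proname,
       pg_get_function_arguments(p.oid) AS arguments,
       pg_get_function_result(p.oid)    AS returns
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE n.nspname = 'public'
   AND p.proname = 'payment_bulk_post';
```

With a naming convention for arguments (e.g. an in_ prefix), the application can map each argument to an object property or user-supplied value without hand-written glue per procedure.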
Best Wishes,
Chris Travers
I gave a talk on using postgresql as an application server at PG East
in March.
Basically, we try to implement all business logic using functions, using
plpythonu when necessary.
For example, we have functions that send email, ftp files, sync remote
databases, etc.
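As an illustration of the kind of side-effecting function plpythonu makes possible (the function name, addresses, and SMTP host below are made up, not actual production code):

```sql
-- Minimal sketch: send an email from inside the database.
-- Requires the plpythonu language and a reachable SMTP server.
CREATE OR REPLACE FUNCTION notify_by_email(to_addr text, subject text, body text)
RETURNS void AS
$$
import smtplib
from email.mime.text import MIMEText

msg = MIMEText(body)
msg['Subject'] = subject
msg['From'] = 'db@example.com'   -- illustrative sender address
msg['To'] = to_addr

s = smtplib.SMTP('localhost')    -- assumed local SMTP relay
s.sendmail(msg['From'], [to_addr], msg.as_string())
s.quit()
$$ LANGUAGE plpythonu;
```

One caveat with this pattern: the side effect happens when the function runs, not at commit, so a rolled-back transaction will still have sent the mail unless you queue it in a table instead.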
It is important to keep your MVC intact and not "php" your function code
by mixing business logic with SQL statements.
I am currently playing with interactive queries, where the function stops in the middle and sends a message to the client asking for input. This isn't necessarily a good idea in all cases, but there are some functions where you don't have the ability to ask a question until the function is mostly finished processing (i.e. you don't have the information needed to ask the question until it finishes munging the data, which might be a long process).
Let me know if you would like a copy of my presentation.
Sim
On 07/22/2011 10:15 PM, Karl Nack wrote:
I've been following a few blogs
(http://database-programmer.blogspot.com/,
http://thehelsinkideclaration.blogspot.com/) that make a very compelling
argument, in my opinion, to move as much business/transactional logic as
possible into the database, so that client applications become little
more than moving data into and out of the database using a well-defined
API, most commonly (but not necessarily) through the use of stored
procedures. Although there seems to be a lot of discussion out there of
the reasons why one might want to do this, I'm really at a loss for
finding good, concrete examples of how to do it. Consequently, I'm
hoping that somebody can share their experience(s), or point me to some
examples, of doing this with PostgreSQL. I'd consider myself fairly
well-versed in using the various features of PostgreSQL to enforce data
integrity, but much less so for implementing transactional logic.

To focus on a more concrete example, let's consider adding a financial
transaction to the database. The "traditional" way to do this, with the
business logic in the application layer, leaves us with two steps:
insert the transaction "header", then insert the line items:

BEGIN;
INSERT INTO transaction (id, date, description)
VALUES (1, CURRENT_DATE, 'Transaction 1');
INSERT INTO line_item (transaction_id, account_id, amount)
VALUES (1, 1, 50), (1, 2, -50);
END;

Now if we start moving this logic to the database, we'd have something
like:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_item(1, 1, 50);
SELECT create_line_item(1, 1, -50);
END;

But we've actually taken a step back, since we're making a round-trip to
the database for each line item. That could be resolved by doing:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_item(transaction_id, account_id, amount)
FROM (VALUES (1, 1, 50), (1, 2, -50))
AS line_item (transaction_id, account_id, amount);
END;

Better, but still not good, since we're invoking the function for each
individual line item, which ultimately means separate INSERTs for each
one. What we'd want is something like:

BEGIN;
SELECT create_transaction(1, current_date, 'Transaction 1');
SELECT create_line_items(((1, 1, 50), (1, 2, -50)));
END;

But this still falls short, since we're still basically managing the
transaction in the application layer. The holy grail, so to speak, would
be:

SELECT create_transaction(1, current_date, 'Transaction 1', ((1, 50),
(2, -50)));

Perhaps I just need to spend more time digging through the
documentation, but I really have no idea how to do something like this,
or if it's even possible. I'm really hoping someone can provide an
example, point me to some resources, or even just share their real-world
experience of doing something like this. It would be very much
appreciated.

Thanks.
Karl Nack
Futurity, Inc
5121 N Ravenswood Ave
Chicago, IL 60640
773-506-2007