LYDB: What advice about stored procedures and other server side code?

Started by Guyren Howeover 9 years ago10 messagesgeneral
Jump to latest
#1Guyren Howe
guyren@gmail.com

I am putting together some advice for developers about getting the most out of SQL servers in general and Postgres in particular. I have in mind the likes of most web developers, who through ignorance or a strange cultural preference that has emerged, tend to treat their database server as a dumb data bucket.

I call the project Love Your Database (LYDB). It is starting as a series of blog posts:

https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb <https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb&gt;
https://medium.com/@gisborne/love-your-database-simple-validations-68d5d6d0bbf3#.az4o2s152 <https://medium.com/@gisborne/love-your-database-simple-validations-68d5d6d0bbf3#.az4o2s152&gt;

I would next like to cover server-side code such as stored procedures and triggers.

I am inclined to advise folks to use PL/V8 on Postgres, because it is a reasonable language, everyone knows it, it has good string functions, decent performance and it tends to be installed everywhere (in particular, Amazon RDF offers it).

Broadly, what advice should I offer that isn’t obvious? Not just about PL/V8 but server side code in general.

TIA

#2Steve Atkins
steve@blighty.com
In reply to: Guyren Howe (#1)
Re: LYDB: What advice about stored procedures and other server side code?

On Dec 27, 2016, at 2:03 PM, Guyren Howe <guyren@gmail.com> wrote:

I am putting together some advice for developers about getting the most out of SQL servers in general and Postgres in particular. I have in mind the likes of most web developers, who through ignorance or a strange cultural preference that has emerged, tend to treat their database server as a dumb data bucket.

I call the project Love Your Database (LYDB). It is starting as a series of blog posts:

https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb
https://medium.com/@gisborne/love-your-database-simple-validations-68d5d6d0bbf3#.az4o2s152

I would next like to cover server-side code such as stored procedures and triggers.

I am inclined to advise folks to use PL/V8 on Postgres, because it is a reasonable language, everyone knows it, it has good string functions, decent performance and it tends to be installed everywhere (in particular, Amazon RDF offers it).

Think hard about the "impedance mismatch" between parts of the system.

pl/pgsql uses sql data types and operators, and so interfaces very cleanly with the rest of postgresql. pl/v8 uses javascript data types and *for database related things* is likely to be a less perfect match to the rest of the system - as it's translating (or, in some cases, failing to translate) between sql data types and javascript data types that may not be entirely compatible, or which may not exist at all.

So if your functions are mostly doing databasey things, pl/pgsql may well be a better choice. If they're mostly doing appy things, that just happen to be in the database, then pl/v8 may be a better choice (but so might just doing the work in the app, perhaps with some listen/notify assistance).

Most of the functions I write are short trigger functions, or data wrapper/modification functions for migration or making business logic available for SQL. For the majority of those I find pl/pgsql the best match (if I can't get away with sql functions).

If you're trying to convince people to get the most out of their database, pushing them towards pl/v8 as their first choice of embedded language might not be the best path. (That it might encourage them to write code to iterate through tables rather than taking advantage of SQL where they can might be a thing too).

Cheers,
Steve

Broadly, what advice should I offer that isn’t obvious? Not just about PL/V8 but server side code in general.

TIA

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Alban Hertroys
haramrae@gmail.com
In reply to: Guyren Howe (#1)
Re: LYDB: What advice about stored procedures and other server side code?

On 27 Dec 2016, at 23:03, Guyren Howe <guyren@gmail.com> wrote:

I am putting together some advice for developers about getting the most out of SQL servers in general and Postgres in particular. I have in mind the likes of most web developers, who through ignorance or a strange cultural preference that has emerged, tend to treat their database server as a dumb data bucket.

I call the project Love Your Database (LYDB). It is starting as a series of blog posts:

https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb
https://medium.com/@gisborne/love-your-database-simple-validations-68d5d6d0bbf3#.az4o2s152

I would next like to cover server-side code such as stored procedures and triggers.

I think there's an objection that most developers would come up with: Namely that stored procedures living in the central relational database can usually not be changed as easily as client-side code, without disrupting development (or worse, operation of the production system) for other users. I think that needs some thought - I don't have the answer to that one.

IMO, as long as those stored procedures are part of the model (ensuring data integrity, interpreting database values, etc.), there shouldn't be a problem, but it's a terrain I haven't trodden much. There are things you usually don't want to handle at the database side, such as firing off other applications to perform tasks or user interface related operations, things like that. I once was tasked with writing a library on a database server (Oracle 8i) to handle printing of data and sending out faxes etc. That was a nightmare and in hindsight it just doesn't belong in a database. That's a task for middleware. Where the dividing line is is probably an interesting topic.

Another thought related to seeing the database as a model server; since we're apparently in the model-view-controller paradigm, it seems to me that database views would probably be useful to the view-part of MVC. That's just something that struck me to make more than a bit of sense…
Of course, for the actual view in the MVC paradigm there should be some kind of user interface, but database views could be really useful in preparing the data required for those, to make it fit the shape of the view.

So far I like what you're writing, so keep it coming!

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Guyren Howe (#1)
Re: LYDB: What advice about stored procedures and other server side code?

## Guyren Howe (guyren@gmail.com):

I am inclined to advise folks to use PL/V8 on Postgres, because it is
a reasonable language, everyone knows it, it has good string functions,
decent performance and it tends to be installed everywhere (in particular,
Amazon RDF offers it).

I'd be careful with "everyone knows JavaScript" - that may hold in
web development, but there are many cases where Javascript is completely
off the map. And for V8 itself - it adds quite a chunk of code to your
PostgreSQL installation, that may put some people off. At least, I
haven't seen it installed "generally" in the wild, but my view might
be as biased as anyone else's.

Broadly, what advice should I offer that isn’t obvious? Not just about
PL/V8 but server side code in general.

Initially, running code in your database can make life easier for
the developers (ise pgTap for testing, pl/profiler and pl/debugger,
etc.). But once you have to change your schema, the hurt begins:
you'll need downtime for that, or you'll have to deal with the
possibility of changing the API of your "database side" code, and
matching code and tables on the database. There have been talks
about that topic (using versioned schemas and leveraging search_path),
but still: a simple change to a function suddenly requires a lot
of care.
The next pain point is scalability: running code on the database server
puts your code on the most expensive and hardest to scale CPUs. You
can (almost) always add another appserver to your setup (just spin
up a VM with a tomcat or whatever-you-use). But if the bottleneck
is your database CPUs, you'd have to move to a larger server (that
got easier with VMs, within limits); or use replication to offload
some code to standbys, keeping writing code on the primary (and
hope you'll have enough horsepower there). Multi-Master introduces
some special limitations and operational overhead on it's own, I'd
not generally recommend that for all applications and developers
just moving up from the "dump data bucket" model.

TL;DR: database side code can be a great thing in a small application,
but once the application and traffic grows, "code in the database"
requires specialist attention and may become a burden.
Unfortunately, most large applications started small...

Regards,
Christoph

--
Spare Space

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Pavel Stehule
pavel.stehule@gmail.com
In reply to: Christoph Moench-Tegeder (#4)
Re: LYDB: What advice about stored procedures and other server side code?

2016-12-28 16:12 GMT+01:00 Christoph Moench-Tegeder <cmt@burggraben.net>:

## Guyren Howe (guyren@gmail.com):

I am inclined to advise folks to use PL/V8 on Postgres, because it is
a reasonable language, everyone knows it, it has good string functions,
decent performance and it tends to be installed everywhere (in

particular,

Amazon RDF offers it).

I'd be careful with "everyone knows JavaScript" - that may hold in
web development, but there are many cases where Javascript is completely
off the map. And for V8 itself - it adds quite a chunk of code to your
PostgreSQL installation, that may put some people off. At least, I
haven't seen it installed "generally" in the wild, but my view might
be as biased as anyone else's.

Broadly, what advice should I offer that isn’t obvious? Not just about
PL/V8 but server side code in general.

Initially, running code in your database can make life easier for
the developers (ise pgTap for testing, pl/profiler and pl/debugger,
etc.). But once you have to change your schema, the hurt begins:
you'll need downtime for that, or you'll have to deal with the
possibility of changing the API of your "database side" code, and
matching code and tables on the database. There have been talks
about that topic (using versioned schemas and leveraging search_path),
but still: a simple change to a function suddenly requires a lot
of care.

you are forgot on reduction of network cost - when some task generates lot
of fast queries, then main bottleneck is a network. Stored procedures
removes this bottleneck.

PLpgSQL shares data formats and process with PostgreSQL database engine -
there are not data conversations, there are not network/protocols/API
overhead, there are not interprocess communication overhead.

The next pain point is scalability: running code on the database server
puts your code on the most expensive and hardest to scale CPUs. You
can (almost) always add another appserver to your setup (just spin
up a VM with a tomcat or whatever-you-use). But if the bottleneck
is your database CPUs, you'd have to move to a larger server (that
got easier with VMs, within limits); or use replication to offload
some code to standbys, keeping writing code on the primary (and
hope you'll have enough horsepower there). Multi-Master introduces
some special limitations and operational overhead on it's own, I'd
not generally recommend that for all applications and developers
just moving up from the "dump data bucket" model.

TL;DR: database side code can be a great thing in a small application,
but once the application and traffic grows, "code in the database"
requires specialist attention and may become a burden.
Unfortunately, most large applications started small...

When you use stored procedures, you have to choose well the border - what
should be done by server, what should be done by outside. Usually stored
procedures should be glue of SQL - and then the overhead of stored
procedures is near to zero. Surely, stupid ORM techniques has terrible
impact on server side.

regards

Pavel

Show quoted text

Regards,
Christoph

--
Spare Space

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Mike Sofen
msofen@runbox.com
In reply to: Christoph Moench-Tegeder (#4)
Re: LYDB: What advice about stored procedures and other server side code?

|From: Christoph Moench-Tegeder
|Initially, running code in your database can make life easier for the developers
|(ise pgTap for testing, pl/profiler and pl/debugger, etc.). But once you have to
|change your schema, the hurt begins:
|you'll need downtime for that, or you'll have to deal with the possibility of
|changing the API of your "database side" code, and matching code and tables on
|the database.

I look at this from the opposite direction: with a stable database API (via stored procs), I can change the schema and logic within the procs without causing any app code breakage…the app tier is completely insulated from those changes – that’s worth a lot. Yes, for deploying the schema change there must be an outage, but that’s true regardless of where the data is being manipulated – the data is still in a db…and there are ways to mitigate/eliminate the duration of the outage.

|The next pain point is scalability: running code on the database server puts your
|code on the most expensive and hardest to scale CPUs. You can (almost) always
|add another appserver to your setup (just spin up a VM with a tomcat or
|whatever-you-use). But if the bottleneck is your database CPUs, you'd have to
|move to a larger server

Our dedicated db servers have not yet shown any real CPU consumption during app use - memory, for us, is the only real limiting factor. The only time CPU consumption spikes is during admin activities - reindexing, vacuuming, bulk data loads...that sort of thing. Even the boxplot calculations barely cause a ripple. To me that speaks to the efficiency of language and engine working together. You are right of course on the scaling - if we do run out of CPU horsepower and cannot scale up any more, we'd have to scale out, and there are ways to do that too. IOWs, the model doesn't have to change, just the solution to solve the scaling (solve the problem, not the symptom).

|TL;DR: database side code can be a great thing in a small application, but once
|the application and traffic grows, "code in the database"
|requires specialist attention and may become a burden.
|Unfortunately, most large applications started small...

In my opinion, having a database specialist work on database stuff is a GOOD thing. Tables get designed properly, correct indexes are built, efficient query plans are created, etc. ORMs are a shortcut to getting an app talking to data, but aren't a substitute for a proper, scalable data tier. IMO...being a data specialist... :-)

Mike Sofen (Synthetic Genomics)

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Mike Sofen (#6)
Re: LYDB: What advice about stored procedures and other server side code?

## Mike Sofen (msofen@runbox.com):

I look at this from the opposite direction: with a stable database
API (via stored procs), I can change the schema and logic within the
procs without causing any app code breakage…the app tier is completely
insulated from those changes – that’s worth a lot.

Many applications are not designed to have a "stable" database API.
And suddenly... there's not only dependencies between functions
in the "core application" but also between application and database,
in more ways than "just a few tables". One needs to be aware of
that.

Our dedicated db servers have not yet shown any real CPU consumption
during app use - memory, for us, is the only real limiting factor.

That depends on the usage pattern. An application for "company internal"
use will behave quite differently from a popular internet application.
I do know of cases where too much code in the database became the
number one problem.
What I'm saying is: choose wisely, and be prepared for the day where
all your assumptions will be proven wrong. Predicting future use and
changes of an application in an emergent field is near impossible.
Database-side code can be a great thing, and a real problem. In the
worst case, it's both at the same time.

Regards,
Christoph

--
Spare Space

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Karsten Hilbert
Karsten.Hilbert@gmx.net
In reply to: Christoph Moench-Tegeder (#7)
Re: LYDB: What advice about stored procedures and other server side code?

Many applications are not designed to have a "stable" database API.

It seems OP is arguing they should.

Regards,
Karsten

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Christoph Moench-Tegeder
cmt@burggraben.net
In reply to: Karsten Hilbert (#8)
Re: LYDB: What advice about stored procedures and other server side code?

## Karsten Hilbert (Karsten.Hilbert@gmx.net):

Many applications are not designed to have a "stable" database API.

It seems OP is arguing they should.

Well, if the environment allows for that, fine. If not, well, duh.

Regards,
Christoph

--
Spare Space

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#10Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Christoph Moench-Tegeder (#9)
Re: LYDB: What advice about stored procedures and other server side code?

On 12/29/2016 02:12 AM, Christoph Moench-Tegeder wrote:

## Karsten Hilbert (Karsten.Hilbert@gmx.net):

Many applications are not designed to have a "stable" database API.

It seems OP is arguing they should.

Well, if the environment allows for that, fine. If not, well, duh.

Been following along and while there are several views there seems to be
one constant: "there should be a method to the madness". For a
particular situation a best try should be made to analyze what the
current and future needs are and plan a course of action that meets
those needs the best. This could be database-centric or app-centric. In
a perfect world this would be a non-biased decision, though I suspect it
tends to follow the inclinations/backgrounds of those involved in the
planning. In any case a defined course of action that is set out and
followed would seem to be better then arbitrarily bolting on
new/different methods on a whim.

Regards,
Christoph

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general