Stored procedure code no longer stored in v14 and v15, changed behaviour
Hi all,
In PostgreSQL version 13, the source code for a stored procedure or function
in SQL/plpgsql/etc. was stored in pg_proc.prosrc. This column would hold the
original procedure or function body, verbatim.
Since version 14, the source code for a stored procedure or function written
in plain (compound) SQL, a new feature, is no longer stored in
pg_proc.prosrc. Instead, there is an additional column, prosqlbody, which
returns a pre-parsed form of the SQL that is of no use to the user.
I know you can use pg_get_functiondef to get a CREATE PROCEDURE/FUNCTION
statement, but for compound-SQL-based routines in v14 and v15, this removes
all formatting and comments in the original CREATE statement. For database
development, including browsing an existing database and knowing what each
routine does, this is quite terrible.
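To illustrate with a trivial, made-up function (all names here are just examples):

```sql
-- New-style (v14+) SQL-standard body: parsed at creation time.
CREATE FUNCTION add_one(i int) RETURNS int
LANGUAGE SQL
BEGIN ATOMIC
  -- comments like this one are gone once the body is parsed
  SELECT i + 1;
END;

-- prosqlbody now holds the pre-parsed body; the verbatim text is gone.
SELECT proname, prosqlbody IS NOT NULL AS has_parsed_body
FROM pg_proc
WHERE proname = 'add_one';

-- Reconstructs a CREATE statement, but without the original
-- formatting and comments.
SELECT pg_get_functiondef('add_one(int)'::regprocedure);
```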
What is the reason for this?
Can you modify the server code to store the original body in proc.prosrc
again? It would be very helpful.
With regards,
Martijn Tonies
Upscene Productions
https://www.upscene.com
Database Workbench - developer tool for Oracle, SQL Server, PostgreSQL,
MySQL, InterBase, NexusDB and Firebird.
On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) <
m.tonies@upscene.com> wrote:
Since version 14, the source code for a stored procedure or function written
in plain (compound) SQL, a new feature, is no longer stored in
pg_proc.prosrc. Instead, there is an additional column, prosqlbody, which
returns a pre-parsed form of the SQL that is of no use to the user.
For database
development, including browsing an existing database and knowing what each
routine does, this is quite terrible.
Frankly, this is not all that compelling. The version-controlled source
code should be readily referenced to find out extended details of this
nature. The function name, and a COMMENT ON comment, arguably provide
sufficient in-database knowledge for cursory browsing, redirecting the
reader to the source code for implementation details and history.
Can you modify the server code to store the original body in proc.prosrc
again? It would be very helpful.
I seem to recall that this option had been discussed and rejected when this
feature went in. The parsed body is a feature because its contents can be
updated due to, e.g., renaming of objects. The text source of the original
command would be unable to be updated in the same way and so it is possible
the text and the parsed contents would diverge over time, which is a
situation we do not want to have.
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) <
m.tonies@upscene.com> wrote:
Can you modify the server code to store the original body in proc.prosrc
again? It would be very helpful.
I seem to recall that this option had been discussed and rejected when this
feature went in. The parsed body is a feature because its contents can be
updated due to, e.g., renaming of objects. The text source of the original
command would be unable to be updated in the same way and so it is possible
the text and the parsed contents would diverge over time, which is a
situation we do not want to have.
Indeed. We used to have a similar situation with respect to column
default expressions and CHECK constraint expressions. Eventually we got
rid of the textual storage of both, because it couldn't be maintained
in a reasonable way.
I think the answer here is "don't use the new syntax if you want the
function body stored textually". You can have one set of benefits,
or the other set, but not both at once.
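Concretely, the trade-off looks like this (an illustrative sketch; function names are made up):

```sql
-- String-literal body: stored verbatim in pg_proc.prosrc, comments and
-- formatting included, but the body is only a string: renaming
-- referenced objects will not update it.
CREATE FUNCTION add_one_text(i int) RETURNS int
LANGUAGE SQL
AS $$
  -- this comment survives in prosrc
  SELECT i + 1;
$$;

-- SQL-standard body (v14+): parsed and stored in prosqlbody, so it
-- tracks renames of referenced objects, but comments and formatting
-- are not preserved.
CREATE FUNCTION add_one_parsed(i int) RETURNS int
LANGUAGE SQL
BEGIN ATOMIC
  SELECT i + 1;  -- this comment is lost
END;
```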
regards, tom lane
On Thu, Dec 1, 2022 at 4:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) <
m.tonies@upscene.com> wrote:
Can you modify the server code to store the original body in proc.prosrc
again? It would be very helpful.
I seem to recall that this option had been discussed and rejected when this
feature went in. The parsed body is a feature because its contents can be
updated due to, e.g., renaming of objects. The text source of the original
command would be unable to be updated in the same way and so it is possible
the text and the parsed contents would diverge over time, which is a
situation we do not want to have.
Indeed. We used to have a similar situation with respect to column
default expressions and CHECK constraint expressions. Eventually we got
rid of the textual storage of both, because it couldn't be maintained
in a reasonable way.
I think the answer here is "don't use the new syntax if you want the
function body stored textually". You can have one set of benefits,
or the other set, but not both at once.
FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
and the fact the original SQL is not conserved as-is has also created
issues for us.
On Oracle, our SQL was preserved as-is, so it could be compared reliably.
On PostgreSQL, some names in the SQL are rewritten, the text is
reformatted, and so on, so this
byte-perfect comparison is no longer possible, and we must rely on heuristics
(a.k.a. hacks...) to do a fuzzy compare (since we lack a real PG-compatible SQL
parser to do an infoset comparison instead, at the AST level for example).
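To give an idea of what that fuzzy compare amounts to, here is a minimal, purely illustrative sketch (the helper names are made up; a real comparison would need a PG-compatible parser for AST-level diffing):

```python
import re

def normalize_sql(sql: str) -> str:
    """Crude normalization for fuzzy-comparing DDL text: strip comments,
    lowercase, collapse whitespace. Note this is lossy: lowercasing also
    mangles quoted identifiers and string literals."""
    sql = re.sub(r'--[^\n]*', ' ', sql)                # line comments
    sql = re.sub(r'/\*.*?\*/', ' ', sql, flags=re.S)   # block comments
    return re.sub(r'\s+', ' ', sql.lower()).strip()

def fuzzy_equal(a: str, b: str) -> bool:
    """Compare two SQL texts modulo comments, case and whitespace."""
    return normalize_sql(a) == normalize_sql(b)
```

With this, `fuzzy_equal("SELECT i + 1;  -- inc", "select i + 1;")` holds even though the byte-for-byte texts differ, which is roughly the best one can do without a real parser.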
So it's not just a matter of browsing the schema. For us, it's a
*functional* issue. --DD
On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) <m.tonies@upscene.com> wrote:
Since version 14, the source code for a stored procedure or function written
in plain (compound) SQL, a new feature, is no longer stored in
pg_proc.prosrc. Instead, there is an additional column, prosqlbody, which
returns a pre-parsed form of the SQL that is of no use to the user.
For database
development, including browsing an existing database and knowing what each
routine does, this is quite terrible.
Frankly, this is not all that compelling. The version-controlled source code should be readily referenced to find out extended details of this nature. The function name, and a COMMENT ON comment, arguably provide sufficient in-database knowledge for cursory browsing, redirecting the reader to the source code for implementation details and history.
Frankly, David, you're wrong. Although I agree that version-controlled source code is a very good way to maintain a proper version of your development database, -while- developing and changing source code, running scripts is simply not the easiest way to work, compared to having a tool that allows more interactive development, code commenting (instead of "COMMENT ON"), SQL insight, browsing the current structure, and so on.
Can you modify the server code to store the original body in proc.prosrc
again? It would be very helpful.
I seem to recall that this option had been discussed and rejected when this feature went in. The parsed body is a feature because its contents can be updated due to, e.g., renaming of objects. The text source of the original command would be unable to be updated in the same way and so it is possible the text and the parsed contents would diverge over time, which is a situation we do not want to have.
Ah yes, automatic renaming of database objects along with their dependencies can be useful. Oracle invalidates routines that use the renamed objects, marks them 'invalid', and requires them to be recompiled; recompilation fails when the source code references an object that no longer exists. This also means that you actually know which files you need to touch in your version-controlled source.
With regards,
Martijn Tonies
Upscene Productions
https://www.upscene.com
Database Workbench - developer tool for Oracle, SQL Server, PostgreSQL, MySQL, InterBase, NexusDB and Firebird.
On Thu, 2022-12-01 at 16:38 +0100, Dominique Devienne wrote:
FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
and the fact the original SQL is not conserved as-is has also created
issues for us.
On Oracle, our SQL was preserved as-is, so could be compared reliably. While on
PostgreSQL, some names-in-SQL are rewritten, the text reformatted,
etc...
So it's not just a matter of browsing the schema. For us, it's a
*functional* issue. --DD
This is arguable, but my opinion is that this is not a robust way to
do development. You should use a schema versioning tool like Liquibase,
develop schema migration scripts and maintain the SQL code in a source
repository like other software.
At any rate, you won't be able to do it in your accustomed way in
PostgreSQL.
Yours,
Laurenz Albe
On Thu, Dec 1, 2022 at 5:10 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2022-12-01 at 16:38 +0100, Dominique Devienne wrote:
FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
This is arguable, but my opinion is that this is not a robust way to
do development. You should use a schema versioning tool like Liquibase,
develop schema migration scripts and maintain the SQL code in a source
repository like other software.
We don't maintain SQL. We maintain a *Logical* model, and generate the
physical model from it.
FKs, NKs, enums, CHECK constraints, indexes, etc.: all those
"implementation details"
are programmatically generated, and always consistent, from a much
higher-level and simpler model.
And you also get auto-upgrade most of the time, greatly increasing
development velocity too.
I would argue that NOT doing it this way is the non-robust way myself :)
We've been refactoring a large data model maintained manually, as you advocate,
and I can't tell you how many anomalies we've discovered and had to fix
using the more robust formalism of a high-level logical model
and (DDL) code gen.
I guess it's a DBA-versus-Developer point-of-view difference. --DD
On 12/1/22 09:24, Dominique Devienne wrote:
On Thu, Dec 1, 2022 at 5:10 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Thu, 2022-12-01 at 16:38 +0100, Dominique Devienne wrote:
FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
This is arguable, but my opinion is that this is not a robust way to
do development. You should use a schema versioning tool like Liquibase,
develop schema migration scripts and maintain the SQL code in a source
repository like other software.
We don't maintain SQL. We maintain a *Logical* model, and generate the
physical model from it.
FKs, NKs, enums, CHECK constraints, indexes, etc.: all those
"implementation details"
are programmatically generated, and always consistent, from a much
higher-level and simpler model.
And you also get auto-upgrade most of the time, greatly increasing
development velocity too.
I would argue that NOT doing it this way is the non-robust way myself :)
We've been refactoring a large data model maintained manually, as you advocate,
and I can't tell you how many anomalies we've discovered and had to fix
using the more robust formalism of a high-level logical model
and (DDL) code gen.
I guess it's a DBA-versus-Developer point-of-view difference. --DD
What this points to is that there are multiple ways to handle this, many
external to the server itself. My take is that the system catalogs are
there for the proper operation of the server, and that is their task,
first and foremost. If you can piggyback off that, then great, but with
the knowledge that the information may change to meet the needs of the
server, not external users.
--
Adrian Klaver
adrian.klaver@aklaver.com
On Thu, Dec 1, 2022 at 6:41 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 12/1/22 09:24, Dominique Devienne wrote:
I guess it's a DBA-versus-Developer point-of-view difference. --DD
What this points to is that there are multiple ways to handle this, many
external to the server itself. My take is that the system catalogs are
there for the proper operation of the server, and that is their task,
first and foremost. If you can piggyback off that, then great, but with
the knowledge that the information may change to meet the needs of the
server, not external users.
I see. Still, Oracle preserves SQL as-is. SQLite preserves SQL as-is.
Would be nice if PostgreSQL did too. That's all I'm saying.
Having functionality in libpq that allowed doing the same kind of SQL
normalization / rewriting done server-side would help. Then I could use
that to generate the DDL "just right" the first time.
For now, our current work-around is a two-step process.
We first generate the DDL at build time using "our syntax" and
instantiate the schema.
Then we introspect that and re-generate code with the "rewritten syntax".
Subsequent generation (it's dynamic, at runtime) uses the re-generated code
that matches the syntax rewrite. Thus the introspection and
diff'ing match the in-memory DDL.
Still, that's a PITA. I of course don't contest that PostgreSQL
maintains what it needs.
But ALSO maintaining the original, at least until a rewrite is
necessary on renames,
would go A LONG WAY to satisfy the OP and myself in our use cases. FWIW. --DD
On Dec 1, 2022, at 11:05, Dominique Devienne <ddevienne@gmail.com> wrote:
I see. Still, Oracle preserves SQL as-is. SQLite preserves SQL as-is.
Would be nice if PostgreSQL did too. That's all I'm saying.
Since this is a custom-built system, there is nothing keeping you from creating your own table in the database that stores the original text of the function.
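For instance, a deployment script could record the text it just executed (a minimal sketch; all names here are made up):

```sql
-- Hypothetical side table keyed by the routine's signature.
CREATE TABLE app_proc_source (
    proc    regprocedure PRIMARY KEY,
    source  text         NOT NULL,
    loaded  timestamptz  NOT NULL DEFAULT now()
);

-- The loader stores the verbatim text alongside each CREATE it runs.
INSERT INTO app_proc_source (proc, source)
VALUES ('add_one(int)'::regprocedure, $src$
CREATE OR REPLACE FUNCTION add_one(i int) RETURNS int
LANGUAGE SQL
BEGIN ATOMIC
  SELECT i + 1;
END;
$src$)
ON CONFLICT (proc) DO UPDATE
SET source = EXCLUDED.source, loaded = now();
```

(With the caveat, raised in the replies, that any change made outside the loader bypasses such a table.)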
On Thu, Dec 1, 2022 at 8:09 PM Christophe Pettus <xof@thebuild.com> wrote:
On Dec 1, 2022, at 11:05, Dominique Devienne <ddevienne@gmail.com> wrote:
I see. Still, Oracle preserves SQL as-is. SQLite preserves SQL as-is.
Would be nice if PostgreSQL did too. That's all I'm saying.
Since this is a custom-built system, there is nothing keeping you from creating your own table in the database that stores the original text of the function.
That's not the point. If a DBA updates one of our triggers or procs or
whatever else,
the recorded info in a custom table won't be affected. We are diff'ing
the server-side
schema against the expected in-memory model of the physical model.
Thus the dictionaries are the only source of truth we can trust for
the current state of the schema.
And besides minor syntactic differences, and some more troublesome
object-name rewrites, this is
exactly what we want. The system itself needs to preserve the original
DDL, IMHO. --DD
Dominique Devienne <ddevienne@gmail.com> writes:
On Thu, Dec 1, 2022 at 8:09 PM Christophe Pettus <xof@thebuild.com> wrote:
Since this is a custom-built system, there is nothing keeping you from creating your own table in the database that stores the original text of the function.
That's not the point. If a DBA updates one of our triggers or procs or
whatever else,
the recorded info in a custom table won't be affected. We are diff'ing
the server-side
schema against the expected in-memory model of the physical model.
Thus the dictionaries are the only source of truth we can trust for
the current state of the schema.
And besides minor syntactic differences, and some more troublesome
object-name rewrites, this is
exactly what we want. The system itself needs to preserve the original
DDL, IMHO. --DD
Do you really fail to see the contradictions in this? You want the
database to preserve the original DDL, but you also want it to update
in response to subsequent alterations. You can't have both those
things.
regards, tom lane
On 2 Dec 2022, at 6:51, Tom Lane wrote:
Dominique Devienne <ddevienne@gmail.com> writes:
On Thu, Dec 1, 2022 at 8:09 PM Christophe Pettus <xof@thebuild.com> wrote:
Since this is a custom-built system, there is nothing keeping you from creating your own table in the database that stores the original text of the function.
That's not the point. If a DBA updates one of our triggers or procs or
whatever else,
the recorded info in a custom table won't be affected. We are diff'ing
the server-side
schema against the expected in-memory model of the physical model.
Thus the dictionaries are the only source of truth we can trust for
the current state of the schema.
And besides minor syntactic differences, and some more troublesome
object-name rewrites, this is
exactly what we want. The system itself needs to preserve the original
DDL, IMHO. --DD
Do you really fail to see the contradictions in this? You want the
database to preserve the original DDL, but you also want it to update
in response to subsequent alterations. You can't have both those
things.
At the risk of stating the (over) obvious…
NEW: the Pg system has a parsed/tokenised version of functions and the like, with the excellent feature that these are kept synchronised with any name change of referenced objects.
OLD: an easily available text version of functions, etc., with comments and layout (but not easily updatable in the event referenced objects get renamed) — sysadmin usage.
The contradiction is obvious, but both needs can be justified…
NEXT(?): a human-readable version of function (etc.) definitions could be generated from the parsed version, with the addition of tokens within the parsed version that allow the programmer's comments to be reinstated. Leave the layout-style conventions to the system, and promise to keep this style consistent and maintained as part of the feature. Putting this generated (and updated as needed) text into pg_proc.prosrc would seem to be the least change needed to allow existing usage to move with the new Pg versions (and maybe help pgAdmin as well).
Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920
On Thu, Dec 01, 2022 at 04:38:57PM +0100, Dominique Devienne <ddevienne@gmail.com> wrote:
On Thu, Dec 1, 2022 at 4:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
[...]
FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
and the fact the original SQL is not conserved as-is has also created
issues for us. On Oracle, our SQL was preserved as-is, so could be
compared reliably. While on PostgreSQL, some names-in-SQL are rewritten,
the text reformatted, etc.
So it's not just a matter of browsing the schema. For us, it's a
*functional* issue. --DD
Same here. Accessing the loaded stored procedure source
is how I audit the state of stored procedures in the
database against the code in the code repository.
Without the ability to make that comparison, there is
no way to audit the stored procedures, and the only way
to make sure that the database is up to date with the
stored procedures would be to reload every single
stored procedure. I might have to alter the procedure
loading program to store its own copy of the source code
in the database somewhere, and just hope that nobody
loads stored procedures using any other tool. Even so,
it will slow down loading old database backups and
bringing their schema and code up to date. But that's
probably OK compared with the time to load the backup
itself.
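The comparison described here can be sketched roughly like this (illustrative only; in practice both sides would have to be built from pg_get_functiondef() output, since the verbatim repository text cannot be recovered for new-style bodies):

```python
import hashlib

def fingerprint(sql: str) -> str:
    """SHA-256 of a whitespace-normalized definition, so trivial
    reformatting does not trigger a false mismatch."""
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def audit(repo_defs, db_defs):
    """Return routine names whose deployed definition differs from the
    repository's. Both sides must be in the *same* form (e.g. both taken
    from pg_get_functiondef()), given the server's rewriting."""
    return sorted(
        name for name, sql in repo_defs.items()
        if fingerprint(sql) != fingerprint(db_defs.get(name, ""))
    )
```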
cheers,
raf
On 12/1/22 09:24, Dominique Devienne wrote:
I guess is a DBA-versus-Developer point-of-view difference. --DD
What this points to is that there are multiple ways to handle this, many
external to the server itself. My take is that the system catalogs are
there for the proper operation of the server and that is their task,
first and foremost. If you can piggyback of that then great, but with
the knowledge that the information may change to meet the needs of the
server, not external users.
I see. Still, Oracle preserves SQL as-is. SQLite preserves SQL as-is.
Would be nice if PostgreSQL did too. That's all I'm saying.
So do InterBase, Firebird, SQL Server, MySQL (except for views, strangely
enough),
MariaDB, NexusDB, SQL Anywhere, and, frankly, all the others I know of.
And this is used all the time by database developers.
And at least InterBase and Firebird -also- store a 'parsed version' (in
binary).
With regards,
Martijn Tonies
Upscene Productions
https://www.upscene.com
Database Workbench - developer tool for Oracle, SQL Server, PostgreSQL,
MySQL, InterBase, NexusDB and Firebird.
On Fri, 2022-12-02 at 08:49 +0100, Martijn Tonies (Upscene Productions) wrote:
So do InterBase, Firebird, SQL Server, MySQL (except for views, strangely
enough),
MariaDB, NexusDB, SQL Anywhere, and, frankly, all the others I know of.
And this is used all the time by database developers.
And at least InterBase and Firebird -also- store a 'parsed version' (in
binary).
Great; then go ahead and use those databases, if it is important for you.
In the same vein, I don't think any of those databases have trigram or
bloom indexes. Perhaps we should improve the acceptance of PostgreSQL
by removing those features?
Yours,
Laurenz Albe
On Fri, 2 Dec 2022 at 02:24, raf <raf@raf.org> wrote:
Same here. Accessing the loaded stored procedure source
is how I audit the state of stored procedures in the
database against the code in the code repository.
Exactly. If our software is audited, how can I reliably prove to an auditor
that the running version of the procedure has not been tampered with,
either by a customer, a rogue developer, or some malicious party?
The suggested solution "store the text in a user table" does not work, because
it requires following a process, and the changes that bypass the process are
exactly the ones we are interested in.
I'm not so much concerned about whether the stored procedure text still
compiles (because of object-name changes etc.). I just really would like to
have the exact source code stored with the database. So if release management
goes horribly wrong, there is at least the possibility to compare procedure
text manually.
Pasi
On 2 Dec 2022, at 14:00, Pasi Oja-Nisula <pon@iki.fi> wrote:
On Fri, 2 Dec 2022 at 02:24, raf <raf@raf.org> wrote:
Same here. Accessing the loaded stored procedure source
is how I audit the state of stored procedures in the
database against the code in the code repository.
Exactly. If our software is audited, how can I reliably prove to an auditor
that the running version of the procedure has not been tampered with,
either by a customer, a rogue developer, or some malicious party?
How do you today prove that for other compiled programs in your system?
--
Daniel Gustafsson https://vmware.com/
On Fri, 2 Dec 2022 at 15:02, Daniel Gustafsson <daniel@yesql.se> wrote:
How do you today prove that for other compiled programs in your system?
Good question. I guess I never considered the database to be compiled.
How do you debug this kind of system? What if you suspect that there is
a wrong version of a procedure in the system? How can you then find which release
or patch produced the problem if you can't see the actual procedure text
that produced the running version?
I'm not claiming that there aren't benefits to this system. I'm sure it actually
prevents rogue changes, by not having the source code readily available. It's
just a big fundamental change to get over if you can't do a comparison from
the database to version control.
Pasi
On Fri, Dec 2, 2022 at 1:37 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Great; then go ahead and use those databases, if it is important for you.
Now come on. We all love PostgreSQL.
But that doesn't mean we can't disagree on some decisions.
Especially when you are a USER of PostgreSQL, not a DEV of it,
and it's too late by the time you are even aware of the changes.
Not everyone can chime in on the dev-list when those are discussed.
From a user point of view, this can also be seen as a "regression",
when an observable property of the system changes in a new,
different / incompatible way, to some extent. I'm not saying it is;
still, it is a change one discovers too late, that creates pain for some,
and that is worth both reporting and discussing.
Given the tone of the discussion here though,
I don't think there's much to hope for a middle ground...
In the same vein, I don't think any of those databases have ... bloom indexes.
SQLite has bloom filters now :) (not persistent indexes)