Standalone Parser for PL/pgSQL

Started by Matt Milleralmost 21 years ago10 messagesgeneral
Jump to latest
#1Matt Miller
mattm@epx.com

I'd like (to find or make) a utility that inputs the code of a Pl/pgSQL
function (e.g. from a text file or from STDIN, and then parses the
function definition, building a complete symbol table. I would then
write C code that walks that symbol table and does stuff. As a starting
point I'd be happy if I could just visit each node in the symbol table
and dump that node out to another file.

I'm thinking that the code in src/pl/plpgsql/src is where to start, but
I need some guidance sorting through that stuff. I understand (in
theory, anyway) how flex and bison work, so maybe I just need a bit of
hand-holding to get a simple standalone PL/pgSQL parser up and running.

I want to analyze/transform some Oracle PL/SQL procs, and PL/pgSQL is so
close to PL/SQL that I figured that a PL/pgSQL parser would be a good
starting point.

#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Matt Miller (#1)
Re: Standalone Parser for PL/pgSQL

On Wed, Jul 13, 2005 at 08:33:59PM +0000, Matt Miller wrote:

I'd like (to find or make) a utility that inputs the code of a Pl/pgSQL
function (e.g. from a text file or from STDIN, and then parses the
function definition, building a complete symbol table. I would then
write C code that walks that symbol table and does stuff. As a starting
point I'd be happy if I could just visit each node in the symbol table
and dump that node out to another file.

I'm thinking that the code in src/pl/plpgsql/src is where to start, but
I need some guidance sorting through that stuff. I understand (in
theory, anyway) how flex and bison work, so maybe I just need a bit of
hand-holding to get a simple standalone PL/pgSQL parser up and running.

I don't think you can use just plpgsql's parser. The problem is that it
relies on the main backend parser to figure out anything it doesn't
understand. So you'd have to mix both parsers somehow. The only
thought that comes to my mind is start with the backend's parser and add
productions for PL/pgSQL's constructs.

The main parser depends (at least) on the List handling and memory
handling. So your "simple standalone parser" will have to contain both
things at least.

See src/backend/parser/README.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
A male gynecologist is like an auto mechanic who never owned a car.
(Carrie Snow)

#3Matt Miller
mattm@epx.com
In reply to: Alvaro Herrera (#2)
Re: Standalone Parser for PL/pgSQL

On Wed, 2005-07-13 at 17:04 -0400, Alvaro Herrera wrote:

a simple standalone PL/pgSQL parser

it relies on the main backend parser ... you'd have to mix
both parsers somehow. ... The main parser depends (at least)
on the List handling and memory handling

Okay, you scared me off.

It looks like I'm back to Perl to script my conversion of PL/SQL to
PL/pgSQL.

Thanks for the advice.

#4Neil Conway
neilc@samurai.com
In reply to: Alvaro Herrera (#2)
Re: Standalone Parser for PL/pgSQL

Alvaro Herrera wrote:

I don't think you can use just plpgsql's parser. The problem is that it
relies on the main backend parser to figure out anything it doesn't
understand.

I think it depends on what kind of information you want to extract from
a PL/PgSQL function definition. The PL/PgSQL parser handles the
structure of the PL/PgSQL function definition itself, but it does not
parse expressions or SQL queries. Those are essentially treated as
strings that are later handed to the main SQL machinery to be parsed and
evaluated. If you're content to treat expressions and SQL queries as
opaque strings, you shouldn't need to concern yourself with the main SQL
parser.

The main parser depends (at least) on the List handling and memory
handling. So your "simple standalone parser" will have to contain both
things at least.

The PL/PgSQL parser also depends on these, although to a lesser degree.

-Neil

#5Matt Miller
mattm@epx.com
In reply to: Neil Conway (#4)
Re: Standalone Parser for PL/pgSQL

On Thu, 2005-07-14 at 14:32 +1000, Neil Conway wrote:

Alvaro Herrera wrote:

I don't think you can use just plpgsql's parser. ... it
relies on the main backend parser

If you're content to treat expressions and SQL queries as
opaque strings, you shouldn't need to concern yourself with the main SQL
parser.

Treating queries as terminal nodes in the parse tree would be fine as a
starting point.

The main parser depends (at least) on the List handling and memory
handling.

The PL/PgSQL parser also depends on these, although to a lesser degree.

I suppose these dependencies are okay as long as I can just link my
parser to a library (e.g. src/pl/plpgsql/src/libplpgsql.so) and
everything magically works.

My fantasy here is to perform some straightforward surgery and bring to
life an executable that accepts PL/pgSQL code (e.g. as text on STDIN)
and creates a parse tree in memory. The stuff in src/pl/plpgsql/src/
looked like a good starting point. gram.y and scan.l are there.
pl_funcs.c has some cool-looking "dump*" functions. Now, can this stuff
be "straightforwardly" hacked into a program that gets its PL/pgSQL
source code from STDIN or from a text file, instead of from the "AS"
clause of a "CREATE FUNCTION" statement?

My goal is to automate (as much as possible) the conversion of Oracle
PL/SQL stored procedures into PL/pgSQL. Up to this point I've been
exploring Perl as a tool for this, but things are getting a bit thick
for me to rely on Perl regexes as my most-powerful weapon. Instead of
growing my Perl stuff to look more and more like a parser, I went
looking for a parser, and ended up in src/pl/plpgslq/src.

#6Richard D Levine
Richard_D_Levine@raytheon.com
In reply to: Matt Miller (#5)
Re: Standalone Parser for PL/pgSQL

pgsql-general-owner@postgresql.org wrote on 07/14/2005 10:38:43 AM:

On Thu, 2005-07-14 at 14:32 +1000, Neil Conway wrote:

Alvaro Herrera wrote:

I don't think you can use just plpgsql's parser. ... it
relies on the main backend parser

If you're content to treat expressions and SQL queries as
opaque strings, you shouldn't need to concern yourself with the main

SQL

parser.

Treating queries as terminal nodes in the parse tree would be fine as a
starting point.

The main parser depends (at least) on the List handling and memory
handling.

The PL/PgSQL parser also depends on these, although to a lesser degree.

I suppose these dependencies are okay as long as I can just link my
parser to a library (e.g. src/pl/plpgsql/src/libplpgsql.so) and
everything magically works.

My fantasy here is to perform some straightforward surgery and bring to
life an executable that accepts PL/pgSQL code (e.g. as text on STDIN)
and creates a parse tree in memory. The stuff in src/pl/plpgsql/src/
looked like a good starting point. gram.y and scan.l are there.
pl_funcs.c has some cool-looking "dump*" functions. Now, can this stuff
be "straightforwardly" hacked into a program that gets its PL/pgSQL
source code from STDIN or from a text file, instead of from the "AS"
clause of a "CREATE FUNCTION" statement?

My goal is to automate (as much as possible) the conversion of Oracle
PL/SQL stored procedures into PL/pgSQL.

Ya know, what would be *totally* cool is to implement Oracle PL/SQL syntax
transparently in PostgreSQL. You just can't do packages in PL/pgSQL. You
can mimic them with schemas, but there are things that must be done
differently, and probably more slowly, like package body global constants
and variables. If I were doing it, which I'm not, I would implement
Oracle's quoting and not the PL/pgSQL quoting paradigm. Man it's painful.
Understandable, due to the many languages supported, but painful
nonetheless.

Show quoted text

Up to this point I've been
exploring Perl as a tool for this, but things are getting a bit thick
for me to rely on Perl regexes as my most-powerful weapon. Instead of
growing my Perl stuff to look more and more like a parser, I went
looking for a parser, and ended up in src/pl/plpgslq/src.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Richard D Levine (#6)
Re: Standalone Parser for PL/pgSQL

On Thu, Jul 14, 2005 at 11:07:35AM -0500, Richard_D_Levine@raytheon.com wrote:

Ya know, what would be *totally* cool is to implement Oracle PL/SQL syntax
transparently in PostgreSQL. You just can't do packages in PL/pgSQL. You
can mimic them with schemas, but there are things that must be done
differently, and probably more slowly, like package body global constants
and variables.

If that's what you want, you may be interested in checking out
EnterpriseDB's product, which does exactly this. There's a free beta
you can try out if you like.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Nunca se desea ardientemente lo que solo se desea por raz�n" (F. Alexandre)

#8Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Matt Miller (#5)
Re: Standalone Parser for PL/pgSQL

On Thu, Jul 14, 2005 at 03:38:43PM +0000, Matt Miller wrote:

The main parser depends (at least) on the List handling and memory
handling.

The PL/PgSQL parser also depends on these, although to a lesser degree.

I suppose these dependencies are okay as long as I can just link my
parser to a library (e.g. src/pl/plpgsql/src/libplpgsql.so) and
everything magically works.

Hmm, those symbols are defined in the main Postgres executable, so
there's no shared library to rely on. Anyway, for the memory allocation
stuff, you should be able to

#define palloc(a) malloc(a)
#define pfree(a) free(a)

and define AllocSetContextCreate() and other MemoryContext calls defined
as no-ops. It will leak memory like crazy, but for a short-lived
program I don't think it's a problem.

list.c you'll have to take from src/backend/nodes/list.c, I don't think
it should be too difficult.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Porque francamente, si para saber manejarse a uno mismo hubiera que
rendir examen... �Qui�n es el machito que tendr�a carnet?" (Mafalda)

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matt Miller (#5)
Re: Standalone Parser for PL/pgSQL

Matt Miller <mattm@epx.com> writes:

... The stuff in src/pl/plpgsql/src/
looked like a good starting point. gram.y and scan.l are there.
pl_funcs.c has some cool-looking "dump*" functions. Now, can this stuff
be "straightforwardly" hacked into a program that gets its PL/pgSQL
source code from STDIN or from a text file, instead of from the "AS"
clause of a "CREATE FUNCTION" statement?

Try to compile those as a standalone program. Observe the undefined
symbols you get. Add other backend modules to your compile (or build
stub versions). Lather, rinse, repeat until it links.

I think 90% of what you'd need is replacements for palloc and elog.
palloc can be turned into a simple malloc-and-check-for-failure,
and elog can probably just print the message and exit.

regards, tom lane

#10Tino Wildenhain
tino@wildenhain.de
In reply to: Matt Miller (#5)
Re: Standalone Parser for PL/pgSQL

Am Donnerstag, den 14.07.2005, 15:38 +0000 schrieb Matt Miller:

On Thu, 2005-07-14 at 14:32 +1000, Neil Conway wrote:

Alvaro Herrera wrote:

I don't think you can use just plpgsql's parser. ... it
relies on the main backend parser

If you're content to treat expressions and SQL queries as
opaque strings, you shouldn't need to concern yourself with the main SQL
parser.

Treating queries as terminal nodes in the parse tree would be fine as a
starting point.

*snip*

I'm also extraordinarely interested in this work. I'd like to
see the parser as a lib (for me to add some stubs to python).

I think from a stand-allone parser I could make the changes
I need.