Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

Started by Tom Merchaover 6 years ago10 messages
#1Tom Mercha
mercha_t@hotmail.com

Dear Hackers

I am interested in implementing my own Domain Specific Language (DSL) using PostgreSQL internals. Originally, the plan was not to use PostgreSQL and I had developed a grammar and used ANTLRv4 for parser work and general early development.

Initially, I was hoping for a scenario where I could have PostgreSQL's parser to change grammar (e.g. SET parser_language=SQL vs. SET parser_language=myDSL) in which case my ANTLRv4 project would override the PostgreSQL parser module. I guess another direction that my project could take is to extend PostgreSQL's SQL parser to factor in my DSL keywords and requirements.

To make matters more complicated, this version of ANTLR does not support code generation to C, but it does support generation to C++. Integrating the generated C++ code requires making it friendly to PostgreSQL e.g. using Plain Old Data Structures as described here https://www.postgresql.org/docs/9.0/extend-cpp.html, which seems to be suggesting to me that I may be using the wrong approach towards my goal.

I would be grateful if anyone could provide any general advice or pointers regarding my approach, for example regarding development effort, so that the development with PostgreSQL internals can be smooth and of a high quality. Maybe somebody has come across another DSL attempt which used PostgreSQL and that I could follow as a reference?

Thanks in advance.

Best,
Tom

[https://ipmcdn.avast.com/images/icons/icon-envelope-tick-green-avg-v1.png]&lt;http://www.avg.com/email-signature?utm_medium=email&amp;utm_source=link&amp;utm_campaign=sig-email&amp;utm_content=webmail&amp;utm_term=oa-4885-a&gt; Virus-free. www.avg.com<http://www.avg.com/email-signature?utm_medium=email&amp;utm_source=link&amp;utm_campaign=sig-email&amp;utm_content=webmail&amp;utm_term=oa-4885-a&gt;

#2Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Mercha (#1)
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

On Fri, Jul 05, 2019 at 07:55:15AM +0000, Tom Mercha wrote:

Dear Hackers

I am interested in implementing my own Domain Specific Language (DSL)
using PostgreSQL internals. Originally, the plan was not to use
PostgreSQL and I had developed a grammar and used ANTLRv4 for parser
work and general early development.

Initially, I was hoping for a scenario where I could have PostgreSQL's
parser to change grammar (e.g. SET parser_language=SQL vs. SET
parser_language=myDSL) in which case my ANTLRv4 project would override
the PostgreSQL parser module. I guess another direction that my project
could take is to extend PostgreSQL's SQL parser to factor in my DSL
keywords and requirements.

To make matters more complicated, this version of ANTLR does not
support code generation to C, but it does support generation to C++.
Integrating the generated C++ code requires making it friendly to
PostgreSQL e.g. using Plain Old Data Structures as described here
https://www.postgresql.org/docs/9.0/extend-cpp.html, which seems to be
suggesting to me that I may be using the wrong approach towards my
goal.

I would be grateful if anyone could provide any general advice or
pointers regarding my approach, for example regarding development
effort, so that the development with PostgreSQL internals can be smooth
and of a high quality. Maybe somebody has come across another DSL
attempt which used PostgreSQL and that I could follow as a reference?

I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such database.

What I'd look at first is implementing the grammar as a procedural
language (think PL/pgSQL, pl/perl etc.) implementing whatever you expect
from your DSL. And it's not like you'd have to wrap everything in
functions, because we have anonymous DO blocks. So you could do:

DO LANGUAGE mydsl $$
... whatever my dsl allows ...
$$;

It's still a fair amount of code to implement this (both the PL handler
and the DSL implementation), but it's orders of magnitude simpler than
what you described.

See https://www.postgresql.org/docs/current/plhandler.html for details
about how to write a language handler.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#3Tom Mercha
mercha_t@hotmail.com
In reply to: Tomas Vondra (#2)
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such database.
I was considering having two distinct grammars as an option - thanks for indicating the effort involved. At the end of the day I want both my DSL and the PostgreSQL grammars to coexist. Is extending PostgreSQL's grammar with my own through the PostgreSQL extension infrastructure worth consideration or is it also difficult to develop? Could you suggest any reference material on this topic?

What I'd look at first is implementing the grammar as a procedural
language (think PL/pgSQL, pl/perl etc.) implementing whatever you expect
from your DSL. And it's not like you'd have to wrap everything in
functions, because we have anonymous DO blocks.
Thanks for pointing out this direction! I think I will indeed adopt this approach especially if directly extending PostgreSQL grammar would be difficult.

Regards
Tom
________________________________
From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Sent: 05 July 2019 20:48
To: Tom Mercha
Cc: pgsql-hackers@postgresql.org
Subject: Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

On Fri, Jul 05, 2019 at 07:55:15AM +0000, Tom Mercha wrote:

Dear Hackers

I am interested in implementing my own Domain Specific Language (DSL)
using PostgreSQL internals. Originally, the plan was not to use
PostgreSQL and I had developed a grammar and used ANTLRv4 for parser
work and general early development.

Initially, I was hoping for a scenario where I could have PostgreSQL's
parser to change grammar (e.g. SET parser_language=SQL vs. SET
parser_language=myDSL) in which case my ANTLRv4 project would override
the PostgreSQL parser module. I guess another direction that my project
could take is to extend PostgreSQL's SQL parser to factor in my DSL
keywords and requirements.

To make matters more complicated, this version of ANTLR does not
support code generation to C, but it does support generation to C++.
Integrating the generated C++ code requires making it friendly to
PostgreSQL e.g. using Plain Old Data Structures as described here
https://www.postgresql.org/docs/9.0/extend-cpp.html, which seems to be
suggesting to me that I may be using the wrong approach towards my
goal.

I would be grateful if anyone could provide any general advice or
pointers regarding my approach, for example regarding development
effort, so that the development with PostgreSQL internals can be smooth
and of a high quality. Maybe somebody has come across another DSL
attempt which used PostgreSQL and that I could follow as a reference?

I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such database.

What I'd look at first is implementing the grammar as a procedural
language (think PL/pgSQL, pl/perl etc.) implementing whatever you expect
from your DSL. And it's not like you'd have to wrap everything in
functions, because we have anonymous DO blocks. So you could do:

DO LANGUAGE mydsl $$
... whatever my dsl allows ...
$$;

It's still a fair amount of code to implement this (both the PL handler
and the DSL implementation), but it's orders of magnitude simpler than
what you described.

See https://www.postgresql.org/docs/current/plhandler.html for details
about how to write a language handler.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Mercha (#3)
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

First of all, it's pretty difficult to follow the discussion when it's
not clear what's the original message and what's the response. E-mail
clients generally indent the original message with '>' or someting like
that, but your client does not do that (which is pretty silly). And
copying the message at the top does not really help. Please do something
about that.

On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote:

I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such database.

I was considering having two distinct grammars as an option - thanks
for indicating the effort involved. At the end of the day I want both
my DSL and the PostgreSQL grammars to coexist. Is extending
PostgreSQL's grammar with my own through the PostgreSQL extension
infrastructure worth consideration or is it also difficult to develop?
Could you suggest any reference material on this topic?

Well, I'm not an expert in that area, but we currently don't have any
infrastructure to support that. It's a topic that was discussed in the
past (perhaps you can find some references in the archives) and it
generally boils down to:

1) We're using bison as parser generator.
2) Bison does not allow adding rules on the fly.

So you have to modify the in-core src/backend/parser/gram.y and rebuild
postgres. See for example for an example of such discussion

/messages/by-id/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_=5i5C2jog@mail.gmail.com

When two of the smartest people on the list say it's a hard problem, it
probably is. Particularly for someone who does not know the internals.

What I'd look at first is implementing the grammar as a procedural
language (think PL/pgSQL, pl/perl etc.) implementing whatever you
expect from your DSL. And it's not like you'd have to wrap everything
in functions, because we have anonymous DO blocks.

Thanks for pointing out this direction! I think I will indeed adopt
this approach especially if directly extending PostgreSQL grammar would
be difficult.

Well, it's the only way to deal with it at the moment.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Tom Mercha
mercha_t@hotmail.com
In reply to: Tomas Vondra (#4)
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

On 06/07/2019 00:06, Tomas Vondra wrote:

First of all, it's pretty difficult to follow the discussion when it's
not clear what's the original message and what's the response. E-mail
clients generally indent the original message with '>' or someting like
that, but your client does not do that (which is pretty silly). And
copying the message at the top does not really help. Please do something
about that.

I would like to apologise. I did not realize that my client was doing
that and now I have changed the client. I hope it's fine now.

On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote:

I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such database.

I was considering having two distinct grammars as an option - thanks
for indicating the effort involved. At the end of the day I want both
my DSL and the PostgreSQL grammars to coexist. Is extending
PostgreSQL's grammar with my own through the PostgreSQL extension
infrastructure worth consideration or is it also difficult to develop?
Could you suggest any reference material on this topic?

Well, I'm not an expert in that area, but we currently don't have any
infrastructure to support that. It's a topic that was discussed in the
past (perhaps you can find some references in the archives) and it
generally boils down to:

1) We're using bison as parser generator.
2) Bison does not allow adding rules on the fly.

So you have to modify the in-core src/backend/parser/gram.y and rebuild
postgres. See for example for an example of such discussion

/messages/by-id/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_=5i5C2jog@mail.gmail.com

When two of the smartest people on the list say it's a hard problem, it
probably is. Particularly for someone who does not know the internals.

You are right. Thanks for bringing it to my attention!

I didn't design my language for interaction with triggers and whatnot,
but I think that it would be very interesting to support those as well,
so looking at CREATE LANGUAGE functionality is actually exciting and
appropriate once I make some changes in design. Thanks again for this point!

I hope this is not off topic but I was wondering if you know what are
the intrinsic differences between HANDLER and INLINE parameters of
CREATE LANGUAGE? I know that they are functions which are invoked at
different instances of time (e.g. one is for handling anonymous code
blocks), but at the end of the day they seem to have the same purpose?

Show quoted text

What I'd look at first is implementing the grammar as a procedural
language (think PL/pgSQL, pl/perl etc.) implementing whatever you
expect from your DSL. And it's not like you'd have to wrap everything
in functions, because we have anonymous DO blocks.

Thanks for pointing out this direction! I think I will indeed adopt
this approach especially if directly extending PostgreSQL grammar would
be difficult.

Well, it's the only way to deal with it at the moment.

regards

#6Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Mercha (#5)
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

On Sun, Jul 07, 2019 at 11:06:38PM +0000, Tom Mercha wrote:

On 06/07/2019 00:06, Tomas Vondra wrote:

First of all, it's pretty difficult to follow the discussion when it's
not clear what's the original message and what's the response. E-mail
clients generally indent the original message with '>' or someting like
that, but your client does not do that (which is pretty silly). And
copying the message at the top does not really help. Please do something
about that.

I would like to apologise. I did not realize that my client was doing
that and now I have changed the client. I hope it's fine now.

Thanks, seems fine now.

On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote:

I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such database.

I was considering having two distinct grammars as an option - thanks
for indicating the effort involved. At the end of the day I want both
my DSL and the PostgreSQL grammars to coexist. Is extending
PostgreSQL's grammar with my own through the PostgreSQL extension
infrastructure worth consideration or is it also difficult to develop?
Could you suggest any reference material on this topic?

Well, I'm not an expert in that area, but we currently don't have any
infrastructure to support that. It's a topic that was discussed in the
past (perhaps you can find some references in the archives) and it
generally boils down to:

1) We're using bison as parser generator.
2) Bison does not allow adding rules on the fly.

So you have to modify the in-core src/backend/parser/gram.y and rebuild
postgres. See for example for an example of such discussion

/messages/by-id/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_=5i5C2jog@mail.gmail.com

When two of the smartest people on the list say it's a hard problem, it
probably is. Particularly for someone who does not know the internals.

You are right. Thanks for bringing it to my attention!

I didn't design my language for interaction with triggers and whatnot,
but I think that it would be very interesting to support those as well,
so looking at CREATE LANGUAGE functionality is actually exciting and
appropriate once I make some changes in design. Thanks again for this point!

;-)

I hope this is not off topic but I was wondering if you know what are
the intrinsic differences between HANDLER and INLINE parameters of
CREATE LANGUAGE? I know that they are functions which are invoked at
different instances of time (e.g. one is for handling anonymous code
blocks), but at the end of the day they seem to have the same purpose?

I've never written any PL handler, so I don't know. All I know is this
quote from the docs, right below the simple example of PL handler:

Only a few thousand lines of code have to be added instead of the
dots to complete the call handler.

I suppose the best idea to start an implementation is to copy an
existing PL implementation, and modify that. That's usually much easier
than starting from scratch, because you have something that works. Not
sure if PL/pgSQL is the right choice though, perhaps pick some other
language from https://wiki.postgresql.org/wiki/PL_Matrix

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Tom Mercha
mercha_t@hotmail.com
In reply to: Tomas Vondra (#6)
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

On 09/07/2019 23:22, Tomas Vondra wrote:

On Sun, Jul 07, 2019 at 11:06:38PM +0000, Tom Mercha wrote:

On 06/07/2019 00:06, Tomas Vondra wrote:

First of all, it's pretty difficult to follow the discussion when it's
not clear what's the original message and what's the response. E-mail
clients generally indent the original message with '>' or someting like
that, but your client does not do that (which is pretty silly). And
copying the message at the top does not really help. Please do something
about that.

I would like to apologise. I did not realize that my client was doing
that and now I have changed the client. I hope it's fine now.

Thanks, seems fine now.

On Fri, Jul 05, 2019 at 09:37:03PM +0000, Tom Mercha wrote:

I might be missing something, but it seems like you intend to replace
the SQL grammar we have with something else. It's not clear to me what
would be the point of doing that, and it definitely looks like a huge
amount of work - e.g. we don't have any support for switching between
two distinct grammars the way you envision, and just that alone seems
like a multi-year project. And if you don't have that capability, all
external tools kinda stop working. Good luck with running such
database.

I was considering having two distinct grammars as an option - thanks
for indicating the effort involved. At the end of the day I want both
my DSL and the PostgreSQL grammars to coexist. Is extending
PostgreSQL's grammar with my own through the PostgreSQL extension
infrastructure worth consideration or is it also difficult to develop?
Could you suggest any reference material on this topic?

Well, I'm not an expert in that area, but we currently don't have any
infrastructure to support that. It's a topic that was discussed in the
past (perhaps you can find some references in the archives) and it
generally boils down to:

1) We're using bison as parser generator.
2) Bison does not allow adding rules on the fly.

So you have to modify the in-core src/backend/parser/gram.y and rebuild
postgres. See for example for an example of such discussion

/messages/by-id/CABSN6VeeEhwb0HrjOCp9kHaWm0Ljbnko5y-0NKsT_=5i5C2jog@mail.gmail.com

When two of the smartest people on the list say it's a hard problem, it
probably is. Particularly for someone who does not know the internals.

You are right. Thanks for bringing it to my attention!

I didn't design my language for interaction with triggers and whatnot,
but I think that it would be very interesting to support those as well,
so looking at CREATE LANGUAGE functionality is actually exciting and
appropriate once I make some changes in design. Thanks again for this
point!

;-)

I hope this is not off topic but I was wondering if you know what are
the intrinsic differences between HANDLER and INLINE parameters of
CREATE LANGUAGE? I know that they are functions which are invoked at
different instances of time (e.g. one is for handling anonymous code
blocks), but at the end of the day they seem to have the same purpose?

I've never written any PL handler, so I don't know. All I know is this
quote from the docs, right below the simple example of PL handler:

   Only a few thousand lines of code have to be added instead of the
   dots to complete the call handler.

I suppose the best idea to start an implementation is to copy an
existing PL implementation, and modify that. That's usually much easier
than starting from scratch, because you have something that works. Not
sure if PL/pgSQL is the right choice though, perhaps pick some other
language from https://wiki.postgresql.org/wiki/PL_Matrix

I understand that you never wrote any PL handler but was just thinking
about this functionality as a follow-up to our conversation. I was just
wondering whether anonymous DO blocks *must* return void or not?

The docs for DO say it is a function returning void -
https://www.postgresql.org/docs/current/sql-do.html

But the docs for CREATE LANGUAGE's INLINE HANDLER say 'typically return
void' - https://www.postgresql.org/docs/current/sql-createlanguage.html

Is the implication that we can make the DO block return something
somehow? I would be quite interested if there is a way of achieving this
kind of functionality. My experiments using an SRF, which I have
written, within an anonymous DO block just gives me an "ERROR:
set-valued function called in context that cannot accept a set".

Anyway maybe I'm going off on a tangent here... perhaps it is better to
open a new thread?

Show quoted text

regards

#8David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Mercha (#7)
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

On Tue, Jul 9, 2019 at 5:23 PM Tom Mercha <mercha_t@hotmail.com> wrote:

I understand that you never wrote any PL handler but was just thinking
about this functionality as a follow-up to our conversation. I was just
wondering whether anonymous DO blocks *must* return void or not?

The docs for DO say it is a function returning void -
https://www.postgresql.org/docs/current/sql-do.html

But the docs for CREATE LANGUAGE's INLINE HANDLER say 'typically return

void' - https://www.postgresql.org/docs/current/sql-createlanguage.html

No, the language cannot override the SQL execution environment's
limitations.

"The code block is treated as though it were the body of a function with no
parameters, returning void. It is parsed and executed a single time."

The above applies regardless of the language the code block is written in.

It can, however, affect permanent session state (so, use tables).

David J.

#9Tom Mercha
mercha_t@hotmail.com
In reply to: David G. Johnston (#8)
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

On 10/07/2019 02:31, David G. Johnston wrote:

On Tue, Jul 9, 2019 at 5:23 PM Tom Mercha <mercha_t@hotmail.com> wrote:

I understand that you never wrote any PL handler but was just thinking
about this functionality as a follow-up to our conversation. I was just
wondering whether anonymous DO blocks *must* return void or not?

The docs for DO say it is a function returning void -
https://www.postgresql.org/docs/current/sql-do.html

But the docs for CREATE LANGUAGE's INLINE HANDLER say 'typically return

void' - https://www.postgresql.org/docs/current/sql-createlanguage.html

No, the language cannot override the SQL execution environment's
limitations.

"The code block is treated as though it were the body of a function with no
parameters, returning void. It is parsed and executed a single time."

The above applies regardless of the language the code block is written in.

It can, however, affect permanent session state (so, use tables).

Thank you very much for addressing the question.

I am still a bit of a novice with PostgreSQL internals. Could you please
provide some more detail on your comment regarding affecting permanent
session state?

Show quoted text

David J.

#10David G. Johnston
david.g.johnston@gmail.com
In reply to: Tom Mercha (#9)
Re: Extending PostgreSQL with a Domain-Specific Language (DSL) - Development

On Tue, Jul 9, 2019 at 5:43 PM Tom Mercha <mercha_t@hotmail.com> wrote:

I am still a bit of a novice with PostgreSQL internals. Could you please
provide some more detail on your comment regarding affecting permanent
session state?

I was not referring to internals.

BEGIN;
CREATE TEMP TABLE tempdo (id int);
DO $$
BEGIN
INSERT INTO tempdo VALUES (1);
END;
$$;
SELECT * FROM tempdo;
ROLLBACK;

David J.