making the backend's json parser work in frontend code

Started by Robert Haas · about 6 years ago · 101 messages · pgsql-hackers
#1 Robert Haas
robertmhaas@gmail.com

The discussion on the backup manifest thread has gotten bogged down on
the issue of the format that should be used to store the backup
manifest file. I want something simple and ad-hoc; David Steele and
Stephen Frost prefer JSON. That is problematic because our JSON parser
does not work in frontend code, and I want to be able to validate a
backup against its manifest, which involves being able to parse the
manifest from frontend code. The latest development over there is that
David Steele has posted the JSON parser that he wrote for pgbackrest
with an offer to try to adapt it for use in front-end PostgreSQL code,
an offer which I genuinely appreciate. I'll write more about that over
on that thread. However, I decided to spend today doing some further
investigation of an alternative approach, namely making the backend's
existing JSON parser work in frontend code as well. I did not solve
all the problems there, but I did come up with some patches which I
think would be worth committing on independent grounds, and I think
the whole series is worth posting. So here goes.

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup. It's long been
the case that wchar.c is actually compiled and linked into both
frontend and backend code. Commit
60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
that depends on wchar.c being available, but didn't actually make
wchar.c part of src/common, which seems like an odd decision: the
functions in the library are dependent on code that is not part of any
library but whose source files get copied around where needed. Eh?

0002 does some basic header cleanup to make it possible to include the
existing header file jsonapi.h in frontend code. The state of the JSON
headers today looks generally poor. There seems not to have been much
attempt to get the prototypes for a given source file, say foo.c, into
a header file with the same name, say foo.h. Also, dependencies
between various header files seem to have been added somewhat freely.
This patch does not come close to fixing all that, but I consider it a
modest down payment on a cleanup that probably ought to be taken
further.

0003 splits json.c into two files, json.c and jsonapi.c. All the
lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
jsonapi.c, while the stuff that pertains to the 'json' data type
remains in json.c. This also seems like a good cleanup, because to me,
at least, it's not a great idea to mix together code that is used by
both the json and jsonb data types as well as other things in the
system that want to generate or parse json together with things that
are specific to the 'json' data type.

As far as I know all three of the above patches are committable as-is;
review and contrary opinions welcome.

On the other hand, 0004, 0005, and 0006 are charitably described as
experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can
still be compiled even if #include "postgres.h" is changed to #include
"postgres-fe.h" and 0006 moves it into src/common. Note that I say
that they make it compile, not work. It's not just untested; it's
definitely broken. But it gives a feeling for what the remaining
obstacles to making this code available in a frontend environment are.
Since I wrote my very first email complaining about the difficulty of
making the backend's JSON parser work in a frontend environment, one
obstacle has been knocked down: StringInfo is now available in
front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
remaining problems (that I know about) have to do with error reporting
and multibyte character support; a read of the patches is suggested
for those wanting further details.

Suggestions welcome.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

0001-Move-wchar.c-to-src-common.patch (+3/-5)
0002-Adjust-src-include-utils-jsonapi.h-so-it-s-not-backe.patch (+55/-34)
0003-Split-JSON-lexer-parser-from-json-data-type-support.patch (+1224/-1206)
0004-Introduce-json_error-macro.patch (+90/-132)
0005-Frontendify-jsonapi.c.patch (+68/-4)
0006-Move-jsonapi.c-to-src-common.patch (+6/-3)
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

... However, I decided to spend today doing some further
investigation of an alternative approach, namely making the backend's
existing JSON parser work in frontend code as well. I did not solve
all the problems there, but I did come up with some patches which I
think would be worth committing on independent grounds, and I think
the whole series is worth posting. So here goes.

In general, if we can possibly get to having one JSON parser in
src/common, that seems like an obviously better place to be than
having two JSON parsers. So I'm encouraged that it might be
feasible after all.

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup.

FWIW, I've been wanting to do that for awhile. I've not studied
your patch, but +1 for the idea. We might also need to take a
hard look at mbutils.c to see if any of that code can/should move.

Since I wrote my very first email complaining about the difficulty of
making the backend's JSON parser work in a frontend environment, one
obstacle has been knocked down: StringInfo is now available in
front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
remaining problems (that I know about) have to do with error reporting
and multibyte character support; a read of the patches is suggested
for those wanting further details.

The patch I just posted at <2863.1579127649@sss.pgh.pa.us> probably
affects this in small ways, but not anything major.

regards, tom lane

#3 Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#1)
Re: making the backend's json parser work in frontend code

Hi,

On 2020-01-15 16:02:49 -0500, Robert Haas wrote:

The discussion on the backup manifest thread has gotten bogged down on
the issue of the format that should be used to store the backup
manifest file. I want something simple and ad-hoc; David Steele and
Stephen Frost prefer JSON. That is problematic because our JSON parser
does not work in frontend code, and I want to be able to validate a
backup against its manifest, which involves being able to parse the
manifest from frontend code. The latest development over there is that
David Steele has posted the JSON parser that he wrote for pgbackrest
with an offer to try to adapt it for use in front-end PostgreSQL code,
an offer which I genuinely appreciate. I'll write more about that over
on that thread.

I'm not sure where I come down between using json and a simple ad-hoc
format, when the dependency for the former is making the existing json
parser work in the frontend. But if the alternative is to add a second
json parser, it very clearly shifts towards using an ad-hoc
format. Having to maintain a simple ad-hoc parser is a lot less
technical debt than having a second full blown json parser. Imo even
when an external project or three also has to have that simple parser.

If the alternative were to use that newly proposed json parser to
*replace* the backend one too, the story would again be different.

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup. It's long been
the case that wchar.c is actually compiled and linked into both
frontend and backend code. Commit
60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
that depends on wchar.c being available, but didn't actually make
wchar.c part of src/common, which seems like an odd decision: the
functions in the library are dependent on code that is not part of any
library but whose source files get copied around where needed. Eh?

Cool.

0002 does some basic header cleanup to make it possible to include the
existing header file jsonapi.h in frontend code. The state of the JSON
headers today looks generally poor. There seems not to have been much
attempt to get the prototypes for a given source file, say foo.c, into
a header file with the same name, say foo.h. Also, dependencies
between various header files seem to have been added somewhat freely.
This patch does not come close to fixing all that, but I consider it a
modest down payment on a cleanup that probably ought to be taken
further.

Yea, this seems like a necessary cleanup (or well, maybe the start of
it).

0003 splits json.c into two files, json.c and jsonapi.c. All the
lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
jsonapi.c, while the stuff that pertains to the 'json' data type
remains in json.c. This also seems like a good cleanup, because to me,
at least, it's not a great idea to mix together code that is used by
both the json and jsonb data types as well as other things in the
system that want to generate or parse json together with things that
are specific to the 'json' data type.

+1

On the other hand, 0004, 0005, and 0006 are charitably described as
experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can
still be compiled even if #include "postgres.h" is changed to #include
"postgres-fe.h" and 0006 moves it into src/common. Note that I say
that they make it compile, not work. It's not just untested; it's
definitely broken. But it gives a feeling for what the remaining
obstacles to making this code available in a frontend environment are.
Since I wrote my very first email complaining about the difficulty of
making the backend's JSON parser work in a frontend environment, one
obstacle has been knocked down: StringInfo is now available in
front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
remaining problems (that I know about) have to do with error reporting
and multibyte character support; a read of the patches is suggested
for those wanting further details.

From d05e1fc82a51cb583a0367e72b1afc0de561dd00 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jan 2020 10:36:52 -0500
Subject: [PATCH 4/6] Introduce json_error() macro.

---
src/backend/utils/adt/jsonapi.c | 221 +++++++++++++-------------------
1 file changed, 90 insertions(+), 131 deletions(-)

diff --git a/src/backend/utils/adt/jsonapi.c b/src/backend/utils/adt/jsonapi.c
index fc8af9f861..20f7f0f7ac 100644
--- a/src/backend/utils/adt/jsonapi.c
+++ b/src/backend/utils/adt/jsonapi.c
@@ -17,6 +17,9 @@
#include "miscadmin.h"
#include "utils/jsonapi.h"
+#define json_error(rest) \
+	ereport(ERROR, (rest, report_json_context(lex)))
+

It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.
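The suggestion above could look something like this minimal sketch: instead of longjmp'ing the way ereport(ERROR) does, a frontend stand-in records the formatted message and a pending flag for the caller to act on. The names fe_report_error, fe_error_msg, and fe_error_pending are invented for illustration and are not actual PostgreSQL APIs.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Sketch of a "very simple ereport replacement" for frontend code,
 * assuming the JSON code only ever raises ERROR-level reports.
 * All names here are illustrative, not real PostgreSQL APIs. */
static char fe_error_msg[256];
static int	fe_error_pending = 0;

static void
fe_report_error(const char *fmt, ...)
{
	va_list		ap;

	/* Record the message instead of jumping out, so the caller can
	 * decide whether to print it and exit, or to recover. */
	va_start(ap, fmt);
	vsnprintf(fe_error_msg, sizeof(fe_error_msg), fmt, ap);
	va_end(ap);
	fe_error_pending = 1;
}
```

A caller would check fe_error_pending after each parsing step and typically print fe_error_msg to stderr and exit(1).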

/* the null action object used for pure validation */
@@ -701,7 +735,11 @@ json_lex_string(JsonLexContext *lex)
ch = (ch * 16) + (*s - 'A') + 10;
else
{
+#ifdef FRONTEND
+						lex->token_terminator = s + PQmblen(s, PG_UTF8);
+#else
lex->token_terminator = s + pg_mblen(s);
+#endif

If we were to go this way, it seems like the ifdef should rather be in a
helper function, rather than all over. It seems like it should be
unproblematic to have a common interface for both frontend/backend?
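As a sketch of such a common interface, and given that the frontend branch of the patch hard-codes UTF-8 anyway, the helper could derive the character length from the UTF-8 lead byte, mirroring what pg_utf_mblen() does. json_utf8_mblen is an invented name; in the real tree the helper would presumably dispatch to pg_mblen() or PQmblen() behind a single #ifdef instead.

```c
#include <assert.h>

/* Hypothetical helper hiding the frontend/backend multibyte difference
 * behind one function, assuming UTF-8 input (the case the patch
 * hard-codes on the frontend side). Length is determined from the
 * UTF-8 lead byte, as pg_utf_mblen() does. */
static int
json_utf8_mblen(const unsigned char *s)
{
	if (*s < 0x80)
		return 1;			/* ASCII */
	if ((*s & 0xE0) == 0xC0)
		return 2;			/* 110xxxxx: two-byte sequence */
	if ((*s & 0xF0) == 0xE0)
		return 3;			/* 1110xxxx: three-byte sequence */
	if ((*s & 0xF8) == 0xF0)
		return 4;			/* 11110xxx: four-byte sequence */
	return 1;				/* invalid lead byte: advance one byte */
}
```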

Greetings,

Andres Freund

#4 Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#3)
Re: making the backend's json parser work in frontend code

On Wed, Jan 15, 2020 at 6:40 PM Andres Freund <andres@anarazel.de> wrote:

It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.

I agree that it's far from obvious that the hacks in the patch are
best; to the contrary, they are hacks. That said, I feel that the
semantics of throwing an error are not very well-defined in a
front-end environment. I mean, in a backend context, throwing an error
is going to abort the current transaction, with all that this implies.
If the frontend equivalent is to do nothing and hope for the best, I
doubt it will survive anything more than the simplest use cases. This
is one of the reasons I've been very reluctant to go down this
whole path in the first place.

+#ifdef FRONTEND
+                                             lex->token_terminator = s + PQmblen(s, PG_UTF8);
+#else
lex->token_terminator = s + pg_mblen(s);
+#endif

If we were to go this way, it seems like the ifdef should rather be in a
helper function, rather than all over.

Sure... like I said, this is just to illustrate the problem.

It seems like it should be
unproblematic to have a common interface for both frontend/backend?

Not sure how. pg_mblen() and PQmblen() are both existing interfaces,
and they're not compatible with each other. I guess we could make
PQmblen() available to backend code, but given that the function name
implies an origin in libpq, that seems wicked confusing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5 Michael Paquier
michael@paquier.xyz
In reply to: Robert Haas (#4)
Re: making the backend's json parser work in frontend code

On Wed, Jan 15, 2020 at 09:39:13PM -0500, Robert Haas wrote:

On Wed, Jan 15, 2020 at 6:40 PM Andres Freund <andres@anarazel.de> wrote:

It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.

I agree that it's far from obvious that the hacks in the patch are
best; to the contrary, they are hacks. That said, I feel that the
semantics of throwing an error are not very well-defined in a
front-end environment. I mean, in a backend context, throwing an error
is going to abort the current transaction, with all that this implies.
If the frontend equivalent is to do nothing and hope for the best, I
doubt it will survive anything more than the simplest use cases. This
is one of the reasons I've been very reluctant to go down this
whole path in the first place.

Error handling is a well-defined concept in the backend. If
connected to a database, you know that a session has to roll back any
existing activity, etc. The clients have to be more flexible because
an error depends a lot on how the tool is designed and how it should
react to an error. So the backend code in charge of logging an error
does the best it can: it throws an error, then lets the caller decide
what to do with it. I agree with the feeling that having a simple
replacement for ereport() in the frontend would be nice, that would be
less code churn in parts shared by backend/frontend.

Not sure how. pg_mblen() and PQmblen() are both existing interfaces,
and they're not compatible with each other. I guess we could make
PQmblen() available to backend code, but given that the function name
implies an origin in libpq, that seems wicked confusing.

Well, the problem here is the encoding part, and the code looks at the
same table pg_wchar_table[] at the end, so this needs some thoughts.
On top of that, we don't know exactly on the client what kind of
encoding is available (this led for example to several
assumptions/hiccups behind the implementation of SCRAM as it requires
UTF-8 per its RFC when working on the libpq part).
--
Michael

#6 David Steele
david@pgmasters.net
In reply to: Robert Haas (#1)
Re: making the backend's json parser work in frontend code

Hi Robert,

On 1/15/20 2:02 PM, Robert Haas wrote:

The discussion on the backup manifest thread has gotten bogged down on
the issue of the format that should be used to store the backup
manifest file. I want something simple and ad-hoc; David Steele and
Stephen Frost prefer JSON. That is problematic because our JSON parser
does not work in frontend code, and I want to be able to validate a
backup against its manifest, which involves being able to parse the
manifest from frontend code. The latest development over there is that
David Steele has posted the JSON parser that he wrote for pgbackrest
with an offer to try to adapt it for use in front-end PostgreSQL code,
an offer which I genuinely appreciate. I'll write more about that over
on that thread. However, I decided to spend today doing some further
investigation of an alternative approach, namely making the backend's
existing JSON parser work in frontend code as well. I did not solve
all the problems there, but I did come up with some patches which I
think would be worth committing on independent grounds, and I think
the whole series is worth posting. So here goes.

I was starting to wonder if it wouldn't be simpler to go back to the
Postgres JSON parser and see if we can adapt it. I'm not sure that it
*is* simpler, but it would almost certainly be more acceptable.

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup. It's long been
the case that wchar.c is actually compiled and linked into both
frontend and backend code. Commit
60f11b87a2349985230c08616fa8a34ffde934c8 added code into src/common
that depends on wchar.c being available, but didn't actually make
wchar.c part of src/common, which seems like an odd decision: the
functions in the library are dependent on code that is not part of any
library but whose source files get copied around where needed. Eh?

This looks like an obvious improvement to me.

0002 does some basic header cleanup to make it possible to include the
existing header file jsonapi.h in frontend code. The state of the JSON
headers today looks generally poor. There seems not to have been much
attempt to get the prototypes for a given source file, say foo.c, into
a header file with the same name, say foo.h. Also, dependencies
between various header files seem to have been added somewhat freely.
This patch does not come close to fixing all that, but I consider it a
modest down payment on a cleanup that probably ought to be taken
further.

Agreed that these header files are fairly disorganized. In general the
names json, jsonapi, jsonfuncs don't tell me a whole lot. I feel like
I'd want to include json.h to get a json parser but it only contains one
utility function before these patches. I can see that json.c primarily
contains SQL functions so that's why.

So the idea here is that json.c will have the JSON SQL functions,
jsonb.c the JSONB SQL functions, and jsonapi.c the parser, and
jsonfuncs.c the utility functions?

0003 splits json.c into two files, json.c and jsonapi.c. All the
lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
jsonapi.c, while the stuff that pertains to the 'json' data type
remains in json.c. This also seems like a good cleanup, because to me,
at least, it's not a great idea to mix together code that is used by
both the json and jsonb data types as well as other things in the
system that want to generate or parse json together with things that
are specific to the 'json' data type.

This seems like a good first step. I wonder if the remainder of the SQL
json/jsonb functions should be moved to json.c/jsonb.c respectively?

That does represent a lot of code churn though, so perhaps not worth it.

As far as I know all three of the above patches are committable as-is;
review and contrary opinions welcome.

Agreed, with some questions as above.

On the other hand, 0004, 0005, and 0006 are charitably described as
experimental or WIP. 0004 and 0005 hack up jsonapi.c so that it can
still be compiled even if #include "postgres.h" is changed to #include
"postgres-fe.h" and 0006 moves it into src/common. Note that I say
that they make it compile, not work. It's not just untested; it's
definitely broken. But it gives a feeling for what the remaining
obstacles to making this code available in a frontend environment are.
Since I wrote my very first email complaining about the difficulty of
making the backend's JSON parser work in a frontend environment, one
obstacle has been knocked down: StringInfo is now available in
front-end code (commit 26aaf97b683d6258c098859e6b1268e1f5da242f). The
remaining problems (that I know about) have to do with error reporting
and multibyte character support; a read of the patches is suggested
for those wanting further details.

Well, with the caveat that it doesn't work, it's less than I expected.

Obviously ereport() is a pretty big deal and I agree with Michael
downthread that we should port this to the frontend code.

It would also be nice to unify functions like PQmblen() and pg_mblen()
if possible.

The next question in my mind is, given the caveat that the error handling
is questionable in the front end, can we at least render/parse valid
JSON with the code?

Regards,
--
-David
david@pgmasters.net

#7 David Steele
david@pgmasters.net
In reply to: David Steele (#6)
Re: making the backend's json parser work in frontend code

Hi Robert,

On 1/16/20 11:37 AM, David Steele wrote:

The next question in my mind is, given the caveat that the error handling
is questionable in the front end, can we at least render/parse valid
JSON with the code?

Hrm, this bit was from an earlier edit. I meant:

The next question in my mind is what will it take to get this working in
a limited form so we can at least prototype it with pg_basebackup. I
can hack on this with some static strings in front end code tomorrow to
see what works and what doesn't if that makes sense.

Regards,
--
-David
david@pgmasters.net

#8 Robert Haas
robertmhaas@gmail.com
In reply to: David Steele (#6)
Re: making the backend's json parser work in frontend code

On Thu, Jan 16, 2020 at 1:37 PM David Steele <david@pgmasters.net> wrote:

I was starting to wonder if it wouldn't be simpler to go back to the
Postgres JSON parser and see if we can adapt it. I'm not sure that it
*is* simpler, but it would almost certainly be more acceptable.

That is my feeling also.

So the idea here is that json.c will have the JSON SQL functions,
jsonb.c the JSONB SQL functions, and jsonapi.c the parser, and
jsonfuncs.c the utility functions?

Uh, I think roughly that, yes. Although I can't claim to fully
understand everything that's here.

This seems like a good first step. I wonder if the remainder of the SQL
json/jsonb functions should be moved to json.c/jsonb.c respectively?

That does represent a lot of code churn though, so perhaps not worth it.

I don't have an opinion on this right now.

Well, with the caveat that it doesn't work, it's less than I expected.

Obviously ereport() is a pretty big deal and I agree with Michael
downthread that we should port this to the frontend code.

Another possibly-attractive option would be to defer throwing the
error: i.e. set some flags in the lex or parse state or something, and
then just return. The caller notices the flags and has enough
information to throw an error or whatever it wants to do. The reason I
think this might be attractive is that it dodges the whole question of
what exactly throwing an error is supposed to do in a world without
transactions, memory contexts, resource owners, etc. However, it has
some pitfalls of its own, like maybe being too much code churn or
hurting performance in non-error cases.
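A rough sketch of that deferred-error shape, with every name invented for illustration (the real JsonLexContext has different fields):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the deferred-error idea: the lexer never throws; it records
 * an error code and position in its state and returns false, and the
 * caller decides whether to report the error and how. */
typedef enum
{
	JSON_OK,
	JSON_ERR_INVALID_TOKEN,
	JSON_ERR_UNEXPECTED_EOF
} JsonErrCode;

typedef struct
{
	const char *input;
	size_t		pos;
	JsonErrCode err;			/* JSON_OK until something fails */
	size_t		err_pos;		/* offset where the failure happened */
} JsonLexState;

/* Flag the failure in the state instead of ereport()ing; callers check
 * the return value (and lex->err) after each lexing step. */
static int
json_lex_fail(JsonLexState *lex, JsonErrCode code)
{
	lex->err = code;
	lex->err_pos = lex->pos;
	return 0;
}
```

The non-error-path cost is one extra branch per step in the caller, which is where the performance concern above would need measuring.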

It would also be nice to unify functions like PQmblen() and pg_mblen()
if possible.

I don't see how to do that at the moment, but I agree that it would be
nice if we can figure it out.

The next question in my mind is given the caveat that the error handing
is questionable in the front end, can we at least render/parse valid
JSON with the code?

That's a real good question. Thanks for offering to test it; I think
that would be very helpful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9 David Steele
david@pgmasters.net
In reply to: Andres Freund (#3)
Re: making the backend's json parser work in frontend code

On 1/15/20 4:40 PM, Andres Freund wrote:

I'm not sure where I come down between using json and a simple ad-hoc
format, when the dependency for the former is making the existing json
parser work in the frontend. But if the alternative is to add a second
json parser, it very clearly shifts towards using an ad-hoc
format. Having to maintain a simple ad-hoc parser is a lot less
technical debt than having a second full blown json parser.

Maybe at first, but it will grow and become more complex as new features
are added. This has been our experience with pgBackRest, at least.

Imo even
when an external project or three also has to have that simple parser.

I don't agree here. Especially if we outgrow the format and they need
two parsers, depending on the version of PostgreSQL.

To do page-level incrementals (which this feature is intended to enable)
the user will need to be able to associate full and incremental backups
and the only way I see to do that (currently) is to read the manifests,
since the prior backup should be stored there. I think this means that
parsing the manifest is not really optional -- it will be required to do
any kind of automation with incrementals.

It's easy enough for a tool like pgBackRest to do something like that,
much harder for a user hacking together a tool in bash based on
pg_basebackup.

If the alternative were to use that newly proposed json parser to
*replace* the backend one too, the story would again be different.

That was certainly not my intention.

Regards,
--
-David
david@pgmasters.net

#10 David Steele
david@pgmasters.net
In reply to: Robert Haas (#4)
Re: making the backend's json parser work in frontend code

On 1/15/20 7:39 PM, Robert Haas wrote:

On Wed, Jan 15, 2020 at 6:40 PM Andres Freund <andres@anarazel.de> wrote:

It's not obvious why the better approach here wouldn't be to just have a
very simple ereport replacement, that needs to be explicitly included
from frontend code. It'd not be meaningfully harder, imo, and it'd
require fewer adaptions, and it'd look more familiar.

I agree that it's far from obvious that the hacks in the patch are
best; to the contrary, they are hacks. That said, I feel that the
semantics of throwing an error are not very well-defined in a
front-end environment. I mean, in a backend context, throwing an error
is going to abort the current transaction, with all that this implies.
If the frontend equivalent is to do nothing and hope for the best, I
doubt it will survive anything more than the simplest use cases. This
is one of the reasons I've been very reluctant to go down this
whole path in the first place.

The way we handle this in pgBackRest is to put a TRY ... CATCH block in
main() to log and exit on any uncaught THROW. That seems like a
reasonable way to start here. Without memory contexts that almost
certainly will mean memory leaks but I'm not sure how much that matters
if the action is to exit immediately.
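That pattern can be sketched with plain setjmp/longjmp standing in for pgBackRest's actual TRY ... CATCH macros; fe_throw and run_guarded are invented names:

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Sketch of a single top-level CATCH: any uncaught throw unwinds to
 * the handler installed around the program body, which logs and
 * reports failure. Not pgBackRest's real implementation. */
static jmp_buf top_catch;
static const char *thrown_msg;

static void
fe_throw(const char *msg)
{
	thrown_msg = msg;
	longjmp(top_catch, 1);
}

static int
run_guarded(void (*body)(void))
{
	if (setjmp(top_catch) != 0)
	{
		/* Uncaught THROW lands here: log it. A real main() would
		 * exit(1) now, so leaked allocations don't matter much. */
		fprintf(stderr, "FATAL: %s\n", thrown_msg);
		return 1;
	}
	body();
	return 0;
}
```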

Regards,
--
-David
david@pgmasters.net

#11 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Steele (#10)
Re: making the backend's json parser work in frontend code

David Steele <david@pgmasters.net> writes:

On 1/15/20 7:39 PM, Robert Haas wrote:

I agree that it's far from obvious that the hacks in the patch are
best; to the contrary, they are hacks. That said, I feel that the
semantics of throwing an error are not very well-defined in a
front-end environment. I mean, in a backend context, throwing an error
is going to abort the current transaction, with all that this implies.
If the frontend equivalent is to do nothing and hope for the best, I
doubt it will survive anything more than the simplest use cases. This
is one of the reasons I've been very reluctant to go down this
whole path in the first place.

The way we handle this in pgBackRest is to put a TRY ... CATCH block in
main() to log and exit on any uncaught THROW. That seems like a
reasonable way to start here. Without memory contexts that almost
certainly will mean memory leaks but I'm not sure how much that matters
if the action is to exit immediately.

If that's the expectation, we might as well replace backend ereport(ERROR)
with something that just prints a message and does exit(1).

The question comes down to whether there are use-cases where a frontend
application would really want to recover and continue processing after
a JSON syntax problem. I'm not seeing that that's a near-term
requirement, so maybe we could leave it for somebody to solve when
and if they want to do it.

regards, tom lane

#12 Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#11)
Re: making the backend's json parser work in frontend code

Hi,

On 2020-01-16 14:20:28 -0500, Tom Lane wrote:

David Steele <david@pgmasters.net> writes:

The way we handle this in pgBackRest is to put a TRY ... CATCH block in
main() to log and exit on any uncaught THROW. That seems like a
reasonable way to start here. Without memory contexts that almost
certainly will mean memory leaks but I'm not sure how much that matters
if the action is to exit immediately.

If that's the expectation, we might as well replace backend ereport(ERROR)
with something that just prints a message and does exit(1).

Well, the process might still want to do some cleanup of half-finished
work. You'd not need to be resistant against memory leaks to do so, if
followed by an exit. Obviously you can also do all the necessary
cleanup from within the ereport(ERROR) itself, but that doesn't seem
appealing to me (not composable, harder to reuse for other programs,
etc).

Greetings,

Andres Freund

#13 David Steele
david@pgmasters.net
In reply to: Andres Freund (#12)
Re: making the backend's json parser work in frontend code

On 1/16/20 12:26 PM, Andres Freund wrote:

Hi,

On 2020-01-16 14:20:28 -0500, Tom Lane wrote:

David Steele <david@pgmasters.net> writes:

The way we handle this in pgBackRest is to put a TRY ... CATCH block in
main() to log and exit on any uncaught THROW. That seems like a
reasonable way to start here. Without memory contexts that almost
certainly will mean memory leaks but I'm not sure how much that matters
if the action is to exit immediately.

If that's the expectation, we might as well replace backend ereport(ERROR)
with something that just prints a message and does exit(1).

Well, the process might still want to do some cleanup of half-finished
work. You'd not need to be resistant against memory leaks to do so, if
followed by an exit. Obviously you can also do all the necessary
cleanup from within the ereport(ERROR) itself, but that doesn't seem
appealing to me (not composable, harder to reuse for other programs,
etc).

In pgBackRest we have a default handler that just logs the message to
stderr and exits (though we consider it a coding error if it gets
called). Seems like we could do the same here. Default message and
exit if no handler, but optionally allow a handler (which could RETHROW
to get to the default handler afterwards).

It seems like we've been wanting a front end version of ereport() for a
while so I'll take a look at that and see what it involves.

Regards,
--
-David
david@pgmasters.net

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#1)
Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup.

Here's a reviewed version of 0001. You missed fixing the MSVC build,
and there were assorted comments and other things referencing wchar.c
that needed to be cleaned up.

Also, it seemed to me that if we are going to move wchar.c, we should
also move encnames.c, so that libpq can get fully out of the
symlinking-source-files business. It makes initdb less weird too.

I took the liberty of sticking proper copyright headers onto these
two files, too. (This makes the diff a lot more bulky :-(. Would
it help to add the headers in a separate commit?)

Another thing I'm wondering about is if any of the #ifndef FRONTEND
code should get moved *back* to src/backend/utils/mb. But that
could be a separate commit, too.

Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
migrate over to src/include/common. But that'd be far more invasive
to other source files, so I've not touched the issue here.

regards, tom lane

Attachments:

0001-move-wchar-and-encnames-to-src-common.patch (text/x-diff, +2699 -2716)
#15Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#14)
Re: making the backend's json parser work in frontend code

On Thu, Jan 16, 2020 at 3:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

0001 moves wchar.c from src/backend/utils/mb to src/common. Unless I'm
missing something, this seems like an overdue cleanup.

Here's a reviewed version of 0001. You missed fixing the MSVC build,
and there were assorted comments and other things referencing wchar.c
that needed to be cleaned up.

Wow, thanks.

Also, it seemed to me that if we are going to move wchar.c, we should
also move encnames.c, so that libpq can get fully out of the
symlinking-source-files business. It makes initdb less weird too.

OK.

I took the liberty of sticking proper copyright headers onto these
two files, too. (This makes the diff a lot more bulky :-(. Would
it help to add the headers in a separate commit?)

I wouldn't bother making it a separate commit, but please do whatever you like.

Another thing I'm wondering about is if any of the #ifndef FRONTEND
code should get moved *back* to src/backend/utils/mb. But that
could be a separate commit, too.

+1 for moving that stuff to a separate backend-only file.

Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
migrate over to src/include/common. But that'd be far more invasive
to other source files, so I've not touched the issue here.

I don't have a view on this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#16Robert Haas
robertmhaas@gmail.com
In reply to: David Steele (#9)
Re: making the backend's json parser work in frontend code

On Thu, Jan 16, 2020 at 1:58 PM David Steele <david@pgmasters.net> wrote:

To do page-level incrementals (which this feature is intended to enable)
the user will need to be able to associate full and incremental backups
and the only way I see to do that (currently) is to read the manifests,
since the prior backup should be stored there. I think this means that
parsing the manifest is not really optional -- it will be required to do
any kind of automation with incrementals.

My current belief is that enabling incremental backup will require
extending the manifest format either not at all or by adding one
additional line with some LSN info.

If we could foresee a need to store a bunch of additional *per-file*
details, I'd be a lot more receptive to the argument that we ought to
be using a more structured format like JSON. And it doesn't seem
impossible that such a thing could happen, but I don't think it's at
all clear that it actually will happen, or that it will happen soon
enough that we ought to be worrying about it now.

It's possible that we're chasing a real problem here, and if there's
something we can agree on and get done I'd rather do that than argue,
but I am still quite suspicious that there's no actually serious
technical problem here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jan 16, 2020 at 3:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Here's a reviewed version of 0001. You missed fixing the MSVC build,
and there were assorted comments and other things referencing wchar.c
that needed to be cleaned up.

Wow, thanks.

Pushed that.

Another thing I'm wondering about is if any of the #ifndef FRONTEND
code should get moved *back* to src/backend/utils/mb. But that
could be a separate commit, too.

+1 for moving that stuff to a separate backend-only file.

After a brief look, I propose the following:

* I think we should just shove the "#ifndef FRONTEND" stuff in
wchar.c into mbutils.c. It doesn't seem worth inventing a whole
new file for that code, especially when it's arguably within the
remit of mbutils.c anyway.

* Let's remove the "#ifndef FRONTEND" restriction on the ICU-related
stuff in encnames.c. Even if we don't need that stuff in frontend
today, it's hardly unlikely that we will need it tomorrow. And there's
not that much bulk there anyway.

* The one positive reason for that restriction is the ereport() in
get_encoding_name_for_icu. We could change that to be the usual
#ifdef-ereport-or-printf dance, but I think there's a better way: put
the ereport at the caller, by redefining that function to return NULL
for an unsupported encoding. There's only one caller today anyhow.

* PG_char_to_encoding() and PG_encoding_to_char() can be moved to
mbutils.c; they'd fit reasonably well beside getdatabaseencoding and
pg_client_encoding. (I also thought about utils/adt/misc.c, but
that's not obviously better.)

Barring objections I'll go make this happen shortly.

Lastly, it strikes me that maybe pg_wchar.h, or parts of it, should
migrate over to src/include/common. But that'd be far more invasive
to other source files, so I've not touched the issue here.

I don't have a view on this.

If anyone is hot to do this part, please have at it. I'm not.

regards, tom lane

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#16)
Re: making the backend's json parser work in frontend code

Robert Haas <robertmhaas@gmail.com> writes:

It's possible that we're chasing a real problem here, and if there's
something we can agree on and get done I'd rather do that than argue,
but I am still quite suspicious that there's no actually serious
technical problem here.

It's entirely possible that you're right. But if this is a file format
that is meant to be exposed to user tools, we need to take a very long
view of the requirements for it. Five or ten years down the road, we
might be darn glad we spent extra time now.

regards, tom lane

#19Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#1)
Re: making the backend's json parser work in frontend code

On Thu, Jan 16, 2020 at 7:33 AM Robert Haas <robertmhaas@gmail.com> wrote:

0002 does some basic header cleanup to make it possible to include the
existing header file jsonapi.h in frontend code. The state of the JSON
headers today looks generally poor. There seems not to have been much
attempt to get the prototypes for a given source file, say foo.c, into
a header file with the same name, say foo.h. Also, dependencies
between various header files seem to have been added somewhat freely.
This patch does not come close to fixing all that, but I consider it a
modest down payment on a cleanup that probably ought to be taken
further.

0003 splits json.c into two files, json.c and jsonapi.c. All the
lexing and parsing stuff (whose prototypes are in jsonapi.h) goes into
jsonapi.c, while the stuff that pertains to the 'json' data type
remains in json.c. This also seems like a good cleanup, because to me,
at least, it's not a great idea to mix together code that is used by
both the json and jsonb data types as well as other things in the
system that want to generate or parse json together with things that
are specific to the 'json' data type.

I'm probably responsible for a good deal of the mess, so let me say thank you.

I'll have a good look at these.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#20David Steele
david@pgmasters.net
In reply to: Robert Haas (#8)
Re: making the backend's json parser work in frontend code

Hi Robert,

On 1/16/20 11:51 AM, Robert Haas wrote:

On Thu, Jan 16, 2020 at 1:37 PM David Steele <david@pgmasters.net> wrote:

The next question in my mind is given the caveat that the error handing
is questionable in the front end, can we at least render/parse valid
JSON with the code?

That's a real good question. Thanks for offering to test it; I think
that would be very helpful.

It seems to work just fine. I didn't stress it too hard, but I did put
in one escape and a multi-byte character and checked the various data types.

Attached is a test hack on pg_basebackup which produces this output:

START
FIELD "number", null 0
SCALAR TYPE 2: 123
FIELD "string", null 0
SCALAR TYPE 1: val ue-丏
FIELD "bool", null 0
SCALAR TYPE 9: true
FIELD "null", null 1
SCALAR TYPE 11: null
END

I used the callbacks because that's the first method I found, but it
seems like json_lex() might be easier to use in practice.

I think it's an issue that the entire string must be passed to the lexer
at once. That will not be great for large manifests. However, I don't
think it will be all that hard to implement an optional "want more"
callback in the lexer so JSON data can be fed in from the file in chunks.

So, that just leaves ereport() as the largest remaining issue? I'll
look at that today and Tuesday and see what I can work up.

Regards,
--
-David
david@pgmasters.net

Attachments:

json-api-client-test.diff (text/plain, +31 -0)
#21David Steele
david@pgmasters.net
In reply to: Robert Haas (#8)
#22Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#19)
#23Robert Haas
robertmhaas@gmail.com
In reply to: David Steele (#20)
#24Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Tom Lane (#17)
#25David Steele
david@pgmasters.net
In reply to: Robert Haas (#23)
#26Robert Haas
robertmhaas@gmail.com
In reply to: David Steele (#25)
#27Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#26)
#28Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#27)
#29Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#28)
#30Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#29)
#31Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alvaro Herrera (#30)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Mark Dilger (#31)
#33Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#32)
#34Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#33)
#35Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#33)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#34)
#37Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#36)
#38Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#37)
#39Bruce Momjian
bruce@momjian.us
In reply to: Alvaro Herrera (#38)
#40Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#39)
#41Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#40)
#42Daniel Verite
daniel@manitou-mail.org
In reply to: Robert Haas (#35)
#43Robert Haas
robertmhaas@gmail.com
In reply to: Daniel Verite (#42)
#44Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#39)
#45Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Mark Dilger (#31)
#46Andrew Dunstan
andrew@dunslane.net
In reply to: Mark Dilger (#45)
#47Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andrew Dunstan (#46)
#48Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#32)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#48)
#50David Steele
david@pgmasters.net
In reply to: Robert Haas (#36)
#51David Steele
david@pgmasters.net
In reply to: Tom Lane (#49)
#52Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Steele (#50)
#53David Steele
david@pgmasters.net
In reply to: Alvaro Herrera (#52)
#54Mark Dilger
mark.dilger@enterprisedb.com
In reply to: David Steele (#51)
#55Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Steele (#53)
#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#55)
#57Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Mark Dilger (#54)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#57)
#59Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#56)
#60Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#58)
#61Peter Eisentraut
peter_e@gmx.net
In reply to: Robert Haas (#59)
#62Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Alvaro Herrera (#60)
#63Robert Haas
robertmhaas@gmail.com
In reply to: Mark Dilger (#45)
#64Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#61)
#65Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Robert Haas (#63)
#66Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Robert Haas (#27)
#67Andrew Dunstan
andrew@dunslane.net
In reply to: Mark Dilger (#65)
#68Andrew Dunstan
andrew@dunslane.net
In reply to: Mark Dilger (#66)
#69Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andrew Dunstan (#68)
#70Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#68)
#71Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andrew Dunstan (#70)
#72Robert Haas
robertmhaas@gmail.com
In reply to: Mark Dilger (#71)
#73Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Robert Haas (#72)
#74Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Robert Haas (#72)
#75Robert Haas
robertmhaas@gmail.com
In reply to: Mahendra Singh Thalor (#73)
#76Robert Haas
robertmhaas@gmail.com
In reply to: Mark Dilger (#74)
#77Julien Rouhaud
rjuju123@gmail.com
In reply to: Robert Haas (#75)
#78Mahendra Singh Thalor
mahi6run@gmail.com
In reply to: Robert Haas (#75)
#79Robert Haas
robertmhaas@gmail.com
In reply to: Mahendra Singh Thalor (#78)
#80Robert Haas
robertmhaas@gmail.com
In reply to: Julien Rouhaud (#77)
#81Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#80)
#82Robert Haas
robertmhaas@gmail.com
In reply to: Mark Dilger (#74)
#83Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#79)
#84Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#81)
#85Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#83)
#86Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#85)
#87Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#86)
#88Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#87)
#89Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#88)
#90Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#89)
#91Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#90)
#92Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Robert Haas (#82)
#93Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Robert Haas (#76)
#94Andrew Dunstan
andrew@dunslane.net
In reply to: Mark Dilger (#92)
#95Robert Haas
robertmhaas@gmail.com
In reply to: Mark Dilger (#93)
#96Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#95)
#97Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#96)
#98Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#97)
#99Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#94)
#100Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Andrew Dunstan (#99)
#101Andrew Dunstan
andrew@dunslane.net
In reply to: Mark Dilger (#100)