pgsql: Get rid of backtracking in jsonpath_scan.l

Started by Alexander Korotkovalmost 7 years ago14 messages
#1Alexander Korotkov
akorotkov@postgresql.org

Get rid of backtracking in jsonpath_scan.l

Non-backtracking flex parsers work faster than backtracking ones. So, this
commit gets rid of backtracking in jsonpath_scan.l. That required explicit
handling of some cases as well as manual backtracking for some cases. More
regression tests for numerics are added.

Discussion: https://mail.google.com/mail/u/0?ik=a20b091faa&view=om&permmsgid=msg-f%3A1628425344167939063
Author: John Naylor, Nikita Gluknov, Alexander Korotkov

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1d88a75c424664cc85f307a876cde85191d27272

Modified Files
--------------
src/backend/utils/adt/Makefile | 1 +
src/backend/utils/adt/jsonpath_scan.l | 56 +++--
src/test/regress/expected/jsonb_jsonpath.out | 2 +-
src/test/regress/expected/jsonpath.out | 168 +++++++++++++++
src/test/regress/expected/jsonpath_encoding.out | 249 ++++++++++++++++++++++
src/test/regress/expected/jsonpath_encoding_1.out | 237 ++++++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/serial_schedule | 1 +
src/test/regress/sql/jsonb_jsonpath.sql | 2 +-
src/test/regress/sql/jsonpath.sql | 30 +++
src/test/regress/sql/jsonpath_encoding.sql | 71 ++++++
11 files changed, 795 insertions(+), 24 deletions(-)

#2Robert Haas
robertmhaas@gmail.com
In reply to: Alexander Korotkov (#1)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On Mon, Mar 25, 2019 at 8:44 AM Alexander Korotkov
<akorotkov@postgresql.org> wrote:

Discussion: https://mail.google.com/mail/u/0?ik=a20b091faa&amp;view=om&amp;permmsgid=msg-f%3A1628425344167939063

This is really a pretty evil link to include in a commit message.
When I clicked on it, it logged me out of my gmail account.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Robert Haas (#2)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On Mon, Mar 25, 2019 at 3:56 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Mar 25, 2019 at 8:44 AM Alexander Korotkov
<akorotkov@postgresql.org> wrote:

Discussion: https://mail.google.com/mail/u/0?ik=a20b091faa&amp;view=om&amp;permmsgid=msg-f%3A1628425344167939063

This is really a pretty evil link to include in a commit message.
When I clicked on it, it logged me out of my gmail account.

Oh, such a shameful oversight!
That should be:
/messages/by-id/CACPNZCuUXV3jEPFPsRw+4AKLvmO6CFWh3OwtH0CJv3w0oXnVoQ@mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#4Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Alexander Korotkov (#1)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On 3/25/19 8:44 AM, Alexander Korotkov wrote:

Get rid of backtracking in jsonpath_scan.l

Non-backtracking flex parsers work faster than backtracking ones. So, this
commit gets rid of backtracking in jsonpath_scan.l. That required explicit
handling of some cases as well as manual backtracking for some cases. More
regression tests for numerics are added.

jacana appears to be having trouble with this:

2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:6] LOG: server process (PID 8368) was terminated by exception 0xC0000028
2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:7] DETAIL: Failed process was running: select '$ ? (@ like_regex "pattern" flag "a")'::jsonpath;
2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:8] HINT: See C include file "ntstatus.h" for a description of the hexadecimal value.
2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:9] LOG: terminating any other active server processes

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#4)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 3/25/19 8:44 AM, Alexander Korotkov wrote:

Get rid of backtracking in jsonpath_scan.l

jacana appears to be having trouble with this:

I wonder if that's related to the not-very-reproducible failures
we've seen on snapper. Can you get a stack trace?

regards, tom lane

#6Alexander Korotkov
a.korotkov@postgrespro.ru
In reply to: Tom Lane (#5)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On Tue, Mar 26, 2019 at 5:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

On 3/25/19 8:44 AM, Alexander Korotkov wrote:

Get rid of backtracking in jsonpath_scan.l

jacana appears to be having trouble with this:

I wonder if that's related to the not-very-reproducible failures
we've seen on snapper. Can you get a stack trace?

+1

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrew Dunstan (#4)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On 2019-Mar-26, Andrew Dunstan wrote:

On 3/25/19 8:44 AM, Alexander Korotkov wrote:

Get rid of backtracking in jsonpath_scan.l

Non-backtracking flex parsers work faster than backtracking ones. So, this
commit gets rid of backtracking in jsonpath_scan.l. That required explicit
handling of some cases as well as manual backtracking for some cases. More
regression tests for numerics are added.

jacana appears to be having trouble with this:

2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:6] LOG: server process (PID 8368) was terminated by exception 0xC0000028

0xC0000028 is STATUS_BAD_STACK, per
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
Not sure how credible/useful a stack trace is going to be.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Alvaro Herrera (#7)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On 3/26/19 11:22 AM, Alvaro Herrera wrote:

On 2019-Mar-26, Andrew Dunstan wrote:

On 3/25/19 8:44 AM, Alexander Korotkov wrote:

Get rid of backtracking in jsonpath_scan.l

Non-backtracking flex parsers work faster than backtracking ones. So, this
commit gets rid of backtracking in jsonpath_scan.l. That required explicit
handling of some cases as well as manual backtracking for some cases. More
regression tests for numerics are added.

jacana appears to be having trouble with this:

2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:6] LOG: server process (PID 8368) was terminated by exception 0xC0000028

0xC0000028 is STATUS_BAD_STACK, per
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
Not sure how credible/useful a stack trace is going to be.

Right, and getting stack traces isn't easy in any case. There is a
gadget from Google that is supposed to trap exceptions and produce a
stack trace on the fly in mingw. I'm going to take a look at it,
although loading it might be ... interesting.

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#7)
1 attachment(s)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On 2019-Mar-26, Alvaro Herrera wrote:

2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:6] LOG: server process (PID 8368) was terminated by exception 0xC0000028

0xC0000028 is STATUS_BAD_STACK, per
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
Not sure how credible/useful a stack trace is going to be.

BTW I think we should update our message to use this URL instead of
ambiguously pointing to "ntstatus.h". Also, all the URLs in
win32_port.h (except the wine one) are dead.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

ntstatus.patchtext/x-diff; charset=us-asciiDownload
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index f84f882c4cb..549c82ea627 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -647,7 +647,8 @@ pgarch_archiveXlog(char *xlog)
 			ereport(lev,
 					(errmsg("archive command was terminated by exception 0x%X",
 							WTERMSIG(rc)),
-					 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+					 errhint("See \"%s\" for a description of the hexadecimal value.",
+							 "https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55"),
 					 errdetail("The failed archive command was: %s",
 							   xlogarchcmd)));
 #else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index fe599632d3d..a540d51d79a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3630,7 +3630,8 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 		  "server process" */
 				(errmsg("%s (PID %d) was terminated by exception 0x%X",
 						procname, pid, WTERMSIG(exitstatus)),
-				 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+				 errhint("See \"%s\" for a description of the hexadecimal value.",
+						 "https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55"),
 				 activity ? errdetail("Failed process was running: %s", activity) : 0));
 #else
 		ereport(lev,
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index f4841fb3975..12abd08cda4 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -107,7 +107,7 @@
  *	similar to a unix-style signal exit (think SIGSEGV ==
  *	STATUS_ACCESS_VIOLATION).  Return values are broken up into groups:
  *
- *	http://msdn2.microsoft.com/en-gb/library/aa489609.aspx
+ *	https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/87fba13e-bf06-450e-83b1-9241dc81e781
  *
  *		NT_SUCCESS			0 - 0x3FFFFFFF
  *		NT_INFORMATION		0x40000000 - 0x7FFFFFFF
@@ -119,24 +119,10 @@
  *	by the system, and it seems values >= 0x100 are system-generated.
  *	See this URL for a list of WIN32 STATUS_* values:
  *
- *		Wine (URL used in our error messages) -
+ *		https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
+ *
+ *		Wine -
  *			http://source.winehq.org/source/include/ntstatus.h
- *		Descriptions - http://www.comp.nus.edu.sg/~wuyongzh/my_doc/ntstatus.txt
- *		MS SDK - http://www.nologs.com/ntstatus.html
- *
- *	It seems the exception lists are in both ntstatus.h and winnt.h, but
- *	ntstatus.h has a more comprehensive list, and it only contains
- *	exception values, rather than winnt, which contains lots of other
- *	things:
- *
- *		http://www.microsoft.com/msj/0197/exception/exception.aspx
- *
- *		The ExceptionCode parameter is the number that the operating system
- *		assigned to the exception. You can see a list of various exception codes
- *		in WINNT.H by searching for #defines that start with "STATUS_". For
- *		example, the code for the all-too-familiar STATUS_ACCESS_VIOLATION is
- *		0xC0000005. A more complete set of exception codes can be found in
- *		NTSTATUS.H from the Windows NT DDK.
  *
  *	Some day we might want to print descriptions for the most common
  *	exceptions, rather than printing an include file name.  We could use
#10Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Alvaro Herrera (#9)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On 3/26/19 12:53 PM, Alvaro Herrera wrote:

On 2019-Mar-26, Alvaro Herrera wrote:

2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:6] LOG: server process (PID 8368) was terminated by exception 0xC0000028

0xC0000028 is STATUS_BAD_STACK, per
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
Not sure how credible/useful a stack trace is going to be.

BTW I think we should update our message to use this URL instead of
ambiguously pointing to "ntstatus.h". Also, all the URLs in
win32_port.h (except the wine one) are dead.

That's a fairly awful URL. How stable is it likely to be?

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#9)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

0xC0000028 is STATUS_BAD_STACK, per
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
Not sure how credible/useful a stack trace is going to be.

BTW I think we should update our message to use this URL instead of
ambiguously pointing to "ntstatus.h".

I've never cared for the ntstatus.h reference, but how stable is
the URL you suggest going to be? That UUID or whatever it is
does not inspire confidence.

regards, tom lane

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#11)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On 2019-Mar-26, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

0xC0000028 is STATUS_BAD_STACK, per
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
Not sure how credible/useful a stack trace is going to be.

BTW I think we should update our message to use this URL instead of
ambiguously pointing to "ntstatus.h".

I've never cared for the ntstatus.h reference, but how stable is
the URL you suggest going to be? That UUID or whatever it is
does not inspire confidence.

That's true. Before posting, I looked for a statement about URL
stability, couldn't find anything. I suppose one currently working URL
is better than four currently dead URLs.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#12)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

On 2019-Mar-26, Tom Lane wrote:

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55

I've never cared for the ntstatus.h reference, but how stable is
the URL you suggest going to be? That UUID or whatever it is
does not inspire confidence.

That's true. Before posting, I looked for a statement about URL
stability, couldn't find anything. I suppose one currently working URL
is better than four currently dead URLs.

It looks like we could just point to the parent page,

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/

That requires a little more drilling down, but it seems likely to
remain stable across versions, whereas I strongly suspect the URL
you mention is version-specific.

regards, tom lane

#14Andrew Dunstan
andrew.dunstan@2ndquadrant.com
In reply to: Andrew Dunstan (#8)
Re: pgsql: Get rid of backtracking in jsonpath_scan.l

On 3/26/19 12:36 PM, Andrew Dunstan wrote:

On 3/26/19 11:22 AM, Alvaro Herrera wrote:

On 2019-Mar-26, Andrew Dunstan wrote:

On 3/25/19 8:44 AM, Alexander Korotkov wrote:

Get rid of backtracking in jsonpath_scan.l

Non-backtracking flex parsers work faster than backtracking ones. So, this
commit gets rid of backtracking in jsonpath_scan.l. That required explicit
handling of some cases as well as manual backtracking for some cases. More
regression tests for numerics are added.

jacana appears to be having trouble with this:

2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:6] LOG: server process (PID 8368) was terminated by exception 0xC0000028

0xC0000028 is STATUS_BAD_STACK, per
https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
Not sure how credible/useful a stack trace is going to be.

Right, and getting stack traces isn't easy in any case. There is a
gadget from Google that is supposed to trap exceptions and produce a
stack trace on the fly in mingw. I'm going to take a look at it,
although loading it might be ... interesting.

Still working on this. However, I have another data point. On a shiny
new Msys2/WS2019 system (on AWS/EC2) this does not reproduce, even
though jacana seems to be producing it quite reliably.

To get the backtrace gadget working I think I'm going to have to add
some code like:

#if defined(WIN32) && defined(LOAD_BACKTRACE)

    LoadLibraryA("backtrace.dll");

#endif

cheers

andrew

--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services