Bison state table

Started by Bruce Momjianalmost 7 years ago6 messages
#1Bruce Momjian
bruce@momjian.us

With our scanner keywords list now more cache-aware, and with us
planning to use Bison for years to come, has anyone ever looked at
reordering the bison state machine array to be more cache aware, e.g.,
having common states next to each other rather than scattered around the
array?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#2David Fetter
david@fetter.org
In reply to: Bruce Momjian (#1)
Re: Bison state table

On Fri, Jan 25, 2019 at 05:38:59PM -0500, Bruce Momjian wrote:

With our scanner keywords list now more cache-aware, and with us
planning to use Bison for years to come, has anyone ever looked at
reordering the bison state machine array to be more cache aware, e.g.,
having common states next to each other rather than scattered around the
array?

Do we have a pretty good idea of what commonly grouped states are, or
at least a way to get some not-awful estimates of what they are?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#3Bruce Momjian
bruce@momjian.us
In reply to: David Fetter (#2)
Re: Bison state table

On Sat, Jan 26, 2019 at 12:49:50AM +0100, David Fetter wrote:

On Fri, Jan 25, 2019 at 05:38:59PM -0500, Bruce Momjian wrote:

With our scanner keywords list now more cache-aware, and with us
planning to use Bison for years to come, has anyone ever looked at
reordering the bison state machine array to be more cache aware, e.g.,
having common states next to each other rather than scattered around the
array?

Do we have a pretty good idea of what commonly grouped states are, or
at least a way to get some not-awful estimates of what they are?

Uh, I am afraid we would need to profile the grammer with some sample
queries and then base the reordering on that, kind of how compilers use
sampling to do branch prediction. Yeah, crazy idea, I know. I figured
some smart person might have written a tool to do that.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
#4David Fetter
david@fetter.org
In reply to: Bruce Momjian (#3)
Re: Bison state table

On Fri, Jan 25, 2019 at 06:55:56PM -0500, Bruce Momjian wrote:

On Sat, Jan 26, 2019 at 12:49:50AM +0100, David Fetter wrote:

On Fri, Jan 25, 2019 at 05:38:59PM -0500, Bruce Momjian wrote:

With our scanner keywords list now more cache-aware, and with us
planning to use Bison for years to come, has anyone ever looked at
reordering the bison state machine array to be more cache aware, e.g.,
having common states next to each other rather than scattered around the
array?

Do we have a pretty good idea of what commonly grouped states are, or
at least a way to get some not-awful estimates of what they are?

Uh, I am afraid we would need to profile the grammer with some sample
queries and then base the reordering on that, kind of how compilers use
sampling to do branch prediction. Yeah, crazy idea, I know. I figured
some smart person might have written a tool to do that.

Since you're framing it that way, maybe there's something clever to do
with the llvm toolchain, which just happens to have facilities like
that. Perhaps smart people have already done this...

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#5John Naylor
john.naylor@2ndquadrant.com
In reply to: Bruce Momjian (#1)
1 attachment(s)
Re: Bison state table

On Sat, Jan 26, 2019 at 5:39 AM Bruce Momjian <bruce@momjian.us> wrote:

With our scanner keywords list now more cache-aware, and with us
planning to use Bison for years to come, has anyone ever looked at
reordering the bison state machine array to be more cache aware, e.g.,
having common states next to each other rather than scattered around the
array?

I recently spent some time investigating how the various arrays are
generated and accessed, and found this informative article:

https://www.cs.uic.edu/~spopuri/cparser.html

It's dated from 2006, but still seems to be correct on the whole,
judging by the gram.c output file. Anyway, the short answer is,
grouping common states would require changing Bison's algorithm for
compressing a sparse 2-D array into multiple less-sparse 1-D arrays,
if it's even possible to control the effect you have in mind.

That said, I had an idea. When Bison consults yytable, it has to also
consult yycheck with the same index to make sure the result is valid.
If the two arrays of int16 were merged into a single array of structs,
the state and the check would be on the same cache line. I tried this
by directly patching the gram.c output file (see attached for the
basic idea) and #include'-ing a separate file with the merged array.
It didn't improve raw parsing microbenchmarks using information schema
or short pgbench-like queries. So I'm guessing either a). the
microbenchmark already has better cache behavior than real queries so
won't show much difference here, and/or b). the bottleneck is
elsewhere than accessing yytable and yycheck.

In any case, I hope to follow Bison development and test any
performance improvements the maintainers come up with, as that was
reported to be on the roadmap:

/messages/by-id/74DD0F55-F3CF-447E-ACF2-88C01E42AAC3@lrde.epita.fr

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

gram-combine.patchapplication/octet-stream; name=gram-combine.patchDownload
--- src/backend/parser/gram_orig.c	2019-08-10 18:25:04.000000000 +0700
+++ src/backend/parser/gram.c	2019-08-10 19:05:21.000000000 +0700
@@ -1525,6 +1525,8 @@
 #define YYTRANSLATE(YYX)						\
   ((unsigned int) (YYX) <= YYMAXUTOK ? yytranslate[YYX] : YYUNDEFTOK)
 
+#include "gram_ext.c"
+
 /* YYTRANSLATE[YYLEX] -- Bison symbol number corresponding to YYLEX.  */
 static const yytype_uint16 yytranslate[] =
 {
@@ -27109,7 +27111,7 @@
       yyfmt = yystpcpy (yyformat, yyunexpected);
 
       for (yyx = yyxbegin; yyx < yyxend; ++yyx)
-	if (yycheck[yyx + yyn] == yyx && yyx != YYTERROR)
+	if (yytable_check[yyx + yyn].yy_check == yyx && yyx != YYTERROR)
 	  {
 	    if (yycount == YYERROR_VERBOSE_ARGS_MAXIMUM)
 	      {
@@ -27446,9 +27448,9 @@
   /* If the proper action on seeing token YYTOKEN is to reduce or to
      detect an error, take that action.  */
   yyn += yytoken;
-  if (yyn < 0 || YYLAST < yyn || yycheck[yyn] != yytoken)
+  if (yyn < 0 || YYLAST < yyn || yytable_check[yyn].yy_check != yytoken)
     goto yydefault;
-  yyn = yytable[yyn];
+  yyn = yytable_check[yyn].yy_table;
   if (yyn <= 0)
     {
       if (yyn == 0 || yyn == YYTABLE_NINF)
@@ -45276,8 +45278,8 @@
   yyn = yyr1[yyn];
 
   yystate = yypgoto[yyn - YYNTOKENS] + *yyssp;
-  if (0 <= yystate && yystate <= YYLAST && yycheck[yystate] == *yyssp)
-    yystate = yytable[yystate];
+  if (0 <= yystate && yystate <= YYLAST && yytable_check[yystate].yy_check == *yyssp)
+    yystate = yytable_check[yystate].yy_table;
   else
     yystate = yydefgoto[yyn - YYNTOKENS];
 
@@ -45388,9 +45390,9 @@
       if (yyn != YYPACT_NINF)
 	{
 	  yyn += YYTERROR;
-	  if (0 <= yyn && yyn <= YYLAST && yycheck[yyn] == YYTERROR)
+	  if (0 <= yyn && yyn <= YYLAST && yytable_check[yyn].yy_check == YYTERROR)
 	    {
-	      yyn = yytable[yyn];
+	      yyn = yytable_check[yyn].yy_table;
 	      if (0 < yyn)
 		break;
 	    }
#6Bruce Momjian
bruce@momjian.us
In reply to: John Naylor (#5)
Re: Bison state table

On Tue, Aug 13, 2019 at 05:09:30PM +0700, John Naylor wrote:

On Sat, Jan 26, 2019 at 5:39 AM Bruce Momjian <bruce@momjian.us> wrote:

With our scanner keywords list now more cache-aware, and with us
planning to use Bison for years to come, has anyone ever looked at
reordering the bison state machine array to be more cache aware, e.g.,
having common states next to each other rather than scattered around the
array?

I recently spent some time investigating how the various arrays are
generated and accessed, and found this informative article:

https://www.cs.uic.edu/~spopuri/cparser.html

It's dated from 2006, but still seems to be correct on the whole,
judging by the gram.c output file. Anyway, the short answer is,
grouping common states would require changing Bison's algorithm for
compressing a sparse 2-D array into multiple less-sparse 1-D arrays,
if it's even possible to control the effect you have in mind.

That said, I had an idea. When Bison consults yytable, it has to also
consult yycheck with the same index to make sure the result is valid.
If the two arrays of int16 were merged into a single array of structs,
the state and the check would be on the same cache line. I tried this
by directly patching the gram.c output file (see attached for the
basic idea) and #include'-ing a separate file with the merged array.
It didn't improve raw parsing microbenchmarks using information schema
or short pgbench-like queries. So I'm guessing either a). the
microbenchmark already has better cache behavior than real queries so
won't show much difference here, and/or b). the bottleneck is
elsewhere than accessing yytable and yycheck.

In any case, I hope to follow Bison development and test any
performance improvements the maintainers come up with, as that was
reported to be on the roadmap:

Very interesting. Thanks for researching this.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +