Lex and things...
Hi,
Shot, Leon. The patch removes the #define YY_USES_REJECT from scan.c, which
means we now have expandable tokens. Of course, it also removes the
scanning of "embedded minuses", which apparently causes the optimizer to
unoptimize a little. However, the next step is attacking the limit on the
size of string literals. These seemed to be wired to YY_BUF_SIZE, or
something. Is there any reason for this?
MikeA
Ansley, Michael wrote:
Hi,
Shot, Leon. The patch removes the #define YY_USES_REJECT from scan.c, which
means we now have expandable tokens. Of course, it also removes the
scanning of "embedded minuses", which apparently causes the optimizer to
unoptimize a little.
Oh, no. Unary minus gets to the grammar parser and is recognized there as
such. Then, for numeric constants, it becomes an *embedded* minus in the
function doNegate. So a unary minus in a numeric constant is, after the
parser, an embedded minus, just as it was before the patch. In other words,
I can see no change in the representation of the grammar after patching.
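Roughly, the idea is like this (an illustrative sketch only, not the
actual backend code; the struct and the names here are made up):

    /* Sketch of the "embedded minus" idea: the lexer emits '-' and the
     * number as separate tokens; when the grammar reduces a unary minus
     * over a numeric constant, the sign is folded into the constant's
     * textual value - it becomes "embedded". */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct NumericConst
    {
        char   *val;                /* textual value, e.g. "123.45" */
    } NumericConst;

    /* Hypothetical grammar action for '-' expr, where expr turned out
     * to be a numeric constant. */
    void
    doNegateSketch(NumericConst *c)
    {
        if (c->val[0] == '-')
        {
            /* double negation: strip the embedded minus */
            memmove(c->val, c->val + 1, strlen(c->val));
        }
        else
        {
            char   *neg = malloc(strlen(c->val) + 2);

            sprintf(neg, "-%s", c->val);
            free(c->val);
            c->val = neg;
        }
    }

    int
    main(void)
    {
        NumericConst c = {strdup("123.45")};

        doNegateSketch(&c);
        printf("%s\n", c.val);      /* prints -123.45 */
        free(c.val);
        return 0;
    }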
However, the next step is attacking the limit on the
size of string literals. These seemed to be wired to YY_BUF_SIZE, or
something. Is there any reason for this?
Hmm. There is an effort under way to remove fixed-length limits entirely;
maybe someone is already doing something to the lexer in that respect? If
not, I could look at what can be done there.
--
Leon.
Shot, Leon. The patch removes the #define YY_USES_REJECT from scan.c, which
means we now have expandable tokens. Of course, it also removes the
scanning of "embedded minuses", which apparently causes the optimizer to
unoptimize a little.

Oh, no. Unary minus gets to the grammar parser and is recognized there as
such. Then, for numeric constants, it becomes an *embedded* minus in the
function doNegate. So a unary minus in a numeric constant is, after the
parser, an embedded minus, just as it was before the patch. In other words,
I can see no change in the representation of the grammar after patching.

Great.

However, the next step is attacking the limit on the size of string
literals. These seemed to be wired to YY_BUF_SIZE, or something. Is there
any reason for this?

Hmm. There is an effort under way to remove fixed-length limits entirely;
maybe someone is already doing something to the lexer in that respect? If
not, I could look at what can be done there.
Yes, me. I've removed the query string limit from psql, libpq, and as much
of the backend as I can see. I have done some (very) preliminary testing,
and managed to get a 95kB query to execute. However, the two remaining
problems that I have run into so far are token size (which you have just
removed, many thanks ;-), and string literals, which seem to be limited to
YY_BUF_SIZE (I think).
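For what it's worth, the buffer handling I ended up with is shaped roughly
like this (a sketch only, with error checks trimmed; the names are
illustrative, not the actual psql/libpq code):

    /* Sketch of an expandable query-string buffer: instead of a fixed
     * char query[QUERY_BUF_SIZE], grow the allocation as input arrives. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct QueryBuf
    {
        char   *data;
        size_t  len;                /* bytes used, excluding the '\0' */
        size_t  cap;                /* bytes allocated */
    } QueryBuf;

    void
    qb_init(QueryBuf *qb)
    {
        qb->cap = 1024;
        qb->len = 0;
        qb->data = malloc(qb->cap);
        qb->data[0] = '\0';
    }

    void
    qb_append(QueryBuf *qb, const char *s)
    {
        size_t  n = strlen(s);

        while (qb->len + n + 1 > qb->cap)
        {
            qb->cap *= 2;           /* grow geometrically */
            qb->data = realloc(qb->data, qb->cap);
        }
        memcpy(qb->data + qb->len, s, n + 1);
        qb->len += n;
    }

    int
    main(void)
    {
        QueryBuf    qb;
        int         i;

        qb_init(&qb);
        for (i = 0; i < 10000; i++) /* accumulate a ~100kB query */
            qb_append(&qb, "SELECT 1; ");
        printf("%lu bytes accumulated\n", (unsigned long) qb.len);
        free(qb.data);
        return 0;
    }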
You see, if I can get the query string limit removed, perhaps someone who
knows a bit more than I do will do something like, hmmm, say, remove the
block size limit from tuple size... hint, hint... anybody...
MikeA
Sorry, I forgot to mention in the previous mail: I sent patches to the
patches mailing list (available from the web server), which patch psql,
libpq, and scan.l (except for your patch). They were sent at the beginning
of this month, so maybe get them and see how they work for you.
Ansley, Michael wrote:
Sorry, I forgot to mention in the previous mail: I sent patches to the
patches mailing list (available from the web server), which patch psql,
libpq, and scan.l (except for your patch). They were sent at the beginning
of this month, so maybe get them and see how they work for you.
Hmm. This is beta-testing? I'm afraid I don't have many resources for it
(time, experience, etc.). What I can do now is make li-i-ittle changes
(improvements, I hope) to the code :)
--
Leon.
Ansley, Michael wrote:
Hmm. There is an effort under way to remove fixed-length limits entirely;
maybe someone is already doing something to the lexer in that respect? If
not, I could look at what can be done there.

Yes, me. I've removed the query string limit from psql, libpq, and as much
of the backend as I can see. I have done some (very) preliminary testing,
and managed to get a 95kB query to execute. However, the two remaining
problems that I have run into so far are token size (which you have just
removed, many thanks ;-),
I'm afraid not. There is an arbitrary limit (named NAMEDATALEN) in the
lexer. If an identifier exceeds it, it gets a '\0' at that limit, so it is
effectively truncated. Strings are also limited by MAX_PARSE_BUFFER, which
finally comes down to something like QUERY_BUF_SIZE = 8k*2.
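To illustrate the truncation (a sketch only, not the actual lexer code;
the NAMEDATALEN value here is assumed):

    #include <string.h>

    #define NAMEDATALEN 32          /* assumed value, for illustration */

    /* An identifier longer than NAMEDATALEN - 1 chars is silently cut:
     * a '\0' is written at the limit, the rest is simply ignored. */
    void
    truncate_identifier(char *ident)
    {
        if (strlen(ident) >= NAMEDATALEN)
            ident[NAMEDATALEN - 1] = '\0';
    }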
It seems that string literals are the primary target, because that is the
real-life constraint here now. This is not the case with supposedly huge
identifiers. Should I work on it, or will you do it yourself?
and string literals, which seem to be limited to YY_BUF_SIZE (I think).
--
Leon.
As far as I understand it, the MAX_PARSE_BUFFER limit only applies if char
parsestring[] is used, not if char *parsestring is used. This is the whole
reason for using flex. And scan.l is set up to compile using char
*parsestring, not char parsestring[].
The NAMEDATALEN limit is imposed by the db structure, and is the limit of an
identifier. Because this is not actual data, I'm not too concerned with
this at the moment. As long as we can get pretty much unlimited data into
the tuples, I don't care what I have to call my tables, views, procedures,
etc.
Leon wrote:
I'm afraid not. There is an arbitrary limit (named NAMEDATALEN) in the
lexer. If an identifier exceeds it, it gets a '\0' at that limit, so it is
effectively truncated. Strings are also limited by MAX_PARSE_BUFFER, which
finally comes down to something like QUERY_BUF_SIZE = 8k*2.
I think NAMEDATALEN refers to the size of a NAME field in the database,
which is used to store attribute names etc. So you cannot exceed
NAMEDATALEN, or the identifier won't fit into the system tables.
Adriaan
Ansley, Michael wrote:
As far as I understand it, the MAX_PARSE_BUFFER limit only applies if char
parsestring[] is used, not if char *parsestring is used. This is the whole
reason for using flex. And scan.l is set up to compile using char
*parsestring, not char parsestring[].
What is defined explicitly:
    #ifdef YY_READ_BUF_SIZE
    #undef YY_READ_BUF_SIZE
    #endif
    #define YY_READ_BUF_SIZE MAX_PARSE_BUFFER

(these strings are repeated twice :)

    ...
    char literal[MAX_PARSE_BUFFER];
    ...
    <xq>{xqliteral}    {
                if ((llen + yyleng) > (MAX_PARSE_BUFFER - 1))
                    elog(ERROR, "quoted string parse buffer of %d chars exceeded",
                         MAX_PARSE_BUFFER);
                memcpy(literal + llen, yytext, yyleng + 1);
                llen += yyleng;
            }
Seems that limits are everywhere ;)
--
Leon.
Adriaan Joubert wrote:
I think NAMEDATALEN refers to the size of a NAME field in the database,
which is used to store attribute names etc. So you cannot exceed
NAMEDATALEN, or the identifier won't fit into the system tables.
Ok. Let's leave identifiers alone.
--
Leon.
Yes, I'll go with that.
Ansley, Michael wrote:
As far as I understand it, the MAX_PARSE_BUFFER limit only applies if char
parsestring[] is used, not if char *parsestring is used. This is the whole
reason for using flex. And scan.l is set up to compile using char
*parsestring, not char parsestring[].

What is defined explicitly:

    #ifdef YY_READ_BUF_SIZE
    #undef YY_READ_BUF_SIZE
    #endif
    #define YY_READ_BUF_SIZE MAX_PARSE_BUFFER

(these strings are repeated twice :)
I noticed that, but hey, who am I to argue.
    ...
    char literal[MAX_PARSE_BUFFER];
    ...
    <xq>{xqliteral}    {
                if ((llen + yyleng) > (MAX_PARSE_BUFFER - 1))
                    elog(ERROR, "quoted string parse buffer of %d chars exceeded",
                         MAX_PARSE_BUFFER);
                memcpy(literal + llen, yytext, yyleng + 1);
                llen += yyleng;
            }

Seems that limits are everywhere ;)
--
Leon.
I think we can turn literal into a char *, if we change the code for
<xq>{xqliteral}. This doesn't look like it will be too much of a mission,
but the outer limit is going to be close to the block size, because tuples
can't expand past the end of a block. I think that it would be wise to
leave this limit in place until such time as the tuple size limit is fixed.
Then we can remove it.
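When the time comes, something along these lines might do it (a sketch
only; I'm using plain realloc here, though the real code may want the
backend's own allocator):

    /* literal becomes a growable buffer instead of a fixed array */
    static char    *literal = NULL;
    static size_t   lalloc = 0;

    <xq>{xqliteral}    {
                if (llen + yyleng + 1 > lalloc)
                {
                    /* grow geometrically so long literals stay cheap;
                     * realloc(NULL, n) acts as malloc on first use */
                    lalloc = 2 * (llen + yyleng + 1);
                    literal = realloc(literal, lalloc);
                    if (literal == NULL)
                        elog(ERROR, "out of memory scanning string literal");
                }
                memcpy(literal + llen, yytext, yyleng + 1);
                llen += yyleng;
            }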
So, for the moment, I think we can consider the job pretty much done, apart
from bug-fixes. We can revisit the MAX_PARSE_BUFFER limit when tuple size
is delinked from block size. My aim with this work was to remove the
general limit on the length of a query string, and that has basically been
achieved. We have, as a result of the work, come across other limits, but
those have dependencies, and will have to wait.
Cheers...
MikeA