BUG #1671: Long interval string representation rejected

Started by Mark Dilgeralmost 21 years ago19 messagesbugs
Jump to latest
#1Mark Dilger
mark.dilger@enterprisedb.com

The following bug has been logged online:

Bug reference: 1671
Logged by: Mark Dilger
Email address: markdilger@yahoo.com
PostgreSQL version: 8.0.2
Operating system: Mandrake Linux 9.1 (2.4.21-0.13mdkenterprise #1 SMP)
Description: Long interval string representation rejected
Details:

<QUOTE>

bash-2.05b$ psql
Welcome to psql 8.0.2, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

mark=# select '4 millenniums 5 centuries 4 decades 1 year 4 months 4 days 17
minutes 31 seconds'::interval;
ERROR: invalid input syntax for type interval: "4 millenniums 5 centuries 4
decades 1 year 4 months 4 days 17 minutes 31 seconds"
mark=# select '4 millenniums 5 centuries 4 decades 1 year 4 months 4 days 17
minutes'::interval;
interval
-----------------------------------
4541 years 4 mons 4 days 00:17:00
(1 row)

</QUOTE>

It appears that any string representation of an interval of length greater
than 76 is rejected. (76 = 51 + 25 = MAXDATELEN + MAXDATEFIELDS). This
appears to be a limitation enforced within function interval_in() in the
file src/backend/utils/adt/timestamp.c

#2Neil Conway
neilc@samurai.com
In reply to: Mark Dilger (#1)
Re: BUG #1671: Long interval string representation rejected

Mark Dilger wrote:

It appears that any string representation of an interval of length greater
than 76 is rejected. (76 = 51 + 25 = MAXDATELEN + MAXDATEFIELDS). This
appears to be a limitation enforced within function interval_in() in the
file src/backend/utils/adt/timestamp.c

Yeah, this seems bogus. It's not even clear to me why MAXDATELEN +
MAXDATEFIELDS is used as the size of that buffer in the first place. I
don't know the datetime code particularly well; perhaps someone who does
can shed some light on this?

-Neil

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#2)
Re: BUG #1671: Long interval string representation rejected

Neil Conway <neilc@samurai.com> writes:

Yeah, this seems bogus. It's not even clear to me why MAXDATELEN +
MAXDATEFIELDS is used as the size of that buffer in the first place. I
don't know the datetime code particularly well; perhaps someone who does
can shed some light on this?

My rule of thumb with the datetime code is that if it looks bogus,
it probably is :-(

There are a lot of fixed-size local buffers in that code. The ones
used in output routines seem defensible since the string to be generated
is predictable. The ones that are used for processing input are likely
wrong. OTOH I'm not eager to throw a palloc into each of those code
paths ... can we avoid that?

regards, tom lane

#4Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#3)
Re: BUG #1671: Long interval string representation rejected

Tom Lane wrote:

There are a lot of fixed-size local buffers in that code. The ones
used in output routines seem defensible since the string to be generated
is predictable. The ones that are used for processing input are likely
wrong. OTOH I'm not eager to throw a palloc into each of those code
paths ... can we avoid that?

I'm not sure offhand what the upper bounds on legal input for each of
the datetime types is. Why not just allocate a larger but still
fixed-size buffer -- say, 256 bytes?

(While we're on the subject, it seems rather silly for ParseDateTime()
not to do its own bounds checking -- all of its call sites do a strlen()
on the input buffer before calling it, which could be avoided if
ParseDateTime() we passed the size of `lowstr')

-Neil

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#4)
Re: BUG #1671: Long interval string representation rejected

Neil Conway <neilc@samurai.com> writes:

Tom Lane wrote:

There are a lot of fixed-size local buffers in that code. The ones
used in output routines seem defensible since the string to be generated
is predictable. The ones that are used for processing input are likely
wrong.

I'm not sure offhand what the upper bounds on legal input for each of
the datetime types is.

Well, if you allow for whitespace between tokens then it's immediately
clear that there is no fixed upper bound. Perhaps it would work to
downcase just one token at a time, so that the max buffer length equals
the max acceptable token?

regards, tom lane

#6Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#5)
Re: BUG #1671: Long interval string representation rejected

Tom Lane wrote:

Well, if you allow for whitespace between tokens then it's immediately
clear that there is no fixed upper bound.

Good point -- there is no upper bound on the input string, but since we
skip whitespace, AFAICS this shouldn't affect the requirements for the
size of the working buffer (lowstr). So if we passed the size of the
working buffer to ParseDateTime() (per earlier gripe), we could only
bail out when we actually need to use more working space than was
allocated, not simply when strlen(input) >= sizeof(buffer). The
implementation might be a bit ugly, though.

Perhaps it would work to downcase just one token at a time, so that
the max buffer length equals the max acceptable token?

Not sure I follow you.

-Neil

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#6)
Re: BUG #1671: Long interval string representation rejected

Neil Conway <neilc@samurai.com> writes:

Perhaps it would work to downcase just one token at a time, so that
the max buffer length equals the max acceptable token?

Not sure I follow you.

I believe that the reason for the local buffer is to hold a downcased
version of the input, which we can compare to the all-lower-case tables
of relevant keywords. So plan A for fixing it is to downcase and
compare one token at a time, instead of downcasing the whole string
at once.

Plan B is to make the token comparison routines case-insensitive
themselves, so that there's no need for a temporary copy of the input.

Probably someone can think of plans C and D too ... I don't have any
strong allegiance to any one way to fix it.

regards, tom lane

#8Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#7)
Re: BUG #1671: Long interval string representation rejected

Tom Lane wrote:

I believe that the reason for the local buffer is to hold a downcased
version of the input, which we can compare to the all-lower-case tables
of relevant keywords.

Well, that's one of the reasons, but not the only one. For example, how
do you parse '17 minutes31 seconds'::interval without either a working
buffer or the ability to resize the input buffer? (We need to split the
input into two fields, each of which must be a NUL-terminated string.
These fields are currently implemented as into the working buffer. If we
changed that to be pointers into the input string, we have no where to
put the NUL terminator. So you either need a separate buffer, repalloc()
plus shuffling everything down, or palloc'd allocation for the fields
themselves.) So I don't see that fixing the casing issue is sufficient.

On closer inspection, the current code seems to get this wrong as well
:-( For example, given the query:

select '4millenniums5centuries4decades1year4months4days17minutes31
seconds'::interval;

attaching via gdb, with a breakpoint at datetime.c:890 --

(gdb) p lp
$1 = 0xbfffdde1 "��@"
(gdb) p lowstr + 25 + 51
$4 = 0xbfffdddc "onds"

(i.e. lp points past the end of lowstr)

-Neil

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#8)
Re: BUG #1671: Long interval string representation rejected

Neil Conway <neilc@samurai.com> writes:

Tom Lane wrote:

I believe that the reason for the local buffer is to hold a downcased
version of the input, which we can compare to the all-lower-case tables
of relevant keywords.

Well, that's one of the reasons, but not the only one. For example, how
do you parse '17 minutes31 seconds'::interval without either a working
buffer or the ability to resize the input buffer?

Sorry, s/downcased/downcased and null-terminated/. I have not read the
parsing code in question lately, but offhand it seems like transferring
one token at a time into a work buffer isn't a completely broken idea...

On closer inspection, the current code seems to get this wrong as well
:-(

Wouldn't be surprised :-(

regards, tom lane

#10Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#9)
Re: BUG #1671: Long interval string representation rejected

Tom Lane wrote:

Sorry, s/downcased/downcased and null-terminated/. I have not read the
parsing code in question lately, but offhand it seems like transferring
one token at a time into a work buffer isn't a completely broken idea...

I wouldn't call it "broken", but I don't see how it could be done
without a significant refactoring of how datetime parsing is done, and
your handwaving has yet to convince me :) The gist of the current code is:

1. parse the input string into fields, which are arrays of pointers into
a working buffer, via ParseDateTime()
2. decode the fields as appropriate for the datatype via
DecodeInterval(), DecodeDatetime(), DecodeTimeOnly(), etc.

i.e. I don't see an easy way to do field decoding one field at a time.

-Neil

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#10)
Re: BUG #1671: Long interval string representation rejected

Neil Conway <neilc@samurai.com> writes:

your handwaving has yet to convince me :)

Zing ;-)

I'm too tired to think about this more tonight, but it seems like
the basic point is that we can't just construct a list of pointers
into (a copy of) the original input string. A certain amount of
data copying is going to have to occur; or else we change the
token matching code to not assume null-termination.

regards, tom lane

#12Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#11)
Re: BUG #1671: Long interval string representation rejected

Tom Lane wrote:

I'm too tired to think about this more tonight, but it seems like
the basic point is that we can't just construct a list of pointers
into (a copy of) the original input string.

I think we _can_ do it that way, it's just a question of whether that is
the best approach. I think the solution I outlined before would work
fine: pass the length of the working buffer to ParseDateTime(), and
reject the input only if the parsing process actually requires more
working space than was provided. ParseDateTime() can easily skip
whitespace without storing it in the temporary buffer -- and once we've
done that, AFAIK we _can_ place an upper bound on the amount of working
space that is needed (each field has a maximum length and there are a
maximum number of fields).

Or we could rewrite how we do parsing, which seems to be what you're
suggesting. I agree the code could do with cleanup, although we will
also need some kind of minimally-invasive fix for back branches.

-Neil

#13Neil Conway
neilc@samurai.com
In reply to: Neil Conway (#12)
Re: BUG #1671: Long interval string representation rejected

Neil Conway wrote:

I think we _can_ do it that way, it's just a question of whether that is
the best approach. I think the solution I outlined before would work
fine: pass the length of the working buffer to ParseDateTime(), and
reject the input only if the parsing process actually requires more
working space than was provided. [...]

Or we could rewrite how we do parsing, which seems to be what you're
suggesting.

Any thoughts on this, Tom? If you're planning on doing the rewrite you
described, that's fine; alternatively I'm happy to implement the fix
that I described above.

-Neil

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#13)
Re: BUG #1671: Long interval string representation rejected

Neil Conway <neilc@samurai.com> writes:

Any thoughts on this, Tom? If you're planning on doing the rewrite you
described, that's fine; alternatively I'm happy to implement the fix
that I described above.

I'm not set on it --- fix it as you suggested.

regards, tom lane

#15Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#14)
Re: BUG #1671: Long interval string representation rejected

Tom Lane wrote:

I'm not set on it --- fix it as you suggested.

Attached is a patch that implements this. I'm not especially happy about
the implementation: defining _two_ local macros (that both
doubly-evaluate one of their arguments) is pretty ugly, but I didn't see
a cleaner alternative -- suggestions welcome.

This patch fixes the stack overrun. Since we now pass the length of the
working space into ParseDateTime(), it is also easy to fix Mark's bug: I
just increased the size of the interval_in() working buffer to 256
bytes. I didn't touch any of the other input buffer sizes, although that
shouldn't be taken as a vote of confidence that they are correct :) At
least the consequence of an insufficiently-large buffer is limited to
rejecting otherwise valid input, rather than potentially scribbling on
the stack.

-Neil

Attachments:

datetime_parsing-3.patchtext/x-patch; name=datetime_parsing-3.patchDownload+158-149
#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neil Conway (#15)
Re: BUG #1671: Long interval string representation rejected

Neil Conway <neilc@samurai.com> writes:

Attached is a patch that implements this. I'm not especially happy about
the implementation: defining _two_ local macros (that both
doubly-evaluate one of their arguments) is pretty ugly, but I didn't see
a cleaner alternative -- suggestions welcome.

Considering that you're incrementing bufptr inside the macro, it hardly
seems that double-evaluation is a problem: the argument pretty much has
to be a variable. OTOH there is no reason for the input argument to be
treated that way. I'd suggest just one macro

#define APPEND_CHAR(bufptr, end, newchar) \
do \
{ \
if (((bufptr) + 1) >= (end)) \
return DTERR_BAD_FORMAT; \
*(bufptr)++ = (newchar); \
} while (0);

used as

APPEND_CHAR(bufp, bufend, *cp++);
APPEND_CHAR(bufp, bufend, pg_tolower((unsigned char) *cp++));

respectively.

regards, tom lane

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#16)
Re: BUG #1671: Long interval string representation rejected

I wrote:

#define APPEND_CHAR(bufptr, end, newchar) \
do \
{ \
if (((bufptr) + 1) >= (end)) \
return DTERR_BAD_FORMAT; \
*(bufptr)++ = (newchar); \
} while (0);

Oh BTW, there definitely should not be a semicolon at the end
of that macro ...

regards, tom lane

#18Neil Conway
neilc@samurai.com
In reply to: Tom Lane (#16)
Re: BUG #1671: Long interval string representation rejected

Tom Lane wrote:

Considering that you're incrementing bufptr inside the macro, it hardly
seems that double-evaluation is a problem: the argument pretty much has
to be a variable. OTOH there is no reason for the input argument to be
treated that way. I'd suggest just one macro [...]

Ah, yeah, that works. Attached is a revised patch -- I'll apply it
tomorrow barring any objections. I also fixed the semi-colon -- thanks
for the review.

-Neil

Attachments:

datetime_parsing-4.patchtext/x-patch; name=datetime_parsing-4.patchDownload+150-141
#19Neil Conway
neilc@samurai.com
In reply to: Neil Conway (#18)
Re: BUG #1671: Long interval string representation rejected

Neil Conway wrote:

Ah, yeah, that works. Attached is a revised patch -- I'll apply it
tomorrow barring any objections. I also fixed the semi-colon -- thanks
for the review.

Patch applied to HEAD, REL8_0_STABLE, and REL7_4_STABLE. Thanks for the
report, Mark.

-Neil