PQunescapeBytea code

Started by Jeroen T. Vermeulenabout 22 years ago5 messages

Someething to consider for after the 7.4 release, perhaps...

As per today's CVS version, PQunescapeBytea() does the following when
it encounters an escaped character (i.e., a backslash) in the escaped
string strtext at offset i:

["if (strtext[i] == '\\')"]

i++;
if (strtext[i] == '\\')
buffer[j++] = strtext[i++];
else
{
if ((isdigit(strtext[i])) &&
(isdigit(strtext[i + 1])) &&
(isdigit(strtext[i + 2])))
{
byte = VAL(strtext[i++]);
byte = (byte << 3) + VAL(strtext[i++]);
buffer[j++] = (byte << 3) + VAL(strtext[i++]);
}
}

This code completely ignores any other usage of the backslash in the
escaped string, generating no output for unknown escape sequences. Is
that the desired behaviour? The code would be a little simpler if it
were to allow al characters to be escaped, which means ignoring the
backslash but not the following character:

i++;
if (isdigit(strtext[i]) && isdigit(strtext[i+1]) && isdigit(strtext[i+2]))
{
byte = VAL(strtext[i]);
i++;
byte = (byte<<3) + VAL(strtext[i]);
i++;
byte = (byte<<3) + VAL(strtext[i]);
buffer[j++] = byte;
}
else
{
buffer[j++] = strtext[i++];
}

In fact, the "else" part is identical to the normal (non-escaped) part of
the loop, so it could probably be merged--leaving only the octal parsing
part as a special case.

Then the whole loop could become something like this:

[unsigned char c;]

for (i=j=buflen=0; i<(int)strtextlen; ++i, buffer[j++]=c)
{
c = strtext[i];
if (c == '\\')
{
c = strtext[i++]; /* Skip backslash */
if (isdigit(c) && isdigit(strtext[i+1]) && isdigit(strtext[i+2]))
{
/* Parse octal number */
byte = VAL(strtext[i++]);
byte = (byte << 3) + VAL(strtext[i++]);
c = (byte << 3) + VAL(strtext[i]);
}
}
}

...Which saves 8 lines, reduces the number of special cases, adds some
comments, and permits arbitrary characters to be escaped.

Jeroen

In reply to: Jeroen T. Vermeulen (#1)
Re: PQunescapeBytea code

On Thu, Oct 30, 2003 at 08:24:13PM +0100, Jeroen T. Vermeulen wrote:

Then the whole loop could become something like this:

Okay, that code wasn't entirely correct but it gets the idea across. In
C++ terms, what I arrived at was:

string result;
for (int i=0; i<F.size(); ++i)
{
unsigned char c = p[i];
if (c == '\\')
{
c = p[++i];
if (isdigit(c) && isdigit(p[i+1]) && isdigit(p[i+2]))
{
c = (VAL(p[c])<<9) | (VAL(p[i+1])<<3) | VAL(p[i+2]);
i += 2;
}
}
result += char(c);
}

Simple, no?

Jeroen

#3Adam Kavan
akavan@cox.net
In reply to: Jeroen T. Vermeulen (#2)
Re: PQunescapeBytea code

Actually I was looking at that code today and it does not ignore something
if it is escaped by a backslash on not on the list. It eats the backslash
and then continues the loop so next time that character will be parsed
normally. However PQunescapeBytea is _very_ slow. I am storing fairly
large (several hundered K) byte strings into Bytea's and it can take 30
seconds or more to convert them back into binary data. I wrote a new
version of PQunescapeBytea that uses pointers instead of arrays to store
the string in, this increases the speed about 30 fold on my strings and
still has the same behavior. I wasn't sure if this would be something I
should submit as a patch or not, is anyone interested in this?

If they are I'll try to figure out how to submit a patch.

--- Adam Kavan
--- akavan@cox.net
#4Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Adam Kavan (#3)
Re: PQunescapeBytea code

Adam Kavan wrote:

Actually I was looking at that code today and it does not ignore something
if it is escaped by a backslash on not on the list. It eats the backslash
and then continues the loop so next time that character will be parsed
normally. However PQunescapeBytea is _very_ slow. I am storing fairly
large (several hundered K) byte strings into Bytea's and it can take 30
seconds or more to convert them back into binary data. I wrote a new
version of PQunescapeBytea that uses pointers instead of arrays to store
the string in, this increases the speed about 30 fold on my strings and
still has the same behavior. I wasn't sure if this would be something I
should submit as a patch or not, is anyone interested in this?

If they are I'll try to figure out how to submit a patch.

Are you testing againts 7.3.X or 7.4? 7.4 has a faster version. If you
are testing against 7.4, do a diff -c against the old and new files and
send it to the patches list.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeroen T. Vermeulen (#1)
Re: PQunescapeBytea code

"Jeroen T. Vermeulen" <jtv@xs4all.nl> writes:

This code completely ignores any other usage of the backslash in the
escaped string, generating no output for unknown escape sequences. Is
that the desired behaviour?

As Adam pointed out, the code does do the right thing for other
backslash sequences; it just processes the character on the following
loop iteration. I'll add a comment to explain this.

I see another issue here, which is that for a zero-length input string
we will do malloc(0) and realloc(foo, 0), neither of which are very
portable. Will fix.

regards, tom lane