bit type external representation

Started by Thomas Lockhartover 23 years ago7 messages
#1Thomas Lockhart
lockhart@fourpalms.org

SQL9x defines bit string constants with a format like

B'101010'
and
X'ABCD'

for binary and hexadecimal representations. But at the moment we don't
explicitly handle both of these cases as bit strings; the hex version is
converted to decimal in the scanner (*really* early in the parsing
stage) and then handled as an integer.

It looks like our current bit string type support looks for a "B" or "X"
embedded in the actual input string, rather than outside the quote as in
the standard.

I'd like to have more support for the SQL9x syntax, which requires a
little more invasive modification of at least the scanner and parser. I
have a couple of questions:

1) the SQL standard says what hex values should be translated to in
binary, which implies that all values may be *output* in binary format.
Should we do this, or should we preserve the info on what units were
used for input in the internal storage? Anyone interpret the standard
differently from this??

2) we would need to be able to determine the format style when a string
is output to be able to reconstruct the SQL shorthand representation (if
preserving binary or hex is to be done). So a column or value should
have a corresponding is_hex() function call. Any other suggestions??

- Thomas

#2Thomas Lockhart
lockhart@fourpalms.org
In reply to: Thomas Lockhart (#1)
Re: bit type external representation

for binary and hexadecimal representations. But at the moment we don't
explicitly handle both of these cases as bit strings; the hex version is
converted to decimal in the scanner (*really* early in the parsing
stage) and then handled as an integer.
It looks like our current bit string type support looks for a "B" or "X"
embedded in the actual input string, rather than outside the quote as in
the standard.

I should point out that this is probably for historical reasons; I'd
implemented the hex to decimal conversion way before we had bit string
support, and we didn't consolidate those features when bit strings came
along.

- Thomas

#3Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Thomas Lockhart (#1)
Re: bit type external representation

for binary and hexadecimal representations. But at the moment we don't
explicitly handle both of these cases as bit strings; the hex version is
converted to decimal in the scanner (*really* early in the parsing
stage) and then handled as an integer.

It looks like our current bit string type support looks for a "B" or "X"
embedded in the actual input string, rather than outside the quote as in
the standard.

Postgres supports both:

test=# create table test (a bit(3));
CREATE
test=# insert into test values (B'101');
INSERT 3431020 1
test=# insert into test values (b'101');
INSERT 3431021 1
test=# insert into test values ('B101');
INSERT 3431022 1
test=# insert into test values ('b101');
INSERT 3431023 1
test=# select * from test;
a
-----
101
101
101
101
(4 rows)

In fact, some of our apps actually _rely_ on the old 'b101' syntax...
Although these could be changed with some effort...

Chris

#4Thomas Lockhart
lockhart@fourpalms.org
In reply to: Christopher Kings-Lynne (#3)
Re: bit type external representation

It looks like our current bit string type support looks for a "B" or "X"
embedded in the actual input string, rather than outside the quote as in
the standard.

Postgres supports both:
test=# insert into test values (B'101');
test=# insert into test values ('B101');

Right. But internally the first example has the "B" stripped out, and
the bit string input routine assumes a binary bit string if there is no
embedded leading [bBxX]. However, if we were to just pass something
without a [xX] as an explicit prefix on the string then it will always
be interpreted as a binary bit string (remember that currently X'ABCD'
is converted to decimal, not to a bit string).

In fact, some of our apps actually _rely_ on the old 'b101' syntax...
Although these could be changed with some effort...

Yes, it seems that applications reading from something like libpq would
need to detect that this is a bit string and then figure out how to
represent it on input or output.

"Cheating" with a leading "B" or "X" helps with this a lot. Note that
simply a leading "B" is not sufficient to distinguish between a binary
value and some hex values, if we were to allow output in hex...

- Thomas

#5Joe Conway
mail@joeconway.com
In reply to: Thomas Lockhart (#1)
Re: bit type external representation

Thomas Lockhart wrote:

1) the SQL standard says what hex values should be translated to in
binary, which implies that all values may be *output* in binary format.
Should we do this, or should we preserve the info on what units were
used for input in the internal storage? Anyone interpret the standard
differently from this??

SQL99, Section "5.3 <literal>":
11) The declared type of a <bit string literal> is fixed-length
bit string. The length of a <bit string literal> is the number
of bits that it contains.
12) The declared type of a <hex string literal> is fixed-length bit
string. Each <hexit> appearing in the literal is equivalent to
a quartet of bits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E,
and F are interpreted as 0000, 0001, 0010, 0011, 0100, 0101,
0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, and 1111,
respectively. The <hexit>s a, b, c, d, e, and f have respectively
the same values as the <hexit>s A, B, C, D, E, and F.

So the standard says both represent a fixed-length bit string data type.
ISTM that we should not try to preserve any information on the units
used for input, and that both should be output in the same way.

2) we would need to be able to determine the format style when a string
is output to be able to reconstruct the SQL shorthand representation (if
preserving binary or hex is to be done). So a column or value should
have a corresponding is_hex() function call. Any other suggestions??

Based on above comment, I'd say no. We might want to be able to specify
that the output format should be hex using a formatting function though.
But I guess hex output format would have to be reserved for bit strings
that are integer multiples of 4 bits in length.

Joe

#6Thomas Lockhart
lockhart@fourpalms.org
In reply to: Thomas Lockhart (#1)
Re: bit type external representation

1) the SQL standard says what hex values should be translated to in
binary, which implies that all values may be *output* in binary format.

So the standard says both represent a fixed-length bit string data type.
ISTM that we should not try to preserve any information on the units
used for input, and that both should be output in the same way.

OK, that's how I read it too. I'll work on making it behave with hex
input and not worry about the output, which takes care of itself.

I also notice that the following has trouble:

thomas=# select bit '1010';
ERROR: bit string length does not match type bit(1)

which is similar to the very recent problem with fixed-length character
strings. I've got patches to fix this one too.

- Thomas

#7Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Lockhart (#1)
Re: bit type external representation

Thomas Lockhart writes:

SQL9x defines bit string constants with a format like

B'101010'
and
X'ABCD'

for binary and hexadecimal representations. But at the moment we don't
explicitly handle both of these cases as bit strings; the hex version is
converted to decimal in the scanner (*really* early in the parsing
stage) and then handled as an integer.

The sole reason that this is still unresolved is that the SQL standard is
ambiguous about whether a literal of the form X'something' is of type bit
(<hex string literal>) or of type blob (<binary string literal>). If I
had to choose one, I'd actually lean toward blob (or bytea in our case).
Two ideas: 1. make an "unknownhex" type and resolve it late, like the
"unknown" type. 2. allow an implicit cast from bytea to bit.

It looks like our current bit string type support looks for a "B" or "X"
embedded in the actual input string, rather than outside the quote as in
the standard.

This was a stopgap measure before the B'' syntax was implemented. I guess
it's grown to be a feature now. :-/

1) the SQL standard says what hex values should be translated to in
binary, which implies that all values may be *output* in binary format.
Should we do this, or should we preserve the info on what units were
used for input in the internal storage? Anyone interpret the standard
differently from this??

I believe you are caught in the confusion I was referring to above: hex
values are possibly not even of type bit at all.

--
Peter Eisentraut peter_e@gmx.net