Extending a bit string

Started by Evan Carroll almost 8 years ago, 4 messages, hackers
#1 Evan Carroll
me@evancarroll.com

Currently the behavior of bit-string extensions is pretty insane.

SELECT b'01'::bit(2)::bit(4),
b'01'::bit(2)::int::bit(4);
bit | bit
------+------
0100 | 0001
(1 row)

I'd like to propose we standardize this a bit. Compatibility was already
broken once before, in version 8.0. From the 8.0 release notes (thanks to
Rhodium Toad for the research):

Casting an integer to BIT(N) selects the rightmost N bits of the integer,
not the leftmost N bits as before.

Everything should select the rightmost bits and extend on the left. From
the docs:

Casting an integer to a bit string width wider than the integer itself
will sign-extend on the left.

That makes sense to me. Integers sign-extend on the left, and the behavior
is currently undefined for bit->bit extensions. What say you?
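The two documented integer rules can be modeled in a few lines of Python. This is a sketch under stated assumptions, not PostgreSQL's implementation: the source is taken to be a 32-bit signed int (int4), a bit string is represented as a Python str of '0'/'1' characters, and the name `int_to_bit` is hypothetical.

```python
# Model of the int -> bit(n) cast rules quoted above.
# Assumptions: source is a 32-bit signed int; bit strings are '0'/'1' strs.
# int_to_bit is a hypothetical name, not a PostgreSQL function.

INT4_MIN, INT4_MAX = -(2**31), 2**31 - 1


def int_to_bit(value: int, n: int) -> str:
    """Return the bit(n) string that an int4 `value` would cast to.

    Keeps the rightmost n bits, most significant bit first. Masking a
    negative Python int yields an infinitely sign-extended two's
    complement pattern, so n > 32 sign-extends on the left for free.
    """
    if not INT4_MIN <= value <= INT4_MAX:
        raise ValueError("value does not fit in a 32-bit int")
    if n < 1:
        raise ValueError("bit length must be at least 1")
    return format(value & ((1 << n) - 1), f"0{n}b")


if __name__ == "__main__":
    print(int_to_bit(1, 4))    # 0001
    print(int_to_bit(5, 2))    # 01: truncation keeps the low-order bits
    print(int_to_bit(-1, 40))  # forty 1 bits: sign-extended on the left
```

Because Python integers have no fixed width, the same mask expression covers both truncation (n < 32) and sign extension (n > 32) without a separate branch.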

--
Evan Carroll - me@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Evan Carroll (#1)
Re: Extending a bit string

Evan Carroll <me@evancarroll.com> writes:

> Currently the behavior of bit-string extensions is pretty insane.

You've provided no support for this assertion, much less any defense
of why your proposed semantics change is any less insane. Also, if
I understood you correctly, you want to change the semantics of
casting a bitstring to a bitstring of a different length, which is
an operation that's defined by the SQL standard. You will get zero
traction on that unless you convince people that we've misread the
standard. Which is possible, but the text seems clear to me that
casting bit(2) to bit(4) requires addition of zeroes on the right:

11) If TD is fixed-length bit string, then let LTD be the length in
bits of TD. Let BLSV be the result of BIT_LENGTH(SV).
...
c) If BLSV is smaller than LTD, then TV is SV expressed as a
bit string extended on the right with LTD-BLSV bits whose
values are all 0 (zero) and a completion condition is raised:
warning - implicit zero-bit padding.

That's SQL:99 6.22 <cast specification> general rule 11) c).
(SV and TD are the source value and the target datatype for a cast.)
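Rule 11) c) accounts for the 0100 result in the first message. A minimal Python sketch of a bit-to-bit cast, again treating a bit string as a str of '0'/'1' characters; `bit_cast` is a hypothetical name. The truncation branch is not part of the quoted rule; it assumes the behavior the PostgreSQL manual describes for explicit casts to bit(n) (truncate or zero-pad on the right).

```python
# Model of an explicit bit(m) -> bit(n) cast.
# Widening follows SQL:99 6.22 general rule 11) c): zero bits on the right.
# Narrowing (assumption, per the PostgreSQL manual): keep the leftmost n bits.
# bit_cast is a hypothetical name, not a PostgreSQL function.


def bit_cast(bits: str, n: int) -> str:
    """Coerce a '0'/'1' string to exactly n bits."""
    if n < 1:
        raise ValueError("bit length must be at least 1")
    if set(bits) - {"0", "1"}:
        raise ValueError("not a bit string")
    if len(bits) < n:
        return bits + "0" * (n - len(bits))  # implicit zero-bit padding
    return bits[:n]  # truncation on the right


if __name__ == "__main__":
    print(bit_cast("01", 4))    # 0100, as in b'01'::bit(2)::bit(4)
    print(bit_cast("0101", 2))  # 01
```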

In hindsight, it would likely be more consistent with this if we'd
considered bitstrings to be LSB first when coercing them to/from integer,
but whoever stuck that behavior in didn't think about it. Too late to
change that now I'm afraid, though perhaps we could provide non-cast
conversion functions that act that way.
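To see why LSB-first ordering would agree with the standard's right-padding rule, here is a Python sketch of the kind of non-cast conversion functions floated above. Everything here is hypothetical: no such functions exist in PostgreSQL, the names `int_to_bit_lsb`, `bit_lsb_to_int`, and `pad_right` are invented for illustration, and only non-negative values are handled. With bit 0 on the left, zero-padding on the right only adds high-order zeros, so the integer value survives widening.

```python
# Hypothetical LSB-first conversions; not PostgreSQL functions.
# Assumption: non-negative values only, to keep the sketch short.


def int_to_bit_lsb(value: int, n: int) -> str:
    """Write the low n bits of `value` with the least significant bit first."""
    if value < 0 or n < 1:
        raise ValueError("sketch handles non-negative values and n >= 1")
    return format(value & ((1 << n) - 1), f"0{n}b")[::-1]


def bit_lsb_to_int(bits: str) -> int:
    """Inverse of int_to_bit_lsb: the leftmost character is bit 0."""
    return int(bits[::-1], 2)


def pad_right(bits: str, n: int) -> str:
    """Widen by appending zero bits on the right, as the standard's cast does."""
    return bits + "0" * max(0, n - len(bits))


if __name__ == "__main__":
    five = int_to_bit_lsb(5, 8)
    print(five)                                 # 10100000
    print(bit_lsb_to_int(pad_right(five, 32)))  # 5: widening keeps the value
```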

regards, tom lane

#3 Evan Carroll
me@evancarroll.com
In reply to: Tom Lane (#2)
Re: Extending a bit string

> That's SQL:99 6.22 <cast specification> general rule 11) c).
> (SV and TD are the source value and the target datatype for a cast.)
>
> In hindsight, it would likely be more consistent with this if we'd
> considered bitstrings to be LSB first when coercing them to/from integer,
> but whoever stuck that behavior in didn't think about it. Too late to
> change that now I'm afraid, though perhaps we could provide non-cast
> conversion functions that act that way.

Apologies, I was under the impression that casts were not covered by the
spec. I withdraw my request. The 2016 draft reads:

If the length in octets M of SV is smaller than LTD, then TV is SV
extended on the right by LTD-M X'00's.

That's how I read it too, and whether or not I feel it's insane doesn't
matter much. That said, the idea of

5::bit(8)::bit(32)::int

not being 5 is terrifying, so you won't find any objections to the current
behavior from me.
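Under the current rules, the widening step appends 24 zero bits on the right, which shifts the value left by 24 places. A self-contained Python model of the round trip, assuming a 32-bit signed int and an MSB-first bit-to-int conversion that reinterprets a full 32-bit pattern as two's complement; all helper names are hypothetical:

```python
# Model of 5::bit(8)::bit(32)::int under the current MSB-first rules.
# Assumptions: int is 32-bit signed; bit -> int reads bits MSB first and
# reinterprets a full 32-bit pattern as two's complement.
# Helper names are hypothetical, not PostgreSQL functions.


def int_to_bit(value: int, n: int) -> str:
    # Rightmost n bits (sign-extended when n > 32), MSB first.
    return format(value & ((1 << n) - 1), f"0{n}b")


def bit_cast(bits: str, n: int) -> str:
    # Zero-pad or truncate on the right, per the standard's cast rule.
    return (bits + "0" * n)[:n]


def bit_to_int(bits: str) -> int:
    if len(bits) > 32:
        raise OverflowError("integer out of range")
    value = int(bits, 2)
    return value - (1 << 32) if value >= 1 << 31 else value


if __name__ == "__main__":
    s8 = int_to_bit(5, 8)    # 00000101
    s32 = bit_cast(s8, 32)   # 00000101 followed by 24 zeros
    print(bit_to_int(s32))   # 83886080, i.e. 5 << 24, not 5
```

In this model the value only survives when the widening happens on the integer side, e.g. casting 5 straight to bit(32) and back.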

#4 Evan Carroll
me@evancarroll.com
In reply to: Tom Lane (#2)
Re: Extending a bit string

> In hindsight, it would likely be more consistent with this if we'd
> considered bitstrings to be LSB first when coercing them to/from integer,
> but whoever stuck that behavior in didn't think about it. Too late to
> change that now I'm afraid, though perhaps we could provide non-cast
> conversion functions that act that way.

I've been thinking about that, and it actually makes sense; I'd prefer
to revert to the pre-8.0 behavior. I just wanted to speak up and retract
that response too. In reality, I am used to the integer display as it
currently is. The current behavior of the coercion to/from int reinforces
the bias I have. It led me to believe that PostgreSQL would act like that
consistently, because that's what I am used to.

SELECT 5::int::bit(8);
bit
----------
00000101

As opposed to 10100000, with the LSB on the left. But fundamentally SQL
and the current helper functions don't operate like that, so it's bizarre.
Moreover, the difference between the two makes it very error-prone. For
example, this doesn't make sense:

SELECT get_bit(1::bit(1), 0), get_bit(1::bit(2), 1);

But this does:

SELECT get_bit(B'1'::bit(1), 0), get_bit(B'1'::bit(2), 1);
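A Python model of the two queries, assuming `get_bit` counts from the leftmost bit as bit 0 (as the PostgreSQL manual documents for bit strings). The helper names are hypothetical. With an integer source, the same bit sits at a different index for each width; with `B'1'`, it stays at index 0 and widening only adds zeros after it.

```python
# Model of get_bit() on the strings produced by the two kinds of cast.
# Assumption: bits are numbered from the left, starting at 0.
# Helper names are hypothetical, not PostgreSQL functions.


def get_bit(bits: str, n: int) -> int:
    if not 0 <= n < len(bits):
        raise IndexError("bit index out of valid range")
    return int(bits[n])


def int_to_bit(value: int, n: int) -> str:
    # int -> bit(n): rightmost n bits, most significant bit first.
    return format(value & ((1 << n) - 1), f"0{n}b")


def bit_cast(bits: str, n: int) -> str:
    # bit -> bit(n): zero-pad or truncate on the right.
    return (bits + "0" * n)[:n]


if __name__ == "__main__":
    # From an integer, the 1 bit moves right as the width grows.
    print(get_bit(int_to_bit(1, 1), 0), get_bit(int_to_bit(1, 2), 1))  # 1 1
    # From B'1', the 1 bit stays at index 0.
    print(get_bit(bit_cast("1", 2), 0), get_bit(bit_cast("1", 2), 1))  # 1 0
```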

I'm sure it would have been substantially less confusing if integers
displayed their LSB on the left after casting. I think I would have figured
out what was going on *much* faster. You were right on everything in your
initial response (as I've come to expect).
