Re: Bit strings

Started by Bruce Momjianover 25 years ago11 messages
#1Bruce Momjian
pgman@candle.pha.pa.us

Can we get the BIT type working now that 7.1 is branched?

Hi Bruce,

I think there has been some confusion about the bit/bit varying
stuff I did. I initially
sent in a bit type that was non-SQL compliant, with the suggestion that
this may be
useful to stick into contrib.

Then several people remarked that it would be better to have an SQL
compliant type, so
I wrote the code to do all the manipulations, roughly based on

src/backend/utils/adt/varchar.c

This can evidently not be used as a standalone routine, but needs to be
properly
integrated into the database as a type. Unfortunately I don't really
know an awful
lot about the internals of postgres, and got a bit stuck trying to
figure out how to
integrate a new type correctly.

Now I see that this second set of routines has ended up in the contrib
directory,
where they are quite useless. In my opinion it would be best to have no
bit-type
in the contrib directory and have the proper SQL compliant one properly
integrated
into postgres, but I'm going to need some hints to do that correctly.

So my question is how we should proceed on this. Am I right in assuming
that there
should be no major objections to adding an SQL type to postgres, or
should I go
through another iteration on the mailing list?

Thanks for any suggestions,

Adriaan

************

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#2Adriaan Joubert
a.joubert@albourne.com
In reply to: Bruce Momjian (#1)

Bruce Momjian wrote:

Can we get the BIT type working now that 7.1 is branched?

Yep. We never quite finished the discussions before the 7.0 release. The
outstanding issues were

1) should bit operations be allowed between bit strings of different sizes?
They are at the moment, but the longest string is truncated, as you
otherwise get illogical behaviour (imagine padding with zeros and doing an
XOR). Truncation can be somewhat unlogical as well. My current feeling is
that perhaps it would be better to just disallow bit operations on
bitstrings that haven't got the same length completely. Makes the code
easier anyway ;-)

2) A second point brought up by Peter is the conversion of bit-strings to
integers and vice-versa. Postgres does not have unsigned ints, so it can
lead to somewhat surprising behaviour. Implicit conversion is probably out
of the question. So what type of conversion functions would people like to
see, if any?

3) With TOAST bit strings could be used to store large amounts of binary
data. Don mentioned MIME-encoded strings for I/O. Is this useful? I guess
and advantage is that BIT is an SQL type, but I'm not sure that any such
application would be particularly portable. It doesn't look like a lot of
extra work though, (provided somebody has got some MIME en-/decoding code
lying about) so we could add it. Am i right that with TOAST and no
restrictions on the query-buffer lengths in psql one could then upload mime
objects straight into postgres?

So, what do people think?

Adriaan

#3Peter Eisentraut
peter_e@gmx.net
In reply to: Adriaan Joubert (#2)

Adriaan Joubert writes:

My current feeling is that perhaps it would be better to just disallow
bit operations on bitstrings that haven't got the same length
completely.

I feel with you. :)

3) With TOAST bit strings could be used to store large amounts of binary
data.

This is not really the proper application for bit strings. There are BLOBs
for that. Which leads to the question, what are bit strings really useful
for other than genuine bit masks?

--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

#4Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Adriaan Joubert (#2)

Can someone comment on this?

Bruce Momjian wrote:

Can we get the BIT type working now that 7.1 is branched?

Yep. We never quite finished the discussions before the 7.0 release. The
outstanding issues were

1) should bit operations be allowed between bit strings of different sizes?
They are at the moment, but the longest string is truncated, as you
otherwise get illogical behaviour (imagine padding with zeros and doing an
XOR). Truncation can be somewhat unlogical as well. My current feeling is
that perhaps it would be better to just disallow bit operations on
bitstrings that haven't got the same length completely. Makes the code
easier anyway ;-)

2) A second point brought up by Peter is the conversion of bit-strings to
integers and vice-versa. Postgres does not have unsigned ints, so it can
lead to somewhat surprising behaviour. Implicit conversion is probably out
of the question. So what type of conversion functions would people like to
see, if any?

3) With TOAST bit strings could be used to store large amounts of binary
data. Don mentioned MIME-encoded strings for I/O. Is this useful? I guess
and advantage is that BIT is an SQL type, but I'm not sure that any such
application would be particularly portable. It doesn't look like a lot of
extra work though, (provided somebody has got some MIME en-/decoding code
lying about) so we could add it. Am i right that with TOAST and no
restrictions on the query-buffer lengths in psql one could then upload mime
objects straight into postgres?

So, what do people think?

Adriaan

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)

Can we get the BIT type working now that 7.1 is branched?

I did some work on the BIT types a couple months ago. According to
my notes, the following issues are still outstanding before they
can be said to work at all:

Bit and hexstring literals are not handled in a reasonable fashion;
the scanner converts them to integer constants which is bogus.
Probably they need to be converted to some generic 'UNKNOWNBITSTRING'
pseudo-type that can later be coerced to a specific bitstring type.
I didn't touch this because it seems to open up the Pandora's box
of unknown-constant handling, for which we do not have a good
general solution.

SQL92 sez we need a position() function for bitstrings.

Need a regression test for bit types.

scalarltsel() and friends need to cope with bit/varbit types in
order to make good use of indexes on bitstrings.

pg_dump does not handle BIT/VARBIT lengths properly (pjw may have
fixed this by now).

regards, tom lane

#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#5)

Any idea where we are on this?

Can we get the BIT type working now that 7.1 is branched?

I did some work on the BIT types a couple months ago. According to
my notes, the following issues are still outstanding before they
can be said to work at all:

Bit and hexstring literals are not handled in a reasonable fashion;
the scanner converts them to integer constants which is bogus.
Probably they need to be converted to some generic 'UNKNOWNBITSTRING'
pseudo-type that can later be coerced to a specific bitstring type.
I didn't touch this because it seems to open up the Pandora's box
of unknown-constant handling, for which we do not have a good
general solution.

SQL92 sez we need a position() function for bitstrings.

Need a regression test for bit types.

scalarltsel() and friends need to cope with bit/varbit types in
order to make good use of indexes on bitstrings.

pg_dump does not handle BIT/VARBIT lengths properly (pjw may have
fixed this by now).

regards, tom lane

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#7Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#6)

OK, I will delete my copy of the email. If there is something to be
added to the TODO list, let me know.

[ People are welcome to modify it themselves too.]

Bruce Momjian writes:

Any idea where we are on this?

Bit and hexstring literals are not handled in a reasonable fashion;

Bit string literals are handled correctly, although it occurred to me that
perhaps

#define IsA_Value(t) \
(IsA(t, Integer) || IsA(t, Float) || IsA(t, String))

should be augmented with BitString. Can someone advise?

Hex literals are still not handled correctly.

SQL92 sez we need a position() function for bitstrings.

We have one now.

Need a regression test for bit types.

We have one now.

scalarltsel() and friends need to cope with bit/varbit types in
order to make good use of indexes on bitstrings.

Not done.

pg_dump does not handle BIT/VARBIT lengths properly (pjw may have
fixed this by now).

Works fine for me.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#7)

Peter Eisentraut <peter_e@gmx.net> writes:

perhaps
#define IsA_Value(t) \
(IsA(t, Integer) || IsA(t, Float) || IsA(t, String))
should be augmented with BitString. Can someone advise?

Yes, definitely --- see the sole use of that macro, in outfuncs.c.
This is a bug.

regards, tom lane

#9Peter Eisentraut
peter_e@gmx.net
In reply to: Bruce Momjian (#6)

Bruce Momjian writes:

Any idea where we are on this?

Bit and hexstring literals are not handled in a reasonable fashion;

Bit string literals are handled correctly, although it occurred to me that
perhaps

#define IsA_Value(t) \
(IsA(t, Integer) || IsA(t, Float) || IsA(t, String))

should be augmented with BitString. Can someone advise?

Hex literals are still not handled correctly.

SQL92 sez we need a position() function for bitstrings.

We have one now.

Need a regression test for bit types.

We have one now.

scalarltsel() and friends need to cope with bit/varbit types in
order to make good use of indexes on bitstrings.

Not done.

pg_dump does not handle BIT/VARBIT lengths properly (pjw may have
fixed this by now).

Works fine for me.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#9)

Peter Eisentraut <peter_e@gmx.net> writes:

Yes, definitely --- see the sole use of that macro, in outfuncs.c.

Maybe this macro could be removed and expanded in place. That would make
things more transparent. Or is there any other potential use?

No objection from me. The macro clearly failed to make things better in
that usage, and offhand I think it would be just as likely to be a time
bomb in any other usage.

regards, tom lane

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#8)

Tom Lane writes:

Peter Eisentraut <peter_e@gmx.net> writes:

perhaps
#define IsA_Value(t) \
(IsA(t, Integer) || IsA(t, Float) || IsA(t, String))
should be augmented with BitString. Can someone advise?

Yes, definitely --- see the sole use of that macro, in outfuncs.c.

Maybe this macro could be removed and expanded in place. That would make
things more transparent. Or is there any other potential use?

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/