Boolean storage takes up 1 byte?

Started by Thom Brownover 16 years ago5 messagesgeneral
Jump to latest
#1Thom Brown
thombrown@gmail.com

Hi,

I've read the PostgreSQL documentation page on the boolean datatype (
http://www.postgresql.org/docs/8.4/static/datatype-boolean.html) to find out
what PostgreSQL's definition of a boolean is, as I believe it is distinctive
from a bit(1) datatype (as you can't max() a boolean.. not sure what an
efficient alternative to that is). However, I see that a boolean takes up 1
byte of storage, which is 8 bits. Is this due to the fact that the value
can be null? I'm not clear as to how a null field is stored, or is that the
point... nothing references is so it is defined as null? If that is the
case, can't this be stored as 1 bit? And does its storage as a byte affect
indexing or query planning?

Thom

#2Sam Mason
sam@samason.me.uk
In reply to: Thom Brown (#1)
Re: Boolean storage takes up 1 byte?

On Thu, Oct 01, 2009 at 11:37:40AM +0100, Thom Brown wrote:

I've read the PostgreSQL documentation page on the boolean datatype (
http://www.postgresql.org/docs/8.4/static/datatype-boolean.html) to find out
what PostgreSQL's definition of a boolean is, as I believe it is distinctive
from a bit(1) datatype

Yup, they're really for different things. AND, OR and NOT are defined
for the BOOLEAN datatype and not for bit strings.

(as you can't max() a boolean.. not sure what an
efficient alternative to that is).

bool_or and bool_and are aggregates that work over boolean data types.

However, I see that a boolean takes up 1
byte of storage, which is 8 bits. Is this due to the fact that the value
can be null?

I believe it's more to do with the fact that if you add a boolean column
and then subsequently an int column then you're going to struggle to
"pack" them efficiently. PG always puts columns on the "end" so that you
can add a column in constant time (i.e. no need to rewrite the table
in some common situations). Once you start doing this then packing is
awkward and a single byte becomes much easier. Whether the value is
NULL is stored elsewhere in the row.

Yes, this could be made more efficient; whether it's worth it is a
difficult question!

And does its storage as a byte affect indexing or query planning?

Not sure which aspects you're referring to here, sorry.

--
Sam http://samason.me.uk/

#3Thom Brown
thombrown@gmail.com
In reply to: Sam Mason (#2)
Re: Boolean storage takes up 1 byte?

2009/10/1 Sam Mason <sam@samason.me.uk>

bool_or and bool_and are aggregates that work over boolean data types.

Ah yes, that makes total sense! I knew max wouldn't be logical in such as
case, but couldn't think of the alternative. Thanks!

I believe it's more to do with the fact that if you add a boolean column
and then subsequently an int column then you're going to struggle to
"pack" them efficiently. PG always puts columns on the "end" so that you
can add a column in constant time (i.e. no need to rewrite the table
in some common situations). Once you start doing this then packing is
awkward and a single byte becomes much easier. Whether the value is
NULL is stored elsewhere in the row.

That's clear now.

And does its storage as a byte affect indexing or query planning?

Not sure which aspects you're referring to here, sorry.

Giving my question more thought, I believe it's pointless.

You've answered my question. Thanks Sam.

#4Simon Riggs
simon@2ndQuadrant.com
In reply to: Sam Mason (#2)
Re: Boolean storage takes up 1 byte?

On Thu, 2009-10-01 at 12:03 +0100, Sam Mason wrote:

However, I see that a boolean takes up 1
byte of storage, which is 8 bits. Is this due to the fact that the value
can be null?

I believe it's more to do with the fact that if you add a boolean column
and then subsequently an int column then you're going to struggle to
"pack" them efficiently. PG always puts columns on the "end" so that you
can add a column in constant time (i.e. no need to rewrite the table
in some common situations). Once you start doing this then packing is
awkward and a single byte becomes much easier. Whether the value is
NULL is stored elsewhere in the row.

It might be possible to make BOOLEAN NOT NULL use the null bit to
represent the actual data value and then have the column use no
additional bytes, except when we don't store the null bitmap at all.
Just needs people to make it happen cleanly, if that's possible.

Don't like booleans myself. They tend to end up as 3+ values eventually.

--
Simon Riggs www.2ndQuadrant.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Simon Riggs (#4)
Re: Boolean storage takes up 1 byte?

Simon Riggs <simon@2ndQuadrant.com> writes:

It might be possible to make BOOLEAN NOT NULL use the null bit to
represent the actual data value and then have the column use no
additional bytes, except when we don't store the null bitmap at all.
Just needs people to make it happen cleanly, if that's possible.

I really doubt that any scheme to pack booleans tighter would be a
net win. It'd make the core tuple-assembly and -disassembly loops more
complicated, hence slower and harder to maintain. Everybody would pay
that price whether or not they ever saved a byte from it.

It's worth noting also that you don't save anything from packing a bool
unless the *next* field in the row has a weak enough alignment
requirement that it can be moved over. In a majority of cases this
would mean that you'd need at least five adjacent bool columns before
you have any shot at winning anything --- with four or less, a following
column with int alignment will stay right where it is.

If you do have lots and lots of bool columns, it might possibly be worth
the trouble to represent them as a combined BIT(n) column ...

regards, tom lane