Single Byte values

Started by Jason Hihnabout 23 years ago8 messagesgeneral

jhihn@paytimepayroll.com

about 23 years ago

As someone pointed out, CHAR(1) costs 5 bytes. Is there a better way to
store it? I can see one or three bytes as being bad (byte-alignment issues
and speed) Two could be ok, but at the very least 4 should be the max for
char(1).

Two is ok, once you consider multi-bytes strings (unicode)
Three is right out -
Four bytes could store two multibyte, two single byte (cast as 2 byte for
unicode) or four single byte characters.

But in returning to my question, what should I be using for char(1) to
char(4) storage?

Thanks again!

Nigel J. Andrews

nandrews@investsystems.co.uk

about 23 years ago

In reply to: Jason Hihn (#1)

Re: Single Byte values

On Thu, 3 Apr 2003, Jason Hihn wrote:

As someone pointed out, CHAR(1) costs 5 bytes. Is there a better way to
store it? I can see one or three bytes as being bad (byte-alignment issues
and speed) Two could be ok, but at the very least 4 should be the max for
char(1).

Two is ok, once you consider multi-bytes strings (unicode)
Three is right out -
Four bytes could store two multibyte, two single byte (cast as 2 byte for
unicode) or four single byte characters.

But in returning to my question, what should I be using for char(1) to
char(4) storage?

Thanks again!

I say you should use char(1) or whatever if that's what your requirement
is. I'm not sure on the storage details but I doubt using text type will save
anything.

I also suspect the 5 byte cost isn't just the data but column overhead as
well. I think the person saying not to use it is really saying why limit
yourself to 1 character when for similar cost you can get 1 character _and_ the
ability to get more than 1 character. However, this comes back to my point, if
1 character is what you need then use a char(1). Unfortunately, the example
that jumped to my mind of vehicle registration year tags doesn't work,
somewhere along the line that part of registration numbers became 2 letters
instead of 1, which would really come back and haunt you if you'd done your
database to be char(1) and had been amassing data for years before the change.

--
Nigel J. Andrews

Jason Hihn

jhihn@paytimepayroll.com

about 23 years ago

In reply to: Nigel J. Andrews (#2)

Re: Single Byte values

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Nigel J. Andrews
Sent: Thursday, April 03, 2003 4:02 PM
To: Jason Hihn
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Single Byte values

...

I say you should use char(1) or whatever if that's what your requirement
is. I'm not sure on the storage details but I doubt using text
type will save
anything.

I also suspect the 5 byte cost isn't just the data but column overhead as
well. I think the person saying not to use it is really saying why limit
yourself to 1 character when for similar cost you can get 1

...

4 bytes(stored string length) + 1 data in the case of char(1).

Well this is a key for an enumeration, there are only a handful of values,
but thousands of records. I could do it via CHECK(a='a' || a='b' || a='c')
BUT I much rather dump that enumeration off to a table so to add a letter
later only requires INSERT INTO _table VALUES('d');

Furthermore, storing to off to a table allows a description of user-friendly
views of the data. Join on the column and you get an English (or whatever
your language) description. (Incidentally, it's not bad to change the schema
to:
create table _table
a char(1),
lang char(2),
desc text
primary key(a));

Where you can support multiple languages.

Incententally the typical representation is a list or drop-down box. Now you
can populate it with what's in _table

add a REFERENCES _table(a), and you have an easily extensible system that
the can add values too as well.

instead of 1, which would really come back and haunt you if you'd
done your
database to be char(1) and had been amassing data for years
before the change.

It could, but as it stands, I only have 10 or so for an application that's
been around for 15 years, and additions are rare. Even so this makes them
trivially easy... The decision to use a packed type over a vector type
should lie with the DB designer.

Manfred Koizar

mkoi-pg@aon.at

about 23 years ago

In reply to: Jason Hihn (#1)

Re: Single Byte values

On Thu, 03 Apr 2003 13:35:24 -0500, Jason Hihn
<jhihn@paytimepayroll.com> wrote:

As someone pointed out, CHAR(1) costs 5 bytes.

Jason, you already found out, that char(1) uses

4 bytes(stored string length) + 1 data in the case of char(1).

All char(n) types are stored on 4 byte boundaries, so char(1) costs up
to 8 bytes.

But in returning to my question, what should I be using for char(1) to
char(4) storage?

Instead of char(1) you can use the Postgres specific type "char"
(*with* the quotes) which only needs one byte.

For char(n) a little toy has been posted to -hackers half a year ago
(cf. discussion around
http://archives.postgresql.org/pgsql-hackers/2002-10/msg00501.php).
Feel free to play around with it. Though I wouldn't take the trouble
if only a few thousand rows are involved...

Servus
Manfred

Jason Hihn

jhihn@paytimepayroll.com

about 23 years ago

In reply to: Manfred Koizar (#4)

Re: Single Byte values

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]On Behalf Of Manfred Koizar
Sent: Thursday, April 03, 2003 5:53 PM
To: Jason Hihn
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Single Byte values

On Thu, 03 Apr 2003 13:35:24 -0500, Jason Hihn
<jhihn@paytimepayroll.com> wrote:

As someone pointed out, CHAR(1) costs 5 bytes.

Jason, you already found out, that char(1) uses

4 bytes(stored string length) + 1 data in the case of char(1).

All char(n) types are stored on 4 byte boundaries, so char(1) costs up
to 8 bytes.

Is this true for memory AND disk, or just memory?

But in returning to my question, what should I be using for char(1) to
char(4) storage?

Instead of char(1) you can use the Postgres specific type "char"
(*with* the quotes) which only needs one byte.

So I assume use of this is 'safe'? Any degridation on performance?

It appears that this datatype is also array compatible, CREATE TABLE test3(a
"char"[4]); works!

I guess if space is so important, I could also just store
(char[0]<<8|char[1]) into a smallint...

Show quoted text

For char(n) a little toy has been posted to -hackers half a year ago
(cf. discussion around
http://archives.postgresql.org/pgsql-hackers/2002-10/msg00501.php).
Feel free to play around with it. Though I wouldn't take the trouble
if only a few thousand rows are involved...

Servus
Manfred

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Manfred Koizar

mkoi-pg@aon.at

about 23 years ago

In reply to: Jason Hihn (#5)

Re: Single Byte values

On Fri, 04 Apr 2003 09:13:54 -0500, Jason Hihn
<jhihn@paytimepayroll.com> wrote:

All char(n) types are stored on 4 byte boundaries, so char(1) costs up
to 8 bytes.

Is this true for memory AND disk, or just memory?

Both.

Instead of char(1) you can use the Postgres specific type "char"
(*with* the quotes) which only needs one byte.

So I assume use of this is 'safe'? Any degridation on performance?

I don't expect performance degradation; this should rather improve
performance because with smaller tuples you get more tuples per page
and so pages have to be read/written less frequently. AFAICT from my
tests, if your system is IO bound, then a reduction of tuple size
translates directly to performance improvement, e.g. 5% tuple size
reduction results more or less in 5% better throughput; might be
less, if your usage pattern is CPU bound; might be more, if you're
lucky and your working set jumps from just a bit more than available
cache to just a bit less.

Servus
Manfred

Antti Haapala

antti.haapala@iki.fi

about 23 years ago

In reply to: Jason Hihn (#5)

Re: Single Byte values

So I assume use of this is 'safe'? Any degridation on performance?

Hmm.. I believe it isn't standard...

It appears that this datatype is also array compatible, CREATE TABLE

test3(a

"char"[4]); works!

I guess if space is so important, I could also just store
(char[0]<<8|char[1]) into a smallint...

It's often better use int4(8) than char(1) as (primary) keys.

pros: int4 uses _less_ space, not affected by locales, operations are
surely -faster-. Larger range of values. I believe even int8 is faster
than char(1) and not many applications are ever going to exhaust 2^63
keyvalues.

cons: integer values often aren't quite mnemonic.

--
Antti Haapala

Joel Rees

joel@alpsgiken.gr.jp

about 23 years ago

In reply to: Antti Haapala (#7)

Re: Single Byte values

It's often better use int4(8) than char(1) as (primary) keys.

A thought from a rank newb, how about using the a misnamed ascii(), and
storing the integer code for the characters you want to use as short
keys in integer fields?

--
Joel Rees <joel@alpsgiken.gr.jp>