RFD: hexstring(n) data type
Following the discussion on making UUID data type to be much more liberal
( http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php )
I have decided to try to approach it from more general perspective.
The current state of code is available at:
http://qnex.net/hexstring-0.1-2008-03-03.tgz
And now for more details:
The idea is to have a data type HEXSTRING(n) which can have an optional
typemod specifying the size of data (in bytes).
Internally the datatype is binary compatible with bytea, but I/O is done as
hex-encoded strings. The format is liberal, isspace() and ispunct() characters
are skipped while the digits are read.
I have played with two versions of hexstringin() function, one which uses
strtoul() function and the other which uses "home brew" code. The latter
appears to be faster, so I stayed with that. But I would appreciate
comments on this from more experienced.
So, what are the use cases?
CREATE DOMAIN liberal_uuid AS hexstring(16);
CREATE DOMAIN liberal_macaddr AS hexstring(6);
...it allows for creating other standard hex-types, as for example:
CREATE DOMAIN wwn AS hexstring(8); --
http://en.wikipedia.org/wiki/World_Wide_Name
Also it can be a convenient alternative to bytea format (I know, the
encode()/decode() pair),
especially when you have to format output data as some fancy hex-string.
The code is currently just a bunch of input/output/typemod functions
which appear
to work. I will add casts, operators, etc -- they most likely will be
nicked from bytea.
What I would like to also add is ubiquitous to_char(hex, format) function.
For an UUID-compatilbe format it would be called as:
SELECT to_char(hex, 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx') or
SELECT to_char(hex, '8x-4x-4x-4x-12x') where x is expanded as
[0-9a-f] digit and X is expanded as [0-9A-F].
I am not sure what to do about variable length hexstrings, I am
considering something
like to_char(hex, '8X-') which would produce something like
'00000000-11111111-22222222'
for a 12-byte hexstring (what to do about dangling '-' ?).
...but the original case against liberal UUID was that it would make
the I/O slower.
My simple test:
postgres=# CREATE TABLE uuids AS SELECT uuid_generate_v4()::text AS u
FROM generate_series(1,10000000);
CREATE
postgres=# CREATE TEMP TABLE text ON COMMIT DROP AS SELECT
u::hexstring(16) FROM uuids;
SELECT
Time: 13058.486 ms
postgres=# CREATE TEMP TABLE test ON COMMIT DROP AS SELECT u::uuid FROM uuids;
SELECT
Time: 13536.816 ms
...now hexstring is varlena type but does not use strtoul. Perhaps
uuid might be more liberal too.
What do you think about it?
Regards,
Dawid
I am confused how a hex type is any better than using the 'hex' decode()
format we already support:
test=> select decode('5476', 'hex');
decode
--------
Tv
(1 row)
---------------------------------------------------------------------------
Dawid Kuroczko wrote:
Following the discussion on making UUID data type to be much more liberal
( http://archives.postgresql.org/pgsql-hackers/2008-02/msg01214.php )
I have decided to try to approach it from more general perspective.The current state of code is available at:
http://qnex.net/hexstring-0.1-2008-03-03.tgzAnd now for more details:
The idea is to have a data type HEXSTRING(n) which can have an optional
typemod specifying the size of data (in bytes).Internally the datatype is binary compatible with bytea, but I/O is done as
hex-encoded strings. The format is liberal, isspace() and ispunct() characters
are skipped while the digits are read.I have played with two versions of hexstringin() function, one which uses
strtoul() function and the other which uses "home brew" code. The latter
appears to be faster, so I stayed with that. But I would appreciate
comments on this from more experienced.So, what are the use cases?
CREATE DOMAIN liberal_uuid AS hexstring(16);
CREATE DOMAIN liberal_macaddr AS hexstring(6);...it allows for creating other standard hex-types, as for example:
CREATE DOMAIN wwn AS hexstring(8); --
http://en.wikipedia.org/wiki/World_Wide_NameAlso it can be a convenient alternative to bytea format (I know, the
encode()/decode() pair),
especially when you have to format output data as some fancy hex-string.The code is currently just a bunch of input/output/typemod functions
which appear
to work. I will add casts, operators, etc -- they most likely will be
nicked from bytea.What I would like to also add is ubiquitous to_char(hex, format) function.
For an UUID-compatilbe format it would be called as:
SELECT to_char(hex, 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx') or
SELECT to_char(hex, '8x-4x-4x-4x-12x') where x is expanded as
[0-9a-f] digit and X is expanded as [0-9A-F].
I am not sure what to do about variable length hexstrings, I am
considering something
like to_char(hex, '8X-') which would produce something like
'00000000-11111111-22222222'
for a 12-byte hexstring (what to do about dangling '-' ?)....but the original case against liberal UUID was that it would make
the I/O slower.
My simple test:postgres=# CREATE TABLE uuids AS SELECT uuid_generate_v4()::text AS u
FROM generate_series(1,10000000);
CREATEpostgres=# CREATE TEMP TABLE text ON COMMIT DROP AS SELECT
u::hexstring(16) FROM uuids;
SELECT
Time: 13058.486 ms
postgres=# CREATE TEMP TABLE test ON COMMIT DROP AS SELECT u::uuid FROM uuids;
SELECT
Time: 13536.816 ms...now hexstring is varlena type but does not use strtoul. Perhaps
uuid might be more liberal too.What do you think about it?
Regards,
Dawid--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your Subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-hackers
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
I don't think we should define types based on how they print. We should define
types based on what they contain. bytea is a perfectly good datatype for
storing binary data, though perhaps we could use a bytea(n) for fixed size
binary data.
However bytea has an inconvenient string representation. Really I would prefer
if bytea were *always* read and written in hex format. We could have a guc to
enable this but it would make it hard to write code which would function
consistently as part of a larger application.
I think this is more akin to the MONEY data type. Really it would be better if
we could declare columns as NUMERIC but attach a "default format" to them for
use when string representation is desired. Similarly with bytea we could
choose a default string representation different from the default in/out
functions.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
Gregory Stark <stark@enterprisedb.com> writes:
However bytea has an inconvenient string representation. Really I would prefer
if bytea were *always* read and written in hex format.
Yeah, the escaping rule is a mess: hard to work with and even less
storage-efficient than hex would be.
We could have a guc to
enable this but it would make it hard to write code which would function
consistently as part of a larger application.
Well, the datetime types have had DateStyle for years and people have
managed to deal with it, so maybe you're overestimating the problem.
regards, tom lane