New data type: uniqueidentifier

Started by Dmitry G. Mastrukovover 24 years ago19 messages
#1Dmitry G. Mastrukov
dmitry@taurussoft.org

Hi, All!

I've developed new data type for PostgreSQL - uniqueidentifier - 128-bit
value claims to be unique across Universe. It depends on libuuid from
e2fsprogs by Theodore Ts'o. Now I use it in my project. Everybody can grab
it from
http://taurussoft.chat.ru/uniqueidentifier-0.1.9.tar.gz

Before announce this new type through pgsql-announce I want to clear for
myself some things.
I've marked "=" operator with HASH clause (and planner has started to use
hash jons). But as I understand the right way is to create special hash
function (may be wrapper for hash_any(), isn't it?) and register it for hash
as for btree method.
So is it desirable to mark "=" as HASH for this type (seems internal 16 byte
representation will be hash well) and if yes how can I create hash sort
method for uniqueidentifier?

regards,
Dmitry

PS. If you decide to install uniqueidentifier look at the date of
uuid/uuid.h somewhere in INCLUDE path. Sometimes it's necessary to manualy
enter "make install" in lib/uuid directory of e2fsprogs.

#2Alex Pilosov
alex@pilosoft.com
In reply to: Dmitry G. Mastrukov (#1)
Re: New data type: uniqueidentifier

On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote:

myself some things.
I've marked "=" operator with HASH clause (and planner has started to use
hash jons). But as I understand the right way is to create special hash
function (may be wrapper for hash_any(), isn't it?) and register it for hash
as for btree method.

No. Currently, there's no way to specify a hash function for a given
operator, it always uses a builtin function that operates on memory
representation of a value.

There's no need (or possibility) to register a hash with btree method.

So is it desirable to mark "=" as HASH for this type (seems internal 16 byte
representation will be hash well) and if yes how can I create hash sort
method for uniqueidentifier?

You can mark it hashable, since two identical uuid values would have
identical memory representation and thus the same hash value.

I'd look at your code, but that is URL too slow, in 5 minutes downloaded
1000 bytes...

Show quoted text

regards,
Dmitry

PS. If you decide to install uniqueidentifier look at the date of
uuid/uuid.h somewhere in INCLUDE path. Sometimes it's necessary to manualy
enter "make install" in lib/uuid directory of e2fsprogs.

#3Dmitry G. Mastrukov
dmitry@taurussoft.org
In reply to: Alex Pilosov (#2)
Re: New data type: uniqueidentifier

Alex Pilosov <alex@pilosoft.com> wrote:

I'd look at your code, but that is URL too slow, in 5 minutes downloaded
1000 bytes...

It's possible now to grab from another location

http://fitmark.net/taurussoft/uniqueidentifier-0.1.9.tar.gz

regards,
Dmitry

#4Peter Eisentraut
peter_e@gmx.net
In reply to: Dmitry G. Mastrukov (#1)
Re: New data type: uniqueidentifier

Dmitry G. Mastrukov writes:

I've developed new data type for PostgreSQL - uniqueidentifier - 128-bit
value claims to be unique across Universe. It depends on libuuid from
e2fsprogs by Theodore Ts'o.

ISTM that this should be a function, not a data type.

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#5Dmitry G. Mastrukov
dmitry@taurussoft.org
In reply to: Alex Pilosov (#2)
Re: New data type: uniqueidentifier

Alex Pilosov <alex@pilosoft.com> wrote:

On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote:

myself some things.
I've marked "=" operator with HASH clause (and planner has started to

use

hash jons). But as I understand the right way is to create special hash
function (may be wrapper for hash_any(), isn't it?) and register it for

hash

as for btree method.

No. Currently, there's no way to specify a hash function for a given
operator, it always uses a builtin function that operates on memory
representation of a value.

There's no need (or possibility) to register a hash with btree method.

Strange. When I execute following query (slightly modified query from User's
Guide chapter 7.6)

SELECT am.amname AS acc_name,
opc.opcname AS ops_name,
opr.oprname AS ops_comp
FROM pg_am am, pg_amop amop,
pg_opclass opc, pg_operator opr
WHERE amop.amopid = am.oid AND
amop.amopclaid = opc.oid AND
amop.amopopr = opr.oid
ORDER BY ops_name, ops_comp;

I see both hash and btree amname for builtin opclasses. For example

acc_name | ops_name | ops_comp
----------+----------+----------
btree | int4_ops | <
btree | int4_ops | <=
btree | int4_ops | =
hash | int4_ops | =
btree | int4_ops | >
btree | int4_ops | >=

But new type has no hash for "=". Plus I saw hash functions for builtin
types in source code. So can I achieve for created type such intergration
with Postgres as for builtin types? Or am I understanding something wrong?

regards,
Dmitry

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry G. Mastrukov (#5)
Re: New data type: uniqueidentifier

"Dmitry G. Mastrukov" <dmitry@taurussoft.org> writes:

Alex Pilosov <alex@pilosoft.com> wrote:

On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote:

I've marked "=" operator with HASH clause (and planner has started to
use
hash jons). But as I understand the right way is to create special hash
function (may be wrapper for hash_any(), isn't it?) and register it for
hash
as for btree method.

No. Currently, there's no way to specify a hash function for a given
operator, it always uses a builtin function that operates on memory
representation of a value.

Strange. When I execute following query (slightly modified query from User's
Guide chapter 7.6)

You're looking at support for hash indexes, which have nothing to do
with hash joins.

*Why* they have nothing to do with hash joins, I dunno. You'd think
that using the same hash functions for both would be a good idea.
But that's not how it's set up at the moment.

regards, tom lane

#7Dmitry G. Mastrukov
dmitry@taurussoft.org
In reply to: Alex Pilosov (#2)
Re: New data type: uniqueidentifier

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Dmitry G. Mastrukov" <dmitry@taurussoft.org> writes:

Alex Pilosov <alex@pilosoft.com> wrote:

On Tue, 26 Jun 2001, Dmitry G. Mastrukov wrote:

I've marked "=" operator with HASH clause (and planner has started to
use
hash jons). But as I understand the right way is to create special hash
function (may be wrapper for hash_any(), isn't it?) and register it for
hash
as for btree method.

No. Currently, there's no way to specify a hash function for a given
operator, it always uses a builtin function that operates on memory
representation of a value.

Strange. When I execute following query (slightly modified query from

User's

Guide chapter 7.6)

You're looking at support for hash indexes, which have nothing to do
with hash joins.

*Why* they have nothing to do with hash joins, I dunno. You'd think
that using the same hash functions for both would be a good idea.
But that's not how it's set up at the moment.

OK. It's clear for me now. Thanks.
But should I create support for hash indexes? Since builtin types have such
support I want it too for uniqueidentifier :) How can I make it?

regards,
Dmitry

#8Thomas Swan
tswan@olemiss.edu
In reply to: Peter Eisentraut (#4)
Re: New data type: uniqueidentifier

Peter Eisentraut wrote:

Dmitry G. Mastrukov writes:

I've developed new data type for PostgreSQL - unique identifier - 128-bit
value claims to be unique across Universe. It depends on libuuid from
e2fsprogs by Theodore Ts'o.

ISTM that this should be a function, not a data type.

I'd second the function idea: function uuid( ) returns an int8 value;
don't create a bazillion datatypes. Besides, 128 bit numbers are 7 byte
integers. PostgreSQL has an int8 (8 byte integer) datatype. While I
like the UUID function idea, I'd recommend a better solution to creating
an "unique" identifier. Why not create a serial8 datatype: int8 with an
int8 sequence = 256bit "unique" number. {Yes, I know I'm violating my
first sentence.} Then, you'd have the same thing (or better) AND your
not relying on randomness.

#9Alex Pilosov
alex@pilosoft.com
In reply to: Thomas Swan (#8)
Re: Re: New data type: uniqueidentifier

On Mon, 2 Jul 2001, Thomas Swan wrote:

Peter Eisentraut wrote:

Dmitry G. Mastrukov writes:

I've developed new data type for PostgreSQL - unique identifier - 128-bit
value claims to be unique across Universe. It depends on libuuid from
e2fsprogs by Theodore Ts'o.

ISTM that this should be a function, not a data type.

I'd second the function idea: function uuid( ) returns an int8 value;
don't create a bazillion datatypes. Besides, 128 bit numbers are 7 byte
integers. PostgreSQL has an int8 (8 byte integer) datatype. While I
like the UUID function idea, I'd recommend a better solution to creating
an "unique" identifier. Why not create a serial8 datatype: int8 with an
int8 sequence = 256bit "unique" number. {Yes, I know I'm violating my
first sentence.} Then, you'd have the same thing (or better) AND your
not relying on randomness.

I don't think you know what UUID is. It is NOT just a unique randon
number. There are specific rules for construction of such number, specific
rules for comparison of numbers (no, its not bit-by-bit), thus a datatype
is most appropriate answer.

-alex

#10Peter Eisentraut
peter_e@gmx.net
In reply to: Alex Pilosov (#9)
Re: Re: New data type: uniqueidentifier

Alex Pilosov writes:

I don't think you know what UUID is. It is NOT just a unique randon
number. There are specific rules for construction of such number, specific
rules for comparison of numbers (no, its not bit-by-bit), thus a datatype
is most appropriate answer.

A data type may be appropriate for storing these values, but not for
generating them. Functions generate stuff, data types store stuff.

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#11Alex Pilosov
alex@pilosoft.com
In reply to: Peter Eisentraut (#10)
Re: Re: New data type: uniqueidentifier

On Mon, 2 Jul 2001, Peter Eisentraut wrote:

Alex Pilosov writes:

I don't think you know what UUID is. It is NOT just a unique randon
number. There are specific rules for construction of such number, specific
rules for comparison of numbers (no, its not bit-by-bit), thus a datatype
is most appropriate answer.

A data type may be appropriate for storing these values, but not for
generating them. Functions generate stuff, data types store stuff.

Sorry, apparently we misunderstood each other but are really in full
agreement.

Dmitry's stuff contains both datatype (uniqueidentifier), a function to
generate a new object of that type (newid), and a set of functions to
implement comparison operators for that type.

I don't see anything wrong with that setup, but maybe I'm still missing
something?

-alex

#12Christopher Kings-Lynne
chriskl@familyhealth.com.au
In reply to: Thomas Swan (#8)
RE: Re: New data type: uniqueidentifier

don't create a bazillion datatypes. Besides, 128 bit numbers are 7
byte integers.

Hang on: 128 div 8 = 16 byte integer

PostgreSQL has an int8 (8 byte integer) datatype.

And therefore it is a _64_ bit integer and you can't have a 256bit unique
number in it...

While I like the UUID function idea, I'd recommend a better solution to
creating an "unique" identifier. Why not create a serial8 datatype:
int8 with an int8 sequence = 256bit "unique" number. {Yes, I know
violating my first sentence.} Then, you'd have the same thing (or
better) AND your not relying on randomness.

Chris

#13Thomas Swan
tswan@olemiss.edu
In reply to: Christopher Kings-Lynne (#12)
Re: Re: New data type: uniqueidentifier

I sit corrected.

*slightly humbled*

Why not do an unsigned int16 to hold your UUID generated numbers.
Ultimately, this would seem to be a more general solution and accomplish
your goals at the sametime. Or, am I completely missing something.

Christopher Kings-Lynne wrote:

Show quoted text

don't create a bazillion datatypes. Besides, 128 bit numbers are 7
byte integers.

Hang on: 128 div 8 = 16 byte integer

PostgreSQL has an int8 (8 byte integer) datatype.

And therefore it is a _64_ bit integer and you can't have a 256bit unique
number in it...

While I like the UUID function idea, I'd recommend a better solution to
creating an "unique" identifier. Why not create a serial8 datatype:
int8 with an int8 sequence = 256bit "unique" number. {Yes, I know
violating my first sentence.} Then, you'd have the same thing (or
better) AND your not relying on randomness.

Chris

#14Alex Pilosov
alex@pilosoft.com
In reply to: Thomas Swan (#13)
Re: Re: New data type: uniqueidentifier

On Mon, 2 Jul 2001, Thomas Swan wrote:

I sit corrected.

*slightly humbled*

Why not do an unsigned int16 to hold your UUID generated numbers.

Not a good idea, since rules for comparison of UUID are wierd and are
_definitely_ not same as for comparison of int16.

Show quoted text

Ultimately, this would seem to be a more general solution and accomplish
your goals at the sametime. Or, am I completely missing something.

#15Thomas Swan
tswan-lst@ics.olemiss.edu
In reply to: Alex Pilosov (#14)
Re: New data type: uniqueidentifier

Where can I find some more information on it? I'm curious now.

Alex Pilosov wrote:

Show quoted text

On Mon, 2 Jul 2001, Thomas Swan wrote:

I sit corrected.

*slightly humbled*

Why not do an unsigned int16 to hold your UUID generated numbers.

Not a good idea, since rules for comparison of UUID are wierd and are
_definitely_ not same as for comparison of int16.

Ultimately, this would seem to be a more general solution and accomplish
your goals at the sametime. Or, am I completely missing something.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#16Peter Eisentraut
peter_e@gmx.net
In reply to: Alex Pilosov (#11)
Re: Re: New data type: uniqueidentifier

Alex Pilosov writes:

Dmitry's stuff contains both datatype (uniqueidentifier), a function to
generate a new object of that type (newid), and a set of functions to
implement comparison operators for that type.

I don't see anything wrong with that setup, but maybe I'm still missing
something?

It would be much simpler if you stored the unique id in varchar or text.

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter

#17Alex Pilosov
alex@pilosoft.com
In reply to: Thomas Swan (#15)
Re: New data type: uniqueidentifier

If you mean the [proposed?] standard itself, here is a good description of
it:
http://www.ics.uci.edu/pub/ietf/webdav/uuid-guid/draft-leach-uuids-guids-01.txt

It was a proposed IETF standard, however, IETF standardization failed
because ISO already ratified it as a DCE/RPC standard ISO 11578, however,
the above URL provides far better description of UUIDs than ISO standard
itself

On Tue, 3 Jul 2001, Thomas Swan wrote:

Show quoted text

Where can I find some more information on it? I'm curious now.

#18Alex Pilosov
alex@pilosoft.com
In reply to: Peter Eisentraut (#16)
Re: Re: New data type: uniqueidentifier

On Tue, 3 Jul 2001, Peter Eisentraut wrote:

Dmitry's stuff contains both datatype (uniqueidentifier), a function to
generate a new object of that type (newid), and a set of functions to
implement comparison operators for that type.

It would be much simpler if you stored the unique id in varchar or text.

Peter,

UUIDs have specific rules for comparison of them. Its so much easier to
compare them via a<b than uuid_lt(a,b). If one wanted to make a meaningful
index on uuid value, normal ordering of varchar would not suffice...

-alex

#19Dmitry G. Mastrukov
dmitry@taurussoft.org
In reply to: Peter Eisentraut (#16)
Re: Re: New data type: uniqueidentifier

Peter Eisentraut wrote:

Alex Pilosov writes:

Dmitry's stuff contains both datatype (uniqueidentifier), a function to
generate a new object of that type (newid), and a set of functions to
implement comparison operators for that type.

I don't see anything wrong with that setup, but maybe I'm still missing
something?

It would be much simpler if you stored the unique id in varchar or text.

Are you sure varchar comparision will be quickly than current
implementation? Next, varchar will need 36 byte, uniqueidentifier takes
16. Next, indexing - IMHO current stuff more suitable for indexes. Some
time ago I saw some stuff which deals with uniqueidentifiers for
Interbase. It uses your scheme with chars. But it strip "-" from string
and reverts it to efficiently use indexes (uid sometimes uses
MAC-address as part of itself, so MAC should go first in string). Weird
scheme for me!

regards,
Dmitry