uuid type for postgres

Started by nathan wagnerover 20 years ago58 messageshackers
Jump to latest
#1nathan wagner
nw@hydaspes.if.org

I have been in need of a uuid type and ran across the pguuid download
by Xiongjian (Mike) Wang. This wasn't really useful to me for two
reasons: first, it is GPLed and I would prefer a more liberal license,
secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a
SIOCGETIFHWADDR (? i think, i can get the exact name if you want it)
ioctl() under darwin.

While I could dike out the code that calls it, that seems like a suboptimal
solution. So after a bit of poking around the interweb i ran across
Ralf Engelschall's ossp uuid library. This compiled with minimal
effort on mac os. Some reading, and an evening later, i've made
a server plugin with supporting SQL that implements an 'ossp_uuid'
type.

Now i have four questions:

1: Is it feasible for this to be included in the contrib section of
the regular download? The uuid library is a "notice of copyright" style
license, and I am willing to put my own code into the public domain.

2: Would just calling the type 'uuid' be better than 'ossp_uuid'? It's
certainly a nicer name.

3: Would it be possible to include such a type as a postgres extension
to the usual SQL types. It seems to me that having an officially
supported type would be better than a user contributed type on the grounds
that you could then rely on it being avaiable if postgres was.
In particular, installing it as an extension would require the cooperation
of the DBA, which may be infeasible in some environments.

--
Nathan Wagner

#2Mark Mielke
mark@mark.mielke.cc
In reply to: nathan wagner (#1)
Re: uuid type for postgres

Hey Nathan.

I've started to make heavy use of pguuid. It had several bugs in it
that required fixing before I could use it. I have no preference on
pguuid. It was the only such PostgreSQL project I found that provided
a UUID type.

I'd be willing to work with you on ensuring that such a patch is in an
acceptable form for the core developers to accept, and help maintain
it.

My personal preference is that the type be called 'uuid' and accepted
into the core. Tom? Is their history on this issue?

Should it remain an extension, or can be get it built-in?

I find the UUID concept more useful than the SERIAL concept for data
that must now, or may in the future, be stored on multiple servers.
For example, submitting transactions to two different sites, where the
sites periodically synchronize up with each other, requiring a
relatively safe 'merge'. SERIAL can't do this at all. UUID can do
this as long as the rows are split into domains appropriately,
such as ensuring that unique column constraints do not need to
be enforced across sites.

UUID is also desirable over SERIAL for a unique identifier that will
be accessed from an outside source directly. Exposing the SERIAL
number to the outside isn't appealing for reasons including the
predictability of the sequence, the size of the database is exposed,
the internel implementation is exposed, and so on.

I suggest that UUID be recommended in place of SERIAL for certain
classes of applications, and that it therefore belongs in the core.
UUID and SERIAL can be used together (although, once you have a UUID,
it may not be useful to also have a SERIAL).

Ok, you can all tell me I'm wrong now... :-)

mark

On Tue, Sep 06, 2005 at 01:50:57PM +0000, nathan wagner wrote:

I have been in need of a uuid type and ran across the pguuid download
by Xiongjian (Mike) Wang. This wasn't really useful to me for two
reasons: first, it is GPLed and I would prefer a more liberal license,
secondly, it didn't compile cleanly on Mac OS 10.3, due to lack of a
SIOCGETIFHWADDR (? i think, i can get the exact name if you want it)
ioctl() under darwin.

While I could dike out the code that calls it, that seems like a suboptimal
solution. So after a bit of poking around the interweb i ran across
Ralf Engelschall's ossp uuid library. This compiled with minimal
effort on mac os. Some reading, and an evening later, i've made
a server plugin with supporting SQL that implements an 'ossp_uuid'
type.

Now i have four questions:

1: Is it feasible for this to be included in the contrib section of
the regular download? The uuid library is a "notice of copyright" style
license, and I am willing to put my own code into the public domain.

2: Would just calling the type 'uuid' be better than 'ossp_uuid'? It's
certainly a nicer name.

3: Would it be possible to include such a type as a postgres extension
to the usual SQL types. It seems to me that having an officially
supported type would be better than a user contributed type on the grounds
that you could then rely on it being avaiable if postgres was.
In particular, installing it as an extension would require the cooperation
of the DBA, which may be infeasible in some environments.

--
Nathan Wagner

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Mielke (#2)
Re: uuid type for postgres

mark@mark.mielke.cc writes:

My personal preference is that the type be called 'uuid' and accepted
into the core. Tom? Is their history on this issue?
Should it remain an extension, or can be get it built-in?

There is pretty much zero chance of being accepted into contrib, much
less core, if the code isn't pure BSD license. (We used to be a bit
lax about that, but are trying to clean things up.) Nathan's comment
about starting with code that was sorta-BSD-with-advertising alarmed me.

However, you can certainly set up a pgfoundry project with code of any
license you like. That would be a good starting point anyway --- at
some point you'd have to convince people that there's enough demand
for the feature to justify putting it in core, and being able to point
to X number of downloads from pgfoundry would go a long way towards
making that case.

regards, tom lane

#4Josh Berkus
josh@agliodbs.com
In reply to: Mark Mielke (#2)
Re: uuid type for postgres

Mark,

I suggest that UUID be recommended in place of SERIAL for certain
classes of applications, and that it therefore belongs in the core.
UUID and SERIAL can be used together (although, once you have a UUID,
it may not be useful to also have a SERIAL).

I think that, if you want to push a refactored UUID type for PostgreSQL
8.2, that you'd better separate your database design arguments from your
inclusion arguments.

For example, you might get my agreement that it would be useful to have a
UUID as a core type; you would *never* get my agreement to recommend using
UUID to newbies. I have seen *far* too many abuses of UUIDs in really
bad database design. People who use them should be experienced enough to
know what they're doing.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

#5nathan wagner
nw@hydaspes.if.org
In reply to: Tom Lane (#3)
Re: uuid type for postgres

On Tue, Sep 06, 2005 at 11:38:57AM -0400, tgl@sss.pgh.pa.us wrote:

There is pretty much zero chance of being accepted into contrib, much
less core, if the code isn't pure BSD license.

Hmm. Here is the copyright and license portion of the readme...

COPYRIGHT AND LICENSE

Copyright (c) 2004-2005 Ralf S. Engelschall <rse@engelschall.com>
Copyright (c) 2004-2005 The OSSP Project <http://www.ossp.org/&gt;

This file is part of OSSP uuid, a library for the generation
of UUIDs which can found at http://www.ossp.org/pkg/lib/uuid/

Permission to use, copy, modify, and distribute this software for
any purpose with or without fee is hereby granted, provided that
the above copyright notice and this permission notice appear in all
copies.

THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE AUTHORS AND COPYRIGHT HOLDERS AND THEIR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

Reading the postgres license from the faq...

PostgreSQL Data Base Management System

Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group Portions
Copyright (c) 1994-1996 Regents of the University of California

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement is
hereby granted, provided that the above copyright notice and this paragraph and
the following two paragraphs appear in all copies.

IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST
PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION, EVEN IF
THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGE.

THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND
THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

These two licenses appear to be functionally identical. Both require
notice of the copyright to be included, and both allow use for any purpose.
The wording of the requirement is even nearly identical.

By way of disclaimer, I am not (yet) an attorney.

(We used to be a bit
lax about that, but are trying to clean things up.) Nathan's comment
about starting with code that was sorta-BSD-with-advertising alarmed me.

Perhaps i was a bit lax in my wording. I don't read the license
as requiring any advertising at run time, just as a requirement that
the copyright notice be kept with the source code. That is, an
identical requirement to the one that postgres itself uses.

--
Nathan Wagner

#6Jonah H. Harris
jonah.harris@gmail.com
In reply to: Josh Berkus (#4)
Re: uuid type for postgres

I agree with Josh on the UUID type, it gets abused far too often and (IMHO)
isn't widely enough used to belong in the core. Couldn't you just fix the
problem in pguuid rather than write a whole new type?

Show quoted text

On 9/6/05, Josh Berkus <josh@agliodbs.com> wrote:

Mark,

I suggest that UUID be recommended in place of SERIAL for certain
classes of applications, and that it therefore belongs in the core.
UUID and SERIAL can be used together (although, once you have a UUID,
it may not be useful to also have a SERIAL).

I think that, if you want to push a refactored UUID type for PostgreSQL
8.2, that you'd better separate your database design arguments from your
inclusion arguments.

For example, you might get my agreement that it would be useful to have a
UUID as a core type; you would *never* get my agreement to recommend using
UUID to newbies. I have seen *far* too many abuses of UUIDs in really
bad database design. People who use them should be experienced enough to
know what they're doing.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

#7Bob Ippolito
bob@redivi.com
In reply to: Jonah H. Harris (#6)
Re: uuid type for postgres

On Sep 6, 2005, at 12:57 PM, Jonah H. Harris wrote:

On 9/6/05, Josh Berkus <josh@agliodbs.com> wrote:
Mark,

I suggest that UUID be recommended in place of SERIAL for certain
classes of applications, and that it therefore belongs in the core.
UUID and SERIAL can be used together (although, once you have a

UUID,

it may not be useful to also have a SERIAL).

I think that, if you want to push a refactored UUID type for
PostgreSQL
8.2, that you'd better separate your database design arguments from
your
inclusion arguments.

For example, you might get my agreement that it would be useful to
have a
UUID as a core type; you would *never* get my agreement to
recommend using
UUID to newbies. I have seen *far* too many abuses of UUIDs in
really
bad database design. People who use them should be experienced
enough to
know what they're doing.

I agree with Josh on the UUID type, it gets abused far too often
and (IMHO) isn't widely enough used to belong in the core.
Couldn't you just fix the problem in pguuid rather than write a
whole new type?

This sounds like a strawman argument. People abuse a lot of types to
do a lot of things they shouldn't be doing, adding types to the core
isn't really going to change that much one way or the other.

If the documentation gives the user a good idea of when to use UUID
and when not, I think it would be a good addition. Worst case, it
could become part of contrib, so at least it ships with PostgreSQL
with the same license. The GPL license of pguuid is a show-stopper
for some (and seems like a silly choice for what it is and who it's
for).

-bob

#8Jonah H. Harris
jonah.harris@gmail.com
In reply to: nathan wagner (#1)
Re: uuid type for postgres

The only time I've seen someone use UUIDs in PostgreSQL is when they were
converting from SQL Server. I've seen many bad data models using UUID that
could've/should've used normal sequences for portability.

I look forward to seeing you're code.

Thanks!

Show quoted text

On 9/6/05, nathan wagner <nw@hydaspes.if.org> wrote:

On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote:

I agree with Josh on the UUID type, it gets abused far too often

Out of curiosity, how does it get abused? It doesn't seem to me
that it would be any more prone to abuse than any other type.

and (IMHO) isn't widely enough used to belong in the core.

Seems like a self-fulfilling cycle. There isn't one in core
or contrib (pguuid is not in contrib, IIRC), and so one doesn't
get used. Since no one uses one, it's not in core or contrib.

Couldn't you just fix the
problem in pguuid rather than write a whole new type?

Not sure which you you're addressing here, but I can't fix the license
problem. Nor can I readily fix the lack of a mac ioctl() to get
the hardware mac address.

In any case, from an economy of effort view, I'd much rather maintain
a postgres extension/interface to an existing uuid library, than fret
the details of writing a uuid library myself. UUIDs themselves don't
really interest me, I just happen to need one for my application.

I'm basically done. I'll tar it up and post a link to an alpha version in
a
couple of hours. As far as i can tell, it works fine. I'm calling it alpha
because i'm not doing proper error checking, and i'm not sure how to log
or
throw an error even if i detect one. Also, I have only compiled it on
Mac OS 10.3, I'll want to test it on my linux box.

--
Nathan Wagner

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

#9Mark Mielke
mark@mark.mielke.cc
In reply to: Jonah H. Harris (#6)
Re: uuid type for postgres

On Tue, Sep 06, 2005 at 03:57:55PM -0400, Jonah H. Harris wrote:

I agree with Josh on the UUID type, it gets abused far too often and (IMHO)
isn't widely enough used to belong in the core.

There is much in PostgreSQL from my perspective that falls under the
category of 'most advanced open source SQL server in the world', and
not at all in the category of 'widely used'. Unless the history of
PostgreSQL isn't to be accepted in terms of direction for PostgreSQL,
I don't see why such a useful building block shouldn't be supported
by the core. It may not be widely used, because it doesn't exist.
Not yet, anyways. pguuid is broken, remember?

Which isn't to say that it should be in the core, or it must be in the
core. It is to say, however, that I don't buy your arguments. Abused how?
How can you judge widely used for something that doesn't exist?

I'm not very newbie oriented. I think I tend to assume that anybody
using PostgreSQL must know what they are doing. This may be very
wrong of me to assume. I assume you all know what you are doing,
for example.

I'm curious as to what sort of abuses you and Josh are speaking about.
I'd particularly like to know if you think I am abusing it, as I don't
believe that I am. I would also be interested in knowing what you felt
would be an abuse with UUID, that wouldn't be an abuse with SERIAL.
Should we tell newbies not to use SERIAL? Perhaps we should. Do we?

Couldn't you just fix the
problem in pguuid rather than write a whole new type?

I'll submit my patches to pguuid once I've had some more production
use of it. For the little bit of code that it was, there were some
pretty bad bugs. It was sometimes crashing, due to what appears to be
a palloc() that was one byte short (it reliably crashed on me, the
line is wrong as written, and after adding + 1, it no longer crashes
at all). The operator definitions weren't correct, and the indices on
columns of the type were not being preferred by the query planner (a
good thing too - if they were used for any operator other than '=',
the errors in the operator definitions would have caused very
unexpected results).

So yes, pguuid can be fixed. I'm not sure that pguuid makes a good
model for a PostgreSQL extension, but it could be resurrected and
used. (I think the project showed no real updates since 2003?)

But - it's GPL, limiting its use with regard to PostgreSQL
distribution, and it requires special compilation, which as Nathan
found out, doesn't work on MacOS.

I think the GPL reason alone is a compelling reason to create a new
extension. Using a more portable UUID base library (either written
from scratch, or re-used from some other place providing a
compatible license) is a compelling reason to create a new
extension.

As to whether it should be in core - I think that a fully functional
module could be widely used, especially in larger systems that are
having difficulty spreading transactions across multiple
machines. This is the domain that UUID shines in. I can write a
transaction at my site without being very worried that it will collide
with a transaction at your site. We don't have to be connected in real
time.

In my choice of use, I'm using them instead of SERIAL columns, as I
wish to have more freedom merging production data with test data. I
wish to continually import production data into my test environment,
in a single direction. UUID will prevent conflicts from occurring.
SERIAL cannot (although in theory, I could set my copy of the serial
value to 1 billion or something hacky - but that doesn't scale in a
simple fashion to having several test environments). My other
preferred use, is to expose a handle on the data to the world in
select circumstances. I don't want to give them a SERIAL column, as
it lets them be able to predict what else they might have access
to. I'm not using it to secure the data (other routines will do this),
but I am using it to hide the data. Why should people accessing my
system know how many records exist in my tables? Why should they be
able to predict the next value? Why should they be able to relate
the data, or mine my data in the case that I allow read to all?

Yes, I can avoid exposing the UUID/SERIAL in most circumstances. They
can get at most of the data through a name based path. For a few of my
pieces of data, though, I want to provide a secondary means of
accessing the data that does not require a path. Names cause problems,
especially if the names contain UNICODE characters, or if the names
are very long. Giving them a reliably unique handle of fixed length
is highly desirable to me in these circumstances.

Anyways, I'm using a fixed up pguuid right now, and getting along fine.
Nathan, under MacOS, isn't - with or without my patch. And I believe
he stated he had issues with the GPL license.

Josh may be correct, that in terms of a position statement, my
arguments are all over the map, and ignorant of newbies. I'm not a
speaker, or a writer. I'm a technical person who was frustrated with
SERIAL, happy with UUID, frustrated to find pguuid broken, happy that
it was fixable, frustrated with having to compile and install it
separately, happy that there is interest from others (Nathan) in
attempting to have this building block worked on, and perhaps
eventually added to the core.

Personally, I'm not sure what the big opposition to UUID is all about.
Even abused, it can do a better job than SERIAL at quadruple the
storage requirements (not a big deal when considering PostgreSQL's
record overhead). If it'll make it more acceptable, I'll write the
UUID implementation myself, and release it under whatever license
you find most beneficial to PostgreSQL. :-)

Cheers,
mark

On 9/6/05, Josh Berkus <josh@agliodbs.com> wrote:

Mark,

I suggest that UUID be recommended in place of SERIAL for certain
classes of applications, and that it therefore belongs in the core.
UUID and SERIAL can be used together (although, once you have a UUID,
it may not be useful to also have a SERIAL).

I think that, if you want to push a refactored UUID type for PostgreSQL
8.2, that you'd better separate your database design arguments from your
inclusion arguments.

For example, you might get my agreement that it would be useful to have a
UUID as a core type; you would *never* get my agreement to recommend using
UUID to newbies. I have seen *far* too many abuses of UUIDs in really
bad database design. People who use them should be experienced enough to
know what they're doing.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#10David Fetter
david@fetter.org
In reply to: nathan wagner (#1)
Re: uuid type for postgres

On Tue, Sep 06, 2005 at 09:16:13PM +0000, nathan wagner wrote:

On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote:

I agree with Josh on the UUID type, it gets abused far too often

Out of curiosity, how does it get abused? It doesn't seem to me
that it would be any more prone to abuse than any other type.

A typical abuse, as I have seen it, is for OO coders on the front-end
to turn the entire databse into what amounts to an associative array.
It then becomes somewhere between difficult and impossible to get good
performance, as UUID is the only thing useful as a PK/FK.

and (IMHO) isn't widely enough used to belong in the core.

Seems like a self-fulfilling cycle. There isn't one in core or
contrib (pguuid is not in contrib, IIRC), and so one doesn't get
used. Since no one uses one, it's not in core or contrib.

In this case, not having it easy to get to is, IMHO, an *excellent*
idea. The whole thing is a giant foot-gun.

Just my $.02.

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#11Bob Ippolito
bob@redivi.com
In reply to: nathan wagner (#1)
Re: uuid type for postgres

On Sep 6, 2005, at 2:16 PM, nathan wagner wrote:

On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com
wrote:

I agree with Josh on the UUID type, it gets abused far too often

Out of curiosity, how does it get abused? It doesn't seem to me
that it would be any more prone to abuse than any other type.

and (IMHO) isn't widely enough used to belong in the core.

Seems like a self-fulfilling cycle. There isn't one in core
or contrib (pguuid is not in contrib, IIRC), and so one doesn't
get used. Since no one uses one, it's not in core or contrib.

Couldn't you just fix the
problem in pguuid rather than write a whole new type?

Not sure which you you're addressing here, but I can't fix the license
problem. Nor can I readily fix the lack of a mac ioctl() to get
the hardware mac address.

It's not that hard to get a MAC address or serial number out of a
Mac, the problem is I'm not going to bother writing that code for a
GPL extension to PostgreSQL. There's a better way, anyhow. Darwin
ships with UUID parse/generate/etc functions in its Libc (<uuid/
uuid.h>).

FWIW, that code (the Theodore Ts'o UUID implementation) looks to be
suitably licensed for PostgreSQL, and it actually it appears that
someone has already written a PostgreSQL UUID type using this it...
which is unfortunately LGPL'ed (why?!).

-bob

#12nathan wagner
nw@hydaspes.if.org
In reply to: Jonah H. Harris (#6)
Re: uuid type for postgres

On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote:

I agree with Josh on the UUID type, it gets abused far too often

Out of curiosity, how does it get abused? It doesn't seem to me
that it would be any more prone to abuse than any other type.

and (IMHO) isn't widely enough used to belong in the core.

Seems like a self-fulfilling cycle. There isn't one in core
or contrib (pguuid is not in contrib, IIRC), and so one doesn't
get used. Since no one uses one, it's not in core or contrib.

Couldn't you just fix the
problem in pguuid rather than write a whole new type?

Not sure which you you're addressing here, but I can't fix the license
problem. Nor can I readily fix the lack of a mac ioctl() to get
the hardware mac address.

In any case, from an economy of effort view, I'd much rather maintain
a postgres extension/interface to an existing uuid library, than fret
the details of writing a uuid library myself. UUIDs themselves don't
really interest me, I just happen to need one for my application.

I'm basically done. I'll tar it up and post a link to an alpha version in a
couple of hours. As far as i can tell, it works fine. I'm calling it alpha
because i'm not doing proper error checking, and i'm not sure how to log or
throw an error even if i detect one. Also, I have only compiled it on
Mac OS 10.3, I'll want to test it on my linux box.

--
Nathan Wagner

#13Mark Mielke
mark@mark.mielke.cc
In reply to: David Fetter (#10)
Re: uuid type for postgres

On Tue, Sep 06, 2005 at 01:49:28PM -0700, David Fetter wrote:

On Tue, Sep 06, 2005 at 09:16:13PM +0000, nathan wagner wrote:

On Tue, Sep 06, 2005 at 03:57:55PM -0400, jonah.harris@gmail.com wrote:

I agree with Josh on the UUID type, it gets abused far too often

Out of curiosity, how does it get abused? It doesn't seem to me
that it would be any more prone to abuse than any other type.

A typical abuse, as I have seen it, is for OO coders on the front-end
to turn the entire databse into what amounts to an associative array.
It then becomes somewhere between difficult and impossible to get good
performance, as UUID is the only thing useful as a PK/FK.

Replace "UUID" with "SERIAL" in your above paragraph, and I think your
argument is still true. Perhaps UUID makes it easier, as it crosses
the table boundary - but somebody so inclined, can use *val() to
define their own "database wide SERIAL identifier".

I don't see anything intrinisic about SERIAL, UUID, or OO abstracting
techniques that requires people to write inefficient code. People
write inefficient code because they don't know any better. They don't
properly look at how their data will be used, and what path the data
will take to get from the tables on disk, to the user they are
presenting the information to. If they had, in the case you describe,
perhaps they'd be horrified. Or perhaps they would STILL not know
any better. :-)

Technically, the UUID is 4x the size of a SERIAL, or 2x the size of a
SERIAL8. For this cost in terms of efficiency (table size, index size),
you are purchasing the scaleability of being able to, at any point in
the future, more easily (than SERIAL) merge your tables with other
tables, and have a relatively opaque handle to give out to allow
people to uniquely identify a collection of rows. For me, both are
appealing. For others, one or the other may be more appealing.

In terms of portability, perhaps (somebody else made this
arguments). If other systems don't have a UUID type, you end screwed
if you ever wished to ditch PostgreSQL. The reason I don't buy this
argument, though, is that in my current application, I dumped being
able to ditch PostgreSQL a long time ago. There are just too many neat
things I can do with PostgreSQL, that I can't do with MySQL, and that
I either can't, or don't want to do with Oracle. You're stuck with me
now... :-)

If people truly felt this way, there are many PostgreSQL extensions
that should be phased out, rather than introduced.

I'd like to point out the obvious - a UUID is just an identifier. As
it isn't linked to the table, it doesn't require generation to be
server-side. Any application can generate a UUID as two BIGINTs, and
use these two BIGINTs as a primary key, and accomplish the same.
Before I stumbled upon pguuid, and became interested in the rather
impressive extension interfaces provided by PostgreSQL, I was
considering doing this myself. The costs would have been more
complicated queries, and limited practical use from the psql command
line.

I've become comfortable with the concept of a UUID over the last two
or three years, as they have been a requirement for us to use for a
product at my primary place of employment. We use a source management
system with replicas at each site, that is synchronized periodically.
We write software on top of this system to perform automated tasks at
different sites (sometimes executing at the other sites), or that
requires configuration information to be associated with the data
stored in this source management system. In a previous product, we
used names to access the data. This failed horribly when the names
changed. We now frequently, and liberally use the UUID for the
objects. This allows us to associate data with, access or manipulate
the objects at any site, without ANY problem. To go back to a name
based access method seems a step BACKWARDS.

UUID definately has a place. It's only a question of the imagination
of the user as to how abused, or how well used, it is.

I think it deserves a spot in PostgreSQL, and that people would
possibly use it more than they would SERIAL. Eventually, SERIAL wraps
around. So you switch to SERIAL8. At the point that you have SERIAL8,
you aren't worried terribly about disk space, and you realize there is
usually no benefit at all to the numbers being ordered so closely. Why
not pick a scheme that is based on time? Perhaps create a revision
code field to deal with objects created simultaneously from the same
source. And why not identify the source to prevent collisions from
multiple sources? If we go from 8 bytes, to 16 bytes, we can encode
all of this information neatly. Welcome, UUID. :-)

mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#14Bruce Momjian
bruce@momjian.us
In reply to: Mark Mielke (#9)
Re: uuid type for postgres

mark@mark.mielke.cc writes:

In my choice of use, I'm using them instead of SERIAL columns, as I
wish to have more freedom merging production data with test data. I
wish to continually import production data into my test environment,
in a single direction. UUID will prevent conflicts from occurring.
SERIAL cannot (although in theory, I could set my copy of the serial
value to 1 billion or something hacky - but that doesn't scale in a
simple fashion to having several test environments).

Just do something like this for every sequence:

ALTER SEQUENCE foo INCREMENT BY 100

And then choose a particular initial value for each server.

I agree with the others that uuid seems to be overused in lots of cases where
a simple serial would serve just as well.

However I don't see why a good uuid type is any less of a valid addition to
the set of data types than any of the others. inet, macaddr, all the geometric
types, for example.

Given the 3-phase commit work going into 8.2 (8.1?) it seems like a pretty
fundamental component of that whole ball of wax. A note in the documentation
that it's designed for use as part of a multiple-database system like that
might be helpful.

--
greg

#15Mark Mielke
mark@mark.mielke.cc
In reply to: Bruce Momjian (#14)
Re: uuid type for postgres

On Tue, Sep 06, 2005 at 05:31:43PM -0400, Greg Stark wrote:

Just do something like this for every sequence:
ALTER SEQUENCE foo INCREMENT BY 100
And then choose a particular initial value for each server.

*shudder*

But you are right. That would work. :-) (I shudder from the maintenance
head-ache - if I re-cloned the test database, and reset the sequence
to the wrong value, I would have overlap)

I agree with the others that uuid seems to be overused in lots of cases where
a simple serial would serve just as well.

Yes.

In March I only used serial columns (actually, I originally used the
hacky MySQL auto_increment type before I migrated the database to
PostgreSQL). I found problems under actual usage with regard to
testing, and with exposing the numbers to the users. I started by
introducing the uuid along side the serial column, but shortly after
realized that it was ridiculous. I effectively had two primary keys,
with two unique indexes. Drop the serial column, and I'm left with
one that does everything I want.

I now used mixed serial and uuid. I retained serial columns for space
reasons. Serial makes it more common for some of my table primary key
indexes or even tables themselves (enumerated type tables) to fit in
fewer pages, improving search time. Some tables may cluster rows more
naturally on a serial column (would depend if the uuid type was sorted
by embedded time stamp, or by literal byte array value - memcmp()).

For the objects represented by a collection of rows, that I expose
to external interfaces, however, I only use the uuid.

My point in this long and winded set of paragraphs, is that although I
agree that uuid might be overused in some situations, I think people
may be currently underusing it in others. They're both ways of
generating unique identifiers. Depending on the context, one will be
better than the other. They may both work for most circumstances.

However I don't see why a good uuid type is any less of a valid
addition to the set of data types than any of the others. inet,
macaddr, all the geometric types, for example.

I've never used geometric types, and don't see myself doing so any
time soon. :-)

Given the 3-phase commit work going into 8.2 (8.1?) it seems like a pretty
fundamental component of that whole ball of wax. A note in the documentation
that it's designed for use as part of a multiple-database system like that
might be helpful.

Yep.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mark Mielke (#9)
Re: uuid type for postgres

mark@mark.mielke.cc writes:

Personally, I'm not sure what the big opposition to UUID is all about.

I don't see any "big opposition". People are simply questioning the
idea whether it belongs in core PG. The reason we don't want to accept
everything-and-the-kitchen-sink in core is that we have only limited
manpower to maintain it. So you've got to justify that we should spend
our effort here and not elsewhere. There's a fair amount of nearly
unmaintained cruft in the core distro already (eg, the never-finished
"line" datatype ... or the entire rtree index module ...) and a datatype
that might be used by only a few people is a likely candidate to become
an unmaintained backwater. And yet it's hard to get rid of stuff that's
been there awhile. So one of the questions that's going to be asked is
how useful/popular it's really going to be.

One thing that is raising my own level of concern quite a bit is the
apparent portability issues. Code that isn't completely portable is a
huge maintainability problem; in particular, stuff that requires
system-dependent behavior used nowhere else in Postgres is a real pain.
It sounds like the UUID code expects to be able to get at the machine's
MAC address, which suggests serious issues in (a) relying on
not-too-standard APIs, (b) possible protection issues (will an
unprivileged process be able to get at the MAC address?), and (c)
ill-defined behavior on machines with more or less than one MAC address.
Not to mention that MAC addresses aren't so unique as all that.

The bottom line is that we're willing to listen, but it's not by any
means a slam dunk to get this into the distribution.

regards, tom lane

#17Mark Mielke
mark@mark.mielke.cc
In reply to: Tom Lane (#16)
Re: uuid type for postgres

On Tue, Sep 06, 2005 at 05:54:34PM -0400, Tom Lane wrote:

I don't see any "big opposition". People are simply questioning the
idea whether it belongs in core PG. The reason we don't want to accept
everything-and-the-kitchen-sink in core is that we have only limited
manpower to maintain it. So you've got to justify that we should spend
our effort here and not elsewhere. There's a fair amount of nearly
...
been there awhile. So one of the questions that's going to be asked is
how useful/popular it's really going to be.

Sounds reasonable, and certainly no more than I expected. If Nathan
hadn't raised the issue, it probably would have been a few months
before I raised it myself.

One thing that is raising my own level of concern quite a bit is the
apparent portability issues. Code that isn't completely portable is a
huge maintainability problem; in particular, stuff that requires
system-dependent behavior used nowhere else in Postgres is a real pain.
It sounds like the UUID code expects to be able to get at the machine's
MAC address, which suggests serious issues in (a) relying on
not-too-standard APIs, (b) possible protection issues (will an
unprivileged process be able to get at the MAC address?), and (c)
ill-defined behavior on machines with more or less than one MAC address.
Not to mention that MAC addresses aren't so unique as all that.

I'll try to prepare an answer for this. (I started to write a lot of
information - but is it unverified from memory, and perhaps should be
more authoritative before presented as truth)

The bottom line is that we're willing to listen, but it's not by any
means a slam dunk to get this into the distribution.

Sounds good.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

#18nathan wagner
nw@hydaspes.if.org
In reply to: Mark Mielke (#17)
Re: uuid type for postgres

As promised a link to the code.

http://granicus.if.org/~nw/ossp_pg_uuid-0.1.tar.gz

You'll also need Ralf Engelschall's uuid library, which mine is a postgres
interface to. It's available at

ftp://ftp.ossp.org/pkg/lib/uuid/uuid-1.3.0.tar.gz

It probably has a few warts. I'm mainly posting it for suggestions,
comments, and so we have something that isn't vaporware to argue about.

:)

--
Nathan Wagner

#19nathan wagner
nw@hydaspes.if.org
In reply to: Tom Lane (#16)
Re: uuid type for postgres

On Tue, Sep 06, 2005 at 05:54:34PM -0400, tgl@sss.pgh.pa.us wrote:

One thing that is raising my own level of concern quite a bit is the
apparent portability issues.

I can't speak to the portability in general, but there is a PORTING file
in the ossp uuid library that states

OSSP uuid was already written with maximum portability in mind, so
there should be no great effort required to get it running on any Unix
platform with a reasonable POSIX API. Additionally, the portability
was tested by successfully building and running it on the following
particular Unix platforms (syntax is "<cpu>-<os> (<compiler>)"):

alpha-tru644.0 (cc) alpha-tru645.1 (gcc, cc) hppa-hpux11.11 (cc)
ia64-hpux11.23 (cc) ix86-debian2.2 (gcc, icc) ix86-debian3.0 (gcc)
ix86-debian3.1 (gcc) ix86-freebsd4.9 (gcc) ix86-freebsd5.2 (gcc, icc)
ix86-netbsd1.6 (gcc) ix86-qnx6.2 (gcc) ix86-solaris10 (gcc) ix86-unixware7.1.3
(cc) mips64-irix6.5 (gcc) sparc64-solaris8 (gcc, forte) sparc64-solaris9 (gcc)

On my end I managed to compile it with nothing more than a "configure",
"make", followed by a "make install".

Code that isn't completely portable is a huge maintainability problem; in
particular, stuff that requires system-dependent behavior used nowhere
else in Postgres is a real pain. It sounds like the UUID code expects to
be able to get at the machine's MAC address,

If the mac address isn't available, I believe it falls back on using
a random 47 bit number with the 48th bit set to make the mac address
fall within the multicast mac numberspace. You could also use a version
4 uuid, or derive a version 3 or 5 uuid from some other source.

The better answer though, is these sort of questions are exactly why
I would prefer to rely on someone else's library. Just as I basically
trust that the folks maintaining postgres aren't going to munge my tables
and destroy my data if i mess up a transaction and roll it back,
because they've spent time thinking about just that sort of problem, I
also (having worked with the code a bit now) trust the UUID folks
to have thought about "just how do we make a unique number without
centralized coordination?" "base on the mac address?" "what if we
don't have one? or don't know it for some reason?" I assume here that
the answer they came up with wasn't "oh, hell, just return a 42 then".

The point being, that other people have already written a better uuid
library than i am likely to, so, license permitting, let's use it.

The bottom line is that we're willing to listen, but it's not by any
means a slam dunk to get this into the distribution.

Fair enough.

Personally, I think it should be a core type, but would be quite happy
if it were in contrib. At least that way it would save the next guy
from having to hunt around the net.

I guess i'm volunteering to maintain it in contrib. I'm not certain
if i have the requisite knowledge to maintain it in the core. While
I could acquire the familiarity if need be, for the next year and nine
months law school is going to take up the bulk of my free time. And
of course I'll still need time to play around with my ticketing and gis
databases i'm developing.

--
Nathan Wagner

#20Bob Ippolito
bob@redivi.com
In reply to: Mark Mielke (#17)
Re: uuid type for postgres

On Sep 6, 2005, at 3:06 PM, mark@mark.mielke.cc wrote:

On Tue, Sep 06, 2005 at 05:54:34PM -0400, Tom Lane wrote:

I don't see any "big opposition". People are simply questioning the
idea whether it belongs in core PG. The reason we don't want to
accept
everything-and-the-kitchen-sink in core is that we have only limited
manpower to maintain it. So you've got to justify that we should
spend
our effort here and not elsewhere. There's a fair amount of nearly
...
been there awhile. So one of the questions that's going to be
asked is
how useful/popular it's really going to be.

Sounds reasonable, and certainly no more than I expected. If Nathan
hadn't raised the issue, it probably would have been a few months
before I raised it myself.

One thing that is raising my own level of concern quite a bit is the
apparent portability issues. Code that isn't completely portable
is a
huge maintainability problem; in particular, stuff that requires
system-dependent behavior used nowhere else in Postgres is a real
pain.
It sounds like the UUID code expects to be able to get at the
machine's
MAC address, which suggests serious issues in (a) relying on
not-too-standard APIs, (b) possible protection issues (will an
unprivileged process be able to get at the MAC address?), and (c)
ill-defined behavior on machines with more or less than one MAC
address.
Not to mention that MAC addresses aren't so unique as all that.

I'll try to prepare an answer for this. (I started to write a lot of
information - but is it unverified from memory, and perhaps should be
more authoritative before presented as truth)

Some modern UUID implementations prefer /dev/urandom or similar to
the time or MAC address unless you really beg them to give you a
weaker UUID.

You can take a look at the man page for the Theodore Y. Ts'o
implementation that is in Darwin's Libc here:
http://developer.apple.com/documentation/Darwin/Reference/ManPages/
man3/uuid_generate.3.html

Specifically:

The uuid_generate function creates a new universally unique
identifier
(UUID). The uuid will be generated based on high-quality
randomness
from /dev/urandom, if available. If it is not
available, then
uuid_generate will use an alternative algorithm which uses
the current
time, the local ethernet MAC address (if available), and
random data
generated using a pseudo-random generator.

The Apache Portable Runtime has a apr_os_uuid_get() that supports two
flavors of UUID for unix (Linux/Mac OS X uuid_generate and FreeBSD's
uuid_create, may be available elsewhere), and the UuidCreate API on
Win32. apr-util's apr_uuid_get() will use apr_os_uuid_get() if
available, and otherwise will default to a relatively weak mostly-
timestamp-based UUID.

It would probably be reasonable and easy to do what Apache does
here. A platform UUID implementation, if present, is generally going
to be better than anything included into PostgreSQL itself.

-bob

#21Jonah H. Harris
jonah.harris@gmail.com
In reply to: Bob Ippolito (#20)
#22Bruce Momjian
bruce@momjian.us
In reply to: Mark Mielke (#13)
#23Josh Berkus
josh@agliodbs.com
In reply to: Bob Ippolito (#7)
#24Mark Mielke
mark@mark.mielke.cc
In reply to: Josh Berkus (#23)
#25Bob Ippolito
bob@redivi.com
In reply to: Josh Berkus (#23)
#26Josh Berkus
josh@agliodbs.com
In reply to: Mark Mielke (#24)
#27Dann Corbit
DCorbit@connx.com
In reply to: Josh Berkus (#26)
#28nathan wagner
nw@hydaspes.if.org
In reply to: Josh Berkus (#26)
#29Paul Ramsey
pramsey@cleverelephant.ca
In reply to: nathan wagner (#1)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Paul Ramsey (#29)
#31nathan wagner
nw@hydaspes.if.org
In reply to: Paul Ramsey (#29)
#32Roman Neuhauser
neuhauser@sigpipe.cz
In reply to: Tom Lane (#16)
#33Josh Berkus
josh@agliodbs.com
In reply to: nathan wagner (#31)
#34Josh Berkus
josh@agliodbs.com
In reply to: nathan wagner (#28)
#35nathan wagner
nw@hydaspes.if.org
In reply to: Josh Berkus (#33)
#36nathan wagner
nw@hydaspes.if.org
In reply to: Josh Berkus (#34)
#37Patrick Welche
prlw1@newn.cam.ac.uk
In reply to: Josh Berkus (#33)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: nathan wagner (#35)
#39Tom Lane
tgl@sss.pgh.pa.us
In reply to: Josh Berkus (#33)
#40Josh Berkus
josh@agliodbs.com
In reply to: nathan wagner (#36)
#41nathan wagner
nw@hydaspes.if.org
In reply to: Josh Berkus (#40)
#42Bob Ippolito
bob@redivi.com
In reply to: nathan wagner (#35)
#43Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Patrick Welche (#37)
#44Josh Berkus
josh@agliodbs.com
In reply to: nathan wagner (#41)
#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Patrick Welche (#37)
#46Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: nathan wagner (#41)
#47Mark Mielke
mark@mark.mielke.cc
In reply to: Josh Berkus (#40)
#48Josh Berkus
josh@agliodbs.com
In reply to: Mark Mielke (#47)
#49Andrew Dunstan
andrew@dunslane.net
In reply to: Alvaro Herrera (#43)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#43)
#51Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#48)
#52Greg Sabino Mullane
greg@turnstep.com
In reply to: Mark Mielke (#47)
#53Jonah H. Harris
jonah.harris@gmail.com
In reply to: Greg Sabino Mullane (#52)
#54Mark Mielke
mark@mark.mielke.cc
In reply to: Greg Sabino Mullane (#52)
#55Jonah H. Harris
jonah.harris@gmail.com
In reply to: Mark Mielke (#54)
#56Mark Mielke
mark@mark.mielke.cc
In reply to: Jonah H. Harris (#55)
#57Jonah H. Harris
jonah.harris@gmail.com
In reply to: Mark Mielke (#56)
#58Bob Ippolito
bob@redivi.com
In reply to: Jonah H. Harris (#55)