UUID's as primary keys
I'm building an app where I have several entities that are identified
using a UUID (i.e. a 128 bit quantity). My current implementation uses a
composite primary key consisting of two int8 values. It's a bit
cumbersome and I would much rather have a distinct type. An earlier
implementation using Oracle mapped the UUID to a RAW(16) but PostgreSQL
doesn't have that and the BYTEA adds extra overhead.
What would be the best (as in most efficient) mapping for a 128 bit
primary key?
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 09:01:49AM +0200, Thomas Hallgren wrote:
I'm building an app where I have several entities that are identified
using a UUID (i.e. a 128 bit quantity). My current implementation uses a
composite primary key consisting of two int8 values. It's a bit
cumbersome and I would much rather have a distinct type. An earlier
implementation using Oracle mapped the UUID to a RAW(16) but PostgreSQL
doesn't have that and the BYTEA adds extra overhead.What would be the best (as in most efficient) mapping for a 128 bit
primary key?
Sounds like something for a custom type. There's one here[1]http://gborg.postgresql.org/project/pguuid/projdisplay.php though I
have no idea how good it is.
[1]: http://gborg.postgresql.org/project/pguuid/projdisplay.php
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Wed, Jun 28, 2006 at 09:01:49AM +0200, Thomas Hallgren wrote:
I'm building an app where I have several entities that are identified
using a UUID (i.e. a 128 bit quantity). My current implementation uses a
composite primary key consisting of two int8 values. It's a bit
cumbersome and I would much rather have a distinct type. An earlier
implementation using Oracle mapped the UUID to a RAW(16) but PostgreSQL
doesn't have that and the BYTEA adds extra overhead.What would be the best (as in most efficient) mapping for a 128 bit
primary key?Sounds like something for a custom type. There's one here[1] though I
have no idea how good it is.[1] http://gborg.postgresql.org/project/pguuid/projdisplay.php
Have a nice day,
Thanks. That would of course work but at the same time it increases the
complexity of my app. Yet another component to install and keep track
of. It's also a bit of an overkill since the only thing I need is an
opaque bit storage. Why is it that PostgreSQL lack a fixed length binary
type similar to the RAW type in Oracle? ISTM that could be very useful
and not very hard to implement.
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 10:22:45AM +0200, Thomas Hallgren wrote:
Thanks. That would of course work but at the same time it increases the
complexity of my app. Yet another component to install and keep track
of. It's also a bit of an overkill since the only thing I need is an
opaque bit storage. Why is it that PostgreSQL lack a fixed length binary
type similar to the RAW type in Oracle? ISTM that could be very useful
and not very hard to implement.
AIUI, it can't be done because of a basic rule of the type system: the
typmod can't be necessary to interpret the binary representation of a
value. For something like RAW(16) the type would be the oid for "raw"
and the typmod would be 16. However, when reading the value from a disk
page, you're not given the typmod, so you have no way of determining
the length.
That's why there is a verlena header for types where you don't know the
length, which adds overhead. The alternative is to create a type of the
exact length you want, but in your case that's not acceptable either.
I'm not sure if there are any other alternatives.
Have a ncie day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Wed, Jun 28, 2006 at 10:22:45AM +0200, Thomas Hallgren wrote:
Thanks. That would of course work but at the same time it increases the
complexity of my app. Yet another component to install and keep track
of. It's also a bit of an overkill since the only thing I need is an
opaque bit storage. Why is it that PostgreSQL lack a fixed length binary
type similar to the RAW type in Oracle? ISTM that could be very useful
and not very hard to implement.AIUI, it can't be done because of a basic rule of the type system: the
typmod can't be necessary to interpret the binary representation of a
value. For something like RAW(16) the type would be the oid for "raw"
and the typmod would be 16. However, when reading the value from a disk
page, you're not given the typmod, so you have no way of determining
the length.
OK. I thought you always had a type descriptor handy when reading the binary representation.
I've noticed that the typmod is expected in some receive functions (bpcharrecv and
numeric_recv for instance). Are you saying that there are times when you don't use that?
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 12:03:40PM +0200, Thomas Hallgren wrote:
OK. I thought you always had a type descriptor handy when reading the
binary representation. I've noticed that the typmod is expected in some
receive functions (bpcharrecv and numeric_recv for instance). Are you
saying that there are times when you don't use that?
The input functions get it, the output functions (bpcharout,
bpcharsend, etc) don't. Which makes it kind of hard to print a raw
value if you don't know how long it's going to be. They used to, but
that was removed some time back. It's a security issue IIRC, since any
user could call raw_out(field, 2048) and get whatever was in the 2K of
data after that field.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Wed, Jun 28, 2006 at 12:03:40PM +0200, Thomas Hallgren wrote:
OK. I thought you always had a type descriptor handy when reading the
binary representation. I've noticed that the typmod is expected in some
receive functions (bpcharrecv and numeric_recv for instance). Are you
saying that there are times when you don't use that?The input functions get it, the output functions (bpcharout,
bpcharsend, etc) don't. Which makes it kind of hard to print a raw
value if you don't know how long it's going to be. They used to, but
that was removed some time back. It's a security issue IIRC, since any
user could call raw_out(field, 2048) and get whatever was in the 2K of
data after that field.
A user that is trusted with installing a C-function in the backend is
free to scan the process memory anyway so in what way did that increase
the security? IMHO, the only relevant security in that context is to
have trusted people install trusted modules. I'm surprised that
something like that made you remove significant functionality.
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 01:56:47PM +0200, Thomas Hallgren wrote:
A user that is trusted with installing a C-function in the backend is
free to scan the process memory anyway so in what way did that increase
the security? IMHO, the only relevant security in that context is to
have trusted people install trusted modules. I'm surprised that
something like that made you remove significant functionality.
You're missing the point. The type output function is not generally a
priveledged function. Think bpcharout, text_out, numeric_out, etc...
These can be called by users directly and the input to those functions
cannot be trusted.
If the type output function needs an additional parameter to correctly
and safely decode the actual Datum, you're screwed because then users
can pass invalid parameters to affect the decoding. If you have a way
of telling what the right value is, then you didn't need to pass it in
the first place.
Hence, you have to be able to decode a datum knowing only its type,
irrespective of typmod. So say you had a field of type RAW(16) you
would have to be able to decode it knowing only that it is of type
"RAW". So you need a header to tell you how long it is i.e. a varlena
structure.
Hope this clarifies it a bit,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
The input functions get it, the output functions (bpcharout,
bpcharsend, etc) don't. Which makes it kind of hard to print a raw
value if you don't know how long it's going to be. They used to, but
that was removed some time back.
Even back then you couldn't rely on the typmod value to be supplied;
it was quite likely to be passed as -1. The issue is not actually
with on-disk storage, it is with function/operator arguments and
results. Those have never been identified any more closely than by
giving a type OID. So for any value that came from a function,
you won't have a typmod, and you'd better be able to find out all
you need to know just by inspecting the value itself. Hence, length
words.
This is all pretty off-topic for pgsql-general, isn't it?
regards, tom lane
Martijn van Oosterhout wrote:
On Wed, Jun 28, 2006 at 01:56:47PM +0200, Thomas Hallgren wrote:
A user that is trusted with installing a C-function in the backend is
free to scan the process memory anyway so in what way did that increase
the security? IMHO, the only relevant security in that context is to
have trusted people install trusted modules. I'm surprised that
something like that made you remove significant functionality.You're missing the point. The type output function is not generally a
priveledged function. Think bpcharout, text_out, numeric_out, etc...
These can be called by users directly and the input to those functions
cannot be trusted.
Ah, OK that makes sense. An alternative solution when the signature was changed could
perhaps have been to pass one single argument, a structure appointing the data and its
associated type. My idea would work if the data and its type lived together always from the
moment its instantiated (read from disk or otherwise) and until death do them apart (or the
data is stored on disk, in which case the tupledesc knows what it is). I guess that would
imply a major rewrite and that my desire to have a RAW fixed length type isn't enough
motivation to do that :-)
Instead, I would like to humbly request the inclusion of a UUID datatype (or an opaque 128
bit datatype) in the core package. It's increasingly common and some databases (MS
SQLServer) already have built in support for it.
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID datatype
(or an opaque 128 bit datatype) in the core package. It's increasingly
ISTM that we get enough requests for this that it's probably worth
doing.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID datatype
(or an opaque 128 bit datatype) in the core package. It's increasingly
common and some databases (MS SQLServer) already have built in support for
it.
We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the
type has value, that is is generic, and would be widely used without
being abused. All the geometric types that I'll never use in core,
with few or no uses, including functions to operate on these types,
and no UUID type... Hehe... To me, that's irony... :-)
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID
datatype (or an opaque 128 bit datatype) in the core package. It's
increasingly common and some databases (MS SQLServer) already have built
in support for it.We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the type
has value, that is is generic, and would be widely used without being
abused. All the geometric types that I'll never use in core, with few or
no uses, including functions to operate on these types, and no UUID
type... Hehe... To me, that's irony... :-)
Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.
On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID
datatype (or an opaque 128 bit datatype) in the core package. It's
increasingly common and some databases (MS SQLServer) already have built
in support for it.We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the type
has value, that is is generic, and would be widely used without being
abused. All the geometric types that I'll never use in core, with few or
no uses, including functions to operate on these types, and no UUID
type... Hehe... To me, that's irony... :-)Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.
If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
downloaded it. But I find that exceptionally hard to believe...
Looking back through the list archives I think you'd find this comes up
at least every few months.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, Jun 28, 2006 at 12:38:50PM -0500, Jim C. Nasby wrote:
On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID
datatype (or an opaque 128 bit datatype) in the core package. It's
increasingly common and some databases (MS SQLServer) already have built
in support for it.We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the type
has value, that is is generic, and would be widely used without being
abused. All the geometric types that I'll never use in core, with few or
no uses, including functions to operate on these types, and no UUID
type... Hehe... To me, that's irony... :-)Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
downloaded it. But I find that exceptionally hard to believe...Looking back through the list archives I think you'd find this comes up
at least every few months.
I've downloaded the version off pgfoundry.org. It is broken. It leaks
memory, and if memory is correct it can cause the client to core dump.
Two of us worked on a re-write based off a different UUID system library,
and I've been happily using it in production for a year or so. I don't
believe either of us have bothered to market it. Each time it comes up,
a number of people on this list shut it down, and it doesn't seem worth
the effort to convince them otherwise. They can have their ivory tower,
and I can have my plugin.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
Jim C. Nasby wrote:
On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID
datatype (or an opaque 128 bit datatype) in the core package. It's
increasingly common and some databases (MS SQLServer) already have built
in support for it.We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the type
has value, that is is generic, and would be widely used without being
abused. All the geometric types that I'll never use in core, with few or
no uses, including functions to operate on these types, and no UUID
type... Hehe... To me, that's irony... :-)Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
downloaded it. But I find that exceptionally hard to believe...Looking back through the list archives I think you'd find this comes up
at least every few months.
That's because there is nothing there to download. See instead:
http://gborg.postgresql.org/project/pguuid/projdisplay.php
Personally I don't buy the misuse objection - we already have plenty of
things that can be misused. As long as there is a reasonable valid use
and we can make it portable enough, I think there is a good case for
including it.
cheers
andrew
mark@mark.mielke.cc wrote:
I've downloaded the version off pgfoundry.org. It is broken. It leaks
memory, and if memory is correct it can cause the client to core dump.
Also it couldn't possibly be included in core, since it's based on a
GPL'ed UUID library. If you have a more appropiately licensed package,
it could be considered for inclusion.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
Personally I don't buy the misuse objection - we already have plenty of
things that can be misused. As long as there is a reasonable valid use
and we can make it portable enough, I think there is a good case for
including it.
Well, since Mark has one, how about we consider adding it in?
If nothing else, can you please put your stuff on pgFoundry so others
can find it, Mark?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Tom Lane <tgl@sss.pgh.pa.us> writes:
Martijn van Oosterhout <kleptog@svana.org> writes:
The input functions get it, the output functions (bpcharout,
bpcharsend, etc) don't. Which makes it kind of hard to print a raw
value if you don't know how long it's going to be. They used to, but
that was removed some time back.
Even back then you couldn't rely on the typmod value to be supplied;
it was quite likely to be passed as -1. The issue is not actually
with on-disk storage, it is with function/operator arguments and
results. Those have never been identified any more closely than by
giving a type OID. So for any value that came from a function,
you won't have a typmod, and you'd better be able to find out all
you need to know just by inspecting the value itself. Hence, length
words.
Hm, so it could be stored on disk without the length header as long as the
length header is added to the in-memory representation? I don't think the type
system has hooks for reading and storing data to disk though.
This is all pretty off-topic for pgsql-general, isn't it?
[moved to -hackers]
--
greg
Greg Stark <gsstark@mit.edu> writes:
Hm, so it could be stored on disk without the length header as long as
the length header is added to the in-memory representation? I don't
think the type system has hooks for reading and storing data to disk
though.
No, it doesn't, and we'd pay a nonzero price for allowing that.
Currently the executor doesn't have to care (much) about whether a
tuple is on-disk or in-memory --- the individual datums look the same
either way. Allowing them to be different would force a lot of
format conversion steps that currently need not happen.
regards, tom lane
On Wed, Jun 28, 2006 at 01:12:17PM -0500, Jim C. Nasby wrote:
On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
Personally I don't buy the misuse objection - we already have plenty of
things that can be misused. As long as there is a reasonable valid use
and we can make it portable enough, I think there is a good case for
including it.Well, since Mark has one, how about we consider adding it in?
If nothing else, can you please put your stuff on pgFoundry so others
can find it, Mark?
It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and
is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
I'm not an expert on the license, but it seems acceptable to me:
"Permission to use, copy, modify, and distribute this software for
any purpose with or without fee is hereby granted, provided that
the above copyright notice and this permission notice appear in all
copies."
I haven't tested to see how portable the OSSP UUID implementation is.
This is their words:
"OSSP uuid was already written with maximum portability in mind, so
there should be no great effort required to get it running on any Unix
platform with a reasonable POSIX API. Additionally, the portability
was tested by successfully building and running it on the following
particular Unix platforms (syntax is "<cpu>-<os> (<compiler>)"):
alpha-tru644.0 (cc)
alpha-tru645.1 (gcc, cc)
hppa-hpux11.11 (cc)
ia64-hpux11.23 (cc)
ix86-debian2.2 (gcc, icc)
ix86-debian3.0 (gcc)
ix86-debian3.1 (gcc)
ix86-freebsd4.9 (gcc)
ix86-freebsd5.2 (gcc, icc)
ix86-netbsd1.6 (gcc)
ix86-qnx6.2 (gcc)
ix86-solaris10 (gcc)
ix86-unixware7.1.3 (cc)
mips64-irix6.5 (gcc)
sparc64-solaris8 (gcc, forte)
sparc64-solaris9 (gcc)"
I've put it through a fair amount of testing, including using it
within compound indexes, expecting the index to be used for at
least '=', constructing many UUIDs quickly, in a sequence, and
converting it to and from string form. We chose to implement our
own encode / decode routines for performance reasons. With the
exception of testing it on a wider range of platforms, I would
call the module stable.
If there is interest - I'm sure Nathan and I would be willing to put
it on pgfoundry, and at some point give it up for inclusion into
PostgreSQL.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Thu, Jun 29, 2006 at 02:02:32AM -0400, mark@mark.mielke.cc wrote:
It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and
is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
I'm not an expert on the license, but it seems acceptable to me:
...
If there is interest - I'm sure Nathan and I would be willing to put
it on pgfoundry, and at some point give it up for inclusion into
PostgreSQL.
This might require a little bit of research. It appears that the
development version of OSSP UUID may provide its own PostgreSQL
'bindings'. I may try and contact the author of the OSSP UUID and
see whether any changes we have that he does not, can be rolled
into his version...
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Jun 28, 2006, at 10:14 AM, mark@mark.mielke.cc wrote:
All the geometric types that I'll never use in core,
with few or no uses, including functions to operate on these types,
and no UUID type... Hehe... To me, that's irony... :-)
Interestingly, the superior geometry capability is driving a lot of
recent migration from MySQL to PostgreSQL in my own experience,
especially with PostGIS. The geometry parts may not get as much love
as other parts, but they still get to leverage the very solid
foundation they are built on top of. The geometry capability of
MySQL is basically checklist in nature, as it lacks the more
sophisticated indexing and query execution that is really required to
get passable performance from queries with geometry in them. MySQL
has similar geometry capability to PostgreSQL in theory if you don't
look too closely, but in practice the engine is not up to the more
rigorous demands of that kind of work.
With the nascent rise of the geospatial web, it is going to become a
lot more important than it has been.
J. Andrew Rogers
jrogers@neopolitan.com
Tom Lane <tgl@sss.pgh.pa.us> writes:
Greg Stark <gsstark@mit.edu> writes:
Hm, so it could be stored on disk without the length header as long as
the length header is added to the in-memory representation? I don't
think the type system has hooks for reading and storing data to disk
though.No, it doesn't, and we'd pay a nonzero price for allowing that.
Currently the executor doesn't have to care (much) about whether a
tuple is on-disk or in-memory --- the individual datums look the same
either way. Allowing them to be different would force a lot of
format conversion steps that currently need not happen.
Is there ever a case where an entire tuple is passed around without knowing
the typmod of an attribute in the tuple?
The conversion would only really have to happen when the attribute is fetched
or stored, not when the tuple is being passed around wholesale. But I have a
feeling that would be more intrusive than just making the entire system typmod
aware.
--
greg
mark@mark.mielke.cc wrote:
On Wed, Jun 28, 2006 at 01:12:17PM -0500, Jim C. Nasby wrote:
On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
Personally I don't buy the misuse objection - we already have plenty of
things that can be misused. As long as there is a reasonable valid use
and we can make it portable enough, I think there is a good case for
including it.Well, since Mark has one, how about we consider adding it in?
If nothing else, can you please put your stuff on pgFoundry so others
can find it, Mark?It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and
is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
I'm not an expert on the license, but it seems acceptable to me:"Permission to use, copy, modify, and distribute this software for
any purpose with or without fee is hereby granted, provided that
the above copyright notice and this permission notice appear in all
copies."I haven't tested to see how portable the OSSP UUID implementation is.
This is their words:"OSSP uuid was already written with maximum portability in mind, so
there should be no great effort required to get it running on any Unix
platform with a reasonable POSIX API. Additionally, the portability
was tested by successfully building and running it on the following
particular Unix platforms (syntax is "<cpu>-<os> (<compiler>)"):alpha-tru644.0 (cc)
alpha-tru645.1 (gcc, cc)
hppa-hpux11.11 (cc)
ia64-hpux11.23 (cc)
ix86-debian2.2 (gcc, icc)
ix86-debian3.0 (gcc)
ix86-debian3.1 (gcc)
ix86-freebsd4.9 (gcc)
ix86-freebsd5.2 (gcc, icc)
ix86-netbsd1.6 (gcc)
ix86-qnx6.2 (gcc)
ix86-solaris10 (gcc)
ix86-unixware7.1.3 (cc)
mips64-irix6.5 (gcc)
sparc64-solaris8 (gcc, forte)
sparc64-solaris9 (gcc)"I've put it through a fair amount of testing, including using it
within compound indexes, expecting the index to be used for at
least '=', constructing many UUIDs quickly, in a sequence, and
converting it to and from string form. We chose to implement our
own encode / decode routines for performance reasons. With the
exception of testing it on a wider range of platforms, I would
call the module stable.If there is interest - I'm sure Nathan and I would be willing to put
it on pgfoundry, and at some point give it up for inclusion into
PostgreSQL.
One requirement would be that it runs on Windows. Is that something you have tested?
Regards,
Thomas Hallgren
mark@mark.mielke.cc wrote:
On Thu, Jun 29, 2006 at 02:02:32AM -0400, mark@mark.mielke.cc wrote:
It was written by Nathan Wagner <nw@hydaspes.if.org> and myself, and
is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
I'm not an expert on the license, but it seems acceptable to me:
...
If there is interest - I'm sure Nathan and I would be willing to put
it on pgfoundry, and at some point give it up for inclusion into
PostgreSQL.This might require a little bit of research. It appears that the
development version of OSSP UUID may provide its own PostgreSQL
'bindings'. I may try and contact the author of the OSSP UUID and
see whether any changes we have that he does not, can be rolled
into his version...Cheers,
mark
I'm thinking ahead on possible objections to inclusion in core. One objection might be that
a fully blown UUID implementation is a lot of code. Code that needs to be maintained and it
increases the size of the binary etc. A solution to that might be to break the whole thing
up in two:
1 The actual type
A plain scalar type that stores 16 bytes. It's complete with standard operators for
comparison (natural order) and the text representation would be a 32 character hexadecimal
string. This type should make no interpretation whatsoever on what it stores and its only
association with UUID's is the storage size.
2 UUID utilities
Various ways of representing, generating, and extract partial information from UUID's.
Should have support for variants #0, #1, and #2 (the OSSP based code sounds like a good
candidate).
The split make sense since clients often have powerful UUID utilities handy and hence have
limited or no use for such utilities in the database (true for all .NET and Java clients).
Some PL's will also enable such packages out of the box.
The actual type would be extremely generic, lightweight, and easy to implement. No
portability issues whatsoever. The only difficulty that I see is naming it :-).
Regards,
Thomas Hallgren
If there is interest - I'm sure Nathan and I would be willing to put
it on pgfoundry, and at some point give it up for inclusion into
PostgreSQL.One requirement would be that it runs on Windows. Is that something you
have tested?
In case it influences anyone, MySQL 5 already has built-in UUID support:
http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#id2899901
Chris
On Thu, Jun 29, 2006 at 02:40:15AM -0400, Greg Stark wrote:
Greg Stark <gsstark@mit.edu> writes:
No, it doesn't, and we'd pay a nonzero price for allowing that.
Currently the executor doesn't have to care (much) about whether a
tuple is on-disk or in-memory --- the individual datums look the same
either way. Allowing them to be different would force a lot of
format conversion steps that currently need not happen.Is there ever a case where an entire tuple is passed around without knowing
the typmod of an attribute in the tuple?
A tuple is just an array of datums, with some header information. The
problems come when you don't have a tuple anymore, but only the datum,
like in arguments for functions.
I think it's more a case that most places that deal with datums simply
don't know about typmods. For example, the return type of a function
can only be char, not char(16). If you consider the case of a function
returning a RAW, the caller will have no way of knowing the typmod,
they do know the type though.
To be honest, it seems like a lot of work to save the four bytes of
overhead for the varlena structure on disk if you're going to need it
in memory anyway. And anything like RAW(16) which people want for
UUIDs, if it's going to have a lot of functions associated with it, may
as well just be a new type.
I think time would be much better spent finding a way of allowing
user-defined types to be created without using C functions.
The conversion would only really have to happen when the attribute is fetched
or stored, not when the tuple is being passed around wholesale. But I have a
feeling that would be more intrusive than just making the entire system typmod
aware.
I'm not sure if tuples are ever passed wholesale very far. The first
node to actually do anything with it (any join, expression or condition
test) is going to need to deconstruct it. Consider where we currently we
have a "Filter Cond" on a "Seq Scan". Currently the filter can access
the datums directly on the disk page, with what you're proposing, it
can't.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
A tuple is just an array of datums, with some header information. The
problems come when you don't have a tuple anymore, but only the datum,
like in arguments for functions.I think it's more a case that most places that deal with datums simply
don't know about typmods. For example, the return type of a function
can only be char, not char(16). If you consider the case of a function
returning a RAW, the caller will have no way of knowing the typmod,
they do know the type though.To be honest, it seems like a lot of work to save the four bytes of
overhead for the varlena structure on disk if you're going to need it
in memory anyway. And anything like RAW(16) which people want for
UUIDs, if it's going to have a lot of functions associated with it, may
as well just be a new type.
For large databases storage density leads directly to speed. Saving four bytes
of overhead on a 16-byte data structure would mean a 20% speed increase. Even
if that's only helpful on a tenth of the columns you're still talking about a
2% speed increase for all queries on the table. A lot of databases use CHAR(1)
for flags. The overhead is even worse there.
Consider where we currently we have a "Filter Cond" on a "Seq Scan".
Currently the filter can access the datums directly on the disk page, with
what you're proposing, it can't.
Well it only can't if the data type has conversion functions. I'm not sure how
complex it would be having pointers that *getattr sometimes return pointers to
the disk page and sometimes return pointers to a palloced copy though.
--
greg
Greg Stark wrote:
Martijn van Oosterhout <kleptog@svana.org> writes:
To be honest, it seems like a lot of work to save the four bytes of
overhead for the varlena structure on disk if you're going to need it
in memory anyway. And anything like RAW(16) which people want for
UUIDs, if it's going to have a lot of functions associated with it, may
as well just be a new type.For large databases storage density leads directly to speed. Saving four bytes
of overhead on a 16-byte data structure would mean a 20% speed increase. Even
if that's only helpful on a tenth of the columns you're still talking about a
2% speed increase for all queries on the table. A lot of databases use CHAR(1)
for flags. The overhead is even worse there.
I have to concur with this. Assume you use a bytea for a UUID that in
turn is used as a primary key. The extra overhead will be reflected in
all indexes, all foreign keys, etc. In a normalized database some tables
may consist of UUID columns only.
Regards,
Thomas Hallgren
On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
I have to concur with this. Assume you use a bytea for a UUID that in
turn is used as a primary key. The extra overhead will be reflected in
all indexes, all foreign keys, etc. In a normalized database some tables
may consist of UUID columns only.
So you create a UUID type. It's cheap enough to create new types after
all, that's one of postgresql's strengths. What I'm saying is that it's
easier to create new fixed length types for the cases that need it,
than it is to redo the entire type handling of the backend.
And for people that want char(1), they should be using "char", which
really is one byte (ex padding ofcourse).
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
I have to concur with this. Assume you use a bytea for a UUID that in
turn is used as a primary key. The extra overhead will be reflected in
all indexes, all foreign keys, etc. In a normalized database some tables
may consist of UUID columns only.So you create a UUID type. It's cheap enough to create new types after
all, that's one of postgresql's strengths. What I'm saying is that it's
easier to create new fixed length types for the cases that need it,
than it is to redo the entire type handling of the backend.
I guess my motivation here is that I feel currently char(n) is basically
broken in Postgres. Sure it satisfies the letter of the specification, but
it's failing to actually achieve anything for the users. There's no point at
all in using char(n) in Postgres since it takes exactly the same amount of
space as varchar() if you're always stuffing it full and more space if you're
not.
In the current setup the only reason for Postgres to have this data type at
all is purely for legacy compatibility. It doesn't actually "work" in that it
doesn't provide the space savings it's intended to and that would give users
an actual reason to use it in new databases.
--
greg
Greg Stark <gsstark@mit.edu> writes:
In the current setup the only reason for Postgres to have this data type at
all is purely for legacy compatibility.
Yes. So?
regards, tom lane
On Thu, Jun 29, 2006 at 09:12:32AM +0200, Thomas Hallgren wrote:
The split make sense since clients often have powerful UUID utilities handy
and hence have limited or no use for such utilities in the database (true
for all .NET and Java clients). Some PL's will also enable such packages
out of the box.
I agree about splitting the utilities, except that I think the database
should be able to generate UUIDs somehow.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Martijn van Oosterhout wrote:
On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
I have to concur with this. Assume you use a bytea for a UUID that in
turn is used as a primary key. The extra overhead will be reflected in
all indexes, all foreign keys, etc. In a normalized database some tables
may consist of UUID columns only.So you create a UUID type. It's cheap enough to create new types after
all, that's one of postgresql's strengths.
It would be a whole lot easier if I could use a domain.
What I'm saying is that it's
easier to create new fixed length types for the cases that need it,
than it is to redo the entire type handling of the backend.
Of course. But it's a matter of who does what. Your reasoning push the
burden to the users.
Regards,
Thomas Hallgren
On Thu, Jun 29, 2006 at 06:40:13PM +0200, Thomas Hallgren wrote:
Martijn van Oosterhout wrote:
On Thu, Jun 29, 2006 at 03:54:36PM +0200, Thomas Hallgren wrote:
I have to concur with this. Assume you use a bytea for a UUID that in
turn is used as a primary key. The extra overhead will be reflected in
all indexes, all foreign keys, etc. In a normalized database some tables
may consist of UUID columns only.So you create a UUID type. It's cheap enough to create new types after
all, that's one of postgresql's strengths.It would be a whole lot easier if I could use a domain.
It seems to me that maybe the backend should include a 16-byte fixed
length object (after all, we've got 1, 2, 4 and 8 bytes already) and
then people can use that to build whatever they like, using domains,
for example...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Jim,
I agree about splitting the utilities, except that I think the database
should be able to generate UUIDs somehow.
There is a GUID add-in, and someone is working on a 2nd one. UUIDs are
not part of the SQL standard, and we've only seen sporadic demand for
them (and different types each time) so I can't imagine one making it
further than contrib real soon.
Also, one could argue that UUIDs are a foot gun, so they're not exactly
the type of thing we want to advocate in advance of demand.
--Josh Berkus
Josh Berkus wrote:
Jim,
I agree about splitting the utilities, except that I think the database
should be able to generate UUIDs somehow.There is a GUID add-in, and someone is working on a 2nd one. UUIDs
are not part of the SQL standard, and we've only seen sporadic demand
for them (and different types each time) so I can't imagine one making
it further than contrib real soon.
Also, one could argue that UUIDs are a foot gun, so they're not
exactly the type of thing we want to advocate in advance of demand.
Martijn van Oosterhout wrote:
It seems to me that maybe the backend should include a 16-byte fixed
length object (after all, we've got 1, 2, 4 and 8 bytes already) and
then people can use that to build whatever they like, using domains,
for example...
So how about the split? I.e. just add a 16 byte data type and forget all
about UUID's for now.
Regards,
Thomas Hallgren
On Fri, Jun 30, 2006 at 08:53:28AM +0200, Thomas Hallgren wrote:
Josh Berkus wrote:
I agree about splitting the utilities, except that I think the database
should be able to generate UUIDs somehow.There is a GUID add-in, and someone is working on a 2nd one. UUIDs
are not part of the SQL standard, and we've only seen sporadic demand
for them (and different types each time) so I can't imagine one making
it further than contrib real soon.Also, one could argue that UUIDs are a foot gun, so they're not
exactly the type of thing we want to advocate in advance of demand.
Josh: Although PostgreSQL is easy to extend - it is still beyond many
people to put in the time required to learn how. The demand exists.
It's the supply that doesn't. People work around the problem.
Perhaps they are using one of the sequence number 'tricks' such as
having each site be allocated a range, or modulus. I was willing to
learn how to implement a UUID type, and not willing to use one of
these sequence number hacks.
If you want to call UUID a foot gun - then please call sequence
numbers hacks a foot gun as well, to be fair, and then we can start
to talk about how a theoretically perfect system should work.
If UUID existed in core, people would use it. It would be used, and it
would be abused - like most other PostgreSQL features. There would be
nothing unique about this. And what's the real danger anyways? People
who should be using 4 byte or 8 byte sequences, find that they lose
a little performance, and that their databases are larger than they
expected? UUID is designed not to collide. So what is the real danger
you are speaking about? How bad can they shoot themselves?
Martijn van Oosterhout wrote:
It seems to me that maybe the backend should include a 16-byte fixed
length object (after all, we've got 1, 2, 4 and 8 bytes already) and
then people can use that to build whatever they like, using domains,
for example...So how about the split? I.e. just add a 16 byte data type and forget all
about UUID's for now.
Martijn: Were you thinking that it would look like a really big integer,
displayed by default as a decimal string in the client?
This makes sense to me.
If it was a full data type - could it be passed around in memory by
value, and not as a pointer? Or is 16 bytes too big to pass around by
value?
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Fri, Jun 30, 2006 at 04:04:19AM -0400, mark@mark.mielke.cc wrote:
Martijn van Oosterhout wrote:
It seems to me that maybe the backend should include a 16-byte fixed
length object (after all, we've got 1, 2, 4 and 8 bytes already) and
then people can use that to build whatever they like, using domains,
for example...So how about the split? I.e. just add a 16 byte data type and forget all
about UUID's for now.Martijn: Were you thinking that it would look like a really big integer,
displayed by default as a decimal string in the client?This makes sense to me.
Either that, or a hex string. My problem with displaying as integer is
that not many clients will be able to parse (or print) a 16-byte
integer (the C library doesn't do it), but anyone can write a
hex-to-binary converter, or convince scanf/printf to do it for them.
If it was a full data type - could it be passed around in memory by
value, and not as a pointer? Or is 16 bytes too big to pass around by
value?
You can't pass it by value (doesn't fit in a register on the CPU and
there is no corrosponding C type), and I'm not sure you'd want to. A
pointer is much easier and faster to pass around.
The other thing I was thinking of is a type generator, like so:
# select make_raw_hex_type(16,'uuid');
NOTICE: Created raw hex type 'uuid' of fixed length 16
make_raw_hex_type
-------------------
(0 rows)
# select '1234FF'::uuid;
ERROR: Bad length for type 'uuid'
# select 'hello world'::uuid;
ERROR: Invalid characters for type 'uuid'
# select '1234567890abcdef'::uuid;
?column?
------------------
1234567890ABCDEF
(1 row)
Only this could be used to create other types too, for cryptographic
functions for example. PostgreSQL doesn't have any type generators yet,
so I'm unsure whether a patch creating one would be accepted for core.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
On Fri, Jun 30, 2006 at 10:38:49AM +0200, Martijn van Oosterhout wrote:
On Fri, Jun 30, 2006 at 04:04:19AM -0400, mark@mark.mielke.cc wrote:
It seems to me that maybe the backend should include a 16-byte fixed
length object (after all, we've got 1, 2, 4 and 8 bytes already) and
then people can use that to build whatever they like, using domains,
for example...So how about the split? I.e. just add a 16 byte data type and forget all
about UUID's for now.Martijn: Were you thinking that it would look like a really big integer,
displayed by default as a decimal string in the client?
This makes sense to me.Either that, or a hex string. My problem with displaying as integer is
that not many clients will be able to parse (or print) a 16-byte
integer (the C library doesn't do it), but anyone can write a
hex-to-binary converter, or convince scanf/printf to do it for them.
No real preference here. I'd be happy to have a native 16-byte type.
If it was a full data type - could it be passed around in memory by
value, and not as a pointer? Or is 16 bytes too big to pass around by
value?You can't pass it by value (doesn't fit in a register on the CPU and
there is no corrosponding C type), and I'm not sure you'd want to. A
pointer is much easier and faster to pass around.
It depends how it is used. If the memory location needs to be
allocated, for the value to be used only a few times, the overhead of
allocation and redirection can be more expensive. If many though, than
the reduction in value copying can make the pointer faster. 64-bytes,
and 128-bytes are just on the line of not being clearly one or the
other. It was just a thought though. The PostgreSQL API seemed pretty
fixed the last time I looked at this stuff.
The other thing I was thinking of is a type generator, like so:
# select make_raw_hex_type(16,'uuid');
NOTICE: Created raw hex type 'uuid' of fixed length 16
make_raw_hex_type
-------------------
(0 rows)
# select '1234FF'::uuid;
ERROR: Bad length for type 'uuid'
# select 'hello world'::uuid;
ERROR: Invalid characters for type 'uuid'
# select '1234567890abcdef'::uuid;
?column?
------------------
1234567890ABCDEF
(1 row)
Only this could be used to create other types too, for cryptographic
functions for example. PostgreSQL doesn't have any type generators yet,
so I'm unsure whether a patch creating one would be accepted for core.
Not sure what I think of this. I suppose the intention would be for it
to work for lengths other than 16? I can see people wanting to use such
a generalized function for char as well as bytea, for at least latin1
characters...
If people agree to a generic 16-byte type, or a hex type with defined
fixed length with a set of standard functions and index operators that
it should work for, but nobody more qualified wants to make the patch
- I'll step up.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Fri, Jun 30, 2006 at 12:39:52PM -0400, mark@mark.mielke.cc wrote:
Only this could be used to create other types too, for cryptographic
functions for example. PostgreSQL doesn't have any type generators yet,
so I'm unsure whether a patch creating one would be accepted for core.Not sure what I think of this. I suppose the intention would be for it
to work for lengths other than 16? I can see people wanting to use such
a generalized function for char as well as bytea, for at least latin1
characters...If people agree to a generic 16-byte type, or a hex type with defined
fixed length with a set of standard functions and index operators that
it should work for, but nobody more qualified wants to make the patch
- I'll step up.
I think it'd be extremely useful to have a means of defining
fixed-length hex types, such as UUIDs and hashes (ie: SHA1). I usually
only see people trying to do the same thing for CHAR in poorly-designed
systems, but I suspect anyone dealing with legacy stuff might welcome
that ability as well.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
mark@mark.mielke.cc writes:
It depends how it is used. If the memory location needs to be
allocated, for the value to be used only a few times, the overhead of
allocation and redirection can be more expensive. If many though, than
the reduction in value copying can make the pointer faster. 64-bytes,
and 128-bytes are just on the line of not being clearly one or the
other. It was just a thought though. The PostgreSQL API seemed pretty
fixed the last time I looked at this stuff.
Yeah, changing the definition of Datum is probably out of the question ;-)
as it'd break not only most of core but every external module in existence.
Anything bigger than a "long" has to be pass-by-reference. The
efficiency argument is wrong anyway, as we've optimized the heck out of
those code paths; very little actual copying happens unless a new value
is getting generated. Pushing pointers around is definitely faster than
pushing multiword values around.
regards, tom lane
On Fri, Jun 30, 2006 at 12:45:13PM -0500, Jim C. Nasby wrote:
If people agree to a generic 16-byte type, or a hex type with defined
fixed length with a set of standard functions and index operators that
it should work for, but nobody more qualified wants to make the patch
- I'll step up.I think it'd be extremely useful to have a means of defining
fixed-length hex types, such as UUIDs and hashes (ie: SHA1). I usually
only see people trying to do the same thing for CHAR in poorly-designed
systems, but I suspect anyone dealing with legacy stuff might welcome
that ability as well.
It would also be possible to provide two functions called hex_raw_in()
and hex_raw_out() that people could use like so:
CREATE TYPE uuid (
input = hex_raw_in,
output = hex_raw_out,
INTERNALLENGTH = 16
);
Where these input/output functions would work for any given length, so
the 16 could be replaced by any number, or even -1 to make a variable
length type...
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
It would also be possible to provide two functions called hex_raw_in()
and hex_raw_out() that people could use like so:
CREATE TYPE uuid (
input = hex_raw_in,
output = hex_raw_out,
INTERNALLENGTH = 16
);
Where these input/output functions would work for any given length, so
the 16 could be replaced by any number, or even -1 to make a variable
length type...
I believe you could make an input function that would support that,
though it would have to do a catalog lookup to find out the desired
type length. The output function, however, would be quite a trick.
It's not going to receive anything except the Datum itself.
regards, tom lane
On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote:
Where these input/output functions would work for any given length, so
the 16 could be replaced by any number, or even -1 to make a variable
length type...I believe you could make an input function that would support that,
though it would have to do a catalog lookup to find out the desired
type length. The output function, however, would be quite a trick.
It's not going to receive anything except the Datum itself.
Hmm, you're right. With the taggedtypes module I made it work by
cloning the output function with a new OID each time and setting the
arg type so that procLookupArgType() would work. Similarly, the input
function would use procLookupRettype() to find the desired type.
So the procedure would be slightly more complicated:
CREATE FUNCTION uuid_in(cstring) RETURNS uuid AS 'hex_raw_in' LANGUAGE internal;
CREATE FUNCTION uuid_out(uuid) RETURNS cstring AS 'hex_raw_out' LANGUAGE internal;
CREATE TYPE uuid (
input = uuid_in,
output = uuid_out,
internallength = 16
);
The cat lookups are irritating, but that's what syscache is for, I
guess :)
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Tom Lane wrote:
The output function, however, would be quite a trick.
It's not going to receive anything except the Datum itself.
I understand the reasons for this limitation of output functions, but I
have now seen it bite several times. Maybe we need a little out of the
box thinking on this. I have spent a while taxing my meagre brain on it
over the last few months, without much success ;-(
cheers
andrew
On Sat, Jul 01, 2006 at 10:58:05AM -0400, Andrew Dunstan wrote:
Tom Lane wrote:
The output function, however, would be quite a trick.
It's not going to receive anything except the Datum itself.I understand the reasons for this limitation of output functions, but I
have now seen it bite several times. Maybe we need a little out of the
box thinking on this. I have spent a while taxing my meagre brain on it
over the last few months, without much success ;-(
The thing is, in a lot of other contexts it can work easily because
fcinfo->flinfo->fn_expr points the expression node for this function
call, which means you can extract the relevent data out of that. This
field is simply not filled in for type input/output functions.
Something that has been discussed in the past is allowing non-strict
type input/output functions to be evaluated at query execution time,
rather than during parse time. This would give the type input/output
functions the Expr node they need to extract this info.
I have no idea how easy/hard this would be.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
On Sat, Jul 01, 2006 at 10:37:59AM -0400, Tom Lane wrote:
The output function, however, would be quite a trick.
It's not going to receive anything except the Datum itself.
Hmm, you're right. With the taggedtypes module I made it work by
cloning the output function with a new OID each time and setting the
arg type so that procLookupArgType() would work. Similarly, the input
function would use procLookupRettype() to find the desired type.
Oh, I see, you relied on flinfo->fn_oid and then did two cat lookups.
That would work as long as nothing tried to call the function with
DirectFunctionCall ... which is a pretty safe assumption I guess.
regards, tom lane
Martijn van Oosterhout <kleptog@svana.org> writes:
Something that has been discussed in the past is allowing non-strict
type input/output functions to be evaluated at query execution time,
rather than during parse time. This would give the type input/output
functions the Expr node they need to extract this info.
We could make that happen for literals used in queries (see comment in
coerce_type()), but it's not appealing to expect all of the ad-hoc I/O
function calls in the whole system to supply dummy expression trees.
That would be adding overhead to all cases that's only useful in a few.
I think requiring the functions that need this info to do extra work
is probably the right answer. (It's already possible to cache whatever
lookups you have to do, cf array_in or record_in, so the overhead isn't
*that* daunting.)
regards, tom lane
On Thu, Jun 29, 2006 at 06:47:17PM +0200, Martijn van Oosterhout wrote:
It seems to me that maybe the backend should include a 16-byte fixed
length object (after all, we've got 1, 2, 4 and 8 bytes already) and
then people can use that to build whatever they like, using domains,
for example...
Sooooo... Back to this.
It won't happen unless somebody does it - and I realize that people
are busy with their own projects, so unless somebody more willing and
better suited will step up, I'm going to take a stab at getting
advanced consensus.
Please answer the below questions, and state whether your opinion is
just an opinion, or whether you are stating it as a PostgreSQL
maintainer and it is law. If you wish, you can rank preferences.
1) The added 128-bit type should take the form of:
a) UUID, with all functions
b) UUID, with only basic generation functions + encode/decode/indexable
c) UUID, with only encode/decode/indexable - generic except for the
name of the type, and the encoding format.
d) Generic 128-bit type - same as c) except may not encode or decode
as UUID (dashes). Either a large number (hex string?), or binary data.
e) Generic n-byte binary data type generator. Not sure of feasibility
of this at this point. See thread.
2) According to your answer in 1), the added 128-bit type should be:
a) In core first.
b) In contrib first.
c) In pgfoundry first.
Thanks,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Thu, Jul 06, 2006 at 12:12:18PM -0400, mark@mark.mielke.cc wrote:
Please answer the below questions, and state whether your opinion is
just an opinion, or whether you are stating it as a PostgreSQL
maintainer and it is law. If you wish, you can rank preferences.
Do I have to pick only one? I'd choose firstly for:
1c) UUID, with only encode/decode/indexable - generic except for the
name of the type, and the encoding format.
2a) In core first
And in addation to that:
1b) UUID, with only basic generation functions +
encode/decode/indexable
2b) In contrib first.
And maybe finally:
1a) UUID, with all functions
2c) In pgfoundry first.
IOW, I'm not so convinced that full UUID support should appear in core,
but I think a 16-byte type should be available in core, with basic UUID
functions in contrib and the full suite on pgfoundry.
But that's just my opinion ofcourse.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
On 7/6/06, mark@mark.mielke.cc wrote:
Please answer the below questions, and state whether your opinion is
just an opinion, or whether you are stating it as a PostgreSQL
maintainer and it is law. If you wish, you can rank preferences.1) The added 128-bit type should take the form of:
a) UUID, with all functions
2) According to your answer in 1), the added 128-bit type should be:
a) In core first.
Opinion, 1 a, 2 a
Jochem
Hi,
Just MHO:
1) The added 128-bit type should take the form of:
c) UUID, with only encode/decode/indexable - generic except for the
name of the type, and the encoding format.2) According to your answer in 1), the added 128-bit type should be:
a) In core first.
1c is what I would need. 1b or 1a would be nice to have.
- Sander
mark@mark.mielke.cc writes:
e) Generic n-byte binary data type generator. Not sure of feasibility
of this at this point. See thread.
I don't like the idea of a generator that would have to be manually invoked,
though such a thing would be a fine tool for contrib or pgfoundry, I think it
would never be a clean enough interface for core.
On the other hand core could conceivably translate things like char(n) into
such a type generated on the fly. That is, instead of having a single char oid
it could check a cache of fixed length char(n) data types and if there isn't
one already generate one on the fly. That would be somewhat grotty of an
implementation but the user interface at least would be entirely transparent.
If one day we change things to pass around typmod database designs wouldn't
have to change at all.
(Actually Postgres can never do this for char(n), at least not as long as we
insist on making char/varchar/text locale-aware. Personally I think the
default char/varchar/text locale should be C unless you specify otherwise on a
per-column basis. But that seems to be a minority opinion. Postgres could
however do this for separate raw binary datatypes like bit(n) or bytea(n).)
In answer to your question, though my opinion carries no special weight at
all, I would suggest adding a bare bones 16-byte data type to core and a
second binary-compatible data type based on it that parsed/output as uuids.
The extended uuid libraries should only go in pgfoundry/contrib.
--
greg
Greg Stark wrote:
In answer to your question, though my opinion carries no special weight at
all, I would suggest adding a bare bones 16-byte data type to core and a
second binary-compatible data type based on it that parsed/output as uuids.
The extended uuid libraries should only go in pgfoundry/contrib.
I second that.
Regards,
Thomas Hallgren
On Jul 6, 2006, at 4:02 PM, Thomas Hallgren wrote:
In answer to your question, though my opinion carries no special
weight at
all, I would suggest adding a bare bones 16-byte data type to core
and a
second binary-compatible data type based on it that parsed/output
as uuids.
The extended uuid libraries should only go in pgfoundry/contrib.I second that.
+1. If there's enough user demand we can look at adding the type to
core (I don't see any real advantage to contrib over pgFoundry for
this). I'm not sure if it makes sense to add a generic 16 byte RAW to
core, either. I'd *much* rather see effort expended on a generic RAW
type which had it's size defined as part of the type and didn't use
varlena.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Saturday 08 July 2006 14:54, Jim Nasby wrote:
On Jul 6, 2006, at 4:02 PM, Thomas Hallgren wrote:
In answer to your question, though my opinion carries no special
weight at
all, I would suggest adding a bare bones 16-byte data type to core
and a
second binary-compatible data type based on it that parsed/output
as uuids.
The extended uuid libraries should only go in pgfoundry/contrib.I second that.
+1. If there's enough user demand we can look at adding the type to
core (I don't see any real advantage to contrib over pgFoundry for
this).
The advantage of contrib over pgFoundry is that it will be packaged by the
major distributions. Every distribution includes a package of the contrib
modules.
Sincerely,
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
On Sat, Jul 08, 2006 at 05:54:26PM -0400, Jim Nasby wrote:
+1. If there's enough user demand we can look at adding the type to
core (I don't see any real advantage to contrib over pgFoundry for
this). I'm not sure if it makes sense to add a generic 16 byte RAW to
core, either. I'd *much* rather see effort expended on a generic RAW
type which had it's size defined as part of the type and didn't use
varlena.
You could place a nice wrapper around type generators, which would let
you say:
DECLARE TYPE RAW(16);
After which point you could use that type in function declarations and
such. It would create an OID for that type would could be used as
normal.
I think that trying to get the backend to pay more attention to typmods
is not going to be successful. Simply because functions and operators
have an affect on the typmod and once you start relying on typmods to
decode a tuple, you've got a real problem.
As an example, what do you get when you concatenate two CHAR(4)'s? Do
you get another CHAR(4) or is it a CHAR(8)? How does the backend know?
You'd have to accompany each function with another function just to
tell you how the typmods would be related.
The only way out I can think of is that RAW(n) is merely a sort of
template and RAW(x) cannot be used in a place where RAW(y) is expected
(if x<>y). Hence it makes sense to have a template that people can
instantiate instances of and let the rest of the system treat them as
new types, unrelated to anything else.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.