UUID's as primary keys
I'm building an app where I have several entities that are identified
using a UUID (i.e. a 128 bit quantity). My current implementation uses a
composite primary key consisting of two int8 values. It's a bit
cumbersome and I would much rather have a distinct type. An earlier
implementation using Oracle mapped the UUID to a RAW(16) but PostgreSQL
doesn't have that and the BYTEA adds extra overhead.
What would be the best (as in most efficient) mapping for a 128 bit
primary key?
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 09:01:49AM +0200, Thomas Hallgren wrote:
I'm building an app where I have several entities that are identified
using a UUID (i.e. a 128 bit quantity). My current implementation uses a
composite primary key consisting of two int8 values. It's a bit
cumbersome and I would much rather have a distinct type. An earlier
implementation using Oracle mapped the UUID to a RAW(16) but PostgreSQL
doesn't have that and the BYTEA adds extra overhead.What would be the best (as in most efficient) mapping for a 128 bit
primary key?
Sounds like something for a custom type. There's one here[1]http://gborg.postgresql.org/project/pguuid/projdisplay.php though I
have no idea how good it is.
[1]: http://gborg.postgresql.org/project/pguuid/projdisplay.php
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Wed, Jun 28, 2006 at 09:01:49AM +0200, Thomas Hallgren wrote:
I'm building an app where I have several entities that are identified
using a UUID (i.e. a 128 bit quantity). My current implementation uses a
composite primary key consisting of two int8 values. It's a bit
cumbersome and I would much rather have a distinct type. An earlier
implementation using Oracle mapped the UUID to a RAW(16) but PostgreSQL
doesn't have that and the BYTEA adds extra overhead.What would be the best (as in most efficient) mapping for a 128 bit
primary key?Sounds like something for a custom type. There's one here[1] though I
have no idea how good it is.[1] http://gborg.postgresql.org/project/pguuid/projdisplay.php
Have a nice day,
Thanks. That would of course work but at the same time it increases the
complexity of my app. Yet another component to install and keep track
of. It's also a bit of an overkill since the only thing I need is an
opaque bit storage. Why is it that PostgreSQL lack a fixed length binary
type similar to the RAW type in Oracle? ISTM that could be very useful
and not very hard to implement.
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 10:22:45AM +0200, Thomas Hallgren wrote:
Thanks. That would of course work but at the same time it increases the
complexity of my app. Yet another component to install and keep track
of. It's also a bit of an overkill since the only thing I need is an
opaque bit storage. Why is it that PostgreSQL lack a fixed length binary
type similar to the RAW type in Oracle? ISTM that could be very useful
and not very hard to implement.
AIUI, it can't be done because of a basic rule of the type system: the
typmod can't be necessary to interpret the binary representation of a
value. For something like RAW(16) the type would be the oid for "raw"
and the typmod would be 16. However, when reading the value from a disk
page, you're not given the typmod, so you have no way of determining
the length.
That's why there is a verlena header for types where you don't know the
length, which adds overhead. The alternative is to create a type of the
exact length you want, but in your case that's not acceptable either.
I'm not sure if there are any other alternatives.
Have a ncie day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Wed, Jun 28, 2006 at 10:22:45AM +0200, Thomas Hallgren wrote:
Thanks. That would of course work but at the same time it increases the
complexity of my app. Yet another component to install and keep track
of. It's also a bit of an overkill since the only thing I need is an
opaque bit storage. Why is it that PostgreSQL lack a fixed length binary
type similar to the RAW type in Oracle? ISTM that could be very useful
and not very hard to implement.AIUI, it can't be done because of a basic rule of the type system: the
typmod can't be necessary to interpret the binary representation of a
value. For something like RAW(16) the type would be the oid for "raw"
and the typmod would be 16. However, when reading the value from a disk
page, you're not given the typmod, so you have no way of determining
the length.
OK. I thought you always had a type descriptor handy when reading the binary representation.
I've noticed that the typmod is expected in some receive functions (bpcharrecv and
numeric_recv for instance). Are you saying that there are times when you don't use that?
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 12:03:40PM +0200, Thomas Hallgren wrote:
OK. I thought you always had a type descriptor handy when reading the
binary representation. I've noticed that the typmod is expected in some
receive functions (bpcharrecv and numeric_recv for instance). Are you
saying that there are times when you don't use that?
The input functions get it, the output functions (bpcharout,
bpcharsend, etc) don't. Which makes it kind of hard to print a raw
value if you don't know how long it's going to be. They used to, but
that was removed some time back. It's a security issue IIRC, since any
user could call raw_out(field, 2048) and get whatever was in the 2K of
data after that field.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
On Wed, Jun 28, 2006 at 12:03:40PM +0200, Thomas Hallgren wrote:
OK. I thought you always had a type descriptor handy when reading the
binary representation. I've noticed that the typmod is expected in some
receive functions (bpcharrecv and numeric_recv for instance). Are you
saying that there are times when you don't use that?The input functions get it, the output functions (bpcharout,
bpcharsend, etc) don't. Which makes it kind of hard to print a raw
value if you don't know how long it's going to be. They used to, but
that was removed some time back. It's a security issue IIRC, since any
user could call raw_out(field, 2048) and get whatever was in the 2K of
data after that field.
A user that is trusted with installing a C-function in the backend is
free to scan the process memory anyway so in what way did that increase
the security? IMHO, the only relevant security in that context is to
have trusted people install trusted modules. I'm surprised that
something like that made you remove significant functionality.
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 01:56:47PM +0200, Thomas Hallgren wrote:
A user that is trusted with installing a C-function in the backend is
free to scan the process memory anyway so in what way did that increase
the security? IMHO, the only relevant security in that context is to
have trusted people install trusted modules. I'm surprised that
something like that made you remove significant functionality.
You're missing the point. The type output function is not generally a
priveledged function. Think bpcharout, text_out, numeric_out, etc...
These can be called by users directly and the input to those functions
cannot be trusted.
If the type output function needs an additional parameter to correctly
and safely decode the actual Datum, you're screwed because then users
can pass invalid parameters to affect the decoding. If you have a way
of telling what the right value is, then you didn't need to pass it in
the first place.
Hence, you have to be able to decode a datum knowing only its type,
irrespective of typmod. So say you had a field of type RAW(16) you
would have to be able to decode it knowing only that it is of type
"RAW". So you need a header to tell you how long it is i.e. a varlena
structure.
Hope this clarifies it a bit,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
Show quoted text
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
The input functions get it, the output functions (bpcharout,
bpcharsend, etc) don't. Which makes it kind of hard to print a raw
value if you don't know how long it's going to be. They used to, but
that was removed some time back.
Even back then you couldn't rely on the typmod value to be supplied;
it was quite likely to be passed as -1. The issue is not actually
with on-disk storage, it is with function/operator arguments and
results. Those have never been identified any more closely than by
giving a type OID. So for any value that came from a function,
you won't have a typmod, and you'd better be able to find out all
you need to know just by inspecting the value itself. Hence, length
words.
This is all pretty off-topic for pgsql-general, isn't it?
regards, tom lane
Martijn van Oosterhout wrote:
On Wed, Jun 28, 2006 at 01:56:47PM +0200, Thomas Hallgren wrote:
A user that is trusted with installing a C-function in the backend is
free to scan the process memory anyway so in what way did that increase
the security? IMHO, the only relevant security in that context is to
have trusted people install trusted modules. I'm surprised that
something like that made you remove significant functionality.You're missing the point. The type output function is not generally a
priveledged function. Think bpcharout, text_out, numeric_out, etc...
These can be called by users directly and the input to those functions
cannot be trusted.
Ah, OK that makes sense. An alternative solution when the signature was changed could
perhaps have been to pass one single argument, a structure appointing the data and its
associated type. My idea would work if the data and its type lived together always from the
moment its instantiated (read from disk or otherwise) and until death do them apart (or the
data is stored on disk, in which case the tupledesc knows what it is). I guess that would
imply a major rewrite and that my desire to have a RAW fixed length type isn't enough
motivation to do that :-)
Instead, I would like to humbly request the inclusion of a UUID datatype (or an opaque 128
bit datatype) in the core package. It's increasingly common and some databases (MS
SQLServer) already have built in support for it.
Regards,
Thomas Hallgren
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID datatype
(or an opaque 128 bit datatype) in the core package. It's increasingly
ISTM that we get enough requests for this that it's probably worth
doing.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID datatype
(or an opaque 128 bit datatype) in the core package. It's increasingly
common and some databases (MS SQLServer) already have built in support for
it.
We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the
type has value, that is is generic, and would be widely used without
being abused. All the geometric types that I'll never use in core,
with few or no uses, including functions to operate on these types,
and no UUID type... Hehe... To me, that's irony... :-)
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID
datatype (or an opaque 128 bit datatype) in the core package. It's
increasingly common and some databases (MS SQLServer) already have built
in support for it.We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the type
has value, that is is generic, and would be widely used without being
abused. All the geometric types that I'll never use in core, with few or
no uses, including functions to operate on these types, and no UUID
type... Hehe... To me, that's irony... :-)
Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.
On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID
datatype (or an opaque 128 bit datatype) in the core package. It's
increasingly common and some databases (MS SQLServer) already have built
in support for it.We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the type
has value, that is is generic, and would be widely used without being
abused. All the geometric types that I'll never use in core, with few or
no uses, including functions to operate on these types, and no UUID
type... Hehe... To me, that's irony... :-)Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.
If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
downloaded it. But I find that exceptionally hard to believe...
Looking back through the list archives I think you'd find this comes up
at least every few months.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Wed, Jun 28, 2006 at 12:38:50PM -0500, Jim C. Nasby wrote:
On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID
datatype (or an opaque 128 bit datatype) in the core package. It's
increasingly common and some databases (MS SQLServer) already have built
in support for it.We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the type
has value, that is is generic, and would be widely used without being
abused. All the geometric types that I'll never use in core, with few or
no uses, including functions to operate on these types, and no UUID
type... Hehe... To me, that's irony... :-)Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
downloaded it. But I find that exceptionally hard to believe...Looking back through the list archives I think you'd find this comes up
at least every few months.
I've downloaded the version off pgfoundry.org. It is broken. It leaks
memory, and if memory is correct it can cause the client to core dump.
Two of us worked on a re-write based off a different UUID system library,
and I've been happily using it in production for a year or so. I don't
believe either of us have bothered to market it. Each time it comes up,
a number of people on this list shut it down, and it doesn't seem worth
the effort to convince them otherwise. They can have their ivory tower,
and I can have my plugin.
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
Jim C. Nasby wrote:
On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
On Wed, June 28, 2006 1:14 pm, mark@mark.mielke.cc wrote:
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
Instead, I would like to humbly request the inclusion of a UUID
datatype (or an opaque 128 bit datatype) in the core package. It's
increasingly common and some databases (MS SQLServer) already have built
in support for it.We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the type
has value, that is is generic, and would be widely used without being
abused. All the geometric types that I'll never use in core, with few or
no uses, including functions to operate on these types, and no UUID
type... Hehe... To me, that's irony... :-)Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
downloaded it. But I find that exceptionally hard to believe...Looking back through the list archives I think you'd find this comes up
at least every few months.
That's because there is nothing there to download. See instead:
http://gborg.postgresql.org/project/pguuid/projdisplay.php
Personally I don't buy the misuse objection - we already have plenty of
things that can be misused. As long as there is a reasonable valid use
and we can make it portable enough, I think there is a good case for
including it.
cheers
andrew
mark@mark.mielke.cc wrote:
I've downloaded the version off pgfoundry.org. It is broken. It leaks
memory, and if memory is correct it can cause the client to core dump.
Also it couldn't possibly be included in core, since it's based on a
GPL'ed UUID library. If you have a more appropiately licensed package,
it could be considered for inclusion.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
Personally I don't buy the misuse objection - we already have plenty of
things that can be misused. As long as there is a reasonable valid use
and we can make it portable enough, I think there is a good case for
including it.
Well, since Mark has one, how about we consider adding it in?
If nothing else, can you please put your stuff on pgFoundry so others
can find it, Mark?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Tom Lane <tgl@sss.pgh.pa.us> writes:
Martijn van Oosterhout <kleptog@svana.org> writes:
The input functions get it, the output functions (bpcharout,
bpcharsend, etc) don't. Which makes it kind of hard to print a raw
value if you don't know how long it's going to be. They used to, but
that was removed some time back.
Even back then you couldn't rely on the typmod value to be supplied;
it was quite likely to be passed as -1. The issue is not actually
with on-disk storage, it is with function/operator arguments and
results. Those have never been identified any more closely than by
giving a type OID. So for any value that came from a function,
you won't have a typmod, and you'd better be able to find out all
you need to know just by inspecting the value itself. Hence, length
words.
Hm, so it could be stored on disk without the length header as long as the
length header is added to the in-memory representation? I don't think the type
system has hooks for reading and storing data to disk though.
This is all pretty off-topic for pgsql-general, isn't it?
[moved to -hackers]
--
greg
Greg Stark <gsstark@mit.edu> writes:
Hm, so it could be stored on disk without the length header as long as
the length header is added to the in-memory representation? I don't
think the type system has hooks for reading and storing data to disk
though.
No, it doesn't, and we'd pay a nonzero price for allowing that.
Currently the executor doesn't have to care (much) about whether a
tuple is on-disk or in-memory --- the individual datums look the same
either way. Allowing them to be different would force a lot of
format conversion steps that currently need not happen.
regards, tom lane