Questions about SERIAL type

Started by Nonameabout 24 years ago10 messages
#1Noname
reina@nsi.edu

I was thinking of re-designing my database schema to use a SERIAL
value as an indentification across tables (i.e. as a foreign key).
I've been playing with some example tables and have found the
following behavior from SERIAL:

(1) I think SERIAL is defined as an int4. However, the upper bound
seems to be 2^31 - 1 (217483647) not 2^32 - 1. I suppose this is
because a generic int4 should have one bit for the sign
(negative/positive). However, shouldn't SERIAL always be a positive
number? Would it be correct to make it some kind of unsigned int4
instead?

(2) The SERIAL number increases even if the transaction was aborted
(e.g. if a repeated tuple were trying to be inserted into a unique
table, the transaction fails, but the SERIAL gets incremented).
I was hoping that VACUUM VERBOSE ANALYZE would somehow reclaim the
lost SERIAL indicies. So, for example, if I had the table:

db02=# select * from center_out order by id;
subject | arm | target | rep | id
---------+-----+--------+-----+------------
F | L | 1 | 1 | 1
F | L | 1 | 2 | 3
F | L | 10 | 2 | 4
F | L | 100 | 2 | 100001
F | L | 100 | 3 | 10000002
F | L | 500 | 3 | 2110000001
F | L | 501 | 3 | 2147483646
F | L | 502 | 3 | 2147483647
(8 rows)

then a VACUUM VERBOSE ANALYZE would do the following:

db02=# select * from center_out order by id;
subject | arm | target | rep | id
---------+-----+--------+-----+------------
F | L | 1 | 1 | 1
F | L | 1 | 2 | 2
F | L | 10 | 2 | 3
F | L | 100 | 2 | 4
F | L | 100 | 3 | 5
F | L | 500 | 3 | 6
F | L | 501 | 3 | 7
F | L | 502 | 3 | 8
(8 rows)

I figure that I should never reach 2^31 - 1 transaction per table even
with many aborted ones; however, I think these would be nice changes.

Comments?

-Tony

#2Doug McNaught
doug@wireboard.com
In reply to: Noname (#1)
Re: Questions about SERIAL type

reina@nsi.edu (Tony Reina) writes:

I was thinking of re-designing my database schema to use a SERIAL
value as an indentification across tables (i.e. as a foreign key).
I've been playing with some example tables and have found the
following behavior from SERIAL:

(1) I think SERIAL is defined as an int4. However, the upper bound
seems to be 2^31 - 1 (217483647) not 2^32 - 1. I suppose this is
because a generic int4 should have one bit for the sign
(negative/positive). However, shouldn't SERIAL always be a positive
number? Would it be correct to make it some kind of unsigned int4
instead?

I don't think PG (or the SQL standard) has any concept of unsigned
numbers. Besides, you can have sequences that have negative values at
some points, and even decrement rather than increment. Some folks may
rely on this behavior.

(2) The SERIAL number increases even if the transaction was aborted
(e.g. if a repeated tuple were trying to be inserted into a unique
table, the transaction fails, but the SERIAL gets incremented).
I was hoping that VACUUM VERBOSE ANALYZE would somehow reclaim the
lost SERIAL indicies. So, for example, if I had the table:

How would this work? Would the DB have to go through all tables
looking for REFERENCES constraints and update those rows referring to
a renumbered key? What if you had a referencing column without a
REFERENCES constraint? What if you had some kind of data external to
the database that relied on those primary keys staying the same? Not
practical IMHO.

I figure that I should never reach 2^31 - 1 transaction per table even
with many aborted ones; however, I think these would be nice changes.

What's going to happen AFAIK is that 64-bit sequences will be
available. It's unlikely that overflow will be an issue with
those... ;)

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

#3Ned Wolpert
ned.wolpert@knowledgenet.com
In reply to: Noname (#1)
Re: Questions about SERIAL type

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I'm not 100% sure that you actually want this. The main reason I say this is
that in most cases I use sequence numbers is to do forign-key relationships.

If you change sequence numbers on rows in a table, unless all tables that
use that sequence number are also modified, then the relationship between
tables that rely on the sequence number is lost. If for any reason the
sequence number is used externally, (not usually a good idea, but sometimes
it is) then that relationship is also lost.

And for argument sake, lets assume that we know each location a sequence
number is referenced, so you can make the changes everywhere. (And that these
numbers aren't used for other things like order-numbers that need to appear
in a string format and printed/referenced later) That means that the database
needs to be off-line during this access. So the modifications to Vacuum to
make it less intrusive to users while its occuring is now lost.

I don't think this is a good idea... (Also, does 7.2 have an 8 byte sequence
number (serial8) anyways? So isn't this problem moot?)

On 28-Nov-2001 Tony Reina wrote:

I was thinking of re-designing my database schema to use a SERIAL
value as an indentification across tables (i.e. as a foreign key).
I've been playing with some example tables and have found the
following behavior from SERIAL:

(1) I think SERIAL is defined as an int4. However, the upper bound
seems to be 2^31 - 1 (217483647) not 2^32 - 1. I suppose this is
because a generic int4 should have one bit for the sign
(negative/positive). However, shouldn't SERIAL always be a positive
number? Would it be correct to make it some kind of unsigned int4
instead?

(2) The SERIAL number increases even if the transaction was aborted
(e.g. if a repeated tuple were trying to be inserted into a unique
table, the transaction fails, but the SERIAL gets incremented).
I was hoping that VACUUM VERBOSE ANALYZE would somehow reclaim the
lost SERIAL indicies. So, for example, if I had the table:

db02=# select * from center_out order by id;
subject | arm | target | rep | id
---------+-----+--------+-----+------------
F | L | 1 | 1 | 1
F | L | 1 | 2 | 3
F | L | 10 | 2 | 4
F | L | 100 | 2 | 100001
F | L | 100 | 3 | 10000002
F | L | 500 | 3 | 2110000001
F | L | 501 | 3 | 2147483646
F | L | 502 | 3 | 2147483647
(8 rows)

then a VACUUM VERBOSE ANALYZE would do the following:

db02=# select * from center_out order by id;
subject | arm | target | rep | id
---------+-----+--------+-----+------------
F | L | 1 | 1 | 1
F | L | 1 | 2 | 2
F | L | 10 | 2 | 3
F | L | 100 | 2 | 4
F | L | 100 | 3 | 5
F | L | 500 | 3 | 6
F | L | 501 | 3 | 7
F | L | 502 | 3 | 8
(8 rows)

I figure that I should never reach 2^31 - 1 transaction per table even
with many aborted ones; however, I think these would be nice changes.

Comments?

-Tony

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Virtually,
Ned Wolpert <ned.wolpert@knowledgenet.com>

D08C2F45: 28E7 56CB 58AC C622 5A51 3C42 8B2B 2739 D08C 2F45
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8BW1siysnOdCML0URAjFqAJ9RJk25zXl/mjhJmjC5tsf4bkj7EQCeNpph
PcrtIXqceZLqdkDOyfAcq84=
=MqDe
-----END PGP SIGNATURE-----

#4G. Anthony Reina
reina@nsi.edu
In reply to: Noname (#1)
Re: Questions about SERIAL type

Doug McNaught wrote:

I don't think PG (or the SQL standard) has any concept of unsigned
numbers. Besides, you can have sequences that have negative values at
some points, and even decrement rather than increment. Some folks may
rely on this behavior.

When I tried setting the current value to -200 I got an error that the
number was outside of the proper range.

db02=# create table test (id SERIAL);
NOTICE: CREATE TABLE will create implicit sequence 'test_id_seq' for SERIAL
column 'test.id'
NOTICE: CREATE TABLE/UNIQUE will create implicit index 'test_id_key' for
table 'test'
CREATE

db02=# select setval('test_id_seq', -200);
ERROR: test_id_seq.setval: value -200 is out of bounds (1,2147483647)

So I'm not sure how people would be using negative values. It looks like from
the documentation that the SERIAL type always increments by 1 so I'm not sure
how they could use decrementing values. Unless, of course, they've changed
the source code to do this. Perhaps I'm missing something here in the
documentation (using PG 7.1.3, maybe 7.2beta has changed this?).

How would this work? Would the DB have to go through all tables
looking for REFERENCES constraints and update those rows referring to
a renumbered key? What if you had a referencing column without a
REFERENCES constraint? What if you had some kind of data external to
the database that relied on those primary keys staying the same? Not
practical IMHO.

Yes, it would have to do this which may be time consuming and possibly
impractical. However, the VACUUM ANALYZE is doing an aweful lot of processing
on the tables and the indicies already.

However, perhaps the other thing to do is to not increment the SERIAL value
on an aborted transaction. I'm not sure why serial has to be incremented if
the transaction fails. Of course, this won't take care of unused SERIAL
numbers when DELETEs occur.

I'm not sure about other database schemas which depend on the SERIAL values
remaining the same for external consistency. You could still use an OID in
that case I should think instead of SERIAL (?)

I figure that I should never reach 2^31 - 1 transaction per table even
with many aborted ones; however, I think these would be nice changes.

What's going to happen AFAIK is that 64-bit sequences will be
available. It's unlikely that overflow will be an issue with
those... ;)

That will definitely make overflow unlikely. Perhaps I'm just being too
paranoid that somehow I'll get to the point where my SERIAL value is maxed
out but I have large gaps from DELETED/UPDATED/ABORTED transactions.

-Tony

db02=# select version();
version
-------------------------------------------------------------
PostgreSQL 7.1.3 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)

#5Doug McNaught
doug@wireboard.com
In reply to: Noname (#1)
Re: Questions about SERIAL type

"G. Anthony Reina" <reina@nsi.edu> writes:

Doug McNaught wrote:

I don't think PG (or the SQL standard) has any concept of unsigned
numbers. Besides, you can have sequences that have negative values at
some points, and even decrement rather than increment. Some folks may
rely on this behavior.

When I tried setting the current value to -200 I got an error that the
number was outside of the proper range.

You need to specify MINVALUE, MAXVALUE and INCREMENT explicitly to
play tricks like that. See the docs for CREATE SEQUENCE.

If you need it, you should be able to create a sequence that uses the
whole range from -2^31 to 2^31-1 with proper arguments to CREATE
SEQUENCE.

So I'm not sure how people would be using negative values. It looks like from
the documentation that the SERIAL type always increments by 1 so I'm not sure
how they could use decrementing values. Unless, of course, they've changed
the source code to do this. Perhaps I'm missing something here in the
documentation (using PG 7.1.3, maybe 7.2beta has changed this?).

You didn't read the right part of the docs. ;) See CREATE SEQUENCE
in the SQL reference.

How would this work? Would the DB have to go through all tables
looking for REFERENCES constraints and update those rows referring to
a renumbered key? What if you had a referencing column without a
REFERENCES constraint? What if you had some kind of data external to
the database that relied on those primary keys staying the same? Not
practical IMHO.

Yes, it would have to do this which may be time consuming and possibly
impractical. However, the VACUUM ANALYZE is doing an aweful lot of processing
on the tables and the indicies already.

I'd be more concerned about the hairiness and maintainability of the
resulting code, actually. ;)

However, perhaps the other thing to do is to not increment the SERIAL value
on an aborted transaction. I'm not sure why serial has to be incremented if
the transaction fails. Of course, this won't take care of unused SERIAL
numbers when DELETEs occur.

The reason we don't do it this way is that the sequence object would
have to be locked for the duration of every transaction that used it.
You'd get a lot of contention on that lock and a big slowdown of the
whole system. And as you say it wouldn't address the DELETE issue.

I'm not sure about other database schemas which depend on the SERIAL values
remaining the same for external consistency. You could still use an OID in
that case I should think instead of SERIAL (?)

That's worse if anything. ;)

I figure that I should never reach 2^31 - 1 transaction per table even
with many aborted ones; however, I think these would be nice changes.

What's going to happen AFAIK is that 64-bit sequences will be
available. It's unlikely that overflow will be an issue with
those... ;)

That will definitely make overflow unlikely. Perhaps I'm just being too
paranoid that somehow I'll get to the point where my SERIAL value is maxed
out but I have large gaps from DELETED/UPDATED/ABORTED transactions.

Seriously, I wouldn't worry about it, unless you're incrementing
thousands of times a second, in which case you're in trouble for a lot
of other reasons...

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

#6Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Noname (#1)
Re: Questions about SERIAL type

On 28 Nov 2001, Tony Reina wrote:

I was thinking of re-designing my database schema to use a SERIAL
value as an indentification across tables (i.e. as a foreign key).
I've been playing with some example tables and have found the
following behavior from SERIAL:

(1) I think SERIAL is defined as an int4. However, the upper bound

IIRC in 7.2, there's 8 byte sequences and a serial8 pseudotype that
probably uses a signed int8.

(2) The SERIAL number increases even if the transaction was aborted
(e.g. if a repeated tuple were trying to be inserted into a unique
table, the transaction fails, but the SERIAL gets incremented).

Yeah, the tradeoff was made to go for the concurrency advantage. If
you need to rollback the sequence value if rollback is performed, you'd
need to wait until it's happened before the next insert would be able
to get the sequence value.

I was hoping that VACUUM VERBOSE ANALYZE would somehow reclaim the
lost SERIAL indicies. So, for example, if I had the table:

Ick. That sounds really ugly to me. That seems to be outside what
the system can reasonably be expected to handle. It'd be difficult
to determine the full set of in-database dependencies (say triggers
that do their own sort of integrity checks, views, functions, etc
that may join this field to another table) and probably impossible
to determine out of database ones (printed material, etc...).

#7Ned Wolpert
ned.wolpert@knowledgenet.com
In reply to: G. Anthony Reina (#4)
Re: Questions about SERIAL type

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 28-Nov-2001 G. Anthony Reina wrote:

However, perhaps the other thing to do is to not increment the SERIAL value
on an aborted transaction. I'm not sure why serial has to be incremented if
the transaction fails. Of course, this won't take care of unused SERIAL
numbers when DELETEs occur.

I thought its incremented since the sequence is outside of the transaction.
That way, if multiple clients are doing inserts using the sequence, one
doesn't have to wait for the other transactions to end before they get a lock
on the sequence.

Virtually,
Ned Wolpert <ned.wolpert@knowledgenet.com>

D08C2F45: 28E7 56CB 58AC C622 5A51 3C42 8B2B 2739 D08C 2F45
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8BXVOiysnOdCML0URApJnAJ9Z43xFgJRevgoNIQEGYkwkxbAAJACbBopF
N3slqHoAxPq7HkcDaI7FMsY=
=r9mw
-----END PGP SIGNATURE-----

#8Doug McNaught
doug@wireboard.com
In reply to: Noname (#1)
Re: Questions about SERIAL type

"G. Anthony Reina" <reina@nsi.edu> writes:

Now that I look at the CREATE SEQUENCE documentation, it appears to
have a CYCLE flag which wraps the sequence around if it were to
reach the MAXVALUE. Does anyone know if it wraps around to the next
unused value? Or, if an index already exists at SERIAL value =
MINVALUE, then will the INSERT get an error about duplicate
insertions?

SERIAL columns get a unique index defined, so you'd get an error.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863

#9G. Anthony Reina
reina@nsi.edu
In reply to: Noname (#1)
Re: Questions about SERIAL type

Doug McNaught wrote:

"G. Anthony Reina" <reina@nsi.edu> writes:

You need to specify MINVALUE, MAXVALUE and INCREMENT explicitly to
play tricks like that. See the docs for CREATE SEQUENCE.

Ahhh. I found it now. I was looking at the documentation from an older on-line
version of Bruce's book and didn't see the MAXVALUE, MINVALUE stuff. I guess the
default for a serial column is MINVALUE 1, MAX VALUE 2^31-1, INCREMENT +1.

The reason we don't do it this way is that the sequence object would
have to be locked for the duration of every transaction that used it.
You'd get a lot of contention on that lock and a big slowdown of the
whole system. And as you say it wouldn't address the DELETE issue.

Okay, yes I can see the lock problem now. That makes sense.

That will definitely make overflow unlikely. Perhaps I'm just being too
paranoid that somehow I'll get to the point where my SERIAL value is maxed
out but I have large gaps from DELETED/UPDATED/ABORTED transactions.

Seriously, I wouldn't worry about it, unless you're incrementing
thousands of times a second, in which case you're in trouble for a lot
of other reasons...

I figured that I was just being overly cautious. 2^31 transactions is quite a lot.
With the move to int8 the point should be moot.

p.s.

Now that I look at the CREATE SEQUENCE documentation, it appears to have a CYCLE
flag which wraps the sequence around if it were to reach the MAXVALUE. Does anyone
know if it wraps around to the next unused value? Or, if an index already exists
at SERIAL value = MINVALUE, then will the INSERT get an error about duplicate
insertions?

-Tony

#10Serguei Mokhov
sa_mokho@alcor.concordia.ca
In reply to: Noname (#1)
Re: Questions about SERIAL type

----- Original Message -----
From: Doug McNaught <doug@wireboard.com>
Sent: Wednesday, November 28, 2001 5:31 PM

I figure that I should never reach 2^31 - 1 transaction per table even
with many aborted ones; however, I think these would be nice changes.

What's going to happen AFAIK is that 64-bit sequences will be
available. It's unlikely that overflow will be an issue with
those... ;)

"640K ought to be enough for everyone!"
Gill Bates.

No offense, just an association :)