A problem with sequences...

Started by Dmitry Tkachabout 23 years ago12 messagesgeneral
Jump to latest
#1Dmitry Tkach
dmitry@openratings.com

Hi, everybody!

I am experiencing some weird problem that I was hoping you could shed some light on...

This is a little complicated, and, I bet the first thing I am going to hear from you is "change your schema" :-)
I will, most probably, end up doing that, but before I get into it, I would like to understand completely what is going on here,
so, please, bear with me for a while.

The background is, that I have some tables, that are being populated by a java application through jdbc (don't stop reading because of that -
it is very unlikely to matter that the stuff goes through jdbc!)

The java app wants to get back the id (pk) of the rows it inserts, and it does not want to make a separate query for that.
We use rules to achive that. For example:

create table answer
(
id serial primary key,
data text
);

create rule answer_id_seq as on insert to answer do select coalesce (new.id, last_value) as id from answer_id_seq;

(for those who wonders why I did it this way, and not using currval, I'll explain it in the end).

I was surprised to find out that at the time rule is executed, the sequence is already advanced, but the new.id is still null (looks like
the defaults just never make it into the new.* at all), but apparently, that is the case.

So, with this setup my java app is able to execute a statement that inserts a new row, get back a ResultSet and fetch an id from it.
And it works just fine most of the time.

*However*, every now and then (and I was never able to reproduce it on purpose, so I don't know what exactly the circumstances are)
I get a weird exception from java, complaining about at attempt to insert a duplicate into answer_pkey after executing a statement like

insert into answer (data) values ('blah');

So, it looks like my rule somehow manages to screw up the sequence. Does it make sense to anyone?
How could it happen? I understand that this rule is no good, and I need to fix it (because it could return an incorrect value if
two connections happen to insert into the same table simultaneously), but still - I want to understand how does it manage to screw
up that sequence just by *looking* at it???

The reason I want to understand what is going on here is to be sure that, by fixing this rule, I will really get this problem go away
(it shows up pretty rarely, and I cannot reproduce it on purpose, so I have no way to verify that, other than figuring out exactly what is
going on).

Does all this make any sense to any of you? Can you imagine some situation when with a setup like this an insert statement would get
a value from the sequence that was already used?

I would greatly appreciate any help.

Thanks a lot!

Dima.

P.S. As I promised, the reason that rule is not using currval is just that it may or may not be set for a given insert statement -
if the id is explicitly specified, a call to currval would fail (or even return a wrong value if the sequence was accessed previously
in the same session)...
Perhaps, I could work around that, by creating two rules - with 'where new.id is null' and 'where new.id is not null', and having
the first one use currval, and the other one just return new.id, but it just happened to be done this way...
As I said above, I understand that there are all kinds of problems with this rule, and I am going to get rid of it either way (I am thinking,
about just making a change on the java side and appending ';select currval(..)' to those inserts).
However, I would still love to understand what exactly goes on with those duplicate ids, right now.

Thanks again!

#2Richard Huxton
dev@archonet.com
In reply to: Dmitry Tkach (#1)
Re: A problem with sequences...

On Wednesday 19 Feb 2003 11:05 pm, Dmitry Tkach wrote:

create table answer
(
id serial primary key,
data text
);

create rule answer_id_seq as on insert to answer do select coalesce
(new.id, last_value) as id from answer_id_seq;

*However*, every now and then (and I was never able to reproduce it on
purpose, so I don't know what exactly the circumstances are) I get a weird
exception from java, complaining about at attempt to insert a duplicate
into answer_pkey after executing a statement like

insert into answer (data) values ('blah');

P.S. As I promised, the reason that rule is not using currval is just that
it may or may not be set for a given insert statement - if the id is
explicitly specified, a call to currval would fail (or even return a wrong

^^^^^^^^^^^^^^^^^^

Before looking into more obscure possibilities, are you using the sequence to
generate these explicit id's? If not, that's why you're getting duplicates,
the sequence generator doesn't know you've used these numbers.

value if the sequence was accessed previously in the same session)...
Perhaps, I could work around that, by creating two rules - with 'where
new.id is null' and 'where new.id is not null', and having the first one
use currval, and the other one just return new.id, but it just happened to
be done this way... As I said above, I understand that there are all kinds
of problems with this rule, and I am going to get rid of it either way (I
am thinking, about just making a change on the java side and appending
';select currval(..)' to those inserts). However, I would still love to
understand what exactly goes on with those duplicate ids, right now.

--
Richard Huxton

#3Dmitry Tkach
dmitry@openratings.com
In reply to: Richard Huxton (#2)
Re: A problem with sequences...

Before looking into more obscure possibilities, are you using the sequence to
generate these explicit id's? If not, that's why you're getting duplicates,
the sequence generator doesn't know you've used these numbers.

Yeah... I understand this.
In this case, the id is *never* specified explicitly. Java app either
knows the id or it does not.
If it knows it, it does the update, otherwise, it does an insert, with
*no* id specified, and gets the new id back from that rule, so that time
it will
know the id and end up doing update...

I was referring to the situation in general when somehow (like data
migration) the id is specified, I just don't want that rule to barf. It
is definitely not what's causing my problem. So, let's get into those
'obscure possibilities' now :-)

Thanks!

Dima.

#4Richard Huxton
dev@archonet.com
In reply to: Dmitry Tkach (#3)
Re: A problem with sequences...

On Thursday 20 Feb 2003 2:56 pm, Dima Tkach wrote:

Before looking into more obscure possibilities, are you using the sequence
to generate these explicit id's? If not, that's why you're getting
duplicates, the sequence generator doesn't know you've used these
numbers.

Yeah... I understand this.
In this case, the id is *never* specified explicitly. Java app either
knows the id or it does not.
If it knows it, it does the update, otherwise, it does an insert, with
*no* id specified, and gets the new id back from that rule, so that time
it will
know the id and end up doing update...

I was referring to the situation in general when somehow (like data
migration) the id is specified, I just don't want that rule to barf. It
is definitely not what's causing my problem. So, let's get into those
'obscure possibilities' now :-)

Well, in that case you need to setval() the sequence to something bigger than
any used numbers after the import.

OK - let's look at the rule:

create table answer
(
id serial primary key,
data text
);

create rule answer_id_seq as on insert to answer do select coalesce (new.id,
last_value) as id from answer_id_seq;

Well - you're going to have problems if you do something like:

INSERT INTO answer (data) (SELECT d FROM foo);

I daresay you're not, but something to bear in mind.

To see why you're getting problems with duplicate ID numbers, open two psql
windows and do:

1> SELECT nextval('answer_id_seq');
2> SELECT nextval('answer_id_seq');
1> SELECT last_value FROM answer_id_seq;
2> SELECT last_value FROM answer_id_seq;

As you'll see, last_value isn't concurrent-safe like currval() and nextval()
are.

So - if you want to keep your rule, you'll want to rewrite it to use currval()
as you mentioned.

Personally, I'd write a function to insert into the table and make the app use
that, or create a view and have the app insert via that. Getting a result-set
back from an insert would spook me if I wasn't expecting it.
--
Richard Huxton

#5Dmitry Tkach
dmitry@openratings.com
In reply to: Richard Huxton (#4)
Re: A problem with sequences...

Well, in that case you need to setval() the sequence to something bigger than
any used numbers after the import.

I know. (And I do).

OK - let's look at the rule:

create table answer
(
id serial primary key,
data text
);

create rule answer_id_seq as on insert to answer do select coalesce (new.id,
last_value) as id from answer_id_seq;

Well - you're going to have problems if you do something like:

INSERT INTO answer (data) (SELECT d FROM foo);

I daresay you're not, but something to bear in mind.

That's fine. The problen in that case would be that I'll be always
getting back the last id inserted, right.
I've seen this happenning, and it is OK, because I only need that output
in this java app, and the java app only inserts them
one at a time.
(In fact, the most annoying thing with statement like that is having to
scroll through all of that useless output if there are too many rows in
foo - so, I usually just disable the rule whenever I am about to do
something like that)....

To see why you're getting problems with duplicate ID numbers, open two psql
windows and do:

1> SELECT nextval('answer_id_seq');
2> SELECT nextval('answer_id_seq');
1> SELECT last_value FROM answer_id_seq;
2> SELECT last_value FROM answer_id_seq;

As you'll see, last_value isn't concurrent-safe like currval() and nextval()
are.

Well... yeah. I know this. That's why I said in the very beginning, that
this rule is no good.
*However* this would only exp[lain a situation with my java app getting
an incorrect id back after inserting a row, but it never uses that id to
insert other rows (if it did, it would cause a duplication even if the
rule was safe), so, I still don't see any reason how this would cause a
duplicated id to be inserted - nextval() should still return unique
numbers, right? Once again, I know that last_val may not be the same
thing nextval() just returned, but the real question is how can this
possibly cause a duplicated id to be genrated???

So - if you want to keep your rule, you'll want to rewrite it to use currval()
as you mentioned.

Nah... I guess, I'll rather do something in the java (append ";select
currval..." to the insert statement).

Personally, I'd write a function to insert into the table and make the app use
that, or create a view and have the app insert via that. Getting a result-set
back from an insert would spook me if I wasn't expecting it.

Well... If you do not expect it, you can just ignore it - it's not a big
deal :-)
I don't see how having a view would help...

As for the function, I would have to create many of them - one for each
table I need to insert into, because the arguments would be different,
and with the rule I was able to get away with only writing it once ...
If postgres had something like (void *) to pass to a function (and also
if it allowed variable number of parameters), I would ceratinly stick
with the function solution, because then I would also be able to cache a
query plan...

Dima

#6Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Dmitry Tkach (#5)
Re: A problem with sequences...

On Fri, 21 Feb 2003, Dmitry Tkach wrote:

rapidb=# select last_value from answer_id_seq;
last_value
------------
22124
(1 row)

rapidb=# select setval ('answer_id_seq', id) from answer order by id desc limit 1;
setval
--------
25000
(1 row)

rapidb=# select currval('answer_id_seq');
currval
---------
21452
(1 row)

So, the question I have now is - what's going on?
How come setval() reports 25000, but the currval changes to 21452
If my query is wrong (and I can't really see what's wrong with it), then why does it report the correct value?

IIRC, currval returns the last value given by the sequence (in nextval) to
this session. What does a select nextval('answer_id_seq') give you?
21453 or 25001?

#7Doug McNaught
doug@mcnaught.org
In reply to: Dmitry Tkach (#5)
Re: A problem with sequences...

Dmitry Tkach <dmitry@openratings.com> writes:

select setval('answer_id_seq', id) from answer order by id desc limit 1;

Now, for some reason this reports a correct value, but what actually
gets set is wrong!

I'm guessing that 'setval' is getting called more than once here.
Your 'LIMIT 1' controls how many rows are returned to the client, but
the server is probably generating more rows internally. So this is
just wrong, and

select setval ('answer_id_seq', max(id)) from answer

is right.

-Doug

#8Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Doug McNaught (#7)
Re: A problem with sequences...

On 21 Feb 2003, Doug McNaught wrote:

Dmitry Tkach <dmitry@openratings.com> writes:

select setval('answer_id_seq', id) from answer order by id desc limit 1;

Now, for some reason this reports a correct value, but what actually
gets set is wrong!

I'm guessing that 'setval' is getting called more than once here.
Your 'LIMIT 1' controls how many rows are returned to the client, but
the server is probably generating more rows internally. So this is

I'd totally missed that, it's probably doing a plan of seqscan + sort to
run the query.

select setval ('answer_id_seq', max(id)) from answer

is right.

Or possibly:
select setval('answer_id_seq', id) from
(select id from answer order by id desc limit 1) as foo;

which might if the table gets big enough use an index scan.

#9Dmitry Tkach
dmitry@openratings.com
In reply to: Dmitry Tkach (#5)
Re: A problem with sequences...

I got it!

For those, who are still wonderring, the actual problem had nothing to do with that rule.

I was loading some stuff into the database through a sql script (copy from etc...), and in the end of it I did:

select setval('answer_id_seq', id) from answer order by id desc limit 1;

Now, for some reason this reports a correct value, but what actually gets set is wrong!
Here is an example:

rapidb=# select max(id) from answer;
max
-------
25000
(1 row)

rapidb=# select last_value from answer_id_seq;
last_value
------------
22124
(1 row)

rapidb=# select setval ('answer_id_seq', id) from answer order by id desc limit 1;
setval
--------
25000
(1 row)

rapidb=# select currval('answer_id_seq');
currval
---------
21452
(1 row)

So, the question I have now is - what's going on?
How come setval() reports 25000, but the currval changes to 21452
If my query is wrong (and I can't really see what's wrong with it), then why does it report the correct value?

I have rewritten that script to do
select setval ('answer_id_seq', max(id)) from answer
instead, and that works fine... But I'd still love to find out what is wrong with that original query's behaviour...

Thanks!

Dima

#10Dmitry Tkach
dmitry@openratings.com
In reply to: Stephan Szabo (#6)
Re: A problem with sequences...

Stephan Szabo wrote:

IIRC, currval returns the last value given by the sequence (in nextval) to
this session. What does a select nextval('answer_id_seq') give you?
21453 or 25001?

2143

Dima

#11Dmitry Tkach
dmitry@openratings.com
In reply to: Doug McNaught (#7)
Re: A problem with sequences...

Doug McNaught wrote:

Dmitry Tkach <dmitry@openratings.com> writes:

select setval('answer_id_seq', id) from answer order by id desc limit 1;

Now, for some reason this reports a correct value, but what actually
gets set is wrong!

I'm guessing that 'setval' is getting called more than once here.
Your 'LIMIT 1' controls how many rows are returned to the client, but
the server is probably generating more rows internally. So this is
just wrong, and

You are right ! That's it! It does get called twice.
There is even a nice comment in ExecLimit():

* NOTE: when scanning forwards, we must fetch one tuple beyond the
* COUNT limit before we can return NULL, else the subplan won't
* be properly positioned to start going backwards. Hence test
* here is for position > netlimit not position >= netlimit.

Whatever that means, that's what was causing my problems...

Thanks!

Dima

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dmitry Tkach (#11)
Re: A problem with sequences...

Dmitry Tkach <dmitry@openratings.com> writes:

Doug McNaught wrote:

I'm guessing that 'setval' is getting called more than once here.
Your 'LIMIT 1' controls how many rows are returned to the client, but
the server is probably generating more rows internally.

You are right ! That's it! It does get called twice.
There is even a nice comment in ExecLimit():

BTW, in CVS tip ExecLimit has been rewritten to not do this, so the
query will behave as you expect in 7.4. Still, functions with
side-effects are really really dangerous in any but the simplest kind
of SELECT, because the planner is pretty cavalier about rearranging
things. I'd advise doing this instead:

select setval('answer_id_seq', (select id from answer
order by id desc limit 1));

Here, you *know* that the setval will be called exactly once.

regards, tom lane