Re: Bug #613: Sequence values fall back to previously chec
AFAICS the only way that we could make the one-WAL-record-every-32-
nextvals idea really work would be if CHECKPOINT could nullify the
logged-in-advance state of each sequence (so that the first nextval
after a checkpoint would always generate a fresh WAL record, but
subsequent ones wouldn't have to). But I don't see any practical
way for CHECKPOINT to do that, especially not for sequences whose
disk block isn't even in memory at the instant of the CHECKPOINT.
But sequences can force a WAL record if the sequence page LSN is <= the
system RedoRecPtr (XLogCtlInsert.RedoRecPtr), i.e. the previously made
sequence WAL record is "too old" and would not be replayed during
restart. It seems safe to NOT write a WAL record if the sequence
LSN > the system RedoRecPtr, because a checkpoint started after our
check would finish only after writing to disk the sequence buffer with
proper last_value and log_cnt (nextval keeps a lock on the sequence buffer).
What is not good is that to read the system RedoRecPtr a backend has to
acquire XLogInsertLock, but probably we can change the system RedoRecPtr
read/write rules:
- to update RedoRecPtr one has to hold not only XLogInsertLock
but also acquire XLogInfoLock (it is only the CheckPointer who
updates RedoRecPtr);
- to read RedoRecPtr one has to hold either XLogInsertLock or
XLogInfoLock.
This way nextval would only acquire XLogInfoLock to check the
system RedoRecPtr.
?
Vadim
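A minimal C sketch of the check proposed above. The names (the XLogRecPtr fields, GetRedoRecPtrSketch, sequence_needs_wal) are simplified stand-ins for illustration, not the actual backend code:

/* Sketch: force a fresh WAL record when the sequence page's LSN does
 * not lie past the redo pointer, i.e. the previously logged-ahead
 * record predates the last checkpoint and would never be replayed. */
#include <stdbool.h>
#include <stdint.h>

typedef struct
{
    uint32_t xlogid;            /* log file number */
    uint32_t xrecoff;           /* byte offset within log file */
} XLogRecPtr;

/* Stub standing in for reading the shared RedoRecPtr (which, per the
 * proposal above, would be done under XLogInfoLock). */
static XLogRecPtr
GetRedoRecPtrSketch(void)
{
    XLogRecPtr dummy = {0, 0};
    return dummy;
}

static bool
XLByteLE(XLogRecPtr a, XLogRecPtr b)
{
    return a.xlogid < b.xlogid ||
           (a.xlogid == b.xlogid && a.xrecoff <= b.xrecoff);
}

/* Decide whether this nextval() call must emit a WAL record. */
static bool
sequence_needs_wal(XLogRecPtr page_lsn, int32_t log_cnt)
{
    if (XLByteLE(page_lsn, GetRedoRecPtrSketch()))
        return true;            /* record is "too old": log again */
    return log_cnt <= 0;        /* otherwise log only when the logged-ahead values run out */
}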
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
It seems safe to NOT write a WAL record if the sequence
LSN > the system RedoRecPtr, because a checkpoint started after our
check would finish only after writing to disk the sequence buffer with
proper last_value and log_cnt (nextval keeps a lock on the sequence buffer).
Mmm ... maybe. Is this safe if a checkpoint is currently in progress?
Seems like you could look at RedoRecPtr and decide you are okay, but you
really are not if the checkpointer has already dumped the sequence's disk
buffer and will later set RedoRecPtr to a value beyond the old LSN.
In that case you should have emitted a WAL record ... but you didn't.
Considering that we've found two separate bugs in this stuff in the past
week, I think that we ought to move in the direction of making it
simpler and more reliable, not even-more-complicated. Is it really
worth all this trouble to avoid making a WAL record for each nextval()
call?
regards, tom lane
I said:
Mmm ... maybe. Is this safe if a checkpoint is currently in progress?
Seems like you could look at RedoRecPtr and decide you are okay, but you
really are not if the checkpointer has already dumped the sequence's disk
buffer and will later set RedoRecPtr to a value beyond the old LSN.
Oh, wait, I take that back: the checkpointer advances RedoRecPtr
*before* it starts to dump disk buffers.
I'm still worried about whether we shouldn't try to simplify, rather
than add complexity.
regards, tom lane
It seems safe to NOT write a WAL record if the sequence
LSN > the system RedoRecPtr, because a checkpoint started after our
check would finish only after writing to disk the sequence buffer with
proper last_value and log_cnt (nextval keeps a lock on the sequence buffer).
Mmm ... maybe. Is this safe if a checkpoint is currently in
progress? Seems like you could look at RedoRecPtr and decide
you are okay, but you really are not if the checkpointer has already
dumped the sequence's disk buffer and will later set RedoRecPtr to a
value beyond the old LSN.
The CheckPointer updates the system RedoRecPtr before doing anything else.
The system RedoRecPtr was introduced to force data buffer backup
by future XLogInsert-s once the CheckPointer has started, and it *must* be
updated *before* buffer flushing.
In that case you should have emitted a WAL record ... but you didn't.
Considering that we've found two separate bugs in this stuff
in the past week, I think that we ought to move in the direction
of making it simpler and more reliable, not even-more-complicated.
Isn't it too late, considering we have fixes for both bugs already? -:)
(And it's not very-much-more-complicated - just a simple check.)
Is it really worth all this trouble to avoid making a WAL record
for each nextval() call?
It's doable... why not do this?
(Though I have no strong objection.)
Vadim
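A hedged sketch of the ordering invariant described above; the lock, buffer, and WAL routines below are placeholder stubs, not the real checkpoint code:

#include <stdint.h>

typedef uint64_t XLogRecPtr;
static XLogRecPtr RedoRecPtr;

/* Stubs standing in for the real locking, buffer, and WAL machinery. */
static void AcquireXLogInsertLock(void) {}
static void ReleaseXLogInsertLock(void) {}
static XLogRecPtr GetCurrentInsertPos(void) { return 42; }
static void FlushAllDirtyBuffers(void) {}
static void WriteCheckPointRecord(XLogRecPtr redo) { (void) redo; }

/* Simplified checkpoint skeleton. The key invariant is step 1 before
 * step 2: the redo pointer is published before any data buffer is
 * flushed, so a backend that checks RedoRecPtr while holding its
 * sequence buffer lock cannot be overtaken by the flush. */
void
CreateCheckPointSketch(void)
{
    AcquireXLogInsertLock();
    RedoRecPtr = GetCurrentInsertPos();   /* 1: advance redo pointer first */
    ReleaseXLogInsertLock();

    FlushAllDirtyBuffers();               /* 2: only then dump data pages */
    WriteCheckPointRecord(RedoRecPtr);    /* 3: record points at redo position */
}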
I don't fully understand the xlog files or WAL records but...
Why isn't the writing of the WAL record based on the CACHE value of the
sequence? If a request to nextval() can't be satisfied by the cache,
the sequence on disk should be updated resulting in a WAL record being
written.
If two sessions are accessing the sequence, they will likely end up not
writing sequential values as they have both taken a chunk of values by
calling nextval() but the effect of this could be controlled by the user
by selecting an acceptable value for CACHE. If I don't mind having
session 1 write records 1-10 while session 2 interleaves those with
records 11-20, I should set my cache to 10. If I want my id's to map to
insertion time as closely as possible I should set the cache lower (or
NOCACHE, is that an option?).
I'm concerned that the discussion here has been of the opinion that
since no records were written to the database using the value retrieved
from the sequence that no damage has been done. We are using database
sequences to get unique identifiers for things outside the database. If
a sequence could ever under any circumstances reissue a value, this
could be damaging to the integrity of our software.
On Thu, 14 Mar 2002, Tom Pfau wrote:
I don't fully understand the xlog files or WAL records but...
Why isn't the writing of the WAL record based on the CACHE value of the
sequence? If a request to nextval() can't be satisfied by the cache,
the sequence on disk should be updated resulting in a WAL record being
written.
If two sessions are accessing the sequence, they will likely end up not
writing sequential values as they have both taken a chunk of values by
calling nextval() but the effect of this could be controlled by the user
by selecting an acceptable value for CACHE.
I was thinking that too, just leave it up to the end user to decide the
level of performance gain they want. But the cool thing about writing
ahead to the WAL when compared to CACHE is that the on disk copy is
advanced ahead of the cached value so that if you have a cache value of
1 you still get time ordered sequences from multiple backends AND you're
only writing to the WAL once every 32 nextval's -- though the write
ahead should really be based on a multiple of CACHE since if your cache
value is 32 then you're not getting any benefit from the WAL savings.
If I don't mind having
session 1 write records 1-10 while session 2 interleaves those with
records 11-20, I should set my cache to 10. If I want my id's to map to
insertion time as closely as possible I should set the cache lower (or
NOCACHE, is that an option?).
The CACHE value defaults to 1 which means no cache (each time nextval is
called it only grabs 1 value).
I'm concerned that the discussion here has been of the opinion that
since no records were written to the database using the value retrieved
from the sequence that no damage has been done. We are using database
sequences to get unique identifiers for things outside the database. If
a sequence could ever under any circumstances reissue a value, this
could be damaging to the integrity of our software.
Absolutely, we use sequences the same way. And the problem exhibits
itself regardless of whether data is being inserted or not, and
independently of CACHE value. So this has to be fixed for both
scenarios.
-- Ben
"Tom Pfau" <T.Pfau@emCrit.com> writes:
I'm concerned that the discussion here has been of the opinion that
since no records were written to the database using the value retrieved
from the sequence that no damage has been done.
Um, you certainly didn't hear me saying that ;-)
There are two different bugs involved here. One is the no-WAL-flush-
if-transaction-is-only-nextval problem. AFAIK everyone agrees we must
fix that. The other issue is this business about "logging ahead"
(to reduce the number of WAL records written) not interacting correctly
with checkpoints. What we're arguing about is exactly how to fix that
part.
We are using database
sequences to get unique identifiers for things outside the database. If
a sequence could ever under any circumstances reissue a value, this
could be damaging to the integrity of our software.
If you do a SELECT nextval() and then use the returned value externally
*without waiting for a commit acknowledgement*, then I think you are
risking trouble; there's no guarantee that the WAL record (if one is
needed) has hit disk yet, and so a crash could roll back the sequence.
This isn't an issue for a SELECT nextval() standing on its own ---
AFAIK the result will not be transmitted to the client until after the
commit happens. But it would be an issue for a select executed inside
a transaction block (begin/commit).
regards, tom lane
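As an illustration of this rule, a hedged libpq sketch (the sequence name ext_id_seq and the minimal error handling are invented for the example): the fetched value is released for external use only once COMMIT has been acknowledged.

#include <stdlib.h>
#include <libpq-fe.h>

/* Returns a sequence value that is safe to use outside the database,
 * or -1 on failure. */
static long
fetch_committed_id(PGconn *conn)
{
    PGresult *res;
    long      val;

    res = PQexec(conn, "BEGIN");
    PQclear(res);

    res = PQexec(conn, "SELECT nextval('ext_id_seq')");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        PQclear(res);
        return -1;
    }
    val = atol(PQgetvalue(res, 0, 0));
    PQclear(res);

    /* Only after COMMIT is acknowledged is the value durable; a crash
     * before this point could roll the sequence back. */
    res = PQexec(conn, "COMMIT");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        PQclear(res);
        return -1;
    }
    PQclear(res);
    return val;
}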
I noticed a message asking if this scenario was consistent with the
other reports, and yes it is. We have seen this occurring on our system
with versions as old as 7.0.
Glad to see someone has finally nailed this one.
Dave
"Dave Cramer" <dave@fastcrypt.com> writes:
I noticed a message asking if this scenario was consistent with the
other reports, and yes it is. We have seen this occurring on our system
with versions as old as 7.0.
Given that these are WAL bugs, they could not predate 7.1.
regards, tom lane
On Thu, 14 Mar 2002, Tom Lane wrote:
If you do a SELECT nextval() and then use the returned value externally
*without waiting for a commit acknowledgement*, then I think you are
risking trouble; there's no guarantee that the WAL record (if one is
needed) has hit disk yet, and so a crash could roll back the sequence.
This isn't an issue for a SELECT nextval() standing on its own ---
AFAIK the result will not be transmitted to the client until after the
commit happens. But it would be an issue for a select executed inside
a transaction block (begin/commit).
The behavior of SELECT nextval() should not be conditional on being in or
out of a transaction block. What you are implying by saying that is that
it would be possible to roll back an uncommitted call to nextval().
Am I missing some terminology?
I think I finally realized why my old patch that forces a log right off
the bat works to fix at least part of the problem. When the database
is shut down properly, all of the sequences that are in memory are written
back to disk in their state at that time. But the problem with that is
that their state at that time can have log_cnt > 0. This is why, after
startup, the sequence in memory is 'behind' the one on disk: the
code sees log > fetch and doesn't log. When you really think about it,
log_cnt should not be part of the sequence record at all since there
is never a valid case for storing a log_cnt on disk with a value other
than 0.
Maybe the purpose for the on disk value of log_cnt should be changed?
It could be the value used in place of the static SEQ_LOG_VALS, which
could then be definable on a per sequence basis. And then log_cnt
could be moved into elm->log_cnt. Anyway, that's just a thought.
Here's my latest patch to work around the problem. If there is a way
to prevent log_cnt from being written out with a value greater than
zero, that would be better than this. With this behavior log_cnt is
reset to 0 each time a backend accesses a sequence for the first time.
That's probably overkill... But I still believe that the XLogFlush
after XLogInsert is necessary to ensure that the WAL value is written
to disk immediately. In my testing this patch works fine, YMMV.
-- Ben
*** src/backend/commands/sequence.c.orig Tue Mar 12 18:58:55 2002
--- src/backend/commands/sequence.c Thu Mar 14 17:34:25 2002
***************
*** 62,67 ****
--- 62,68 ----
int64 cached;
int64 last;
int64 increment;
+ bool reset_logcnt;
struct SeqTableData *next;
} SeqTableData;
***************
*** 270,275 ****
--- 271,277 ----
PageSetLSN(page, recptr);
PageSetSUI(page, ThisStartUpID);
+ XLogFlush(recptr);
}
END_CRIT_SECTION();
***************
*** 314,321 ****
PG_RETURN_INT64(elm->last);
}
! seq = read_info("nextval", elm, &buf); /* lock page' buffer and
! * read tuple */
last = next = result = seq->last_value;
incby = seq->increment_by;
--- 316,322 ----
PG_RETURN_INT64(elm->last);
}
! seq = read_info("nextval", elm, &buf); /* lock page' buffer and read tuple */
last = next = result = seq->last_value;
incby = seq->increment_by;
***************
*** 331,339 ****
log--;
}
! if (log < fetch)
{
! fetch = log = fetch - log + SEQ_LOG_VALS;
logit = true;
}
--- 332,340 ----
log--;
}
! if (log < fetch)
{
! fetch = log = fetch - log + SEQ_LOG_VALS * cache;
logit = true;
}
***************
*** 403,409 ****
rdata[0].data = (char *) &xlrec;
rdata[0].len = sizeof(xl_seq_rec);
rdata[0].next = &(rdata[1]);
!
seq->last_value = next;
seq->is_called = true;
seq->log_cnt = 0;
--- 404,410 ----
rdata[0].data = (char *) &xlrec;
rdata[0].len = sizeof(xl_seq_rec);
rdata[0].next = &(rdata[1]);
!
seq->last_value = next;
seq->is_called = true;
seq->log_cnt = 0;
***************
*** 417,423 ****
PageSetLSN(page, recptr);
PageSetSUI(page, ThisStartUpID);
!
if (fetch) /* not all numbers were fetched */
log -= fetch;
}
--- 418,425 ----
PageSetLSN(page, recptr);
PageSetSUI(page, ThisStartUpID);
! XLogFlush(recptr);
!
if (fetch) /* not all numbers were fetched */
log -= fetch;
}
***************
*** 507,513 ****
XLogRecPtr recptr;
XLogRecData rdata[2];
Page page = BufferGetPage(buf);
!
xlrec.node = elm->rel->rd_node;
rdata[0].buffer = InvalidBuffer;
rdata[0].data = (char *) &xlrec;
--- 509,516 ----
XLogRecPtr recptr;
XLogRecData rdata[2];
Page page = BufferGetPage(buf);
!
!
xlrec.node = elm->rel->rd_node;
rdata[0].buffer = InvalidBuffer;
rdata[0].data = (char *) &xlrec;
***************
*** 527,532 ****
--- 530,536 ----
PageSetLSN(page, recptr);
PageSetSUI(page, ThisStartUpID);
+ XLogFlush(recptr);
}
/* save info in sequence relation */
seq->last_value = next; /* last fetched number */
***************
*** 660,665 ****
--- 664,674 ----
seq = (Form_pg_sequence) GETSTRUCT(&tuple);
+ if (elm->reset_logcnt)
+ {
+ seq->log_cnt = 0;
+ elm->reset_logcnt = false;
+ }
elm->increment = seq->increment_by;
return seq;
***************
*** 703,708 ****
--- 712,718 ----
name, caller);
elm->relid = RelationGetRelid(seqrel);
elm->cached = elm->last = elm->increment = 0;
+ elm->reset_logcnt = true;
}
}
else
***************
*** 721,726 ****
--- 731,737 ----
elm->relid = RelationGetRelid(seqrel);
elm->cached = elm->last = elm->increment = 0;
elm->next = (SeqTable) NULL;
+ elm->reset_logcnt = true;
if (seqtab == (SeqTable) NULL)
seqtab = elm;
Ben Grimm <bgrimm@zaeon.com> writes:
The behavior of SELECT nextval() should not be conditional on being in or
out of a transaction block.
Nonsense. The behavior of INSERT or UPDATE is "conditional" in exactly
the same way: you should not rely on the reported result until it's
committed.
Given Vadim's performance concerns, I doubt he'll hold still for forcing
an XLogFlush immediately every time a sequence XLOG record is written
-- but AFAICS that'd be the only way to guarantee durability of a
nextval result in advance of commit. Since I don't think that's an
appropriate goal for the system to have, I don't care for it either.
I'm planning to try coding up Vadim's approach (pay attention to page's
old LSN to see if a WAL record must be generated) tonight or tomorrow
and see if it seems reasonable.
regards, tom lane
This isn't an issue for a SELECT nextval() standing on
its own --- AFAIK the result will not be transmitted to the
client until after the commit happens. But it would be
an issue for a select executed inside a transaction
block (begin/commit).
The behavior of SELECT nextval() should not be conditional
on being in or out of a transaction block.
And it's not. But the behaviour of the application *must* be
conditional on whether the transaction was committed or not.
What's the problem for an application that needs nextval() for
external (out-of-database) purposes to use sequence values
only after transaction commit? What's *wrong* with such an application
behaving the same way as when dealing with other database objects
which are under transaction control (e.g. only after commit can you
report to the user that $100 was successfully added to his/her account)?
---
I agree that if nextval-s were the only "write" actions in a transaction
and they made some XLogInsert-s, then WAL must be flushed at commit
time. But that's it. Was this fixed? Very easy.
Vadim
On Thu, 14 Mar 2002, Mikheev, Vadim wrote:
And it's not. But the behaviour of the application *must* be
conditional on whether the transaction was committed or not.
What's the problem for an application that needs nextval() for
external (out-of-database) purposes to use sequence values
only after transaction commit? What's *wrong* with such an application
behaving the same way as when dealing with other database objects
which are under transaction control (e.g. only after commit can you
report to the user that $100 was successfully added to his/her account)?
But sequences should not be under transaction control. Can you
safely roll back a sequence? No! The only way to ensure that would
be to lock the sequence for the duration of the transaction. If
you want an ACID compliant sequential value, you implement it using
a transaction safe method (e.g. a table with rows you can lock for
the duration of a transaction). If you want a number that is
guaranteed to always move in one direction, return the next value
without requiring locks, and may have gaps in the numbers returned,
you choose a sequence.
Placing a restriction on an application that says it must treat
the values returned from a sequence as if they might not be committed
is absurd. What about applications that don't use explicit
transactions? As soon as a result comes back it should be considered
'live', on disk, never think about it again.
I agree that if nextval-s were the only "write" actions in a transaction
and they made some XLogInsert-s, then WAL must be flushed at commit
time. But that's it. Was this fixed? Very easy.
But aren't the nextval's always going to be the only write actions
in their transactions since the nextval isn't really a part of the
transaction that called it? If it were, then it could be rolled
back along with that transaction. This is why you can, right now,
insert data into a table with a serial column, committing after
each row, crash the database and STILL have the sequence fall back
to its initial state. The XLogInserts that occur from the table
inserts must not happen in the same xact as the nextval's
XLogInserts. I can demonstrate the behavior quite easily, and
Bruce posted results that confirmed it.
-- Ben
But sequences should not be under transaction control. Can you
safely roll back a sequence? No! The only way to ensure that would
...
Placing a restriction on an application that says it must treat the values
returned from a sequence as if they might not be committed is absurd.
Why? The fact that you are not able to roll back sequences does not
necessarily mean that you are not required to perform a commit to ensure
permanent storage of changes made to the database.
And isn't it absurd to do more XLogFlush-es for non-transactional objects
than we do for transactional ones? And why? Just for convenience of
<< 1% applications which need to use sequences in their own,
non-database, external objects? We are not required to care about those
objects, but we'd better care about performance of our operations over our
objects.
I agree that if nextval-s were the only "write" actions in a transaction
and they made some XLogInsert-s, then WAL must be flushed at commit
time. But that's it. Was this fixed? Very easy.
But aren't the nextval's always going to be the only write actions
in their transactions since the nextval isn't really a part of the
transaction that called it? If it were, then it could be rolled
There are no nextval transactions. See how the XLOG_NO_TRAN flag
is used in XLogInsert and you'll see why there is no XLogFlush
after a transaction-with-nextval-only (which causes the N1 reported problem).
back along with that transaction. This is why you can, right now,
insert data into a table with a serial column, committing after
each row, crash the database and STILL have the sequence fall back
to its initial state. The XLogInserts that occur from the table
inserts must not happen in the same xact as the nextval's
XLogInserts. I can demonstrate the behavior quite easily, and
Bruce posted results that confirmed it.
Just wait until Tom adds the check for the system RedoRecPtr in nextval()
and try to reproduce this behaviour (the N2 reported problem)
after that.
Vadim
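A hedged sketch of the mechanism Vadim points at; the variable and function names below are simplified stand-ins for the real xlog.c code. Records inserted with XLOG_NO_TRAN never advance the backend's last-record pointer, so commit concludes there is nothing to flush:

#include <stdint.h>

typedef uint64_t XLogRecPtr;

#define XLOG_NO_TRAN 0x01       /* record is outside transaction control */

/* Last WAL record written by this backend's transaction; zero means
 * "wrote nothing". (Simplified stand-in.) */
static XLogRecPtr MyLastRecPtr = 0;

static void XLogFlushSketch(XLogRecPtr upto) { (void) upto; }

static void
XLogInsertSketch(uint8_t info, XLogRecPtr insert_pos)
{
    /* Sequence records carry XLOG_NO_TRAN, so this branch is skipped
     * and the transaction still looks as if it wrote no WAL at all. */
    if (!(info & XLOG_NO_TRAN))
        MyLastRecPtr = insert_pos;
}

static void
RecordTransactionCommitSketch(void)
{
    /* A nextval()-only transaction arrives here with MyLastRecPtr == 0,
     * so its sequence WAL record is never forced to disk at commit --
     * the N1 reported problem. */
    if (MyLastRecPtr != 0)
        XLogFlushSketch(MyLastRecPtr);
}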
On Fri, 15 Mar 2002, Vadim Mikheev wrote:
But sequences should not be under transaction control. Can you
safely roll back a sequence? No! The only way to ensure that would ...
Placing a restriction on an application that says it must treat the values
returned from a sequence as if they might not be committed is absurd.
Why? The fact that you are not able to roll back sequences does not
necessarily mean that you are not required to perform a commit to ensure
permanent storage of changes made to the database.
I'm not sure I agree, but I'll wait to see the behavior of the db after
the changes are made.
And isn't it absurd to do more XLogFlush-es for non-transactional objects
than we do for transactional ones? And why? Just for convenience of
<< 1% applications which need to use sequences in their own,
non-database, external objects? We are not required to care about those
objects, but we'd better care about performance of our operations over our
objects.
Yes, absolutely - if there's a better way, which apparently there is,
then sure, eliminate the calls to XLogFlush. It's a workaround, a hack.
I am much more concerned with getting the behavior correct than I am
about getting some code with my name on it into a release. My workarounds
only served to point out flaws in the design, even if I didn't quite
understand at the time why they worked :-)
There are no nextval transactions. See how the XLOG_NO_TRAN flag
is used in XLogInsert and you'll see why there is no XLogFlush
after a transaction-with-nextval-only (which causes the N1 reported problem).
Just wait until Tom adds the check for the system RedoRecPtr in nextval()
and try to reproduce this behaviour (the N2 reported problem)
after that.
Thank you! I think I have a much better understanding of how this works
now.
When these bugs are fixed there is still the issue of bug #3 that I
came across. The one that I work around by resetting log_cnt to 0 when a
backend initializes a sequence. It's this third bug that made the other
two so apparent. Fixing them does not obviate the need to fix this one.
Is there a way to intercept writes or reads such that when a sequence is
going to or from disk we can force log_cnt = 0? Right now that's
worked around by my 'reset_logcnt' flag in the patch, but I know that it
may not be an ideal solution. But, since sequences are just tuples like
everything else I don't see an obvious way to prevent it.
-- Ben
"'Ben Grimm'" <bgrimm@zaeon.com> writes:
When these bugs are fixed there is still the issue of bug #3 that I
came across. The one that I work around by resetting log_cnt to 0 when a
backend initializes a sequence. It's this third bug that made the other
two so apparent. Fixing them does not obviate the need to fix this one.
What's bug #3? I don't recall a third issue.
regards, tom lane
Attached is a patch against current CVS that fixes both of the known
problems with sequences: failure to flush XLOG after a transaction
that only does "SELECT nextval()", and failure to force a new WAL
record to be written on the first nextval after a checkpoint.
(The latter uses Vadim's idea of looking at the sequence page LSN.)
I haven't tested it really extensively, but it seems to cure the
reported problems.
Some notes:
1. I found what I believe is another bug in the sequence logic:
fetch = log = fetch - log + SEQ_LOG_VALS;
should be
fetch = log = fetch + SEQ_LOG_VALS;
I can't see any reason to reduce the number of values prefetched
by the number formerly prefetched. Also, if the sequence's "cache"
setting is large (more than SEQ_LOG_VALS), the original code could
easily fail to fetch as many values as it was supposed to cache,
let alone additional ones to be prefetched and logged. (A worked
example with concrete numbers follows these notes.)
2. I renamed XLogCtl->RedoRecPtr to SavedRedoRecPtr, and renamed
the associated routines to SetSavedRedoRecPtr/GetSavedRedoRecPtr,
in hopes of reducing confusion.
3. I believe it'd now be possible to remove SavedRedoRecPtr and
SetSavedRedoRecPtr/GetSavedRedoRecPtr entirely, in favor of letting
the postmaster fetch the updated pointer with GetRedoRecPtr just
like a backend would. This would be cleaner and less code ... but
someone might object that it introduces a risk of postmaster hangup,
if some backend crashes whilst holding info_lck. I consider that
risk minuscule given the short intervals in which info_lck is held,
but it can't be denied that the risk is not zero. Thoughts?
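Here is the worked example for point 1, with numbers chosen purely for illustration (SEQ_LOG_VALS is 32 in the sequence.c of that era): when log_cnt exceeds SEQ_LOG_VALS, the old formula fetches fewer values than the cache setting demands.

#include <stdio.h>

#define SEQ_LOG_VALS 32

int
main(void)
{
    int fetch = 50;     /* the cache setting: values this call must supply */
    int log = 40;       /* values still covered by the prior WAL record */

    if (log < fetch)
    {
        int old_formula = fetch - log + SEQ_LOG_VALS;   /* 50 - 40 + 32 = 42 */
        int new_formula = fetch + SEQ_LOG_VALS;         /* 50 + 32      = 82 */

        /* The old code would fetch only 42 values although the cache
         * setting asked for 50; the corrected formula fetches the 50
         * plus SEQ_LOG_VALS more to be logged ahead. */
        printf("old: %d, corrected: %d\n", old_formula, new_formula);
    }
    return 0;
}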
Comments? Unless I hear objections I will patch this in current
and the 7.2 branch. (If we agree to remove SavedRedoRecPtr,
though, I don't think we should back-patch that change.)
regards, tom lane
Tom Lane wrote:
Attached is a patch against current CVS that fixes both of the known
problems with sequences: failure to flush XLOG after a transaction
that only does "SELECT nextval()", and failure to force a new WAL
record to be written on the first nextval after a checkpoint.
(The latter uses Vadim's idea of looking at the sequence page LSN.)
I haven't tested it really extensively, but it seems to cure the
reported problems.
I can confirm that the patch fixes the problem shown in my simple test:
test=> create table test (x serial, y varchar(255));
NOTICE: CREATE TABLE will create implicit sequence 'test_x_seq' for SERIAL column 'test.x'
NOTICE: CREATE TABLE / UNIQUE will create implicit index 'test_x_key' for table 'test'
CREATE
test=> insert into test (y) values ('lkjasdflkja sdfl;kj asdfl;kjasdf');
INSERT 16561 1
test=> insert into test (y) values ('lkjasdflkja sdfl;kj asdfl;kjasdf');
INSERT 16562 1
test=> insert into test (y) values ('lkjasdflkja sdfl;kj asdfl;kjasdf');
INSERT 16563 1
...
test=> select nextval('test_x_seq');
nextval
---------
22
(1 row)
test=> checkpoint;
CHECKPOINT
test=> insert into test (y) values ('lkjasdflkja sdfl;kj asdfl;kjasdf');
INSERT 16582 1
test=> insert into test (y) values ('lkjasdflkja sdfl;kj asdfl;kjasdf');
INSERT 16583 1
test=> insert into test (y) values ('lkjasdflkja sdfl;kj asdfl;kjasdf');
INSERT 16584 1
[ kill -9 to backend ]
#$ sql test
Welcome to psql, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
test=> select nextval('test_x_seq');
nextval
---------
56
(1 row)
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Tom Lane wrote:
2. I renamed XLogCtl->RedoRecPtr to SavedRedoRecPtr, and renamed
the associated routines to SetSavedRedoRecPtr/GetSavedRedoRecPtr,
in hopes of reducing confusion.
Good.
3. I believe it'd now be possible to remove SavedRedoRecPtr and
SetSavedRedoRecPtr/GetSavedRedoRecPtr entirely, in favor of letting
the postmaster fetch the updated pointer with GetRedoRecPtr just
like a backend would. This would be cleaner and less code ... but
someone might object that it introduces a risk of postmaster hangup,
if some backend crashes whilst holding info_lck. I consider that
risk minuscule given the short intervals in which info_lck is held,
but it can't be denied that the risk is not zero. Thoughts?
The change sounds good to me.
Comments? Unless I hear objections I will patch this in current
and the 7.2 branch. (If we agree to remove SavedRedoRecPtr,
though, I don't think we should back-patch that change.)
Totally agree.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Fri, 15 Mar 2002, Tom Lane wrote:
"'Ben Grimm'" <bgrimm@zaeon.com> writes:
When these bugs are fixed there is still the issue of bug #3 that I
came across. The one that I work around by resetting log_cnt to 0 when a
backend initializes a sequence. It's this third bug that made the other
two so apparent. Fixing them does not obviate the need to fix this one.
What's bug #3? I don't recall a third issue.
The problem I was seeing before is that when the postmaster was shut down
properly, log_cnt in the sequence record was saved with whatever value it
had at the time. So when it loaded from disk it would have a value greater
than zero, resulting in no XLogInsert until you'd exceeded log_cnt calls to
nextval.
AFAICT, your patch fixes this problem, as I can't reproduce it now.
Thanks!
-- Ben
"'Ben Grimm'" <bgrimm@zaeon.com> writes:
On Fri, 15 Mar 2002, Tom Lane wrote:
What's bug #3? I don't recall a third issue.
The problem I was seeing before is that when the postmaster was shut down
properly, log_cnt in the sequence record was saved with whatever value it
had at the time.
Right, it's supposed to do that.
So when it loaded from disk it would have a value greater
than zero, resulting in no XLogInsert until you'd exceeded log_cnt calls to
nextval.
This is the same as the post-checkpoint issue: we fix it by forcing an
XLogInsert on the first nextval after a checkpoint (or system startup).
regards, tom lane
Attached is a patch against current CVS that fixes both of the known
problems with sequences: failure to flush XLOG after a transaction
Great! Thanks... and sorry for missing these cases a year ago -:)
Vadim
(userland comment)
On Fri, Mar 15, 2002 at 01:05:33AM -0800, Vadim Mikheev wrote:
| > But sequences should not be under transaction control. Can you
| > safely roll back a sequence? No! The only way to ensure that would
| ...
| > Placing a restriction on an application that says it must treat the values
| > returned from a sequence as if they might not be committed is absurd.
|
| Why? The fact that you are not able to roll back sequences does not
| necessarily mean that you are not required to perform a commit to ensure
| permanent storage of changes made to the database.
I use sequences to generate message identifiers for a simple
external-to-database message passing system. I also use
them for file upload identifiers. In both cases, if the
external action (message or file upload) succeeds, I commit;
otherwise I roll-back. I assume that the database won't give
me a duplicate sequence... otherwise I'd have to find some
other way to get sequences or I'd have duplicate messages
or non-unique file identifiers.
With these changes is this assumption no longer valid? If
so, this change will break a lot of user programs.
| And why? Just for convenience of << 1% applications which need
| to use sequences in their own, non-database, external objects?
I think you may be underestimating the amount of "external resources"
which may be associated with a database object. Regardless, many of the
database features in PostgreSQL are there for 1% or less of the
user base...
Best,
Clark
--
Clark C. Evans Axista, Inc.
http://www.axista.com 800.926.5525
XCOLLA Collaborative Project Management Software
I do basically the same thing for files. Except I md5 a 4-character
random string and the sequence ID, just in case I get the same one
twice -- as it's never been written in stone that I wouldn't -- not to
mention the high number of requests for returning a sequence ID back
to the pool on a rollback.
Anyway, you might try using the OID rather than a sequence ID, but if
you roll back the database commit due to the failure of an external
action, shouldn't you be cleaning up that useless external stuff
as well?
--
Rod Taylor
This message represents the official view of the voices in my head
| > Placing a restriction on an application that says it must treat the
| > values returned from a sequence as if they might not be committed is
| > absurd.
|
| Why? The fact that you are not able to roll back sequences does not
| necessarily mean that you are not required to perform a commit to ensure
| permanent storage of changes made to the database.
I use sequences to generate message identifiers for a simple
external-to-database message passing system. I also use
them for file upload identifiers. In both cases, if the
external action (message or file upload) succeeds, I commit;
otherwise I roll-back. I assume that the database won't give
me a duplicate sequence... otherwise I'd have to find some
So can you do "select nextval()" in a *separate* (committed)
transaction *before* the external action and the "real" transaction where
you store information (with the sequence number) about the external
action in the database?
BEGIN;
SELECT nextval('my_seq');  -- 'my_seq' stands for your sequence
COMMIT;
BEGIN;
-- Do external actions and store info in DB --
COMMIT;  -- or ROLLBACK;
Is this totally unacceptable? Is it really *required* to call nextval()
in *the same* transaction where you store info in DB? Why?
other way to get sequences or I'd have duplicate messages
or non-unique file identifiers.
With these changes is this assumption no longer valid? If
1. It's not valid to assume that sequences will not return duplicate
numbers if there was no commit after nextval.
2. It doesn't matter when sequence numbers are stored in
database objects only.
3. But if you're going to use sequence numbers in external objects
you must (pre)fetch those numbers in separate committed
transaction.
(Can we have this in FAQ?)
so, this change will break a lot of user programs.
| And why? Just for convenience of << 1% applications which need
| to use sequences in their own, non-database, external objects?
I think you may be underestimating the amount of "external resources"
which may be associated with a database object. Regardless, many of the
database features in PostgreSQL are there for 1% or less of the
user base...
Please note that I was talking about some *inconvenience*, not about
the *inability* to use sequence numbers externally (it seems my words were
too brief). Above is how to do this. And though I agree that it's not
very convenient/handy/cosy to *take care* and fetch numbers in a
separate committed transaction, it's required only in those special
cases, and I think it's better than doing an fsync() per nextval() call,
which would affect other users/applications where storing sequence numbers
outside the database is not required.
Vadim