Efficiently advancing a sequence without risking it going backwards.

Started by Paul McGarryalmost 6 years ago7 messagesgeneral
Jump to latest
#1Paul McGarry
paul@paulmcgarry.com

I have two sequences in different dbs which I want to keep roughly in sync
(they don't have to be exactly in sync, I am just keeping them in the same
ballpark).

Currently I have a process which periodically checks the sequences and does:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) while (nextval('DB2sequence')<=1234);

which works fine, but is pretty inefficient if the discrepancy is large (ie
calling nextval a hundred thousand times).

I don't think I can use setval(), because it risks making sequences go
backwards, eg:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) setval('DB2sequence',1234);

but if between (1) and (2) there are 2 nextval(DB2sequence) calls on
another process, (2) would take the sequence back from 1235 to 1234 and I
would end up trying to create a duplicate key ID from the sequence.

So what I really want is something equivalent to the setval, but with
"where DB2sequence <1234" logic so it doesn't overwrite the value if it is
already large.

Is there such a mechanism?

Thanks for any help.

Paul

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Paul McGarry (#1)
Re: Efficiently advancing a sequence without risking it going backwards.

On 7/6/20 7:06 PM, Paul McGarry wrote:

I have two sequences in different dbs which I want to keep roughly in
sync (they don't have to be exactly in sync, I am just keeping them in
the same ballpark).

Currently I have a process which periodically checks the sequences and does:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) while (nextval('DB2sequence')<=1234);

which works fine, but is pretty inefficient if the discrepancy is large
(ie calling nextval a hundred thousand times).

I don't think I can use setval(), because it risks making sequences go
backwards, eg:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) setval('DB2sequence',1234);

but if between (1) and (2) there are 2 nextval(DB2sequence) calls on
another process,  (2) would take the sequence back from 1235 to 1234 and
I would end up trying to create a duplicate key ID from the sequence.

So what I really want is something equivalent to the setval, but with
"where DB2sequence <1234" logic so it doesn't overwrite the value if it
is already large.

Is there such a mechanism?

Well sequences are designed to be operated on independently from each
session, so there is not much you can do about locking on a number. The
best you can do is use setval() to increment the number by enough to get
past any potential sequence advances in other sessions. Say advance by
10, 50 or 100 depending on what you think is a reasonable number of
other sessions also hitting the sequence.

Thanks for any help.

Paul

--
Adrian Klaver
adrian.klaver@aklaver.com

#3Jeremy Schneider
schneider@ardentperf.com
In reply to: Paul McGarry (#1)
Re: Efficiently advancing a sequence without risking it going backwards.

On Jul 6, 2020, at 19:06, Paul McGarry <paul@paulmcgarry.com> wrote:

I don't think I can use setval(), because it risks making sequences go backwards, eg:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) setval('DB2sequence',1234);

but if between (1) and (2) there are 2 nextval(DB2sequence) calls on another process, (2) would take the sequence back from 1235 to 1234 and I would end up trying to create a duplicate key ID from the sequence.

An ability to “lock” the sequence momentarily would give you the tool you need, but I don’t think it’s there.

Total hack, but if your application or users can retry when the rare error is encountered then one idea is to rename the sequence momentarily while you do the setval() then rename it back. Do an initial check without renaming, then re-check after renaming and before the setval() call.

If you put retry logic into your application then make sure to include back-off logic so you don’t get an outage induced by thundering herd.

-Jeremy

Sent from my TI-83

#4Chris Browne
cbbrowne@acm.org
In reply to: Jeremy Schneider (#3)
Re: Efficiently advancing a sequence without risking it going backwards.

On Thu, 9 Jul 2020 at 12:59, Jeremy Schneider <schneider@ardentperf.com>
wrote:

On Jul 6, 2020, at 19:06, Paul McGarry <paul@paulmcgarry.com> wrote:

I don't think I can use setval(), because it risks making sequences go

backwards, eg:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) setval('DB2sequence',1234);

but if between (1) and (2) there are 2 nextval(DB2sequence) calls on

another process, (2) would take the sequence back from 1235 to 1234 and I
would end up trying to create a duplicate key ID from the sequence.

An ability to “lock” the sequence momentarily would give you the tool you
need, but I don’t think it’s there.

Total hack, but if your application or users can retry when the rare error
is encountered then one idea is to rename the sequence momentarily while
you do the setval() then rename it back. Do an initial check without
renaming, then re-check after renaming and before the setval() call.

If you put retry logic into your application then make sure to include
back-off logic so you don’t get an outage induced by thundering herd.

This is increasingly looking like a set of attempts to intentionally abuse
what sequences were designed for.

The use-case where you need a lock on the value so that there can't
possibly be a hole in the sequence points at the notion of having some
other kind of a function that takes out a lock on a table, and serially
gives out "MAX+1" as the next value.

That isn't a very difficult function to write; the problem with it is that
that sort of function will forcibly serialize all inserts through the
function+table lock that is giving out "MAX+1" values. That's going to be
WAY slower than using a sequence object, and about 98% of the time, people
will prefer the sequence object, particularly because it's about 98% faster.

I'm not quite sure if anyone has put out there a standard-ish idiom for
this; that seems like a not TOO difficult "exercise for the user."

There will definitely be more failure cases, and *wildly* more fighting, in
a concurrent environment, over tuple locks.

- An obvious failure is that if one connection asks for the new MAX+1, gets
it, and then the transaction fails, for some later, out-of-relevant-scope,
reason, you'll still potentially get some "holes" in the series of values.

- If there are 10 connections trying to get MAX+1 concurrently, only one
can get it at a time, and that connection can't relinquish the lock until
its transaction has completed, and the 9 must wait, regardless of how much
work the "winner" still has to do.

These are amongst the reasons why people conclude they *don't* want that
kind of functionality.

It makes me think that the problem needs to be taken back to that initial
point of "I think I need some somewhat coordinated sequences", and poke at
what the *real* requirement is there, and why someone thinks that the
values should be "somewhat coordinated." Something seems off there.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#5Tim Cross
theophilusx@gmail.com
In reply to: Chris Browne (#4)
Re: Efficiently advancing a sequence without risking it going backwards.

Christopher Browne <cbbrowne@gmail.com> writes:

On Thu, 9 Jul 2020 at 12:59, Jeremy Schneider <schneider@ardentperf.com>
wrote:

On Jul 6, 2020, at 19:06, Paul McGarry <paul@paulmcgarry.com> wrote:

I don't think I can use setval(), because it risks making sequences go

backwards, eg:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) setval('DB2sequence',1234);

but if between (1) and (2) there are 2 nextval(DB2sequence) calls on

another process, (2) would take the sequence back from 1235 to 1234 and I
would end up trying to create a duplicate key ID from the sequence.

An ability to “lock” the sequence momentarily would give you the tool you
need, but I don’t think it’s there.

Total hack, but if your application or users can retry when the rare error
is encountered then one idea is to rename the sequence momentarily while
you do the setval() then rename it back. Do an initial check without
renaming, then re-check after renaming and before the setval() call.

If you put retry logic into your application then make sure to include
back-off logic so you don’t get an outage induced by thundering herd.

This is increasingly looking like a set of attempts to intentionally abuse
what sequences were designed for.

The use-case where you need a lock on the value so that there can't
possibly be a hole in the sequence points at the notion of having some
other kind of a function that takes out a lock on a table, and serially
gives out "MAX+1" as the next value.

That isn't a very difficult function to write; the problem with it is that
that sort of function will forcibly serialize all inserts through the
function+table lock that is giving out "MAX+1" values. That's going to be
WAY slower than using a sequence object, and about 98% of the time, people
will prefer the sequence object, particularly because it's about 98% faster.

I'm not quite sure if anyone has put out there a standard-ish idiom for
this; that seems like a not TOO difficult "exercise for the user."

There will definitely be more failure cases, and *wildly* more fighting, in
a concurrent environment, over tuple locks.

- An obvious failure is that if one connection asks for the new MAX+1, gets
it, and then the transaction fails, for some later, out-of-relevant-scope,
reason, you'll still potentially get some "holes" in the series of values.

- If there are 10 connections trying to get MAX+1 concurrently, only one
can get it at a time, and that connection can't relinquish the lock until
its transaction has completed, and the 9 must wait, regardless of how much
work the "winner" still has to do.

These are amongst the reasons why people conclude they *don't* want that
kind of functionality.

It makes me think that the problem needs to be taken back to that initial
point of "I think I need some somewhat coordinated sequences", and poke at
what the *real* requirement is there, and why someone thinks that the
values should be "somewhat coordinated." Something seems off there.

I agree and was going to write something similar. All the 'solutions'
are problematic in one way or the other and seem to be due to a
misconception about the role for sequences or some requirement which
needs to be re-examined.
--
Tim Cross

#6Jeremy Schneider
schneider@ardentperf.com
In reply to: Chris Browne (#4)
Re: Efficiently advancing a sequence without risking it going backwards.

On Jul 9, 2020, at 14:08, Christopher Browne <cbbrowne@gmail.com> wrote:



On Thu, 9 Jul 2020 at 12:59, Jeremy Schneider <schneider@ardentperf.com> wrote:

On Jul 6, 2020, at 19:06, Paul McGarry <paul@paulmcgarry.com> wrote:

I don't think I can use setval(), because it risks making sequences go backwards, eg:

1) Check values
DB1sequence: 1234
DB2sequence: 1233 (1 behind)
2) setval('DB2sequence',1234);

but if between (1) and (2) there are 2 nextval(DB2sequence) calls on another process, (2) would take the sequence back from 1235 to 1234 and I would end up trying to create a duplicate key ID from the sequence.

An ability to “lock” the sequence momentarily would give you the tool you need, but I don’t think it’s there.

The use-case where you need a lock on the value so that there can't possibly be a hole in the sequence

OP asked for a way to call setval() with a guarantee the sequence will never go backwards IIUC. His code can check that the new value he wants to set is higher than the current value, but there’s a race condition where a second connection could quickly advance the sequence between the check and the setval() call and then cause duplicates from the next call which is bad.

The ideal solution is a setval_forward_only() or setval_no_duplicates() function that does it atomically or something. If it were possible to “lock” the entire sequence to prevent any other sessions from using it at all, that would work too. Not locking a value, locking the whole thing. Very bad hack solution is renaming the sequence then renaming it back as a blunt form of locking... and to be clear I don’t think is a good idea I just was saying that technically it might work. :)

-Jeremy

Sent from my TI-83

#7Paul McGarry
paul@paulmcgarry.com
In reply to: Jeremy Schneider (#6)
Re: Efficiently advancing a sequence without risking it going backwards.

On Fri, 10 Jul 2020 at 10:27, Jeremy Schneider <schneider@ardentperf.com>
wrote:

OP asked for a way to call setval() with a guarantee the sequence will
never go backwards IIUC. His code can check that the new value he wants to
set is higher than the current value, but there’s a race condition where a
second connection could quickly advance the sequence between the check and
the setval() call and then cause duplicates from the next call which is bad.

The ideal solution is a setval_forward_only() or setval_no_duplicates()
function that does it atomically or something. If it were possible to
“lock” the entire sequence to prevent any other sessions from using it at
all, that would work too. Not locking a value, locking the whole thing.
Very bad hack solution is renaming the sequence then renaming it back as a
blunt form of locking... and to be clear I don’t think is a good idea I
just was saying that technically it might work. :)

-Jeremy

Yes, that first paragraph is a good summary. A "setval_forward_only()" is
the sort of thing I was after.

Maybe something analogous to:
UPDATE the_seq SET last_value = number WHERE last_value < number;
with some sort of global (but short) lock as required.

Relating to some of the other replies there isn't a "requirement" (from an
application perspective) that the sequences always generate ids in
ascending order or that they don't skip numbers etc. To the application
they are just ids, as long as they are unique that is enough. However it is
an application that is used by people, so there is some external value in
having the ids going up in a way that roughly correlates to time as people
tend to expect numbers to do that sort of thing.

For a bit more background, we have our own application and homegrown
loosely coupled multi-primary DB cluster and replication system.
Each backend DB in the cluster has its own node id (0-9) and when our app
asks for a sequence value it calls a custom function which gets a normal
sequence value suffixed with the DB node ID.

So if there were two backend dbs (1 and 2) and both backend dbs had a
sequence with last_value of 1234 then our application would get a
"sequence" value of 12351 or 12352 depending on which db backend served the
request.
The resulting ids are unique across our cluster, but certainly not gapless
nor issued in strict ascending order which is fine from an application
perspective.

But as mentioned, from a human perspective there is some value in keeping
the ids issued by the cluster roughly in time order, so we have a secondary
process which liaises with all the backend nodes and pulls forwards any
sequences that fall behind other nodes. So if DB 1 happened to serve 1000
requests using the sequence while DB2 served none, the process pulls the
sequence in DB2 forward until it catches up, currently by calling nextval
in a loop.

Which all works fine. However sometimes (eg taking a node offline for
maintenance or upgrade) a sequence might get quite a long way out, and
calling nextval() 100k times seems a rather inefficient way to catch up
(but it is better to be inefficient than risk going backwards and causing a
duplicate id).

We have been using essentially this system for our cluster since Postgres 7
days, periodically we have touched base with Postgres replication
advancements (which have come a long way) but haven't yet found a
compelling reason to switch from what is working.

Paul