Logical locking beyond pg_advisory

Started by marceloover 7 years ago8 messagesgeneral
Jump to latest
#1marcelo
marcelo.nicolet@gmail.com

I need a mechanism of "logical locking" more ductile than the
pg_advisory family.
I'm thinking of a table ("lock_table") that would be part of the
database, with columns
* tablename varchar - name of the table "locked"
* rowid integer, - id of the row "locked"
* ownerid varchar, - identifier of the "user" who acquired the lock
* acquired timestamp - to be able to release "abandoned" locks after a
certain time

and a group of functions
1) lock_table (tablename varchar, ownerid varchar) bool - get to lock
over the entire table, setting rowid to zero
2) unlock_table (tablename varchar, ownerid varchar) bool - unlock the
table, if the owner is the recorded one
3) locked_table (tablename varchar, ownerid varchar) bool - ask if the
table is locked by some user other than the ownerid argument
4) lock_row (tablename varchar, rowid integer, ownerid varchar) bool -
similar to pg_try_advisory_lock
5) unlock_row (tablename varchar, rowid integer, ownerid varchar) bool -
similar to pg_advisory_unlock
6) unlock_all (ownerid varchar) bool - unlock all locks owned by ownerid

The timeout (default, maybe 15 minutes) is implicitly applied if the
lock is taken by another user (there will be no notification).
Redundant locks are not queued, they simply return true, may be after an
update of the acquired column.
Successful locks insert a new row, except the rare case of a timeout,
which becomes an update (ownerid and acquired)
Unlock operations deletes the corresponding row

My question is double
a) What is the opinion on the project?
b) What are the consequences of the large number of inserts and deletions
c) Performance. In fact, pg_advisory* implies a network roundtrip, but
(I think) no table operations.

TIA

---
El software de antivirus Avast ha analizado este correo electrónico en busca de virus.
https://www.avast.com/antivirus

#2Fabrízio de Royes Mello
fabriziomello@gmail.com
In reply to: marcelo (#1)
Re: Logical locking beyond pg_advisory

Em dom, 16 de set de 2018 às 17:53, marcelo <marcelo.nicolet@gmail.com>
escreveu:

I need a mechanism of "logical locking" more ductile than the pg_advisory

family.

I'm thinking of a table ("lock_table") that would be part of the

database, with columns

* tablename varchar - name of the table "locked"
* rowid integer, - id of the row "locked"
* ownerid varchar, - identifier of the "user" who acquired the lock
* acquired timestamp - to be able to release "abandoned" locks after a

certain time

and a group of functions
1) lock_table (tablename varchar, ownerid varchar) bool - get to lock

over the entire table, setting rowid to zero

2) unlock_table (tablename varchar, ownerid varchar) bool - unlock the

table, if the owner is the recorded one

3) locked_table (tablename varchar, ownerid varchar) bool - ask if the

table is locked by some user other than the ownerid argument

4) lock_row (tablename varchar, rowid integer, ownerid varchar) bool -

similar to pg_try_advisory_lock

5) unlock_row (tablename varchar, rowid integer, ownerid varchar) bool -

similar to pg_advisory_unlock

6) unlock_all (ownerid varchar) bool - unlock all locks owned by ownerid

The timeout (default, maybe 15 minutes) is implicitly applied if the lock

is taken by another user (there will be no notification).

Redundant locks are not queued, they simply return true, may be after an

update of the acquired column.

Successful locks insert a new row, except the rare case of a timeout,

which becomes an update (ownerid and acquired)

Unlock operations deletes the corresponding row

My question is double
a) What is the opinion on the project?

Would be nice if you explain more about what kind of problem you want to
solve.

b) What are the consequences of the large number of inserts and deletions

The first thing came to my mind with this approach is table bloat.

c) Performance. In fact, pg_advisory* implies a network roundtrip, but (I

think) no table operations.

Yeap... no table operations.

Regards,

--
Fabrízio de Royes Mello Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

#3marcelo
marcelo.nicolet@gmail.com
In reply to: Fabrízio de Royes Mello (#2)
Re: Logical locking beyond pg_advisory

/Would be nice if you explain more about what kind of problem you want
to solve./

There are two main questions "to solve"
a) generally speaking, I like to get some form of "exclusive access" to
the row before updating or deleting. None of the optimistic / pesimistic
automatic variants of concurrency management glad me. Nor, of course,
the "versioning" variants.
b) some of the tables I´m working on have a "number" column (_not the
PK_) which sometimes come from material, external sources, but sometimes
must be assigned by the system. This could be solved two main ways
b.1) Use a trigger to get the famous "max(n) + 1". At least in one of
the cases, the number automatically generated must fall into different
ranges conditioned by the value of another column, and the ranges vary
between versions of the database.
b.2) "Lock the entire table", get programmatically the next number for
the correct range, assign it and free the table lock.

Of course, and beforehand, all database manipulations are done thru
applications.
Till yesterday, I was working with the "advisory" family. Having a
bigint as the only "lock identifier" I was working with a hash of the
table name XORed with the id of the row or zero for the entire table.
(All my tables have an autosequential integer id as PK).
Even if I found a very robust hash algorithm for the table name, I
cannot discard some collision once the id was xored. I tested five or
six table names, along 20000 ids every one, without collision. But...
Of course, I need the "full table lock" for inserts. So, it´s a very
separated concern with updates and deletions. But...

TIA

On 17/09/2018 03:19 , Fabrízio de Royes Mello wrote:

Em dom, 16 de set de 2018 às 17:53, marcelo <marcelo.nicolet@gmail.com
<mailto:marcelo.nicolet@gmail.com>> escreveu:

I need a mechanism of "logical locking" more ductile than the

pg_advisory family.

I'm thinking of a table ("lock_table") that would be part of the

database, with columns

* tablename varchar - name of the table "locked"
* rowid integer, - id of the row "locked"
* ownerid varchar, - identifier of the "user" who acquired the lock
* acquired timestamp - to be able to release "abandoned" locks after

a certain time

and a group of functions
1) lock_table (tablename varchar, ownerid varchar) bool - get to

lock over the entire table, setting rowid to zero

2) unlock_table (tablename varchar, ownerid varchar) bool - unlock

the table, if the owner is the recorded one

3) locked_table (tablename varchar, ownerid varchar) bool - ask if

the table is locked by some user other than the ownerid argument

4) lock_row (tablename varchar, rowid integer, ownerid varchar) bool

- similar to pg_try_advisory_lock

5) unlock_row (tablename varchar, rowid integer, ownerid varchar)

bool - similar to pg_advisory_unlock

6) unlock_all (ownerid varchar) bool - unlock all locks owned by ownerid

The timeout (default, maybe 15 minutes) is implicitly applied if the

lock is taken by another user (there will be no notification).

Redundant locks are not queued, they simply return true, may be

after an update of the acquired column.

Successful locks insert a new row, except the rare case of a

timeout, which becomes an update (ownerid and acquired)

Unlock operations deletes the corresponding row

My question is double
a) What is the opinion on the project?

Would be nice if you explain more about what kind of problem you want
to solve.

b) What are the consequences of the large number of inserts and

deletions

The first thing came to my mind with this approach is table bloat.

c) Performance. In fact, pg_advisory* implies a network roundtrip,

but (I think) no table operations.

Yeap... no table operations.

Regards,

--
   Fabrízio de Royes Mello         Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

---
El software de antivirus Avast ha analizado este correo electrónico en busca de virus.
https://www.avast.com/antivirus

#4Merlin Moncure
mmoncure@gmail.com
In reply to: marcelo (#1)
Re: Logical locking beyond pg_advisory

On Sun, Sep 16, 2018 at 3:53 PM marcelo <marcelo.nicolet@gmail.com> wrote:

I need a mechanism of "logical locking" more ductile than the pg_advisory family.
I'm thinking of a table ("lock_table") that would be part of the database, with columns
* tablename varchar - name of the table "locked"
* rowid integer, - id of the row "locked"
* ownerid varchar, - identifier of the "user" who acquired the lock
* acquired timestamp - to be able to release "abandoned" locks after a certain time

and a group of functions
1) lock_table (tablename varchar, ownerid varchar) bool - get to lock over the entire table, setting rowid to zero
2) unlock_table (tablename varchar, ownerid varchar) bool - unlock the table, if the owner is the recorded one
3) locked_table (tablename varchar, ownerid varchar) bool - ask if the table is locked by some user other than the ownerid argument
4) lock_row (tablename varchar, rowid integer, ownerid varchar) bool - similar to pg_try_advisory_lock
5) unlock_row (tablename varchar, rowid integer, ownerid varchar) bool - similar to pg_advisory_unlock
6) unlock_all (ownerid varchar) bool - unlock all locks owned by ownerid

The timeout (default, maybe 15 minutes) is implicitly applied if the lock is taken by another user (there will be no notification).
Redundant locks are not queued, they simply return true, may be after an update of the acquired column.
Successful locks insert a new row, except the rare case of a timeout, which becomes an update (ownerid and acquired)
Unlock operations deletes the corresponding row

My question is double
a) What is the opinion on the project?
b) What are the consequences of the large number of inserts and deletions
c) Performance. In fact, pg_advisory* implies a network roundtrip, but (I think) no table operations.

Why can't you use the advisory lock functions? The challenge with
manually managed locks are they they are slow and you will lose the
coordination the database provides you. For example, if your
application crashes you will have to clean up all held locks yourself.
Building out that infrastructure will be difficult.

merlin

#5Chris Travers
chris.travers@gmail.com
In reply to: Merlin Moncure (#4)
Re: Logical locking beyond pg_advisory

On Mon, Sep 17, 2018 at 5:09 PM Merlin Moncure <mmoncure@gmail.com> wrote:

On Sun, Sep 16, 2018 at 3:53 PM marcelo <marcelo.nicolet@gmail.com> wrote:

I need a mechanism of "logical locking" more ductile than the

pg_advisory family.

I'm thinking of a table ("lock_table") that would be part of the

database, with columns

* tablename varchar - name of the table "locked"
* rowid integer, - id of the row "locked"
* ownerid varchar, - identifier of the "user" who acquired the lock
* acquired timestamp - to be able to release "abandoned" locks after a

certain time

and a group of functions
1) lock_table (tablename varchar, ownerid varchar) bool - get to lock

over the entire table, setting rowid to zero

2) unlock_table (tablename varchar, ownerid varchar) bool - unlock the

table, if the owner is the recorded one

3) locked_table (tablename varchar, ownerid varchar) bool - ask if the

table is locked by some user other than the ownerid argument

4) lock_row (tablename varchar, rowid integer, ownerid varchar) bool -

similar to pg_try_advisory_lock

5) unlock_row (tablename varchar, rowid integer, ownerid varchar) bool -

similar to pg_advisory_unlock

6) unlock_all (ownerid varchar) bool - unlock all locks owned by ownerid

The timeout (default, maybe 15 minutes) is implicitly applied if the

lock is taken by another user (there will be no notification).

Redundant locks are not queued, they simply return true, may be after an

update of the acquired column.

Successful locks insert a new row, except the rare case of a timeout,

which becomes an update (ownerid and acquired)

Unlock operations deletes the corresponding row

My question is double
a) What is the opinion on the project?
b) What are the consequences of the large number of inserts and deletions
c) Performance. In fact, pg_advisory* implies a network roundtrip, but

(I think) no table operations.

Why can't you use the advisory lock functions? The challenge with
manually managed locks are they they are slow and you will lose the
coordination the database provides you. For example, if your
application crashes you will have to clean up all held locks yourself.
Building out that infrastructure will be difficult.

First, I think in an ideal world, you wouldn't handle this problem with
either approach but sometimes you have to.

I have done both approaches actually. LedgerSMB uses its own lock table
because locks have to persist across multiple HTTP requests and we have
various automatic cleanup processes.

When I was working on the queue management stuff at Novozymes we used
advisory locks extensively.

These two approaches have serious downsides:
1. Lock tables are *slow* and require careful thinking through cleanup
scenarios. In LedgerSMB we tied to the application session with an ON
DELETE event that would unlock the row. We estimated that for every 2
seconds that the db spent doing useful work, it spent 42 seconds managing
the locks..... Additionally the fact that locks take effect on snapshot
advance is a problem here.

2. In my talk, "PostgreSQL at 10TB and Beyond" I talk about a problem we
had using advisory locks for managing rows that were being processed for
deletion. Since the deletion was the scan for items at the head of an
index, under heavy load we could spend long enough checking dead rows that
the locks could go away with our snapshot failing to advance. This would
result in duplicate processing. So the fact that advisory locks don't
really follow snapshot semantics is a really big problem here since it
means you can have race conditions in advisory locks that can't happen with
other locking issues. I still love advisory locks but they are not a good
tool for this.

The real solution most of the time is actually to lock the rows by
selecting FOR UPDATE and possibly SKIP LOCKED. The way update/delete row
locking in PostgreSQL works is usually good enough except in a few rare
edge cases. Only in *very rare* cases do lock tables or advisory locks
make sense for actual row processing.

merlin

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more

#6marcelo
marcelo.nicolet@gmail.com
In reply to: Chris Travers (#5)
Re: Logical locking beyond pg_advisory

On 17/09/2018 12:21 , Chris Travers wrote:

On Mon, Sep 17, 2018 at 5:09 PM Merlin Moncure <mmoncure@gmail.com
<mailto:mmoncure@gmail.com>> wrote:

On Sun, Sep 16, 2018 at 3:53 PM marcelo <marcelo.nicolet@gmail.com
<mailto:marcelo.nicolet@gmail.com>> wrote:

I need a mechanism of "logical locking" more ductile than the

pg_advisory family.

I'm thinking of a table ("lock_table") that would be part of the

database, with columns

* tablename varchar - name of the table "locked"
* rowid integer, - id of the row "locked"
* ownerid varchar, - identifier of the "user" who acquired the lock
* acquired timestamp - to be able to release "abandoned" locks

after a certain time

and a group of functions
1) lock_table (tablename varchar, ownerid varchar) bool - get to

lock over the entire table, setting rowid to zero

2) unlock_table (tablename varchar, ownerid varchar) bool -

unlock the table, if the owner is the recorded one

3) locked_table (tablename varchar, ownerid varchar) bool - ask

if the table is locked by some user other than the ownerid argument

4) lock_row (tablename varchar, rowid integer, ownerid varchar)

bool - similar to pg_try_advisory_lock

5) unlock_row (tablename varchar, rowid integer, ownerid

varchar) bool - similar to pg_advisory_unlock

6) unlock_all (ownerid varchar) bool - unlock all locks owned by

ownerid

The timeout (default, maybe 15 minutes) is implicitly applied if

the lock is taken by another user (there will be no notification).

Redundant locks are not queued, they simply return true, may be

after an update of the acquired column.

Successful locks insert a new row, except the rare case of a

timeout, which becomes an update (ownerid and acquired)

Unlock operations deletes the corresponding row

My question is double
a) What is the opinion on the project?
b) What are the consequences of the large number of inserts and

deletions

c) Performance. In fact, pg_advisory* implies a network

roundtrip, but (I think) no table operations.

Why can't you use the advisory lock functions?  The challenge with
manually managed locks are they they are slow and you will lose the
coordination the database provides you.  For example, if your
application crashes you will have to clean up all held locks yourself.
Building out that infrastructure will be difficult.

First, I think in an ideal world, you wouldn't handle this problem
with either approach but sometimes you have to.

I have done both approaches actually.  LedgerSMB uses its own lock
table because locks have to persist across multiple HTTP requests and
we have various automatic cleanup processes.

When I was working on  the queue management stuff at Novozymes we used
advisory locks extensively.

These two approaches have serious downsides:
1.  Lock tables are *slow* and require careful thinking through
cleanup scenarios.  In LedgerSMB we tied to the application session
with an ON DELETE event that would unlock the row.  We estimated that
for every 2 seconds that the db spent doing useful work, it spent 42
seconds managing the locks.....  Additionally the fact that locks take
effect on snapshot advance is a problem here.

2.  In my talk, "PostgreSQL at 10TB and  Beyond" I talk about a
problem we had using advisory locks for managing rows that were being
processed for deletion.  Since the deletion was the scan for items at
the head of an index, under heavy load we could spend long enough
checking dead rows that the locks could go away with our snapshot
failing to advance.  This would result in duplicate processing.  So
the fact that advisory locks don't really follow snapshot semantics is
a really big problem here since it means you can have race conditions
in advisory locks that can't happen with other locking issues.  I
still love advisory locks but they are not a good tool for this.

The real solution most of the time is actually to lock the rows by
selecting FOR UPDATE and possibly SKIP LOCKED. The way update/delete
row locking in PostgreSQL works is usually good enough except in a few
rare edge cases.  Only in *very rare* cases do lock tables or advisory
locks make sense for actual row processing.

merlin

--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more

I´m using an ORM (Devart´s) to access the database, so, I cannot "select
... FOR UPDATE". The application paradigm is that a user have a list of
records (after a query) and she could update or delete any of them as
the business rules allows it. So, at least an advisory lock is a must.
I´m convinced by now: I would stay with advisory locks... expecting no
app crash could occur...
Thank you all.
Marcelo

---
El software de antivirus Avast ha analizado este correo electrónico en busca de virus.
https://www.avast.com/antivirus

#7Chris Travers
chris.travers@gmail.com
In reply to: marcelo (#6)
Re: Logical locking beyond pg_advisory

On Mon, Sep 17, 2018 at 6:04 PM marcelo <marcelo.nicolet@gmail.com> wrote:

I´m using an ORM (Devart´s) to access the database, so, I cannot "select
... FOR UPDATE". The application paradigm is that a user have a list of
records (after a query) and she could update or delete any of them as the
business rules allows it. So, at least an advisory lock is a must.
I´m convinced by now: I would stay with advisory locks... expecting no app
crash could occur...

I would say to fix this in the ORM rather than reinvent what the database
already gives you in the database.

Thank you all.
Marcelo

<https://www.avast.com/sig-email?utm_medium=email&amp;utm_source=link&amp;utm_campaign=sig-email&amp;utm_content=emailclient&gt; Libre
de virus. www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&amp;utm_source=link&amp;utm_campaign=sig-email&amp;utm_content=emailclient&gt;
<#m_-9091154853724945458_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor
lock-in.
http://www.efficito.com/learn_more

#8marcelo
marcelo.nicolet@gmail.com
In reply to: Chris Travers (#7)
Re: Logical locking beyond pg_advisory

On 17/09/2018 14:27 , Chris Travers wrote:

On Mon, Sep 17, 2018 at 6:04 PM marcelo <marcelo.nicolet@gmail.com
<mailto:marcelo.nicolet@gmail.com>> wrote:

I´m using an ORM (Devart´s) to access the database, so, I cannot
"select ... FOR UPDATE". The application paradigm is that a user
have a list of records (after a query) and she could update or
delete any of them as the business rules allows it. So, at least
an advisory lock is a must.
I´m convinced by now: I would stay with advisory locks...
expecting no app crash could occur...

I would say to fix this in the ORM rather than reinvent what the
database already gives you in the database.

You are right. But you know...

Thank you all.
Marcelo

<https://www.avast.com/sig-email?utm_medium=email&amp;utm_source=link&amp;utm_campaign=sig-email&amp;utm_content=emailclient&gt;
Libre de virus. www.avast.com
<https://www.avast.com/sig-email?utm_medium=email&amp;utm_source=link&amp;utm_campaign=sig-email&amp;utm_content=emailclient&gt;

<#m_-9091154853724945458_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>

--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor
lock-in.
http://www.efficito.com/learn_more

---
El software de antivirus Avast ha analizado este correo electrónico en busca de virus.
https://www.avast.com/antivirus