lock timeout patch
Hello all,
I've created a lock timeout patch and it's attached.
When a transaction is blocked by another transaction because it is
waiting for a lock, a lock timeout is needed in some cases.
With this patch, the lock timeout is enabled by a
'lock_timeout = xxxx' directive in postgresql.conf.
If a timeout occurs, an error code (40P02)
is returned, and a client application can detect it using:
JDBC: SQLException.getSQLState()
C: PQresultErrorField()
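For instance, a JDBC client could watch for the proposed code like this (a sketch only: the 40P02 SQLSTATE exists only in this patch, and in real code the string would come from SQLException.getSQLState() after a statement fails against a patched backend):

```java
// Sketch: classifying the lock-timeout SQLSTATE proposed by the patch.
// In a real client this string would come from SQLException.getSQLState();
// here we only show the classification logic.
public class LockTimeoutState {
    static final String LOCK_TIMEOUT = "40P02"; // code proposed by this patch

    static boolean isLockTimeout(String sqlState) {
        return LOCK_TIMEOUT.equals(sqlState);
    }

    public static void main(String[] args) {
        System.out.println(isLockTimeout("40P02")); // true: retry or tell the user
        System.out.println(isLockTimeout("40001")); // false: serialization failure
    }
}
```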
I know my code needs to be cleaned up,
but does anyone have comments on this patch?
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Attachments:
locktimeout.diff (text/plain), +308 -85
Satoshi Nagayasu <nagayasus@nttdata.co.jp> writes:
When a transaction is blocked by another transaction because it is
waiting for a lock, a lock timeout is needed in some cases.
Isn't there an existing solution for this problem?
regards, tom lane
Tom,
I guess transaction cancellation from the client
using PQrequestCancel() is available, but the cancellation
logic must be implemented in the client application using
signals or threads.
I think detecting this situation on the server side is not
possible now, and SQL Server and DB2 have the same feature.
Tom Lane wrote:
Satoshi Nagayasu <nagayasus@nttdata.co.jp> writes:
When a transaction is blocked by another transaction because it is
waiting for a lock, a lock timeout is needed in some cases.
Isn't there an existing solution for this problem?
regards, tom lane
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Satoshi Nagayasu <nagayasus@nttdata.co.jp> writes:
I guess transaction cancellation from the client
using PQrequestCancel() is available, but the cancellation
logic must be implemented in the client application using
signals or threads.
Actually I think the recommended solution involves using statement_timeout.
regards, tom lane
statement_timeout terminates a large sort or scan
even while it is running, doesn't it?
statement_timeout doesn't care whether
the process is waiting for a lock or running.
I don't want to terminate a running query.
So only a backend waiting for a lock should be killed.
Tom Lane wrote:
Satoshi Nagayasu <nagayasus@nttdata.co.jp> writes:
I guess transaction cancellation from the client
using PQrequestCancel() is available, but the cancellation
logic must be implemented in the client application using
signals or threads.
Actually I think the recommended solution involves using statement_timeout.
regards, tom lane
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Satoshi Nagayasu <nagayasus@nttdata.co.jp> writes:
statement_timeout terminates a large sort or scan
even while it is running, doesn't it?
statement_timeout doesn't care whether
the process is waiting for a lock or running.
I don't want to terminate a running query.
So only a backend waiting for a lock should be killed.
This argument holds no water. On what will you base your estimate of
a good value for lock_timeout? It is nothing more than your estimate
of the statement runtime for some other backend that is currently
holding the lock you want ... an estimate which surely has less, not
more, reliability than the estimate you could make of the maximum
runtime of your own statement, because you have less information about
just what that other backend is doing. (And both you and the other
backend are in turn dependent on waiting for locks held by third
parties, etc etc.)
I'd accept a mechanism to enforce a timeout at the lock level if you
could show me a convincing use-case for lock timeouts instead of
statement timeouts, but I don't believe there is one. I think this
proposal is a solution in search of a problem.
regards, tom lane
Tom Lane wrote:
I'd accept a mechanism to enforce a timeout at the lock level if you
could show me a convincing use-case for lock timeouts instead of
statement timeouts, but I don't believe there is one. I think this
proposal is a solution in search of a problem.
I think statement_timeout and lock_timeout are different.
If I set statement_timeout to 1000 to detect a lock timeout,
I can't run a query which takes over 1 sec.
If a lock wait occurs, I want to detect it immediately,
but I still want to run a long-running query.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
On Mon, 28 Jun 2004, Satoshi Nagayasu wrote:
If I set statement_timeout to 1000 to detect a lock timeout,
I can't run a query which takes over 1 sec.
If a lock wait occurs, I want to detect it immediately,
but I still want to run a long-running query.
Why is it important what it is that makes your query not return as fast as
you expect? Maybe it's locking, maybe the computer is swapping, maybe it's
just a lack of IO to the disk that holds the table, maybe it does a big
sort and has too little sort_mem to do it fast, ...
What makes locking special?
--
/Dennis Björklund
On Mon, 2004-06-28 at 02:16, Satoshi Nagayasu wrote:
Tom Lane wrote:
I'd accept a mechanism to enforce a timeout at the lock level if you
could show me a convincing use-case for lock timeouts instead of
statement timeouts, but I don't believe there is one. I think this
proposal is a solution in search of a problem.
I think statement_timeout and lock_timeout are different.
If I set statement_timeout to 1000 to detect a lock timeout,
I can't run a query which takes over 1 sec.
If a lock wait occurs, I want to detect it immediately,
but I still want to run a long-running query.
How is your problem not solved by NOWAIT?
http://developer.postgresql.org/docs/postgres/sql-lock.html
Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Robert Treat wrote:
I think statement_timeout and lock_timeout are different.
If I set statement_timeout to 1000 to detect a lock timeout,
I can't run a query which takes over 1 sec.
If a lock wait occurs, I want to detect it immediately,
but I still want to run a long-running query.
How is your problem not solved by NOWAIT?
http://developer.postgresql.org/docs/postgres/sql-lock.html
I agree that it's one of the solutions when we use LOCK explicitly.
But LOCK only locks a whole table, doesn't it?
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
Dennis Bjorklund wrote:
If I set statement_timeout to 1000 to detect a lock timeout,
I can't run a query which takes over 1 sec.
If a lock wait occurs, I want to detect it immediately,
but I still want to run a long-running query.
Why is it important what it is that makes your query not return as fast as
you expect? Maybe it's locking, maybe the computer is swapping, maybe it's
just a lack of IO to the disk that holds the table, maybe it does a big
sort and has too little sort_mem to do it fast, ...
What makes locking special?
Processing slow-down is just a hardware/software sizing issue.
Of course we need to fix it when a problem occurs,
but I think it's a different kind of problem.
In large databases, such as DSS (decision support systems),
some queries take one or more minutes. I think that's okay.
But I don't want to wait one or more minutes just for a lock.
I need to return a message to the user like "retry later" or
something similar. It depends on the application.
So I think detecting a lock wait is important.
--
NAGAYASU Satoshi <nagayasus@nttdata.co.jp>
On Tue, Jun 29, 2004 at 09:25:27 +0900,
Satoshi Nagayasu <nagayasus@nttdata.co.jp> wrote:
But I don't want to wait one or more minutes just for a lock.
I need to return a message to the user "retry later." or
something like that. It depends on various applications.
Why not set statement timeout low when you are about to run a query
that you think should return quickly?
Tom,
I'd accept a mechanism to enforce a timeout at the lock level if you
could show me a convincing use-case for lock timeouts instead of
statement timeouts, but I don't believe there is one. I think this
proposal is a solution in search of a problem.
Hmmm ... didn't we argue this out with NOWAIT? What did we conclude then?
I'm reluctant to go over old ground repeatedly.
Let me say for myself that I would use this feature if it existed, but would
not miss it a whole lot if the patch was rejected. Here's the idea:
I have an OLAP database of regional office evaluations (in SQL Server, sadly)
which requires that the evaluations, sometimes interlocking, of regions be
"closed" simultaneously (in one transaction). This means that during the
closure process, certain kinds of data entry needs to be frozen out. I am
using SQL Server's lock timeout functionality for this; basically, the data
entry waits for 30 seconds, and then tells the user to try again in 10
minutes.
I could do the same thing in PostgreSQL using NOWAIT and a loop on the client
side. But the lock timeout is somewhat easier.
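The client-side loop Josh mentions might look roughly like this (a sketch under assumptions: `tryLock` is a hypothetical stand-in for issuing `LOCK TABLE ... NOWAIT` over JDBC and catching the lock-not-available error; names, attempt counts, and messages are illustrative):

```java
import java.util.function.BooleanSupplier;

// Sketch of the client-side NOWAIT loop: poll for the lock a bounded
// number of times, then give up and tell the user to retry later.
// tryLock is a hypothetical stand-in for issuing "LOCK TABLE ... NOWAIT"
// and catching the lock-not-available error.
public class NowaitLoop {
    static String acquireOrGiveUp(BooleanSupplier tryLock, int attempts) {
        for (int i = 0; i < attempts; i++) {
            if (tryLock.getAsBoolean()) {
                return "locked"; // proceed with the transaction
            }
            // a real client would sleep briefly here (Josh waits ~30 s total)
        }
        return "busy: try again in 10 minutes"; // Josh's user-facing outcome
    }

    public static void main(String[] args) {
        System.out.println(acquireOrGiveUp(() -> false, 3)); // lock never free
        System.out.println(acquireOrGiveUp(() -> true, 3));  // lock immediately free
    }
}
```

The point of the comparison is only that a server-side lock_timeout would replace this polling loop with a single blocking attempt.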
--
Josh Berkus
Aglio Database Solutions
San Francisco
Tom,
I'd accept a mechanism to enforce a timeout at the lock level if you
could show me a convincing use-case for lock timeouts instead of
statement timeouts, but I don't believe there is one. I think this
proposal is a solution in search of a problem.
Hmmm ... didn't we argue this out with NOWAIT? What did we conclude then?
I'm reluctant to go over old ground repeatedly.
The result of this debate was that there was some use for it. NOWAIT is
now implemented for table locking but not for row locking.
Personally I think there is some use for forcing transactions to abort
as soon as a lock situation is detected (although I probably wouldn't
use it). For row-level locking I would suggest that the original poster
compare xmin/xmax (check the docs) to pre-check the row-level lock
condition. This is inelegant, but it mostly works.
FWIW, I think the treatment of locking in the docs could use some
improvement. Especially wrt MVCC and pessimistic locking and the 'big
picture' issues going on there (especially why the former is better than
the latter).
Merlin
On Tue, 2004-06-29 at 18:36, Josh Berkus wrote:
Tom,
I'd accept a mechanism to enforce a timeout at the lock level if you
could show me a convincing use-case for lock timeouts instead of
statement timeouts, but I don't believe there is one. I think this
proposal is a solution in search of a problem.
Hmmm ... didn't we argue this out with NOWAIT? What did we conclude then?
I'm reluctant to go over old ground repeatedly.
Let me say for myself that I would use this feature if it existed, but would
not miss it a whole lot if the patch was rejected. Here's the idea:
Can't vouch for the patch, but I can say this would get used...
I have an ... database ...
which requires that the evaluations, sometimes interlocking, of regions be
"closed" simultaneously (in one transaction). This means that during the
closure process, certain kinds of data entry needs to be frozen out. I am
using ... lock timeout functionality for this; basically, the data
entry waits for 30 seconds, and then tells the user to try again in 10
minutes.
Just implementing this same scenario, using DB2 (...). Of course, if I
had MVCC on that application, I could argue that this is not
required...is that the basis of the "not required" view?
I could do the same thing in PostgreSQL using NOWAIT and a loop on the client
side. But the lock timeout is somewhat easier.
SQLServer and DB2 support a lock timeout system wide, simple but not
granular. Oracle supports the NOWAIT option, even though it supports
readers-dont-block locking. I prefer the NOWAIT option, as it gives a
more fine-grained handle on exactly which statements you wish to wait
on, or not.
Without NOWAIT, we would need to set lock_timeout = 30 (seconds)
Statement level timeout is a different thing entirely, since there are
very often statements that need to run for 2-3 hours (even more in some
cases), so statement level timeout is set to 10000 (seconds).
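Put as configuration, Simon's two knobs would look something like this (a hypothetical fragment: lock_timeout is the GUC this patch proposes, not an existing option, and the units follow Simon's example rather than any defined behavior):

```
# statement_timeout exists today; lock_timeout exists only in this patch.
# Units here follow Simon's example (seconds); the real units would
# depend on the implementation.
lock_timeout = 30          # give up waiting for a lock after 30 seconds
statement_timeout = 10000  # but let statements themselves run for hours
```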
Best Regards, Simon Riggs
Could I ask for feedback on the error messages used with the archiver
and restore functionality? Possibly those of you with a hand in the
Error Message style guide?
The messages need the eye of some administrators to suggest some better
phrases. Some probably need some explanation, but I'll leave that for
you to decide which...and what you think it should say instead. Thinking
that if they need explanation, they probably are worded wrongly...
These are copied straight from recent patch:
+ elog(WARNING, "could not set notify for archiver to read log file %u,
segment %u",
+ ereport(LOG,
+ (errmsg("recovery.conf found...starting archive recovery")));
+ elog(LOG, "redo: restored \"%s\" from archive", restoreXlog);
+ elog(LOG, "rename failed %s %s",recoveryXlog, lastrecoXlog);
+ elog(LOG, "redo: cannot restore \"%s\" from archive", restoreXlog);
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write archive_status file \"%s\"
",
+ tmppath)));
+ elog(LOG, "redo: moving last restored xlog to %s", tmppath);
+ elog(LOG, "redo: rename failed");
+ elog(LOG, "redo: archive chain ends; using local copy of \"%s\"",
restoreXlog);
+ (errmsg("archive recovery complete")));
+ ereport(LOG,
! (errmsg("too many transaction log files, removing \"%s\"",
xlde->d_name)));
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("chkpt: cannot find archive_status file: %s ",
+ rlogpath)));
+ elog(WARNING, "chkpt: archiving not yet started for log file
%s",
+ xlog);
DEBUG MESSAGES
+ elog(LOG, "backend: written %s", rlogpath );
+
+ elog(LOG, "postmaster: WAKEN_ARCHIVER received, sending SIGUSR1 to
archiver");
+ elog(LOG, "chkpt: archiving done for log file %s",
+ xlog);
+ elog(LOG, "redo: system(%s)", xlogRestoreCmd);
Thanks,
Best regards, Simon Riggs
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
FWIW, I think the treatment of locking in the docs could use some
improvement. Especially wrt MVCC and pessimistic locking and the 'big
picture' issues going on there (especially why the former is better than
the latter).
Send a patch ...
regards, tom lane
Simon Riggs wrote:
+ elog(WARNING, "could not set notify for archiver to read log file
%u, segment %u",
Reason? (disk full, network down, leap year?)
I think elog() calls don't get translated. You should always use
ereport. Tom would know more about the distinction.
+ ereport(LOG,
+ (errmsg("recovery.conf found...starting archive recovery")));
comma
+ elog(LOG, "redo: cannot restore \"%s\" from archive",
restoreXlog);
Reason?
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not write archive_status file \"%s\" ",
+ tmppath)));
Reason?
+ elog(LOG, "redo: moving last restored xlog to %s", tmppath);
Probably a user doesn't know what "xlog" is. Also, I don't like the
"prefix:" style you use. Just tell what happened.
+ elog(LOG, "redo: rename failed");
We don't write "foo failed", but "could not do foo". And again: reason?
+ elog(LOG, "redo: archive chain ends; using local copy of \"%s\"",
restoreXlog);
+ ereport(LOG,
! (errmsg("too many transaction log files, removing \"%s\"",
xlde->d_name)));
How/where is the limit defined?
+ ereport(WARNING,
+ (errcode_for_file_access(),
+ errmsg("chkpt: cannot find archive_status file: %s ",
+ rlogpath)));
There is enough space that you can write "checkpoint". Or actually
don't write anything. What is the archive_status file?
+ elog(WARNING, "chkpt: archiving not yet started for log file %s",
+ xlog);
What does that tell me?
DEBUG MESSAGES
Debug messages should have a DEBUG severity.
+ elog(LOG, "backend: written %s", rlogpath );
+
+ elog(LOG, "postmaster: WAKEN_ARCHIVER received, sending SIGUSR1
to archiver");
+ elog(LOG, "chkpt: archiving done for log file %s",
+ xlog);
+ elog(LOG, "redo: system(%s)", xlogRestoreCmd);
Peter Eisentraut <peter_e@gmx.net> writes:
Simon Riggs wrote:
+ elog(WARNING, "could not set notify for archiver to read log file
%u, segment %u",
Reason? (disk full, network down, leap year?)
I think elog() calls don't get translated. You should always use
ereport. Tom would know more about the distinction.
elog is deprecated except for debugging or "can't-happen" messages.
Anything user-facing ought to be reported with ereport. In this case
elog might be okay --- it's not clear to me from this snippet whether
the condition is one a user would be likely to see.
There's plenty of detail about all this in chapter 45 of the docs:
http://www.postgresql.org/docs/7.4/static/source.html
and I think most of Peter's comments trace directly to items in the
message style guide there.
regards, tom lane
On Wed, 2004-06-30 at 15:58, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
Thanks very much for your comments.
Simon Riggs wrote:
+ elog(WARNING, "could not set notify for archiver to read log file
%u, segment %u",
Reason? (disk full, network down, leap year?)
? "could not increment year number (additional day found)" :)
I think elog() calls don't get translated. You should always use
ereport. Tom would know more about the distinction.
elog is deprecated except for debugging or "can't-happen" messages.
Anything user-facing ought to be reported with ereport. In this case
elog might be okay --- it's not clear to me from this snippet whether
the condition is one a user would be likely to see.
Thanks for clarifying; I wasn't clear on that. I'll go through and
make any required changes.
There's plenty of detail about all this in chapter 45 of the docs:
http://www.postgresql.org/docs/7.4/static/source.html
and I think most of Peter's comments trace directly to items in the
message style guide there.
Yes, I've read that, and this post was all about discussing it and
applying it as part of the polishing work.
I was really looking for some suggestions rather than a critique - the
messages were not exactly the bit I was focusing on in dev. Peter has
highlighted the extent of improvement, so I'll get on it now.
(For later, I've found it confusing that the Developer's FAQ makes no
mention of those notes, nor the other way around - I'll submit some
suggested changes to make everything clearer for those on their first
patch....and yes, I've tried hard to RTFM)
Can I just clarify whether the end-of-June freeze does or does not apply
to documentation? Clearly, I have a significant amount of documentation
to write as well, which will presumably also need review.
What are the plans for review? I've not really heard much as yet...
Best Regards, Simon Riggs