Snapshot too old logging

Started by Magnus Haganderabout 9 years ago22 messages
#1Magnus Hagander
magnus@hagander.net

Is there a reason why we don't log which relation triggered the snapshot
too old error when it happens?

Since we do have Relation available in TestForOldSnapshot_impl(), shouldn't
we be able to include it?

Having that in the log message would be very helpful, but is there a reason
why it wouldn't work?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#2Kevin Grittner
kgrittn@gmail.com
In reply to: Magnus Hagander (#1)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander <magnus@hagander.net> wrote:

Is there a reason why we don't log which relation triggered the snapshot too
old error when it happens?

Since we do have Relation available in TestForOldSnapshot_impl(), shouldn't
we be able to include it?

Having that in the log message would be very helpful, but is there a reason
why it wouldn't work?

I would probably not want to mess with the text of the error
itself, in case any client-side software bases recovery on that
rather than the SQLSTATE value; but including relation (and maybe
the timestamp on the snapshot?) in detail or context portions seems
like a great idea.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#2)
Re: Snapshot too old logging

Kevin Grittner <kgrittn@gmail.com> writes:

On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander <magnus@hagander.net> wrote:

Is there a reason why we don't log which relation triggered the snapshot too
old error when it happens?

I would probably not want to mess with the text of the error
itself, in case any client-side software bases recovery on that
rather than the SQLSTATE value;

Any such code is broken on its face because of localization.
Perhaps including the relname in the main message would make it
unduly long, but if not I'd vote for doing it that way.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#3)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 5:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander <magnus@hagander.net>

wrote:

Is there a reason why we don't log which relation triggered the

snapshot too

old error when it happens?

I would probably not want to mess with the text of the error
itself, in case any client-side software bases recovery on that
rather than the SQLSTATE value;

Any such code is broken on its face because of localization.
Perhaps including the relname in the main message would make it
unduly long, but if not I'd vote for doing it that way.

Agreed.

Is there value in showing which snapshot as well? Something like:
DETAIL: snapshot <xyz> is too old to access relation <relation>

Putting both those into the main message will probably make it too long.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#5Brad DeJong
Brad.Dejong@infor.com
In reply to: Tom Lane (#3)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 10:30 AM, Tom Lane wrote:

Kevin Grittner <kgrittn@gmail.com> writes:

On Tue, Nov 15, 2016 at 3:38 AM, Magnus Hagander <magnus@hagander.net>

wrote:

Is there a reason why we don't log which relation triggered the
snapshot too old error when it happens?

I would probably not want to mess with the text of the error itself,
in case any client-side software bases recovery on that rather than
the SQLSTATE value;

Any such code is broken on its face because of localization.
Perhaps including the relname in the main message would make it unduly
long, but if not I'd vote for doing it that way.

+1 for relname in the main message.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#4)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 1:18 PM, Magnus Hagander <magnus@hagander.net> wrote:

Is there value in showing which snapshot as well? Something like:
DETAIL: snapshot <xyz> is too old to access relation <relation>

IIUC, the granularity is per-block, not per-relation, so that might be
misleading.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#6)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 7:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 15, 2016 at 1:18 PM, Magnus Hagander <magnus@hagander.net>
wrote:

Is there value in showing which snapshot as well? Something like:
DETAIL: snapshot <xyz> is too old to access relation <relation>

IIUC, the granularity is per-block, not per-relation, so that might be
misleading.

Does it help to let the user know which number? I'm not really sure what
I'd do with that information, whereas knowing the relation would be very
useful.

But we could certainly say "snapshot <xyz> is too old to access block <nnn>
of relation <rel>".

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#8Brad DeJong
Brad.Dejong@infor.com
In reply to: Magnus Hagander (#4)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 12:20 PM Magnus Hagander wrote:

Is there value in showing the snapshot as well?

I don't think so. Knowing the relname let's you look at your report/job and figure out if the access to that relation can be moved. Having the exact snapshot version isn't going to change that. And if you're doing auto-retry, you probably only care if it is happening on the same relation consistently.

#9Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#7)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 1:30 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Nov 15, 2016 at 7:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 15, 2016 at 1:18 PM, Magnus Hagander <magnus@hagander.net>
wrote:

Is there value in showing which snapshot as well? Something like:
DETAIL: snapshot <xyz> is too old to access relation <relation>

IIUC, the granularity is per-block, not per-relation, so that might be
misleading.

Does it help to let the user know which number? I'm not really sure what I'd
do with that information, whereas knowing the relation would be very useful.

But we could certainly say "snapshot <xyz> is too old to access block <nnn>
of relation <rel>".

I think it would be better not to include either the snapshot or the
block number, and just find some way to reword the error message so
that it mentions which relation was involved without implying that all
access to the relation would necessarily fail. For example:

ERROR: snapshot too old
DETAIL: One or more rows required by this query have already been
removed from "%s".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#4)
Re: Snapshot too old logging

Magnus Hagander <magnus@hagander.net> writes:

Is there value in showing which snapshot as well? Something like:
DETAIL: snapshot <xyz> is too old to access relation <relation>

Snapshots don't have names, and I can't think of a useful way of
identifying them to users.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Kevin Grittner
kgrittn@gmail.com
In reply to: Tom Lane (#10)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 12:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

Is there value in showing which snapshot as well? Something like:
DETAIL: snapshot <xyz> is too old to access relation <relation>

Snapshots don't have names, and I can't think of a useful way of
identifying them to users.

The time that the snapshot was taken might be useful; I can't think
of anything else that would be. The main thing would be to include
the relation, with an indication that it's not necessarily true
that access to *any* part of the relation would be a problem.
Mentioning the block number would tend to support that idea, and
might be of some use.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#9)
1 attachment(s)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 7:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 15, 2016 at 1:30 PM, Magnus Hagander <magnus@hagander.net>
wrote:

On Tue, Nov 15, 2016 at 7:27 PM, Robert Haas <robertmhaas@gmail.com>

wrote:

On Tue, Nov 15, 2016 at 1:18 PM, Magnus Hagander <magnus@hagander.net>
wrote:

Is there value in showing which snapshot as well? Something like:
DETAIL: snapshot <xyz> is too old to access relation <relation>

IIUC, the granularity is per-block, not per-relation, so that might be
misleading.

Does it help to let the user know which number? I'm not really sure what

I'd

do with that information, whereas knowing the relation would be very

useful.

But we could certainly say "snapshot <xyz> is too old to access block

<nnn>

of relation <rel>".

I think it would be better not to include either the snapshot or the
block number, and just find some way to reword the error message so
that it mentions which relation was involved without implying that all
access to the relation would necessarily fail. For example:

ERROR: snapshot too old
DETAIL: One or more rows required by this query have already been
removed from "%s".

Sounds good to me. I've only really found use for the relation name so far
:)

How about the attached?

And if ppl are OK with it, thoughts on backpatching to 9.6?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachments:

snapshot_too_old.patchtext/x-patch; charset=US-ASCII; name=snapshot_too_old.patchDownload
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 58b0a97..04e6450 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4336,5 +4336,6 @@ TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
 		&& (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
 		ereport(ERROR,
 				(errcode(ERRCODE_SNAPSHOT_TOO_OLD),
-				 errmsg("snapshot too old")));
+				 errmsg("snapshot too old"),
+				 errdetail("One or more rows required by this query have already been removed from \"%s\"", RelationGetRelationName(relation))));
 }
#13Kevin Grittner
kgrittn@gmail.com
In reply to: Robert Haas (#9)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 12:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think it would be better not to include either the snapshot or the
block number, and just find some way to reword the error message so
that it mentions which relation was involved without implying that all
access to the relation would necessarily fail. For example:

ERROR: snapshot too old
DETAIL: One or more rows required by this query have already been
removed from "%s".

That particular language would be misleading. All we know about
the page is that it was modified since the referencing (old)
snapshot was taken. We don't don't know in what way it was
modified, so we must assume that it *might* have been pruned of
rows that the snapshot should still be able to see.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Robert Haas
robertmhaas@gmail.com
In reply to: Kevin Grittner (#13)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 2:21 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Tue, Nov 15, 2016 at 12:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think it would be better not to include either the snapshot or the
block number, and just find some way to reword the error message so
that it mentions which relation was involved without implying that all
access to the relation would necessarily fail. For example:

ERROR: snapshot too old
DETAIL: One or more rows required by this query have already been
removed from "%s".

That particular language would be misleading. All we know about
the page is that it was modified since the referencing (old)
snapshot was taken. We don't don't know in what way it was
modified, so we must assume that it *might* have been pruned of
rows that the snapshot should still be able to see.

Oh, yeah. So maybe "may have already been removed".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#14)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 8:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 15, 2016 at 2:21 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Tue, Nov 15, 2016 at 12:43 PM, Robert Haas <robertmhaas@gmail.com>

wrote:

I think it would be better not to include either the snapshot or the
block number, and just find some way to reword the error message so
that it mentions which relation was involved without implying that all
access to the relation would necessarily fail. For example:

ERROR: snapshot too old
DETAIL: One or more rows required by this query have already been
removed from "%s".

That particular language would be misleading. All we know about
the page is that it was modified since the referencing (old)
snapshot was taken. We don't don't know in what way it was
modified, so we must assume that it *might* have been pruned of
rows that the snapshot should still be able to see.

Oh, yeah. So maybe "may have already been removed".

Just to be clear, you're suggesting 'One or more rows may have already been
removed from "%s"?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Magnus Hagander (#15)
Re: Snapshot too old logging

Magnus Hagander wrote:

On Tue, Nov 15, 2016 at 8:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 15, 2016 at 2:21 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

That particular language would be misleading. All we know about
the page is that it was modified since the referencing (old)
snapshot was taken. We don't don't know in what way it was
modified, so we must assume that it *might* have been pruned of
rows that the snapshot should still be able to see.

Oh, yeah. So maybe "may have already been removed".

Just to be clear, you're suggesting 'One or more rows may have already been
removed from "%s"?

Focusing on the relation itself for a second, I think the name should be
schema-qualified. What about using errtable()?

Can this happen for relation types other than tables, say materialized
views? (Your suggested wording omits relation type so it wouldn't be
affected, but it's worth considering I think.)

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17Brad DeJong
Brad.Dejong@infor.com
In reply to: Magnus Hagander (#15)
Re: Snapshot too old logging

Magnus wrote:

Just to be clear, you're suggesting 'One or more rows may have already been removed from "%s"?

Perhaps just 'This query attempted to access a page in "%s" that was modified after the snapshot was acquired.'

#18Magnus Hagander
magnus@hagander.net
In reply to: Alvaro Herrera (#16)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 9:23 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:

Magnus Hagander wrote:

On Tue, Nov 15, 2016 at 8:22 PM, Robert Haas <robertmhaas@gmail.com>

wrote:

On Tue, Nov 15, 2016 at 2:21 PM, Kevin Grittner <kgrittn@gmail.com>

wrote:

That particular language would be misleading. All we know about
the page is that it was modified since the referencing (old)
snapshot was taken. We don't don't know in what way it was
modified, so we must assume that it *might* have been pruned of
rows that the snapshot should still be able to see.

Oh, yeah. So maybe "may have already been removed".

Just to be clear, you're suggesting 'One or more rows may have already

been

removed from "%s"?

Focusing on the relation itself for a second, I think the name should be
schema-qualified. What about using errtable()?

Can this happen for relation types other than tables, say materialized
views? (Your suggested wording omits relation type so it wouldn't be
affected, but it's worth considering I think.)

I'm fairly certain it can hit other things, including indexes and
definitely matviews, but I won't say I'm 100% sure :) The check is at block
level. Does errtable() work for that? (I've never used it, and it seems
it's only actually use din a single place in the codebase..)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

#19Robert Haas
robertmhaas@gmail.com
In reply to: Magnus Hagander (#15)
Re: Snapshot too old logging

On Tue, Nov 15, 2016 at 2:25 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Nov 15, 2016 at 8:22 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Nov 15, 2016 at 2:21 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

On Tue, Nov 15, 2016 at 12:43 PM, Robert Haas <robertmhaas@gmail.com>
wrote:

I think it would be better not to include either the snapshot or the
block number, and just find some way to reword the error message so
that it mentions which relation was involved without implying that all
access to the relation would necessarily fail. For example:

ERROR: snapshot too old
DETAIL: One or more rows required by this query have already been
removed from "%s".

That particular language would be misleading. All we know about
the page is that it was modified since the referencing (old)
snapshot was taken. We don't don't know in what way it was
modified, so we must assume that it *might* have been pruned of
rows that the snapshot should still be able to see.

Oh, yeah. So maybe "may have already been removed".

Just to be clear, you're suggesting 'One or more rows may have already been
removed from "%s"?

I think I was suggesting: One or more rows required by this query may
already have been removed from "%s".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20Kevin Grittner
kgrittn@gmail.com
In reply to: Magnus Hagander (#18)
Re: Snapshot too old logging

On Wed, Nov 16, 2016 at 6:12 AM, Magnus Hagander <magnus@hagander.net> wrote:

On Tue, Nov 15, 2016 at 9:23 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Can this happen for relation types other than tables, say materialized
views? (Your suggested wording omits relation type so it wouldn't be
affected, but it's worth considering I think.)

I'm fairly certain it can hit other things, including indexes and definitely
matviews, but I won't say I'm 100% sure :) The check is at block level. Does
errtable() work for that? (I've never used it, and it seems it's only
actually use din a single place in the codebase..)

Yes, it can happen for indexes and for materialized views.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Brad DeJong
Brad.Dejong@infor.com
In reply to: Robert Haas (#19)
Re: Snapshot too old logging

On Wed, Nov 16, 2016 at 6:43 AM, Robert Haas wrote:

I think I was suggesting: One or more rows required by this query may
already have been removed from "%s".

I keep reading that as "you have data corruption because something removed rows that your query needs" rather than "this query took too long to complete but may succeed if you resubmit it".

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Robert Haas
robertmhaas@gmail.com
In reply to: Brad DeJong (#21)
Re: Snapshot too old logging

On Fri, Nov 18, 2016 at 11:49 AM, Brad DeJong <Brad.Dejong@infor.com> wrote:

On Wed, Nov 16, 2016 at 6:43 AM, Robert Haas wrote:

I think I was suggesting: One or more rows required by this query may
already have been removed from "%s".

I keep reading that as "you have data corruption because something removed rows that your query needs" rather than "this query took too long to complete but may succeed if you resubmit it".

I see your point, but remember that this would only the detail line.
The error would still be "snapshot too old".

ERROR: snapshot too old
DETAIL: One or more rows required by this query may already have been
removed from "%s".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers