'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

Started by Tomasz Chmielewskiover 14 years ago33 messages
#1Tomasz Chmielewski
mangoo@wpkg.org

On 1st May, I saw this message in my postgres log:

May 2 06:52:02 db10 postgres[3590]: [29829-1] 2011-05-02 06:52:02 SGT
ERROR: could not access status of transaction 1573786613
May 2 06:52:02 db10 postgres[3590]: [29829-2] 2011-05-02 06:52:02 SGT
DETAIL: Could not open file "pg_clog/05DC": No such file or directory.
May 2 06:52:02 db10 postgres[3590]: [29829-3] 2011-05-02 06:52:02 SGT
STATEMENT: SELECT 1 FROM core_bill_id_seq FOR UPDATE

Now, I'm not sure what I should do about it. Database behaves "funny",
some inserts do not work.

Searching the internet suggests that:

1) such errors could happen with PostgreSQL 8.1.x under heavy load -
this server is under constant heavy load, but runs 8.3.14 on Debian Lenny

2) I should simply create a 256k pg_clog/05DC empty file with dd - I
wouldn't like to do it, without first knowing what happened, and if it's
really "good fix"

3) some tables can be corrupted - how can I check that? pg_dump works
fine and doesn't report any errors

4) I may have hardware problems - but this server is running for almost
1 year now, is super stable - servers with hardware issues are likely to
show some issues as well

5) database corrupted due to a server crash - this server never crashed

How should I continue from that (assuming I can't reliably verify if
something wrong is going with the hardware or not - points 4 and 5)?

--
Tomasz Chmielewski
http://wpkg.org

#2Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tomasz Chmielewski (#1)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

Tomasz Chmielewski <mangoo@wpkg.org> wrote:

On 1st May, I saw this message in my postgres log:

May 2 06:52:02 db10 postgres[3590]: [29829-1] 2011-05-02 06:52:02
SGT ERROR: could not access status of transaction 1573786613
May 2 06:52:02 db10 postgres[3590]: [29829-2] 2011-05-02 06:52:02
SGT DETAIL: Could not open file "pg_clog/05DC": No such file or
directory.
May 2 06:52:02 db10 postgres[3590]: [29829-3] 2011-05-02 06:52:02
SGT STATEMENT: SELECT 1 FROM core_bill_id_seq FOR UPDATE

You saw errors on the 1st dated for the 2nd?

Now, I'm not sure what I should do about it. Database behaves
"funny", some inserts do not work.

Define "funny". What happens when you attempt the inserts which
don't work. (Copy and paste any error messages.) Is it all tables?
All inserts to one table? Any other discernible pattern?

4) I may have hardware problems - but this server is running for
almost 1 year now, is super stable - servers with hardware issues
are likely to show some issues as well

Does the server have ECC memory? Do you have SMART monitoring of
the storage system, or something similar? Any errors showing in any
system logs?

-Kevin

#3Tomasz Chmielewski
mangoo@wpkg.org
In reply to: Kevin Grittner (#2)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

On 04.05.2011 20:14, Kevin Grittner wrote:

Tomasz Chmielewski<mangoo@wpkg.org> wrote:

On 1st May, I saw this message in my postgres log:

May 2 06:52:02 db10 postgres[3590]: [29829-1] 2011-05-02 06:52:02
SGT ERROR: could not access status of transaction 1573786613
May 2 06:52:02 db10 postgres[3590]: [29829-2] 2011-05-02 06:52:02
SGT DETAIL: Could not open file "pg_clog/05DC": No such file or
directory.
May 2 06:52:02 db10 postgres[3590]: [29829-3] 2011-05-02 06:52:02
SGT STATEMENT: SELECT 1 FROM core_bill_id_seq FOR UPDATE

You saw errors on the 1st dated for the 2nd?

My bad; it was 2nd, not 1st.

Now, I'm not sure what I should do about it. Database behaves
"funny", some inserts do not work.

Define "funny". What happens when you attempt the inserts which
don't work. (Copy and paste any error messages.) Is it all tables?
All inserts to one table? Any other discernible pattern?

This repeated many times:

/var/log/postgresql/postgresql_log.1:May 3 18:24:49 db10 postgres[21363]: [26999-1] 2011-05-03 18:24:49 SGT ERROR: could not access status of transaction 1573786613
/var/log/postgresql/postgresql_log.1-May 3 18:24:49 db10 postgres[21363]: [26999-2] 2011-05-03 18:24:49 SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory.
/var/log/postgresql/postgresql_log.1-May 3 18:24:49 db10 postgres[21363]: [26999-3] 2011-05-03 18:24:49 SGT STATEMENT: SELECT 1 FROM core_wot_seq FOR UPDATE

Today I have this:

/var/log/postgresql/postgresql_log:May 4 22:43:44 db10 postgres[15773]: [555-1] 2011-05-04 22:43:44 SGT ERROR: could not access status of transaction 1612337841
/var/log/postgresql/postgresql_log-May 4 22:43:44 db10 postgres[15773]: [555-2] 2011-05-04 22:43:44 SGT DETAIL: Could not open file "pg_clog/0601": No such file or directory.
/var/log/postgresql/postgresql_log-May 4 22:43:44 db10 postgres[15773]: [555-3] 2011-05-04 22:43:44 SGT STATEMENT: SELECT 1 FROM core_wbl_seq FOR UPDATE

Only such two (different) occurrences; repeated 10-20 times; two different tables.

The system is used heavily, so it would show lots of other errors in other places if it was some major fault.
Which does not include some "minor" fault.

4) I may have hardware problems - but this server is running for
almost 1 year now, is super stable - servers with hardware issues
are likely to show some issues as well

Does the server have ECC memory? Do you have SMART monitoring of
the storage system, or something similar? Any errors showing in any
system logs?

No errors at all anywhere (dmesg, syslog etc.).

It's ProLiant DL180 G6, and I think it should have ECC. At least I see it being mentioned in dmidecode.

Assuming we can't determine what caused the corruption (bitflip, kernel bug, bad RAM, postgres bug, silent HDD error etc.) - how should I best recover from this?

--
Tomasz Chmielewski
http://wpkg.org

#4Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tomasz Chmielewski (#3)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

Tomasz Chmielewski <mangoo@wpkg.org> wrote:

This repeated many times:

/var/log/postgresql/postgresql_log.1:May 3 18:24:49 db10
postgres[21363]: [26999-1] 2011-05-03 18:24:49 SGT ERROR: could
not access status of transaction 1573786613
/var/log/postgresql/postgresql_log.1-May 3 18:24:49 db10

< postgres[21363]: [26999-2] 2011-05-03 18:24:49 SGT DETAIL: Could

not open file "pg_clog/05DC": No such file or directory.
/var/log/postgresql/postgresql_log.1-May 3 18:24:49 db10
postgres[21363]: [26999-3] 2011-05-03 18:24:49 SGT STATEMENT:
SELECT 1 FROM core_wot_seq FOR UPDATE

Today I have this:

/var/log/postgresql/postgresql_log:May 4 22:43:44 db10
postgres[15773]: [555-1] 2011-05-04 22:43:44 SGT ERROR: could not
access status of transaction 1612337841
/var/log/postgresql/postgresql_log-May 4 22:43:44 db10
postgres[15773]: [555-2] 2011-05-04 22:43:44 SGT DETAIL: Could
not open file "pg_clog/0601": No such file or directory.
/var/log/postgresql/postgresql_log-May 4 22:43:44 db10
postgres[15773]: [555-3] 2011-05-04 22:43:44 SGT STATEMENT:
SELECT 1 FROM core_wbl_seq FOR UPDATE

Only such two (different) occurrences; repeated 10-20 times; two
different tables.

how should I best recover from this?

If you hadn't already said you were running on 8.3.14 I would have
wondered whether you had used pg_migrator/pg_upgrade. As it is, I'm
going to admit I'm out of my depth and hope someone else jumps in
here.

-Kevin

#5Scott Marlowe
scott.marlowe@gmail.com
In reply to: Tomasz Chmielewski (#3)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

On Wed, May 4, 2011 at 12:37 PM, Tomasz Chmielewski <mangoo@wpkg.org> wrote:

/var/log/postgresql/postgresql_log.1:May  3 18:24:49 db10 postgres[21363]: [26999-1] 2011-05-03 18:24:49 SGT ERROR:  could not access status of transaction 1573786613
/var/log/postgresql/postgresql_log.1-May  3 18:24:49 db10 postgres[21363]: [26999-2] 2011-05-03 18:24:49 SGT DETAIL:  Could not open file "pg_clog/05DC": No such file or directory.
/var/log/postgresql/postgresql_log.1-May  3 18:24:49 db10 postgres[21363]: [26999-3] 2011-05-03 18:24:49 SGT STATEMENT:  SELECT 1 FROM core_wot_seq FOR UPDATE

So a pg_clog file disappeared.

Today I have this:

/var/log/postgresql/postgresql_log:May  4 22:43:44 db10 postgres[15773]: [555-1] 2011-05-04 22:43:44 SGT ERROR:  could not access status of transaction 1612337841
/var/log/postgresql/postgresql_log-May  4 22:43:44 db10 postgres[15773]: [555-2] 2011-05-04 22:43:44 SGT DETAIL:  Could not open file "pg_clog/0601": No such file or directory.
/var/log/postgresql/postgresql_log-May  4 22:43:44 db10 postgres[15773]: [555-3] 2011-05-04 22:43:44 SGT STATEMENT:  SELECT 1 FROM core_wbl_seq FOR UPDATE

Then another pg_clog file disappeared.

Is it possible there's some rogue process deleting files in pg_clog
somehow? Have you run an fsck on this drive to make sure it's not got
any file system errors?

#6Tomasz Chmielewski
mangoo@wpkg.org
In reply to: Scott Marlowe (#5)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

On 04.05.2011 21:50, Scott Marlowe wrote:

Then another pg_clog file disappeared.

Is it possible there's some rogue process deleting files in pg_clog
somehow?

I don't think.

Have you run an fsck on this drive to make sure it's not got
any file system errors?

Also, don't think there is any corruption here. AFAIR, this system never crashed.
Could be there is some silent corruption though - but if there really was one, we would likely see the kernel complaining, stale files elsewhere, and so on.

Without such clues on filesystem corruption, I can't afford downtime.

I didn't mention, but the application first talks to pgpool, which talks to two database servers (i.e. inserts to both).

The real fun begins here - this is from two different servers:

db10:/var/log/postgresql# zgrep "No such" *
postgresql_log:May 4 18:24:28 db10 postgres[15751]: [23-2] 2011-05-04 18:24:28 SGT DETAIL: Could not open file "pg_clog/0601": No such file or directory.
postgresql_log:May 4 22:43:44 db10 postgres[15773]: [555-2] 2011-05-04 22:43:44 SGT DETAIL: Could not open file "pg_clog/0601": No such file or directory.
postgresql_log:May 4 22:44:30 db10 postgres[15791]: [1841-2] 2011-05-04 22:44:30 SGT DETAIL: Could not open file "pg_clog/0601": No such file or directory.
postgresql_log:May 4 22:55:53 db10 postgres[15741]: [4114-2] 2011-05-04 22:55:53 SGT DETAIL: Could not open file "pg_clog/0601": No such file or directory.

db20:/var/log/postgresql# zgrep "No such" *
postgresql_log:May 4 18:24:28 db20 postgres[27114]: [2-2] 2011-05-04 18:24:28 SGT DETAIL: Could not open file "pg_clog/0601": No such file or directory.
postgresql_log:May 4 22:43:44 db20 postgres[27116]: [2-2] 2011-05-04 22:43:44 SGT DETAIL: Could not open file "pg_clog/0601": No such file or directory.
postgresql_log:May 4 22:44:30 db20 postgres[27138]: [2-2] 2011-05-04 22:44:30 SGT DETAIL: Could not open file "pg_clog/0601": No such file or directory.
postgresql_log:May 4 22:55:53 db20 postgres[27104]: [2-2] 2011-05-04 22:55:53 SGT DETAIL: Could not open file "pg_clog/0601": No such file or directory.

I can't exclude some corruption happened much earlier on db10; the whole database (as binary files) was copied to db20 almost 2 months ago.

Why would it start showing pg_clog files missing just 2 days ago, and not earlier? Hmm.

--
Tomasz Chmielewski
http://wpkg.org

#7Tomasz Chmielewski
mangoo@wpkg.org
In reply to: Tomasz Chmielewski (#6)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

On 04.05.2011 22:13, Tomasz Chmielewski wrote:

On 04.05.2011 21:50, Scott Marlowe wrote:

Then another pg_clog file disappeared.

OK, I have:

bookstor=# SELECT * FROM core_wot_seq;
sequence_name | last_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled | is_called
------------------+------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
core_wot_seq | 2593 | 1 | 9223372036854775807 | 1 | 1 | 8 | f | t
(1 row)

bookstor=# SELECT 1 FROM core_wot_seq;
?column?
----------
1
(1 row)

bookstor=# SELECT 1 FROM core_wot_seq FOR UPDATE;
ERROR: could not access status of transaction 1573786613
DETAIL: Could not open file "pg_clog/05DC": No such file or directory.

How do I best recover from this? Stop postgres, create an empty, 256k pg_clog/05DC file, start postgres?

Export table, drop table, import table? Anything else?

--
Tomasz Chmielewski
http://wpkg.org

In reply to: Tomasz Chmielewski (#7)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

On 04.05.2011 22:27, Tomasz Chmielewski wrote:

Then another pg_clog file disappeared.

OK, I have:

bookstor=# SELECT * FROM core_wot_seq;
sequence_name | last_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled | is_called
------------------+------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
core_wot_seq | 2593 | 1 | 9223372036854775807 | 1 | 1 | 8 | f | t
(1 row)

bookstor=# SELECT 1 FROM core_wot_seq;
?column?
----------
1
(1 row)

bookstor=# SELECT 1 FROM core_wot_seq FOR UPDATE;
ERROR: could not access status of transaction 1573786613
DETAIL: Could not open file "pg_clog/05DC": No such file or directory.

How do I best recover from this? Stop postgres, create an empty, 256k pg_clog/05DC file, start postgres?

Export table, drop table, import table? Anything else?

Nobody has a clue? :|

--
Tomasz Chmielewski
http://wpkg.org

#9Tomasz Chmielewski
mangoo@wpkg.org
In reply to: Tomasz Chmielewski (#8)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

On 06.05.2011 10:42, Tomasz Chmielewski wrote:

On 04.05.2011 22:27, Tomasz Chmielewski wrote:

Then another pg_clog file disappeared.

OK, I have:

bookstor=# SELECT * FROM core_wot_seq;
sequence_name | last_value | increment_by | max_value | min_value |
cache_value | log_cnt | is_cycled | is_called
------------------+------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------

core_wot_seq | 2593 | 1 | 9223372036854775807 | 1 | 1 | 8 | f | t
(1 row)

bookstor=# SELECT 1 FROM core_wot_seq;
?column?
----------
1
(1 row)

bookstor=# SELECT 1 FROM core_wot_seq FOR UPDATE;
ERROR: could not access status of transaction 1573786613
DETAIL: Could not open file "pg_clog/05DC": No such file or directory.

How do I best recover from this? Stop postgres, create an empty, 256k
pg_clog/05DC file, start postgres?

Export table, drop table, import table? Anything else?

Nobody has a clue? :|

Just as a follow up, it turned out several sequences, and only sequences
were affected this way.

I used pg_dump to export these sequences, dropped the sequences, and
imported them again.

As there were some tables which depended on these sequences, I had to
use ALTER TABLE as well several times - grepping for the affected
sequence in the whole database dump gave me hints on what I had to do.

--
Tomasz Chmielewski
http://wpkg.org

#10Tomasz Chmielewski
mangoo@wpkg.org
In reply to: Tomasz Chmielewski (#9)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

On 07.05.2011 23:19, Tomasz Chmielewski wrote:

On 06.05.2011 10:42, Tomasz Chmielewski wrote:

bookstor=# SELECT 1 FROM core_wot_seq FOR UPDATE;
ERROR: could not access status of transaction 1573786613
DETAIL: Could not open file "pg_clog/05DC": No such file or directory.

How do I best recover from this? Stop postgres, create an empty, 256k
pg_clog/05DC file, start postgres?

Export table, drop table, import table? Anything else?

Nobody has a clue? :|

Just as a follow up, it turned out several sequences, and only sequences
were affected this way.

I used pg_dump to export these sequences, dropped the sequences, and
imported them again.

Unfortunately, the issue is back, and again, only affects sequences.

I'd be really grateful for any more ideas here (why it happens, how to
best recover from it)!

--
Tomasz Chmielewski
http://wpkg.org

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomasz Chmielewski (#10)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

Tomasz Chmielewski <mangoo@wpkg.org> writes:

bookstor=# SELECT 1 FROM core_wot_seq FOR UPDATE;

Um ... why are you doing that on a sequence?

ERROR: could not access status of transaction 1573786613
DETAIL: Could not open file "pg_clog/05DC": No such file or directory.

This doesn't surprise me too much, because sequences are not expected
to contain any live XIDs, so the XID freezing mechanism ignores them.
So if you did that in the past, this would eventually happen.

I think the most appropriate solution may be to disallow SELECT FOR
UPDATE/SHARE on sequences ... so if you have a good reason why we
shouldn't do so, please explain it.

regards, tom lane

#12Tomasz Chmielewski
mangoo@wpkg.org
In reply to: Tom Lane (#11)
Re: 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?

On 31.05.2011 05:16, Tom Lane wrote:

Tomasz Chmielewski<mangoo@wpkg.org> writes:

bookstor=# SELECT 1 FROM core_wot_seq FOR UPDATE;

Um ... why are you doing that on a sequence?

ERROR: could not access status of transaction 1573786613
DETAIL: Could not open file "pg_clog/05DC": No such file or directory.

This doesn't surprise me too much, because sequences are not expected
to contain any live XIDs, so the XID freezing mechanism ignores them.
So if you did that in the past, this would eventually happen.

I think the most appropriate solution may be to disallow SELECT FOR
UPDATE/SHARE on sequences ... so if you have a good reason why we
shouldn't do so, please explain it.

That's a good question.

I grepped the sources of the application using postgres, and it certainly doesn't do it.

We use pgpool though, and I see:

pool_process_query.c: snprintf(qbuf, sizeof(qbuf), "SELECT 1 FROM %s FOR UPDATE", seq_rel_name);

So it looks to be coming from pgpool 3.x (it didn't do it in 2.x version).

This is a message explaining why it was introduced to pgpool:

http://comments.gmane.org/gmane.comp.db.postgresql.pgpool.devel/348

This brings two questions:

1) whatever command I send to postgres, should I expect "Could not open file "pg_clog/05DC": No such file or directory"?
If so, it should be documented, and a way to recover from such a situation should be explained.

2) is pgpool behaviour correct?

--
Tomasz Chmielewski
http://wpkg.org

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomasz Chmielewski (#12)
pgpool versus sequences (was Re: [ADMIN] 'SGT DETAIL: Could not open file "pg_clog/05DC": No such file or directory' - what to do now?)

Tomasz Chmielewski <mangoo@wpkg.org> writes:

On 31.05.2011 05:16, Tom Lane wrote:

I think the most appropriate solution may be to disallow SELECT FOR
UPDATE/SHARE on sequences ... so if you have a good reason why we
shouldn't do so, please explain it.

I grepped the sources of the application using postgres, and it certainly doesn't do it.
[ but pgpool does, as of a couple months ago ]
This is a message explaining why it was introduced to pgpool:
http://comments.gmane.org/gmane.comp.db.postgresql.pgpool.devel/348

Too bad that wasn't mentioned on pgsql-hackers, where someone might have
pointed out the major flaws in the idea.

2) is pgpool behaviour correct?

No. Quite aside from the lack-of-XID-maintenance problem, the proposal
seems just plain bizarre to me. SELECT FOR UPDATE wouldn't block
nextval(), so the command doesn't actually guarantee serialization of
sequence value acquisition. Taking a table lock on the sequence could
do so, and wouldn't run into any implementation issues, so I fail to see
why that alternative was rejected. I'm also wondering a bit how one
determines *which* sequence to lock, in a case where the table has
multiple serial columns ...

regards, tom lane

#14Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#13)
Re: pgpool versus sequences

Tomasz Chmielewski <mangoo@wpkg.org> writes:

On 31.05.2011 05:16, Tom Lane wrote:

I think the most appropriate solution may be to disallow SELECT FOR
UPDATE/SHARE on sequences ... so if you have a good reason why we
shouldn't do so, please explain it.

I grepped the sources of the application using postgres, and it certainly doesn't do it.
[ but pgpool does, as of a couple months ago ]
This is a message explaining why it was introduced to pgpool:
http://comments.gmane.org/gmane.comp.db.postgresql.pgpool.devel/348

Too bad that wasn't mentioned on pgsql-hackers, where someone might have
pointed out the major flaws in the idea.

2) is pgpool behaviour correct?

No. Quite aside from the lack-of-XID-maintenance problem, the proposal
seems just plain bizarre to me. SELECT FOR UPDATE wouldn't block
nextval(), so the command doesn't actually guarantee serialization of
sequence value acquisition.

Actually it was already explained before:

http://archives.postgresql.org/pgsql-hackers/2011-01/msg00805.php

At the time no one noticed the lack-of-XID-maintenance
problem. Tomasz, thanks for the report. I will go back to old way as
pgpool-II used to do, which is very inefficient unfortunately...

Taking a table lock on the sequence could
do so, and wouldn't run into any implementation issues, so I fail to see
why that alternative was rejected.

Table lock on the sequence? PostgreSQL doesn't allow it...

I'm also wondering a bit how one
determines *which* sequence to lock, in a case where the table has
multiple serial columns ...

No problem at least for pgpool-II. Just choose one of them and obtain
lock on it is enough. Because purpose for the lock is to prevent
concurrent INSERT to the table.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#14)
Re: pgpool versus sequences

I wrote:

I think the most appropriate solution may be to disallow SELECT FOR
UPDATE/SHARE on sequences ... so if you have a good reason why we
shouldn't do so, please explain it.

Attached is a proposed patch to close off this hole. I found that
somebody had already inserted code to forbid the case for foreign
tables, so I just extended that idea a bit (by copying-and-pasting
CheckValidResultRel). Questions:

* Does anyone want to bikeshed on the wording of the error messages?
* Does anyone want to argue for not forbidding SELECT FOR UPDATE on
toast tables?

regards, tom lane

#16Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
Re: pgpool versus sequences

On Wed, Jun 1, 2011 at 6:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

I think the most appropriate solution may be to disallow SELECT FOR
UPDATE/SHARE on sequences ... so if you have a good reason why we
shouldn't do so, please explain it.

Attached is a proposed patch to close off this hole.  I found that
somebody had already inserted code to forbid the case for foreign
tables, so I just extended that idea a bit (by copying-and-pasting
CheckValidResultRel).  Questions:

* Does anyone want to bikeshed on the wording of the error messages?

Not particularly.

* Does anyone want to argue for not forbidding SELECT FOR UPDATE on
 toast tables?

Maybe. How hard would it be to fix that so it doesn't blow up? What
I don't like about the proposed solution is that it will cause very
user-visible breakage as a result of a minor release upgrade, for
anyone using pgpool, which is a lot of people; unless pgpool is
upgraded to a sufficiently new version first.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#16)
Re: pgpool versus sequences

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 1, 2011 at 6:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

* Does anyone want to argue for not forbidding SELECT FOR UPDATE on
toast tables?

Maybe. How hard would it be to fix that so it doesn't blow up? What
I don't like about the proposed solution is that it will cause very
user-visible breakage as a result of a minor release upgrade, for
anyone using pgpool, which is a lot of people; unless pgpool is
upgraded to a sufficiently new version first.

I think you are answering a different question than what I asked.
I was asking about the not-strictly-necessary forbidding of SFU on
toast tables, not sequences.

If we're going to try to retroactively make the world safe for pgpool
doing what it's doing, the only way is to start including sequences in
the set of objects that are vacuumed and included in
relfrozenxid/datfrozenxid bookkeeping. Which is a lot more overhead
than I think is justified to clean up after a bad decision. I'm not
even terribly sure that it would work, since nobody has ever looked at
what would happen if nextval executed concurrently with vacuum doing
something to a sequence. The relfrozenxid logic might have some
difficulty with sequences that have zero relfrozenxid to start with,
too.

Please note also that what pgpool users have got right now is a time
bomb, which is not better than immediately-visible breakage. I would
prefer to try to get this change out ahead of widespread adoption of the
broken pgpool version.

regards, tom lane

#18Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#17)
Re: pgpool versus sequences

If we're going to try to retroactively make the world safe for pgpool
doing what it's doing, the only way is to start including sequences in
the set of objects that are vacuumed and included in
relfrozenxid/datfrozenxid bookkeeping. Which is a lot more overhead
than I think is justified to clean up after a bad decision. I'm not
even terribly sure that it would work, since nobody has ever looked at
what would happen if nextval executed concurrently with vacuum doing
something to a sequence. The relfrozenxid logic might have some
difficulty with sequences that have zero relfrozenxid to start with,
too.

What pgpool really wanted to do was locking sequence tables, not
locking rows in sequences. I wonder why the former is not allowed.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#19Tatsuo Ishii
ishii@postgresql.org
In reply to: Robert Haas (#16)
Re: pgpool versus sequences

Maybe. How hard would it be to fix that so it doesn't blow up? What
I don't like about the proposed solution is that it will cause very
user-visible breakage as a result of a minor release upgrade, for
anyone using pgpool, which is a lot of people; unless pgpool is
upgraded to a sufficiently new version first.

Thanks for concerning pgpool and pgpool users.

BTW, there two pgpool-II versions:

- pgpool-II 2.x. uses table lock. has conflict problem with autovacuum
if the target table is fairly large.

- pgpool-II 3.x. uses sequence row lock to avoid the autovacuum
problem. However now it has XID-wrapwround problem and Tom's fix.

So both versions are having problem at this point. Yesterday advisory
locking was suggested, but after thinking while, it seems using
advisory locking make fragile. So I'm still looking for other
ways. Probably creating a "secret" relation and acquire table locking
on it is the way to go. This is essentially a dirty alternative for
sequence table locking.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#20Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tatsuo Ishii (#18)
Re: pgpool versus sequences

Excerpts from Tatsuo Ishii's message of mié jun 01 19:08:16 -0400 2011:

What pgpool really wanted to do was locking sequence tables, not
locking rows in sequences. I wonder why the former is not allowed.

Yeah -- why is LOCK SEQUENCE foo_seq not allowed? Seems a simple thing
to have.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#20)
Re: pgpool versus sequences

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Tatsuo Ishii's message of mié jun 01 19:08:16 -0400 2011:

What pgpool really wanted to do was locking sequence tables, not
locking rows in sequences. I wonder why the former is not allowed.

Yeah -- why is LOCK SEQUENCE foo_seq not allowed? Seems a simple thing
to have.

I don't see any particular reason to continue to disallow it, but does
that actually represent a workable solution path for pgpool? Switching
over to that would fail on older servers.

regards, tom lane

#22Tatsuo Ishii
ishii@postgresql.org
In reply to: Tom Lane (#21)
Re: pgpool versus sequences

Yeah -- why is LOCK SEQUENCE foo_seq not allowed? Seems a simple thing
to have.

I don't see any particular reason to continue to disallow it, but does
that actually represent a workable solution path for pgpool? Switching
over to that would fail on older servers.

pgpool will provide following method for older version of PostgreSQL.

Probably creating a "secret" relation and acquire table locking
on it is the way to go. This is essentially a dirty alternative for
sequence table locking.

--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#17)
Re: pgpool versus sequences

I wrote:

Please note also that what pgpool users have got right now is a time
bomb, which is not better than immediately-visible breakage.

BTW, so far as that goes, I suggest that we tweak nextval() and setval()
to force the sequence tuple's xmax to zero. That will provide a simple
recovery path for anyone who's at risk at the moment. Of course, this
has to go hand-in-hand with the change to forbid SELECT FOR UPDATE,
else those operations would risk breaking active tuple locks.

regards, tom lane

#24Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#20)
Re: pgpool versus sequences

On Wed, Jun 1, 2011 at 7:47 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Tatsuo Ishii's message of mié jun 01 19:08:16 -0400 2011:

What pgpool really wanted to do was locking sequence tables, not
locking rows in sequences. I wonder why the former is not allowed.

Yeah -- why is LOCK SEQUENCE foo_seq not allowed?  Seems a simple thing
to have.

It cause a grammar conflict. Since SEQUENCE and NOWAIT are both
unreserved keywords, it's not clear to the parser whether "LOCK
SEQUENCE NOWAIT" means to lock a table called SEQUENCE without
waiting, or whether it means to lock a sequence called NOWAIT.

Tom and I discussed possible ways of fixing this on -hackers a few
months ago. Currently the syntax for LOCK is:

LOCK [ TABLE ] [ ONLY ] name [, ...] [ IN lockmode MODE ] [ NOWAIT ];

I suggested fixing this by making TABLE required, thus:

LOCK TABLE [ ONLY ] name [, ...] [ IN lockmode MODE ] [ NOWAIT ];

Tom suggested fixing it by making NOWAIT require IN lockmode MODE, thus:

LOCK [ TABLE ] [ ONLY ] name [,...] [ IN lockmode MODE [ NOWAIT ]];

My proposed fix is probably more likely to break people's
applications, but Tom's isn't completely free from that possibility
either. It's also somewhat counterintuitive IMV. The best option
might be to come up with some completely new syntax that is a little
better designed than the current one, maybe along the lines of the
extensible-options syntax used by EXPLAIN. The trouble is that the
first word of the command would probably have to be something other
than LOCK if we don't want to break backward compatibility with the
existing syntax in some way, and there aren't too many good synonyms
for LOCK. LATCH? FASTEN? Blech. We're probably going to end up
having to make a compatibility break here if we want to support this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#25Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#17)
Re: pgpool versus sequences

On Wed, Jun 1, 2011 at 6:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Please note also that what pgpool users have got right now is a time
bomb, which is not better than immediately-visible breakage.  I would
prefer to try to get this change out ahead of widespread adoption of the
broken pgpool version.

Hmm, I gather from what Tatsuo is saying at the web site that this has
only been broken since the release of 3.0 on February 23rd, so given
that I think your approach makes sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#24)
Re: pgpool versus sequences

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 1, 2011 at 7:47 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Yeah -- why is LOCK SEQUENCE foo_seq not allowed? �Seems a simple thing
to have.

It cause a grammar conflict.

That's a lot of work for a purely cosmetic issue, though. What would be
trivial is to let this work:

regression=# create sequence s1;
CREATE SEQUENCE
regression=# begin;
BEGIN
regression=# lock table s1;
ERROR: "s1" is not a table

We should do that anyway, even if we put in the effort to support the
other syntax.

regards, tom lane

#27Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#26)
Re: pgpool versus sequences

On Thu, Jun 2, 2011 at 10:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 1, 2011 at 7:47 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Yeah -- why is LOCK SEQUENCE foo_seq not allowed?  Seems a simple thing
to have.

It cause a grammar conflict.

That's a lot of work for a purely cosmetic issue, though.  What would be
trivial is to let this work:

regression=# create sequence s1;
CREATE SEQUENCE
regression=# begin;
BEGIN
regression=# lock table s1;
ERROR:  "s1" is not a table

We should do that anyway, even if we put in the effort to support the
other syntax.

Ugh. We are already stuck supporting all kinds of backward
compatibility cruft in tablecmds.c as a result of the fact that you
used to have to use ALTER TABLE to operate on views and sequences.
The whole thing is confusing and a mess. -1 from me on extending that
mess to more places.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#27)
Re: pgpool versus sequences

Robert Haas <robertmhaas@gmail.com> writes:

Ugh. We are already stuck supporting all kinds of backward
compatibility cruft in tablecmds.c as a result of the fact that you
used to have to use ALTER TABLE to operate on views and sequences.
The whole thing is confusing and a mess.

[ shrug... ] I don't find it so. We have a convention that TABLE is
an umbrella term for all applicable relation types. End of story.

Even if you disagree with that, the convention does exist, and making
LOCK the one command type that disobeys it doesn't seem like a good
plan.

regards, tom lane

#29Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#26)
Re: pgpool versus sequences

Excerpts from Tom Lane's message of jue jun 02 10:31:58 -0400 2011:

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 1, 2011 at 7:47 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Yeah -- why is LOCK SEQUENCE foo_seq not allowed? Seems a simple thing
to have.

It cause a grammar conflict.

That's a lot of work for a purely cosmetic issue, though. What would be
trivial is to let this work:

regression=# create sequence s1;
CREATE SEQUENCE
regression=# begin;
BEGIN
regression=# lock table s1;
ERROR: "s1" is not a table

Yeah, though it'd be nice to avoid this:

alvherre=# create schema public_too;
CREATE SCHEMA
alvherre=# set search_path to 'public_too', 'public';
SET
alvherre=# create table public_too.s1 ();
CREATE TABLE
alvherre=# create sequence public.s1;
CREATE SEQUENCE
alvherre=# begin;
BEGIN
alvherre=# lock s1;
LOCK TABLE

At this point we have a lock on the table, but if we change LOCK to also
look for sequences, the behavior would change. At the very least, the
command tag should be different.

Hopefully few people name sequences the same as tables ...

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#29)
Re: pgpool versus sequences

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Tom Lane's message of jue jun 02 10:31:58 -0400 2011:

That's a lot of work for a purely cosmetic issue, though. What would be
trivial is to let this work:
regression=# lock table s1;
ERROR: "s1" is not a table

Yeah, though it'd be nice to avoid this:

alvherre=# create schema public_too;
CREATE SCHEMA
alvherre=# set search_path to 'public_too', 'public';
SET
alvherre=# create table public_too.s1 ();
CREATE TABLE
alvherre=# create sequence public.s1;
CREATE SEQUENCE
alvherre=# begin;
BEGIN
alvherre=# lock s1;
LOCK TABLE

At this point we have a lock on the table, but if we change LOCK to also
look for sequences, the behavior would change.

No it wouldn't. You seem to be imagining that sequences live in a
different namespace from tables, but they don't. There can only be one
relation that "s1" will refer to for any search_path setting.

regards, tom lane

#31Alvaro Herrera
alvherre@commandprompt.com
In reply to: Tom Lane (#30)
Re: pgpool versus sequences

Excerpts from Tom Lane's message of jue jun 02 11:10:00 -0400 2011:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Excerpts from Tom Lane's message of jue jun 02 10:31:58 -0400 2011:

That's a lot of work for a purely cosmetic issue, though. What would be
trivial is to let this work:
regression=# lock table s1;
ERROR: "s1" is not a table

Yeah, though it'd be nice to avoid this:

alvherre=# create schema public_too;
CREATE SCHEMA
alvherre=# set search_path to 'public_too', 'public';
SET
alvherre=# create table public_too.s1 ();
CREATE TABLE
alvherre=# create sequence public.s1;
CREATE SEQUENCE
alvherre=# begin;
BEGIN
alvherre=# lock s1;
LOCK TABLE

At this point we have a lock on the table, but if we change LOCK to also
look for sequences, the behavior would change.

No it wouldn't. You seem to be imagining that sequences live in a
different namespace from tables, but they don't. There can only be one
relation that "s1" will refer to for any search_path setting.

Doh, I see that I messed up and reversed the schemas in the search_path
line above. If I fix that I get the expected error:

alvherre=# set search_path to 'public', 'public_too';
SET
alvherre=# lock s1;
ERROR: «s1» no es una tabla

("s1" is not a table). What I was imagining was that LOCK was using
search path to look only for tables and ignoring sequences.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#32Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#28)
Re: pgpool versus sequences

On Thu, Jun 2, 2011 at 10:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Ugh.  We are already stuck supporting all kinds of backward
compatibility cruft in tablecmds.c as a result of the fact that you
used to have to use ALTER TABLE to operate on views and sequences.
The whole thing is confusing and a mess.

[ shrug... ]  I don't find it so.  We have a convention that TABLE is
an umbrella term for all applicable relation types.  End of story.

Even if you disagree with that, the convention does exist, and making
LOCK the one command type that disobeys it doesn't seem like a good
plan.

I agree that wouldn't be a good plan to make LOCK inconsistent with
everything else, but LOCK is not the only case that's like this:

rhaas=# drop table v1;
ERROR: "v1" is not a table
HINT: Use DROP VIEW to remove a view.
rhaas=# comment on table v1 is 'v1 is a view';
ERROR: "v1" is not a table
rhaas=# load 'dummy_seclabel';
LOAD
rhaas=# security label on table v1 is 'classified';
ERROR: "v1" is not a table

As far as I can see, ALTER TABLE is just about the only place where we
allow this; and only for certain command types. Your commit message
seems to indicate that we continue to allow that stuff only for
backward-compatibility:

commit a0b012a1ab85ae115f30e5e4fe09922b4885fdad
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sun Jun 15 01:25:54 2008 +0000

Rearrange ALTER TABLE syntax processing as per my recent proposal: the
grammar allows ALTER TABLE/INDEX/SEQUENCE/VIEW interchangeably for all
subforms of those commands, and then we sort out what's really legal
at execution time. This allows the ALTER SEQUENCE/VIEW reference pages
to fully document all the ALTER forms available for sequences and views
respectively, and eliminates a longstanding cause of confusion for users.

The net effect is that the following forms are allowed that weren't before:
ALTER SEQUENCE OWNER TO
ALTER VIEW ALTER COLUMN SET/DROP DEFAULT
ALTER VIEW OWNER TO
ALTER VIEW SET SCHEMA
(There's no actual functionality gain here, but formerly you had to say
ALTER TABLE instead.)

Interestingly, the grammar tables actually get smaller, probably because
there are fewer special cases to keep track of.

I did not disallow using ALTER TABLE for these operations. Perhaps we
should, but there's a backwards-compatibility issue if we do; in fact
it would break existing pg_dump scripts. I did however tighten up
ALTER SEQUENCE and ALTER VIEW to reject non-sequences and non-views
in the new cases as well as a couple of cases where they didn't before.

The patch doesn't change pg_dump to use the new syntaxes, either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#25)
Re: pgpool versus sequences

Robert Haas <robertmhaas@gmail.com> writes:

On Wed, Jun 1, 2011 at 6:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Please note also that what pgpool users have got right now is a time
bomb, which is not better than immediately-visible breakage. �I would
prefer to try to get this change out ahead of widespread adoption of the
broken pgpool version.

Hmm, I gather from what Tatsuo is saying at the web site that this has
only been broken since the release of 3.0 on February 23rd, so given
that I think your approach makes sense.

Done, and I also installed a kluge to clean up the damage retroactively
during any nextval/setval operation.

regards, tom lane