error updating a tuple after promoting a standby

Started by Tom DalPozzoover 9 years ago9 messagesgeneral
Jump to latest
#1Tom DalPozzo
t.dalpozzo@gmail.com

Hi,
I was doing some tests with backup, replication, standby. After promoting a
standby server, I found my db in a condition that raises me an error while
trying to update a particular tuple.
Below here you can se my UPDATE statment and the error raised.
The select * from stato where id=409; executed immediately after worked
well however.
I checked the file and it's readable.
Before my standby promotion test I performed millions of this UPDATE
statments without problem on my db.
I can not reproduce the issue.

Perhaps I did something wrong during my test but I don't know what. I
didn't touch any file in base directory however.
Anyway I'd like to know if in your opinion it's possible that this error
was caused by something wrong done by me or if it should never happen as
the file is perfectly readable.

Regards
Pupillo

psql (9.5.4)
Type "help" for help.

ginopino=# UPDATE stato SET
dati='\x5353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353'
WHERE id=409;
ERROR: could not read block 12281 in file "base/16384/29153": read only 0
of 8192 bytes
ginopino=# select * from stato where id=409; <<< IT WORKS FINE

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tom DalPozzo (#1)
Re: error updating a tuple after promoting a standby

On 12/21/2016 01:51 AM, Tom DalPozzo wrote:

Hi,
I was doing some tests with backup, replication, standby. After
promoting a standby server, I found my db in a condition that raises me
an error while trying to update a particular tuple.
Below here you can se my UPDATE statment and the error raised.
The select * from stato where id=409; executed immediately after worked
well however.
I checked the file and it's readable.
Before my standby promotion test I performed millions of this UPDATE
statments without problem on my db.
I can not reproduce the issue.

Perhaps I did something wrong during my test but I don't know what. I
didn't touch any file in base directory however.
Anyway I'd like to know if in your opinion it's possible that this error
was caused by something wrong done by me or if it should never happen as
the file is perfectly readable.

Regards
Pupillo

psql (9.5.4)
Type "help" for help.

ginopino=# UPDATE stato SET
dati='\x5353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353'
WHERE id=409;
ERROR: could not read block 12281 in file "base/16384/29153": read only

First I would find what base/16384/29153 actually is. So in the database
where stato is:

select relname from pg_class where relfilenode = 29153;

0 of 8192 bytes
ginopino=# select * from stato where id=409; <<< IT WORKS FINE

But does it have the updated info?

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Tom DalPozzo
t.dalpozzo@gmail.com
In reply to: Adrian Klaver (#2)
Re: error updating a tuple after promoting a standby

Hi,

First I would find what base/16384/29153 actually is. So in the database
where stato is:

select relname from pg_class where relfilenode = 29153;

below here the query you suggested, showing that file belongs to stato
table as expected.
ginopino=# select relname from pg_class where relfilenode = 29153;
relname
---------
stato
(1 row)

0 of 8192 bytes

ginopino=# select * from stato where id=409; <<< IT WORKS FINE

But does it have the updated info?

Yes, it correctly shows the expected data before issuing the update query
which generates the problem:
ginopino=# select * from stato where id=409;
id | ....... dati .......
-----+-------------- .......
409 | \x4a735300db8f4b31ab8660f85192bc....................

--

Adrian Klaver
adrian.klaver@aklaver.com

Thanks
Pupillo

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tom DalPozzo (#3)
Re: error updating a tuple after promoting a standby

On 12/21/2016 08:17 AM, Tom DalPozzo wrote:

Hi,

First I would find what base/16384/29153 actually is. So in the
database where stato is:

select relname from pg_class where relfilenode = 29153;

below here the query you suggested, showing that file belongs to stato
table as expected.
ginopino=# select relname from pg_class where relfilenode = 29153;
relname
---------
stato
(1 row)

Is there an index on this table?

Have you tried a REINDEX on it?

In your original post you mention this error occurred while testing
backup/replication/standby promotion.

What was the procedure you followed in doing the testing?

0 of 8192 bytes
ginopino=# select * from stato where id=409; <<< IT WORKS FINE

But does it have the updated info?

Yes, it correctly shows the expected data before issuing the update
query which generates the problem:
ginopino=# select * from stato where id=409;
id | ....... dati .......
-----+-------------- .......
409 | \x4a735300db8f4b31ab8660f85192bc....................

--

Adrian Klaver
adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>

Thanks
Pupillo

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Tom DalPozzo
t.dalpozzo@gmail.com
In reply to: Adrian Klaver (#4)
Re: error updating a tuple after promoting a standby

Is there an index on this table?

Have you tried a REINDEX on it?

yes there is an index on id field. I tried REINDEX. Nothing changes but I

notice now (but perhaps it was like that even before reindexing) that every
time I issue that UPDATE query, the number of the block it can't read
increases by one. Now, after some attempts: ERROR: could not read block
12289 in file "base/16384/29153": read only 0 of 8192 bytes.

In your original post you mention this error occurred while testing
backup/replication/standby promotion.

What was the procedure you followed in doing the testing?

Unfortunately I don't remember every step as I was focused on completely
other things... Anyway, in synthesis:
1 pg_basebackup on primary and added, to the just created backup pg_xlog
dir, the needed WAL files according to the .label file (I'm trying without
archiving) .
2 copied the backup dir to my two standby PCs (one is for sync streaming
replication, the other async).
3 configured recovery.conf ecc... on the standby PCs.
4 started the two standby servers . The first was in sync replication, the
second in async. Messages were OK.
5 Updated some thousands of rows in the primary just to check that it
worked fine.
6 stopped the primary
7 promoted the 1st standby (new primary).
8 stopped/reconfigured/restarted the 2nd standby (async replication) to
point to the 1st standby.
9 checked that all messages were ok in both active PCs.
10 tried to update on the new primary getting the error (perhaps after some
successful updates but I'm not sure).

A new thing:
I noticed that, always restarting from the corrupted cluster (without
reindex I mean), if I update the row id=409 with few data (3 bytes), then
it works and after that, even updating with that long data works.

Regards
Pupillo

#6Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Tom DalPozzo (#5)
Re: error updating a tuple after promoting a standby

On 12/21/2016 10:06 AM, Tom DalPozzo wrote:

Is there an index on this table?

Have you tried a REINDEX on it?

yes there is an index on id field. I tried REINDEX. Nothing changes but
I notice now (but perhaps it was like that even before reindexing) that
every time I issue that UPDATE query, the number of the block it can't
read increases by one. Now, after some attempts: ERROR: could not read
block 12289 in file "base/16384/29153": read only 0 of 8192 bytes.

Unfortunately I don't remember every step as I was focused on completely
other things... Anyway, in synthesis:
1 pg_basebackup on primary and added, to the just created backup pg_xlog
dir, the needed WAL files according to the .label file (I'm trying
without archiving) .

If it where me I would use one of the -X methods:

https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html

A new thing:
I noticed that, always restarting from the corrupted cluster (without
reindex I mean), if I update the row id=409 with few data (3 bytes),
then it works and after that, even updating with that long data works.

To me that looks like an issue with the associated TOAST table. I do not
have a suggestion at this time. Maybe this rings a bell with someone else.

Regards
Pupillo

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Tom DalPozzo
t.dalpozzo@gmail.com
In reply to: Adrian Klaver (#6)
Re: error updating a tuple after promoting a standby

If it where me I would use one of the -X methods:

https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html

To me that looks like an issue with the associated TOAST table. I do not

have a suggestion at this time. Maybe this rings a bell with someone else.

--
Adrian Klaver
adrian.klaver@aklaver.com

Good tip, x method, I will try it, sure.
But I'm also afraid that there is a still hidden issue.
I think that I'm using postgres in a context slightly different from the
usual ones (many many updates of a table rows with big field,
replication,...) so I'm afraid that issues not already seen can appear.
Regards
Pupillo

#8Brian Sutherland
brian@vanguardistas.net
In reply to: Tom DalPozzo (#1)
Re: error updating a tuple after promoting a standby

Perhaps try 9.5.5 which has a fix for a problem with the same symptoms:

https://wiki.postgresql.org/wiki/Free_Space_Map_Problems
https://www.postgresql.org/docs/9.5/static/release-9-5-5.html#AEN126074

On Wed, Dec 21, 2016 at 10:51:47AM +0100, Tom DalPozzo wrote:

Hi,
I was doing some tests with backup, replication, standby. After promoting a
standby server, I found my db in a condition that raises me an error while
trying to update a particular tuple.
Below here you can se my UPDATE statment and the error raised.
The select * from stato where id=409; executed immediately after worked
well however.
I checked the file and it's readable.
Before my standby promotion test I performed millions of this UPDATE
statments without problem on my db.
I can not reproduce the issue.

Perhaps I did something wrong during my test but I don't know what. I
didn't touch any file in base directory however.
Anyway I'd like to know if in your opinion it's possible that this error
was caused by something wrong done by me or if it should never happen as
the file is perfectly readable.

Regards
Pupillo

psql (9.5.4)
Type "help" for help.

ginopino=# UPDATE stato SET
dati='\x5353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353535353'
WHERE id=409;
ERROR: could not read block 12281 in file "base/16384/29153": read only 0
of 8192 bytes
ginopino=# select * from stato where id=409; <<< IT WORKS FINE

--
Brian Sutherland

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#9Tom DalPozzo
t.dalpozzo@gmail.com
In reply to: Brian Sutherland (#8)
Re: error updating a tuple after promoting a standby

2016-12-22 10:23 GMT+01:00 Brian Sutherland <brian@vanguardistas.net>:

Perhaps try 9.5.5 which has a fix for a problem with the same symptoms:

https://wiki.postgresql.org/wiki/Free_Space_Map_Problems
https://www.postgresql.org/docs/9.5/static/release-9-5-5.
html#AEN126074

Yes it was that!

I tried the procedure in wiki and it worked.
Thank you very much!
Pupillo