ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Our database has started reporting errors like this:
2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630
...
2017-06-01 11:06:56 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630
(157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
corresponds to a table with around 168 million rows.
These went away, but the next day we got similar errors from another
table:
2017-06-02 05:59:50 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100
...
2017-06-02 06:14:54 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100
(Only 4 this time.) pg_toast_10920100 corresponds to a table which holds
around 320 million rows (these are our two large tables).
The next day we got 6 such errors and the day after 10 such errors. On
June 5th we got 94, yesterday we got 111, of which one looked a little
different:
2017-06-06 17:32:21 CEST ERROR: unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100 in pg_toast_10920100
and today the logs have 65 lines, ending with these:
2017-06-07 14:49:53 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100
2017-06-07 14:53:41 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100
The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
TB RAM on Ubuntu 14.04 (Linux 3.18.13).
We are updating rows in the database a lot/continuously.
There are no apparent indications of hardware errors (like ECC) in
dmesg, nor any error messages logged by the LSI MegaRAID controller, as
far as I can tell.
We are running PostgreSQL 9.3.14 currently.
The only thing I could see in the release notes since 9.3.14 that might
be related is this:
"* Avoid very-low-probability data corruption due to testing tuple
visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
Tom Lane)"
Although reading more about it, it doesn't sound like it would exhibit
the symptoms we see?
We have recently increased the load (to around twice the number of
cores), though, which made me think we could be triggering corner cases
we haven't hit before.
We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear
if anyone has seen something like this, or have some ideas of how to
investigate/what the cause might be.
Best regards,
Adam
--
"Lägg ditt liv i min hand Adam Sjøgren
Sälj din själ till ett band" adsj@novozymes.com
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:
First try to find which tables those toast relations refer to :
select 10919630::regclass , 10920100::regclass ;
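The reverse mapping also works, if you want to confirm which toast relation belongs to a given table (a standard catalog query; 'mytable' is a placeholder name):

```sql
-- Map a table to its toast relation via pg_class.reltoastrelid
SELECT c.relname, t.relname AS toast_table
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'mytable';  -- replace with your table name
```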
Are those critical tables? Can you restore them somehow?
Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE <name of table 10920100>;
REINDEX TABLE <name of table 10919630>;
also VACUUM the above tables.
You might want to write a function which iterates over the damaged table's rows in order to identify the damaged row(s). And then do some good update to create a new version.
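A minimal sketch of such a probe could look like this (table name `mytable`, key column `id` and text column `payload` are placeholders; reading the full value forces every toast chunk to be fetched):

```sql
DO $$
DECLARE
  r RECORD;
  len INT;
BEGIN
  FOR r IN SELECT id FROM mytable LOOP
    BEGIN
      -- length() detoasts the whole value; a broken chunk chain errors here
      SELECT length(payload) INTO len FROM mytable WHERE id = r.id;
    EXCEPTION WHEN OTHERS THEN
      RAISE NOTICE 'row id % is damaged: %', r.id, SQLERRM;
    END;
  END LOOP;
END;
$$;
```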
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Hi,
I too have been experiencing this with a busy PostgreSQL instance.
I have been following the updates to the 9.4 branch hoping a fix will appear, but sadly no luck yet. I have manually replicated the issue on 9.4.4, 9.4.10 and 9.4.12. My replication steps are:
BEGIN;
CREATE TABLE x (id BIGSERIAL PRIMARY KEY, payload1 VARCHAR, payload2 VARCHAR, payload3 VARCHAR, payload4 BIGINT, payload5 BIGINT);
/* Repeat until 2,000,000 rows are inserted */
INSERT INTO x (id, payload1, payload2, payload3, payload4, payload5) VALUES (random values of varying length/size to force random toast usage);
COMMIT;
VACUUM (ANALYZE, FULL);
BEGIN;
/* Repeat until all 2,000,000 rows are updated */
UPDATE x SET payload1 = , payload2 = , payload3 = , payload4 = , payload5 = ... again random values of varying length/size to force random toast usage
COMMIT;
VACUUM (ANALYZE, FULL);
The second vacuum causes an ERROR identical to the one you are reporting below (unexpected chunk number n (expected n) for toast value...). However it may take up to ten attempts to replicate it.
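For concreteness, the randomized INSERT step above could be sketched like this (the id comes from the sequence; repeat(md5(...)) is just one hypothetical way to vary payload sizes so that some values are toasted and others stay inline):

```sql
INSERT INTO x (payload1, payload2, payload3, payload4, payload5)
SELECT repeat(md5(random()::text), 1 + (random() * 200)::int),
       repeat(md5(random()::text), 1 + (random() * 200)::int),
       repeat(md5(random()::text), 1 + (random() * 200)::int),
       (random() * 1e9)::bigint,
       (random() * 1e9)::bigint
FROM generate_series(1, 2000000);
```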
Out of interest, are you using any tablespaces other than pg_default? I can only replicate the issue when using separately mounted tablespaces.
I have been investigating this quite extensively and everything I can find on the web suggests data corruption. However, running the following DO block reports no errors and I can dump the database without issue.
DO $$
DECLARE
curid INT := 0;
vcontent RECORD;
badid BIGINT;
payload1_sub VARCHAR;
payload2_sub VARCHAR;
payload3_sub VARCHAR;
payload4_sub VARCHAR;
payload5_sub VARCHAR;
BEGIN
FOR badid IN SELECT id FROM x
LOOP
curid = curid + 1;
IF curid % 100000 = 0
THEN
RAISE NOTICE '% rows inspected', curid;
END IF;
BEGIN
SELECT *
INTO vcontent
FROM x
WHERE id = badid;
-- SUBSTR forces each payload to be detoasted; corruption errors out here
payload1_sub := SUBSTR(vcontent.payload1,2000,5000);
payload2_sub := SUBSTR(vcontent.payload2,2000,5000);
payload3_sub := SUBSTR(vcontent.payload3,2000,5000);
payload4_sub := SUBSTR(vcontent.payload4::VARCHAR,2000,5000);
payload5_sub := SUBSTR(vcontent.payload5::VARCHAR,2000,5000);
EXCEPTION WHEN OTHERS THEN
RAISE NOTICE 'Data for id % is corrupt', badid;
CONTINUE;
END;
END LOOP;
END;
$$;
I also found the following has been reported: /messages/by-id/20161201165505.4360.28203@wrigleys.postgresql.org
Best wishes,
Harry
On 07/06/2017 17:49, Harry Ambrose wrote:
Hi,
Out of interest, are you using any tablespaces other than pg_default? I can only replicate the issue when using separately mounted tablespaces.
One lesson I learned from the BSD camp when dealing with random freezes and panics: when all else fails to give an answer, it is time to start blaming the hardware. Are those tablespaces on any cheap SSDs?
I have been investigating this quite extensively and everything I can find on the web suggests data corruption. However running the the following DO reports no errors and I can dump the database
without issue.
pg_dump does not use indexes. If only the index is corrupted you can get away with a dump/reload (but for big DBs this is unrealistic).
I also found the following has been reported: /messages/by-id/20161201165505.4360.28203@wrigleys.postgresql.org
Best wishes,
Harry
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Harry Ambrose <harry.ambrose@gmail.com> writes:
I have been following the updates to the 9.4 branch hoping a fix will appear, but sadly no luck yet. I have manually replicated the issue on 9.4.4, 9.4.10 and 9.4.12. My replication steps are:
This is a very interesting report, but you didn't actually provide a
reproducer, just a handwavy outline. If you submit a script that
makes this happen, we will most definitely look into it. But
people aren't going to be excited about trying to reverse-engineer
a test case out of a vague description.
I also found the following has been reported:
/messages/by-id/20161201165505.4360.28203@wrigleys.postgresql.org
That person never came back with a self-contained test case, either.
https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
regards, tom lane
Hi,
Thanks for the responses.
"One lesson I learned from the BSD camp when dealing with random freezes and panics : when all else fails to give an answer it is time to start blaming my hardware. Are those tablespaces on any cheap SSD's ?”
The tablespaces are not on SSDs; something I had also considered.
Tom - I can provide a jar that I have been using to replicate the issue. What's the best transport method to send it over?
Best wishes,
Harry
On 7 Jun 2017, at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Harry Ambrose <harry.ambrose@gmail.com> writes:
Tom - I can provide a jar that I have been using to replicate the issue. What's the best transport method to send it over?
If it's not enormous, just send it as an email attachment.
regards, tom lane
Hi,
Please find the jar attached (renamed with a .txt extension as I know some
email services deem jars a security issue).
The jar accepts the following arguments:
$1 = host
$2 = database
$3 = username
$4 = password
$5 = port
It returns its logging to STDOUT. Please let me know if you require further
info.
Best wishes,
Harry
On 7 June 2017 at 17:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Attachment: toast-corrupter-aio.txt
Achilleas writes:
First try to find which tables those toast relations refer to :
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?
They are our two big tables, containing the bulk of our data (one with
168M rows, the other with 320M rows).
They are constantly being updated, but if I can identify the affected
rows, I can restore a backup on another machine and cherry pick them
from there.
Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE <name of table 10920100>;
REINDEX TABLE <name of table 10919630>;
also VACUUM the above tables.
Yes, but I'd like to find out why it happens, because cleaning up
and having the corruption reoccur is not so fun.
You might want to write a function which iterates over the damaged
table's rows in order to identify the damaged row(s). And then do some
good update to create a new version.
Yes - we started by doing a quick pg_dump, but I guess we should switch
to something that can tell us exactly what rows hit the problem.
Does anyone have a handy little script lying around?
Thanks for the response!
Adam
--
"Lägg ditt liv i min hand Adam Sjøgren
Sälj din själ till ett band" adsj@novozymes.com
Harry writes:
The second vacuum causes an ERROR identical to that you are reporting
below (unexpected chunk number n (expected n) for toast value...).
However it may take up to ten attempts to replicate it.
Interesting.
Out of interest, are you using any tablespaces other than pg_default?
I can only replicate the issue when using separately mounted
tablespaces.
No, we are using pg_default only.
I hope your finding can be reproduced, it would be really interesting to
see.
Best regards,
Adam
--
"Lägg ditt liv i min hand Adam Sjøgren
Sälj din själ till ett band" adsj@novozymes.com
On 06/08/2017 08:13 AM, ADSJ (Adam Sjøgren) wrote:
Does anyone have a handy little script lying around?
http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html
--
Adrian Klaver
adrian.klaver@aklaver.com
Harry Ambrose <harry.ambrose@gmail.com> writes:
Please find the jar attached (renamed with a .txt extension as I know some
email services deem jars a security issue).
Hmm, the output from this script reminds me quite a lot of one I was
sent in connection with bug #14444 awhile back:
/messages/by-id/20161201165505.4360.28203@wrigleys.postgresql.org
Was that a colleague of yours?
Anyway, the bad news is I couldn't reproduce the problem then and I can't
now. I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating. Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?
regards, tom lane
Hi Tom,
Thanks for attempting to replicate the issue.
Anyway, the bad news is I couldn't reproduce the problem then and I can't
now. I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating. Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?
Further details about the environment on which I can replicate are below:
- Non default postgresql.conf settings:
checkpoint_segments = 192
checkpoint_completion_target = 0.9
checkpoint_timeout = 5min
wal_keep_segments = 256
wal_writer_delay = 200ms
archive_mode = on
archive_command = 'rsync -e ssh -arv /wal/pg_xlog/%f postgres@<removed
hostname>:/wal/pg_xlog'
archive_timeout = 60
syslog_facility = 'LOCAL0'
log_statement = 'mod'
syslog_ident = 'postgres'
log_line_prefix = '%h %m %p %c %u %a %e '
log_timezone = 'GB'
track_activities = on
track_counts = on
datestyle = 'iso, mdy'
timezone = 'GB'
default_text_search_config = 'pg_catalog.english'
array_nulls = on
sql_inheritance = on
standard_conforming_strings = on
synchronize_seqscans = on
transform_null_equals = off
- Two node master/slave setup using streaming replication (without slots).
- CentOS 6.9 (2.6.32-696.el6.x86_64).
- PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit.
- 64GiB RAM.
- AMD Opteron(TM) Processor 6238.
- pg_default sits on a 2 disk RAID-1 configuration (ext3 filesystem).
- Custom tablespaces (where the errors occur) sit on a 4 disk RAID-10 (ext3
filesystem).
- All disks are HP 600G SAS 6.0Gbps with P420(i) controllers and battery
backed cache enabled.
Please let me know if you require further info.
Best wishes,
Harry
On 09/06/2017 14:44, Harry Ambrose wrote:
Maybe you could give some info on:
- your ext3 mkfs and mount options (journal, barriers, etc.)
- your controller setup (battery should be working good and cache mode set to write back)
- your disks setup (write cache should be disabled)
- you should check your syslogs/messages for any errors related to storage
- is your RAM ECC? Did you run any memtest?
- is your CPU overheating?
- have you experienced any crashes/freezes?
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
Hi,
Maybe you could give some info on :
- your ext3 mkfs and mount options (journal, barriers, etc)
/etc/fstab details below:
LABEL=/var/lib/pgsql  /var/lib/pgsql  ext3  defaults  1 2
LABEL=/tablespace1    /tablespace1    ext3  defaults  1 2
LABEL=/tablespace2    /tablespace2    ext3  defaults  1 2
LABEL=/tablespace3    /tablespace3    ext3  defaults  1 2
pg_default:
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 36634624
Block count: 146506767
Reserved block count: 7325338
Free blocks: 143785740
Free inodes: 36627866
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 989
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 64
RAID stripe width: 64
Filesystem created: Fri Aug 9 16:11:53 2013
Last mount time: Fri Apr 21 22:37:02 2017
Last write time: Fri Apr 21 22:37:02 2017
Mount count: 2
Maximum mount count: 100
Last checked: Thu Sep 15 18:52:43 2016
Check interval: 31536000 (12 months, 5 days)
Next check after: Fri Sep 15 18:52:43 2017
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Journal backup: inode blocks
tablespaces
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 73261056
Block count: 293013543
Reserved block count: 14650677
Free blocks: 286208439
Free inodes: 73174728
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 954
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 64
RAID stripe width: 128
Filesystem created: Fri Aug 9 16:11:53 2013
Last mount time: Fri Apr 21 22:37:02 2017
Last write time: Fri Apr 21 22:37:02 2017
Mount count: 2
Maximum mount count: 100
Last checked: Thu Sep 15 18:52:43 2016
Check interval: 31536000 (12 months, 5 days)
Next check after: Fri Sep 15 18:52:43 2017
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Journal backup: inode blocks
- your controller setup (battery should be working good and cache mode set
to write back)
Cache Board Present: True
Cache Status: OK
Cache Ratio: 10% Read / 90% Write
Drive Write Cache: Disabled
Total Cache Size: 2.0 GB
Total Cache Memory Available: 1.8 GB
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: False
SSD Caching Version: 1
Cache Backup Power Source: Capacitors
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
- your disks setup (write cache should be disabled)
Write cache is disabled, see above.
- you should check your syslogs/messages for any errors related to storage
No error messages found.
- is your RAM ECC? Did you run any memtest?
Yes, memory is ECC. No error messages found.
- is your CPU overheating ?
No overheating issues.
- have you experienced any crashes/freezes ?
No crashes/freezes experienced.
Best wishes,
Harry
Hi guys,
I am new to Postgres and I am trying to write my first function to insert,
update or delete and trap errors as a result of the table not existing,
the columns not existing, or if any other error occurs simply pass back the
SQLSTATE. Here's my code, can you help?
CREATE OR REPLACE FUNCTION listings_audit() RETURNS TRIGGER AS
$listings_audit$
BEGIN
IF (TG_OP = 'DELETE') THEN
IF (EXISTS (
SELECT 1
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'schema_name'
AND c.relname = 'table_name'
AND c.relkind = 'r' -- only tables
)) THEN
INSERT INTO listings_changes
SELECT now(), 'DELETE', OLD.*;
RETURN OLD;
ELSE RAISE EXCEPTION 'Table does not exist';
END IF;
ELSIF (TG_OP = 'UPDATE') THEN
IF (EXISTS (
SELECT 1
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'schema_name'
AND c.relname = 'table_name'
AND c.relkind = 'r' -- only tables
)) THEN
INSERT INTO listings_changes
SELECT now(), 'UPDATE', NEW.*;
RETURN NEW;
ELSE RAISE EXCEPTION 'Table does not exist';
END IF;
ELSIF (TG_OP = 'INSERT') THEN
INSERT INTO listings_changes
SELECT now(), 'INSERT', NEW.*;
RETURN NEW;
END IF;
EXCEPTION
WHEN SQLSTATE '42611' THEN
RAISE EXCEPTION 'Columns in audit table do not match user table';
WHEN SQLSTATE '42P16' THEN
RAISE EXCEPTION 'Table does not exist';
WHEN OTHERS THEN
-- A RETURN after RAISE EXCEPTION is unreachable; report the SQLSTATE directly
RAISE EXCEPTION 'PostgreSQL error % has occurred', SQLSTATE;
END;
$listings_audit$ LANGUAGE plpgsql;
On Thu, Jun 8, 2017 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Lakes <robertl@propaas.com> writes:
I am new to postgress and I am trying to write my first function to insert,
update or delete and trap errors as a result of the table not existing ,
the columns not exist or if any other error simply pass back the sqlstate
Please do not hijack an existing thread to ask an unrelated question.
Start a new thread (ie "compose" don't "reply") and use an appropriate
subject line.
regards, tom lane
ADSJ (Adam Sjøgren) wrote:
Our database has started reporting errors like this:
2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630
Does the problem still reproduce if you revert commit
6c243f90ab6904f27fa990f1f3261e1d09a11853?
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services