tuplesort: unexpected end of data
Hi all!
First of all, thanks for your fine job and the answers to a problem
(which I thought it was a PG bug) I posted some days ago. I've solved it!
Now I've got another strange issue to post...
Briefing: the program runs daily and this (please see below) error
does not always happen, actually this was the first time this week.
I hope you can help me?... If I'm doing something wrong, please let
me know...
Thanks in advance! :-)
[JDBC driver error report]
org.sourceforge.jxutil.sql.I18nSQLException: sqlError[tuplesort: unexpected end of data]
at org.sourceforge.jxdbcon.postgresql.PGExceptions.sqlError(Unknown Source)
at org.sourceforge.jxdbcon.postgresql.PGErrors.throwError(Unknown Source)
at org.sourceforge.jxdbcon.postgresql.PGResult.checkException(Unknown Source)
at org.sourceforge.jxdbcon.postgresql.PGExecResult.checkException(Unknown Source)
at org.sourceforge.jxdbcon.postgresql.PGConnection.executeSQL(Unknown Source)
at org.sourceforge.jxdbcon.postgresql.PGStatement.execute(Unknown Source)
at org.sourceforge.jxdbcon.AbstractStatement.executeQuery(Unknown Source)
(...)
[PostgreSQL log excerpt]
(...)
(comment: the query in question)
Jun 2 05:35:48 srv31 postgres[2986]: [57277] DEBUG: query: SELECT fill_warehouse()
Jun 2 05:35:48 srv31 postgres[2986]: [57278-1] DEBUG: query: insert into warehouse (uri, expression, n, relevance, spid_measure, size, title, sample) select d.uri,
Jun 2 05:35:48 srv31 postgres[2986]: [57278-2] dn.expression, n.n, dn.relevance, d.spid_measure, d.size, d.title, dn.sample from document as d inner join (document_n_gram as
Jun 2 05:35:48 srv31 postgres[2986]: [57278-3] dn inner join n_gram as n on (dn.expression = n.expression)) on (d.uri = dn.uri) order by dn.expression asc, n.n asc
Jun 2 06:37:08 srv31 postgres[11485]: [3317] DEBUG: recycled transaction log file 00000018000000DA
Jun 2 06:37:08 srv31 postgres[11485]: [3318] DEBUG: recycled transaction log file 00000018000000DB
(commment: this is a local monitor server)
Jun 2 06:39:39 srv31 postgres[11488]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 06:44:38 srv31 postgres[11498]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 06:49:39 srv31 postgres[11505]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 06:54:39 srv31 postgres[11515]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 06:59:38 srv31 postgres[11522]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 07:04:39 srv31 postgres[11541]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 07:09:39 srv31 postgres[11548]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 07:14:38 srv31 postgres[11558]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 07:19:38 srv31 postgres[11565]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 07:24:39 srv31 postgres[11575]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
(comment: the error)
Jun 2 06:29:37 srv31 postgres[2986]: [57279] ERROR: tuplesort: unexpected end of data
Jun 2 06:29:37 srv31 postgres[2986]: [57280] NOTICE: Error occurred while executing PL/pgSQL function fill_warehouse
Jun 2 06:29:37 srv31 postgres[2986]: [57281] NOTICE: line 2 at SQL statement
Jun 2 07:29:39 srv31 postgres[11582]: [3317] FATAL 1: No pg_hba.conf entry for host 193.136.120.30, user root, database template1
Jun 2 06:29:40 srv31 postgres[2986]: [57282] DEBUG: query: COMMIT
Jun 2 06:29:40 srv31 postgres[2986]: [57283] DEBUG: ProcessUtility: COMMIT
Jun 2 06:29:41 srv31 postgres[2985]: [48998] DEBUG: pq_recvbuf: unexpected EOF on client connection
Jun 2 06:29:41 srv31 postgres[2982]: [3335] DEBUG: pq_recvbuf: unexpected EOF on client connection
Jun 2 06:29:41 srv31 postgres[2986]: [57284] DEBUG: pq_recvbuf: unexpected EOF on client connection
(...)
[System info]
Red Hat Linux release 7.2 (Enigma)
Kernel 2.4.9-13
postgresql-libs-7.2.1-2PGDG
postgresql-server-7.2.1-2PGDG
postgresql-7.2.1-2PGDG
--
o__ Bem haja,
_.>/ _ NunoACHenriques
(_) \(_) ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/
NunoACHenriques <nach@fct.unl.pt> writes:
Jun 2 06:29:37 srv31 postgres[2986]: [57279] ERROR: tuplesort: unexpected end of data
Hmm. This is an internal consistency check in the sort code. Perhaps
you've found a bug, but there's not enough info here to do much. Can
you provide the EXPLAIN plan for the query that's triggering the error?
regards, tom lane
Hi!
I should say that this error is a non-deterministic one. It happened
once/twenty...
-------------------------------explain info---------------------------------
spid=> explain insert into warehouse_tmp (uri, expression, n, relevance, spid_measure, size, title, sample)
spid-> select d.uri, dn.expression, n.n, dn.relevance, d.spid_measure, d.size, d.title, dn.sample
spid-> from document as d
spid-> inner join (document_n_gram as dn
spid(> inner join n_gram as n
spid(> on (dn.expression = n.expression))
spid-> on (d.uri = dn.uri)
spid-> order by dn.expression asc, n.n asc ;
NOTICE: QUERY PLAN:
Subquery Scan *SELECT* (cost=3895109.07..3895109.07 rows=1009271 width=886)
-> Sort (cost=3895109.07..3895109.07 rows=1009271 width=886)
-> Hash Join (cost=1155071.81..2115045.12 rows=1009271 width=886)
-> Merge Join (cost=1154294.92..1170599.85 rows=1009271 width=588)
-> Sort (cost=1001390.67..1001390.67 rows=1009271 width=439)
-> Seq Scan on document_n_gram dn (cost=0.00..49251.71 rows=1009271 width=439)
-> Sort (cost=152904.25..152904.25 rows=466345 width=149)
-> Seq Scan on n_gram n (cost=0.00..12795.45 rows=466345 width=149)
-> Hash (cost=767.71..767.71 rows=3671 width=298)
-> Seq Scan on document d (cost=0.00..767.71 rows=3671 width=298)
EXPLAIN
spid=>
----------------------------------------------------------------------------
If you need more info, just ask. I'll be glad to contribute to a so
interesting project! :-)
--
o__ Bem haja,
_.>/ _ NunoACHenriques
(_) \(_) ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/
On Sun, 9 Jun 2002, Tom Lane wrote:
Show quoted text
NunoACHenriques <nach@fct.unl.pt> writes:
Jun 2 06:29:37 srv31 postgres[2986]: [57279] ERROR: tuplesort: unexpected end of data
Hmm. This is an internal consistency check in the sort code. Perhaps
you've found a bug, but there's not enough info here to do much. Can
you provide the EXPLAIN plan for the query that's triggering the error?regards, tom lane
NunoACHenriques <nach@fct.unl.pt> writes:
I should say that this error is a non-deterministic one. It happened
once/twenty...
Is the data in the tables changing constantly? If you can repeat the
same query on the same data and get varying results, then we're
dealing with something odder than I suspected.
regards, tom lane
Hi!
On Sun, 9 Jun 2002, Tom Lane wrote:
Is the data in the tables changing constantly? If you can repeat the
same query on the same data and get varying results, then we're
dealing with something odder than I suspected.regards, tom lane
Not constantly, once a day.
I cannot repeat the same query on the same data, but the app does the
same query over similar data (most static Web docs with little variation of
about 50 docs in 7000) but diferent to PG because of the drop/create
tables before the new fill...
--
o__ Bem haja,
_.>/ _ NunoACHenriques
(_) \(_) ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/
NunoACHenriques <nach@fct.unl.pt> writes:
On Sun, 9 Jun 2002, Tom Lane wrote:
Is the data in the tables changing constantly?
Not constantly, once a day.
Can't you set up a situation where the failure is reproducible, then?
On a day where you get the failure, dump the database and see if
you can load the data into a fresh database and reproduce the problem.
I spent some time looking at the tuplesort code and could not see any
reason for this failure. All that code has been fairly stable since
it was written for 7.0, and AFAIR no one else has reported this error
message. So either you've got a quite-unusual corner case that's
tickling a previously unseen bug, or you've got flaky hardware that for
some reason is manifesting in this way.
I don't necessarily believe the flaky-hardware theory, but I can't
make much progress on the bug theory without a test case to look at.
regards, tom lane
On Sun, 9 Jun 2002, Tom Lane wrote:
Can't you set up a situation where the failure is reproducible, then?
On a day where you get the failure, dump the database and see if
you can load the data into a fresh database and reproduce the problem.
Ok, I will do that...
I don't necessarily believe the flaky-hardware theory, but I can't
make much progress on the bug theory without a test case to look at.
Neither I believe it because the machine is well tested (including a
24h memtest). But there is something I can't get of my mind: once a day my
app "forces" PG to "play" with some 3GB of disk data in a ext2 fs. It is
known that sometimes ext2 corrupts data...
Thanks for the effort! :-)
--
o__ Bem haja,
_.>/ _ NunoACHenriques
(_) \(_) ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/
NunoACHenriques <nach@fct.unl.pt> writes:
Neither I believe it because the machine is well tested (including a
24h memtest). But there is something I can't get of my mind: once a day my
app "forces" PG to "play" with some 3GB of disk data in a ext2 fs. It is
known that sometimes ext2 corrupts data...
Unless you've got things set up so that the temporary files created for
the sorting step are in the ext2 partition, this doesn't seem like it
could be the source of the problem. A more plausible theory is that a
segment of main RAM is bad, but you happen not to use that part of RAM
except under heavy load (ie, while running this daily batch job).
Or it could just be a bug. If you can get a reproducible test case I'll
be happy to dig into it.
regards, tom lane
Hi!
A different error today:
[MemoryContextAlloc: invalid request size 4294967295]
This is a more often (twice a week) error and I don't understand
why?...
I'm verifying the machine: fsck (with bad blocks chk), ... but no
hardware problems untill now.
--------------------------------info------------------------------------
$ pg_config --version
PostgreSQL 7.2.1
$ cat /etc/redhat-release
Red Hat Linux release 7.2 (Enigma)
$/var/log/pgsql (excerpt)
(...)
Jun 11 03:06:04 srv31 postgres[13914]: [3403] ERROR: cannot open segment 1 of relation n_gram (target block 528325): No such file or directory
(...)
Jun 11 04:26:12 srv31 postgres[14972]: [3317] DEBUG: recycled transaction log file 00000020000000F6
Jun 11 04:26:12 srv31 postgres[14972]: [3318] DEBUG: recycled transaction log file 00000020000000F7
Jun 11 04:26:12 srv31 postgres[14972]: [3319] DEBUG: recycled transaction log file 00000020000000F8
Jun 11 04:28:56 srv31 postgres[14983]: [3317] DEBUG: recycled transaction log file 00000020000000F9
Jun 11 04:28:56 srv31 postgres[14983]: [3318] DEBUG: recycled transaction log file 00000020000000FA
Jun 11 04:28:56 srv31 postgres[14983]: [3319] DEBUG: recycled transaction log file 00000020000000FB
Jun 11 03:29:07 srv31 postgres[13913]: [3383] ERROR: MemoryContextAlloc: invalid request size 4294967295
Jun 11 03:29:07 srv31 postgres[13913]: [3384] NOTICE: Error occurred while executing PL/pgSQL function set_n_gram
Jun 11 03:29:07 srv31 postgres[13913]: [3385] NOTICE: line 9 at select into variables
(...)
------------------------------------------------------------------------
--
o__ Bem haja,
_.>/ _ NunoACHenriques
(_) \(_) ~~~~~~~~~~~~~~~
http://students.fct.unl.pt/users/nuno/
NunoACHenriques <nach@fct.unl.pt> writes:
A different error today:
[MemoryContextAlloc: invalid request size 4294967295]
This could be a variant of the same problem: instead of getting a zero
tuple length from the sort temp file, we're reading a -1 tuple length.
Still no way to tell if it's a hardware glitch or a software bug.
(If the latter, presumably the code is getting out of step about its
read position in the temp file --- but how?)
regards, tom lane