consistency check on SPI tuple count failed

Started by Gaetano Mendola over 22 years ago, 13 messages, hackers
#1 Gaetano Mendola
mendola@bigfoot.com

Hi all,
the following code was working properly under Postgres 7.3.X
I'm now running my regression test with Postgres 7.4beta1 and I'm
having the error in subj.

CREATE TABLE test ( a integer, b integer );

INSERT INTO test VALUES ( 1 );

CREATE OR REPLACE FUNCTION foo(INTEGER)
RETURNS INTEGER AS '
BEGIN
    RETURN $1 + 1;
END;
' LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION bar()
RETURNS INTEGER AS '
DECLARE
    my_ret RECORD;
BEGIN

    FOR my_ret IN
        SELECT foo(a) AS ret
        FROM test
    LOOP
        IF my_ret.ret = 3 THEN
            RETURN -1;
        END IF;

    END LOOP;

    RETURN 0;

END;
' LANGUAGE 'plpgsql';

Regards
Gaetano Mendola

#2 Gaetano Mendola
mendola@bigfoot.com
In reply to: Gaetano Mendola (#1)
Re: consistency check on SPI tuple count failed

I forgot to say: run

select bar();

at the end!

Gaetano

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gaetano Mendola (#1)
Re: consistency check on SPI tuple count failed

"Gaetano Mendola" <mendola@bigfoot.com> writes:

> the following code was working properly under Postgres 7.3.X
> I'm now running my regression test with Postgres 7.4beta1 and I'm
> having the error in subj.

I tried this and got

regression=# select bar();
 bar
-----
   0
(1 row)

regression=#

Anyone else see the problem?

regards, tom lane

#4 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#3)
Re: consistency check on SPI tuple count failed

On Fri, 8 Aug 2003, Tom Lane wrote:

> "Gaetano Mendola" <mendola@bigfoot.com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
>  bar
> -----
>    0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?

I got the same thing as Gaetano on my just prior to beta1 system.

#5 Rod Taylor
rbt@rbt.ca
In reply to: Tom Lane (#3)
Re: consistency check on SPI tuple count failed

On Fri, 2003-08-08 at 11:55, Tom Lane wrote:

> "Gaetano Mendola" <mendola@bigfoot.com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
>  bar
> -----
>    0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?

Bar gives 0 for me as well.

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#4)
Re: consistency check on SPI tuple count failed

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

> I got the same thing as Gaetano on my just prior to beta1 system.

Well, we couldn't have fixed it since beta1 --- there's been no changes
anywhere near SPI. I'm thinking it must be platform-dependent. What
are you guys using, exactly?

regards, tom lane

#7 Gaetano Mendola
mendola@bigfoot.com
In reply to: Gaetano Mendola (#1)
Re: consistency check on SPI tuple count failed

"Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> "Gaetano Mendola" <mendola@bigfoot.com> writes:
> > the following code was working properly under Postgres 7.3.X
> > I'm now running my regression test with Postgres 7.4beta1 and I'm
> > having the error in subj.
>
> I tried this and got
>
> regression=# select bar();
>  bar
> -----
>    0
> (1 row)
>
> regression=#
>
> Anyone else see the problem?
>
> regards, tom lane

Incredible to believe but after playing around that function started
to work. I'm not crazy.

I deleted the DB.
Stopped postgres.
Restarted postgres.
Created the DB.
Created the language.
Inserted my example.

Again the error:

kalman=# select bar();
ERROR: consistency check on SPI tuple count failed
CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
kalman=# select bar();
ERROR: consistency check on SPI tuple count failed
CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Gaetano

#8 Gaetano Mendola
mendola@bigfoot.com
In reply to: Stephan Szabo (#4)
Re: consistency check on SPI tuple count failed

"Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > I got the same thing as Gaetano on my just prior to beta1 system.
>
> Well, we couldn't have fixed it since beta1 --- there's been no changes
> anywhere near SPI. I'm thinking it must be platform-dependent. What
> are you guys using, exactly?
>
> regards, tom lane

kalman=# select version();
                                                 version
------------------------------------------------------------------------------------------------------------
 PostgreSQL 7.4beta1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
(1 row)

Regards
Gaetano Mendola

#9 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#6)
Re: consistency check on SPI tuple count failed

On Fri, 8 Aug 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
> > I got the same thing as Gaetano on my just prior to beta1 system.
>
> Well, we couldn't have fixed it since beta1 --- there's been no changes
> anywhere near SPI. I'm thinking it must be platform-dependent. What
> are you guys using, exactly?

I'm using RedHat 9.

#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gaetano Mendola (#7)
Re: consistency check on SPI tuple count failed

"Mendola Gaetano" <mendola@bigfoot.com> writes:

> Again the error:
>
> kalman=# select bar();
> ERROR: consistency check on SPI tuple count failed
> CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
> kalman=# select bar();
> ERROR: consistency check on SPI tuple count failed
> CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

After adding a second row to the test table, I am able to reproduce
the above (including the core dump after second try) on an intel/linux
box, but *not* on HPUX.

I now suspect a memory-stomp kind of problem, like someone writing one
too many bytes in a struct. HPUX tends to mask these in situations
where intel will not, because it uses MAXALIGN 8 rather than 4.

I have also just traced through _SPI_cursor_operation() in spi.c,
watched PortalRunFetch return 2, and then watched _SPI_checktuples read
zero from _SPI_current->processed. How the heck could that happen?
Compiler bug, or am I just crazy?

regards, tom lane
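
A rough illustration of the MAXALIGN remark above, as a sketch only. If storage for an object is rounded up to a multiple of the maximum alignment, a one-byte overrun past a 12-byte struct lands in padding when the alignment is 8, but reaches the next object when it is 4. The helper names (align_up, slack_bytes, overrun_is_masked) and the 12-byte size are made up for this example and are not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of align (align must be a power of two).
 * Hypothetical helper, modeled on the idea of a MAXALIGN-style rounding. */
size_t align_up(size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}

/* Padding bytes that follow a len-byte object once its storage is rounded up. */
size_t slack_bytes(size_t len, size_t align)
{
    return align_up(len, align) - len;
}

/* Returns 1 if writing `overrun` bytes past the end of a len-byte object
 * stays inside the padding (so nothing else is clobbered), 0 otherwise. */
int overrun_is_masked(size_t len, size_t overrun, size_t align)
{
    return overrun <= slack_bytes(len, align);
}
```

Under these assumptions, overrun_is_masked(12, 1, 8) is 1 and overrun_is_masked(12, 1, 4) is 0, which is the kind of platform difference suspected here.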

#11 Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Tom Lane (#10)
Re: consistency check on SPI tuple count failed

On Fri, 8 Aug 2003, Tom Lane wrote:

> "Mendola Gaetano" <mendola@bigfoot.com> writes:
> > Again the error:
> >
> > kalman=# select bar();
> > ERROR: consistency check on SPI tuple count failed
> > CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
> > kalman=# select bar();
> > ERROR: consistency check on SPI tuple count failed
> > CONTEXT: PL/pgSQL function "bar" line 5 at for over select rows
> > server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > The connection to the server was lost. Attempting reset: Failed.
>
> After adding a second row to the test table, I am able to reproduce
> the above (including the core dump after second try) on an intel/linux
> box, but *not* on HPUX.
>
> I now suspect a memory-stomp kind of problem, like someone writing one
> too many bytes in a struct. HPUX tends to mask these in situations
> where intel will not, because it uses MAXALIGN 8 rather than 4.
>
> I have also just traced through _SPI_cursor_operation() in spi.c,
> watched PortalRunFetch return 2, and then watched _SPI_checktuples read
> zero from _SPI_current->processed. How the heck could that happen?
> Compiler bug, or am I just crazy?

Not sure, but I got the same thing. When I changed it to store the
result in a temporary int variable and then assign that, it started
working for me (returning 0); reverting to the original made it fail
again. I'm going to try -O0 and see what happens there.

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephan Szabo (#11)
Re: consistency check on SPI tuple count failed

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

> On Fri, 8 Aug 2003, Tom Lane wrote:
> > I have also just traced through _SPI_cursor_operation() in spi.c,
> > watched PortalRunFetch return 2, and then watched _SPI_checktuples read
> > zero from _SPI_current->processed. How the heck could that happen?
> > Compiler bug, or am I just crazy?
>
> Not sure, but I got the same thing. When I changed it to store the
> result in a temporary int variable and then assign that, it started
> working for me (returning 0); reverting to the original made it fail
> again. I'm going to try -O0 and see what happens there.

Oooohhhh ...

<lightbulb>
SPI_stack can move around as functions are entered/exited.
</lightbulb>

Wonder why we've not seen that kind of failure happen before? Someone
(doubtless me) must have changed the coding of this routine since 7.3.

regards, tom lane

#13 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Gaetano Mendola (#7)
Re: consistency check on SPI tuple count failed

"Mendola Gaetano" <mendola@bigfoot.com> writes:

> Incredible to believe but after playing around that function started
> to work. I'm not crazy.

Yeah, it was a problem with storing into a possibly-obsolete pointer ---
the visible effects could range from nothing to a core dump depending on
whether the pointer was really out-of-date and what got clobbered if it
was.

Fix is in CVS.

regards, tom lane
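
The failure mode described in #12 and #13 can be sketched in plain C as follows. This is an illustration only, not the code in spi.c: the Stack and Entry types and the functions stack_init, stack_push, stack_pop, nested_fetch and run_demo are hypothetical stand-ins. The assumption is that the stack array is moved when a nested function pushes an entry, so a pointer to the current entry taken before the call can refer to the old array afterwards. The old array is deliberately kept allocated here so that the stale write is well-defined and observable; with a real realloc the write would be undefined behavior, which fits the range of symptoms from nothing to a core dump.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for one entry of a per-call stack such as SPI_stack (hypothetical layout). */
typedef struct { int processed; } Entry;

typedef struct {
    Entry *entries;   /* live array; replaced whenever the stack grows */
    Entry *retired;   /* previous array, kept allocated so the stale write
                         below is well-defined; a real realloc would free it */
    int    depth;     /* entries in use */
    int    capacity;  /* entries allocated */
    Entry *current;   /* points at entries[depth - 1], or NULL when empty */
} Stack;

void stack_init(Stack *s)
{
    s->capacity = 1;
    s->entries = calloc((size_t) s->capacity, sizeof(Entry));
    assert(s->entries != NULL);
    s->retired = NULL;
    s->depth = 0;
    s->current = NULL;
}

/* Enter a function: growing the stack moves the whole array to a new address. */
void stack_push(Stack *s)
{
    if (s->depth == s->capacity) {
        int    new_capacity = s->capacity * 2;
        Entry *bigger = calloc((size_t) new_capacity, sizeof(Entry));

        assert(bigger != NULL);
        memcpy(bigger, s->entries, (size_t) s->depth * sizeof(Entry));
        free(s->retired);
        s->retired = s->entries;
        s->entries = bigger;
        s->capacity = new_capacity;
    }
    s->entries[s->depth].processed = 0;
    s->depth++;
    s->current = &s->entries[s->depth - 1];
}

/* Exit a function: current is re-pointed into the live array. */
void stack_pop(Stack *s)
{
    s->depth--;
    s->current = (s->depth > 0) ? &s->entries[s->depth - 1] : NULL;
}

/* Stand-in for a fetch that runs a nested function using the same stack. */
int nested_fetch(Stack *s, int rows)
{
    stack_push(s);
    stack_pop(s);
    return rows;
}

/* Returns the value the caller later reads from current->processed. */
int run_demo(int use_stale_pointer)
{
    Stack s;
    int   seen;

    stack_init(&s);
    stack_push(&s);                          /* the outer function's entry */

    if (use_stale_pointer) {
        Entry *saved = s.current;            /* address taken before the call */
        saved->processed = nested_fetch(&s, 2);  /* lands in the retired array */
    } else {
        int nfetched = nested_fetch(&s, 2);
        s.current->processed = nfetched;     /* pointer re-read after the call */
    }

    seen = s.current->processed;
    free(s.retired);
    free(s.entries);
    return seen;
}
```

With these definitions, run_demo(1) reads back 0 even though the fetch returned 2, and run_demo(0) reads back 2. Storing the result in a local variable first, as tried in #11, has the same effect as the second branch, because the pointer is only dereferenced after the call has returned.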