Re: [HACKERS] large objects failing (hpux10.20 sparc/solaris 2.6, gcc 2.8.1)

Started by Ron Snyder, almost 27 years ago, 7 messages
#1Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Ron Snyder (#2)

Postgres 6.4.2, gcc 2.8.1, hpux 10.20 and Sparc/Solaris 2.6
(All debugging attempts were on the hpux machine, but the symptom also
occurs on the Solaris box. Also tried the latest (feb 18) snapshot
on the hpux box, symptom still occurs.)

I was installing the perl DBD::Pg module, but it fails the large object
test. To make sure it wasn't just an issue with perl (or the module),
I compiled and ran src/test/examples/testlo, and it also fails.
(testlo2 also has failed in the past, although I didn't use it for any
of my current debugging attempts.)

(In case there's any question, I created a database, and then created
a short text file called /tmp/gaga (ok, so I used the same file that
the perl module created for the perl test; you caught me) with one line of
text.) Then I do:

./testlo ronfoo /tmp/gaga /tmp/gaga1

which fails, complaining that there was an error reading the file (which
is misleading: the error is actually in writing to the new
large object).

(after you do this, you must drop the database and recreate it before you
run testlo again, otherwise you get errors about creating the xinv#####
"object".)

I made patches for 6.4.2 a week ago to fix lobj problems reported
by another user. I'm not sure whether his problem was solved, since
I got no reply from him. Anyway, with the patch, lotest.c runs fine on
my LinuxPPC box. Moreover, the following commented-out part of testlo.c
now passes without any problem (I guess it was commented out
because overwriting a lobj did not work).

/*
printf("\tas large object %d.\n", lobjOid);

printf("picking out bytes 1000-2000 of the large object\n");
pickout(conn, lobjOid, 1000, 1000);

printf("overwriting bytes 1000-2000 of the large object with X's\n");
overwrite(conn, lobjOid, 1000, 1000);
*/

Tatsuo Ishii
------------------------------ cut here -------------------------------
*** postgresql-6.4.2/src/backend/storage/large_object/inv_api.c.orig	Sun Dec 13 14:08:19 1998
--- postgresql-6.4.2/src/backend/storage/large_object/inv_api.c	Fri Feb 12 20:21:05 1999
***************
*** 545,555 ****
  			tuplen = inv_wrnew(obj_desc, buf, nbytes - nwritten);
  		else
  		{
! 			if (obj_desc->offset > obj_desc->highbyte)
  				tuplen = inv_wrnew(obj_desc, buf, nbytes - nwritten);
  			else
  				tuplen = inv_wrold(obj_desc, buf, nbytes - nwritten, tuple, buffer);
! 			ReleaseBuffer(buffer);
  		}
  		/* move pointers past the amount we just wrote */
--- 545,561 ----
  			tuplen = inv_wrnew(obj_desc, buf, nbytes - nwritten);
  		else
  		{
!           		if (obj_desc->offset > obj_desc->highbyte) {
  				tuplen = inv_wrnew(obj_desc, buf, nbytes - nwritten);
+ 				ReleaseBuffer(buffer);
+ 			}
  			else
  				tuplen = inv_wrold(obj_desc, buf, nbytes - nwritten, tuple, buffer);
! 			/* inv_wrold() has already issued WriteBuffer()
! 			   which has decremented local reference counter
! 			   (LocalRefCount). So we should not call
! 			   ReleaseBuffer() here. -- Tatsuo 99/2/4
! 			ReleaseBuffer(buffer); */
  		}

  		/* move pointers past the amount we just wrote */
***************
*** 624,648 ****
  		|| obj_desc->offset < obj_desc->lowbyte
  		|| !ItemPointerIsValid(&(obj_desc->htid)))
  	{

  		/* initialize scan key if not done */
  		if (obj_desc->iscan == (IndexScanDesc) NULL)
  		{
- 			ScanKeyData skey;
- 
  			/*
  			 * As scan index may be prematurely closed (on commit), we
  			 * must use object current offset (was 0) to reinitialize the
  			 * entry [ PA ].
  			 */
- 			ScanKeyEntryInitialize(&skey, 0x0, 1, F_INT4GE,
- 								   Int32GetDatum(obj_desc->offset));
  			obj_desc->iscan =
  				index_beginscan(obj_desc->index_r,
  								(bool) 0, (uint16) 1,
  								&skey);
  		}
- 
  		do
  		{
  			res = index_getnext(obj_desc->iscan, ForwardScanDirection);
--- 630,655 ----
  		|| obj_desc->offset < obj_desc->lowbyte
  		|| !ItemPointerIsValid(&(obj_desc->htid)))
  	{
+ 		ScanKeyData skey;
+ 
+ 		ScanKeyEntryInitialize(&skey, 0x0, 1, F_INT4GE,
+ 				       Int32GetDatum(obj_desc->offset));
  		/* initialize scan key if not done */
  		if (obj_desc->iscan == (IndexScanDesc) NULL)
  		{
  			/*
  			 * As scan index may be prematurely closed (on commit), we
  			 * must use object current offset (was 0) to reinitialize the
  			 * entry [ PA ].
  			 */
  			obj_desc->iscan =
  				index_beginscan(obj_desc->index_r,
  								(bool) 0, (uint16) 1,
  								&skey);
+ 		} else {
+ 			index_rescan(obj_desc->iscan, false, &skey);
  		}
  		do
  		{
  			res = index_getnext(obj_desc->iscan, ForwardScanDirection);
***************
*** 666,672 ****
  			tuple = heap_fetch(obj_desc->heap_r, SnapshotNow,
  							   &res->heap_iptr, buffer);
  			pfree(res);
! 		} while (tuple == (HeapTuple) NULL);
  		/* remember this tid -- we may need it for later reads/writes */
   		ItemPointerCopy(&tuple->t_ctid, &obj_desc->htid);
--- 673,679 ----
  			tuple = heap_fetch(obj_desc->heap_r, SnapshotNow,
  							   &res->heap_iptr, buffer);
  			pfree(res);
! 		} while (!HeapTupleIsValid(tuple));
  		/* remember this tid -- we may need it for later reads/writes */
   		ItemPointerCopy(&tuple->t_ctid, &obj_desc->htid);
***************
*** 675,680 ****
--- 682,691 ----
  	{
  		tuple = heap_fetch(obj_desc->heap_r, SnapshotNow,
  						   &(obj_desc->htid), buffer);
+ 		if (!HeapTupleIsValid(tuple)) {
+ 		  elog(ERROR,
+ 		       "inv_fetchtup: heap_fetch failed");
+ 		}
  	}

  	/*
***************
*** 746,757 ****

  	nblocks = RelationGetNumberOfBlocks(hr);

! 	if (nblocks > 0)
  		buffer = ReadBuffer(hr, nblocks - 1);
! 	else
  		buffer = ReadBuffer(hr, P_NEW);
! 
! 	page = BufferGetPage(buffer);

  	/*
  	 * If the last page is too small to hold all the data, and it's too
--- 757,771 ----

  	nblocks = RelationGetNumberOfBlocks(hr);

! 	if (nblocks > 0) {
  		buffer = ReadBuffer(hr, nblocks - 1);
! 		page = BufferGetPage(buffer);
! 	}
! 	else {
  		buffer = ReadBuffer(hr, P_NEW);
! 		page = BufferGetPage(buffer);
! 		PageInit(page, BufferGetPageSize(buffer), 0);
! 	}

  	/*
  	 * If the last page is too small to hold all the data, and it's too
***************
*** 865,876 ****

  		nblocks = RelationGetNumberOfBlocks(hr);

! 		if (nblocks > 0)
  			newbuf = ReadBuffer(hr, nblocks - 1);
! 		else
  			newbuf = ReadBuffer(hr, P_NEW);

- 		newpage = BufferGetPage(newbuf);
  		freespc = IFREESPC(newpage);

  		/*
--- 879,894 ----

  		nblocks = RelationGetNumberOfBlocks(hr);

! 		if (nblocks > 0) {
  			newbuf = ReadBuffer(hr, nblocks - 1);
! 			newpage = BufferGetPage(newbuf);
! 		}
! 		else {
  			newbuf = ReadBuffer(hr, P_NEW);
+ 			newpage = BufferGetPage(newbuf);
+ 			PageInit(newpage, BufferGetPageSize(newbuf), 0);
+ 		}

  		freespc = IFREESPC(newpage);

  		/*
***************
*** 973,978 ****
--- 991,999 ----
  	WriteBuffer(buffer);
  	if (newbuf != buffer)
  		WriteBuffer(newbuf);
+ 
+ 	/* Tuple id is no longer valid */
+ 	ItemPointerSetInvalid(&(obj_desc->htid));

  	/* done */
  	return nwritten;
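
The comment in the first hunk (WriteBuffer() already drops the local reference count, so a following ReleaseBuffer() would drop it twice) can be modelled outside the backend. The sketch below is only an illustration in plain C: ToyBuffer and every toy_* function are hypothetical stand-ins, not PostgreSQL's buffer manager, and the pin behaviour of the two write paths is assumed from the patch comment rather than checked against the 6.4.2 source.

```c
#include <assert.h>

/* Hypothetical stand-in for a pinned buffer; not PostgreSQL's BufferDesc. */
typedef struct ToyBuffer
{
	int			refcount;	/* outstanding pins held by this backend */
	int			dirty;		/* set once the page has been modified */
} ToyBuffer;

/* Take one pin (models what fetching the tuple does to its buffer). */
static void
toy_pin(ToyBuffer *b)
{
	b->refcount++;
}

/* Drop one pin without marking the page dirty (models ReleaseBuffer). */
static void
toy_release(ToyBuffer *b)
{
	assert(b->refcount > 0);	/* a double release would trip here */
	b->refcount--;
}

/* Mark dirty and drop one pin (models WriteBuffer, per the patch comment). */
static void
toy_write(ToyBuffer *b)
{
	assert(b->refcount > 0);
	b->dirty = 1;
	b->refcount--;
}

/* Assumed model of inv_wrold(): rewrites in place and consumes the pin. */
static int
toy_wrold(ToyBuffer *b, int nbytes)
{
	toy_write(b);
	return nbytes;
}

/* Assumed model of inv_wrnew(): writes elsewhere, leaves the pin alone. */
static int
toy_wrnew(ToyBuffer *b, int nbytes)
{
	(void) b;
	return nbytes;
}

/* Patched control flow: release only on the path that did not write. */
static int
toy_write_chunk(ToyBuffer *b, int offset, int highbyte, int nbytes)
{
	int			tuplen;

	toy_pin(b);
	if (offset > highbyte)
	{
		tuplen = toy_wrnew(b, nbytes);
		toy_release(b);			/* pin is still ours, so drop it here */
	}
	else
		tuplen = toy_wrold(b, nbytes);	/* pin already dropped inside */
	return tuplen;
}
```

With the unpatched flow (an unconditional release after either branch), the in-place path would reach toy_release() with a count of zero, which is the double decrement the comment describes.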

#2Ron Snyder
snyder@athena.lblesd.k12.or.us

[snipped original message explaining how testlo fails for me on
sparc/solaris 2.6 and hpux 10.20, gcc 2.8.1, postgres 6.4.2]

I made patches for 6.4.2 a week ago to fix lobj problems reported
by another user. I'm not sure whether his problem was solved, since
I got no reply from him. Anyway, with the patch, lotest.c runs fine on
my LinuxPPC box. Moreover, the following commented-out part of testlo.c
now passes without any problem (I guess it was commented out
because overwriting a lobj did not work).

[patches snipped]

Tatsuo,
I applied the patches to my 6.4.2 source tree (not the snapshot)--
the patches applied cleanly, but my backend still goes into never never
land at the line I mentioned before. What version of gcc are you using?
Would it be useful for me to post any additional info?

-ron

#3Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Ron Snyder (#2)

Tatsuo,
I applied the patches to my 6.4.2 source tree (not the snapshot)--
the patches applied cleanly, but my backend still goes into never never
land at the line I mentioned before. What version of gcc are you using?
Would it be useful for me to post any additional info?

Let me try on Solaris2.6/sparc in my office first. Today is Saturday
in Japan, so the testing will be the day after tomorrow. Is it ok for
you?

BTW, the gcc version I'm using on LinuxPPC is egcs-2.90.25 980302
(egcs-1.0.2 prerelease).
---
Tatsuo Ishii

#4Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tatsuo Ishii (#3)

I applied the patches to my 6.4.2 source tree (not the snapshot)--
the patches applied cleanly, but my backend still goes into never never
land at the line I mentioned before. What version of gcc are you using?
Would it be useful for me to post any additional info?

Let me try on Solaris2.6/sparc in my office first. Today is Saturday
in Japan, so the testing will be the day after tomorrow. Is it ok for
you?

OK. I found an alignment problem in lobj that might not show up on
platforms other than Solaris/sparc. Please apply the included patch to
src/backend/storage/large_object/inv_api.c and try again. (It is an
addition to the previous ones.)

Hope this is the last bug:-)
--
Tatsuo Ishii
--------------------------------------------------------------------
*** inv_api.c.orig2	Mon Feb 22 16:15:31 1999
--- inv_api.c	Mon Feb 22 16:16:55 1999
***************
*** 1019,1028 ****

  	/* compute tuple size -- no nulls */
  	hoff = offsetof(HeapTupleData, t_bits);

  	/* add in olastbyte, varlena.vl_len, varlena.vl_dat */
  	tupsize = hoff + (2 * sizeof(int32)) + nwrite;
! 	tupsize = LONGALIGN(tupsize);

  	/*
  	 * Allocate the tuple on the page, violating the page abstraction.
--- 1019,1029 ----

  	/* compute tuple size -- no nulls */
  	hoff = offsetof(HeapTupleData, t_bits);
+ 	hoff = DOUBLEALIGN(hoff);

  	/* add in olastbyte, varlena.vl_len, varlena.vl_dat */
  	tupsize = hoff + (2 * sizeof(int32)) + nwrite;
! 	tupsize = DOUBLEALIGN(tupsize);

  	/*
  	 * Allocate the tuple on the page, violating the page abstraction.
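
To see why switching from LONGALIGN to DOUBLEALIGN can matter on SPARC, here is a small stand-alone sketch of the size arithmetic. It is not the backend code: align_up() and the toy_* functions are hypothetical helpers, the alignments (4 bytes for long, 8 for double) are assumed values for a 32-bit SPARC ABI, and the header offset used in the usage note is made up for illustration rather than the real offsetof(HeapTupleData, t_bits).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignments for a 32-bit SPARC ABI (illustrative values only). */
enum
{
	ASSUMED_ALIGNOF_LONG = 4,
	ASSUMED_ALIGNOF_DOUBLE = 8
};

/*
 * Round len up to the next multiple of align, which must be a power of two.
 * Hypothetical helper standing in for the LONGALIGN/DOUBLEALIGN macros.
 */
static size_t
align_up(size_t len, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/* Pre-patch arithmetic: only the total size is rounded, to long alignment. */
static size_t
toy_tupsize_old(size_t hoff, size_t nwrite)
{
	size_t		tupsize = hoff + 2 * 4 + nwrite;	/* two int32 fields */

	return align_up(tupsize, ASSUMED_ALIGNOF_LONG);
}

/* Patched arithmetic: header offset and total size both double-aligned. */
static size_t
toy_tupsize_new(size_t hoff, size_t nwrite)
{
	size_t		tupsize;

	hoff = align_up(hoff, ASSUMED_ALIGNOF_DOUBLE);
	tupsize = hoff + 2 * 4 + nwrite;
	return align_up(tupsize, ASSUMED_ALIGNOF_DOUBLE);
}
```

For a made-up header offset of 28 and an empty payload, toy_tupsize_old(28, 0) gives 36, which is not a multiple of 8, while toy_tupsize_new(28, 0) gives 40. If tuples are packed using the smaller size, whatever follows can land on a 4-byte boundary; a strict-alignment CPU such as SPARC may then fault on an 8-byte access there, whereas more forgiving hardware would not, which fits the symptom appearing only on some platforms.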

#5Bruce Momjian
maillist@candle.pha.pa.us
In reply to: Tatsuo Ishii (#4)

Applied to the main tree. I found the patch malformed, so I applied it
by hand. Interesting you had to double-align.

I applied the patches to my 6.4.2 source tree (not the snapshot)--
the patches applied cleanly, but my backend still goes into never never
land at the line I mentioned before. What version of gcc are you using?
Would it be useful for me to post any additional info?

Let me try on Solaris2.6/sparc in my office first. Today is Saturday
in Japan, so the testing will be the day after tomorrow. Is it ok for
you?

OK. I found an alignment problem in lobj that might not show up on
platforms other than Solaris/sparc. Please apply the included patch to
src/backend/storage/large_object/inv_api.c and try again. (It is an
addition to the previous ones.)

Hope this is the last bug:-)
--
Tatsuo Ishii

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#6Ron Snyder
snyder@athena.lblesd.k12.or.us
In reply to: Tatsuo Ishii (#4)

OK. I found an alignment problem in lobj that might not show up on
platforms other than Solaris/sparc. Please apply the included patch to
src/backend/storage/large_object/inv_api.c and try again. (It is an
addition to the previous ones.)

Hope this is the last bug:-)

Tatsuo-- I've been out for a couple of days, but I wanted to let you
know that this did indeed fix my problems.

Thanks!

-ron

#7Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Ron Snyder (#6)

OK. I found an alignment problem in lobj that might not show up on
platforms other than Solaris/sparc. Please apply the included patch to
src/backend/storage/large_object/inv_api.c and try again. (It is an
addition to the previous ones.)

Hope this is the last bug:-)

Tatsuo-- I've been out for a couple of days, but I wanted to let you
know that this did indeed fix my problems.

Thanks!

You are welcome!

To Bruce:
Thanks for taking care of my previous patches for current. If the
included patch is OK, I will make one for current.

I'm now working on lobj in the current tree (lobj in 6.5 currently
seems broken).
--
Tatsuo Ishii