Problem reloading regression database

Started by Bruce Momjianalmost 24 years ago26 messages
#1Bruce Momjian
pgman@candle.pha.pa.us
1 attachment(s)

I am testing pg_upgrade. I successfully did a pg_upgrade of a 7.2
regression database into a fresh 7.2 install. I compared the output of
pg_dump from both copies and found that c_star dump caused a crash. I
then started doing more testing of the regression database and found
that the regression database does not load in cleanly. These failures
cause pg_upgrade files not to match the loaded schema.

Looks like there is a problem with inheritance, patch attached listing
the pg_dump load failures. I also see what looks like a crash in the
server logs:

DEBUG: pq_flush: send() failed: Broken pipe
FATAL 1: Socket command type 1 unknown

Looks like it should be fixed before final.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Attachments:

/bjm/errtext/plainDownload
#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Bruce Momjian (#1)
Re: Problem reloading regression database

pgman wrote:

I am testing pg_upgrade. I successfully did a pg_upgrade of a 7.2
regression database into a fresh 7.2 install. I compared the output of
pg_dump from both copies and found that c_star dump caused a crash. I
then started doing more testing of the regression database and found
that the regression database does not load in cleanly. These failures
cause pg_upgrade files not to match the loaded schema.

Looks like there is a problem with inheritance, patch attached listing
the pg_dump load failures. I also see what looks like a crash in the
server logs:

DEBUG: pq_flush: send() failed: Broken pipe
FATAL 1: Socket command type 1 unknown

Looks like it should be fixed before final.

I should have been clearer how to reproduce this:

1) run regression tests
2) pg_dump regression > /tmp/dump
3) dropdb regression
4) createdb regression
5) psql regression < /tmp/dump > out 2> err

Look at the err file.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: Problem reloading regression database

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I am testing pg_upgrade. I successfully did a pg_upgrade of a 7.2
regression database into a fresh 7.2 install. I compared the output of
pg_dump from both copies and found that c_star dump caused a crash. I
then started doing more testing of the regression database and found
that the regression database does not load in cleanly.

No kidding. That's been a known issue for *years*, Bruce. Without a
way to reorder the columns in COPY, it can't be fixed. That's the main
reason why we have a TODO item to allow column specification in COPY.

I also see what looks like a crash in the server logs:

DEBUG: pq_flush: send() failed: Broken pipe
FATAL 1: Socket command type 1 unknown

No, that's just the COPY failing (and resetting the connection). That's
not going to be fixed before final either, unless you'd like us to
develop a new frontend COPY protocol before final...

regards, tom lane

#4Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#3)
Re: Problem reloading regression database

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

I am testing pg_upgrade. I successfully did a pg_upgrade of a 7.2
regression database into a fresh 7.2 install. I compared the output of
pg_dump from both copies and found that c_star dump caused a crash. I
then started doing more testing of the regression database and found
that the regression database does not load in cleanly.

No kidding. That's been a known issue for *years*, Bruce. Without a
way to reorder the columns in COPY, it can't be fixed. That's the main
reason why we have a TODO item to allow column specification in COPY.

I also see what looks like a crash in the server logs:

DEBUG: pq_flush: send() failed: Broken pipe
FATAL 1: Socket command type 1 unknown

No, that's just the COPY failing (and resetting the connection). That's
not going to be fixed before final either, unless you'd like us to
develop a new frontend COPY protocol before final...

I used to test regression dumps a long time ago. It seems I haven't
done so recently; guess this is a non-problem or at least a known,
minor one.

It also means my pg_upgrade is working pretty well if the rest of it
worked fine.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#5Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#3)
1 attachment(s)
Re: Problem reloading regression database

[2002-01-12 23:45] Tom Lane said:
| Bruce Momjian <pgman@candle.pha.pa.us> writes:
| > I am testing pg_upgrade. I successfully did a pg_upgrade of a 7.2
| > regression database into a fresh 7.2 install. I compared the output of
| > pg_dump from both copies and found that c_star dump caused a crash. I
| > then started doing more testing of the regression database and found
| > that the regression database does not load in cleanly.
|
| No kidding. That's been a known issue for *years*, Bruce. Without a
| way to reorder the columns in COPY, it can't be fixed. That's the main
| reason why we have a TODO item to allow column specification in COPY.

The attached patch is a first-cut at implementing column specification
in COPY FROM with the following syntax.

COPY atable (col1,col2,col3,col4) FROM ...

The details:
Add "List* attlist" member to CopyStmt parse node.
Adds <please supply term ;-)> to gram.y allowing opt_column_list
in COPY FROM Node.
In CopyFrom, if attlist present, create Form_pg_attribute* ordered
same as attlist.
If all columns in the table are not found in attlist, elog(ERROR).
Continue normal operation.

Regression tests all still pass. There is still a problem where
duplicate columns in the list will allow the operation to succeed,
but I believe this is the only problem. If this approach is sane,
I'll clean it up later today.

comments?

cheers.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

Attachments:

copy.colspec.difftext/plain; charset=us-asciiDownload
Index: src/backend/commands/copy.c
===================================================================
RCS file: /var/cvsup/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.144
diff -c -r1.144 copy.c
*** src/backend/commands/copy.c	4 Dec 2001 21:19:57 -0000	1.144
--- src/backend/commands/copy.c	13 Jan 2002 15:18:06 -0000
***************
*** 45,52 ****
  
  
  /* non-export function prototypes */
! static void CopyTo(Relation rel, bool binary, bool oids, FILE *fp, char *delim, char *null_print);
! static void CopyFrom(Relation rel, bool binary, bool oids, FILE *fp, char *delim, char *null_print);
  static Oid	GetInputFunction(Oid type);
  static Oid	GetTypeElement(Oid type);
  static void CopyReadNewline(FILE *fp, int *newline);
--- 45,52 ----
  
  
  /* non-export function prototypes */
! static void CopyTo(Relation rel, bool binary, bool oids, FILE *fp, char *delim, char *null_print,List *attlist);
! static void CopyFrom(Relation rel, bool binary, bool oids, FILE *fp, char *delim, char *null_print,List *attlist);
  static Oid	GetInputFunction(Oid type);
  static Oid	GetTypeElement(Oid type);
  static void CopyReadNewline(FILE *fp, int *newline);
***************
*** 261,267 ****
   */
  void
  DoCopy(char *relname, bool binary, bool oids, bool from, bool pipe,
! 	   char *filename, char *delim, char *null_print)
  {
  	FILE	   *fp;
  	Relation	rel;
--- 261,267 ----
   */
  void
  DoCopy(char *relname, bool binary, bool oids, bool from, bool pipe,
! 	   char *filename, char *delim, char *null_print, List* attlist)
  {
  	FILE	   *fp;
  	Relation	rel;
***************
*** 333,339 ****
  					 "reading.  Errno = %s (%d).",
  					 (int) geteuid(), filename, strerror(errno), errno);
  		}
! 		CopyFrom(rel, binary, oids, fp, delim, null_print);
  	}
  	else
  	{							/* copy from database to file */
--- 333,339 ----
  					 "reading.  Errno = %s (%d).",
  					 (int) geteuid(), filename, strerror(errno), errno);
  		}
! 		CopyFrom(rel, binary, oids, fp, delim, null_print, attlist);
  	}
  	else
  	{							/* copy from database to file */
***************
*** 379,385 ****
  					 "writing.  Errno = %s (%d).",
  					 (int) geteuid(), filename, strerror(errno), errno);
  		}
! 		CopyTo(rel, binary, oids, fp, delim, null_print);
  	}
  
  	if (!pipe)
--- 379,385 ----
  					 "writing.  Errno = %s (%d).",
  					 (int) geteuid(), filename, strerror(errno), errno);
  		}
! 		CopyTo(rel, binary, oids, fp, delim, null_print, attlist);
  	}
  
  	if (!pipe)
***************
*** 408,414 ****
   */
  static void
  CopyTo(Relation rel, bool binary, bool oids, FILE *fp,
! 	   char *delim, char *null_print)
  {
  	HeapTuple	tuple;
  	TupleDesc	tupDesc;
--- 408,414 ----
   */
  static void
  CopyTo(Relation rel, bool binary, bool oids, FILE *fp,
! 	   char *delim, char *null_print, List* attlist)
  {
  	HeapTuple	tuple;
  	TupleDesc	tupDesc;
***************
*** 620,626 ****
   */
  static void
  CopyFrom(Relation rel, bool binary, bool oids, FILE *fp,
! 		 char *delim, char *null_print)
  {
  	HeapTuple	tuple;
  	TupleDesc	tupDesc;
--- 620,626 ----
   */
  static void
  CopyFrom(Relation rel, bool binary, bool oids, FILE *fp,
! 		 char *delim, char *null_print, List* attlist)
  {
  	HeapTuple	tuple;
  	TupleDesc	tupDesc;
***************
*** 644,651 ****
  	bool		file_has_oids;
  
  	tupDesc = RelationGetDescr(rel);
! 	attr = tupDesc->attrs;
! 	attr_count = tupDesc->natts;
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
--- 644,672 ----
  	bool		file_has_oids;
  
  	tupDesc = RelationGetDescr(rel);
!   attr_count = tupDesc->natts;
!   if( attlist )
!   {
!     int i,c;
!     List* cur;
!     c = 0;
!     attr = (Form_pg_attribute*)palloc(sizeof(Form_pg_attribute*) * attr_count);
!     foreach(cur,attlist)
!     {
!       char* col = ((Ident*)lfirst(cur))->name;
!       for(i=0;i<attr_count;++i)
!         if(strcmp(col,NameStr(tupDesc->attrs[i]->attname)) == 0)
!         {
!           attr[c] = tupDesc->attrs[i];
!           ++c;
!         }
!     }
!     if( c != attr_count )
!       elog(ERROR,"Columns specified in COPY FROM command do not match columns in target table.");
!   }
!   else{
!   	attr = tupDesc->attrs;
!   }
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
***************
*** 953,958 ****
--- 974,981 ----
  		pfree(in_functions);
  		pfree(elements);
  	}
+   if( attlist )
+     pfree(attr);
  
  	ExecDropTupleTable(tupleTable, true);
  
Index: src/backend/parser/gram.y
===================================================================
RCS file: /var/cvsup/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.276
diff -c -r2.276 gram.y
*** src/backend/parser/gram.y	9 Dec 2001 04:39:39 -0000	2.276
--- src/backend/parser/gram.y	13 Jan 2002 14:56:43 -0000
***************
*** 1193,1203 ****
--- 1193,1217 ----
  					CopyStmt *n = makeNode(CopyStmt);
  					n->binary = $2;
  					n->relname = $3;
+           n->attlist = NIL;
  					n->oids = $4;
  					n->direction = $5;
  					n->filename = $6;
  					n->delimiter = $7;
  					n->null_print = $8;
+ 					$$ = (Node *)n;
+ 				}
+     | COPY opt_binary relation_name opt_column_list opt_with_copy copy_dirn copy_file_name copy_delimiter copy_null
+ 				{
+ 					CopyStmt *n = makeNode(CopyStmt);
+ 					n->binary = $2;
+ 					n->relname = $3;
+           n->attlist = $4;
+ 					n->oids = $5;
+ 					n->direction = $6;
+ 					n->filename = $7;
+ 					n->delimiter = $8;
+ 					n->null_print = $9;
  					$$ = (Node *)n;
  				}
  		;
Index: src/backend/tcop/utility.c
===================================================================
RCS file: /var/cvsup/pgsql/src/backend/tcop/utility.c,v
retrieving revision 1.124
diff -c -r1.124 utility.c
*** src/backend/tcop/utility.c	3 Jan 2002 23:21:32 -0000	1.124
--- src/backend/tcop/utility.c	13 Jan 2002 13:42:56 -0000
***************
*** 354,360 ****
  				 */
  					   stmt->filename,
  					   stmt->delimiter,
! 					   stmt->null_print);
  			}
  			break;
  
--- 354,361 ----
  				 */
  					   stmt->filename,
  					   stmt->delimiter,
! 					   stmt->null_print,
!              stmt->attlist);
  			}
  			break;
  
Index: src/include/commands/copy.h
===================================================================
RCS file: /var/cvsup/pgsql/src/include/commands/copy.h,v
retrieving revision 1.16
diff -c -r1.16 copy.h
*** src/include/commands/copy.h	5 Nov 2001 17:46:33 -0000	1.16
--- src/include/commands/copy.h	13 Jan 2002 13:52:56 -0000
***************
*** 14,22 ****
  #ifndef COPY_H
  #define COPY_H
  
  extern int	copy_lineno;
  
  void DoCopy(char *relname, bool binary, bool oids, bool from, bool pipe,
! 	   char *filename, char *delim, char *null_print);
  
  #endif   /* COPY_H */
--- 14,24 ----
  #ifndef COPY_H
  #define COPY_H
  
+ #include "utils/portal.h"
+ 
  extern int	copy_lineno;
  
  void DoCopy(char *relname, bool binary, bool oids, bool from, bool pipe,
! 	   char *filename, char *delim, char *null_print, List* _al);
  
  #endif   /* COPY_H */
Index: src/include/nodes/parsenodes.h
===================================================================
RCS file: /var/cvsup/pgsql/src/include/nodes/parsenodes.h,v
retrieving revision 1.151
diff -c -r1.151 parsenodes.h
*** src/include/nodes/parsenodes.h	5 Nov 2001 17:46:34 -0000	1.151
--- src/include/nodes/parsenodes.h	13 Jan 2002 13:40:22 -0000
***************
*** 183,188 ****
--- 183,189 ----
  	char	   *filename;		/* if NULL, use stdin/stdout */
  	char	   *delimiter;		/* delimiter character, \t by default */
  	char	   *null_print;		/* how to print NULLs, `\N' by default */
+   List    *attlist;     /* list of attributes */
  } CopyStmt;
  
  /* ----------------------
#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#5)
Re: Problem reloading regression database

Brent Verner <brent@rcfile.org> writes:

In CopyFrom, if attlist present, create Form_pg_attribute* ordered
same as attlist.

Doesn't seem like this can possibly work as-is. The eventual call to
heap_formtuple must supply the column values in the order expected
by the table, but I don't see you remapping from input-column indices to
table-column indices anywhere in the data processing loop.

Also, a reasonable version of this capability would allow the input
column list to be just a subset of the table column list; with the
column default expressions, if any, being evaluated to fill the missing
columns. This would answer the requests we keep having for COPY to be
able to load a table containing a serial-number column.

Don't forget that if the syntax allows COPY (collist) TO file, people
will expect that to work too.

regards, tom lane

#7Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#6)
Re: Problem reloading regression database

[2002-01-13 11:41] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > In CopyFrom, if attlist present, create Form_pg_attribute* ordered
| > same as attlist.
|
| Doesn't seem like this can possibly work as-is. The eventual call to
| heap_formtuple must supply the column values in the order expected
| by the table, but I don't see you remapping from input-column indices to
| table-column indices anywhere in the data processing loop.

yup. back to the drawing board ;-)

| Also, a reasonable version of this capability would allow the input
| column list to be just a subset of the table column list; with the
| column default expressions, if any, being evaluated to fill the missing
| columns. This would answer the requests we keep having for COPY to be
| able to load a table containing a serial-number column.

right.

| Don't forget that if the syntax allows COPY (collist) TO file, people
| will expect that to work too.

;-) darnit!

thanks.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#8Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#7)
Re: Problem reloading regression database

[2002-01-13 12:33] Brent Verner said:
| [2002-01-13 11:41] Tom Lane said:
| | Brent Verner <brent@rcfile.org> writes:
| | > In CopyFrom, if attlist present, create Form_pg_attribute* ordered
| | > same as attlist.
| |
| | Doesn't seem like this can possibly work as-is. The eventual call to
| | heap_formtuple must supply the column values in the order expected
| | by the table, but I don't see you remapping from input-column indices to
| | table-column indices anywhere in the data processing loop.
|
| yup. back to the drawing board ;-)

I fixed this by making an int* mapping from specified collist
position to actual rd_att->attrs position.

| | Also, a reasonable version of this capability would allow the input
| | column list to be just a subset of the table column list; with the
| | column default expressions, if any, being evaluated to fill the missing
| | columns. This would answer the requests we keep having for COPY to be
| | able to load a table containing a serial-number column.
|
| right.

I'm still a bit^W^W lost as hell on how the column default magic
happens. It appears that in the INSERT case, the query goes thru
the planner and picks up the necessary Node* representing the
default(s) for a relation, then later evaluates those nodes if
not attisset.

Should I be looking to call
ExecEvalFunc(stringToNode(adbin),ec,&rvnull,NULL);
when an attr is not specified and it has a default? Or is there
a more straightforward way of getting the default for an att?
(I sure hope there is ;-)

thanks.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#8)
Re: Problem reloading regression database

Brent Verner <brent@rcfile.org> writes:

I fixed this by making an int* mapping from specified collist
position to actual rd_att->attrs position.

Sounds better.

I'm still a bit^W^W lost as hell on how the column default magic
happens.

I'd say use build_column_default() in src/backend/optimizer/prep/preptlist.c
to set up a default expression (possibly just NULL) for every column
that's not supplied by the input. That routine's not exported now, but
it could be, or perhaps it should be moved somewhere else. (Suggestions
anyone? Someplace in src/backend/catalog might be a more appropriate
place for it.)

Then in the per-tuple loop you use ExecEvalExpr, or more likely
ExecEvalExprSwitchContext, to execute the default expressions.
The econtext wanted by ExecEvalExpr can be had from the estate
that CopyFrom already creates; use GetPerTupleExprContext(estate).

You'll need to verify that you have got the memory context business
right, ie, no memory leak across rows. I think the above sketch is
sufficient, but check it with a memory-eating default expression
evaluated for a few million input rows ... and you are doing your
testing with --enable-cassert, I trust, to catch any dangling pointers.

regards, tom lane

#10Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#9)
Re: Problem reloading regression database

[2002-01-13 15:17] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > I fixed this by making an int* mapping from specified collist
| > position to actual rd_att->attrs position.
|
| Sounds better.
|
| > I'm still a bit^W^W lost as hell on how the column default magic
| > happens.
|
| I'd say use build_column_default() in src/backend/optimizer/prep/preptlist.c
| to set up a default expression (possibly just NULL) for every column
| that's not supplied by the input. That routine's not exported now, but
| it could be, or perhaps it should be moved somewhere else. (Suggestions
| anyone? Someplace in src/backend/catalog might be a more appropriate
| place for it.)

gotcha.

| Then in the per-tuple loop you use ExecEvalExpr, or more likely
| ExecEvalExprSwitchContext, to execute the default expressions.
| The econtext wanted by ExecEvalExpr can be had from the estate
| that CopyFrom already creates; use GetPerTupleExprContext(estate).

many, many thanks!

| You'll need to verify that you have got the memory context business
| right, ie, no memory leak across rows. I think the above sketch is
| sufficient, but check it with a memory-eating default expression
| evaluated for a few million input rows ...

Yes, the above info should get me through.

| and you are doing your
| testing with --enable-cassert, I trust, to catch any dangling pointers.

<ducks>
I am now :-o

thank you.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#11Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#10)
1 attachment(s)
Re: Problem reloading regression database

[2002-01-13 16:42] Brent Verner said:
| [2002-01-13 15:17] Tom Lane said:
| | Brent Verner <brent@rcfile.org> writes:
| | > I fixed this by making an int* mapping from specified collist
| | > position to actual rd_att->attrs position.
| |
| | Sounds better.
| |
| | > I'm still a bit^W^W lost as hell on how the column default magic
| | > happens.
| |
| | I'd say use build_column_default() in src/backend/optimizer/prep/preptlist.c
| | to set up a default expression (possibly just NULL) for every column
| | that's not supplied by the input. That routine's not exported now, but
| | it could be, or perhaps it should be moved somewhere else. (Suggestions
| | anyone? Someplace in src/backend/catalog might be a more appropriate
| | place for it.)
|
| gotcha.
|
| | Then in the per-tuple loop you use ExecEvalExpr, or more likely
| | ExecEvalExprSwitchContext, to execute the default expressions.
| | The econtext wanted by ExecEvalExpr can be had from the estate
| | that CopyFrom already creates; use GetPerTupleExprContext(estate).
|
| many, many thanks!
|
| | You'll need to verify that you have got the memory context business
| | right, ie, no memory leak across rows. I think the above sketch is
| | sufficient, but check it with a memory-eating default expression
| | evaluated for a few million input rows ...
|
| Yes, the above info should get me through.

round two...

1) I (kludgily) exported build_column_default() from its current
location.
2) defaults expressions are now tried if a column is not in the
COPY attlist specification.

There are still some kinks... (probably more than I've thought of)
1) a column in attlist that is not in the table will cause a segv
in the backend.
2) duplicate names in attlist are still not treated as an error.

I believe the memory context issues are handled correctly, but I've
not run the few million copy tests yet, and I probably won't be able
to until late(r) tomorrow. No strangeness running compiled with
--enable-cassert. Regression tests still pass.

Sanity checks much appreciated.

cheers.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

Attachments:

copy.colspec.1.difftext/plain; charset=us-asciiDownload
Index: src/backend/commands/copy.c
===================================================================
RCS file: /var/cvsup/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.144
diff -c -r1.144 copy.c
*** src/backend/commands/copy.c	4 Dec 2001 21:19:57 -0000	1.144
--- src/backend/commands/copy.c	14 Jan 2002 02:35:35 -0000
***************
*** 45,52 ****
  
  
  /* non-export function prototypes */
! static void CopyTo(Relation rel, bool binary, bool oids, FILE *fp, char *delim, char *null_print);
! static void CopyFrom(Relation rel, bool binary, bool oids, FILE *fp, char *delim, char *null_print);
  static Oid	GetInputFunction(Oid type);
  static Oid	GetTypeElement(Oid type);
  static void CopyReadNewline(FILE *fp, int *newline);
--- 45,52 ----
  
  
  /* non-export function prototypes */
! static void CopyTo(Relation rel, bool binary, bool oids, FILE *fp, char *delim, char *null_print,List *attlist);
! static void CopyFrom(Relation rel, bool binary, bool oids, FILE *fp, char *delim, char *null_print,List *attlist);
  static Oid	GetInputFunction(Oid type);
  static Oid	GetTypeElement(Oid type);
  static void CopyReadNewline(FILE *fp, int *newline);
***************
*** 261,267 ****
   */
  void
  DoCopy(char *relname, bool binary, bool oids, bool from, bool pipe,
! 	   char *filename, char *delim, char *null_print)
  {
  	FILE	   *fp;
  	Relation	rel;
--- 261,267 ----
   */
  void
  DoCopy(char *relname, bool binary, bool oids, bool from, bool pipe,
! 	   char *filename, char *delim, char *null_print, List* attlist)
  {
  	FILE	   *fp;
  	Relation	rel;
***************
*** 333,339 ****
  					 "reading.  Errno = %s (%d).",
  					 (int) geteuid(), filename, strerror(errno), errno);
  		}
! 		CopyFrom(rel, binary, oids, fp, delim, null_print);
  	}
  	else
  	{							/* copy from database to file */
--- 333,339 ----
  					 "reading.  Errno = %s (%d).",
  					 (int) geteuid(), filename, strerror(errno), errno);
  		}
! 		CopyFrom(rel, binary, oids, fp, delim, null_print, attlist);
  	}
  	else
  	{							/* copy from database to file */
***************
*** 379,385 ****
  					 "writing.  Errno = %s (%d).",
  					 (int) geteuid(), filename, strerror(errno), errno);
  		}
! 		CopyTo(rel, binary, oids, fp, delim, null_print);
  	}
  
  	if (!pipe)
--- 379,385 ----
  					 "writing.  Errno = %s (%d).",
  					 (int) geteuid(), filename, strerror(errno), errno);
  		}
! 		CopyTo(rel, binary, oids, fp, delim, null_print, attlist);
  	}
  
  	if (!pipe)
***************
*** 408,414 ****
   */
  static void
  CopyTo(Relation rel, bool binary, bool oids, FILE *fp,
! 	   char *delim, char *null_print)
  {
  	HeapTuple	tuple;
  	TupleDesc	tupDesc;
--- 408,414 ----
   */
  static void
  CopyTo(Relation rel, bool binary, bool oids, FILE *fp,
! 	   char *delim, char *null_print, List* attlist)
  {
  	HeapTuple	tuple;
  	TupleDesc	tupDesc;
***************
*** 614,651 ****
  	pfree(isvarlena);
  }
  
  
  /*
   * Copy FROM file to relation.
   */
  static void
  CopyFrom(Relation rel, bool binary, bool oids, FILE *fp,
! 		 char *delim, char *null_print)
  {
  	HeapTuple	tuple;
  	TupleDesc	tupDesc;
  	Form_pg_attribute *attr;
  	AttrNumber	attr_count;
  	FmgrInfo   *in_functions;
  	Oid		   *elements;
  	int			i;
  	Oid			in_func_oid;
  	Datum	   *values;
  	char	   *nulls;
- 	bool		isnull;
  	int			done = 0;
  	char	   *string;
  	ResultRelInfo *resultRelInfo;
  	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
  	TupleTable	tupleTable;
  	TupleTableSlot *slot;
  	Oid			loaded_oid = InvalidOid;
  	bool		skip_tuple = false;
! 	bool		file_has_oids;
  
  	tupDesc = RelationGetDescr(rel);
! 	attr = tupDesc->attrs;
! 	attr_count = tupDesc->natts;
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
--- 614,749 ----
  	pfree(isvarlena);
  }
  
+ /* XXX: move this and the implementation to a saner place */
+ Node * build_column_default(Relation rel, int attrno);
  
  /*
   * Copy FROM file to relation.
   */
  static void
  CopyFrom(Relation rel, bool binary, bool oids, FILE *fp,
! 		 char *delim, char *null_print, List* attlist)
  {
  	HeapTuple	tuple;
  	TupleDesc	tupDesc;
  	Form_pg_attribute *attr;
  	AttrNumber	attr_count;
  	FmgrInfo   *in_functions;
+   Node** defexprs = NULL;
+   int* attr_map = NULL;
  	Oid		   *elements;
  	int			i;
+   ExprDoneCond isdone;
  	Oid			in_func_oid;
  	Datum	   *values;
  	char	   *nulls;
  	int			done = 0;
  	char	   *string;
  	ResultRelInfo *resultRelInfo;
  	EState	   *estate = CreateExecutorState(); /* for ExecConstraints() */
+   ExprContext *econtext = GetPerTupleExprContext(estate);
  	TupleTable	tupleTable;
  	TupleTableSlot *slot;
  	Oid			loaded_oid = InvalidOid;
  	bool		skip_tuple = false;
! 	bool		isnull, file_has_oids;
  
  	tupDesc = RelationGetDescr(rel);
!   attr_count = tupDesc->natts;
!   
!   attr_map = (int*)palloc(sizeof(int) * attr_count);
!   
!   if( attlist )
!   {
!     int lastatt;
!     List* cur;
!     Node* testnode;
!   
!     attr = (Form_pg_attribute*)palloc(sizeof(Form_pg_attribute*) * attr_count);
!     defexprs = (Node**)palloc(sizeof(Node*) * MaxHeapAttributeNumber );
!     lastatt = length(attlist);
!   
!     if( attr_count < lastatt )
!       elog(ERROR,"More columns specified in COPY FROM command than in destination table");
!     /*
!      * XXX: verify that there are no duplicates in attlist, and
!      *      that any name in attlist exists in the table.
!      *      (just too tired to do this right now.
!      */
! 
!     /* 
!      * Mapping the specified column data to the actual table structure. 
!      * 
!      * For the table:
!      *                            Table "test"
!      *   Column |  Type   |                    Modifiers                    
!      *  --------+---------+-------------------------------------------------
!      *   id     | integer | not null default nextval('"test_id_seq"'::text)
!      *   a      | integer | 
!      *   b      | text    | 
!      *   c      | integer | 
!      *
!      * We might have a COPY from of:
!      *   COPY test(b,c) FROM STDIN;
!      *   whatever 100
!      *   however  200
!      *   stuff  300
!      *   \.
!      *
!      * So our actual rel->rd_att->attrs is as follows:
!      *   attrs[0] -> "id"
!      *   attrs[1] -> "a"
!      *   attrs[2] -> "b"
!      *   attrs[3] -> "c"
!      * Our input, on the other hand, will look like:
!      *   values[0] -> "b"
!      *   values[1] -> "c"
!      * 
!      * The code below creates a mapping from the COPY input columns
!      * to their correct table attrs, as well as picking up any
!      * default column expressions for evaluation prior to creating
!      * the new tuple if a column is not present in the COPY data.
!      */
!     for(i=0;i<attr_count;++i)
!     {
!       int found = 0;
!       int c = 0;
!       foreach(cur,attlist)
!       {
!         if( strcmp(strVal(lfirst(cur)),NameStr(tupDesc->attrs[i]->attname)) == 0)
!         {
!           /* column is specified in the COPY attlist */
!           attr_map[c] = i;
!           attr[c] = tupDesc->attrs[i];
!           ++found;
!           defexprs[i] = NULL;
!         }
!         ++c;
!       }
!       if( ! found )
!       {
!         /* column in relation was not specified in attlist.
!          * get a default Expr* for use prior to heap_formtuple
!          */
!         testnode = build_column_default(rel,i+1);
!         if( testnode ) /* is this necessary? */
!         {
!           defexprs[i] = testnode;
!           attr_map[lastatt] = i;
!           attr[lastatt] = tupDesc->attrs[i];
!           ++lastatt;
!         }
!       }
!     }
!   }
!   else{
!     /* no attlist was present, so we just fill the attr_map for
!      * use later.
!      */
!   	attr = tupDesc->attrs;
!     for(i=0;i<attr_count;++i)
!       attr_map[i] = i;
!   }
  
  	/*
  	 * We need a ResultRelInfo so we can use the regular executor's
***************
*** 743,749 ****
  		/* Initialize all values for row to NULL */
  		MemSet(values, 0, attr_count * sizeof(Datum));
  		MemSet(nulls, 'n', attr_count * sizeof(char));
! 
  		if (!binary)
  		{
  			int			newline = 0;
--- 841,847 ----
  		/* Initialize all values for row to NULL */
  		MemSet(values, 0, attr_count * sizeof(Datum));
  		MemSet(nulls, 'n', attr_count * sizeof(char));
!     
  		if (!binary)
  		{
  			int			newline = 0;
***************
*** 777,787 ****
  					done = 1;	/* end of file */
  				else
  				{
! 					values[i] = FunctionCall3(&in_functions[i],
  											  CStringGetDatum(string),
  										   ObjectIdGetDatum(elements[i]),
  									  Int32GetDatum(attr[i]->atttypmod));
! 					nulls[i] = ' ';
  				}
  			}
  			if (!done)
--- 875,885 ----
  					done = 1;	/* end of file */
  				else
  				{
! 					values[attr_map[i]] = FunctionCall3(&in_functions[i],
  											  CStringGetDatum(string),
  										   ObjectIdGetDatum(elements[i]),
  									  Int32GetDatum(attr[i]->atttypmod));
! 					nulls[attr_map[i]] = ' ';
  				}
  			}
  			if (!done)
***************
*** 845,851 ****
  									fp);
  						if (CopyGetEof(fp))
  							elog(ERROR, "COPY BINARY: unexpected EOF");
! 						values[i] = PointerGetDatum(varlena_ptr);
  					}
  					else if (!attr[i]->attbyval)
  					{
--- 943,949 ----
  									fp);
  						if (CopyGetEof(fp))
  							elog(ERROR, "COPY BINARY: unexpected EOF");
! 						values[attr_map[i]] = PointerGetDatum(varlena_ptr);
  					}
  					else if (!attr[i]->attbyval)
  					{
***************
*** 857,863 ****
  						CopyGetData(refval_ptr, fld_size, fp);
  						if (CopyGetEof(fp))
  							elog(ERROR, "COPY BINARY: unexpected EOF");
! 						values[i] = PointerGetDatum(refval_ptr);
  					}
  					else
  					{
--- 955,961 ----
  						CopyGetData(refval_ptr, fld_size, fp);
  						if (CopyGetEof(fp))
  							elog(ERROR, "COPY BINARY: unexpected EOF");
! 						values[attr_map[i]] = PointerGetDatum(refval_ptr);
  					}
  					else
  					{
***************
*** 873,882 ****
  						CopyGetData(&datumBuf, fld_size, fp);
  						if (CopyGetEof(fp))
  							elog(ERROR, "COPY BINARY: unexpected EOF");
! 						values[i] = fetch_att(&datumBuf, true, fld_size);
  					}
  
! 					nulls[i] = ' ';
  				}
  			}
  		}
--- 971,980 ----
  						CopyGetData(&datumBuf, fld_size, fp);
  						if (CopyGetEof(fp))
  							elog(ERROR, "COPY BINARY: unexpected EOF");
! 						values[attr_map[i]] = fetch_att(&datumBuf, true, fld_size);
  					}
  
! 					nulls[attr_map[i]] = ' ';
  				}
  			}
  		}
***************
*** 884,889 ****
--- 982,1012 ----
  		if (done)
  			break;
  
+     /*
+      * set any default columns that are not in the COPY attlist
+      */
+     if( attlist )
+     {
+       for(i=0;i<attr_count;++i)
+       {
+         /* If not NULL, this column was not specified in COPY attlist
+          * specification.  Try to set a default value.  We could probably
+          * optimize when the Expr is a constant by calling once outside
+          * the loop...
+          *
+          * OPT: do the MemoryContextSwitchTo outside the loop instead of
+          *      relying on ExecEvalExprSwitchContext to do it.
+          *      example in ExecQual().
+          */
+         if( defexprs[i] != NULL )
+         {
+           values[i] = ExecEvalExprSwitchContext(defexprs[i],econtext,&isnull,&isdone);
+           if( ! isnull )
+             nulls[i] = ' ';
+         }
+       }
+     }
+ 
  		tuple = heap_formtuple(tupDesc, values, nulls);
  
  		if (oids && file_has_oids)
***************
*** 933,940 ****
  
  		for (i = 0; i < attr_count; i++)
  		{
! 			if (!attr[i]->attbyval && nulls[i] != 'n')
! 				pfree(DatumGetPointer(values[i]));
  		}
  
  		heap_freetuple(tuple);
--- 1056,1063 ----
  
  		for (i = 0; i < attr_count; i++)
  		{
! 			if (!attr[i]->attbyval && nulls[attr_map[i]] != 'n')
! 				pfree(DatumGetPointer(values[attr_map[i]]));
  		}
  
  		heap_freetuple(tuple);
***************
*** 947,958 ****
--- 1070,1094 ----
  
  	pfree(values);
  	pfree(nulls);
+   pfree(attr_map);
  
  	if (!binary)
  	{
  		pfree(in_functions);
  		pfree(elements);
  	}
+   if( attlist )
+   {
+     pfree(attr);
+     for(i=0;i<attr_count;++i)
+     {
+       if( defexprs[i] != NULL )
+       {
+         pfree(defexprs[i]);
+       }
+     }
+     pfree(defexprs);
+   }
  
  	ExecDropTupleTable(tupleTable, true);
  
Index: src/backend/optimizer/prep/preptlist.c
===================================================================
RCS file: /var/cvsup/pgsql/src/backend/optimizer/prep/preptlist.c,v
retrieving revision 1.46
diff -c -r1.46 preptlist.c
*** src/backend/optimizer/prep/preptlist.c	5 Nov 2001 17:46:26 -0000	1.46
--- src/backend/optimizer/prep/preptlist.c	13 Jan 2002 23:38:15 -0000
***************
*** 39,45 ****
  static TargetEntry *process_matched_tle(TargetEntry *src_tle,
  					TargetEntry *prior_tle,
  					int attrno);
! static Node *build_column_default(Relation rel, int attrno);
  
  
  /*
--- 39,45 ----
  static TargetEntry *process_matched_tle(TargetEntry *src_tle,
  					TargetEntry *prior_tle,
  					int attrno);
! Node *build_column_default(Relation rel, int attrno);
  
  
  /*
***************
*** 348,354 ****
   * If not, generate a constant of the default value for the attribute type,
   * or a NULL if the type has no default value either.
   */
! static Node *
  build_column_default(Relation rel, int attrno)
  {
  	TupleDesc	rd_att = rel->rd_att;
--- 348,354 ----
   * If not, generate a constant of the default value for the attribute type,
   * or a NULL if the type has no default value either.
   */
! Node *
  build_column_default(Relation rel, int attrno)
  {
  	TupleDesc	rd_att = rel->rd_att;
Index: src/backend/parser/gram.y
===================================================================
RCS file: /var/cvsup/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.276
diff -c -r2.276 gram.y
*** src/backend/parser/gram.y	9 Dec 2001 04:39:39 -0000	2.276
--- src/backend/parser/gram.y	13 Jan 2002 14:56:43 -0000
***************
*** 1193,1203 ****
--- 1193,1217 ----
  					CopyStmt *n = makeNode(CopyStmt);
  					n->binary = $2;
  					n->relname = $3;
+           n->attlist = NIL;
  					n->oids = $4;
  					n->direction = $5;
  					n->filename = $6;
  					n->delimiter = $7;
  					n->null_print = $8;
+ 					$$ = (Node *)n;
+ 				}
+     | COPY opt_binary relation_name opt_column_list opt_with_copy copy_dirn copy_file_name copy_delimiter copy_null
+ 				{
+ 					CopyStmt *n = makeNode(CopyStmt);
+ 					n->binary = $2;
+ 					n->relname = $3;
+           n->attlist = $4;
+ 					n->oids = $5;
+ 					n->direction = $6;
+ 					n->filename = $7;
+ 					n->delimiter = $8;
+ 					n->null_print = $9;
  					$$ = (Node *)n;
  				}
  		;
Index: src/backend/tcop/utility.c
===================================================================
RCS file: /var/cvsup/pgsql/src/backend/tcop/utility.c,v
retrieving revision 1.124
diff -c -r1.124 utility.c
*** src/backend/tcop/utility.c	3 Jan 2002 23:21:32 -0000	1.124
--- src/backend/tcop/utility.c	13 Jan 2002 13:42:56 -0000
***************
*** 354,360 ****
  				 */
  					   stmt->filename,
  					   stmt->delimiter,
! 					   stmt->null_print);
  			}
  			break;
  
--- 354,361 ----
  				 */
  					   stmt->filename,
  					   stmt->delimiter,
! 					   stmt->null_print,
!              stmt->attlist);
  			}
  			break;
  
Index: src/include/commands/copy.h
===================================================================
RCS file: /var/cvsup/pgsql/src/include/commands/copy.h,v
retrieving revision 1.16
diff -c -r1.16 copy.h
*** src/include/commands/copy.h	5 Nov 2001 17:46:33 -0000	1.16
--- src/include/commands/copy.h	13 Jan 2002 13:52:56 -0000
***************
*** 14,22 ****
  #ifndef COPY_H
  #define COPY_H
  
  extern int	copy_lineno;
  
  void DoCopy(char *relname, bool binary, bool oids, bool from, bool pipe,
! 	   char *filename, char *delim, char *null_print);
  
  #endif   /* COPY_H */
--- 14,24 ----
  #ifndef COPY_H
  #define COPY_H
  
+ #include "utils/portal.h"
+ 
  extern int	copy_lineno;
  
  void DoCopy(char *relname, bool binary, bool oids, bool from, bool pipe,
! 	   char *filename, char *delim, char *null_print, List* _al);
  
  #endif   /* COPY_H */
Index: src/include/nodes/parsenodes.h
===================================================================
RCS file: /var/cvsup/pgsql/src/include/nodes/parsenodes.h,v
retrieving revision 1.151
diff -c -r1.151 parsenodes.h
*** src/include/nodes/parsenodes.h	5 Nov 2001 17:46:34 -0000	1.151
--- src/include/nodes/parsenodes.h	13 Jan 2002 13:40:22 -0000
***************
*** 183,188 ****
--- 183,189 ----
  	char	   *filename;		/* if NULL, use stdin/stdout */
  	char	   *delimiter;		/* delimiter character, \t by default */
  	char	   *null_print;		/* how to print NULLs, `\N' by default */
+   List    *attlist;     /* list of attributes */
  } CopyStmt;
  
  /* ----------------------
#12Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#11)
Re: Problem reloading regression database

[2002-01-13 21:39] Brent Verner said:
|
| I believe the memory context issues are handled correctly, but I've

I must retract this assertion. As posted, this patch dies on the
second line of a COPY file... argh. What did I break?

b
--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#12)
Re: Problem reloading regression database

Brent Verner <brent@rcfile.org> writes:

I must retract this assertion. As posted, this patch dies on the
second line of a COPY file... argh. What did I break?

First guess is that you allocated some data structure in the per-tuple
context that needs to be in the per-query context (ie, needs to live
throughout the copy).

regards, tom lane

#14Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#13)
Re: Problem reloading regression database

[2002-01-13 22:51] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > I must retract this assertion. As posted, this patch dies on the
| > second line of a COPY file... argh. What did I break?
|
| First guess is that you allocated some data structure in the per-tuple
| context that needs to be in the per-query context (ie, needs to live
| throughout the copy).

yup. The problem sneaks up when I get a default value for a "text"
column via ExecEvalExprSwithContext. Commenting out the pfree above
heap_formtuple makes the error go away, but I know that's not the
right answer. Should I avoid freeing the !attbyval items when they've
come from ExecEvalExpr -- I don't see any other examples of freeing
returns from this fn.

b

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#14)
Re: Problem reloading regression database

Brent Verner <brent@rcfile.org> writes:

yup. The problem sneaks up when I get a default value for a "text"
column via ExecEvalExprSwithContext. Commenting out the pfree above
heap_formtuple makes the error go away, but I know that's not the
right answer.

Oh, the pfree for the attribute values? Ah so. I knew that would
bite us someday. See, the way this code presently works is that all of
copy.c runs in the per-query memory context. It calls all of the
datatype conversion routines in that same context. It assumes that
the routines that return pass-by-ref datatypes will return palloc'd
values (and not, say, pointers to constant values) --- which is not
a good assumption IMHO, even though I think it's true at the moment.
This assumption is what's needed to justify the pfree's at the bottom of
the loop. What's even worse is that it assumes that the conversion
routines leak no other memory; if any conversion routine palloc's
something it doesn't pfree, then over the course of a long enough copy
we run out of memory.

In the case of ExecEvalExpr, if the expression is just a T_Const node
then what you get back (for a pass-by-ref datatype) is a pointer to
the value sitting in the Const node. pfreeing this is bad juju.

Should I avoid freeing the !attbyval items when they've
come from ExecEvalExpr -- I don't see any other examples of freeing
returns from this fn.

I believe the correct solution is to get rid of the retail pfree's
altogether. The clean way to run this code would be to switch to
the per-tuple context at the head of the per-tuple loop (say, right
after ResetPerTupleExprContext), run all the datatype conversion
routines *and* ExecEvalExpr in this context, and then switch back
to per-query context just before heap_formtuple. Then at the
loop bottom the only explicit free you need is the heap_freetuple.
The individual attribute values are inside the per-tuple context
and they'll be freed by the ResetPerTupleExprContext at the start
of the next loop. Fewer cycles, works right whether the values are
palloc'd or not, and positively prevents any problems with leaks
inside the datatype conversion routines --- since any leaked pallocs
will also be inside the per-tuple context.

An even more radical approach would be to try to run the whole loop in
per-tuple context, but I think that will probably break things; the
index insertion code, at least, expects to be called in per-query
context because it sometimes makes allocations that must live across
calls. (Cleaning that up is on my long-term to-do list; I'd prefer
to see almost all of the executor run in per-tuple contexts, so as
to avoid potential memory leaks very similar to the situation here.)

You'll need to make sure that the code isn't expecting to palloc
anything first-time-through and re-use it on later loops, but I
think that will be okay. (The attribute_buf is the most obvious
risk, but that's all right, see stringinfo.c.)

regards, tom lane

#16Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#15)
Re: Problem reloading regression database

[2002-01-14 00:03] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > yup. The problem sneaks up when I get a default value for a "text"
| > column via ExecEvalExprSwithContext. Commenting out the pfree above
| > heap_formtuple makes the error go away, but I know that's not the
| > right answer.
|
| Oh, the pfree for the attribute values? Ah so. I knew that would
| bite us someday. See, the way this code presently works is that all of
| copy.c runs in the per-query memory context. It calls all of the
| datatype conversion routines in that same context. It assumes that
| the routines that return pass-by-ref datatypes will return palloc'd
| values (and not, say, pointers to constant values) --- which is not
| a good assumption IMHO, even though I think it's true at the moment.
| This assumption is what's needed to justify the pfree's at the bottom of
| the loop. What's even worse is that it assumes that the conversion
| routines leak no other memory; if any conversion routine palloc's
| something it doesn't pfree, then over the course of a long enough copy
| we run out of memory.

check. I just loaded 3mil records (with my hacked copy.c), and had
ended up around 36M... <phew!!> I'm gonna load a similar file
with a clean copy.c, just to see if that leak is present without
my changes -- I suspect it's not, but I'd like to see the empirical
effect of my change(s)....

Thanks for the commentary. It really helps glue together the
thoughts I had from reading over the memory context code.

| In the case of ExecEvalExpr, if the expression is just a T_Const node
| then what you get back (for a pass-by-ref datatype) is a pointer to
| the value sitting in the Const node. pfreeing this is bad juju.

yup, that seems like it'd explain my symptom.

| > Should I avoid freeing the !attbyval items when they've
| > come from ExecEvalExpr -- I don't see any other examples of freeing
| > returns from this fn.
|
| I believe the correct solution is to get rid of the retail pfree's
| altogether. The clean way to run this code would be to switch to
| the per-tuple context at the head of the per-tuple loop (say, right
| after ResetPerTupleExprContext), run all the datatype conversion
| routines *and* ExecEvalExpr in this context, and then switch back
| to per-query context just before heap_formtuple. Then at the
| loop bottom the only explicit free you need is the heap_freetuple.
| The individual attribute values are inside the per-tuple context
| and they'll be freed by the ResetPerTupleExprContext at the start
| of the next loop. Fewer cycles, works right whether the values are
| palloc'd or not, and positively prevents any problems with leaks
| inside the datatype conversion routines --- since any leaked pallocs
| will also be inside the per-tuple context.

Gotcha. This certainly sounds like it will alleviate my pfree
problem. I'll get back to this tomorrow evening.

| An even more radical approach would be to try to run the whole loop in
| per-tuple context, but I think that will probably break things; the
| index insertion code, at least, expects to be called in per-query
| context because it sometimes makes allocations that must live across
| calls. (Cleaning that up is on my long-term to-do list; I'd prefer
| to see almost all of the executor run in per-tuple contexts, so as
| to avoid potential memory leaks very similar to the situation here.)
|
| You'll need to make sure that the code isn't expecting to palloc
| anything first-time-through and re-use it on later loops, but I
| think that will be okay. (The attribute_buf is the most obvious
| risk, but that's all right, see stringinfo.c.)

So I /can't/ palloc some things /before/ switching context to
per-tuple-context? I ask because I'm palloc'ing a couple of
arrays, that would have to be MaxHeapAttributeNumber long to
make sure we've enough space. Though, thinking about it, an
additional 13k of static storage in the binary is not all that
much.

thanks.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#16)
Re: Problem reloading regression database

Brent Verner <brent@rcfile.org> writes:

| You'll need to make sure that the code isn't expecting to palloc
| anything first-time-through and re-use it on later loops, but I
| think that will be okay. (The attribute_buf is the most obvious
| risk, but that's all right, see stringinfo.c.)

So I /can't/ palloc some things /before/ switching context to
per-tuple-context?

Oh, sure you can. That's the point of having a per-query context.
What I was wondering was whether there were any pallocs executed
*after* entering the loop that the code expected to live across
loop cycles. I don't think so, I'm just mentioning the risk as
part of your education ;-)

regards, tom lane

#18Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#17)
Re: Problem reloading regression database

[2002-01-14 00:41] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > | You'll need to make sure that the code isn't expecting to palloc
| > | anything first-time-through and re-use it on later loops, but I
| > | think that will be okay. (The attribute_buf is the most obvious
| > | risk, but that's all right, see stringinfo.c.)
|
| > So I /can't/ palloc some things /before/ switching context to
| > per-tuple-context?
|
| Oh, sure you can. That's the point of having a per-query context.
| What I was wondering was whether there were any pallocs executed
| *after* entering the loop that the code expected to live across
| loop cycles. I don't think so, I'm just mentioning the risk as
| part of your education ;-)

gotcha. No, I don't think anything inside that loop expects to
persist across iterations. The attribute_buf is static to the
file, and initialized in DoCopy.

What I ended up doing is switching to per-tuple-context prior to
the input loop, then switching back to the (saved) query-context
after exiting the loop. I followed ResetTupleExprContext back, and
it doesn't seem to do anything that would require a switch per loop.
Are there any problems this might cause that I'm not seeing with
my test case?

Memory use is now under control, and things look good (stable around
2.8M).

sleepy:/usr/local/pg-7.2/bin
brent$ ./psql -c '\d yyy'
Table "yyy"
Column | Type | Modifiers
--------+---------+------------------------------------------------
id | integer | not null default nextval('"yyy_id_seq"'::text)
a | integer | not null default 1
b | text | not null default 'test'
c | integer |
Unique keys: yyy_id_key

sleepy:/usr/local/pg-7.2/bin
brent$ wc -l mmm
3200386 mmm
sleepy:/usr/local/pg-7.2/bin
brent$ head -10 mmm
\N
\N
\N
20
10
20
20
40
50
20
sleepy:/usr/local/pg-7.2/bin
brent$ ./psql -c 'copy yyy(c) from stdin' < mmm
sleepy:/usr/local/pg-7.2/bin

thanks.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#18)
Re: Problem reloading regression database

Brent Verner <brent@rcfile.org> writes:

gotcha. No, I don't think anything inside that loop expects to
persist across iterations. The attribute_buf is static to the
file, and initialized in DoCopy.

There is more to attribute_buf than meets the eye ;-)

What I ended up doing is switching to per-tuple-context prior to
the input loop, then switching back to the (saved) query-context
after exiting the loop. I followed ResetTupleExprContext back, and
it doesn't seem to do anything that would require a switch per loop.
Are there any problems this might cause that I'm not seeing with
my test case?

I really don't feel comfortable with running heap_insert or the
subsequent operations in a per-tuple context. Have you tried any
test cases that involve triggers or indexes?

regards, tom lane

#20Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#19)
Re: Problem reloading regression database

[2002-01-14 21:52] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > gotcha. No, I don't think anything inside that loop expects to
| > persist across iterations. The attribute_buf is static to the
| > file, and initialized in DoCopy.
|
| There is more to attribute_buf than meets the eye ;-)

I certainly don't doubt that, especially when it's my eye :-O

| > What I ended up doing is switching to per-tuple-context prior to
| > the input loop, then switching back to the (saved) query-context
| > after exiting the loop. I followed ResetTupleExprContext back, and
| > it doesn't seem to do anything that would require a switch per loop.
| > Are there any problems this might cause that I'm not seeing with
| > my test case?
|
| I really don't feel comfortable with running heap_insert or the
| subsequent operations in a per-tuple context. Have you tried any
| test cases that involve triggers or indexes?

no, I dropped the index for the 3mil COPY. I will run with some
triggers and indexes in the table.

cheers.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#21Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#19)
Re: Problem reloading regression database

[2002-01-14 21:52] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > gotcha. No, I don't think anything inside that loop expects to
| > persist across iterations. The attribute_buf is static to the
| > file, and initialized in DoCopy.
|
| There is more to attribute_buf than meets the eye ;-)
|
| > What I ended up doing is switching to per-tuple-context prior to
| > the input loop, then switching back to the (saved) query-context
| > after exiting the loop. I followed ResetTupleExprContext back, and
| > it doesn't seem to do anything that would require a switch per loop.
| > Are there any problems this might cause that I'm not seeing with
| > my test case?
|
| I really don't feel comfortable with running heap_insert or the
| subsequent operations in a per-tuple context. Have you tried any
| test cases that involve triggers or indexes?

Yes, and I'm seeing no new problems (so far), but there is a problem
in the current copy.c. Running the following on unmodified 7.2b5
causes the backend to consume 17-18Mb of memory. Removing the
REFERENCES on yyy.b causes memory use to be normal.

bash$ cat copy.sql
DROP table yyy;
DROP SEQUENCE yyy_id_seq ;
DROP TABLE zzz;
DROP SEQUENCE zzz_id_seq ;
CREATE TABLE zzz (
id SERIAL,
a INT,
b TEXT NOT NULL DEFAULT 'test' PRIMARY KEY,
c INT NOT NULL DEFAULT 1
);
CREATE TABLE yyy (
id SERIAL,
a INT,
b TEXT NOT NULL DEFAULT 'test' REFERENCES zzz(b),
c INT NOT NULL DEFAULT 1
);
-- make sure there is a 'test' value in zzz.b
INSERT INTO zzz (a) VALUES (10);
COPY yyy FROM '/tmp/sometmpfilehuh'

bash$ for i in `seq 1 200000`; do echo "$i $i test $i" >> /tmp/sometmpfilehuh; done

bash$ head -1 /tmp/sometmpfilehuh; tail -1 /tmp/sometmpfilehuh
1 1 test 1
200000 200000 test 200000

bash$ ./psql < copy.sql

Any ideas? I'm looking around ExecBRInsertTriggers() to see what
might need to be freed around that call.

thanks.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#22Brent Verner
brent@rcfile.org
In reply to: Brent Verner (#21)
Re: Problem reloading regression database

[2002-01-15 00:44] Brent Verner said:

| I'm looking around ExecBRInsertTriggers() to see what
| might need to be freed around that call.

scratch this idea. this bit is not even hit in my test case... sorry.

b
--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Brent Verner (#21)
Re: Problem reloading regression database

Brent Verner <brent@rcfile.org> writes:

Yes, and I'm seeing no new problems (so far), but there is a problem
in the current copy.c. Running the following on unmodified 7.2b5
causes the backend to consume 17-18Mb of memory.

Probably that's just the space consumed for the pending-trigger events
created by the AFTER trigger that implements the foreign key check.
There should be a provision for shoving that list out to disk when
it gets too large ... but it ain't happening for 7.2.

regards, tom lane

#24Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#23)
Re: Problem reloading regression database

[2002-01-15 01:07] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > Yes, and I'm seeing no new problems (so far), but there is a problem
| > in the current copy.c. Running the following on unmodified 7.2b5
| > causes the backend to consume 17-18Mb of memory.
|
| Probably that's just the space consumed for the pending-trigger events
| created by the AFTER trigger that implements the foreign key check.
| There should be a provision for shoving that list out to disk when
| it gets too large ... but it ain't happening for 7.2.

gotcha. I'll move on along then...

thanks.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

#25Brent Verner
brent@rcfile.org
In reply to: Tom Lane (#19)
1 attachment(s)
Re: Problem reloading regression database

[2002-01-14 21:52] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > gotcha. No, I don't think anything inside that loop expects to
| > persist across iterations. The attribute_buf is static to the
| > file, and initialized in DoCopy.
|
| There is more to attribute_buf than meets the eye ;-)
|
| > What I ended up doing is switching to per-tuple-context prior to
| > the input loop, then switching back to the (saved) query-context
| > after exiting the loop. I followed ResetTupleExprContext back, and
| > it doesn't seem to do anything that would require a switch per loop.
| > Are there any problems this might cause that I'm not seeing with
| > my test case?
|
| I really don't feel comfortable with running heap_insert or the
| subsequent operations in a per-tuple context. Have you tried any
| test cases that involve triggers or indexes?

Yes. The attached patch appears to do the right thing with all
indexes and triggers (RI) that I've tested. I'm still doing the
MemoryContextSwitchTo() outside the main loop, and have added some
more sanity checking for column name input.

If anyone could test this (with non-critical data ;-) or otherwise
give feedback, I'd appreciate it; especially if someone could test
with a BEFORE INSERT trigger.

cheers.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

Attachments:

copy.colspec.3.diff.gzapplication/x-gunzipDownload
�L�C<copy.colspec.3.diff�<iw�8���_�x:1eQ������8��g�������~��,N(����������E�q�N��X���Px����%�{0s�<���j����}���/�<9;:g��vp���U�E�e�o�`����������������xa����^���[,X�e��~3	����6lj=�����j�Y�=i�'�!k��S�6�����}�'�:����Z�~G�S������~2&��� ��&
��-��M�(�0��x���<eI�����Bo��`���:��Cmc��l�>�y����4������lo��]:1��s�[�A���(���~X1��8\}�i����7��7OO�(K�H�-(d�vE�(>���ie��3��O����*���Q������|l���z	������9k�>VUiRg��;���%�
�a��u�C[= L���f����)#/�6����!�T�1��6L����
� �5���"OA/E�1@o3��S��4d�X��v�nw�XS�����`��;�� d/���Y����;oe!�%Oy�����
��4�q���P%�	��b\��B(�$������d��3��~���cM|���'����:3'�,
	"��Qa*���1������r%��c�;���}{�7 X����������%&\,aSt�Fv��3M�f�����0���g�Nt�E>�������y�b~K�,6���x��D��0h��A���H��9��b_�b�����"{s����?��Xb���(��(~]6��o�x5�.�@���e)��"����4[�x\���fAJ5oV��I�C/��,�J�.�q[�R4�+Z�d�)�I���4[Q�+�����(Rb	EMVC�@�K���a�A�[f7Po��'&���	rp��Ojp|��t�w~"��7���0�V��E3,?��� �..h}������9�C�������9�#I`�����K���^4�A����@|�(�����D?�'�
���A��%1�4�Z���@6m���I^#8��P�O@�Pe�g�s�0��I��9sa	���Zr��������	����8�)�u�����_&l^��^�`�Gc{�H��.PI�%N�c��K>��N�9<�Y��s@��V�t�@]RLs��8���
��Sw��Si��\����U���&��,>��(�{��}~�)�( ��_��
0J	�^%	�SK���.�C�R��YI����X98K������)���D��GjK�	�E�OQ�������a���F?2`�e�T��Y]���^Z�gg����w��G�����2	 ���9�4�W!�N=��: �PS ��)s��7H�������������|&�/����t��^�X����
OF
%h	��]E|���[���A a�z�>i:Oc���x���Z��^���|D�
OAr�@���J���k�[��.g~�O���
�&K���A���<����t��=b7���M�W�0+0�"���A�	��J~�I�8�	]^9Q^sD�}�a�������t�_*<��@!
)���]��4���`foU]����x���{�"����	/�q��h%dWl{�������?<\�E��6V]�+���:�|������-=�E��$�,���
���� Xk�U��A�LW!KhUi$*����T�WjD��,I"�|�j �d��V�����k/]j6����	�Ud	y7����"�������'B*�p�g�Z����a�t�4s|�\A�2��}m��	�_x�IQ��CJ�v��;z�s~������z��8p���
���Fs��S?:t���`��� ��D5e2��P�&M�v	��7�&��;��	���>���F��g��=�����E;X���e��������
��!�f�[N���������b0~��[-�|^S1n���I�-�u��_�+ =Y��J���5_�sr0�>8	XyP����]T��X�%���;�umQ�TTuD����+�\��=�H��f�?C���]C���0��|���	�c* ��d]{}���b��7��f`.��Kn(���hb*D�$U ]�"����&���J�&tm$�5�;��8j���@I��hJy�[�	�_��8rW�C�
Pa,
�%�X��p|h��x�cT(�y�O�q��$PePL��`9�V��+/��5����e� <l����8vnA�r�#��A���h�<[kXk(�-�, ���&���G��>�����
����
��%V�&��<$���A�A�]LP�\O��[�V�%.�&���K`��������T(���s��oK��K$���c�*C��������RX�0!2$��s���x ��EyTC���L�T\�y/Z=Q���U�^��E�PUY�5������2�P�e,�s�R���D|����y�Fe!6�z
B�y���U,��uKQ�=7nQUT���*�j����];����T��A���-����+
L��&n	q�JWd��t��@�����R�k���%���TLL���+��2g����f���h�ng���^��&����q�M!@�
r�D�I�`����2��6
�(��	���&A����XO���>����������N��r�d���N��������C����m
���Z��^��%�^�c?�iN�5uj���'z>@�Sgk'HF��7	m�D���2�����ql�\�v@G�/0�-���4���I_0CgZ7��
���7 9)o�!O&�j_�%�Qf|�������x��i�M��S}M�W���������ub
�H���51��}h�oZ�)0�H��)wa;����ac��.��#7��7W�b2Yd�T5(��B�c�c���II�Dq�����f�9��r0��gO����JL	��cJ0i�+K��(�[+������:��RwyZj��K���S�&RNS��`��3��S���0R��v�����L��?���^������sk���a�k{����	F�����Z*�Ee�!V������T�9O-��f-�U)e!�Z�.�{��n�[���u�u�E�D�X)���Ky�J�	��c����
<F���P�*�8����8�"S�mRh���:�"���2����jrYKE��# V�*[�����0vtNibLi"Q,�5���������d�7��X�V���'A���
K�l�������]�+-�4t���,�ed��c�R�1�/"�Y<JV:�*2���V���V��1���mW�Z��������', �� �:~�fG�E1�boT �tR��������{),�iQ�ryp|<�v�.WK�C>}�t��<��R�a-���3���������x6�)��9k��-�X����2�o�s�>�P�r��	x�����k�GmC�)�������/�F�~�����5�>����f�-T��cD3�Dy+t�s���8��i��w�+�>F:��gv#VSa���yS��@��$��m3��m�W�����V�Z�|<��m��`���"R�~.�����#Z���aM4(	��`�X����Oxt�����PH@@ %���<���Z�[0��Qg3o��q�?g�����%
��Hg�������~����7s������/��'f������q���9_<�j3p�������(-�������34&��?7OA�I�:�e:���������i�<�k��hE��� b������i��w��ZZ�N_�hX?+Q�\�������z�W�6d��t��S3�P��K�V���<��\�������M�)�S���2������d�J����=������$�s��wy��D&hlf�f�2Z�OT��l��������>"�����^5y��������r��2�)���� 3����V�#Og��1��>��u$
6d��X�"�C��s��������n���=�"���*L(���~+ft�
,s)����
2��t\ia����C`��$��0�o�L\������8�G��V�`S�9U�QLq�p:Q���VjR�g��&��T���9���8���vVqO8������~�=`�eH�q��$���d�Z�dm�d�Q�� �k�u+-���kv��<+����������P}2y�|>��'����(�V��Q�#�#6\�����Pl�B�7�{���W�E�y�Z���W�B}{8�
&�A~�&)���2R�[\��t'�����r��;�{�5Dy����/!<
R�{Q��m�n0A0\X��K��06�����lq����?��_t�p�*�p�b$*����v
������f�<�1}�3R�������!�zP����9R��hx�HXoQ`Z�1��8��m��}�wP�0���C���yqn��v�H�������C�?�
k91D7���������x�M��w��Q��L�Z6����q�R�Vo�O��8�@1h���u���M��I��������_��-6�e�O���<]���7x	9n��b4_����������{�������b������K�������;$4���u�+F���A��5C�����U#�9��?�#�l��bu��w��4����SS�+�b��O��<���E �hF�DE+�D��p���Q���C���o������5:;�F�#��d�8�T�z"��@�@��,�|/���2g����N��C�+�-[	s���t��n�x�Pg��C������3���n��`I������gg�����ZM��|o(�*�f!��9'r~����?�8i�t����0���K;�l��6`bvTN�8��r7����[�',�3�9���2�
�i����y��S�o�����R�=`V�\_:t��>^sSw�ov'���?��7{v'_���[����OX@'�yQ��
H��J���{��F���C�?��y�T�\mk#�y��	�6�� 3�A��������\1-�
���o'K�r���DDc��m�k}��*�o�:�o3�����]�m�B�`���UQ��^k���������H�S��x}R~�V�Z�X����q�����������r� �i1�p\������}��������e���z�[B�n�l�4����0_��|���Y������9��W���������Q
#26Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Brent Verner (#25)
Re: Problem reloading regression database

This has been saved for the 7.3 release:

http://candle.pha.pa.us/cgi-bin/pgpatches2

---------------------------------------------------------------------------

Brent Verner wrote:

[2002-01-14 21:52] Tom Lane said:
| Brent Verner <brent@rcfile.org> writes:
| > gotcha. No, I don't think anything inside that loop expects to
| > persist across iterations. The attribute_buf is static to the
| > file, and initialized in DoCopy.
|
| There is more to attribute_buf than meets the eye ;-)
|
| > What I ended up doing is switching to per-tuple-context prior to
| > the input loop, then switching back to the (saved) query-context
| > after exiting the loop. I followed ResetTupleExprContext back, and
| > it doesn't seem to do anything that would require a switch per loop.
| > Are there any problems this might cause that I'm not seeing with
| > my test case?
|
| I really don't feel comfortable with running heap_insert or the
| subsequent operations in a per-tuple context. Have you tried any
| test cases that involve triggers or indexes?

Yes. The attached patch appears to do the right thing with all
indexes and triggers (RI) that I've tested. I'm still doing the
MemoryContextSwitchTo() outside the main loop, and have added some
more sanity checking for column name input.

If anyone could test this (with non-critical data ;-) or otherwise
give feedback, I'd appreciate it; especially if someone could test
with a BEFORE INSERT trigger.

cheers.
brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing." -- Duane Allman

[ Attachment, skipping... ]

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026