ERROR: relation . . . does not exist

Started by Albretch Mueller · 17 messages · general
#1 Albretch Mueller
lbrtchx@gmail.com

Hi,
~
I created a number of csv files which I need to import into PG
tables. One of them looks like this:
~
sh-3.1# head -n 3
/media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt
"md5sum","fl"
"d41d8cd98f00b204e9800998ecf8427e",".systemPrefs/.system.lock"
"d41d8cd98f00b204e9800998ecf8427e",".systemPrefs/.systemRootModFile"
~
and it has a total of 565 lines
~
sh-3.1# wc -l /media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt
565 /media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt
~
However, after installing pg from source (using ./configure
--without-readline --prefix=/media/hda4/pgsql/PGDBMS) and creating the
postgres user just fine (as you can see from a knoppix box):
~
root@Knoppix:/media/hda4/pgsql/inst/postgresql-8.3.3# adduser postgres
root@Knoppix:/media/hda4/pgsql/inst/postgresql-8.3.3# chown postgres
/media/hda4/pgsql/DATA
root@Knoppix:/media/hda4/pgsql/inst/postgresql-8.3.3# su - postgres
No directory, logging in with HOME=/
postgres@Knoppix:/$ cd /media/hda4/pgsql/PGDBMS/bin
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ initdb -D /media/hda4/pgsql/DATA
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale C.
The default database encoding has accordingly been set to SQL_ASCII.
The default text search configuration will be set to "english".

fixing permissions on existing directory /media/hda4/pgsql/DATA ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 24MB/153600
creating configuration files ... ok
creating template1 database in /media/hda4/pgsql/DATA/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

postgres -D /media/hda4/pgsql/DATA
or
pg_ctl -D /media/hda4/pgsql/DATA -l logfile start

// __ then starting PG
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ postgres -D
/media/hda4/pgsql/DATA >logfile 2>&1 &
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ chown postgres
/media/hda4/pgsql/LOGS
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ postgres -D
/media/hda4/pgsql/DATA > /media/hda4/pgsql/LOGS/logfile 2>&1
[1] 16712
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ createdb jpk;
postgres@Knoppix:/media/hda4/pgsql/PGDBMS/bin$ psql jpk;
Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

jpk=# COPY j2sdk_1_4_2_18_binfls_md5sum FROM
'/media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt'
CSV HEADER;
ERROR: relation "j2sdk_1_4_2_18_binfls_md5sum" does not exist
~
What is going on here?
~
How can I fix that error, effectively creating and populating the
tables from csv files?
~
Thanks
lbrtchx

#2 Raymond O'Donnell
rod@iol.ie
In reply to: Albretch Mueller (#1)
Re: ERROR: relation . . . does not exist

On 28/08/2008 22:29, Albretch Mueller wrote:

jpk=# COPY j2sdk_1_4_2_18_binfls_md5sum FROM
'/media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt'
CSV HEADER;
ERROR: relation "j2sdk_1_4_2_18_binfls_md5sum" does not exist
~
What is going on here?

Just what it says - the relation doesn't exist 'cos you haven't created
it yet. You have to create the table first before you can COPY into it.

How can I fix that error, effectively creating and populating the
tables from csv files?

Just include a CREATE TABLE statement in the same script before the COPY.
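For the file shown in your first message, whose header line is "md5sum","fl", a minimal sketch could look like this (the text column types here are just an assumption; use whatever types fit your data):

```sql
-- Create the target table first; the column names follow the CSV header.
CREATE TABLE j2sdk_1_4_2_18_binfls_md5sum (
    md5sum text,
    fl     text
);

-- Now the COPY has a relation to copy into; HEADER skips the first line.
COPY j2sdk_1_4_2_18_binfls_md5sum
FROM '/media/hda3/prjx/JPack/REF/LANDMARKS/PROFILES/20080828104627_j2sdk-1_4_2_18-linux-i586.binfls.md5sum.txt'
CSV HEADER;
```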

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

#3 Albretch Mueller
lbrtchx@gmail.com
In reply to: Raymond O'Donnell (#2)
Re: ERROR: relation . . . does not exist

On Thu, Aug 28, 2008 at 5:40 PM, Raymond O'Donnell <rod@iol.ie> wrote:

You have to create the table first before you can COPY into it.

~
Well, based on how the statement reads, I made the wrong assumption of
thinking that PG would be smart enough to make sense of the
data that it gets once you indicate:
~
1) the header names (to be mapped to column names)
~
2) the data from which PG could easily figure out the types for each
column, since it is CSV data
~
That may/should become an RFE
~
Thanks
lbrtchx
~

#4 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Albretch Mueller (#3)
Re: ERROR: relation . . . does not exist

On Thursday 28 August 2008 3:14:01 pm Albretch Mueller wrote:

On Thu, Aug 28, 2008 at 5:40 PM, Raymond O'Donnell <rod@iol.ie> wrote:

You have to create the table first before you can COPY into it.

~
Well, based on how the statement reads, I made the wrong assumption of
thinking that PG would be smart enough to make sense of the
data that it gets once you indicate:
~
1) the header names (to be mapped to column names)
~
2) the data from which PG could easily figure out the types for each
column, since it is CSV data

Define easily. If a CSV column has fields with 1's and 0's, should that be an
integer field or a boolean field? If a CSV column has a series of text entries,
none of which is greater than 25 characters long, does that mean a datatype
of varchar(25), or is it just an artifact of the data to date? You have a CSV
column with text strings of format 'YYYY/MM/DD'. Do you want them stored as
strings or dates? I could go on, but the point is that table data types
require some thought on the part of the DBA.

~
That may/should become an RFE
~
Thanks
lbrtchx
~

--
Adrian Klaver
aklaver@comcast.net

#5 Albretch Mueller
lbrtchx@gmail.com
In reply to: Adrian Klaver (#4)
Re: ERROR: relation . . . does not exist

On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <aklaver@comcast.net> wrote:

Define easily.

~
OK, let me try to outline the approach I would go for:
~
I think "COPY FROM CSV" should have three options, namely:
~
1) the way we have used it in which you create the table first
~
2) another way in which defaults are declared, generally as:
~
2.1) aggressive: data type, value and formatting analysis is done; if
only 1s or 0s are found, declare the column a BOOLEAN; if repeated data is
found (say state codes) and the stratification nodes cover the rest of
the data, stratify the data out to another extra table (they have a name
I can't recall now), index it ...; if the data is kind of numeric with
forward slashes and/or hyphens, could they possibly be dates? if they are
definitely dates, convert them to bigint (and do the formatting in the
presentation code (also this is a win-win situation with i18n code)) ...
~
2.2) conservative: data type and value analysis is done, but no formatting
analysis, and the greater encompassing data type is selected; say for
1-or-0 data use bytes [0, 255], for bytes use int; if something could be
encoded as char(2), use varchar instead, . . .
~
2.3) dumb: just use the coarsest data type possible; bigint for
anything that looks like a number and varchar for the rest
~
the "dumb" option should suggest to the DBA the option they are
using, quantified consequences of their decisions (larger DBs for no
reason, approx. reduction in speed, . . .) and how not to be "dumb"
~
3) or you could define "import templates" declaring which specific
data types to use for data in a certain way, which could be declared
per column using regexps
~

I could go on, but the point is that table data types require some thought on the part of the DBA.

~
Well, it still requires their minds and input, but they will have
jobs even if they get some help, don't you think so ;-)
~
lbrtchx

#6 Martijn van Oosterhout
kleptog@svana.org
In reply to: Albretch Mueller (#5)
Re: ERROR: relation . . . does not exist

On Sat, Aug 30, 2008 at 08:23:25AM -0400, Albretch Mueller wrote:

OK, let me try to outline the approach I would go for:
~
I think "COPY FROM CSV" should have three options, namely:

I think you're confusing postgresql with a spreadsheet program. A
database is designed to take care of your data and ensure its
integrity. As such it requires a little more thought.

There are client programs which will do this for you; perhaps you want
one of those?

What's so hard about:

create table foo (a text, b text);

After which your COPY will complete fine.

2.1) aggressive: data type, value and formatting analysis is done; if
only 1s or 0s are found, declare the column a BOOLEAN; if repeated data is
found (say state codes) and the stratification nodes cover the rest of
the data, stratify the data out to another extra table (they have a name
I can't recall now), index it ...; if the data is kind of numeric with
forward slashes and/or hyphens, could they possibly be dates? if they are
definitely dates, convert them to bigint (and do the formatting in the
presentation code (also this is a win-win situation with i18n code)) ...

PostgreSQL has 60+ types and many look like each other. How do you
propose to differentiate?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#7 Albretch Mueller
lbrtchx@gmail.com
In reply to: Martijn van Oosterhout (#6)
Re: ERROR: relation . . . does not exist

I think you're confusing postgresql with a spreadsheet program.

~
I wonder what makes you think so
~

There are client programs which will do this for you; perhaps you want one of those?

~
Well, then obviously there is the need for it and you were not
successful enough at convincing these developers that they were
"confusing postgresql with a spreadsheet program"
~

PostgreSQL has 60+ types and many look like each other. How do you propose to differentiate?

~
Data Types are basically about value ranges (how many bits do you
need to hold the value) and formatting. IMHO, finding an optimal
[im|ex]port reasoning among 60+ types should not be that much of a big
deal. In fact, as a data analyst I have exported and imported CSV data
a whole lot, and on many occasions it required some extra custom
coding. I may as well consolidate my code as a whole jcsvport library
in Java and start an OS project when I find the time to do so
~
lbrtchx

#8 Christophe Pettus
xof@thebuild.com
In reply to: Albretch Mueller (#7)
Re: ERROR: relation . . . does not exist

On Aug 30, 2008, at 6:26 AM, Albretch Mueller wrote:

Well, then obviously there is the need for it and you were not
successful enough at convincing these developers that they were
"confusing postgresql with a spreadsheet program"

The behavior you are looking for is typical of a spreadsheet, because
spreadsheet programs (generally; I'm sure there are exceptions) don't
have the notion of a schema; each cell can hold its own particular
type. That being said, the automatic type-guessing that Excel, say,
provides is far from foolproof; I've probably spent more time
cleaning up Excel's bad guesses than would have been saved by my just
specifying a type for each column.

As has been noted, text representations of values are extremely
ambiguous as to which Postgres type they mean... and, of course, you
could have user-defined domains and types as well. It's true that it
could take a wild guess, but that's not a traditional part of a
database engine.

That being said, it would not be too hard to write a client that
accepted a CSV or tab-delimited file, parsed the header into column
names, and then scanned the values of the columns to take a
reasonable guess as to the column type from a highly limited set of
possibilities. This is probably one of those classic "twenty lines
of Perl" problems.

It doesn't seem as though COPY INTO is the right place for that,
since the particular guesses and set of types that one would make
strike me as very closely tied to your particular application domain.

#9 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Albretch Mueller (#5)
Re: ERROR: relation . . . does not exist

On Saturday 30 August 2008 5:23:25 am Albretch Mueller wrote:

On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <aklaver@comcast.net> wrote:

Define easily.

~
OK, let me try to outline the approach I would go for:
~
I think "COPY FROM CSV" should have three options, namely:
~
1) the way we have used it in which you create the table first
~
2) another way in which defaults are declared, generally as:
~
2.1) aggressive: data type, value and formatting analysis is done; if
only 1s or 0s are found, declare the column a BOOLEAN; if repeated data is
found (say state codes) and the stratification nodes cover the rest of
the data, stratify the data out to another extra table (they have a name
I can't recall now), index it ...; if the data is kind of numeric with
forward slashes and/or hyphens, could they possibly be dates? if they are
definitely dates, convert them to bigint (and do the formatting in the
presentation code (also this is a win-win situation with i18n code)) ...
~
2.2) conservative: data type and value analysis is done, but no formatting
analysis, and the greater encompassing data type is selected; say for
1-or-0 data use bytes [0, 255], for bytes use int; if something could be
encoded as char(2), use varchar instead, . . .
~
2.3) dumb: just use the coarsest data type possible; bigint for
anything that looks like a number and varchar for the rest
~
the "dumb" option should suggest to the DBA the option they are
using, quantified consequences of their decisions (larger DBs for no
reason, approx. reduction in speed, . . .) and how not to be "dumb"
~
3) or you could define "import templates" declaring which specific
data types to use for data in a certain way, which could be declared
per column using regexps
~

I could go on, but the point is that table data types require some
thought on the part of the DBA.

~
Well, it still requires their minds and input, but they will have
jobs even if they get some help, don't you think so ;-)
~
lbrtchx

This is a combination of more work than necessary and putting the cart after
the horse. All I can see happening is delaying the point of decision to a
later time and/or dumping the decision process on someone else. There is
already a "dumb" solution that has been brought up many times on this list. It
involves creating a holding table that has text-only fields, copying the
data into it, and then moving the data from there to a final table. As far as
import templates go, I suggest looking at:
http://pgloader.projects.postgresql.org/
It also addresses some of your other suggestions. It does not automatically
create a table though.
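The "dumb" holding-table approach mentioned above can be sketched as follows (all table and column names here are made up for illustration, and the file path is a placeholder):

```sql
-- Stage the raw CSV into text-only columns; text will accept anything.
CREATE TABLE md5sum_staging (md5sum text, fl text);

COPY md5sum_staging FROM '/path/to/file.csv' CSV HEADER;

-- Move the data to the final table, applying whatever casts and
-- constraints you decided on after inspecting the staged data.
CREATE TABLE md5sum_final (
    md5sum char(32) NOT NULL,
    fl     text     NOT NULL
);

INSERT INTO md5sum_final (md5sum, fl)
SELECT md5sum::char(32), fl
FROM md5sum_staging;

DROP TABLE md5sum_staging;
```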

--
Adrian Klaver
aklaver@comcast.net

#10 Steve Atkins
steve@blighty.com
In reply to: Christophe Pettus (#8)
Re: ERROR: relation . . . does not exist

On Aug 30, 2008, at 9:19 AM, Christophe wrote:

On Aug 30, 2008, at 6:26 AM, Albretch Mueller wrote:

Well, then obviously there is the need for it and you were not
successful enough at convincing these developers that they were
"confusing postgresql with a spreadsheet program"

The behavior you are looking for is typical of a spreadsheet,
because spreadsheet programs (generally; I'm sure there are
exceptions) don't have the notion of a schema; each cell can hold
its own particular type. That being said, the automatic type-
guessing that Excel, say, provides is far from foolproof; I've
probably spent more time cleaning up Excel's bad guesses than would
have been saved by my just specifying a type for each column.

As has been noted, text representations of values are extremely
ambiguous as to which Postgres type they mean... and, of course, you
could have user-defined domains and types as well. It's true that
it could take a wild guess, but that's not a traditional part of a
database engine.

That being said, it would not be too hard to write a client that
accepted a CSV or tab-delimited file, parsed the header into column
names, and then scanned the values of the columns to take a
reasonable guess as to the column type from a highly limited set of
possibilities. This is probably one of those classic "twenty lines
of Perl" problems.

About 150 lines of Perl[1]. It can actually work quite well, but it is
entirely a client-side problem. None of that sort of heuristics should
go anywhere near COPY.

It doesn't seem as though COPY INTO is the right place for that,
since the particular guesses and set of types that one would make
strike me as very closely tied to your particular application domain.

Cheers,
Steve

[1]: A validator (regex) for each data type; then, for each column, track
which data types it may be as you scan through the file. Use the
relative priorities of different data types to assign something
appropriate for each column, then do a second pass translating the
format into something PostgreSQL is comfortable with and feed it into
pg_putcopydata.

#11 Albretch Mueller
lbrtchx@gmail.com
In reply to: Steve Atkins (#10)
Re: ERROR: relation . . . does not exist

spreadsheet programs (generally; I'm sure there are exceptions) don't have the notion of a schema; each cell can hold its own particular type.

~
Oh, now I see what Martijn meant!
~

that's not a traditional part of a database engine.

~
well, yeah! I would totally agree with you, but since I doubt very
much "COPY FROM CSV" is part of the SQL standard to begin with, why
not spice it up a little more?
~

This is probably one of those classic "twenty lines of Perl" problems.

~
Java since 1.4 comes with a full-blown, Perl-like regexp engine
~

I suggest looking at: http://pgloader.projects.postgresql.org/
[1] A validator (regex) for each data type . . .

~
that sort of thing is what I was talking about, but I would go
quite a bit farther
~
Thanks
lbrtchx

#12 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Adrian Klaver (#9)
Re: ERROR: relation . . . does not exist

On Saturday 30 August 2008 9:42:19 am Adrian Klaver wrote:

On Saturday 30 August 2008 5:23:25 am Albretch Mueller wrote:

On Thu, Aug 28, 2008 at 7:50 PM, Adrian Klaver <aklaver@comcast.net>

wrote:

Define easily.

~
OK, let me try to outline the approach I would go for:
~
I think "COPY FROM CSV" should have three options, namely:
~
1) the way we have used it in which you create the table first
~
2) another way in which defaults are declared, generally as:
~
2.1) aggressive: data type, value and formatting analysis is done; if
only 1s or 0s are found, declare the column a BOOLEAN; if repeated data is
found (say state codes) and the stratification nodes cover the rest of
the data, stratify the data out to another extra table (they have a name
I can't recall now), index it ...; if the data is kind of numeric with
forward slashes and/or hyphens, could they possibly be dates? if they are
definitely dates, convert them to bigint (and do the formatting in the
presentation code (also this is a win-win situation with i18n code)) ...
~
2.2) conservative: data type and value analysis is done, but no formatting
analysis, and the greater encompassing data type is selected; say for
1-or-0 data use bytes [0, 255], for bytes use int; if something could be
encoded as char(2), use varchar instead, . . .
~
2.3) dumb: just use the coarsest data type possible; bigint for
anything that looks like a number and varchar for the rest
~
the "dumb" option should suggest to the DBA the option they are
using, quantified consequences of their decisions (larger DBs for no
reason, approx. reduction in speed, . . .) and how not to be "dumb"
~
3) or you could define "import templates" declaring which specific
data types to use for data in a certain way, which could be declared
per column using regexps
~

I could go on, but the point is that table data types require some
thought on the part of the DBA.

~
Well, it still requires their minds and input, but they will have
jobs even if they get some help, don't you think so ;-)
~
lbrtchx

This is a combination of more work than necessary and putting the cart
after the horse. All I can see happening is delaying the point of decision

Let's try this again. The cart before the horse.
Memo to self:
1) Drink sufficient coffee.
2) Answer email.

to a later time and/or dumping the decision process on someone else. There
is already a "dumb" solution that has been brought up many times on this list.
It involves creating a holding table that has text-only fields, copying
the data into it, and then moving the data from there to a final table. As far
as import templates go, I suggest looking at:
http://pgloader.projects.postgresql.org/
It also addresses some of your other suggestions. It does not automatically
create a table though.

--
Adrian Klaver
aklaver@comcast.net

--
Adrian Klaver
aklaver@comcast.net

#13 Christophe Pettus
xof@thebuild.com
In reply to: Albretch Mueller (#11)
Re: ERROR: relation . . . does not exist

On Aug 30, 2008, at 10:33 AM, Albretch Mueller wrote:

well, yeah! I would totally agree with you, but since I doubt very
much "COPY FROM CSV" is part of the SQL standard to begin with, why
not spice it up a little more?

I'd guess that coming up with a general algorithm to guess the type
from a column of CSV text would satisfy no one, since we'd always
miss a particular case that is important to someone (are times local
or UTC? Should we assume integer or float? Varchar or text?), and the
option is a forest of switches that would be extremely complex and
error prone.

This sounds very much like an application-domain problem, best solved
in the application domain.

#14 Albretch Mueller
lbrtchx@gmail.com
In reply to: Christophe Pettus (#13)
Re: ERROR: relation . . . does not exist

... are times local or UTC

~
this is a semantic, not a syntactic, issue, which some code could
NOT decide based on the data it reads
~

Should we assume integer or float?

~
is there a dot anywhere in the data you read in for that particular column? ...
~

Varchar or text?

~
Is the length of the data read in always less than 255 bytes ( or
characters?)? ...
~
You have made clear to me why my attempt at an RFE for COPY FROM CSV
has found some technical resistance/disagreement, but I still think my
idea, even if not so popular for concrete and cultural reasons, makes
sense to at least some people
~
DBAs usually have a mental map of the data they have on each table,
etc.; whereas as a data analyst you find yourself constantly reading
in, cleansing and marshaling data about which you have no prior
knowledge
~
lbrtchx

#15 Christophe Pettus
xof@thebuild.com
In reply to: Albretch Mueller (#14)
Re: ERROR: relation . . . does not exist

You have made clear to me why my attempt at an RFE for COPY FROM CSV
has found some technical resistance/disagreement, but I still think my
idea, even if not so popular for concrete and cultural reasons, makes
sense to at least some people

It's a perfectly reasonable problem to want to solve; the question is
whether COPY is the right place to solve it. I would think that a tool
that reads the CSV data and produces a proposed schema definition for
the table would be a more generally-useful approach.

#16 Lew
noone@lwsc.ehost-services.com
In reply to: Albretch Mueller (#14)
Re: ERROR: relation . . . does not exist

Albretch Mueller wrote:

Varchar or text?

~
Is the length of the data read in always less than 255 bytes ( or
characters?)? ...

It may be more limited than that by application-domain-specific constraints -
e.g., a license plate might be statutorily limited to eight characters.

It might be coincidence that the input happens to fit within 255 characters
when future inputs might not. One cannot make certain statements about
whether "data read in always [being] less than 255 ... characters" based on a
limited sample set, only probabilistic ones, absent reasoning about the
application domain.

Additionally, one could use TEXT for shorter columns if one wanted:

There are no performance differences between these three types,

[character varying(n), character(n) and text]

apart from increased storage size when using the blank-padded
type, and a few extra cycles to check the length when storing
into a length-constrained column. While character(n) has
performance advantages in some other database systems, it has
no such advantages in PostgreSQL. In most situations text or
character varying should be used instead.

<http://www.postgresql.org/docs/8.3/static/datatype-character.html>

DBMSes are about schemata and planning the data structures, not loosey-goosey
typing. (This reminds me of the debate between the loosely-typed language
(e.g., PHP) camp versus the strongly-typed language (C#, Java) camp.) Schemas
are based on analysis of and reasoning about the application domain, not lucky
guesses about a limited sample of inputs.

--
Lew

#17 Peter Eisentraut
peter_e@gmx.net
In reply to: Albretch Mueller (#7)
Re: ERROR: relation . . . does not exist

Albretch Mueller wrote:

PostgreSQL has 60+ types and many look like each other. How do you propose to differentiate?

~
Data Types are basically about value ranges (how many bits do you
need to hold the value) and formatting.

That is exactly wrong, at least in the PostgreSQL approach to the type
system. Data types are about functions and operators associated with
them. So it is not about what the data looks like, but what kind of
processing you want to do with it. Guessing that information from a
dump of data seems pretty much impossible.
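As a rough illustration of that point (assuming the default DateStyle), the same literals behave differently depending on the type you assign them:

```sql
-- Compared as text, '9' sorts after '1', so this is true:
SELECT '2008/9/2'::text > '2008/10/1'::text;

-- Compared as dates, September 2 comes before October 1, so this is false:
SELECT '2008/9/2'::date > '2008/10/1'::date;
```

The bytes in the file are identical; only the declared type tells PostgreSQL which comparisons, orderings, and operators apply.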