Syntax error in a large COPY

Started by Reg Me Pleaseover 18 years ago34 messagesgeneral
Jump to latest
#1Reg Me Please
regmeplease@gmail.com

Hi all.
I'm generating an SQL script to load some million rows into a table.
I'm trying to use the COPY command in order to speed the load up.

At a certain point I get an error telling about a
"invalid input syntax for type numeric"

The incriminated line number is the last one (the one containing the \.).

Is there a way to know which line is really malformed?

Thanks.

--
Reg me Please

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Reg Me Please (#1)
Re: Syntax error in a large COPY

Reg Me Please <regmeplease@gmail.com> writes:

At a certain point I get an error telling about a
"invalid input syntax for type numeric"
The incriminated line number is the last one (the one containing the \.).
Is there a way to know which line is really malformed?

Why do you think the report is inaccurate?

I can reproduce this by putting a few spaces in front of \., for
instance.

regression=# create table foo(f1 numeric);
CREATE TABLE
regression=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

123
\.

regression=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

\.
\.

ERROR: invalid input syntax for type numeric: " "
CONTEXT: COPY foo, line 1, column f1: " "
regression=#

As the psql prompt mentions, \. has to be alone on a line.

regards, tom lane

#3Reg Me Please
regmeplease@gmail.com
In reply to: Tom Lane (#2)
Re: Syntax error in a large COPY

That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.

Il Tuesday 06 November 2007 19:14:00 Tom Lane ha scritto:

Reg Me Please <regmeplease@gmail.com> writes:

At a certain point I get an error telling about a
"invalid input syntax for type numeric"
The incriminated line number is the last one (the one containing the \.).
Is there a way to know which line is really malformed?

Why do you think the report is inaccurate?

I can reproduce this by putting a few spaces in front of \., for
instance.

regression=# create table foo(f1 numeric);
CREATE TABLE
regression=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

123
\.

regression=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.

\.
\.

ERROR: invalid input syntax for type numeric: " "
CONTEXT: COPY foo, line 1, column f1: " "
regression=#

As the psql prompt mentions, \. has to be alone on a line.

regards, tom lane

--
Reg me Please

#4Scott Marlowe
scott.marlowe@gmail.com
In reply to: Reg Me Please (#3)
Re: Syntax error in a large COPY

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.

(Please don't top post...)

Got a self contained test case you can post?

#5Collin Kidder
adderd@kkmfg.com
In reply to: Scott Marlowe (#4)
Re: Syntax error in a large COPY

This is offtopic but there is nothing wrong with top posting. Is there a
mail list policy on it or are you just picky about it?

Scott Marlowe wrote:

Show quoted text

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.

(Please don't top post...)

Got a self contained test case you can post?

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#6Scott Marlowe
scott.marlowe@gmail.com
In reply to: Collin Kidder (#5)
Re: Syntax error in a large COPY

A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

Show quoted text

On Nov 6, 2007 1:16 PM, Collin Kidder <adderd@kkmfg.com> wrote:

This is offtopic but there is nothing wrong with top posting. Is there a
mail list policy on it or are you just picky about it?

Scott Marlowe wrote:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.

(Please don't top post...)

Got a self contained test case you can post?

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

#7Scott Marlowe
scott.marlowe@gmail.com
In reply to: Collin Kidder (#5)
Re: Syntax error in a large COPY

On Nov 6, 2007 1:16 PM, Collin Kidder <adderd@kkmfg.com> wrote:

This is offtopic but there is nothing wrong with top posting. Is there a
mail list policy on it or are you just picky about it?

OK, I was being a bit flippant with my last response. The reasoning
on technical mailing lists for preferring inline or bottom posting is
several fold. One is the reason I posted, that the answers being out
of order tend to make it difficult to follow the flow of the
conversation, especially if the conversation has, until then, been
inline / bottom posted.

But the other reason is that top posters are often the biggest
offenders of not editing their replies. there's nothing like seeing a
one line reply to a 500 line conversation.

When inline posting, one can delete the parts of the converstation
that they aren't replying to so that it is obvious what parts they are
replying to and keep their post concise.

Basically, in a technical mailing list, one is trying to converse in
the most efficient and concise manner. Top posting does not help to
accomplish this goal, because a top post has no context within the
conversation.

OTOH, sometimes, a top post is appropriate. I.e. someone posts a long
article explaining how some arcane part of the database works, and a
person replies with a top post of "Thanks for the great article, it
really explained a lot" and then deletes MOST of the article since it
isn't needed for the reply.

So it's not really a hard and fast rule, it's more a guideline on a
technical list.

RFC 1855 is a bit old but still makes some good points

http://tools.ietf.org/html/rfc1855

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Marlowe (#7)
Re: Syntax error in a large COPY

"Scott Marlowe" <scott.marlowe@gmail.com> writes:

But the other reason is that top posters are often the biggest
offenders of not editing their replies. there's nothing like seeing a
one line reply to a 500 line conversation.
When inline posting, one can delete the parts of the converstation
that they aren't replying to so that it is obvious what parts they are
replying to and keep their post concise.

The real point here is that when writing to a mailing list, you should
make an effort to conserve other peoples' time. Hundreds of people are
going to read what you wrote, possibly thousands when you count in
people searching the list archives in the future. Saving them a few
seconds apiece adds up fast. Not bothering to expend a minute of your
own time to make it a bit easier on the reader marks you as a boor.

In the context of this particular issue, what that means is that it's
polite to quote only enough of the previous message to make it clear
what you're replying to --- you should assume that most readers already
saw the earlier message and don't need to read it all again, but maybe
could use a few sentences of context to remember what the thread was
about (especially since it might be hours or days since they read the
prior message). And you should quote the part of the message that bears
on what you've got to say, if there's an identifiable part that you're
responding to. And that quoting should come before your reply, because
otherwise your comment is out of context for the reader; again, they'd
have to burrow through the quoted message to remember what the
discussion was, and then come back and re-read your comment.

Another reason for being selective when you quote is to be polite to
people searching the list archives in the future --- they don't want to
be buried in hits from repeated quotes of the message they're actually
looking for.

I don't say that you *can't* top post and still be polite, but nearly
every example I've ever seen came off as rude to the reader because
it didn't take any of these things into account, and it very obviously
was something that the sender didn't spend any time on. If people
decide you're a jerk they're likely to ignore your posts.

regards, tom lane

#9Reg Me Please
regmeplease@gmail.com
In reply to: Scott Marlowe (#4)
Re: Syntax error in a large COPY

Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.

(Please don't top post...)

Got a self contained test case you can post?

Back to the original topic ...

I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a problem
somewhere.

A self contained test, at the moment, would be that long!

I'm considering a "branch and bound" approach ... but it'd be quite long
and tedious as the program generating the script has not been written
to do such things.

I'm pretty sure that the interpreter knows exactly where the problem
is. It's just missing to tell it.
Unless there's some friendly advise.

--
Reg me Please

#10Scott Marlowe
scott.marlowe@gmail.com
In reply to: Reg Me Please (#9)
Re: Syntax error in a large COPY

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.

(Please don't top post...)

Got a self contained test case you can post?

Back to the original topic ...

I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a problem
somewhere.

A self contained test, at the moment, would be that long!

I'm considering a "branch and bound" approach ... but it'd be quite long
and tedious as the program generating the script has not been written
to do such things.

Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.

I'm guessing the problem will become obvious then.

#11Reg Me Please
regmeplease@gmail.com
In reply to: Scott Marlowe (#10)
Re: Syntax error in a large COPY

Il Tuesday 06 November 2007 22:13:15 hai scritto:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.

(Please don't top post...)

Got a self contained test case you can post?

Back to the original topic ...

I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a
problem somewhere.

A self contained test, at the moment, would be that long!

I'm considering a "branch and bound" approach ... but it'd be quite long
and tedious as the program generating the script has not been written
to do such things.

Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.

I'm guessing the problem will become obvious then.

That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.

--
Reg me Please

#12Scott Marlowe
scott.marlowe@gmail.com
In reply to: Reg Me Please (#11)
Re: Syntax error in a large COPY

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

Il Tuesday 06 November 2007 22:13:15 hai scritto:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.

(Please don't top post...)

Got a self contained test case you can post?

Back to the original topic ...

I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a
problem somewhere.

A self contained test, at the moment, would be that long!

I'm considering a "branch and bound" approach ... but it'd be quite long
and tedious as the program generating the script has not been written
to do such things.

Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.

I'm guessing the problem will become obvious then.

That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.

Oh, we called it half-splitting in the military.

Using something like head / tail in unix to do it. Should be fairly
fast, especially if you keep cutting it in half after the first test.
I can't imagine editing something that large even in vi being very
fast.

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Reg Me Please (#9)
Re: Syntax error in a large COPY

Reg Me Please <regmeplease@gmail.com> writes:

I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a problem
somewhere.

It told you exactly where it detected the problem.

Actually ... are you looking at the right error message? If this is a
copy operation within a larger psql script, I think it's probably true
that psql will point at the end of the copy data, because *it* doesn't
know any better. But the error message from the backend should
correctly finger which line of copy input it got confused at.

regards, tom lane

#14Reg Me Please
regmeplease@gmail.com
In reply to: Scott Marlowe (#12)
Re: Syntax error in a large COPY

Il Tuesday 06 November 2007 22:37:12 Scott Marlowe ha scritto:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

Il Tuesday 06 November 2007 22:13:15 hai scritto:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:

On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:

That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.

(Please don't top post...)

Got a self contained test case you can post?

Back to the original topic ...

I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a
problem somewhere.

A self contained test, at the moment, would be that long!

I'm considering a "branch and bound" approach ... but it'd be quite
long and tedious as the program generating the script has not been
written to do such things.

Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.

I'm guessing the problem will become obvious then.

That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.

Oh, we called it half-splitting in the military.

Using something like head / tail in unix to do it. Should be fairly
fast, especially if you keep cutting it in half after the first test.
I can't imagine editing something that large even in vi being very
fast.

My laptop has a fairly slow disk and even with Linux and vi it takes time.

Well, a better diagnostic messages would be better in any case.

--
Reg me Please

#15Thomas Kellerer
spam_eater@gmx.net
In reply to: Tom Lane (#8)
Re: Syntax error in a large COPY

Tom Lane wrote on 06.11.2007 21:21:

The real point here is that when writing to a mailing list, you should
make an effort to conserve other peoples' time. Hundreds of people are
going to read what you wrote, possibly thousands when you count in
people searching the list archives in the future. Saving them a few
seconds apiece adds up fast. Not bothering to expend a minute of your
own time to make it a bit easier on the reader marks you as a boor.

Hmm.
My News/Mail client (Thunderbird) displays all the posts belonging together in a
threaded manner. So I click on the first post, read the posting, click on the
second post, and the first thing I see is the first post repeated (or at least
parts of it).
If everyone simply top-posted, there would be no need for me to scroll down,
just to find a two line answer below a forty line quote - which I personally
find more irritating than top-posting.

But then that may only be me ;)

#16Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#13)
Re: Syntax error in a large COPY

Hi,

Le Tuesday 06 November 2007 22:40:50 Tom Lane, vous avez écrit :

Reg Me Please <regmeplease@gmail.com> writes:

I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a
problem somewhere.

You could use pgloader, which reads CSV input (or text which is not quite CSV)
and will import valid rows and reject invalid ones, filling both a reject log
file with error messages and a rejected data file with bad input lines.
Basically, it will issue the dichotomy steps (branch and bound) for you.
http://pgfoundry.org/projects/pgloader
http://pgloader.projects.postgresql.org/

It told you exactly where it detected the problem.

Actually ... are you looking at the right error message? If this is a
copy operation within a larger psql script, I think it's probably true
that psql will point at the end of the copy data, because *it* doesn't
know any better. But the error message from the backend should
correctly finger which line of copy input it got confused at.

Or just use the server logged error message, which will be the same as
reported by pgloader (no magic here) --- pgloader will import successfully
the rest of the data and have the faulty lines separated out in the reject
file, but still uses COPY internally.

Hope this helps,
--
dim

#17Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Reg Me Please (#11)
Re: Syntax error in a large COPY

Reg Me Please wrote:

Il Tuesday 06 November 2007 22:13:15 hai scritto:

Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.

I'm guessing the problem will become obvious then.

That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.

Huh, why not just use pgloader?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Kellerer (#15)
Re: Syntax error in a large COPY

Thomas Kellerer <spam_eater@gmx.net> writes:

If everyone simply top-posted, there would be no need for me to scroll down,
just to find a two line answer below a forty line quote - which I personally
find more irritating than top-posting.

I think you're ignoring my basic point, which was that people shouldn't
be quoting forty lines' worth in the first place. *Especially* not if
they only have two lines to contribute.

But this horse has been beat to death before...

regards, tom lane

#19Reg Me Please
regmeplease@gmail.com
In reply to: Alvaro Herrera (#17)
Re: Syntax error in a large COPY

Il Wednesday 07 November 2007 01:29:44 Alvaro Herrera ha scritto:

Reg Me Please wrote:

Il Tuesday 06 November 2007 22:13:15 hai scritto:
That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.

Huh, why not just use pgloader?

Becasue I never heard about it.
Because it's not included into my distribution package list.
And because I was trusting the "core tools" to work reasonably.

I'll compile and use that.

By the way, unsless you want to have logs at the debug level, no
information has been found in the logs about the offending line(s)
in the 29M+ COPY script.

--
Reg me Please

#20Thomas Kellerer
spam_eater@gmx.net
In reply to: Tom Lane (#18)
Re: Syntax error in a large COPY

Tom Lane, 07.11.2007 06:14:

Thomas Kellerer <spam_eater@gmx.net> writes:

If everyone simply top-posted, there would be no need for me to scroll down,
just to find a two line answer below a forty line quote - which I personally
find more irritating than top-posting.

I think you're ignoring my basic point, which was that people shouldn't
be quoting forty lines' worth in the first place. *Especially* not if
they only have two lines to contribute.

No, I did get your point.

My point is: with top-posting I don't care how many lines were repeated
because I don't have to scroll.

But this horse has been beat to death before...

Yes, nuff said :)

Regards
Thomas

#21Reg Me Please
regmeplease@gmail.com
In reply to: Reg Me Please (#19)
#22Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Reg Me Please (#21)
#23Reg Me Please
regmeplease@gmail.com
In reply to: Dimitri Fontaine (#22)
#24Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Reg Me Please (#23)
#25Reg Me Please
regmeplease@gmail.com
In reply to: Dimitri Fontaine (#24)
#26Reg Me Please
regmeplease@gmail.com
In reply to: Reg Me Please (#1)
#27Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Reg Me Please (#25)
#28Bill Moran
wmoran@potentialtech.com
In reply to: Thomas Kellerer (#20)
#29Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Reg Me Please (#23)
#30Collin Kidder
adderd@kkmfg.com
In reply to: Bill Moran (#28)
#31Andrej Ricnik-Bay
andrej.groups@gmail.com
In reply to: Collin Kidder (#30)
#32Steve Wampler
swampler@noao.edu
In reply to: Collin Kidder (#30)
#33Ron Johnson
ron.l.johnson@cox.net
In reply to: Andrej Ricnik-Bay (#31)
#34Bruce Momjian
bruce@momjian.us
In reply to: Andrej Ricnik-Bay (#31)