Syntax error in a large COPY
Hi all.
I'm generating an SQL script to load some million rows into a table.
I'm trying to use the COPY command in order to speed the load up.
At a certain point I get an error telling about a
"invalid input syntax for type numeric"
The incriminated line number is the last one (the one containing the \.).
Is there a way to know which line is really malformed?
Thanks.
--
Reg me Please
Reg Me Please <regmeplease@gmail.com> writes:
At a certain point I get an error telling about a
"invalid input syntax for type numeric"
The incriminated line number is the last one (the one containing the \.).
Is there a way to know which line is really malformed?
Why do you think the report is inaccurate?
I can reproduce this by putting a few spaces in front of \., for
instance.
regression=# create table foo(f1 numeric);
CREATE TABLE
regression=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
123
\.
regression=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
\.
\.
ERROR: invalid input syntax for type numeric: " "
CONTEXT: COPY foo, line 1, column f1: " "
regression=#
As the psql prompt mentions, \. has to be alone on a line.
regards, tom lane
That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.
Il Tuesday 06 November 2007 19:14:00 Tom Lane ha scritto:
Reg Me Please <regmeplease@gmail.com> writes:
At a certain point I get an error telling about a
"invalid input syntax for type numeric"
The incriminated line number is the last one (the one containing the \.).
Is there a way to know which line is really malformed?Why do you think the report is inaccurate?
I can reproduce this by putting a few spaces in front of \., for
instance.regression=# create table foo(f1 numeric);
CREATE TABLE
regression=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.123
\.regression=# copy foo from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.\.
\.ERROR: invalid input syntax for type numeric: " "
CONTEXT: COPY foo, line 1, column f1: " "
regression=#As the psql prompt mentions, \. has to be alone on a line.
regards, tom lane
--
Reg me Please
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.
(Please don't top post...)
Got a self contained test case you can post?
This is offtopic but there is nothing wrong with top posting. Is there a
mail list policy on it or are you just picky about it?
Scott Marlowe wrote:
Show quoted text
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.(Please don't top post...)
Got a self contained test case you can post?
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
Show quoted text
On Nov 6, 2007 1:16 PM, Collin Kidder <adderd@kkmfg.com> wrote:
This is offtopic but there is nothing wrong with top posting. Is there a
mail list policy on it or are you just picky about it?Scott Marlowe wrote:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.(Please don't top post...)
Got a self contained test case you can post?
---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
On Nov 6, 2007 1:16 PM, Collin Kidder <adderd@kkmfg.com> wrote:
This is offtopic but there is nothing wrong with top posting. Is there a
mail list policy on it or are you just picky about it?
OK, I was being a bit flippant with my last response. The reasoning
on technical mailing lists for preferring inline or bottom posting is
several fold. One is the reason I posted, that the answers being out
of order tend to make it difficult to follow the flow of the
conversation, especially if the conversation has, until then, been
inline / bottom posted.
But the other reason is that top posters are often the biggest
offenders of not editing their replies. there's nothing like seeing a
one line reply to a 500 line conversation.
When inline posting, one can delete the parts of the converstation
that they aren't replying to so that it is obvious what parts they are
replying to and keep their post concise.
Basically, in a technical mailing list, one is trying to converse in
the most efficient and concise manner. Top posting does not help to
accomplish this goal, because a top post has no context within the
conversation.
OTOH, sometimes, a top post is appropriate. I.e. someone posts a long
article explaining how some arcane part of the database works, and a
person replies with a top post of "Thanks for the great article, it
really explained a lot" and then deletes MOST of the article since it
isn't needed for the reply.
So it's not really a hard and fast rule, it's more a guideline on a
technical list.
RFC 1855 is a bit old but still makes some good points
"Scott Marlowe" <scott.marlowe@gmail.com> writes:
But the other reason is that top posters are often the biggest
offenders of not editing their replies. there's nothing like seeing a
one line reply to a 500 line conversation.
When inline posting, one can delete the parts of the converstation
that they aren't replying to so that it is obvious what parts they are
replying to and keep their post concise.
The real point here is that when writing to a mailing list, you should
make an effort to conserve other peoples' time. Hundreds of people are
going to read what you wrote, possibly thousands when you count in
people searching the list archives in the future. Saving them a few
seconds apiece adds up fast. Not bothering to expend a minute of your
own time to make it a bit easier on the reader marks you as a boor.
In the context of this particular issue, what that means is that it's
polite to quote only enough of the previous message to make it clear
what you're replying to --- you should assume that most readers already
saw the earlier message and don't need to read it all again, but maybe
could use a few sentences of context to remember what the thread was
about (especially since it might be hours or days since they read the
prior message). And you should quote the part of the message that bears
on what you've got to say, if there's an identifiable part that you're
responding to. And that quoting should come before your reply, because
otherwise your comment is out of context for the reader; again, they'd
have to burrow through the quoted message to remember what the
discussion was, and then come back and re-read your comment.
Another reason for being selective when you quote is to be polite to
people searching the list archives in the future --- they don't want to
be buried in hits from repeated quotes of the message they're actually
looking for.
I don't say that you *can't* top post and still be polite, but nearly
every example I've ever seen came off as rude to the reader because
it didn't take any of these things into account, and it very obviously
was something that the sender didn't spend any time on. If people
decide you're a jerk they're likely to ignore your posts.
regards, tom lane
Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.(Please don't top post...)
Got a self contained test case you can post?
Back to the original topic ...
I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a problem
somewhere.
A self contained test, at the moment, would be that long!
I'm considering a "branch and bound" approach ... but it'd be quite long
and tedious as the program generating the script has not been written
to do such things.
I'm pretty sure that the interpreter knows exactly where the problem
is. It's just missing to tell it.
Unless there's some friendly advise.
--
Reg me Please
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.(Please don't top post...)
Got a self contained test case you can post?
Back to the original topic ...
I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a problem
somewhere.A self contained test, at the moment, would be that long!
I'm considering a "branch and bound" approach ... but it'd be quite long
and tedious as the program generating the script has not been written
to do such things.
Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.
I'm guessing the problem will become obvious then.
Il Tuesday 06 November 2007 22:13:15 hai scritto:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.(Please don't top post...)
Got a self contained test case you can post?
Back to the original topic ...
I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a
problem somewhere.A self contained test, at the moment, would be that long!
I'm considering a "branch and bound" approach ... but it'd be quite long
and tedious as the program generating the script has not been written
to do such things.Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.I'm guessing the problem will become obvious then.
That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.
--
Reg me Please
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
Il Tuesday 06 November 2007 22:13:15 hai scritto:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.(Please don't top post...)
Got a self contained test case you can post?
Back to the original topic ...
I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a
problem somewhere.A self contained test, at the moment, would be that long!
I'm considering a "branch and bound" approach ... but it'd be quite long
and tedious as the program generating the script has not been written
to do such things.Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.I'm guessing the problem will become obvious then.
That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.
Oh, we called it half-splitting in the military.
Using something like head / tail in unix to do it. Should be fairly
fast, especially if you keep cutting it in half after the first test.
I can't imagine editing something that large even in vi being very
fast.
Reg Me Please <regmeplease@gmail.com> writes:
I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a problem
somewhere.
It told you exactly where it detected the problem.
Actually ... are you looking at the right error message? If this is a
copy operation within a larger psql script, I think it's probably true
that psql will point at the end of the copy data, because *it* doesn't
know any better. But the error message from the backend should
correctly finger which line of copy input it got confused at.
regards, tom lane
Il Tuesday 06 November 2007 22:37:12 Scott Marlowe ha scritto:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
Il Tuesday 06 November 2007 22:13:15 hai scritto:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
Il Tuesday 06 November 2007 19:43:38 Scott Marlowe ha scritto:
On 11/6/07, Reg Me Please <regmeplease@gmail.com> wrote:
That seems not to be the case.
The last line has a \. by its own and the last but one is
well formed.(Please don't top post...)
Got a self contained test case you can post?
Back to the original topic ...
I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a
problem somewhere.A self contained test, at the moment, would be that long!
I'm considering a "branch and bound" approach ... but it'd be quite
long and tedious as the program generating the script has not been
written to do such things.Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.I'm guessing the problem will become obvious then.
That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.Oh, we called it half-splitting in the military.
Using something like head / tail in unix to do it. Should be fairly
fast, especially if you keep cutting it in half after the first test.
I can't imagine editing something that large even in vi being very
fast.
My laptop has a fairly slow disk and even with Linux and vi it takes time.
Well, a better diagnostic messages would be better in any case.
--
Reg me Please
Tom Lane wrote on 06.11.2007 21:21:
The real point here is that when writing to a mailing list, you should
make an effort to conserve other peoples' time. Hundreds of people are
going to read what you wrote, possibly thousands when you count in
people searching the list archives in the future. Saving them a few
seconds apiece adds up fast. Not bothering to expend a minute of your
own time to make it a bit easier on the reader marks you as a boor.
Hmm.
My News/Mail client (Thunderbird) displays all the posts belonging together in a
threaded manner. So I click on the first post, read the posting, click on the
second post, and the first thing I see is the first post repeated (or at least
parts of it).
If everyone simply top-posted, there would be no need for me to scroll down,
just to find a two line answer below a forty line quote - which I personally
find more irritating than top-posting.
But then that may only be me ;)
Hi,
Le Tuesday 06 November 2007 22:40:50 Tom Lane, vous avez écrit :
Reg Me Please <regmeplease@gmail.com> writes:
I'm trying to understand what and where.
The point is that I have this 29M+ lines script telling me there's a
problem somewhere.
You could use pgloader, which reads CSV input (or text which is not quite CSV)
and will import valid rows and reject invalid ones, filling both a reject log
file with error messages and a rejected data file with bad input lines.
Basically, it will issue the dichotomy steps (branch and bound) for you.
http://pgfoundry.org/projects/pgloader
http://pgloader.projects.postgresql.org/
It told you exactly where it detected the problem.
Actually ... are you looking at the right error message? If this is a
copy operation within a larger psql script, I think it's probably true
that psql will point at the end of the copy data, because *it* doesn't
know any better. But the error message from the backend should
correctly finger which line of copy input it got confused at.
Or just use the server logged error message, which will be the same as
reported by pgloader (no magic here) --- pgloader will import successfully
the rest of the data and have the faulty lines separated out in the reject
file, but still uses COPY internally.
Hope this helps,
--
dim
Reg Me Please wrote:
Il Tuesday 06 November 2007 22:13:15 hai scritto:
Split it in half, with the appropriate sql on each end so the data
still works, and see which half causes a problem. Keep splitting the
one that causes a problem in half until you have a small one with the
problem still.I'm guessing the problem will become obvious then.
That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.
Huh, why not just use pgloader?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Thomas Kellerer <spam_eater@gmx.net> writes:
If everyone simply top-posted, there would be no need for me to scroll down,
just to find a two line answer below a forty line quote - which I personally
find more irritating than top-posting.
I think you're ignoring my basic point, which was that people shouldn't
be quoting forty lines' worth in the first place. *Especially* not if
they only have two lines to contribute.
But this horse has been beat to death before...
regards, tom lane
Il Wednesday 07 November 2007 01:29:44 Alvaro Herrera ha scritto:
Reg Me Please wrote:
Il Tuesday 06 November 2007 22:13:15 hai scritto:
That's the "branch and bound". Editing 29M+ lines file takes some time.
But this is the way I'm going to go right now.Huh, why not just use pgloader?
Becasue I never heard about it.
Because it's not included into my distribution package list.
And because I was trusting the "core tools" to work reasonably.
I'll compile and use that.
By the way, unsless you want to have logs at the debug level, no
information has been found in the logs about the offending line(s)
in the 29M+ COPY script.
--
Reg me Please
Tom Lane, 07.11.2007 06:14:
Thomas Kellerer <spam_eater@gmx.net> writes:
If everyone simply top-posted, there would be no need for me to scroll down,
just to find a two line answer below a forty line quote - which I personally
find more irritating than top-posting.I think you're ignoring my basic point, which was that people shouldn't
be quoting forty lines' worth in the first place. *Especially* not if
they only have two lines to contribute.
No, I did get your point.
My point is: with top-posting I don't care how many lines were repeated
because I don't have to scroll.
But this horse has been beat to death before...
Yes, nuff said :)
Regards
Thomas