psql copy command - 1 char limitation on delimiter
Hi,
/delimiter/
The single character that separates columns within each row (line)
of the file. The default is a tab character in text mode, a comma in
CSV mode.
Why limit this to a single character?
For easy of use, should it be lifted, so we can specify such as '|*|'?
Rey
rey <reywang@optonline.net> writes:
Why limit this to a single character?
Performance. Believe it or not, breaking fields at the delimiter is
a significant factor in COPY speed.
regards, tom lane
On 09/25/2010 10:03 AM, Tom Lane wrote:
rey<reywang@optonline.net> writes:
Why limit this to a single character?
Performance. Believe it or not, breaking fields at the delimiter is
a significant factor in COPY speed.regards, tom lane
True, but just for 5% to 10% degradation here.
For RDBMS, correct indexes and good logical design we are talking about
10 times or more performance gains.
Who cares about 10% waste here? Is it Oracle and other commercial RDBMS
no such limitation.
On Sat, Sep 25, 2010 at 3:12 PM, rey <reywang@optonline.net> wrote:
On 09/25/2010 10:03 AM, Tom Lane wrote:
rey<reywang@optonline.net> writes:
Why limit this to a single character?
Performance. Believe it or not, breaking fields at the delimiter is
a significant factor in COPY speed.regards, tom lane
True, but just for 5% to 10% degradation here.
For RDBMS, correct indexes and good logical design we are talking about 10
times or more performance gains.Who cares about 10% waste here? Is it Oracle and other commercial RDBMS no
such limitation.
Believe it or not, data loading performance is one of the most common
standard metrics people used to benchmark databases. A large class of
applications need to slam data in the db as quickly as possible, do
some work, and slam it out/dump it. Copy performance matters.
merlin
On 09/25/2010 07:03 AM, Tom Lane wrote:
rey<reywang@optonline.net> writes:
Why limit this to a single character?
Performance. Believe it or not, breaking fields at the delimiter is
a significant factor in COPY speed.regards, tom lane
I agree that that multi-character (or even regex) delimiters would be
useful. Would it be reasonable for the copy process to differentiate
between single character delimiters which could be processed in
"high-speed" mode and multi-character or regex delimiters which would be
available as needed albeit at the expense of a performance hit?
Cheers,
Steve
Steve Crawford wrote:
On 09/25/2010 07:03 AM, Tom Lane wrote:
rey<reywang@optonline.net> writes:
Why limit this to a single character?
Performance. Believe it or not, breaking fields at the delimiter is
a significant factor in COPY speed.regards, tom lane
I agree that that multi-character (or even regex) delimiters would be
useful. Would it be reasonable for the copy process to differentiate
between single character delimiters which could be processed in
"high-speed" mode and multi-character or regex delimiters which would be
available as needed albeit at the expense of a performance hit?
I am not sure you are aware but Postgres never confuses delimiters from
data because it uses a backslash before literal data that matches
delimiters.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
On 10/12/2010 08:28 PM, Bruce Momjian wrote:
Steve Crawford wrote:
On 09/25/2010 07:03 AM, Tom Lane wrote:
rey<reywang@optonline.net> writes:
Why limit this to a single character?
Performance. Believe it or not, breaking fields at the delimiter is
a significant factor in COPY speed.regards, tom lane
I agree that that multi-character (or even regex) delimiters would be
useful. Would it be reasonable for the copy process to differentiate
between single character delimiters which could be processed in
"high-speed" mode and multi-character or regex delimiters which would be
available as needed albeit at the expense of a performance hit?I am not sure you are aware but Postgres never confuses delimiters from
data because it uses a backslash before literal data that matches
delimiters.
Yes, I am. But the discussion was about using multi-character strings as
delimiters.
But while I have encountered files using multiple-character delimiters,
I'm finding myself leaning toward the camp that says that such cases are
better processed externally by Perl/Python/sed/awk/Ruby/ETL/etc.
Especially given the "fun" of defining how to properly escape a
regex-matching string in a regex delimited file.
Cheers,
Steve