psql copy command - 1 char limitation on delimiter

Started by rey almost 16 years ago | 7 messages | general

reywang@optonline.net

almost 16 years ago

Hi,

/delimiter/

The single character that separates columns within each row (line)
of the file. The default is a tab character in text mode, a comma in
CSV mode.

Why limit this to a single character?
For ease of use, could the limit be lifted so that we can specify a delimiter such as '|*|'?

Rey

Tom Lane

tgl@sss.pgh.pa.us

almost 16 years ago

In reply to: rey (#1)

Re: psql copy command - 1 char limitation on delimiter

rey <reywang@optonline.net> writes:

> Why limit this to a single character?

Performance. Believe it or not, breaking fields at the delimiter is
a significant factor in COPY speed.

regards, tom lane

rey

reywang@optonline.net

almost 16 years ago

In reply to: Tom Lane (#2)

Re: psql copy command - 1 char limitation on delimiter

On 09/25/2010 10:03 AM, Tom Lane wrote:

> rey<reywang@optonline.net> writes:

>> Why limit this to a single character?

> Performance. Believe it or not, breaking fields at the delimiter is
> a significant factor in COPY speed.

> regards, tom lane

True, but that is only a 5% to 10% degradation.
For an RDBMS, correct indexes and good logical design give performance
gains of 10 times or more.

Who cares about 10% waste here? Isn't it true that Oracle and other
commercial RDBMSs have no such limitation?

Merlin Moncure

mmoncure@gmail.com

almost 16 years ago

In reply to: rey (#3)

Re: psql copy command - 1 char limitation on delimiter

On Sat, Sep 25, 2010 at 3:12 PM, rey <reywang@optonline.net> wrote:

> On 09/25/2010 10:03 AM, Tom Lane wrote:

>> rey<reywang@optonline.net> writes:

>>> Why limit this to a single character?

>> Performance. Believe it or not, breaking fields at the delimiter is
>> a significant factor in COPY speed.

>> regards, tom lane

> True, but that is only a 5% to 10% degradation.
> For an RDBMS, correct indexes and good logical design give performance
> gains of 10 times or more.

> Who cares about 10% waste here? Isn't it true that Oracle and other
> commercial RDBMSs have no such limitation?

Believe it or not, data loading performance is one of the most common
standard metrics people use to benchmark databases. A large class of
applications need to slam data in the db as quickly as possible, do
some work, and slam it out/dump it. Copy performance matters.

merlin

Steve Crawford

scrawford@pinpointresearch.com

almost 16 years ago

In reply to: Tom Lane (#2)

Re: psql copy command - 1 char limitation on delimiter

On 09/25/2010 07:03 AM, Tom Lane wrote:

> rey<reywang@optonline.net> writes:

>> Why limit this to a single character?

> Performance. Believe it or not, breaking fields at the delimiter is
> a significant factor in COPY speed.

> regards, tom lane

I agree that multi-character (or even regex) delimiters would be
useful. Would it be reasonable for the copy process to differentiate
between single-character delimiters, which could be processed in
"high-speed" mode, and multi-character or regex delimiters, which would be
available as needed, albeit at the expense of a performance hit?

Cheers,
Steve
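
[Editor's illustration] The two-mode idea proposed above can be sketched in a few lines of Python. This is only a toy model of the dispatch, not how PostgreSQL's COPY is implemented: the function name `split_fields` and its `use_regex` flag are hypothetical, and backslash escaping is ignored entirely.

```python
import re


def split_fields(line, delimiter, use_regex=False):
    """Split one line into fields, picking a code path by delimiter type.

    Hypothetical helper for illustration; escaping is not handled.
    """
    if use_regex:
        # Most general (and slowest) path: delimiter is a regular expression.
        return re.split(delimiter, line)

    fields = []
    start = 0
    if len(delimiter) == 1:
        # "High-speed" path: one character comparison per input character.
        for i, ch in enumerate(line):
            if ch == delimiter:
                fields.append(line[start:i])
                start = i + 1
        fields.append(line[start:])
        return fields

    # General literal path: substring search for a multi-character delimiter.
    width = len(delimiter)
    while True:
        pos = line.find(delimiter, start)
        if pos == -1:
            fields.append(line[start:])
            return fields
        fields.append(line[start:pos])
        start = pos + width
```

In a real parser the single-character check sits in the innermost loop, which is why adding a second mode is not free even for callers who never use it.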

Bruce Momjian

bruce@momjian.us

almost 16 years ago

In reply to: Steve Crawford (#5)

Re: psql copy command - 1 char limitation on delimiter

Steve Crawford wrote:

> On 09/25/2010 07:03 AM, Tom Lane wrote:

>> rey<reywang@optonline.net> writes:

>>> Why limit this to a single character?

>> Performance. Believe it or not, breaking fields at the delimiter is
>> a significant factor in COPY speed.

>> regards, tom lane

> I agree that multi-character (or even regex) delimiters would be
> useful. Would it be reasonable for the copy process to differentiate
> between single-character delimiters, which could be processed in
> "high-speed" mode, and multi-character or regex delimiters, which would be
> available as needed, albeit at the expense of a performance hit?

I am not sure you are aware, but Postgres never confuses delimiters with
data, because it puts a backslash before any literal data character that
matches the delimiter.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
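
[Editor's illustration] The backslash escaping described above can be sketched as follows. This is a simplified model of COPY's text format, not the server's actual code: the helper names `escape_field` and `format_row` are hypothetical, and only the backslash, the delimiter, newline, and carriage return are handled (NULL markers and other control characters are left out).

```python
def escape_field(value, delimiter="|"):
    """Escape one field so the delimiter can never be mistaken for data.

    Hypothetical helper; a simplified model of COPY text-format output.
    """
    out = []
    for ch in value:
        if ch == "\\" or ch == delimiter:
            # A literal backslash or delimiter gets a backslash in front.
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\r":
            out.append("\\r")
        else:
            out.append(ch)
    return "".join(out)


def format_row(fields, delimiter="|"):
    """Join already-escaped fields with the single-character delimiter."""
    return delimiter.join(escape_field(f, delimiter) for f in fields)
```

With this scheme an unescaped delimiter in the output always marks a field boundary, so even a common character such as '|' is safe to use.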

Steve Crawford

scrawford@pinpointresearch.com

almost 16 years ago

In reply to: Bruce Momjian (#6)

Re: psql copy command - 1 char limitation on delimiter

On 10/12/2010 08:28 PM, Bruce Momjian wrote:

> Steve Crawford wrote:

>> On 09/25/2010 07:03 AM, Tom Lane wrote:

>>> rey<reywang@optonline.net> writes:

>>>> Why limit this to a single character?

>>> Performance. Believe it or not, breaking fields at the delimiter is
>>> a significant factor in COPY speed.

>>> regards, tom lane

>> I agree that multi-character (or even regex) delimiters would be
>> useful. Would it be reasonable for the copy process to differentiate
>> between single-character delimiters, which could be processed in
>> "high-speed" mode, and multi-character or regex delimiters, which would be
>> available as needed, albeit at the expense of a performance hit?

> I am not sure you are aware, but Postgres never confuses delimiters with
> data, because it puts a backslash before any literal data character that
> matches the delimiter.

Yes, I am. But the discussion was about using multi-character strings as
delimiters.

But while I have encountered files using multiple-character delimiters,
I'm finding myself leaning toward the camp that says that such cases are
better processed externally by Perl/Python/sed/awk/Ruby/ETL/etc.
Especially given the "fun" of defining how to properly escape a
regex-matching string in a regex-delimited file.

Cheers,
Steve
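
[Editor's illustration] External preprocessing of the kind suggested above might look like the sketch below, which rewrites lines delimited by '|*|' into tab-delimited lines suitable for COPY's default text format. The function names `convert_line` and `convert` are hypothetical, and the sketch assumes the '|*|' sequence never occurs inside a field and that the input carries no NULL markers.

```python
def convert_line(line, src_delim="|*|"):
    """Turn one multi-character-delimited line into a tab-delimited line.

    Hypothetical helper; assumes src_delim never appears inside field data.
    """
    fields = line.rstrip("\n").split(src_delim)
    escaped = []
    for field in fields:
        # Escape backslashes first, then characters special in COPY text format.
        field = field.replace("\\", "\\\\")
        field = field.replace("\t", "\\t")
        field = field.replace("\r", "\\r")
        escaped.append(field)
    return "\t".join(escaped)


def convert(src, dst, src_delim="|*|"):
    """Stream every line from src to dst in converted form."""
    for line in src:
        dst.write(convert_line(line, src_delim) + "\n")
```

Calling `convert(sys.stdin, sys.stdout)` from a small script would let the output be piped into a COPY ... FROM STDIN that uses the default tab delimiter, which keeps the server on its single-character path.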