Better to dump tabs as tabs, or \t?

Started by Tom Laneover 19 years ago5 messages
#1Tom Lane
tgl@sss.pgh.pa.us

Historically pg_dump has taken pains to dump ASCII control characters
as backslash constructs, for instance \t for tab. I am thinking this
is not such a great idea, and that it'd be more portable rather than
less so if we got rid of that logic and just dumped tab as tab, etc.
In particular, making this play nice with standard_conforming_strings
seems unpleasant: we'll have to emit E'' strings which are certainly
not portable, not even to older PG releases.

The only good argument I can see for the current behavior is that it
makes the dump file somewhat more robust against whitespace-mashing
filters like typical email programs. But I wouldn't count on a
dump file to come through such a thing completely unscathed anyway.

Thoughts?

regards, tom lane

#2Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: Better to dump tabs as tabs, or \t?

Tom Lane wrote:

Historically pg_dump has taken pains to dump ASCII control characters
as backslash constructs, for instance \t for tab. I am thinking this
is not such a great idea, and that it'd be more portable rather than
less so if we got rid of that logic and just dumped tab as tab, etc.
In particular, making this play nice with standard_conforming_strings
seems unpleasant: we'll have to emit E'' strings which are certainly
not portable, not even to older PG releases.

The only good argument I can see for the current behavior is that it
makes the dump file somewhat more robust against whitespace-mashing
filters like typical email programs. But I wouldn't count on a
dump file to come through such a thing completely unscathed anyway.

Go for it.

Emailing a database dump as other than a properly encoded MIME
attachment makes no sense these days.

cheers

andrew

#3Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#1)
Re: Better to dump tabs as tabs, or \t?

On 5/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Historically pg_dump has taken pains to dump ASCII control characters
as backslash constructs, for instance \t for tab. I am thinking this
is not such a great idea, and that it'd be more portable rather than
less so if we got rid of that logic and just dumped tab as tab, etc.
In particular, making this play nice with standard_conforming_strings
seems unpleasant: we'll have to emit E'' strings which are certainly
not portable, not even to older PG releases.

Could we just give a switch to pg_dump, which toggles between
standard_confirming_strings and old escaped strings?

IMHO this decision is similar to COPY/INSERT decision - it depends
what the admin plans to with the dump, what tools are user on it,
whether there is need to reload on older postgres, etc - and all
of them are things that the postgres tools cannot deduce.

By default, pg_dump should output standard_conforming_strings,
that being in sync with policy to move to standard SQL quoting.

And when the switch is given, pg_dump should put SET at the
start of the dump, not use E'' stings, so giving option
for being backwards compatible.

Such option would considerably lower the pain of migrating data
between versions.

--
marko

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#3)
Re: Better to dump tabs as tabs, or \t?

"Marko Kreen" <markokr@gmail.com> writes:

On 5/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Historically pg_dump has taken pains to dump ASCII control characters
as backslash constructs, for instance \t for tab. I am thinking this
is not such a great idea, and that it'd be more portable rather than
less so if we got rid of that logic and just dumped tab as tab, etc.
In particular, making this play nice with standard_conforming_strings
seems unpleasant: we'll have to emit E'' strings which are certainly
not portable, not even to older PG releases.

Could we just give a switch to pg_dump, which toggles between
standard_confirming_strings and old escaped strings?

The plan is that it'll dump according to what it finds as the
standard_conforming_strings setting on the source server.
If you feel a need to override that setting, you can use PGOPTIONS
or the other usual ways to set a GUC variable for a program.

However, my thought on the point at hand is to just go over to
dumping control characters literally in either case. This is
backwards-compatible to all PG versions and I don't know of a
reason to think it wouldn't work (at least as well as the backslash
constructs anyway) for portability to other databases.

Note: this only affects strings dumped as part of SQL commands;
COPY data isn't at issue, since we're not planning to change the
semantics of that. COPY has always dumped tab as \t and I don't
intend to change it. But pg_dump --inserts would be affected,
also strings appearing in view definitions and such.

We have some precedent for this in that pg_dump has by default
dumped function definitions as $$ literals for a release or two
now, and no one's complained of whitespace getting munged in
function definitions.

regards, tom lane

#5Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#4)
Re: Better to dump tabs as tabs, or \t?

COPY wants \r and \n to be used because it checks for line endings, but
your change is only for the SQL strings, and you are right, it is more
porable to dump as actual bytes than backslashes.

---------------------------------------------------------------------------

Tom Lane wrote:

"Marko Kreen" <markokr@gmail.com> writes:

On 5/27/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Historically pg_dump has taken pains to dump ASCII control characters
as backslash constructs, for instance \t for tab. I am thinking this
is not such a great idea, and that it'd be more portable rather than
less so if we got rid of that logic and just dumped tab as tab, etc.
In particular, making this play nice with standard_conforming_strings
seems unpleasant: we'll have to emit E'' strings which are certainly
not portable, not even to older PG releases.

Could we just give a switch to pg_dump, which toggles between
standard_confirming_strings and old escaped strings?

The plan is that it'll dump according to what it finds as the
standard_conforming_strings setting on the source server.
If you feel a need to override that setting, you can use PGOPTIONS
or the other usual ways to set a GUC variable for a program.

However, my thought on the point at hand is to just go over to
dumping control characters literally in either case. This is
backwards-compatible to all PG versions and I don't know of a
reason to think it wouldn't work (at least as well as the backslash
constructs anyway) for portability to other databases.

Note: this only affects strings dumped as part of SQL commands;
COPY data isn't at issue, since we're not planning to change the
semantics of that. COPY has always dumped tab as \t and I don't
intend to change it. But pg_dump --inserts would be affected,
also strings appearing in view definitions and such.

We have some precedent for this in that pg_dump has by default
dumped function definitions as $$ literals for a release or two
now, and no one's complained of whitespace getting munged in
function definitions.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +