pg_dump --where option

Started by Surafel Temesgenalmost 6 years ago6 messageshackers
Jump to latest
#1Surafel Temesgen
surafel3000@gmail.com

Internally pg_dump have capability to filter the table data to dump by same
filter clause but it have no interface to use it and the patch here [1]. /messages/by-id/CAGiT_HNav5B=OfCdfyFoqTa+oe5W1vG=PXkTETCxXg4kcUTktA@mail.gmail.com
adds interface to it but it have at-least two issue, one is error message
in case of incorrect where clause specification is somehow hard to debug
and strange to pg_dump .Other issue is it applies the same filter clause to
multiple tables if pattern matching return multiple tables and it seems
undesired behavior to me because mostly we don’t want to applied the same
where clause specification to multiple table. The attached patch contain a
fix for both issue

[1]: . /messages/by-id/CAGiT_HNav5B=OfCdfyFoqTa+oe5W1vG=PXkTETCxXg4kcUTktA@mail.gmail.com
/messages/by-id/CAGiT_HNav5B=OfCdfyFoqTa+oe5W1vG=PXkTETCxXg4kcUTktA@mail.gmail.com

regards

Surafel

Attachments:

pg_dump_where_clause-v1.patchtext/x-patch; charset=US-ASCII; name=pg_dump_where_clause-v1.patchDownload+157-10
#2Cary Huang
cary.huang@highgo.ca
In reply to: Surafel Temesgen (#1)
Re: pg_dump --where option

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed

Hi

I had a look at the patch and it cleanly applies to postgres master branch. I tried to do a quick test on the new "where clause" functionality and for the most part it does the job as described and I'm sure some people will find this feature useful to their database dump needs. However I tried the feature with a case where I have a subquery in the where clause, but it seems to be failing to dump the data. I ran the pg_dump like:

$ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" > testdump2
$ pg_dump: error: processing of table "public.test1" failed

both test1 and test2 exist in the database and the same subquery works under psql.

I also notice that the regression tests for pg_dump is failing due to the patch, I think it is worth looking into the failure messages and also add some test cases on the new "where" clause to ensure that it can cover as many use cases as possible.

thank you
Best regards

Cary Huang
-------------
HighGo Software Inc. (Canada)
cary.huang@highgo.ca
www.highgo.ca

#3Daniel Gustafsson
daniel@yesql.se
In reply to: Cary Huang (#2)
Re: pg_dump --where option

On 10 Jul 2020, at 02:03, Cary Huang <cary.huang@highgo.ca> wrote:

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: tested, failed
Documentation: tested, failed

Hi

I had a look at the patch and it cleanly applies to postgres master branch. I tried to do a quick test on the new "where clause" functionality and for the most part it does the job as described and I'm sure some people will find this feature useful to their database dump needs. However I tried the feature with a case where I have a subquery in the where clause, but it seems to be failing to dump the data. I ran the pg_dump like:

$ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" > testdump2
$ pg_dump: error: processing of table "public.test1" failed

both test1 and test2 exist in the database and the same subquery works under psql.

I also notice that the regression tests for pg_dump is failing due to the patch, I think it is worth looking into the failure messages and also add some test cases on the new "where" clause to ensure that it can cover as many use cases as possible.

As this is being reviewed, but time is running out in this CF, I'm moving this
to the next CF. The entry will be moved to Waiting for Author based on the
above review.

cheers ./daniel

#4Surafel Temesgen
surafel3000@gmail.com
In reply to: Daniel Gustafsson (#3)
Re: pg_dump --where option

On Fri, Jul 31, 2020 at 1:38 AM Daniel Gustafsson <daniel@yesql.se> wrote:

$ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" >

testdump2

$ pg_dump: error: processing of table "public.test1" failed

both test1 and test2 exist in the database and the same subquery works

under psql.

This is because pg_dump uses schema-qualified object name I add
documentation about to use schema-qualified name when using sub query

I also notice that the regression tests for pg_dump is failing due to

the patch, I think it is worth looking into the failure messages and also
add some test cases on the new "where" clause to ensure that it can cover
as many use cases as possible.

I fix regression test failure on the attached patch.

I don’t add tests because single-quotes and double-quotes are
meta-characters for PROVE too.

regards

Surafel

Attachments:

pg_dump_where_clause-v2.patchtext/x-patch; charset=US-ASCII; name=pg_dump_where_clause-v2.patchDownload+165-10
#5Daniel Gustafsson
daniel@yesql.se
In reply to: Surafel Temesgen (#4)
Re: pg_dump --where option

On 14 Sep 2020, at 12:04, Surafel Temesgen <surafel3000@gmail.com> wrote:
On Fri, Jul 31, 2020 at 1:38 AM Daniel Gustafsson <daniel@yesql.se <mailto:daniel@yesql.se>> wrote:

$ pg_dump -d cary --where="test1:a3 = ( select max(aa1) from test2 )" > testdump2
$ pg_dump: error: processing of table "public.test1" failed

both test1 and test2 exist in the database and the same subquery works under psql.

This is because pg_dump uses schema-qualified object name I add documentation about to use schema-qualified name when using sub query

Documenting something is well and good, but isn't allowing arbitrary SQL
copy-pasted into the query (which isn't checked for schema qualification)
opening up for some of the ill-effects of CVE-2018-1058?

I don’t add tests because single-quotes and double-quotes are meta-characters for PROVE too.

I'm not sure I follow. Surely tests can be added for this functionality?

How should one invoke this on a multibyte char table name which require
quoting, like --table='"x"' (where x would be an mb char). Reading the
original thread and trying the syntax from there, it's also not clear how table
names with colons should be handled. I know they're not common, but if they're
not supported then the tradeoff should be documented.

A nearby thread [0]CAFj8pRB10wvW0CC9Xq=1XDs=zCQxer3cbLcNZa+qiX4cUH-G_A@mail.gmail.com is adding functionality to read from an input file due to
the command line being too short. Consumers of this might not run into the
issues mentioned there, but it doesn't seem far fetched that someone who does
also adds a small WHERE clause too. Maybe these patches should join forces?

cheers ./daniel

[0]: CAFj8pRB10wvW0CC9Xq=1XDs=zCQxer3cbLcNZa+qiX4cUH-G_A@mail.gmail.com

#6Michael Paquier
michael@paquier.xyz
In reply to: Daniel Gustafsson (#5)
Re: pg_dump --where option

On Mon, Sep 14, 2020 at 05:00:19PM +0200, Daniel Gustafsson wrote:

I'm not sure I follow. Surely tests can be added for this functionality?

We should have tests for that. I can see that this has not been
answered in two weeks, so this has been marked as returned with
feedback in the CF app.
--
Michael