proposal: possibility to read dumped table's name from file

Started by Pavel Stehulealmost 6 years ago207 messageshackers
Jump to latest
#1Pavel Stehule
pavel.stehule@gmail.com

Hi

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

I wrote simple PoC patch

Comments, notes, ideas?

Regards

Pavel

Attachments:

pg_dump-table-list-from-file.patchtext/x-patch; charset=US-ASCII; name=pg_dump-table-list-from-file.patchDownload+56-0
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#1)
Re: proposal: possibility to read dumped table's name from file

Pavel Stehule <pavel.stehule@gmail.com> writes:

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

I guess the question is why. That seems like an enormously error-prone
approach. Can't they switch to selecting schemas? Or excluding the
hopefully-short list of tables they don't want?

regards, tom lane

#3Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#2)
Re: proposal: possibility to read dumped table's name from file

pá 29. 5. 2020 v 16:28 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

Pavel Stehule <pavel.stehule@gmail.com> writes:

one my customer has to specify dumped tables name by name. After years

and

increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from

stdin).

I guess the question is why. That seems like an enormously error-prone
approach. Can't they switch to selecting schemas? Or excluding the
hopefully-short list of tables they don't want?

It is not typical application. It is a analytic application when the schema
of database is based on dynamic specification of end user (end user can do
customization every time). So schema is very dynamic.

For example - typical server has about four thousand databases and every
database has some between 1K .. 10K tables.

Another specific are different versions of data in different tables. A user
can work with one set of data (one set of tables) and a application
prepares new set of data (new set of tables). Load can be slow, because
sometimes bigger tables are filled (about forty GB). pg_dump backups one
set of tables (little bit like snapshot of data). So it is strange OLAP
(but successfull) application.

Regards

Pavel

Show quoted text

regards, tom lane

#4David Fetter
david@fetter.org
In reply to: Pavel Stehule (#1)
Re: proposal: possibility to read dumped table's name from file

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

Hi

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

I wrote simple PoC patch

Comments, notes, ideas?

This seems like a handy addition. What I've done in cases similar to
this was to use `grep -f` on the output of `pg_dump -l` to create a
file suitable for `pg_dump -L`, or mash them together like this:

pg_restore -L <(pg_dump -l /path/to/dumpfile | grep -f /path/to/listfile) -d new_db /path/to/dumpfile

That's a lot of shell magic and obscure corners of commands to expect
people to use.

Would it make sense to expand this patch to handle other objects?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#4)
Re: proposal: possibility to read dumped table's name from file

David Fetter <david@fetter.org> writes:

Would it make sense to expand this patch to handle other objects?

If we're gonna do something like this, +1 for being more general.
The fact that pg_dump only has selection switches for tables and
schemas has always struck me as an omission.

regards, tom lane

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#5)
Re: proposal: possibility to read dumped table's name from file

pá 29. 5. 2020 v 18:03 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:

David Fetter <david@fetter.org> writes:

Would it make sense to expand this patch to handle other objects?

Sure. Just we should to design system (and names of options).

If we're gonna do something like this, +1 for being more general.
The fact that pg_dump only has selection switches for tables and
schemas has always struck me as an omission.

a implementation is trivial, hard is good design of names for these options.

Pavel

Show quoted text

regards, tom lane

#7Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#1)
Re: proposal: possibility to read dumped table's name from file

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

+1 - we would use this.

We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
partitions which are older than a few days and which are also dumped once not
expected to change, and typically not redumped). We're nowhere near the
execve() limit, but it'd be nice if the command was primarily a list of options
and not a long regex.

Please also support reading from file for --exclude-table=pattern.

I'm drawing a parallel between this and rsync --include/--exclude and --filter.

We'd be implementing a new --filter, which might have similar syntax to rsync
(which I always forget).

--
Justin

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#7)
Re: proposal: possibility to read dumped table's name from file

Hi

pá 29. 5. 2020 v 20:25 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

one my customer has to specify dumped tables name by name. After years

and

increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from

stdin).

+1 - we would use this.

We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
partitions which are older than a few days and which are also dumped once
not
expected to change, and typically not redumped). We're nowhere near the
execve() limit, but it'd be nice if the command was primarily a list of
options
and not a long regex.

Please also support reading from file for --exclude-table=pattern.

I'm drawing a parallel between this and rsync --include/--exclude and
--filter.

We'd be implementing a new --filter, which might have similar syntax to
rsync
(which I always forget).

I implemented support for all "repeated" pg_dump options.

--exclude-schemas-file=FILENAME
--exclude-tables-data-file=FILENAME
--exclude-tables-file=FILENAME
--include-foreign-data-file=FILENAME
--include-schemas-file=FILENAME
--include-tables-file=FILENAME

Regards

Pavel

I invite any help with doc. There is just very raw text

Show quoted text

--
Justin

Attachments:

pg_dump-read-options-from-file.patchtext/x-patch; charset=US-ASCII; name=pg_dump-read-options-from-file.patchDownload+167-0
#9Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#8)
Re: proposal: possibility to read dumped table's name from file

On Mon, Jun 08, 2020 at 07:18:49PM +0200, Pavel Stehule wrote:

p� 29. 5. 2020 v 20:25 odes�latel Justin Pryzby <pryzby@telsasoft.com> napsal:

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

+1 - we would use this.

We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
partitions which are older than a few days and which are also dumped once not
expected to change, and typically not redumped). We're nowhere near the
execve() limit, but it'd be nice if the command was primarily a list of options
and not a long regex.

Please also support reading from file for --exclude-table=pattern.

I'm drawing a parallel between this and rsync --include/--exclude and
--filter.

We'd be implementing a new --filter, which might have similar syntax to rsync
(which I always forget).

I implemented support for all "repeated" pg_dump options.

I invite any help with doc. There is just very raw text

+ Do not dump data of tables spefified in file.

*specified

I still wonder if a better syntax would use a unified --filter option, whose
argument would allow including/excluding any type of object:

+[tn] include (t)table/(n)namespace/...
-[tn] exclude (t)table/(n)namespace/...

In the past, I looked for a way to exclude extended stats objects, and ended up
using a separate schema. An "extensible" syntax might be better (although
reading a file of just patterns has the advantage that the function can just be
called once for each option for each type of object).

--
Justin

#10Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#9)
Re: proposal: possibility to read dumped table's name from file

po 8. 6. 2020 v 23:30 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:

On Mon, Jun 08, 2020 at 07:18:49PM +0200, Pavel Stehule wrote:

pá 29. 5. 2020 v 20:25 odesílatel Justin Pryzby <pryzby@telsasoft.com>

napsal:

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

one my customer has to specify dumped tables name by name. After

years and

increasing database size and table numbers he has problem with too

short

command line. He need to read the list of tables from file (or from

stdin).

+1 - we would use this.

We put a regex (actually a pg_dump pattern) of tables to skip

(timeseries

partitions which are older than a few days and which are also dumped

once not

expected to change, and typically not redumped). We're nowhere near

the

execve() limit, but it'd be nice if the command was primarily a list

of options

and not a long regex.

Please also support reading from file for --exclude-table=pattern.

I'm drawing a parallel between this and rsync --include/--exclude and
--filter.

We'd be implementing a new --filter, which might have similar syntax

to rsync

(which I always forget).

I implemented support for all "repeated" pg_dump options.

I invite any help with doc. There is just very raw text

+ Do not dump data of tables spefified in file.

*specified

I am sending updated version - now with own implementation GNU (not POSIX)
function getline

I still wonder if a better syntax would use a unified --filter option, whose

argument would allow including/excluding any type of object:

+[tn] include (t)table/(n)namespace/...
-[tn] exclude (t)table/(n)namespace/...

In the past, I looked for a way to exclude extended stats objects, and
ended up
using a separate schema. An "extensible" syntax might be better (although
reading a file of just patterns has the advantage that the function can
just be
called once for each option for each type of object).

I tried to implement simple format "[+-][tndf] objectname"

please, check attached patch

Regards

Pavel

Show quoted text

--
Justin

Attachments:

pg_dump-filter.patchtext/x-patch; charset=US-ASCII; name=pg_dump-filter.patchDownload+181-0
#11Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#10)
Re: proposal: possibility to read dumped table's name from file

On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:

po 8. 6. 2020 v 23:30 odes�latel Justin Pryzby <pryzby@telsasoft.com> napsal:

I still wonder if a better syntax would use a unified --filter option, whose

argument would allow including/excluding any type of object:

I tried to implement simple format "[+-][tndf] objectname"

Thanks.

+ /* ignore empty rows */
+ if (*line != '\0')

Maybe: if line=='\0': continue

We should also support comments.

+ bool include_filter = false;
+ bool exclude_filter = false;

I think we only need one bool.
You could call it: bool is_exclude = false

+
+							if (chars < 4)
+								invalid_filter_format(optarg, line, lineno);

I think that check is too lax.
I think it's ok if we require the first char to be [-+] and the 2nd char to be
[dntf]

+ objecttype = line[1];

... but I think this is inadequately "liberal in what it accepts"; I think it
should skip spaces. In my proposed scheme, someone might reasonably write:

+
+							objectname = &line[3];
+
+							/* skip initial spaces */
+							while (*objectname == ' ')
+								objectname++;

I suggest to use isspace()

I think we should check that *objectname != '\0', rather than chars>=4, above.

+								if (include_filter)
+								{
+									simple_string_list_append(&table_include_patterns, objectname);
+									dopt.include_everything = false;
+								}
+								else if (exclude_filter)
+									simple_string_list_append(&table_exclude_patterns, objectname);

If you use bool is_exclude, then this becomes "else" and you don't need to
think about checking if (!include && !exclude).

+							else if (objecttype == 'f')
+							{
+								if (include_filter)
+									simple_string_list_append(&foreign_servers_include_patterns, objectname);
+								else if (exclude_filter)
+									invalid_filter_format(optarg, line, lineno);
+							}

I would handle invalid object types as "else: invalid_filter_format()" here,
rather than duplicating above as: !=ALL('d','n','t','f')

+
+					if (ferror(f))
+						fatal("could not read from file \"%s\": %s",
+							  f == stdin ? "stdin" : optarg,
+							  strerror(errno));

I think we're allowed to use %m here ?

+ printf(_(" --filter=FILENAME read object names from file\n"));

Object name filter expression, or something..

+ * getline is originaly GNU function, and should not be everywhere still.

originally

+ * Use own reduced implementation.

Did you "reduce" this from another implementation? Where?
What is its license ?

Maybe a line-reader already exists in the frontend (?) .. or maybe it should.

--
Justin

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#11)
Re: proposal: possibility to read dumped table's name from file

st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:

On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:

po 8. 6. 2020 v 23:30 odesílatel Justin Pryzby <pryzby@telsasoft.com>

napsal:

I still wonder if a better syntax would use a unified --filter option,

whose

argument would allow including/excluding any type of object:

I tried to implement simple format "[+-][tndf] objectname"

I had another idea about format - instead using +-, we can use case
sensitive options same to pg_dump command line (with extending Df -
because these options doesn't exists in short form)

So format can looks like

[tTnNDf] {objectname}

What do you think about this? This format is simpler, and it can work. What
do you think about it?

Did you "reduce" this from another implementation? Where?
What is its license ?

The code is 100% mine. It is not copy from gnulib and everybody can simply
check it

https://code.woboq.org/userspace/glibc/stdio-common/getline.c.html
https://code.woboq.org/userspace/glibc/libio/iogetdelim.c.html#_IO_getdelim

Reduced in functionality sense. There is no full argument check that is
necessary for glibc functions. There are no memory checks because
pg_malloc, pg_realloc are used.

#13Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#12)
Re: proposal: possibility to read dumped table's name from file

On Wed, Jun 10, 2020 at 05:03:49AM +0200, Pavel Stehule wrote:

st 10. 6. 2020 v 0:30 odes�latel Justin Pryzby <pryzby@telsasoft.com> napsal:

On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:

po 8. 6. 2020 v 23:30 odes�latel Justin Pryzby <pryzby@telsasoft.com> napsal:

I still wonder if a better syntax would use a unified --filter option, whose

argument would allow including/excluding any type of object:

I tried to implement simple format "[+-][tndf] objectname"

I had another idea about format - instead using +-, we can use case
sensitive options same to pg_dump command line (with extending Df -
because these options doesn't exists in short form)

So format can looks like

[tTnNDf] {objectname}

What do you think about this? This format is simpler, and it can work. What
do you think about it?

I prefer [-+][dtnf], which is similar to rsync --filter, and clear what it's
doing. I wouldn't put much weight on what the short options are.

I wonder if some people would want to be able to use *long* or short options:

-table foo
+schema baz

Or maybe:

exclude-table=foo
schema=bar

Some tools use "long options without leading dashes" as their configuration
file format. Examples: openvpn, mysql. So that could be a good option.
OTOH, there's only a few "keys", so I'm not sure how many people would want to
repeat them, if there's enough to bother putting them in the file rather than
the cmdline.

--
Justin

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#11)
Re: proposal: possibility to read dumped table's name from file

st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:

On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:

po 8. 6. 2020 v 23:30 odesílatel Justin Pryzby <pryzby@telsasoft.com>

napsal:

I still wonder if a better syntax would use a unified --filter option,

whose

argument would allow including/excluding any type of object:

I tried to implement simple format "[+-][tndf] objectname"

Thanks.

+                                             /* ignore empty rows */
+                                             if (*line != '\0')

Maybe: if line=='\0': continue

ok

We should also support comments.

+ bool

include_filter = false;

+ bool

exclude_filter = false;

I think we only need one bool.
You could call it: bool is_exclude = false

ok

+

+                                                     if (chars < 4)
+

invalid_filter_format(optarg, line, lineno);

I think that check is too lax.
I think it's ok if we require the first char to be [-+] and the 2nd char
to be
[dntf]

+ objecttype =

line[1];

... but I think this is inadequately "liberal in what it accepts"; I think
it
should skip spaces. In my proposed scheme, someone might reasonably write:

+
+                                                     objectname =

&line[3];

+
+                                                     /* skip initial

spaces */

+ while (*objectname

== ' ')

+

objectname++;

I suggest to use isspace()

ok

I think we should check that *objectname != '\0', rather than chars>=4,
above.

done

+ if

(include_filter)

+                                                             {
+

simple_string_list_append(&table_include_patterns, objectname);

+

dopt.include_everything = false;

+                                                             }
+                                                             else if

(exclude_filter)

+

simple_string_list_append(&table_exclude_patterns, objectname);

If you use bool is_exclude, then this becomes "else" and you don't need to
think about checking if (!include && !exclude).

+ else if

(objecttype == 'f')

+                                                     {
+                                                             if

(include_filter)

+

simple_string_list_append(&foreign_servers_include_patterns, objectname);

+ else if

(exclude_filter)

+

invalid_filter_format(optarg, line, lineno);

+ }

I would handle invalid object types as "else: invalid_filter_format()"
here,
rather than duplicating above as: !=ALL('d','n','t','f')

good idea

+
+                                     if (ferror(f))
+                                             fatal("could not read from

file \"%s\": %s",

+ f == stdin ?

"stdin" : optarg,

+ strerror(errno));

I think we're allowed to use %m here ?

changed

+ printf(_(" --filter=FILENAME read object names from

file\n"));

Object name filter expression, or something..

yes, it is not object names now

+ * getline is originaly GNU function, and should not be everywhere

still.
originally

+ * Use own reduced implementation.

Did you "reduce" this from another implementation? Where?
What is its license ?

Maybe a line-reader already exists in the frontend (?) .. or maybe it
should.

everywhere else is used a function fgets. Currently pg_getline is used just
on only one place, so I don't think so moving it to some common part is
maybe premature.

Maybe it can be used as replacement of some fgets calls, but then it is
different topic, I think.

Thank you for comments, attached updated patch

Regards

Pavel

Show quoted text

--
Justin

Attachments:

pg_dump-filter-20200611.patchtext/x-patch; charset=US-ASCII; name=pg_dump-filter-20200611.patchDownload+206-0
#15vignesh C
vignesh21@gmail.com
In reply to: Pavel Stehule (#14)
Re: proposal: possibility to read dumped table's name from file

On Thu, Jun 11, 2020 at 1:07 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:

Thank you for comments, attached updated patch

Few comments:
+invalid_filter_format(char *message, char *filename, char *line, int lineno)
+{
+       char       *displayname;
+
+       displayname = *filename == '-' ? "stdin" : filename;
+
+       pg_log_error("invalid format of filter file \"%s\": %s",
+                                displayname,
+                                message);
+
+       fprintf(stderr, "%d: %s\n", lineno, line);
+       exit_nicely(1);
+}
I think fclose is missing here.
+                                               if (line[chars - 1] == '\n')
+                                                       line[chars - 1] = '\0';
Should we check for '\r' also to avoid failures in some platforms.
+     <varlistentry>
+      <term><option>--filter=<replaceable
class="parameter">filename</replaceable></option></term>
+      <listitem>
+       <para>
+        Read filters from file. Format "(+|-)(tnfd) objectname:
+       </para>
+      </listitem>
+     </varlistentry>

I felt some documentation is missing here. We could include,
options tnfd is for controlling table, schema, foreign server data &
table exclude patterns.

Instead of using tnfd, if we could use the same options as existing
pg_dump options it will be less confusing.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

#16Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#14)
Re: proposal: possibility to read dumped table's name from file

On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:

st 10. 6. 2020 v 0:30 odes�latel Justin Pryzby <pryzby@telsasoft.com> napsal:

+                                             /* ignore empty rows */
+                                             if (*line != '\0')

Maybe: if line=='\0': continue
We should also support comments.

Comment support is still missing but easily added :)

I tried this patch and it works for my purposes.

Also, your getline is dynamically re-allocating lines of arbitrary length.
Possibly that's not needed. We'll typically read "+t schema.relname", which is
132 chars. Maybe it's sufficient to do
char buf[1024];
fgets(buf);
if strchr(buf, '\n') == NULL: error();
ret = pstrdup(buf);

In any case, you could have getline return a char* and (rather than following
GNU) no need to take char**, int* parameters to conflate inputs and outputs.

I realized that --filter has an advantage over the previous implementation
(with multiple --exclude-* and --include-*) in that it's possible to use stdin
for includes *and* excludes.

By chance, I had the opportunity yesterday to re-use with rsync a regex that
I'd previously been using with pg_dump and grep. What this patch calls
"--filter" in rsync is called "--filter-from". rsync's --filter-from rejects
filters of length longer than max filename, so I had to split it up into
multiple lines instead of using regex alternation ("|"). This option is a
close parallel in pg_dump.

--
Justin

#17Pavel Stehule
pavel.stehule@gmail.com
In reply to: vignesh C (#15)
Re: proposal: possibility to read dumped table's name from file

so 27. 6. 2020 v 14:55 odesílatel vignesh C <vignesh21@gmail.com> napsal:

On Thu, Jun 11, 2020 at 1:07 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Thank you for comments, attached updated patch

Few comments:
+invalid_filter_format(char *message, char *filename, char *line, int
lineno)
+{
+       char       *displayname;
+
+       displayname = *filename == '-' ? "stdin" : filename;
+
+       pg_log_error("invalid format of filter file \"%s\": %s",
+                                displayname,
+                                message);
+
+       fprintf(stderr, "%d: %s\n", lineno, line);
+       exit_nicely(1);
+}

I think fclose is missing here.

done

+                                               if (line[chars - 1] ==
'\n')
+                                                       line[chars - 1] =
'\0';
Should we check for '\r' also to avoid failures in some platforms.

I checked other usage of fgets in Postgres source code, and everywhere is
used test on \n

When I did some fast research, then
https://stackoverflow.com/questions/12769289/carriage-return-by-fgets \r in
this case should be thrown by libc on Microsoft

https://stackoverflow.com/questions/2061334/fgets-linux-vs-mac

\n should be on Mac OS X .. 2001 year .. I am not sure if Mac OS 9 should
be supported.

+     <varlistentry>
+      <term><option>--filter=<replaceable
class="parameter">filename</replaceable></option></term>
+      <listitem>
+       <para>
+        Read filters from file. Format "(+|-)(tnfd) objectname:
+       </para>
+      </listitem>
+     </varlistentry>

I felt some documentation is missing here. We could include,
options tnfd is for controlling table, schema, foreign server data &
table exclude patterns.

I have a plan to completate doc when the design is completed. It was not
clear if people prefer long or short forms of option names.

Instead of using tnfd, if we could use the same options as existing
pg_dump options it will be less confusing.

it almost same

+-t .. tables
+-n schema
-d exclude data .. there is not short option for --exclude-table-data
+f include foreign table .. there is not short option for
--include-foreign-data

So still, there is a opened question if use +-tnfd system, or system based
on long option

table foo
exclude-table foo
schema xx
exclude-schema xx
include-foreign-data yyy
exclude-table-data zzz

Typically these files will be generated by scripts and processed via pipe,
so there I see just two arguments for and aginst:

short format - there is less probability to do typo error (but there is not
full consistency with pg_dump options)
long format - it is self documented (and there is full consistency with
pg_dump)

In this case I prefer short form .. it is more comfortable for users, and
there are only a few variants, so it is not necessary to use too verbose
language (design). But my opinion is not aggressively strong and I'll
accept any common agreement.

Regards

Updated patch attached

Show quoted text

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachments:

pg_dump-filter-20200705.patchtext/x-patch; charset=US-ASCII; name=pg_dump-filter-20200705.patchDownload+212-0
#18Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#16)
Re: proposal: possibility to read dumped table's name from file

st 1. 7. 2020 v 23:24 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:

On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:

st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com>

napsal:

+                                             /* ignore empty rows */
+                                             if (*line != '\0')

Maybe: if line=='\0': continue
We should also support comments.

Comment support is still missing but easily added :)

I tried this patch and it works for my purposes.

Also, your getline is dynamically re-allocating lines of arbitrary length.
Possibly that's not needed. We'll typically read "+t schema.relname",
which is
132 chars. Maybe it's sufficient to do
char buf[1024];
fgets(buf);
if strchr(buf, '\n') == NULL: error();
ret = pstrdup(buf);

63 bytes is max effective identifier size, but it is not max size of
identifiers. It is very probably so buff with 1024 bytes will be enough for
all, but I do not want to increase any new magic limit. More when dynamic
implementation is not too hard.

Table name can be very long - sometimes the data names (table names) can be
stored in external storages with full length and should not be practical to
require truncating in filter file.

For this case it is very effective, because a resized (increased) buffer is
used for following rows, so realloc should not be often. So when I have to
choose between two implementations with similar complexity, I prefer more
dynamic code without hardcoded limits. This dynamic hasn't any overhead.

In any case, you could have getline return a char* and (rather than
following
GNU) no need to take char**, int* parameters to conflate inputs and
outputs.

no, it has a special benefit. It eliminates the short malloc/free cycle.
When some lines are longer, then the buffer is increased (and limits), and
for other rows with same or less size is not necessary realloc.

I realized that --filter has an advantage over the previous implementation
(with multiple --exclude-* and --include-*) in that it's possible to use
stdin
for includes *and* excludes.

yes, it looks like better choose

By chance, I had the opportunity yesterday to re-use with rsync a regex
that
I'd previously been using with pg_dump and grep. What this patch calls
"--filter" in rsync is called "--filter-from". rsync's --filter-from
rejects
filters of length longer than max filename, so I had to split it up into
multiple lines instead of using regex alternation ("|"). This option is a
close parallel in pg_dump.

we can talk about option name - maybe "--filter-from" is better than just
"--filter"

Regards

Pavel

Show quoted text

--
Justin

#19Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#16)
Re: proposal: possibility to read dumped table's name from file

On Wed, Jul 01, 2020 at 04:24:52PM -0500, Justin Pryzby wrote:

On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:

st 10. 6. 2020 v 0:30 odes�latel Justin Pryzby <pryzby@telsasoft.com> napsal:

+                                             /* ignore empty rows */
+                                             if (*line != '\0')

Maybe: if line=='\0': continue
We should also support comments.

Comment support is still missing but easily added :)

Still missing from the latest patch.

With some added documentation, I think this can be RfC.

--
Justin

#20Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#19)
Re: proposal: possibility to read dumped table's name from file

ne 5. 7. 2020 v 22:31 odesílatel Justin Pryzby <pryzby@telsasoft.com>
napsal:

On Wed, Jul 01, 2020 at 04:24:52PM -0500, Justin Pryzby wrote:

On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:

st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby@telsasoft.com>

napsal:

+ /* ignore empty rows

*/

+ if (*line != '\0')

Maybe: if line=='\0': continue
We should also support comments.

Comment support is still missing but easily added :)

Still missing from the latest patch.

I can implement a comment support. But I am not sure about the format. The
start can be "--" or classic #.

but "--" can be in this context messy

Show quoted text

With some added documentation, I think this can be RfC.

--
Justin

#21Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#20)
#22Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#21)
#23vignesh C
vignesh21@gmail.com
In reply to: Pavel Stehule (#21)
#24Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#22)
#25Pavel Stehule
pavel.stehule@gmail.com
In reply to: vignesh C (#23)
#26Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#25)
#27Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#26)
#28vignesh C
vignesh21@gmail.com
In reply to: Pavel Stehule (#25)
#29Justin Pryzby
pryzby@telsasoft.com
In reply to: Daniel Gustafsson (#26)
#30Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#27)
#31Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#30)
#32Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#24)
#33Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#32)
#34Pavel Stehule
pavel.stehule@gmail.com
In reply to: vignesh C (#28)
#35vignesh C
vignesh21@gmail.com
In reply to: Pavel Stehule (#34)
#36Justin Pryzby
pryzby@telsasoft.com
In reply to: vignesh C (#35)
#37Pavel Stehule
pavel.stehule@gmail.com
In reply to: vignesh C (#35)
#38Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#36)
#39Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#37)
#40Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#18)
#41Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Pavel Stehule (#37)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#41)
#43Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#42)
#44Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#43)
#45Surafel Temesgen
surafel3000@gmail.com
In reply to: Pavel Stehule (#43)
#46Pavel Stehule
pavel.stehule@gmail.com
In reply to: Surafel Temesgen (#45)
#47Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#46)
#48Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#47)
#49Stephen Frost
sfrost@snowman.net
In reply to: Pavel Stehule (#48)
#50Pavel Stehule
pavel.stehule@gmail.com
In reply to: Stephen Frost (#49)
#51Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#50)
#52Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#48)
#53Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#50)
#54Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Pavel Stehule (#50)
#55Pavel Stehule
pavel.stehule@gmail.com
In reply to: Alvaro Herrera (#54)
#56Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#51)
#57Stephen Frost
sfrost@snowman.net
In reply to: Justin Pryzby (#56)
#58Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#56)
#59Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#58)
#60Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Pavel Stehule (#59)
#61Pavel Stehule
pavel.stehule@gmail.com
In reply to: Dean Rasheed (#60)
#62Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pavel Stehule (#61)
#63Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#62)
#64Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Pavel Stehule (#63)
#65Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dean Rasheed (#64)
#66Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#65)
#67Stephen Frost
sfrost@snowman.net
In reply to: Pavel Stehule (#61)
#68Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#62)
#69Pavel Stehule
pavel.stehule@gmail.com
In reply to: Stephen Frost (#67)
#70Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#68)
#71Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#70)
#72Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#71)
#73Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#72)
#74Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#73)
#75Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Pavel Stehule (#74)
#76Daniel Gustafsson
daniel@yesql.se
In reply to: Tomas Vondra (#75)
#77Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Daniel Gustafsson (#76)
#78Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tomas Vondra (#77)
#79Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#78)
#80Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#79)
#81Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Stephen Frost (#80)
#82Daniel Gustafsson
daniel@yesql.se
In reply to: Tomas Vondra (#81)
#83Daniel Gustafsson
daniel@yesql.se
In reply to: Alvaro Herrera (#78)
#84Stephen Frost
sfrost@snowman.net
In reply to: Daniel Gustafsson (#82)
#85Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Stephen Frost (#84)
#86Stephen Frost
sfrost@snowman.net
In reply to: Tomas Vondra (#85)
#87Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Stephen Frost (#86)
#88Stephen Frost
sfrost@snowman.net
In reply to: Alvaro Herrera (#87)
#89Pavel Stehule
pavel.stehule@gmail.com
In reply to: Stephen Frost (#88)
#90Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Stephen Frost (#88)
#91Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#74)
#92Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#79)
#93Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#92)
#94Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#92)
#95Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#93)
#96Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#94)
#97Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#95)
#98Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#97)
#99Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#95)
#100Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#98)
#101Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#99)
#102Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#101)
#103Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#102)
#104Stephen Frost
sfrost@snowman.net
In reply to: Pavel Stehule (#103)
#105Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#103)
#106Stephen Frost
sfrost@snowman.net
In reply to: Daniel Gustafsson (#105)
#107Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#105)
#108Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#99)
#109Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#108)
#110Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#109)
#111Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#110)
#112Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#111)
#113Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniel Gustafsson (#111)
#114Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Alvaro Herrera (#113)
#115Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#111)
#116Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#111)
#117Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#116)
#118Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#117)
#119Erik Rijkers
er@xs4all.nl
In reply to: Daniel Gustafsson (#117)
#120Erik Rijkers
er@xs4all.nl
In reply to: Erik Rijkers (#119)
#121Daniel Gustafsson
daniel@yesql.se
In reply to: Erik Rijkers (#120)
#122Daniel Gustafsson
daniel@yesql.se
In reply to: Daniel Gustafsson (#117)
#123Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#122)
#124Erik Rijkers
er@xs4all.nl
In reply to: Daniel Gustafsson (#122)
#125Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#122)
#126Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#97)
#127Andrew Dunstan
andrew@dunslane.net
In reply to: Pavel Stehule (#126)
#128Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andrew Dunstan (#127)
#129Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#128)
#130Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#129)
#131Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#130)
#132Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#131)
#133Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#132)
#134Erik Rijkers
er@xs4all.nl
In reply to: Daniel Gustafsson (#133)
#135Daniel Gustafsson
daniel@yesql.se
In reply to: Erik Rijkers (#134)
#136Julien Rouhaud
rjuju123@gmail.com
In reply to: Daniel Gustafsson (#135)
#137Daniel Gustafsson
daniel@yesql.se
In reply to: Julien Rouhaud (#136)
#138Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#133)
#139John Naylor
john.naylor@enterprisedb.com
In reply to: Daniel Gustafsson (#137)
#140Andrew Dunstan
andrew@dunslane.net
In reply to: John Naylor (#139)
#141Daniel Gustafsson
daniel@yesql.se
In reply to: Andrew Dunstan (#140)
#142Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#141)
#143Erik Rijkers
er@xs4all.nl
In reply to: Daniel Gustafsson (#141)
#144Erik Rijkers
er@xs4all.nl
In reply to: Erik Rijkers (#143)
#145John Naylor
john.naylor@enterprisedb.com
In reply to: Pavel Stehule (#142)
#146Pavel Stehule
pavel.stehule@gmail.com
In reply to: John Naylor (#145)
#147Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#141)
#148Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#141)
#149Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#148)
#150Andres Freund
andres@anarazel.de
In reply to: Andres Freund (#149)
#151Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#150)
#152Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#151)
#153Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#151)
#154Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#151)
#155Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#154)
#156Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#155)
#157Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#156)
#158Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#157)
#159Andres Freund
andres@anarazel.de
In reply to: Pavel Stehule (#156)
#160Julien Rouhaud
rjuju123@gmail.com
In reply to: Andres Freund (#159)
#161Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#160)
#162Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#161)
#163Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#162)
#164Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#163)
#165Justin Pryzby
pryzby@telsasoft.com
In reply to: Julien Rouhaud (#164)
#166Julien Rouhaud
rjuju123@gmail.com
In reply to: Justin Pryzby (#165)
#167Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#166)
#168Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#167)
#169Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#166)
#170Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#169)
#171Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#170)
#172Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#171)
#173Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#171)
#174Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#173)
#175Julien Rouhaud
rjuju123@gmail.com
In reply to: Pavel Stehule (#174)
#176Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#175)
#177Andres Freund
andres@anarazel.de
In reply to: Pavel Stehule (#174)
#178Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andres Freund (#177)
#179Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andres Freund (#177)
#180Gregory Stark (as CFM)
stark.cfm@gmail.com
In reply to: Julien Rouhaud (#175)
#181Daniel Gustafsson
daniel@yesql.se
In reply to: Gregory Stark (as CFM) (#180)
#182Julien Rouhaud
rjuju123@gmail.com
In reply to: Daniel Gustafsson (#181)
#183Pavel Stehule
pavel.stehule@gmail.com
In reply to: Julien Rouhaud (#182)
#184Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#183)
#185Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#183)
#186Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#185)
#187Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#185)
#188Justin Pryzby
pryzby@telsasoft.com
In reply to: Pavel Stehule (#187)
#189Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#188)
#190Pavel Stehule
pavel.stehule@gmail.com
In reply to: Justin Pryzby (#188)
#191Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#190)
#192Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#191)
#193Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#192)
#194Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#193)
#195Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#194)
#196Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#195)
#197Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#196)
#198Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#197)
#199Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#198)
#200Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#199)
#201Erik Rijkers
er@xs4all.nl
In reply to: Daniel Gustafsson (#200)
#202Daniel Gustafsson
daniel@yesql.se
In reply to: Erik Rijkers (#201)
#203Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#202)
#204Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#202)
#205Pavel Stehule
pavel.stehule@gmail.com
In reply to: Tom Lane (#204)
#206Daniel Gustafsson
daniel@yesql.se
In reply to: Pavel Stehule (#205)
#207Pavel Stehule
pavel.stehule@gmail.com
In reply to: Daniel Gustafsson (#206)