proposal: possibility to read dumped table's name from file

Started by Pavel Stehule, about 6 years ago, 207 messages, hackers

pavel.stehule@gmail.com

about 6 years ago

Hi

One of my customers has to specify the tables to dump name by name. After
years of growth in database size and table count, he is running into the
command-line length limit. He needs to read the list of tables from a file
(or from stdin).

I wrote a simple PoC patch.

Comments, notes, ideas?

Regards

Pavel

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Pavel Stehule (#1)

Re: proposal: possibility to read dumped table's name from file

Pavel Stehule <pavel.stehule@gmail.com> writes:

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

I guess the question is why. That seems like an enormously error-prone
approach. Can't they switch to selecting schemas? Or excluding the
hopefully-short list of tables they don't want?

regards, tom lane

pavel.stehule@gmail.com

about 6 years ago

In reply to: Tom Lane (#2)

Re: proposal: possibility to read dumped table's name from file

On Fri, May 29, 2020 at 16:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pavel Stehule <pavel.stehule@gmail.com> writes:

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

I guess the question is why. That seems like an enormously error-prone
approach. Can't they switch to selecting schemas? Or excluding the
hopefully-short list of tables they don't want?

It is not a typical application. It is an analytic application where the
database schema is based on a dynamic specification by the end user (the end
user can customize it at any time). So the schema is very dynamic.

For example, a typical server has about four thousand databases, and every
database has somewhere between 1K and 10K tables.

Another peculiarity is that different tables hold different versions of the
data. A user can work with one set of data (one set of tables) while the
application prepares a new set of data (a new set of tables). The load can be
slow, because sometimes bigger tables are filled (about forty GB). pg_dump
backs up one set of tables (a little bit like a snapshot of the data). So it
is a strange OLAP (but successful) application.

Regards

Pavel

regards, tom lane

david@fetter.org

about 6 years ago

In reply to: Pavel Stehule (#1)

Re: proposal: possibility to read dumped table's name from file

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

Hi

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

I wrote simple PoC patch

Comments, notes, ideas?

This seems like a handy addition. What I've done in cases similar to
this was to use `grep -f` on the output of `pg_dump -l` to create a
file suitable for `pg_dump -L`, or mash them together like this:

pg_restore -L <(pg_dump -l /path/to/dumpfile | grep -f /path/to/listfile) -d new_db /path/to/dumpfile

That's a lot of shell magic and obscure corners of commands to expect
people to use.

Would it make sense to expand this patch to handle other objects?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: David Fetter (#4)

Re: proposal: possibility to read dumped table's name from file

David Fetter <david@fetter.org> writes:

Would it make sense to expand this patch to handle other objects?

If we're gonna do something like this, +1 for being more general.
The fact that pg_dump only has selection switches for tables and
schemas has always struck me as an omission.

regards, tom lane

pavel.stehule@gmail.com

about 6 years ago

In reply to: Tom Lane (#5)

Re: proposal: possibility to read dumped table's name from file

On Fri, May 29, 2020 at 18:03, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Fetter <david@fetter.org> writes:

Would it make sense to expand this patch to handle other objects?

Sure. We just need to design the system (and the names of the options).

If we're gonna do something like this, +1 for being more general.
The fact that pg_dump only has selection switches for tables and
schemas has always struck me as an omission.

The implementation is trivial; the hard part is designing good names for these options.

Pavel

regards, tom lane

pryzby@telsasoft.com

about 6 years ago

In reply to: Pavel Stehule (#1)

Re: proposal: possibility to read dumped table's name from file

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

+1 - we would use this.

We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
partitions which are older than a few days and which are also dumped once not
expected to change, and typically not redumped). We're nowhere near the
execve() limit, but it'd be nice if the command was primarily a list of options
and not a long regex.

Please also support reading from file for --exclude-table=pattern.

I'm drawing a parallel between this and rsync --include/--exclude and --filter.

We'd be implementing a new --filter, which might have similar syntax to rsync
(which I always forget).

--
Justin

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#7)

Re: proposal: possibility to read dumped table's name from file

Hi

On Fri, May 29, 2020 at 20:25, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

+1 - we would use this.

We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
partitions which are older than a few days and which are also dumped once not
expected to change, and typically not redumped). We're nowhere near the
execve() limit, but it'd be nice if the command was primarily a list of options
and not a long regex.

Please also support reading from file for --exclude-table=pattern.

I'm drawing a parallel between this and rsync --include/--exclude and --filter.

We'd be implementing a new --filter, which might have similar syntax to rsync
(which I always forget).

I implemented support for all "repeated" pg_dump options.

--exclude-schemas-file=FILENAME
--exclude-tables-data-file=FILENAME
--exclude-tables-file=FILENAME
--include-foreign-data-file=FILENAME
--include-schemas-file=FILENAME
--include-tables-file=FILENAME

Regards

Pavel

I welcome any help with the docs. The text there is still very raw.

--
Justin

pryzby@telsasoft.com

almost 6 years ago

In reply to: Pavel Stehule (#8)

Re: proposal: possibility to read dumped table's name from file

On Mon, Jun 08, 2020 at 07:18:49PM +0200, Pavel Stehule wrote:

On Fri, May 29, 2020 at 20:25, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

+1 - we would use this.

We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
partitions which are older than a few days and which are also dumped once not
expected to change, and typically not redumped). We're nowhere near the
execve() limit, but it'd be nice if the command was primarily a list of options
and not a long regex.

Please also support reading from file for --exclude-table=pattern.

I'm drawing a parallel between this and rsync --include/--exclude and
--filter.

We'd be implementing a new --filter, which might have similar syntax to rsync
(which I always forget).

I implemented support for all "repeated" pg_dump options.

I invite any help with doc. There is just very raw text

+ Do not dump data of tables spefified in file.

*specified

I still wonder if a better syntax would use a unified --filter option, whose
argument would allow including/excluding any type of object:

+[tn] include (t)table/(n)namespace/...
-[tn] exclude (t)table/(n)namespace/...

In the past, I looked for a way to exclude extended stats objects, and ended up
using a separate schema. An "extensible" syntax might be better (although
reading a file of just patterns has the advantage that the function can just be
called once for each option for each type of object).

--
Justin

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#9)

Re: proposal: possibility to read dumped table's name from file

On Mon, Jun 8, 2020 at 23:30, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Mon, Jun 08, 2020 at 07:18:49PM +0200, Pavel Stehule wrote:

On Fri, May 29, 2020 at 20:25, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Fri, May 29, 2020 at 04:21:00PM +0200, Pavel Stehule wrote:

one my customer has to specify dumped tables name by name. After years and
increasing database size and table numbers he has problem with too short
command line. He need to read the list of tables from file (or from stdin).

+1 - we would use this.

We put a regex (actually a pg_dump pattern) of tables to skip (timeseries
partitions which are older than a few days and which are also dumped once not
expected to change, and typically not redumped). We're nowhere near the
execve() limit, but it'd be nice if the command was primarily a list of options
and not a long regex.

Please also support reading from file for --exclude-table=pattern.

I'm drawing a parallel between this and rsync --include/--exclude and
--filter.

We'd be implementing a new --filter, which might have similar syntax to rsync
(which I always forget).

I implemented support for all "repeated" pg_dump options.

I invite any help with doc. There is just very raw text

+ Do not dump data of tables spefified in file.

*specified

I am sending an updated version, now with my own implementation of the GNU
(not POSIX) function getline.

I still wonder if a better syntax would use a unified --filter option, whose
argument would allow including/excluding any type of object:

+[tn] include (t)table/(n)namespace/...
-[tn] exclude (t)table/(n)namespace/...

In the past, I looked for a way to exclude extended stats objects, and ended up
using a separate schema. An "extensible" syntax might be better (although
reading a file of just patterns has the advantage that the function can just be
called once for each option for each type of object).

I tried to implement a simple format: "[+-][tndf] objectname"

Please check the attached patch.

Regards

Pavel

--
Justin

pryzby@telsasoft.com

almost 6 years ago

In reply to: Pavel Stehule (#10)

Re: proposal: possibility to read dumped table's name from file

On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:

On Mon, Jun 8, 2020 at 23:30, Justin Pryzby <pryzby@telsasoft.com> wrote:

I still wonder if a better syntax would use a unified --filter option, whose

argument would allow including/excluding any type of object:

I tried to implement simple format "[+-][tndf] objectname"

Thanks.

+ /* ignore empty rows */
+ if (*line != '\0')

Maybe: if line=='\0': continue

We should also support comments.

+ bool include_filter = false;
+ bool exclude_filter = false;

I think we only need one bool.
You could call it: bool is_exclude = false

+
+							if (chars < 4)
+								invalid_filter_format(optarg, line, lineno);

I think that check is too lax.
I think it's ok if we require the first char to be [-+] and the 2nd char to be
[dntf]

+ objecttype = line[1];

... but I think this is inadequately "liberal in what it accepts"; I think it
should skip spaces. In my proposed scheme, someone might reasonably write:

+
+							objectname = &line[3];
+
+							/* skip initial spaces */
+							while (*objectname == ' ')
+								objectname++;

I suggest to use isspace()

I think we should check that *objectname != '\0', rather than chars>=4, above.

+								if (include_filter)
+								{
+									simple_string_list_append(&table_include_patterns, objectname);
+									dopt.include_everything = false;
+								}
+								else if (exclude_filter)
+									simple_string_list_append(&table_exclude_patterns, objectname);

If you use bool is_exclude, then this becomes "else" and you don't need to
think about checking if (!include && !exclude).

+							else if (objecttype == 'f')
+							{
+								if (include_filter)
+									simple_string_list_append(&foreign_servers_include_patterns, objectname);
+								else if (exclude_filter)
+									invalid_filter_format(optarg, line, lineno);
+							}

I would handle invalid object types as "else: invalid_filter_format()" here,
rather than duplicating above as: !=ALL('d','n','t','f')

+
+					if (ferror(f))
+						fatal("could not read from file \"%s\": %s",
+							  f == stdin ? "stdin" : optarg,
+							  strerror(errno));

I think we're allowed to use %m here ?

+ printf(_(" --filter=FILENAME read object names from file\n"));

Object name filter expression, or something..

+ * getline is originaly GNU function, and should not be everywhere still.

originally

+ * Use own reduced implementation.

Did you "reduce" this from another implementation? Where?
What is its license ?

Maybe a line-reader already exists in the frontend (?) .. or maybe it should.

--
Justin
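
For illustration, the review points above (a single is_exclude flag, isspace() for skipping blanks, checking for an empty object name instead of a minimum line length, and one "else: invalid" branch for unknown types) could be combined roughly as follows. This is a minimal sketch and not the patch's code: parse_filter_line, FilterStatus and FilterEntry are hypothetical names, the '#' comment marker is an assumption (the thread has not settled on a comment syntax at this point), and the line is assumed to have had its trailing newline removed already.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Result of classifying one line of a hypothetical filter file. */
typedef enum
{
    FILTER_SKIP,                /* empty line or comment */
    FILTER_OK,                  /* valid "[+-][tndf] objectname" entry */
    FILTER_INVALID              /* malformed line */
} FilterStatus;

typedef struct
{
    bool        is_exclude;     /* true for '-', false for '+' */
    char        objecttype;     /* one of 't', 'n', 'd', 'f' */
    const char *objectname;     /* points into the caller's line buffer */
} FilterEntry;

static FilterStatus
parse_filter_line(const char *line, FilterEntry *out)
{
    const char *p = line;

    assert(line != NULL && out != NULL);

    /* be liberal in what we accept: skip leading whitespace */
    while (isspace((unsigned char) *p))
        p++;

    /* ignore empty lines and '#' comments (comment syntax is an assumption) */
    if (*p == '\0' || *p == '#')
        return FILTER_SKIP;

    /* first char must be '+' (include) or '-' (exclude) */
    if (*p != '+' && *p != '-')
        return FILTER_INVALID;
    out->is_exclude = (*p == '-');
    p++;

    /* second char must be a known object type; anything else is invalid */
    switch (*p)
    {
        case 't':
        case 'n':
        case 'd':
        case 'f':
            out->objecttype = *p;
            break;
        default:
            return FILTER_INVALID;
    }
    p++;

    /* the quoted patch rejects excluding foreign servers */
    if (out->objecttype == 'f' && out->is_exclude)
        return FILTER_INVALID;

    /* require a separator, then skip any amount of whitespace */
    if (!isspace((unsigned char) *p))
        return FILTER_INVALID;
    while (isspace((unsigned char) *p))
        p++;

    /* check for a non-empty object name rather than a minimum line length */
    if (*p == '\0')
        return FILTER_INVALID;

    out->objectname = p;
    return FILTER_OK;
}
```

A caller would then append objectname to the include or exclude pattern list for the given object type, with a plain "else" for the exclude case, and report FILTER_INVALID lines together with their line number.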

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#11)

Re: proposal: possibility to read dumped table's name from file

On Wed, Jun 10, 2020 at 0:30, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:

On Mon, Jun 8, 2020 at 23:30, Justin Pryzby <pryzby@telsasoft.com> wrote:

I still wonder if a better syntax would use a unified --filter option, whose
argument would allow including/excluding any type of object:

I tried to implement simple format "[+-][tndf] objectname"

I had another idea about the format: instead of using +-, we can use the same
case-sensitive options as the pg_dump command line (extended with D and f,
because these options don't exist in short form).

So the format could look like

[tTnNDf] {objectname}

What do you think about this? This format is simpler, and it can work.

Did you "reduce" this from another implementation? Where?
What is its license ?

The code is 100% mine. It is not a copy from gnulib, and anybody can easily
check that:

https://code.woboq.org/userspace/glibc/stdio-common/getline.c.html
https://code.woboq.org/userspace/glibc/libio/iogetdelim.c.html#_IO_getdelim

It is reduced in the sense of functionality. There is no full argument
checking of the kind that glibc functions need. There are no memory checks
because pg_malloc and pg_realloc are used.

pryzby@telsasoft.com

almost 6 years ago

In reply to: Pavel Stehule (#12)

Re: proposal: possibility to read dumped table's name from file

On Wed, Jun 10, 2020 at 05:03:49AM +0200, Pavel Stehule wrote:

On Wed, Jun 10, 2020 at 0:30, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:

On Mon, Jun 8, 2020 at 23:30, Justin Pryzby <pryzby@telsasoft.com> wrote:

I still wonder if a better syntax would use a unified --filter option, whose

argument would allow including/excluding any type of object:

I tried to implement simple format "[+-][tndf] objectname"

I had another idea about format - instead using +-, we can use case
sensitive options same to pg_dump command line (with extending Df -
because these options doesn't exists in short form)

So format can looks like

[tTnNDf] {objectname}

What do you think about this? This format is simpler, and it can work. What
do you think about it?

I prefer [-+][dtnf], which is similar to rsync --filter, and clear what it's
doing. I wouldn't put much weight on what the short options are.

I wonder if some people would want to be able to use *long* or short options:

-table foo
+schema baz

Or maybe:

exclude-table=foo
schema=bar

Some tools use "long options without leading dashes" as their configuration
file format. Examples: openvpn, mysql. So that could be a good option.
OTOH, there's only a few "keys", so I'm not sure how many people would want to
repeat them, if there's enough to bother putting them in the file rather than
the cmdline.

--
Justin

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#11)

Re: proposal: possibility to read dumped table's name from file

On Wed, Jun 10, 2020 at 0:30, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Jun 09, 2020 at 11:46:24AM +0200, Pavel Stehule wrote:

On Mon, Jun 8, 2020 at 23:30, Justin Pryzby <pryzby@telsasoft.com> wrote:

I still wonder if a better syntax would use a unified --filter option, whose
argument would allow including/excluding any type of object:

I tried to implement simple format "[+-][tndf] objectname"

Thanks.
+                                             /* ignore empty rows */
+                                             if (*line != '\0')
Maybe: if line=='\0': continue

ok

We should also support comments.

+ bool include_filter = false;
+ bool exclude_filter = false;

I think we only need one bool.
You could call it: bool is_exclude = false

ok

+
+							if (chars < 4)
+								invalid_filter_format(optarg, line, lineno);

I think that check is too lax.
I think it's ok if we require the first char to be [-+] and the 2nd char to be
[dntf]

+ objecttype = line[1];

... but I think this is inadequately "liberal in what it accepts"; I think it
should skip spaces. In my proposed scheme, someone might reasonably write:

+
+							objectname = &line[3];
+
+							/* skip initial spaces */
+							while (*objectname == ' ')
+								objectname++;

I suggest to use isspace()

ok

I think we should check that *objectname != '\0', rather than chars>=4, above.

done

+								if (include_filter)
+								{
+									simple_string_list_append(&table_include_patterns, objectname);
+									dopt.include_everything = false;
+								}
+								else if (exclude_filter)
+									simple_string_list_append(&table_exclude_patterns, objectname);

If you use bool is_exclude, then this becomes "else" and you don't need to
think about checking if (!include && !exclude).

+							else if (objecttype == 'f')
+							{
+								if (include_filter)
+									simple_string_list_append(&foreign_servers_include_patterns, objectname);
+								else if (exclude_filter)
+									invalid_filter_format(optarg, line, lineno);
+							}

I would handle invalid object types as "else: invalid_filter_format()" here,
rather than duplicating above as: !=ALL('d','n','t','f')

good idea

+
+					if (ferror(f))
+						fatal("could not read from file \"%s\": %s",
+							  f == stdin ? "stdin" : optarg,
+							  strerror(errno));

I think we're allowed to use %m here ?

changed

+ printf(_(" --filter=FILENAME read object names from file\n"));

Object name filter expression, or something..

yes, it is not object names now

+ * getline is originaly GNU function, and should not be everywhere still.

originally

+ * Use own reduced implementation.

Did you "reduce" this from another implementation? Where?
What is its license ?

Maybe a line-reader already exists in the frontend (?) .. or maybe it should.

Everywhere else, fgets is used. Currently pg_getline is used in only one
place, so I think moving it to some common part is premature.

Maybe it could replace some fgets calls, but that is a different topic, I
think.

Thank you for the comments; an updated patch is attached.

Regards

Pavel

--
Justin

vignesh21@gmail.com

almost 6 years ago

In reply to: Pavel Stehule (#14)

Re: proposal: possibility to read dumped table's name from file

On Thu, Jun 11, 2020 at 1:07 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:

Thank you for comments, attached updated patch

Few comments:
+invalid_filter_format(char *message, char *filename, char *line, int lineno)
+{
+       char       *displayname;
+
+       displayname = *filename == '-' ? "stdin" : filename;
+
+       pg_log_error("invalid format of filter file \"%s\": %s",
+                                displayname,
+                                message);
+
+       fprintf(stderr, "%d: %s\n", lineno, line);
+       exit_nicely(1);
+}
I think fclose is missing here.

+                                               if (line[chars - 1] == '\n')
+                                                       line[chars - 1] = '\0';
Should we check for '\r' also to avoid failures in some platforms.

+     <varlistentry>
+      <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
+      <listitem>
+       <para>
+        Read filters from file. Format "(+|-)(tnfd) objectname:
+       </para>
+      </listitem>
+     </varlistentry>

I felt some documentation is missing here. We could include,
options tnfd is for controlling table, schema, foreign server data &
table exclude patterns.

Instead of using tnfd, if we could use the same options as existing
pg_dump options it will be less confusing.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

pryzby@telsasoft.com

almost 6 years ago

In reply to: Pavel Stehule (#14)

Re: proposal: possibility to read dumped table's name from file

On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:

On Wed, Jun 10, 2020 at 0:30, Justin Pryzby <pryzby@telsasoft.com> wrote:
+                                             /* ignore empty rows */
+                                             if (*line != '\0')
Maybe: if line=='\0': continue
We should also support comments.

Comment support is still missing but easily added :)

I tried this patch and it works for my purposes.

Also, your getline is dynamically re-allocating lines of arbitrary length.
Possibly that's not needed. We'll typically read "+t schema.relname", which is
132 chars. Maybe it's sufficient to do
char buf[1024];
fgets(buf);
if strchr(buf, '\n') == NULL: error();
ret = pstrdup(buf);

In any case, you could have getline return a char* and (rather than following
GNU) no need to take char**, int* parameters to conflate inputs and outputs.

I realized that --filter has an advantage over the previous implementation
(with multiple --exclude-* and --include-*) in that it's possible to use stdin
for includes *and* excludes.

By chance, I had the opportunity yesterday to re-use with rsync a regex that
I'd previously been using with pg_dump and grep. What this patch calls
"--filter" in rsync is called "--filter-from". rsync's --filter-from rejects
filters of length longer than max filename, so I had to split it up into
multiple lines instead of using regex alternation ("|"). This option is a
close parallel in pg_dump.

--
Justin

pavel.stehule@gmail.com

almost 6 years ago

In reply to: vignesh C (#15)

Re: proposal: possibility to read dumped table's name from file

On Sat, Jun 27, 2020 at 14:55, vignesh C <vignesh21@gmail.com> wrote:

On Thu, Jun 11, 2020 at 1:07 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Thank you for comments, attached updated patch

Few comments:
+invalid_filter_format(char *message, char *filename, char *line, int lineno)
+{
+       char       *displayname;
+
+       displayname = *filename == '-' ? "stdin" : filename;
+
+       pg_log_error("invalid format of filter file \"%s\": %s",
+                                displayname,
+                                message);
+
+       fprintf(stderr, "%d: %s\n", lineno, line);
+       exit_nicely(1);
+}

I think fclose is missing here.

done

+                                               if (line[chars - 1] == '\n')
+                                                       line[chars - 1] = '\0';
Should we check for '\r' also to avoid failures in some platforms.

I checked other uses of fgets in the Postgres source code, and everywhere the
test is only on \n.

From some quick research, per
https://stackoverflow.com/questions/12769289/carriage-return-by-fgets
the \r in this case should be discarded by libc on Microsoft platforms.

https://stackoverflow.com/questions/2061334/fgets-linux-vs-mac

\n has been the line ending on Mac OS X since 2001. I am not sure whether
Mac OS 9 needs to be supported.
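
For comparison, the defensive variant that the review comment asks about could look like the sketch below: it drops a trailing '\n' and then a trailing '\r' if one is present. This is only an illustration of the idea; strip_eol is a hypothetical helper name, and the patch as discussed here tests for '\n' only, like the other fgets callers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Strip a trailing '\n' and then a trailing '\r', if present, in place.
 * Returns the length of the line after stripping.
 */
static size_t
strip_eol(char *line)
{
    size_t      len;

    assert(line != NULL);
    len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';

    return len;
}
```

This only matters when a file with "\r\n" line endings is read on a platform (or in a mode) where the C library does not translate them, for example a filter file written on Windows and read on Linux.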

+     <varlistentry>
+      <term><option>--filter=<replaceable class="parameter">filename</replaceable></option></term>
+      <listitem>
+       <para>
+        Read filters from file. Format "(+|-)(tnfd) objectname:
+       </para>
+      </listitem>
+     </varlistentry>
I felt some documentation is missing here. We could include,
options tnfd is for controlling table, schema, foreign server data &
table exclude patterns.

I plan to complete the docs when the design is settled. It was not clear
whether people prefer long or short forms of option names.

Instead of using tnfd, if we could use the same options as existing
pg_dump options it will be less confusing.

It is almost the same:

+-t .. tables
+-n .. schemas
-d .. exclude data (there is no short option for --exclude-table-data)
+f .. include foreign table (there is no short option for --include-foreign-data)

So there is still an open question whether to use the +-tnfd system or a
system based on long options:

table foo
exclude-table foo
schema xx
exclude-schema xx
include-foreign-data yyy
exclude-table-data zzz

Typically these files will be generated by scripts and processed via a pipe,
so I see just two arguments for and against:

short format - there is less chance of typos (but it is not fully consistent
with the pg_dump options)
long format - it is self-documenting (and it is fully consistent with
pg_dump)

In this case I prefer the short form. It is more comfortable for users, and
there are only a few variants, so a verbose language (design) is not
necessary. But my opinion is not a strong one, and I'll accept any common
agreement.

Regards

Updated patch attached

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#16)

Re: proposal: possibility to read dumped table's name from file

On Wed, Jul 1, 2020 at 23:24, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:

On Wed, Jun 10, 2020 at 0:30, Justin Pryzby <pryzby@telsasoft.com> wrote:
+                                             /* ignore empty rows */
+                                             if (*line != '\0')
Maybe: if line=='\0': continue
We should also support comments.
Comment support is still missing but easily added :)

I tried this patch and it works for my purposes.

Also, your getline is dynamically re-allocating lines of arbitrary length.
Possibly that's not needed. We'll typically read "+t schema.relname", which is
132 chars. Maybe it's sufficient to do
char buf[1024];
fgets(buf);
if strchr(buf, '\n') == NULL: error();
ret = pstrdup(buf);

63 bytes is the maximum effective identifier size, but it is not the maximum
size of an identifier as written. Very probably a buffer of 1024 bytes will
be enough for everything, but I do not want to introduce any new magic limit,
all the more so when a dynamic implementation is not too hard.

A table name can be very long: sometimes the data names (table names) are
stored in external storage at full length, and it would not be practical to
require truncating them in the filter file.

For this case it is very effective, because a resized (enlarged) buffer is
reused for the following rows, so realloc should be infrequent. So when I
have to choose between two implementations of similar complexity, I prefer
the more dynamic code without hardcoded limits. The dynamic approach doesn't
have any overhead.

In any case, you could have getline return a char* and (rather than following
GNU) no need to take char**, int* parameters to conflate inputs and outputs.

No, it has a special benefit: it eliminates the short malloc/free cycle.
When some line is longer, the buffer (and its recorded size) is enlarged, and
no realloc is needed for later rows of the same or smaller size.
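
The buffer-reuse argument can be made concrete with a small sketch. It is not the patch's pg_getline: read_line is a hypothetical name, plain realloc stands in for pg_malloc/pg_realloc (so allocation failure is reported to the caller instead of exiting), and the initial size of 128 bytes is an arbitrary choice. The point it shows is that the caller owns the buffer and its size across calls, so the buffer only grows when a line is longer than any seen before.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read one line (including its trailing '\n', if any) into *buf, growing the
 * buffer as needed and recording its capacity in *bufsize.  The same buffer
 * is meant to be passed in again for the next line.
 *
 * Returns the number of characters read, or -1 at end of input with nothing
 * read or on allocation failure.
 */
static long
read_line(char **buf, size_t *bufsize, FILE *f)
{
    size_t      len = 0;
    int         c;

    assert(buf != NULL && bufsize != NULL && f != NULL);

    /* first call (or unusably small buffer): allocate a starting buffer */
    if (*buf == NULL || *bufsize < 2)
    {
        char       *newbuf = realloc(*buf, 128);

        if (newbuf == NULL)
            return -1;
        *buf = newbuf;
        *bufsize = 128;
    }

    while ((c = fgetc(f)) != EOF)
    {
        /* keep room for this character plus the terminating NUL */
        if (len + 2 > *bufsize)
        {
            size_t      newsize = *bufsize * 2;
            char       *newbuf = realloc(*buf, newsize);

            if (newbuf == NULL)
                return -1;
            *buf = newbuf;
            *bufsize = newsize;
        }

        (*buf)[len++] = (char) c;
        if (c == '\n')
            break;
    }

    if (len == 0)
        return -1;

    (*buf)[len] = '\0';
    return (long) len;
}
```

A typical caller declares `char *line = NULL; size_t size = 0;`, calls read_line in a loop until it returns -1, and frees the buffer once at the end.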

I realized that --filter has an advantage over the previous implementation
(with multiple --exclude-* and --include-*) in that it's possible to use stdin
for includes *and* excludes.

Yes, it looks like the better choice.

By chance, I had the opportunity yesterday to re-use with rsync a regex that
I'd previously been using with pg_dump and grep. What this patch calls
"--filter" in rsync is called "--filter-from". rsync's --filter-from rejects
filters of length longer than max filename, so I had to split it up into
multiple lines instead of using regex alternation ("|"). This option is a
close parallel in pg_dump.

We can talk about the option name; maybe "--filter-from" is better than just
"--filter".

Regards

Pavel

--
Justin

pryzby@telsasoft.com

almost 6 years ago

In reply to: Justin Pryzby (#16)

Re: proposal: possibility to read dumped table's name from file

On Wed, Jul 01, 2020 at 04:24:52PM -0500, Justin Pryzby wrote:

On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
On Wed, Jun 10, 2020 at 0:30, Justin Pryzby <pryzby@telsasoft.com> wrote:
+                                             /* ignore empty rows */
+                                             if (*line != '\0')
Maybe: if line=='\0': continue
We should also support comments.
Comment support is still missing but easily added :)

Still missing from the latest patch.

With some added documentation, I think this can be RfC.

--
Justin

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#19)

Re: proposal: possibility to read dumped table's name from file

On Sun, Jul 5, 2020 at 22:31, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Wed, Jul 01, 2020 at 04:24:52PM -0500, Justin Pryzby wrote:

On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:

On Wed, Jun 10, 2020 at 0:30, Justin Pryzby <pryzby@telsasoft.com> wrote:

+ /* ignore empty rows */
+ if (*line != '\0')

Maybe: if line=='\0': continue
We should also support comments.

Comment support is still missing but easily added :)

Still missing from the latest patch.

I can implement comment support, but I am not sure about the format. A
comment could start with "--" or the classic "#".

But "--" could be messy in this context.

With some added documentation, I think this can be RfC.

--
Justin

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Pavel Stehule (#20)

pryzby@telsasoft.com

almost 6 years ago

In reply to: Pavel Stehule (#21)

vignesh21@gmail.com

almost 6 years ago

In reply to: Pavel Stehule (#21)

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#22)

pavel.stehule@gmail.com

almost 6 years ago

In reply to: vignesh C (#23)

Daniel Gustafsson

daniel@yesql.se

almost 6 years ago

In reply to: Pavel Stehule (#25)

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Daniel Gustafsson (#26)

vignesh21@gmail.com

almost 6 years ago

In reply to: Pavel Stehule (#25)

pryzby@telsasoft.com

almost 6 years ago

In reply to: Daniel Gustafsson (#26)

Daniel Gustafsson

daniel@yesql.se

almost 6 years ago

In reply to: Pavel Stehule (#27)

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Daniel Gustafsson (#30)

pryzby@telsasoft.com

almost 6 years ago

In reply to: Pavel Stehule (#24)

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#32)

pavel.stehule@gmail.com

almost 6 years ago

In reply to: vignesh C (#28)

vignesh21@gmail.com

almost 6 years ago

In reply to: Pavel Stehule (#34)

pryzby@telsasoft.com

almost 6 years ago

In reply to: vignesh C (#35)

pavel.stehule@gmail.com

almost 6 years ago

In reply to: vignesh C (#35)

pavel.stehule@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#36)

pryzby@telsasoft.com

almost 6 years ago

In reply to: Pavel Stehule (#37)

pryzby@telsasoft.com

over 5 years ago

In reply to: Pavel Stehule (#18)

alvherre@2ndquadrant.com

over 5 years ago

In reply to: Pavel Stehule (#37)

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Alvaro Herrera (#41)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Tom Lane (#42)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#43)

Surafel Temesgen

surafel3000@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#43)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Surafel Temesgen (#45)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#46)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#47)

sfrost@snowman.net

over 5 years ago

In reply to: Pavel Stehule (#48)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Stephen Frost (#49)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#50)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#48)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#50)

alvherre@2ndquadrant.com

over 5 years ago

In reply to: Pavel Stehule (#50)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Alvaro Herrera (#54)

pryzby@telsasoft.com

over 5 years ago

In reply to: Pavel Stehule (#51)

sfrost@snowman.net

over 5 years ago

In reply to: Justin Pryzby (#56)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Justin Pryzby (#56)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#58)

dean.a.rasheed@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#59)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Dean Rasheed (#60)

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Pavel Stehule (#61)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Tom Lane (#62)

dean.a.rasheed@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#63)

tgl@sss.pgh.pa.us

over 5 years ago

In reply to: Dean Rasheed (#64)

sfrost@snowman.net

over 5 years ago

In reply to: Tom Lane (#65)

sfrost@snowman.net

over 5 years ago

In reply to: Pavel Stehule (#61)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Tom Lane (#62)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Stephen Frost (#67)

pryzby@telsasoft.com

over 5 years ago

In reply to: Pavel Stehule (#68)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Justin Pryzby (#70)

pavel.stehule@gmail.com

over 5 years ago

In reply to: Pavel Stehule (#71)

pavel.stehule@gmail.com

about 5 years ago

In reply to: Pavel Stehule (#72)

pavel.stehule@gmail.com

about 5 years ago

In reply to: Pavel Stehule (#73)

tomas.vondra@2ndquadrant.com

almost 5 years ago

In reply to: Pavel Stehule (#74)

Daniel Gustafsson

daniel@yesql.se

almost 5 years ago

In reply to: Tomas Vondra (#75)

tomas.vondra@2ndquadrant.com

almost 5 years ago

In reply to: Daniel Gustafsson (#76)

alvherre@2ndquadrant.com

almost 5 years ago

In reply to: Tomas Vondra (#77)

tgl@sss.pgh.pa.us

almost 5 years ago

In reply to: Alvaro Herrera (#78)

sfrost@snowman.net

almost 5 years ago

In reply to: Tom Lane (#79)

tomas.vondra@2ndquadrant.com

almost 5 years ago

In reply to: Stephen Frost (#80)

Daniel Gustafsson

daniel@yesql.se

almost 5 years ago

In reply to: Tomas Vondra (#81)

Daniel Gustafsson

daniel@yesql.se

almost 5 years ago

In reply to: Alvaro Herrera (#78)

sfrost@snowman.net

almost 5 years ago

In reply to: Daniel Gustafsson (#82)

tomas.vondra@2ndquadrant.com

almost 5 years ago

In reply to: Stephen Frost (#84)

sfrost@snowman.net

almost 5 years ago

In reply to: Tomas Vondra (#85)

alvherre@2ndquadrant.com

almost 5 years ago

In reply to: Stephen Frost (#86)

sfrost@snowman.net

almost 5 years ago

In reply to: Alvaro Herrera (#87)

pavel.stehule@gmail.com

almost 5 years ago

In reply to: Stephen Frost (#88)

tomas.vondra@2ndquadrant.com

almost 5 years ago

In reply to: Stephen Frost (#88)

pavel.stehule@gmail.com

almost 5 years ago

In reply to: Pavel Stehule (#74)

pavel.stehule@gmail.com

almost 5 years ago

In reply to: Tom Lane (#79)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Pavel Stehule (#92)

pryzby@telsasoft.com

over 4 years ago

In reply to: Pavel Stehule (#92)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#93)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Justin Pryzby (#94)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Pavel Stehule (#95)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Pavel Stehule (#97)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Pavel Stehule (#95)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#98)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#99)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Pavel Stehule (#101)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#102)

sfrost@snowman.net

over 4 years ago

In reply to: Pavel Stehule (#103)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Pavel Stehule (#103)

sfrost@snowman.net

over 4 years ago

In reply to: Daniel Gustafsson (#105)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#105)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#99)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Pavel Stehule (#108)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#109)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Pavel Stehule (#110)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#111)

alvherre@2ndquadrant.com

over 4 years ago

In reply to: Daniel Gustafsson (#111)

tomas.vondra@2ndquadrant.com

over 4 years ago

In reply to: Alvaro Herrera (#113)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#111)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#111)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Pavel Stehule (#116)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#117)

er@xs4all.nl

over 4 years ago

In reply to: Daniel Gustafsson (#117)

er@xs4all.nl

over 4 years ago

In reply to: Erik Rijkers (#119)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Erik Rijkers (#120)

Daniel Gustafsson

daniel@yesql.se

over 4 years ago

In reply to: Daniel Gustafsson (#117)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#122)

er@xs4all.nl

over 4 years ago

In reply to: Daniel Gustafsson (#122)

pavel.stehule@gmail.com

over 4 years ago

In reply to: Daniel Gustafsson (#122)

pavel.stehule@gmail.com

about 4 years ago

In reply to: Pavel Stehule (#97)

andrew@dunslane.net

almost 4 years ago

In reply to: Pavel Stehule (#126)

pavel.stehule@gmail.com

almost 4 years ago

In reply to: Andrew Dunstan (#127)

pavel.stehule@gmail.com

almost 4 years ago

In reply to: Pavel Stehule (#128)

pryzby@telsasoft.com

almost 4 years ago

In reply to: Pavel Stehule (#129)

pavel.stehule@gmail.com

almost 4 years ago

In reply to: Justin Pryzby (#130)

pavel.stehule@gmail.com

almost 4 years ago

In reply to: Pavel Stehule (#131)

Daniel Gustafsson

daniel@yesql.se

over 3 years ago

In reply to: Pavel Stehule (#132)

er@xs4all.nl

over 3 years ago

In reply to: Daniel Gustafsson (#133)

Daniel Gustafsson

daniel@yesql.se

over 3 years ago

In reply to: Erik Rijkers (#134)

rjuju123@gmail.com

over 3 years ago

In reply to: Daniel Gustafsson (#135)

Daniel Gustafsson

daniel@yesql.se

over 3 years ago

In reply to: Julien Rouhaud (#136)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Daniel Gustafsson (#133)

john.naylor@enterprisedb.com

over 3 years ago

In reply to: Daniel Gustafsson (#137)

andrew@dunslane.net

over 3 years ago

In reply to: John Naylor (#139)

Daniel Gustafsson

daniel@yesql.se

over 3 years ago

In reply to: Andrew Dunstan (#140)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Daniel Gustafsson (#141)

er@xs4all.nl

over 3 years ago

In reply to: Daniel Gustafsson (#141)

er@xs4all.nl

over 3 years ago

In reply to: Erik Rijkers (#143)

john.naylor@enterprisedb.com

over 3 years ago

In reply to: Pavel Stehule (#142)

pavel.stehule@gmail.com

over 3 years ago

In reply to: John Naylor (#145)

andres@anarazel.de

over 3 years ago

In reply to: Daniel Gustafsson (#141)

andres@anarazel.de

over 3 years ago

In reply to: Daniel Gustafsson (#141)

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#148)

andres@anarazel.de

over 3 years ago

In reply to: Andres Freund (#149)

Daniel Gustafsson

daniel@yesql.se

over 3 years ago

In reply to: Andres Freund (#150)

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Daniel Gustafsson (#151)

andres@anarazel.de

over 3 years ago

In reply to: Daniel Gustafsson (#151)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Daniel Gustafsson (#151)

rjuju123@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#154)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Julien Rouhaud (#155)

rjuju123@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#156)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Julien Rouhaud (#157)

andres@anarazel.de

over 3 years ago

In reply to: Pavel Stehule (#156)

rjuju123@gmail.com

over 3 years ago

In reply to: Andres Freund (#159)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Julien Rouhaud (#160)

rjuju123@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#161)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Julien Rouhaud (#162)

rjuju123@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#163)

pryzby@telsasoft.com

over 3 years ago

In reply to: Julien Rouhaud (#164)

rjuju123@gmail.com

over 3 years ago

In reply to: Justin Pryzby (#165)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Julien Rouhaud (#166)

rjuju123@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#167)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Julien Rouhaud (#166)

rjuju123@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#169)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Julien Rouhaud (#170)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#171)

rjuju123@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#171)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Julien Rouhaud (#173)

rjuju123@gmail.com

over 3 years ago

In reply to: Pavel Stehule (#174)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Julien Rouhaud (#175)

andres@anarazel.de

over 3 years ago

In reply to: Pavel Stehule (#174)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Andres Freund (#177)

pavel.stehule@gmail.com

over 3 years ago

In reply to: Andres Freund (#177)

Gregory Stark (as CFM)

stark.cfm@gmail.com

about 3 years ago

In reply to: Julien Rouhaud (#175)

Daniel Gustafsson

daniel@yesql.se

about 3 years ago

In reply to: Gregory Stark (as CFM) (#180)

rjuju123@gmail.com

about 3 years ago

In reply to: Daniel Gustafsson (#181)

pavel.stehule@gmail.com

about 3 years ago

In reply to: Julien Rouhaud (#182)

pavel.stehule@gmail.com

about 3 years ago

In reply to: Pavel Stehule (#183)

pryzby@telsasoft.com

about 3 years ago

In reply to: Pavel Stehule (#183)

pavel.stehule@gmail.com

about 3 years ago

In reply to: Justin Pryzby (#185)

pavel.stehule@gmail.com

about 3 years ago

In reply to: Justin Pryzby (#185)

pryzby@telsasoft.com

about 3 years ago

In reply to: Pavel Stehule (#187)

pavel.stehule@gmail.com

about 3 years ago

In reply to: Justin Pryzby (#188)

pavel.stehule@gmail.com

about 3 years ago

In reply to: Justin Pryzby (#188)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Pavel Stehule (#190)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Pavel Stehule (#191)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Pavel Stehule (#192)

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Pavel Stehule (#193)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Daniel Gustafsson (#194)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Pavel Stehule (#195)

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Pavel Stehule (#196)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Daniel Gustafsson (#197)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Pavel Stehule (#198)

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Pavel Stehule (#199)

er@xs4all.nl

over 2 years ago

In reply to: Daniel Gustafsson (#200)

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Erik Rijkers (#201)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Daniel Gustafsson (#202)

tgl@sss.pgh.pa.us

over 2 years ago

In reply to: Daniel Gustafsson (#202)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Tom Lane (#204)

Daniel Gustafsson

daniel@yesql.se

over 2 years ago

In reply to: Pavel Stehule (#205)

pavel.stehule@gmail.com

over 2 years ago

In reply to: Daniel Gustafsson (#206)