Possibility of partial data dumps with pg_dump
Hello,
To get acquainted with the Postgres code and gain some practice with it,
I am working on a small patch that adds the possibility of partial table dumps.
A filtering rule is specified as a standard SQL WHERE clause (without the
"WHERE" keyword).
There are three ways to pass data filters on the command line:
1) a table pattern inside the "where" parameter, separated by the divider '@':
... --where "table_pattern@where_condition" ...
The WHERE condition is applied to all tables that match the pattern.
2) the "where" parameter before any table inclusion:
... --where "where_condition" ...
All tables in the database are filtered with the given condition.
3) the "where" parameter after a table pattern:
... -t table_pattern --where where_condition ...
Only tables matching the last pattern before --where are filtered. The third
form shortens the command line and avoids duplicating the table pattern when
specific tables are dumped.
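Concretely, the three forms could look like this (the database "mydb" and
table "orders" are hypothetical examples, and a build with this patch applied
is assumed):

```
# 1) pattern and condition in one --where argument, joined by '@'
pg_dump mydb --where "orders@status = 'open'"

# 2) a global filter: --where before any -t applies to every table
pg_dump mydb --where "created_at > '2022-01-01'"

# 3) --where after -t applies only to the last table pattern
pg_dump mydb -t orders --where "status = 'open'"
```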
Filters may also be read from a file.
The file consists of lines; each line is either a global WHERE condition, a
table pattern, or a table pattern followed by a WHERE condition.
For example, the file
"""
where column_name_1 = 1
table_pattern
table_pattern where column_name_2 = 1
"""
corresponds to the parameters
--where "column_name_1 = 1" -t table_pattern --where "column_name_2 = 1"
The file format is not ideal, since it does not allow specifying patterns for
other object types, such as schemas.
I am ready to change it if the functionality is actually needed.
All use cases are covered by tests.
I would be grateful if the patch received some discussion.
Attachments:
where_condition_v1.patch
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 2c68915732..d60542f364 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -96,6 +96,15 @@ typedef enum OidOptions
zeroAsNone = 4
} OidOptions;
+
+typedef struct
+{
+ char *table_pattern; /* name pattern of the filtered table */
+ char *filter_condition; /* filter WHERE condition */
+ SimpleOidList table_oids; /* OIDs of the filtered tables */
+} WhereConditionData;
+
+
/* global decls */
static bool dosync = true; /* Issue fsync() to make dump durable on disk. */
@@ -127,6 +136,8 @@ static SimpleOidList foreign_servers_include_oids = {NULL, NULL};
static SimpleStringList extension_include_patterns = {NULL, NULL};
static SimpleOidList extension_include_oids = {NULL, NULL};
+static SimplePtrList filter_bindigs = {NULL, NULL};
+
static const CatalogId nilCatalogId = {0, 0};
/* override for standard extra_float_digits setting */
@@ -145,6 +156,8 @@ static int ncomments = 0;
static SecLabelItem *seclabels = NULL;
static int nseclabels = 0;
+/* global filter for all data in database */
+static char *global_filter_where_condition = NULL;
/*
* The default number of rows per INSERT when
* --inserts is specified without --rows-per-insert
@@ -317,7 +330,13 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
static char *get_synchronized_snapshot(Archive *fout);
static void setupDumpWorker(Archive *AHX);
static TableInfo *getRootTableInfo(const TableInfo *tbinfo);
-
+static char *getTableDataCondition(Oid table_oid);
+static bool parseFileToFilters(char* filename, DumpOptions *dopt);
+static void addFilterString(char* filter);
+static void supplementWhereConditionDataWithOids(Archive *fount,
+ bool strict_names);
+static void appendWhereConditionData(char *filter_condition,
+ char *table_pattern);
int
main(int argc, char **argv)
@@ -412,7 +431,8 @@ main(int argc, char **argv)
{"on-conflict-do-nothing", no_argument, &dopt.do_nothing, 1},
{"rows-per-insert", required_argument, NULL, 10},
{"include-foreign-data", required_argument, NULL, 11},
-
+ {"where", required_argument, NULL, 15},
+ {"file-filter", required_argument, NULL, 16},
{NULL, 0, NULL, 0}
};
@@ -621,6 +641,14 @@ main(int argc, char **argv)
simple_string_list_append(&foreign_servers_include_patterns,
optarg);
break;
+
+ case 15:
+ addFilterString(optarg);
+ break;
+
+ case 16:
+ parseFileToFilters(optarg, &dopt);
+ break;
default:
/* getopt_long already emitted a complaint */
@@ -791,6 +819,11 @@ main(int argc, char **argv)
&tabledata_exclude_oids,
false);
+ if (filter_bindigs.head != NULL)
+ {
+ supplementWhereConditionDataWithOids(fout, strict_names);
+ }
+
expand_foreign_server_name_patterns(fout, &foreign_servers_include_patterns,
&foreign_servers_include_oids);
@@ -1027,6 +1060,7 @@ help(const char *progname)
" access to)\n"));
printf(_(" --exclude-table-data=PATTERN do NOT dump data for the specified table(s)\n"));
printf(_(" --extra-float-digits=NUM override default setting for extra_float_digits\n"));
+ printf(_(" --file-filter=FILEPATH dump only the tables and rows described in the file\n"));
printf(_(" --if-exists use IF EXISTS when dropping objects\n"));
printf(_(" --include-foreign-data=PATTERN\n"
" include data of foreign tables on foreign\n"
@@ -1052,7 +1086,8 @@ help(const char *progname)
printf(_(" --use-set-session-authorization\n"
" use SET SESSION AUTHORIZATION commands instead of\n"
" ALTER OWNER commands to set ownership\n"));
-
+ printf(_(" --where=CONDITION set a WHERE condition to filter data rows for the last specified table;\n"
+ " if given before any table pattern, all data in the database is filtered\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=DBNAME database to dump\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
@@ -2614,7 +2649,7 @@ makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo)
tdinfo->dobj.name = tbinfo->dobj.name;
tdinfo->dobj.namespace = tbinfo->dobj.namespace;
tdinfo->tdtable = tbinfo;
- tdinfo->filtercond = NULL; /* might get set later */
+ tdinfo->filtercond = getTableDataCondition(tbinfo->dobj.catId.oid);
addObjectDependency(&tdinfo->dobj, tbinfo->dobj.dumpId);
/* A TableDataInfo contains data, of course */
@@ -2626,6 +2661,44 @@ makeTableDataInfo(DumpOptions *dopt, TableInfo *tbinfo)
tbinfo->interesting = true;
}
+
+/*
+ * getTableDataCondition -
+ * find the where condition by OID and return a clause based on it or default otherwise.
+ *
+ * Note: default filter is the global filter, if it exists, or null value.
+ * If found filter is empty return null value.
+ */
+static char*
+getTableDataCondition(Oid table_oid)
+{
+ char *filter = NULL;
+
+ for(SimplePtrListCell *filter_bind = filter_bindigs.head;
+ filter_bind != NULL;
+ filter_bind = filter_bind->next)
+ {
+ WhereConditionData *condition_data = (WhereConditionData*)filter_bind->ptr;
+ if(simple_oid_list_member(&condition_data->table_oids, table_oid))
+ {
+ filter = condition_data->filter_condition;
+ }
+ }
+
+ if (filter == NULL || strcmp(filter, "") == 0)
+ {
+ if (global_filter_where_condition == NULL)
+ return filter;
+
+ filter = global_filter_where_condition;
+ }
+ return psprintf("where %s", filter);
+}
+
+
/*
* The refresh for a materialized view must be dependent on the refresh for
* any materialized view that this one is dependent on.
@@ -18164,3 +18237,123 @@ appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
if (!res)
pg_log_warning("could not parse %s array", "reloptions");
}
+
+/*
+ * parseFileToFilters -
+ * read the file and convert its content into "-t" or "--where" parameters.
+ */
+static bool
+parseFileToFilters(char *filename, DumpOptions *dopt)
+{
+ FILE *fd;
+ char filter_buffer[500];
+ char *table_name;
+ char *filter;
+
+ canonicalize_path(filename);
+ fd = fopen(filename, PG_BINARY_R);
+ if (!fd)
+ {
+ pg_log_error("could not open file \"%s\"", filename);
+ return EXIT_FAILURE;
+ }
+ while (fgets(filter_buffer, sizeof(filter_buffer), fd))
+ {
+ table_name = strtok(filter_buffer, " \n");
+ if(table_name)
+ {
+ /*
+ * If first word in string is "where", next filter will be global
+ */
+ if(strcmp(table_name,"where") == 0)
+ {
+ filter = strtok(NULL, "\n");
+ global_filter_where_condition = pg_strdup(filter);
+ }
+ else
+ {
+ filter = strtok(NULL, " ");
+ if(filter && strcmp(filter,"where") == 0)
+ {
+ filter = strtok(NULL, "\n");
+ appendWhereConditionData(filter, table_name);
+ }
+ else
+ {
+ simple_string_list_append(&table_include_patterns, table_name);
+ dopt->include_everything = false;
+ }
+ }
+ }
+ }
+ fclose(fd);
+ return EXIT_SUCCESS;
+}
+
+/*
+ * addFilterString -
+ * Add condition string in local filter list or in global filter.
+ * Note:
+ * The string has the format "table_pattern@condition". If there is no '@'
+ * divider, the condition is linked to the last added table pattern.
+ */
+static void
+addFilterString(char* filter)
+{
+ char *table_pattern;
+ char *where_condition;
+
+ table_pattern = strtok(filter, "@");
+ where_condition = strtok(NULL, "@");
+ if(where_condition == NULL)
+ {
+ if(table_include_patterns.head == NULL)
+ {
+ global_filter_where_condition = pg_strdup(filter);
+ }
+ else
+ {
+ appendWhereConditionData(filter, table_include_patterns.tail->val);
+ }
+ }
+ else
+ {
+ appendWhereConditionData(where_condition, table_pattern);
+ }
+}
+
+/*
+ * appendWhereConditionData -
+ * create and append data struct for filtering strings
+ */
+static void
+appendWhereConditionData(char *filter_condition, char *table_pattern)
+{
+ WhereConditionData *condition_data = pg_malloc(sizeof(WhereConditionData));
+ condition_data->filter_condition = pg_strdup(filter_condition);
+ condition_data->table_pattern = pg_strdup(table_pattern);
+ simple_ptr_list_append(&filter_bindigs, condition_data);
+}
+
+/*
+ * supplementWhereConditionDataWithOids -
+ * find the OIDs that correspond to the table patterns of the filter conditions
+ */
+static void
+supplementWhereConditionDataWithOids(Archive *fout, bool strict_names)
+{
+ for (SimplePtrListCell *filter_bind = filter_bindigs.head;
+ filter_bind;
+ filter_bind = filter_bind->next)
+ {
+ WhereConditionData *condition_data = (WhereConditionData *) filter_bind->ptr;
+ SimpleStringList patterns = {NULL, NULL};
+
+ simple_string_list_append(&patterns, condition_data->table_pattern);
+ expand_table_name_patterns(fout, &patterns, &condition_data->table_oids, strict_names);
+ simple_string_list_destroy(&patterns);
+ }
+}
\ No newline at end of file
diff --git a/src/bin/pg_dump/t/020_dump_filters.pl b/src/bin/pg_dump/t/020_dump_filters.pl
new file mode 100644
index 0000000000..ab5c98b714
--- /dev/null
+++ b/src/bin/pg_dump/t/020_dump_filters.pl
@@ -0,0 +1,177 @@
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+my $tempdir = PostgreSQL::Test::Utils::tempdir;
+my $node = PostgreSQL::Test::Cluster->new('main');
+$node->init;
+$node->start;
+
+
+$node->safe_psql("postgres", "CREATE TABLE t0(a int)");
+$node->safe_psql("postgres", "CREATE TABLE t1(a int)");
+$node->safe_psql("postgres", "INSERT INTO t0 SELECT * From generate_series(1,5)");
+$node->safe_psql("postgres", "INSERT INTO t1 SELECT * From generate_series(1,7)");
+
+
+
+my %default_regexp = (
+ 'double_filter' => qr/^
+ \QCOPY public.t0 (a) FROM stdin;\E\n
+ \Q4\E\n\Q5\E\n\Q\.\E\n
+ (.|\n)*
+ \QCOPY public.t1 (a) FROM stdin;\E\n
+ \Q4\E\n\Q5\E\n\Q6\E\n\Q7\E\n
+ \Q\.\E
+ /xm,
+ 'only_one_filter' => qr/^
+ \QCOPY public.t0 (a) FROM stdin;\E\n
+ \Q4\E\n\Q5\E\n
+ \Q\.\E\n
+ /xm,
+ 'one_filter' => qr/^
+ \QCOPY public.t0 (a) FROM stdin;\E\n
+ \Q4\E\n\Q5\E\n
+ \Q\.\E\n
+ (.|\n)*
+ \QCOPY public.t1 (a) FROM stdin;\E\n
+ \Q1\E\n\Q2\E\n\Q3\E\n\Q4\E\n\Q5\E\n\Q6\E\n\Q7\E\n
+ \Q\.\E
+ /xm,
+ 'two_filter' => qr/^
+ \QCOPY public.t0 (a) FROM stdin;\E\n
+ \Q1\E\n\Q2\E\n\Q3\E\n\Q4\E\n\Q5\E\n
+ \Q\.\E\n
+ (.|\n)*
+ \QCOPY public.t1 (a) FROM stdin;\E\n
+ \Q4\E\n\Q5\E\n\Q6\E\n\Q7\E\n
+ \Q\.\E
+ /xm,
+ 'revers_filter' => qr/^
+ \QCOPY public.t0 (a) FROM stdin;\E\n
+ \Q4\E\n\Q5\E\n
+ \Q\.\E\n
+ (.|\n)*
+ \QCOPY public.t1 (a) FROM stdin;\E\n
+ \Q1\E\n\Q2\E\n
+ \Q\.\E
+ /xm,
+);
+
+
+my %tests = (
+ 'global filter' => {
+ regexp => $default_regexp{'double_filter'},
+ file => 'where a > 3',
+ dump => [
+ 'pg_dump',
+ 'postgres',
+ '-f', "$tempdir/global filter.sql",
+ '--where', 'a > 3']
+ },
+ 'local filter' => {
+ regexp => $default_regexp{'only_one_filter'},
+ file => "t0 \n"
+ ."t0 where a > 3",
+ dump => [
+ 'pg_dump',
+ 'postgres',
+ '-f', "$tempdir/local filter.sql",
+ '-t', 't0',
+ '--where', 'a > 3']
+ },
+ 'local filter with second no filter table' => {
+ regexp => $default_regexp{'one_filter'},
+ file => "t0 where a > 3 \n",
+ dump => [
+ 'pg_dump',
+ 'postgres',
+ '-f', "$tempdir/local filter with second no filter table.sql",
+ '-t', 't0',
+ '--where', 'a > 3',
+ '-t', 't1',
+ ]
+ },
+ 'local filter with first no filter table' => {
+ regexp => $default_regexp{'two_filter'},
+ file => "t1 where a > 3",
+ dump => [
+ 'pg_dump',
+ 'postgres',
+ '-f', "$tempdir/local filter with first no filter table.sql",
+ '-t', 't0',
+ '-t', 't1',
+ '--where', 'a > 3',
+ ]
+ },
+
+ 'local filter with half name' => {
+ regexp => $default_regexp{'two_filter'},
+ file => "t1 where a > 3",
+ dump => [
+ 'pg_dump',
+ 'postgres',
+ '-f', "$tempdir/local filter with half name.sql",
+ '--where', 't1@a > 3',
+ ]
+ },
+
+ 'local filter with search condition' => {
+ regexp => $default_regexp{'double_filter'},
+ file => "t0|t1 where a > 3",
+ dump => [
+ 'pg_dump',
+ 'postgres',
+ '-f', "$tempdir/local filter with search condition.sql",
+ '-t', 't0|t1',
+ '--where', 'a > 3',
+ ]
+ },
+
+ 'global with local filter' => {
+ regexp => $default_regexp{'revers_filter'},
+ file => "where a > 3\n"
+ ."t1 where a < 3",
+ dump => [
+ 'pg_dump',
+ 'postgres',
+ '-f', "$tempdir/global with local filter.sql",
+ '--where', 'a > 3',
+ '-t', 't0',
+ '-t', 't1',
+ '--where', 'a < 3',
+ ]
+ },
+);
+
+
+
+foreach my $test (sort keys %tests)
+{
+ $node->command_ok(\@{ $tests{$test}->{dump} },"$test: pg_dump runs");
+
+ my $output_file = slurp_file("$tempdir/${test}.sql");
+
+ ok($output_file =~ $tests{$test}->{regexp}, "$test: should be dumped");
+
+
+ open my $fileHandle, ">>", "$tempdir/${test}.filter"
+ or die "could not open filter file: $!";
+ print $fileHandle $tests{$test}->{file};
+ close ($fileHandle);
+
+ $node->command_ok([
+ 'pg_dump',
+ 'postgres',
+ '-f', "$tempdir/${test} with file.sql",
+ '--file-filter', "$tempdir/${test}.filter",
+ ],
+ "$test: pg_dump with config runs");
+
+ my $output_file_f = slurp_file("$tempdir/${test} with file.sql");
+ ok($output_file_f =~ $tests{$test}->{regexp}, "$test with file: should be dumped");
+}
+
+done_testing();
Hi

On Tue, Oct 4, 2022 at 12:48 Никита Старовойтов <nikstarall@gmail.com> wrote:
> with a view to meeting with postgres code and to get some practice with
> it, I am making a small patch that adds the possibility of partial tables
> dump. [...]
What is the benefit and the use case? I don't see any benefit over a simple

\copy (select * from xx where ...) to file CSV

or over writing a trivial application that exports what you want in the
format you want.
Regards
Pavel
Hi,
On Tue, Oct 04, 2022 at 02:15:16PM +0200, Pavel Stehule wrote:
> What is benefit and use case? For this case I don't see any benefit against
> simple
>
> \copy (select * from xx where ...) to file CSV
Also, such an approach probably requires a lot of effort to get a valid
backup (with regard to foreign keys and such).

There's already a project dedicated to generating such partial (and
consistent) backups: https://github.com/mla/pg_sample. Maybe that would
address your needs?
Good afternoon,

Indeed, the functionality that I started to implement in this patch is very
similar to what is included in the program you suggested, and many of the use
cases are the same. Thanks for pointing it out. I have been working on
implementing referential integrity, but I have not been able to find a simple
solution for a complex structure, and I am not sure it can be done in the
dump process at all, although it is obvious that without it the usefulness of
the feature is limited. When I worked with another database management
system, a partial-dump feature was available in its dump program, and it was
useful to me. But I understand why it might not be worth extending pg_dump
with a non-essential feature. However, I will try again to solve the problem
of guaranteed recovery of the database. Thanks for the comments, they were
really helpful to me.
On Tue, Oct 4, 2022 at 19:24 Julien Rouhaud <rjuju123@gmail.com> wrote: