Skipping schema changes in publication
Hi,
This feature adds an option to skip changes of all tables in specified
schema while creating publication.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
schemas.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;
A new column pnskip is added to table "pg_publication_namespace", to
maintain the schemas that the user wants to skip publishing through
the publication. Modified the output plugin (pgoutput) to skip
publishing the changes if the relation is part of skip schema
publication.
As a continuation to this, I will work on implementing skipping tables
from all tables in schema and skipping tables from all tables
publication.
Attached patch has the implementation for this.
This feature is for the pg16 version.
Thoughts?
Regards,
Vignesh
Attachments:
v1-0001-Skip-publishing-the-tables-of-schema.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Skip-publishing-the-tables-of-schema.patchDownload+678-125
On Tue, Mar 22, 2022 at 12:38 PM vignesh C <vignesh21@gmail.com> wrote:
Hi,
This feature adds an option to skip changes of all tables in specified
schema while creating publication.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
schemas.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;A new column pnskip is added to table "pg_publication_namespace", to
maintain the schemas that the user wants to skip publishing through
the publication. Modified the output plugin (pgoutput) to skip
publishing the changes if the relation is part of skip schema
publication.
As a continuation to this, I will work on implementing skipping tables
from all tables in schema and skipping tables from all tables
publication.Attached patch has the implementation for this.
The patch was not applying on top of HEAD because of the recent
commits, attached patch is rebased on top of HEAD.
Regards,
Vignesh
Attachments:
v1-0001-Skip-publishing-the-tables-of-schema.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Skip-publishing-the-tables-of-schema.patchDownload+742-155
On Sat, Mar 26, 2022 at 7:37 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Mar 22, 2022 at 12:38 PM vignesh C <vignesh21@gmail.com> wrote:
Hi,
This feature adds an option to skip changes of all tables in specified
schema while creating publication.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
schemas.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;A new column pnskip is added to table "pg_publication_namespace", to
maintain the schemas that the user wants to skip publishing through
the publication. Modified the output plugin (pgoutput) to skip
publishing the changes if the relation is part of skip schema
publication.
As a continuation to this, I will work on implementing skipping tables
from all tables in schema and skipping tables from all tables
publication.Attached patch has the implementation for this.
The patch was not applying on top of HEAD because of the recent
commits, attached patch is rebased on top of HEAD.
The patch does not apply on top of HEAD because of the recent commit,
attached patch is rebased on top of HEAD.
I have also included the implementation for skipping a few tables from
all tables publication, the 0002 patch has the implementation for the
same.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
tables.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;
Regards,
Vignesh
On Tue, Apr 12, 2022 at 11:53 AM vignesh C <vignesh21@gmail.com> wrote:
On Sat, Mar 26, 2022 at 7:37 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Mar 22, 2022 at 12:38 PM vignesh C <vignesh21@gmail.com> wrote:
Hi,
This feature adds an option to skip changes of all tables in specified
schema while creating publication.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
schemas.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;A new column pnskip is added to table "pg_publication_namespace", to
maintain the schemas that the user wants to skip publishing through
the publication. Modified the output plugin (pgoutput) to skip
publishing the changes if the relation is part of skip schema
publication.
As a continuation to this, I will work on implementing skipping tables
from all tables in schema and skipping tables from all tables
publication.Attached patch has the implementation for this.
The patch was not applying on top of HEAD because of the recent
commits, attached patch is rebased on top of HEAD.The patch does not apply on top of HEAD because of the recent commit,
attached patch is rebased on top of HEAD.I have also included the implementation for skipping a few tables from
all tables publication, the 0002 patch has the implementation for the
same.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
tables.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;
For the second syntax (Alter Publication ...), isn't it better to
avoid using ADD? It looks odd to me because we are not adding anything
in publication with this sytax.
--
With Regards,
Amit Kapila.
On Tue, Apr 12, 2022 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 12, 2022 at 11:53 AM vignesh C <vignesh21@gmail.com> wrote:
On Sat, Mar 26, 2022 at 7:37 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Mar 22, 2022 at 12:38 PM vignesh C <vignesh21@gmail.com> wrote:
Hi,
This feature adds an option to skip changes of all tables in specified
schema while creating publication.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
schemas.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;A new column pnskip is added to table "pg_publication_namespace", to
maintain the schemas that the user wants to skip publishing through
the publication. Modified the output plugin (pgoutput) to skip
publishing the changes if the relation is part of skip schema
publication.
As a continuation to this, I will work on implementing skipping tables
from all tables in schema and skipping tables from all tables
publication.Attached patch has the implementation for this.
The patch was not applying on top of HEAD because of the recent
commits, attached patch is rebased on top of HEAD.The patch does not apply on top of HEAD because of the recent commit,
attached patch is rebased on top of HEAD.I have also included the implementation for skipping a few tables from
all tables publication, the 0002 patch has the implementation for the
same.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
tables.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;For the second syntax (Alter Publication ...), isn't it better to
avoid using ADD? It looks odd to me because we are not adding anything
in publication with this sytax.
I was thinking of the scenario where user initially creates the
publication for all tables:
CREATE PUBLICATION pub1 FOR ALL TABLES;
After that user decides to skip few tables ex: t1, t2
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;
I thought of supporting this syntax if incase user decides to add the
skipping of a few tables later.
Thoughts?
Regards,
Vignesh
On Tue, Apr 12, 2022 at 4:17 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Apr 12, 2022 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
For the second syntax (Alter Publication ...), isn't it better to
avoid using ADD? It looks odd to me because we are not adding anything
in publication with this sytax.I was thinking of the scenario where user initially creates the
publication for all tables:
CREATE PUBLICATION pub1 FOR ALL TABLES;After that user decides to skip few tables ex: t1, t2
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;I thought of supporting this syntax if incase user decides to add the
skipping of a few tables later.
I understand that part but what I pointed out was that it might be
better to avoid using ADD keyword in this syntax like: ALTER
PUBLICATION pub1 SKIP TABLE t1,t2;
--
With Regards,
Amit Kapila.
On Tue, Apr 12, 2022 at 4:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 12, 2022 at 4:17 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Apr 12, 2022 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
For the second syntax (Alter Publication ...), isn't it better to
avoid using ADD? It looks odd to me because we are not adding anything
in publication with this sytax.I was thinking of the scenario where user initially creates the
publication for all tables:
CREATE PUBLICATION pub1 FOR ALL TABLES;After that user decides to skip few tables ex: t1, t2
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;I thought of supporting this syntax if incase user decides to add the
skipping of a few tables later.I understand that part but what I pointed out was that it might be
better to avoid using ADD keyword in this syntax like: ALTER
PUBLICATION pub1 SKIP TABLE t1,t2;
Currently we are supporting Alter publication using the following syntax:
ALTER PUBLICATION pub1 ADD TABLE t1;
ALTER PUBLICATION pub1 SET TABLE t1;
ALTER PUBLICATION pub1 DROP TABLE T1;
ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1;
ALTER PUBLICATION pub1 SET ALL TABLES IN SCHEMA sch1;
ALTER PUBLICATION pub1 DROP ALL TABLES IN SCHEMA sch1;
I have extended the new syntax in similar lines:
ALTER PUBLICATION pub1 ADD SKIP TABLE t1;
ALTER PUBLICATION pub1 SET SKIP TABLE t1;
ALTER PUBLICATION pub1 DROP SKIP TABLE T1;
I did it like this to maintain consistency.
But I'm fine doing it either way to keep it simple for the user.
Regards,
Vignesh
On Wed, Apr 13, 2022 at 8:45 AM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Apr 12, 2022 at 4:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I understand that part but what I pointed out was that it might be
better to avoid using ADD keyword in this syntax like: ALTER
PUBLICATION pub1 SKIP TABLE t1,t2;Currently we are supporting Alter publication using the following syntax:
ALTER PUBLICATION pub1 ADD TABLE t1;
ALTER PUBLICATION pub1 SET TABLE t1;
ALTER PUBLICATION pub1 DROP TABLE T1;
ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1;
ALTER PUBLICATION pub1 SET ALL TABLES IN SCHEMA sch1;
ALTER PUBLICATION pub1 DROP ALL TABLES IN SCHEMA sch1;I have extended the new syntax in similar lines:
ALTER PUBLICATION pub1 ADD SKIP TABLE t1;
ALTER PUBLICATION pub1 SET SKIP TABLE t1;
ALTER PUBLICATION pub1 DROP SKIP TABLE T1;I did it like this to maintain consistency.
What is the difference between ADD and SET variants? I understand we
need some way to remove the SKIP table setting but not sure if DROP is
the best alternative.
The other ideas could be:
To set skip tables: ALTER PUBLICATION pub1 SKIP TABLE t1, t2...;
To reset skip tables: ALTER PUBLICATION pub1 SKIP TABLE; /* basically
an empty list*/
Yet another way to reset skip tables: ALTER PUBLICATION pub1 RESET
SKIP TABLE; /* Here we need to introduce RESET. */
--
With Regards,
Amit Kapila.
On Wed, Apr 13, 2022 at 2:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 13, 2022 at 8:45 AM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Apr 12, 2022 at 4:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I understand that part but what I pointed out was that it might be
better to avoid using ADD keyword in this syntax like: ALTER
PUBLICATION pub1 SKIP TABLE t1,t2;Currently we are supporting Alter publication using the following syntax:
ALTER PUBLICATION pub1 ADD TABLE t1;
ALTER PUBLICATION pub1 SET TABLE t1;
ALTER PUBLICATION pub1 DROP TABLE T1;
ALTER PUBLICATION pub1 ADD ALL TABLES IN SCHEMA sch1;
ALTER PUBLICATION pub1 SET ALL TABLES IN SCHEMA sch1;
ALTER PUBLICATION pub1 DROP ALL TABLES IN SCHEMA sch1;I have extended the new syntax in similar lines:
ALTER PUBLICATION pub1 ADD SKIP TABLE t1;
ALTER PUBLICATION pub1 SET SKIP TABLE t1;
ALTER PUBLICATION pub1 DROP SKIP TABLE T1;I did it like this to maintain consistency.
What is the difference between ADD and SET variants? I understand we
need some way to remove the SKIP table setting but not sure if DROP is
the best alternative.The other ideas could be:
To set skip tables: ALTER PUBLICATION pub1 SKIP TABLE t1, t2...;
To reset skip tables: ALTER PUBLICATION pub1 SKIP TABLE; /* basically
an empty list*/
Yet another way to reset skip tables: ALTER PUBLICATION pub1 RESET
SKIP TABLE; /* Here we need to introduce RESET. */
When you were talking about SKIP TABLE then I liked the idea of:
ALTER ... SET SKIP TABLE; /* empty list to reset the table skips */
ALTER ... SET SKIP TABLE t1,t2; /* non-empty list to replace the table skips */
But when you apply that rule to SKIP ALL TABLES IN SCHEMA, then the
reset syntax looks too awkward.
ALTER ... SET SKIP ALL TABLES IN SCHEMA; /* empty list to reset the
schema skips */
ALTER ... SET SKIP ALL TABLES IN SCHEMA s1,s2; /* non-empty list to
replace the schema skips */
~~~
IMO it might be simpler to do it like:
ALTER ... DROP SKIP; /* reset/remove the skip */
ALTER ... SET SKIP TABLE t1,t2; /* non-empty list to replace table skips */
ALTER ... SET SKIP ALL TABLES IS SCHEMA s1,s2; /* non-empty list to
replace schema skips */
I don't really think that the ALTER ... SET SKIP empty list should be
supported (because reason above)
I don't really think that the ALTER ... ADD SKIP should be supported.
===
More questions - What happens if the skip table or skip schema no
longer exists exist? Does that mean error? Maybe there is a
dependency on it but OTOH it might be annoying - e.g. to disallow a
DROP TABLE when the only dependency was that the user wanted to SKIP
it...
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Tue, Apr 12, 2022 2:23 PM vignesh C <vignesh21@gmail.com> wrote:
The patch does not apply on top of HEAD because of the recent commit,
attached patch is rebased on top of HEAD.
Thanks for your patch. Here are some comments for 0001 patch.
1. doc/src/sgml/catalogs.sgml
@@ -6438,6 +6438,15 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l
A null value indicates that all columns are published.
</para></entry>
</row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>pnskip</structfield> <type>bool</type>
+ </para>
+ <para>
+ True if the schema is skip schema
+ </para></entry>
+ </row>
</tbody>
</tgroup>
</table>
This change is added to pg_publication_rel, I think it should be added to
pg_publication_namespace, right?
2.
postgres=# alter publication p1 add skip all tables in schema s1,s2;
ERROR: schema "s1" is already member of publication "p1"
This error message seems odd to me, can we improve it? Something like:
schema "s1" is already skipped in publication "p1"
3.
create table tbl (a int primary key);
create schema s1;
create schema s2;
create table s1.tbl (a int);
create publication p1 for all tables skip all tables in schema s1,s2;
postgres=# \dRp+
Publication p1
Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
----------+------------+---------+---------+---------+-----------+----------
postgres | t | t | t | t | t | f
Skip tables from schemas:
"s1"
"s2"
postgres=# select * from pg_publication_tables;
pubname | schemaname | tablename
---------+------------+-----------
p1 | public | tbl
p1 | s1 | tbl
(2 rows)
There shouldn't be a record of s1.tbl, since all tables in schema s1 are skipped.
I found that it is caused by the following code:
src/backend/catalog/pg_publication.c
+ foreach(cell, pubschemalist)
+ {
+ PublicationSchInfo *pubsch = (PublicationSchInfo *) lfirst(cell);
+
+ skipschemaidlist = lappend_oid(result, pubsch->oid);
+ }
The first argument to append_oid() seems wrong, should it be:
skipschemaidlist = lappend_oid(skipschemaidlist, pubsch->oid);
4. src/backend/commands/publicationcmds.c
/*
* Convert the PublicationObjSpecType list into schema oid list and
* PublicationTable list.
*/
static void
ObjectsInPublicationToOids(List *pubobjspec_list, ParseState *pstate,
List **rels, List **schemas)
Should we modify the comment of ObjectsInPublicationToOids()?
"schema oid list" should be "PublicationSchInfo list".
Regards,
Shi yu
On Tue, Apr 12, 2022 at 2:23 PM vignesh C <vignesh21@gmail.com> wrote:
The patch does not apply on top of HEAD because of the recent commit,
attached patch is rebased on top of HEAD.
Thanks for your patches.
Here are some comments for v1-0001:
1.
I found the patch add the following two new functions in gram.y:
preprocess_alltables_pubobj_list, check_skip_in_pubobj_list.
These two functions look similar. So could we just add one new function?
Besides, do we need the API `location` in new function
preprocess_alltables_pubobj_list? It seems that "location" is not used in this
new function.
In addition, the location of error cursor in the messages seems has a little
problem. For example:
postgres=# create publication pub for all TABLES skip all tables in schema public, table test;
ERROR: only SKIP ALL TABLES IN SCHEMA or SKIP TABLE can be specified with ALL TABLES option
LINE 1: create publication pub for all TABLES skip all tables in sch...
^
(The location of error cursor is under 'create')
2. I think maybe there is a minor missing in function
preprocess_alltables_pubobj_list and check_skip_in_pubobj_list:
We seem to be missing the CURRENT_SCHEMA case.
For example(In function preprocess_alltables_pubobj_list) :
+ /* Only SKIP ALL TABLES IN SCHEMA option supported with ALL TABLES */
+ if (pubobj->pubobjtype != PUBLICATIONOBJ_TABLES_IN_SCHEMA ||
+ !pubobj->skip)
maybe need to be changed like this:
+ /* Only SKIP ALL TABLES IN SCHEMA option supported with ALL TABLES */
+ if ((pubobj->pubobjtype != PUBLICATIONOBJ_TABLES_IN_SCHEMA &&
+ pubobj->pubobjtype != PUBLICATIONOBJ_TABLES_IN_CUR_SCHEMA) &&
+ pubobj->skip)
3. I think maybe there are some minor missing in create_publication.sgml.
+ [ FOR ALL TABLES [SKIP ALL TABLES IN SCHEMA { <replaceable class="parameter">schema_name</replaceable> | CURRENT_SCHEMA }]
maybe need to be changed to this:
+ [ FOR ALL TABLES [SKIP ALL TABLES IN SCHEMA { <replaceable class="parameter">schema_name</replaceable> | CURRENT_SCHEMA } [, ... ]]
4. The error message of function CreatePublication.
Does the message below need to be modified like the comment?
In addition, I think maybe "FOR/SKIP" is better.
@@ -835,18 +843,21 @@ CreatePublication(ParseState *pstate, CreatePublicationStmt *stmt)
- /* FOR ALL TABLES IN SCHEMA requires superuser */
+ /* FOR [SKIP] ALL TABLES IN SCHEMA requires superuser */
if (list_length(schemaidlist) > 0 && !superuser())
ereport(ERROR,
errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("must be superuser to create FOR ALL TABLES IN SCHEMA publication"));
5.
I think there are some minor missing in tab-complete.c.
+ Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "SKIP", "ALL", "TABLES", "IN", "SCHEMA"))
maybe need to be changed to this:
+ Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "ALL", "TABLES", "SKIP", "ALL", "TABLES", "IN", "SCHEMA"))
+ Matches("CREATE", "PUBLICATION", MatchAny, "SKIP", "FOR", "ALL", "TABLES", "IN", "SCHEMA", MatchAny)) &&
maybe need to be changed to this:
+ Matches("CREATE", "PUBLICATION", MatchAny, "FOR", "ALL", "TABLES", "SKIP", "ALL", "TABLES", "IN", "SCHEMA", MatchAny)) &&
6.
In function get_rel_sync_entry, do we need `if (!publish)` in below code?
I think `publish` is always false here, as we delete the check for
"pub->alltables".
```
- /*
- * If this is a FOR ALL TABLES publication, pick the partition root
- * and set the ancestor level accordingly.
- */
- if (pub->alltables)
- {
- ......
- }
-
if (!publish)
```
Regards,
Wang wei
On 12.04.22 08:23, vignesh C wrote:
I have also included the implementation for skipping a few tables from
all tables publication, the 0002 patch has the implementation for the
same.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
tables.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;
We have already allocated the "skip" terminology for skipping
transactions, which is a dynamic run-time action. We are also using the
term "skip" elsewhere to skip locked rows, which is similarly a run-time
action. I think it would be confusing to use the term SKIP for DDL
construction.
Let's find another term like "omit", "except", etc.
I would also think about this in broader terms. For example, sometimes
people want features like "all columns except these" in certain places.
The syntax for those things should be similar.
That said, I'm not sure this feature is worth the trouble. If this is
useful, what about "whole database except these schemas"? What about
"create this database from this template except these schemas". This
could get out of hand. I think we should encourage users to group their
object the way they want and not offer these complicated negative
selection mechanisms.
On Thu, Apr 14, 2022, at 10:47 AM, Peter Eisentraut wrote:
On 12.04.22 08:23, vignesh C wrote:
I have also included the implementation for skipping a few tables from
all tables publication, the 0002 patch has the implementation for the
same.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
tables.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;We have already allocated the "skip" terminology for skipping
transactions, which is a dynamic run-time action. We are also using the
term "skip" elsewhere to skip locked rows, which is similarly a run-time
action. I think it would be confusing to use the term SKIP for DDL
construction.
I didn't like the SKIP choice too. We already have EXCEPT for IMPORT FOREIGN
SCHEMA and if I were to suggest a keyword, it would be EXCEPT.
I would also think about this in broader terms. For example, sometimes
people want features like "all columns except these" in certain places.
The syntax for those things should be similar.
The questions are:
What kind of issues does it solve?
Do we have a workaround for it?
That said, I'm not sure this feature is worth the trouble. If this is
useful, what about "whole database except these schemas"? What about
"create this database from this template except these schemas". This
could get out of hand. I think we should encourage users to group their
object the way they want and not offer these complicated negative
selection mechanisms.
I have the same impression too. We already provide a way to:
* include individual tables;
* include all tables;
* include all tables in a certain schema.
Doesn't it cover the majority of the use cases? We don't need to cover all
possible cases in one DDL command. IMO the current grammar for CREATE
PUBLICATION is already complicated after the ALL TABLES IN SCHEMA. You are
proposing to add "ALL TABLES SKIP ALL TABLES" that sounds repetitive but it is
not; doesn't seem well-thought-out. I'm also concerned about possible gotchas
for this proposal. The first command above suggests that it skips all tables in a
certain schema. What happen if I decide to include a particular table of the
skipped schema (second command)?
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;
ALTER PUBLICATION pub1 ADD TABLE s1.foo;
Having said that I'm not wedded to this proposal. Unless someone provides
compelling use cases for this additional syntax, I think we should leave the
publication syntax as is.
--
Euler Taveira
EDB https://www.enterprisedb.com/
On Fri, Apr 15, 2022 at 1:26 AM Euler Taveira <euler@eulerto.com> wrote:
On Thu, Apr 14, 2022, at 10:47 AM, Peter Eisentraut wrote:
On 12.04.22 08:23, vignesh C wrote:
I have also included the implementation for skipping a few tables from
all tables publication, the 0002 patch has the implementation for the
same.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
tables.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;We have already allocated the "skip" terminology for skipping
transactions, which is a dynamic run-time action. We are also using the
term "skip" elsewhere to skip locked rows, which is similarly a run-time
action. I think it would be confusing to use the term SKIP for DDL
construction.I didn't like the SKIP choice too. We already have EXCEPT for IMPORT FOREIGN
SCHEMA and if I were to suggest a keyword, it would be EXCEPT.
+1 for EXCEPT.
I would also think about this in broader terms. For example, sometimes
people want features like "all columns except these" in certain places.
The syntax for those things should be similar.The questions are:
What kind of issues does it solve?
As far as I understand, it is for usability, otherwise, users need to
list all required columns' names even if they don't want to hide most
of the columns in the table. Consider user doesn't want to publish the
'salary' or other sensitive information of executives/employees but
would like to publish all other columns. I feel in such cases it will
be a lot of work for the user especially when the table has many
columns. I see that Oracle has a similar feature [1]https://docs.oracle.com/en/cloud/paas/goldengate-cloud/gwuad/selecting-columns.html#GUID-9A851C8B-48F7-43DF-8D98-D086BE069E20. I think without
this it will be difficult for users to use this feature in some cases.
Do we have a workaround for it?
I can't think of any except the user needs to manually input all
required columns. Can you think of any other workaround?
That said, I'm not sure this feature is worth the trouble. If this is
useful, what about "whole database except these schemas"? What about
"create this database from this template except these schemas". This
could get out of hand. I think we should encourage users to group their
object the way they want and not offer these complicated negative
selection mechanisms.I have the same impression too. We already provide a way to:
* include individual tables;
* include all tables;
* include all tables in a certain schema.Doesn't it cover the majority of the use cases?
Similar to columns, the same applies to tables. Users need to manually
add all tables for a database even when she wants to avoid only a
handful of tables from the database say because they contain sensitive
information or are not required. I think we don't need to cover all
possible exceptions but a few where users can avoid some tables would
be useful. If not, what kind of alternative do users have except for
listing all columns or all tables that are required.
--
With Regards,
Amit Kapila.
On Thu, Apr 14, 2022 at 7:18 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
On 12.04.22 08:23, vignesh C wrote:
I have also included the implementation for skipping a few tables from
all tables publication, the 0002 patch has the implementation for the
same.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
tables.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;We have already allocated the "skip" terminology for skipping
transactions, which is a dynamic run-time action. We are also using the
term "skip" elsewhere to skip locked rows, which is similarly a run-time
action. I think it would be confusing to use the term SKIP for DDL
construction.Let's find another term like "omit", "except", etc.
+1 for Except
I would also think about this in broader terms. For example, sometimes
people want features like "all columns except these" in certain places.
The syntax for those things should be similar.That said, I'm not sure this feature is worth the trouble. If this is
useful, what about "whole database except these schemas"? What about
"create this database from this template except these schemas". This
could get out of hand. I think we should encourage users to group their
object the way they want and not offer these complicated negative
selection mechanisms.
I thought this feature would help when there are many many tables in
the database and the user wants only certain confidential tables like
credit card information. In this case instead of specifying the whole
table list it will be better to specify "ALL TABLES EXCEPT
cred_info_tbl".
I had seen that mysql also has a similar option replicate-ignore-table
to ignore the changes on specific tables as mentioned in [1]https://dev.mysql.com/doc/refman/5.7/en/change-replication-filter.html.
Similar use case exists in pg_dump too. pg_dump has an option
exclude-table that will be used for not dumping any tables that are
matching the table specified as in [2]https://www.postgresql.org/docs/devel/app-pgdump.html.
[1]: https://dev.mysql.com/doc/refman/5.7/en/change-replication-filter.html
[2]: https://www.postgresql.org/docs/devel/app-pgdump.html
Regards,
Vignesh
On Mon, Apr 18, 2022 at 12:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Apr 15, 2022 at 1:26 AM Euler Taveira <euler@eulerto.com> wrote:
On Thu, Apr 14, 2022, at 10:47 AM, Peter Eisentraut wrote:
On 12.04.22 08:23, vignesh C wrote:
I have also included the implementation for skipping a few tables from
all tables publication, the 0002 patch has the implementation for the
same.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
tables.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP TABLE t1,t2;
OR
ALTER PUBLICATION pub1 ADD SKIP TABLE t1,t2;We have already allocated the "skip" terminology for skipping
transactions, which is a dynamic run-time action. We are also using the
term "skip" elsewhere to skip locked rows, which is similarly a run-time
action. I think it would be confusing to use the term SKIP for DDL
construction.I didn't like the SKIP choice too. We already have EXCEPT for IMPORT FOREIGN
SCHEMA and if I were to suggest a keyword, it would be EXCEPT.+1 for EXCEPT.
Updated patch by changing the syntax to use EXCEPT instead of SKIP.
Regards,
Vignesh
Attachments:
v2-0001-Skip-publishing-the-tables-specified-in-EXCEPT-TA.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Skip-publishing-the-tables-specified-in-EXCEPT-TA.patchDownload+580-114
On Tue, Mar 22, 2022 at 12:39 PM vignesh C <vignesh21@gmail.com> wrote:
Hi,
This feature adds an option to skip changes of all tables in specified
schema while creating publication.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
schemas.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;A new column pnskip is added to table "pg_publication_namespace", to
maintain the schemas that the user wants to skip publishing through
the publication. Modified the output plugin (pgoutput) to skip
publishing the changes if the relation is part of skip schema
publication.
As a continuation to this, I will work on implementing skipping tables
from all tables in schema and skipping tables from all tables
publication.Attached patch has the implementation for this.
This feature is for the pg16 version.
Thoughts?
The feature seems to be useful especially when there are lots of
schemas in a database. However, I don't quite like the syntax. Do we
have 'SKIP' identifier in any of the SQL statements in SQL standard?
Can we think of adding skip_schema_list as an option, something like
below?
CREATE PUBLICATION foo FOR ALL TABLES (skip_schema_list = 's1, s2');
ALTER PUBLICATION foo SET (skip_schema_list = 's1, s2'); - to set
ALTER PUBLICATION foo SET (skip_schema_list = ''); - to reset
Regards,
Bharath Rupireddy.
On Sat, Apr 23, 2022 at 2:09 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Mar 22, 2022 at 12:39 PM vignesh C <vignesh21@gmail.com> wrote:
Hi,
This feature adds an option to skip changes of all tables in specified
schema while creating publication.
This feature is helpful for use cases where the user wants to
subscribe to all the changes except for the changes present in a few
schemas.
Ex:
CREATE PUBLICATION pub1 FOR ALL TABLES SKIP ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD SKIP ALL TABLES IN SCHEMA s1,s2;A new column pnskip is added to table "pg_publication_namespace", to
maintain the schemas that the user wants to skip publishing through
the publication. Modified the output plugin (pgoutput) to skip
publishing the changes if the relation is part of skip schema
publication.
As a continuation to this, I will work on implementing skipping tables
from all tables in schema and skipping tables from all tables
publication.Attached patch has the implementation for this.
This feature is for the pg16 version.
Thoughts?The feature seems to be useful especially when there are lots of
schemas in a database. However, I don't quite like the syntax. Do we
have 'SKIP' identifier in any of the SQL statements in SQL standard?
Can we think of adding skip_schema_list as an option, something like
below?CREATE PUBLICATION foo FOR ALL TABLES (skip_schema_list = 's1, s2');
ALTER PUBLICATION foo SET (skip_schema_list = 's1, s2'); - to set
ALTER PUBLICATION foo SET (skip_schema_list = ''); - to reset
I had been wondering for some time if there was any way to introduce a
more flexible pattern matching into PUBLICATION but without bloating
the syntax. Maybe your idea to use an option for the "skip" gives a
way to do it...
For example, if we could use regex (for <schemaname>.<tablename>
patterns) for the option value then....
~~
e.g.1. Exclude certain tables:
// do NOT publish any tables of schemas s1,s2
CREATE PUBLICATION foo FOR ALL TABLES (exclude_match = '(s1\..*)|(s2\..*)');
// do NOT publish my secret tables (those called "mysecretXXX")
CREATE PUBLICATION foo FOR ALL TABLES (exclude_match = '(.*\.mysecret.*)');
~~
e.g.2. Only allow certain tables.
// ONLY publish my tables (those called "mytableXXX")
CREATE PUBLICATION foo FOR ALL TABLES (subset_match = '(.*\.mytable.*)');
// So following is equivalent to FOR ALL TABLES IN SCHEMA s1
CREATE PUBLICATION foo FOR ALL TABLES (subset_match = '(s1\..*)');
------
Kind Regards,
Peter Smith.
Fujitsu Australia
On Thursday, April 21, 2022 12:15 PM vignesh C <vignesh21@gmail.com> wrote:
Updated patch by changing the syntax to use EXCEPT instead of SKIP.
Hi
This is my review comments on the v2 patch.
(1) gram.y
I think we can make a unified function that merges
preprocess_alltables_pubobj_list with check_except_in_pubobj_list.
With regard to preprocess_alltables_pubobj_list,
we don't use the 2nd argument "location" in this function.
(2) create_publication.sgml
+ <para>
+ Create a publication that publishes all changes in all the tables except for
+ the changes of <structname>users</structname> and
+ <structname>departments</structname> table;
This sentence should end ":" not ";".
(3) publication.out & publication.sql
+-- fail - can't set except table to schema publication
+ALTER PUBLICATION testpub_forschema SET EXCEPT TABLE testpub_tbl1;
There is one unnecessary space in the comment.
Kindly change from "schema publication" to "schema publication".
(4) pg_dump.c & describe.c
In your first email of this thread, you explained this feature
is for PG16. Don't we need additional branch for PG16 ?
@@ -6322,6 +6328,21 @@ describePublications(const char *pattern)
}
}
+ if (pset.sversion >= 150000)
+ {
@@ -4162,7 +4164,7 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables)
/* Collect all publication membership info. */
if (fout->remoteVersion >= 150000)
appendPQExpBufferStr(query,
- "SELECT tableoid, oid, prpubid, prrelid, "
+ "SELECT tableoid, oid, prpubid, prrelid, prexcept,"
(5) psql-ref.sgml
+ If <literal>+</literal> is appended to the command name, the tables,
+ except tables and schemas associated with each publication are shown as
+ well.
I'm not sure if "except tables" is a good description.
I suggest "excluded tables". This applies to the entire patch,
in case if this is reasonable suggestion.
Best Regards,
Takamichi Osumi
On Tue, Apr 26, 2022 at 11:32 AM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
On Thursday, April 21, 2022 12:15 PM vignesh C <vignesh21@gmail.com> wrote:
Updated patch by changing the syntax to use EXCEPT instead of SKIP.
Hi
This is my review comments on the v2 patch.
(1) gram.y
I think we can make a unified function that merges
preprocess_alltables_pubobj_list with check_except_in_pubobj_list.With regard to preprocess_alltables_pubobj_list,
we don't use the 2nd argument "location" in this function.
Removed location and made a unified function.
(2) create_publication.sgml
+ <para> + Create a publication that publishes all changes in all the tables except for + the changes of <structname>users</structname> and + <structname>departments</structname> table;This sentence should end ":" not ";".
Modified
(3) publication.out & publication.sql
+-- fail - can't set except table to schema publication +ALTER PUBLICATION testpub_forschema SET EXCEPT TABLE testpub_tbl1;There is one unnecessary space in the comment.
Kindly change from "schema publication" to "schema publication".
Modified
(4) pg_dump.c & describe.c
In your first email of this thread, you explained this feature
is for PG16. Don't we need additional branch for PG16 ?@@ -6322,6 +6328,21 @@ describePublications(const char *pattern)
}
}+ if (pset.sversion >= 150000) + {@@ -4162,7 +4164,7 @@ getPublicationTables(Archive *fout, TableInfo tblinfo[], int numTables) /* Collect all publication membership info. */ if (fout->remoteVersion >= 150000) appendPQExpBufferStr(query, - "SELECT tableoid, oid, prpubid, prrelid, " + "SELECT tableoid, oid, prpubid, prrelid, prexcept,"
Modified by adding a comment saying "FIXME: 150000 should be changed
to 160000 later for PG16."
(5) psql-ref.sgml
+ If <literal>+</literal> is appended to the command name, the tables, + except tables and schemas associated with each publication are shown as + well.I'm not sure if "except tables" is a good description.
I suggest "excluded tables". This applies to the entire patch,
in case if this is reasonable suggestion.
Modified it in most of the places where it was applicable. I felt the
usage was ok in a few places.
Thanks for the comments, the attached v3 patch has the changes for the same.
Regards.
Vignesh