Add pg_get_publication_ddl function

Started by Jonathan Gonzalez V.about 2 months ago5 messages

jonathan.abdiel@gmail.com

about 2 months ago

Hi all!

Following the set of pg_get_{object}_ddl functions described by Andrew
Dustan[1]/messages/by-id/945db7c5-be75-45bf-b55b-cb1e56f2e3e9@dunslane.net I'm attaching another patch that I created for the
PUBLICATION object, I had to left some stuff out and added others based
on other patches feedback for the same type of functions:

* Pretty print: There's a few patched that implemented the pretty print
using spaces or tabs, by default or not, etc. I just decided to keep
this out of this patch, the idea is to follow any decision that can
be made for all the patches at some point, and mostly probably have
a common set of functions to use in all these functions since there
will be a lot of them, and rewriting every time the same and
reviewing the same possible code may take some time and I think it's
better to have the base set of functions and later work on stuff like
pretty print.

* Testing: I've added a regress test with the name
`publiction_ddl.sql`, those, it's close to the publication object in
names on the list of files and not inside a big generic file with
many SQL for different objects. Those, we can keep a list of file in
a fixed section inside the file `parallel_schedule`.

The patch add the function, tests and documentation, but on the
documentation side I don't know if it should go anywhere else or if the
current location it's ok, I haven't found any decision on the that
specific subject.

Related to the required privileges, to CREATE or ALTER a PUBLICATION
you require a specific set of privileges, but there's no defined
privilege to "view" a publication, being really strict, this function
could leak information like columns and tables that exist inside the
database without even have permissions to see those tables, the same
will be for sequences in the future. This took me to the question, does
a kind of "view" privilege exists for these objects? Because you can
have access to the database or even some tables but not all of them at
the same time, and this could lead to a leak of data. I'm really new on
PostgreSQL development, but I couldn't find the "view" privilege that
will allow to read a PUBLICATION and check it, but in my opinion not
being allowed to create a publication doesn't mean that you cannot see
the content of it, am I missing a privilege here or if you can't create
you shouldn't be able to "view" a publication?

Related to the default values, there's a couple of things that I think
it should be there just because, either are always defined or to make
clear the value is there since it will affect the object:

* publish: this field contains a list of permissions [2]https://www.postgresql.org/docs/18/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-WITH that are a
list of variables inside a struct with values always being set

* publish_generated_columns: the default is well defined and not adding
the default to the statement will make sense, but since the others
two will be there, for useful and educational purposes, this makes
sense to me.

* publish_via_partition_root: this boolean is always set, so adding it
make sense since it will make explicit to someone debugging a
publication how this will behave on a TRUNCATE situation.

Thank you in advance for the reviews and all the time on this!

[1]: /messages/by-id/945db7c5-be75-45bf-b55b-cb1e56f2e3e9@dunslane.net
/messages/by-id/945db7c5-be75-45bf-b55b-cb1e56f2e3e9@dunslane.net
[2]: https://www.postgresql.org/docs/18/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-WITH
https://www.postgresql.org/docs/18/sql-createpublication.html#SQL-CREATEPUBLICATION-PARAMS-WITH

--
Jonathan Gonzalez V. <jonathan.abdiel@gmail.com>
EnterpriseDB

zengman

zengman@halodbtech.com

about 2 months ago

In reply to: Jonathan Gonzalez V. (#1)

Re:Add pg_get_publication_ddl function

Hi,

I haven't tested your patch yet, but I noticed a few points that may need adjustment:

Based on your code logic, the pub parameter passed here can never be NULL — if the corresponding PUBLICATION does not exist, an error should have already been thrown earlier in the code flow.
Therefore, the following code block can be removed, and the usage of return (Datum) NULL; in this block is also incorrect:
```
if (pub == NULL)
return (Datum) NULL;
```
pubtuple is not being freed — please use heap_freetuple to release it.

The format string "%sALL SEQUENCE" should be corrected to "%sALL SEQUENCES".

--
Regards,
Man Zeng
www.openhalo.org

Peter Smith

smithpb2250@gmail.com

about 2 months ago

In reply to: Jonathan Gonzalez V. (#1)

Re: Add pg_get_publication_ddl function

Hi Jonathan,

Here are some initial review comments for patch v1-0001.

(I haven't applied or run anything -- this is all just by inspection).

======
1. GENERAL - typos

There are many typos and upper/lower capitalisation mistakes in the
comments / commit message/docs.

I won't list them all, but below is a sample. Consider using a
spelling/grammar checker to discover all of them.

typo /for a giving PUBLICATION./
typo /thuse/
typo /Comprhensive/
typo /CREATE statement for a giving PUBLICATION/
typo /DDL fro a PUBLICATION/
typo /The sequence are/
typo /publication an have relations/
typo /If no null/
typo /at the end make sense to/
typo /create publation/

======

doc/src/sgml/func/func-info.sgml

2.
+     <tbody>
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_get_publication_ddl</primary>
+        </indexterm>
+        <function>pg_get_publication_ddl</function> (
<parameter>publication</parameter> <type>text</type> or
<type>oid</type> )
+        <returnvalue>text</returnvalue>
+       </para>

Maybe say "name" instead of "text" for the parameter; otherwise, there
is no indication of what the text should represent.

~~~

3.
+       <para>
+        Recreate the CREATE statement for a giving PUBLICATION.
+        The result is a complete <command>CREATE
PUBLICATION</command> statement.
+       </para></entry>
+      </row>
+     </tbody>

SUGGESTION
Returns the <command>CREATE PUBLICATION</command> command that would
create this publication.

======
src/backend/utils/adt/ruleutils.c

pg_do_publication_ddl:

4.
+ if (pub == NULL)
+ return (Datum) NULL;

Was a cast needed?

~~~

5.
+ publications = GetPublicationRelations(pub->oid, PUBLICATION_PART_ALL);

AFAIK, this returns relids of the published tables. So this variable
name 'publications' is very misleading.

~~~

6.
+ /*
+ * we add the publication name, this isn't covered by the schema, yet?
+ */

What does that "covered by the schema" part mean? Pubnames are not
schema-qualified (??)

~~~

7.
+ /*
+ * having all tables or all sequence means there's not publications per
+ * table
+ */

This might be true for now, but it might be wise to avoid making this
assumption if it is possible to do so, because one day, "FOR ALL
SEQUENCES, FOR TABLE t1" syntax may be added, and then your assumption
will be broken.

IOW, would it be better not to say 'else'?

~~~

8.
+ if (pub->allsequences)
+ appendStringInfo(buf,
+ "%sALL SEQUENCE",
+ pub->alltables ? ", " : " ");

This should say "ALL SEQUENCES", not "ALL SEQUENCE"

~~~

9.
+ /*
+ * publication an have relations of tables, in schema or current
+ * schema
+ *
+ */

9a.
Unnecessary newline.

9b.
Why are you mentioning FOR TABLES IN SCHEMA here? That seems
unnecessary. That clause gets handled separately later.

~~~~

10.
+ /* If no null, means we have a list of columns to publish */
+ if (!cols_nulls)
+ {
+ ArrayType  *arr;
+ int ncolumns;
+ int16    *cols;
+ bool fcol = true;
+
+ arr = DatumGetArrayTypeP(columns);
+ ncolumns = ARR_DIMS(arr)[0];
+ cols = (int16 *) ARR_DATA_PTR(arr);
+
+ appendStringInfoChar(buf, '(');
+ for (int i = 0; i < ncolumns; i++)
+ {
+ appendStringInfo(buf, "%s%s",
+ fcol ? "" : ", ",
+ get_attname(rel->rd_id, cols[i], true));
+ fcol = false;
+ }
+ appendStringInfoChar(buf, ')');
+ }

10a.
SUGGESTION
Handle any Column List for this table

10b.
Variable 'fcol' is unnecessary; just check i == 0.

~~~

11.
+ /*
+ * If there's a condition goes after the columns but if there's no
+ * column we can have conditions too
+ */

For everything related to Anum_pg_publication_rel_prattrs and
Anum_pg_publication_rel_prqual, the well-known terminology is "Column
Lists" and "Row Filters", so all the variables and comments using
those should be calling them by the proper names.

IIUC, Row Filters are independent of Column Lists, so that code
comment is not very helpful.

SUGGESTION
Handle any Row Filter for this table

~~~

12.
+ appendStringInfo(buf, " WHERE %s ", str);

Why the trailing space? You didn't have one if there was only a Column
List, but no Row Filter.

~~~

13.
+ /* if we have schemas, for sure this will go right before the WITH */

Overcomplicated comment, you don't need to say stuff like "for sure
this will go right before the WITH" because the DDL order of clauses
is obvious from the logic.

SUGGESTION
Handle FOR TABLES IN SCHEMA

~~~

14.
+
+
+ /* always add the WITH options */
+ appendStringInfo(buf, " WITH (");

Double blank line.

~~~

15.
+ /* publish string */
+ appendStringInfo(buf, "publish='");

SUGGESTION
Parameter: 'publish'

~~~

16.
+ /*
+ * by precedence we know that the insert will always be first, no need
+ * to check previous values
+ */

There is no "precedence". The 'publish' values are in this order only
because you chose to put them in this order. This whole comment seems
unnecessary.

~~~

17.
+ /* publish_generated_columns string */
+ appendStringInfo(buf, "publish_generated_columns='%s', ",
+ pub->pubgencols_type == PUBLISH_GENCOLS_NONE ? "none" : "stored");

SUGGESTION
Parameter: 'publish_generated_columns'

~~~

18.
+ /* publish_via_partition_root value */
+ appendStringInfo(buf, "publish_via_partition_root='%s')",
+ pub->pubviaroot ? "true" : "false");

18a
SUGGESTION
Parameter: 'publish_via_partition_root'

18b.
Putting the true/false in single quotes is unnecessary, and it looks a
bit strange

======
src/test/regress/sql/publication_ddl.sql

19. GENERAL

I have not checked the tests in detail, but here are some general comments:

19a. There is a mix of upper/lowercase -- e/g/sometimes you say
"create publication", other times you say "ALTER PUBLICATION". Be
consistent.

19b.
Some tests are missing comments to introduce them. The spacing
grouping the tests is also inconsistent.

19c.
Use proper terminology of "Column Lists" and "Row Filters" instead of
just calling them columns and conditions

19d.
All of those cleanup DROPs can be combined; You don't need a separate
DROP statement for every object.

~~~

20.
+
+-- a new schema for multiple schemas
+CREATE SCHEMA pub_schema_test_ddl_2;
+CREATE TABLE pub_schema_test_ddl_2.schema_tbl1 (foo int, bar int);

This could've been done earlier same place where you created the other
test schema/tables.

======

FYI, last year I wrote a blog [1]https://www.postgresql.fastware.com/pzone/tailoring-create-publication-for-logical-replication-a-complete-syntax-guide claiming to be the "complete syntax
guide" for CREATE PUBLICATION. Perhaps it is useful as a reference?

======
[1]: https://www.postgresql.fastware.com/pzone/tailoring-create-publication-for-logical-replication-a-complete-syntax-guide

Kind Regards,
Peter Smith.
Fujitsu Australia

Cary Huang

cary.huang@highgo.ca

24 days ago

In reply to: Peter Smith (#3)

Re: Add pg_get_publication_ddl function

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: not tested
Spec compliant: not tested
Documentation: not tested

Hello

+# run the retail DDL tests at the end make sense to not interrupt with other
+# tests in case the object names are the same to other objects previously used.
+test: publication_ddl

retail DDL tests is unclear to me. Might as well remove the comment.

+	if (pub->alltables || pub->allsequences)
+	{
+		appendStringInfo(buf, " FOR ");
+
+		if (pub->alltables)
+			appendStringInfo(buf, "ALL TABLES");
+
+		/* The sequence are published only on versions 19+ */
+		if (pub->allsequences)
+			appendStringInfo(buf,
+							 "%sALL SEQUENCE",
+							 pub->alltables ? ", " : " ");
+	}

Critical bug / typo here, I think you meant ALL SEQUENCES instead
of ALL SEQUENCE.

+ if (pub == NULL)
+ return (Datum) NULL;

use PG_RETURN_NULL() instead like everywhere else for consistency.

Typos in your commit message and everywhere else in code.. For
example:

"giving" → "given", "thuse" → "thus",
"Comprhensive" → "Comprehensive", "fro" → "for"

Need to run the patch through a spelling / typo check tool

Also, you should test pg_get_publication_ddl() with non-existent
publication as well.

Best regards

Cary Huang
Highgo Software

Hüseyin Demir

huseyin.d3r@gmail.com

11 days ago

In reply to: Jonathan Gonzalez V. (#1)

Re: Add pg_get_publication_ddl function

Hi,

Thanks for the patch. I reviewed v1 and found a few issues that need
to be addressed before commit.

1. In pg_do_publication_ddl(), schemaname is initialized to NULL before
the loop but never reset per iteration. When the first table is in a
non-default schema and the second is in public, the public table
incorrectly inherits the previous schema prefix:
```
CREATE SCHEMA s1;
CREATE TABLE s1.t1 (id int);
CREATE TABLE public.t2 (id int);
CREATE PUBLICATION p FOR TABLE s1.t1, public.t2;
SELECT pg_get_publication_ddl('p');
-- Got: ... FOR TABLE s1.t1, s1.t2 ← WRONG
-- Want: ... FOR TABLE s1.t1, t2
```

2. FOR TABLE ONLY is silently dropped, changing semantics for publications
on tables with inheritance or partitioning:
```
CREATE PUBLICATION p FOR TABLE ONLY t (col) WHERE (col > 0);
SELECT pg_get_publication_ddl('p');
-- Got: FOR TABLE t(col) WHERE (col > 0) WITH ...
-- Want: FOR TABLE ONLY t(col) WHERE (col > 0) WITH ...

```

As already mentioned there are also typos in the code. Otherwise, the idea and way lgtm.

Regards.

Add pg_get_publication_ddl function

Attachments: