DDL deparse
Hi all,
(CC'ing people who worked on DDL deparse)
DDL Deparse has been developed[1]/messages/by-id/20150215044814.GL3391@alvh.no-ip.org[2]/messages/by-id/OS0PR01MB57163E6487EFF7378CB8E17C9438A@OS0PR01MB5716.jpnprd01.prod.outlook.com for quite a long time and was
originally proposed as a building block of DDL replication. Reading
the related discussion threads, there is an agreement on implementing
DDL replication on top of DDL deparse, and some hackers prefer the
format of its output. I've reviewed the last proposed approach and
researched DDL deparse/DDL replication, and there are some points that
are unclear to me, and things have changed since it was actively
developed. So I've started this thread separately from the DDL
replication thread in order to discuss DDL deparse itself while
working toward DDL replication development.
Quick summary of the last developed DDL deparse feature[3]Attached patches that I rebased to the current HEAD. We can test DDL deparse feature with an event trigger like:: the basic
functionality is that it takes a parse tree as an input and constructs
DDLs by retrieving the information from system catalogs based on the
parse tree. The output format is self-documenting JSON, enabling us to
easily do table name mapping or schema name mapping while
reconstructing a DDL command. For instance, deparsing "create table
test (a int)" produces:
{
"fmt": "CREATE TABLE %{identity}D (%{table_elements:, }s)",
"identity": {
"objname": "test",
"schemaname": "public"
},
"table_elements": [
{
"fmt": "%{name}I %{coltype}T STORAGE %{colstorage}s",
"name": "a",
"type": "column",
"coltype": {
"typmod": "",
"typarray": false,
"typename": "int4",
"schemaname": "pg_catalog"
},
"colstorage": "PLAIN"
}
]
}
The main point that I want to discuss is what output we expect from
DDL deparse, especially CREATE DDLs. I originally thought that DDL
deparse converts the parse tree back into the DDL command originally
executed. However, what DDL deparse for CREATE TABLE actually does is
to generate possibly multiple DDLs to achieve the exact same catalog
state. That is, the deparsed command doesn't necessarily preserve the
user intent in the original DDL command, and possibly generates
multiple commands for the one command. For instance:
Deparsing "create table test_serial (a serial)" generates:
* CREATE SEQUENCE public.test_serial_a_seq CACHE 1 NO CYCLE INCREMENT
BY 1 MINVALUE 1 MAXVALUE 2147483647 START WITH 1 RESTART 1 AS
pg_catalog.int4;
* CREATE TABLE public.test_serial (a pg_catalog.int4 STORAGE PLAIN NOT
NULL DEFAULT pg_catalog.nextval('public.test_serial_a_seq'::pg_catalog.regclass));
* ALTER SEQUENCE public.test_serial_a_seq OWNED BY public.test_serial.a;
And deparsing "create table test_b (a int, b int references test_a
(a))" generates:
* CREATE TABLE public.test_b (a pg_catalog.int4 STORAGE PLAIN, b
pg_catalog.int4 STORAGE PLAIN);
* ALTER TABLE public.test_b ADD CONSTRAINT test_b_b_fkey FOREIGN KEY
(b) REFERENCES public.test_a(a);
I haven't seen any discussion on whether it's architecturally correct
for a DDL deparser not to preserve the user intent.
If it's okay for DDL deparse to expand one DDL command into multiple
ones, it's very similar to what pg_get_table_ddl()[4]/messages/by-id/CANxoLDfjQnhM=E6JSyYo9s9OdjqoN8s_3wE5yL=kaDu_X8j-dA@mail.gmail.com does. The only
difference between the two is the output format. I guess we could add
an option to the SQL function to output DDLs as a JSON blob. I don't
think we want to maintain two features if they provide very similar
functionality. Also, I'm not if it could be useful other than DDL
replication use cases.
I personally think that DDL deparse should preserve (and possibly
normalize) the user intent, producing the following query, for
example:
* CREATE TABLE public.test_b (a pg_catalog.int4, b pg_catalog.int4
REFERENCES public.test_a (a)), or
* CREATE TABLE public.test_b (a pg_catalog.int4, b pg_catalog.int4,
CONSTRAINT test_b_b_fkey FOREIGN KEY (b) REFERENCES public.test_a(a))
It would be helpful for some use cases besides DDL replication. Since
DDL deparse takes a post-transformed parse tree as an input, we might
need to store some information in the parse tree to give hints for DDL
deparse to construct the DDLs in the original query form.
So my question is: should DDL deparse preserve user intent, or is it
acceptable to materialize the catalog state (in which case, should we
just extend pg_get_table_ddl instead)?
FYI, from the DDL replication point of view, it's okay to expand one
DDL into multiple DDLs when sending them to the subscriber since they
can produce the same result. pg_get_table_ddl() can also be used for
this purpose; we can write the table OID in a WAL record and call
pg_get_table_ddl() while decoding the WAL record, reconstructing DDLs
while using a historical snapshot. It would still require deparse for
ALTER TABLE commands, but it would reduce much of the code to
maintain.
Feedback is very welcome.
Regards,
[1]: /messages/by-id/20150215044814.GL3391@alvh.no-ip.org
[2]: /messages/by-id/OS0PR01MB57163E6487EFF7378CB8E17C9438A@OS0PR01MB5716.jpnprd01.prod.outlook.com
[3]: Attached patches that I rebased to the current HEAD. We can test DDL deparse feature with an event trigger like:
DDL deparse feature with an event trigger like:
CREATE OR REPLACE FUNCTION deparse_test_trigger()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE
r record;
json_cmd text;
re_cmd text;
BEGIN
FOR r IN
SELECT * FROM pg_event_trigger_ddl_commands()
LOOP
-- 1. Deparse the collected command into a JSON blob
json_cmd := ddl_deparse_to_json(r.command);
-- 2. Re-expand the JSON blob back into a plain SQL string
re_cmd := ddl_deparse_expand_command(json_cmd);
RAISE NOTICE 'command_tag: %', r.command_tag;
RAISE NOTICE 'object_identity: %', r.object_identity;
RAISE NOTICE 'JSON: %', json_cmd;
RAISE NOTICE 'reconstructed SQL: %', re_cmd;
END LOOP;
END;
$$;
CREATE EVENT TRIGGER deparse_test
ON ddl_command_end
WHEN TAG IN ('CREATE TABLE', 'CREATE SEQUENCE', 'ALTER TABLE',
'ALTER SEQUENCE', 'DROP TABLE')
EXECUTE FUNCTION deparse_test_trigger();
Note that it supports CREATE TABLE, CREATE SEQUENCE, ALTER TABLE,
ALTER SEQUENCE, and DROP TABLE.
[4]: /messages/by-id/CANxoLDfjQnhM=E6JSyYo9s9OdjqoN8s_3wE5yL=kaDu_X8j-dA@mail.gmail.com
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com