Patch to add hook to copydir()

Started by Swen Kooijover 6 years ago8 messages
#1Swen Kooij
swenkooij@gmail.com
1 attachment(s)

Hello all,

I've been working on an extension that tightly integrates
postgres with underlying filesystem . I need to customize
how postgres copies directories for new databases.

I first looked at the ProcessUtility_hook. This would require
me to copy or rewrite most of the createdb() function. This is
less than ideal of course. Someone on the IRC channel
suggested I could add a hook for copydir().

I implemented the hook similar to how the
ProcessUtility_hook is implemented. I couldn't find any tests
for any of the existing hooks. I've been looking at the regression
tests, but I am not entirely sure how to proceed on that front.

I tested my patch extensively against master and
the REL_12_STABLE branch. All tests pass and the patch has
been working great with my extension.

I attached a first draft of the patch against master.

---
Swen Kooij

Attachments:

copy-dir-hook_v1.patchtext/x-patch; charset=x-binaryenc; name=copy-dir-hook_v1.patchDownload
diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index ca1c9cd765..e44bebf947 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -27,6 +27,9 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 
+/* Hook for plugins to get control in copydir() */
+copydir_hook_type copydir_hook = NULL;
+
 /*
  * copydir: copy a directory
  *
@@ -36,6 +39,22 @@
 void
 copydir(char *fromdir, char *todir, bool recurse)
 {
+	if (copydir_hook)
+		(*copydir_hook) (fromdir, todir, recurse);
+	else
+		standard_copydir(fromdir, todir, recurse);
+}
+
+/*
+ * copydir: copy a directory
+ *
+ * If recurse is false, subdirectories are ignored.  Anything that's not
+ * a directory or a regular file is ignored.
+ */
+void
+standard_copydir(char *fromdir, char *todir, bool recurse)
+{
+
 	DIR		   *xldir;
 	struct dirent *xlde;
 	char		fromfile[MAXPGPATH * 2];
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index 525cc6203e..8b73184177 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -13,7 +13,12 @@
 #ifndef COPYDIR_H
 #define COPYDIR_H
 
+/* Hook for plugins to get control in copydir() */
+typedef void (*copydir_hook_type) (char *fromdir, char *todir, bool recurse);
+extern PGDLLIMPORT copydir_hook_type copydir_hook;
+
 extern void copydir(char *fromdir, char *todir, bool recurse);
+extern void standard_copydir(char *fromdir, char *todir, bool recurse);
 extern void copy_file(char *fromfile, char *tofile);
 
 #endif							/* COPYDIR_H */
#2Swen Kooij
swenkooij@gmail.com
In reply to: Swen Kooij (#1)
1 attachment(s)
Re: Patch to add hook to copydir()

I just realized I completely borked the patch file.

My apologies. Attached a (hopefully) correct patch file.

---
Swen Kooij

Show quoted text

On Mon, Sep 2, 2019 at 9:54 PM Swen Kooij <swenkooij@gmail.com> wrote:

Hello all,

I've been working on an extension that tightly integrates
postgres with underlying filesystem . I need to customize
how postgres copies directories for new databases.

I first looked at the ProcessUtility_hook. This would require
me to copy or rewrite most of the createdb() function. This is
less than ideal of course. Someone on the IRC channel
suggested I could add a hook for copydir().

I implemented the hook similar to how the
ProcessUtility_hook is implemented. I couldn't find any tests
for any of the existing hooks. I've been looking at the regression
tests, but I am not entirely sure how to proceed on that front.

I tested my patch extensively against master and
the REL_12_STABLE branch. All tests pass and the patch has
been working great with my extension.

I attached a first draft of the patch against master.

---
Swen Kooij

Attachments:

copy-dir-hook_v2.patchtext/x-patch; charset=US-ASCII; name=copy-dir-hook_v2.patchDownload
From 7033aed34e3b35859c470a738a5efbe48f101418 Mon Sep 17 00:00:00 2001
From: Swen Kooij <swenkooij@gmail.com>
Date: Sun, 1 Sep 2019 22:42:04 +0300
Subject: [PATCH] Add hook for copydir()

Plugins/extensions that want to customize how
PostgreSQL interact with the filesystem can hook
in copydir() instead of having to hook in on
a higher level and potentially copy large amounts
of code.

One example use case is wanting to customize how
directories for new databases are created. Without
this hook, one would have to use ProcessUtility_hook.
This would require copying the vast majority of createdb().
---
 src/backend/storage/file/copydir.c | 19 +++++++++++++++++++
 src/include/storage/copydir.h      |  5 +++++
 2 files changed, 24 insertions(+)

diff --git a/src/backend/storage/file/copydir.c b/src/backend/storage/file/copydir.c
index ca1c9cd765..e44bebf947 100644
--- a/src/backend/storage/file/copydir.c
+++ b/src/backend/storage/file/copydir.c
@@ -27,6 +27,9 @@
 #include "miscadmin.h"
 #include "pgstat.h"
 
+/* Hook for plugins to get control in copydir() */
+copydir_hook_type copydir_hook = NULL;
+
 /*
  * copydir: copy a directory
  *
@@ -36,6 +39,22 @@
 void
 copydir(char *fromdir, char *todir, bool recurse)
 {
+	if (copydir_hook)
+		(*copydir_hook) (fromdir, todir, recurse);
+	else
+		standard_copydir(fromdir, todir, recurse);
+}
+
+/*
+ * copydir: copy a directory
+ *
+ * If recurse is false, subdirectories are ignored.  Anything that's not
+ * a directory or a regular file is ignored.
+ */
+void
+standard_copydir(char *fromdir, char *todir, bool recurse)
+{
+
 	DIR		   *xldir;
 	struct dirent *xlde;
 	char		fromfile[MAXPGPATH * 2];
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index 525cc6203e..8b73184177 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -13,7 +13,12 @@
 #ifndef COPYDIR_H
 #define COPYDIR_H
 
+/* Hook for plugins to get control in copydir() */
+typedef void (*copydir_hook_type) (char *fromdir, char *todir, bool recurse);
+extern PGDLLIMPORT copydir_hook_type copydir_hook;
+
 extern void copydir(char *fromdir, char *todir, bool recurse);
+extern void standard_copydir(char *fromdir, char *todir, bool recurse);
 extern void copy_file(char *fromfile, char *tofile);
 
 #endif							/* COPYDIR_H */
-- 
2.20.1

#3Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Swen Kooij (#1)
Re: Patch to add hook to copydir()

On 2019-09-02 20:54, Swen Kooij wrote:

I've been working on an extension that tightly integrates
postgres with underlying filesystem . I need to customize
how postgres copies directories for new databases.

Could you share some more details, so we can assess whether that is a
sensible way to go about it, and what other hooks might be needed?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Peter Eisentraut (#3)
Re: Patch to add hook to copydir()

On 2019-Sep-02, Peter Eisentraut wrote:

On 2019-09-02 20:54, Swen Kooij wrote:

I've been working on an extension that tightly integrates
postgres with underlying filesystem . I need to customize
how postgres copies directories for new databases.

Could you share some more details, so we can assess whether that is a
sensible way to go about it, and what other hooks might be needed?

It seems either terribly high-level, or terribly low-level, depending on
how you look at it. I wonder to what extent it conflicts with the table
AM work, and with the idea of allowing FDWs to have real relfilenodes.
I wonder if this is just one missing piece that's needed to complete
something of a layer for which other pieces are already satisfied by
other hooks.

As is and pending further explanation, it seems a bad idea to me. If we
want pluggability here, maybe we need some new abstraction layer.

--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5Swen Kooij
swenkooij@gmail.com
In reply to: Alvaro Herrera (#4)
Re: Patch to add hook to copydir()

Thank you both for the feedback. To give some background
on what I am trying to do. Mostly experimental work right now.

I've built an extension that takes advantage of the copy-on-write
properties of ZFS and BTRFS. A tool transforms a data directory
into a set of datasets/sub volumes. When a new database is
created and the directories are copied, the extension uses the
filesystem's ability to create a snapshot/clone of the source dataset/sub
volume.

As createdb() iterates over the template database's tablespaces
and copies the directories to the new one, the copydir() gets invoked
(from dbcommands.c). My extension hooks copydir() and instead
snapshots the source dir and clones it into the target dir (respecting the
recurse flag).

If the source dir is not a ZFS data set or BTRFS sub volume, the
standard implementation gets invoked and everything is business
as usual.

As described in the first paragraph, I've built a small tool that transforms
all base/* directories in the data directory into ZFS datasets or BTRFS
sub volumes. New tablespaces that get created later would have to be
ZFS datasets or BTRFS sub volumes as well in order for the extension
to work. As explained above, if they are not, the extension will just copy
the files.

I also hook ProcessUtility in order to clean up the ZFS datasets and
BTRFS subvolumes _after_ a database gets deleted. Postgres just
handles the deletion of the files and the extension cleans up any
dangling ZFS/BTRFS resources.

Is there anything that I am missing? My early experiments have been
very promising but my experience with Postgres internals is limited. Any
help or feedback would be much appreciated.

Show quoted text

On Mon, Sep 2, 2019 at 11:06 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Sep-02, Peter Eisentraut wrote:

On 2019-09-02 20:54, Swen Kooij wrote:

I've been working on an extension that tightly integrates
postgres with underlying filesystem . I need to customize
how postgres copies directories for new databases.

Could you share some more details, so we can assess whether that is a
sensible way to go about it, and what other hooks might be needed?

It seems either terribly high-level, or terribly low-level, depending on
how you look at it. I wonder to what extent it conflicts with the table
AM work, and with the idea of allowing FDWs to have real relfilenodes.
I wonder if this is just one missing piece that's needed to complete
something of a layer for which other pieces are already satisfied by
other hooks.

As is and pending further explanation, it seems a bad idea to me. If we
want pluggability here, maybe we need some new abstraction layer.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#6Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Swen Kooij (#5)
Re: Patch to add hook to copydir()

On 2019-09-02 22:16, Swen Kooij wrote:

Is there anything that I am missing? My early experiments have been
very promising but my experience with Postgres internals is limited. Any
help or feedback would be much appreciated.

You might want to review several previous threads that were
contemplating doing some reflink stuff with btrfs during database
creation. Not exactly what you are doing, but related.

Also, wouldn't it work if your extension just defined its own copydir()
in the .so? That should then be used over the built-in one -- maybe.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#7Swen Kooij
swenkooij@gmail.com
In reply to: Peter Eisentraut (#6)
Re: Patch to add hook to copydir()

I read two previous proposals for something similar.

First one is from 2013 [0]/messages/by-id/511B5D11.4040507@socialserve.com. It proposed a OS and filesystem specific
implementation. Which was then changed to a patch that adds a
config option to specify shell commands that postgres should use
for copying files and dirs. This was after commentary that such a
feature is better suited for an extension. Which is what I am trying to do.

What was true in 2013 (when the other patch was proposed) is still
true. There's no good way at the moment to hook into how postgres
copies files and directories.

The second thread I found from 2018 [1]/messages/by-id/bc9ca382-b98d-0446-f699-8c5de2307ca7@2ndquadrant.com proposes filesystem specific
changes (BTRFS, APFS and XFS) that operate on a file level. Really
nice patch actually, but much more invasive than what I am proposing.

Both patches make rather invasive and significant changes specific
to the features they're trying to build. The general sentiment I got
from reading those two threads is that building in support for these
kind of features is hard and is probably better first done as an extension.

I understand that the patch I proposed is not an ideal interface,
but it gets the job done without having to built this kind of
functionality into postgres. I am not proposing any filesystem
or OS specific hooks/changes in postgres to make this work.

I tried to override the copydir() function without any hooks
and I haven't gotten it to work yet. If it does, I'd still prefer the
hook. It's more predictable and probably more portable.

[0]: /messages/by-id/511B5D11.4040507@socialserve.com
[1]: /messages/by-id/bc9ca382-b98d-0446-f699-8c5de2307ca7@2ndquadrant.com

On Tue, Sep 3, 2019 at 9:48 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

Show quoted text

On 2019-09-02 22:16, Swen Kooij wrote:

Is there anything that I am missing? My early experiments have been
very promising but my experience with Postgres internals is limited. Any
help or feedback would be much appreciated.

You might want to review several previous threads that were
contemplating doing some reflink stuff with btrfs during database
creation. Not exactly what you are doing, but related.

Also, wouldn't it work if your extension just defined its own copydir()
in the .so? That should then be used over the built-in one -- maybe.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#8Swen Kooij
swenkooij@gmail.com
In reply to: Swen Kooij (#7)
Re: Patch to add hook to copydir()

Should I forget about getting such a patch in?

I am up for implementing alternative solutions if the current
one is considered unacceptable. As I tried to demonstrate
in my last email, previous attempts also failed.

Show quoted text

On Tue, Sep 3, 2019 at 12:14 PM Swen Kooij <swenkooij@gmail.com> wrote:

I read two previous proposals for something similar.

First one is from 2013 [0]. It proposed a OS and filesystem specific
implementation. Which was then changed to a patch that adds a
config option to specify shell commands that postgres should use
for copying files and dirs. This was after commentary that such a
feature is better suited for an extension. Which is what I am trying to do.

What was true in 2013 (when the other patch was proposed) is still
true. There's no good way at the moment to hook into how postgres
copies files and directories.

The second thread I found from 2018 [1] proposes filesystem specific
changes (BTRFS, APFS and XFS) that operate on a file level. Really
nice patch actually, but much more invasive than what I am proposing.

Both patches make rather invasive and significant changes specific
to the features they're trying to build. The general sentiment I got
from reading those two threads is that building in support for these
kind of features is hard and is probably better first done as an extension.

I understand that the patch I proposed is not an ideal interface,
but it gets the job done without having to built this kind of
functionality into postgres. I am not proposing any filesystem
or OS specific hooks/changes in postgres to make this work.

I tried to override the copydir() function without any hooks
and I haven't gotten it to work yet. If it does, I'd still prefer the
hook. It's more predictable and probably more portable.

[0] /messages/by-id/511B5D11.4040507@socialserve.com
[1] /messages/by-id/bc9ca382-b98d-0446-f699-8c5de2307ca7@2ndquadrant.com

On Tue, Sep 3, 2019 at 9:48 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:

On 2019-09-02 22:16, Swen Kooij wrote:

Is there anything that I am missing? My early experiments have been
very promising but my experience with Postgres internals is limited. Any
help or feedback would be much appreciated.

You might want to review several previous threads that were
contemplating doing some reflink stuff with btrfs during database
creation. Not exactly what you are doing, but related.

Also, wouldn't it work if your extension just defined its own copydir()
in the .so? That should then be used over the built-in one -- maybe.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services