refactoring basebackup.c

Started by Robert Haas · almost 6 years ago · 233 messages · pgsql-hackers
#1Robert Haas
robertmhaas@gmail.com

Hi,

I'd like to propose a fairly major refactoring of the server's
basebackup.c. The current code isn't horrific or anything, but the
base backup mechanism has grown quite a few features over the years
and all of the code knows about all of the features. This is going to
make it progressively more difficult to add additional features, and I
have a few in mind that I'd like to add, as discussed below and also
on several other recent threads.[1][2] The attached patch set shows
what I have in mind. It needs more work, but I believe that there's
enough here for someone to review the overall direction, and even some
of the specifics, and hopefully give me some useful feedback.

This patch set is built around the idea of creating two new
abstractions, a base backup sink -- or bbsink -- and a base backup
archiver -- or bbarchiver. Each of these works like a foreign data
wrapper or custom scan or TupleTableSlot. That is, there's a table of
function pointers that act like method callbacks. Every implementation
can allocate a struct of sufficient size for its own bookkeeping data,
and the first member of the struct is always the same, and basically
holds the data that all implementations must store, including a
pointer to the table of function pointers. If we were using C++,
bbarchiver and bbsink would be abstract base classes.
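
For readers less familiar with this idiom, here is a minimal sketch of what such a method table might look like. All the names below are illustrative only, not the actual API from the patch set:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical method table; the patch's real callbacks differ. */
typedef struct bbsink bbsink;
typedef struct bbsink_ops
{
	void		(*begin_archive) (bbsink *sink, const char *name);
	void		(*archive_contents) (bbsink *sink, const char *data, size_t len);
	void		(*end_archive) (bbsink *sink);
} bbsink_ops;

/* Common state that every implementation embeds as its first member. */
struct bbsink
{
	const bbsink_ops *ops;		/* table of function pointers */
	bbsink	   *next;			/* next sink in the chain, if any */
};

/* A toy implementation that just counts bytes, to show the embedding trick:
 * because "base" is the first member, a bbsink * can be cast to the
 * implementation-specific type and vice versa. */
typedef struct bbsink_counter
{
	bbsink		base;			/* must be first */
	size_t		total_bytes;
} bbsink_counter;

static void
counter_begin_archive(bbsink *sink, const char *name)
{
	(void) sink;
	(void) name;				/* nothing to do for this toy sink */
}

static void
counter_archive_contents(bbsink *sink, const char *data, size_t len)
{
	(void) data;
	((bbsink_counter *) sink)->total_bytes += len;
}

static void
counter_end_archive(bbsink *sink)
{
	(void) sink;
}

static const bbsink_ops counter_ops = {
	counter_begin_archive,
	counter_archive_contents,
	counter_end_archive
};
```

A caller then invokes the callbacks through the shared header, without knowing which concrete implementation it is talking to.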

They represent closely-related concepts, so much so that I initially
thought we could get by with just one new abstraction layer. I found
on experimentation that this did not work well, so I split it up into
two and that worked a lot better. The distinction is this: a bbsink is
something to which you can send a bunch of archives -- currently, each
would be a tarfile -- and also a backup manifest. A bbarchiver is
something to which you send every file in the data directory
individually, or at least the ones that are getting backed up, plus
any that are being injected into the backup (e.g. the backup_label).
Commonly, a bbsink will do something with the data and then forward it
to a subsequent bbsink, or a bbarchiver will do something with the
data and then forward it to a subsequent bbarchiver or bbsink. For
example, there's a bbarchiver_tar object which, like any bbarchiver,
sees all the files and their contents as input. The output is a
tarfile, which gets sent to a bbsink. As things stand in the patch set
now, the tar archives are ultimately sent to the "libpq" bbsink, which
sends them to the client.

In the future, we could have other bbarchivers. For example, we could
add a "pax", "zip", or "cpio" bbarchiver that produces archives of that
format, and any given backup could choose which one to use. Or, we
could have a bbarchiver that runs each individual file through a
compression algorithm and then forwards the resulting data to a
subsequent bbarchiver. That would make it easy to produce a tarfile of
individually compressed files, which is one possible way of creating a
seekable archive.[3] Likewise, we could have other bbsinks. For
example, we could have a "localdisk" bbsink that causes the server to
write the backup somewhere in the local filesystem instead of
streaming it out over libpq. Or, we could have an "s3" bbsink that
writes the archives to S3. We could also have bbsinks that compress
the input archives using some compressor (e.g. lz4, zstd, bzip2, ...)
and forward the resulting compressed archives to the next bbsink in
the chain. I'm not trying to pass judgement on whether any of these
particular things are things we want to do, nor am I saying that this
patch set solves all the problems with doing them. However, I believe
it will make such things a whole lot easier to implement, because all
of the knowledge about whatever new functionality is being added is
centralized in one place, rather than being spread across the entirety
of basebackup.c. As an example of this, look at how 0010 changes
basebackup.c and basebackup_tar.c: afterwards, basebackup.c no longer
knows anything that is tar-specific, whereas right now it knows about
tar-specific things in many places.

Here's an overview of this patch set:

0001-0003 are cleanup patches that I have posted for review on
separate threads.[4][5] They are included here to make it easy to
apply this whole series if someone wishes to do so.

0004 is a minor refactoring that reduces by 1 the number of functions
in basebackup.c that know about the specifics of tarfiles. It is just
a preparatory patch and probably not very interesting.

0005 invents the bbsink abstraction.

0006 creates basebackup_libpq.c and moves all code that knows about
the details of sending archives via libpq there. The functionality is
exposed for use by basebackup.c as a new type of bbsink, bbsink_libpq.

0007 creates basebackup_throttle.c and moves all code that knows about
throttling backups there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_throttle. This means that
the throttling logic could be reused to throttle output to any final
destination. Essentially, this is a bbsink that just passes everything
it gets through to the next bbsink, but with a rate limit. If
throttling's not enabled, no bbsink_throttle object is created, so all
of the throttling code is completely out of the execution pipeline.
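
The arithmetic behind such a rate limit is simple. As a hedged sketch (this hypothetical helper is not the code in basebackup_throttle.c, just an illustration of the idea): given the configured maximum rate, the sink computes the minimum elapsed time at which the bytes sent so far would be within budget, and sleeps if it is ahead of that schedule.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not the patch's actual code: given how many bytes
 * have been forwarded so far and the configured rate limit, compute how many
 * microseconds must have elapsed since the backup started before the next
 * chunk may be forwarded.  The caller sleeps for the difference if it is
 * ahead of schedule.
 *
 * Note: bytes_sent * 1000000 can overflow for very large byte counts; a
 * real implementation would guard against that.
 */
static int64_t
throttle_target_elapsed_us(uint64_t bytes_sent, uint64_t max_rate_bytes_per_sec)
{
	return (int64_t) ((bytes_sent * 1000000) / max_rate_bytes_per_sec);
}
```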

0008 creates basebackup_progress.c and moves all code that knows about
progress reporting there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_progress. Since the
abstraction doesn't fit perfectly in this case, some extra functions
are added to work around the problem. This is not entirely elegant,
but I still think it's an improvement over what we have now, and
I don't have a better idea.

0009 invents the bbarchiver abstraction.

0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
bbarchiver, and refactors basebackup.c to make use of them. The tar
bbarchiver puts the files it sees into tar archives and forwards the
resulting archives to a bbsink. The tarsize bbarchiver is used to
support the PROGRESS option to the BASE_BACKUP command. It just
estimates the size of the backup by summing up the file sizes without
reading them. This approach is good for a couple of reasons. First,
without something like this, it's impossible to keep basebackup.c from
knowing something about the tar format, because the PROGRESS option
doesn't just figure out how big the files to be backed up are: it
figures out how big it thinks the archives will be, and that involves
tar-specific considerations. This area needs more work, as the whole
idea of measuring progress by estimating the archive size is going to
break down as soon as server-side compression is in the picture.
Second, this makes the code path that we use for figuring out the
backup size details much more similar to the path we use for
performing the actual backup. For instance, with this patch, we
include the exact same files in the calculation that we will include
in the backup, and in the same order, something that's not true today.
The basebackup_tar.c file added by this patch is sadly lacking in
comments, which I will add in a future version of the patch set. I
think, though, that it will not be too unclear what's going on here.
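
For context on why the size estimate involves tar-specific considerations: in the ustar format, each archive member is a 512-byte header block followed by its payload rounded up to a 512-byte boundary. A per-file estimate along those lines can be sketched as follows (illustrative only, not the tarsize bbarchiver's actual code):

```c
#include <assert.h>
#include <stdint.h>

#define TAR_BLOCK_SIZE 512

/*
 * Illustrative sketch: estimated archive space consumed by one regular file
 * of the given size, per the ustar blocking rules -- one 512-byte header
 * block, plus the payload padded up to a 512-byte boundary.
 */
static uint64_t
tar_estimate_file_size(uint64_t file_size)
{
	uint64_t	padded;

	padded = ((file_size + TAR_BLOCK_SIZE - 1) / TAR_BLOCK_SIZE) * TAR_BLOCK_SIZE;
	return TAR_BLOCK_SIZE + padded;
}
```

This also illustrates why the estimate breaks down under server-side compression: the padded on-disk size no longer predicts the bytes that will actually cross the wire.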

0011 invents another new kind of bbarchiver. This bbarchiver just
eavesdrops on the stream of files to facilitate backup manifest
construction, and then forwards everything through to a subsequent
bbarchiver. Like bbsink_throttle, it can be entirely omitted if not
used. This patch is a bit clunky at the moment and needs some polish,
but it is another demonstration of how these abstractions can be used
to simplify basebackup.c, so that basebackup.c only has to worry about
determining what should be backed up and not have to worry much about
all the specific things that need to be done as part of that.

Although this patch set adds quite a bit of code on net, it makes
basebackup.c considerably smaller and simpler, removing more than 400
lines of code from that file, about 20% of the current total. There
are some gratifying changes vs. the status quo. For example, in
master, we have this:

sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)

Notably, the sizeonly flag makes the function not do what the name of
the function suggests that it does. Also, we've got to pass some extra
fields through to enable specific features. With the patch set, the
equivalent function looks like this:

archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
List *tablespaces, bool sendtblspclinks)

The question "what should I do with the directories and files we find
as we recurse?" is now answered by the choice of which bbarchiver to
pass to the function, rather than by the values of sizeonly, manifest,
and spcoid. That's not night and day, but I think it's better,
especially as you imagine adding more features in the future. The
really important part, for me, is that you can make the bbarchiver do
anything you like without needing to make any more changes to this
function. It just arranges to invoke your callbacks. You take it from
there.

One pretty major question that this patch set doesn't address is what
the user interface for any of the hypothetical features mentioned
above ought to look like, or how basebackup.c ought to support them.
The syntax for the BASE_BACKUP command, like the contents of
basebackup.c, has grown organically, and doesn't seem to be very
scalable. Also, the wire protocol - a series of CopyData results which
the client is entirely responsible for knowing how to interpret and
about which the server provides only minimal information - doesn't
much lend itself to extensibility. Some careful design work is likely
needed in both areas, and this patch does not try to do any of it. I
am quite interested in discussing those questions, but I felt that
they weren't the most important problems to solve first.

What do you all think?

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]: /messages/by-id/CA+TgmoZubLXYR+Pd_gi3MVgyv5hQdLm-GBrVXkun-Lewaw12Kg@mail.gmail.com
[2]: /messages/by-id/CA+TgmoYr7+-0_vyQoHbTP5H3QGZFgfhnrn6ewDteF=kUqkG=Fw@mail.gmail.com
[3]: /messages/by-id/CA+TgmoZQCoCyPv6fGoovtPEZF98AXCwYDnSB0=p5XtxNY68r_A@mail.gmail.com (and following)
[4]: /messages/by-id/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
[5]: /messages/by-id/CA+TgmobWbfReO9-XFk8urR1K4wTNwqoHx_v56t7=T8KaiEoKNw@mail.gmail.com

#2Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#1)
Re: refactoring basebackup.c

So it might be good if I'd remembered to attach the patches. Let's try
that again.

...Robert

Attachments:

v1-0005-Introduce-bbsink-abstraction.patch (+249/-1)
v1-0002-Minor-code-cleanup-for-perform_base_backup.patch (+9/-12)
v1-0003-Assorted-cleanup-of-tar-related-code.patch (+93/-59)
v1-0004-Recast-_tarWriteDirectory-as-convert_link_to_dire.patch (+14/-13)
v1-0001-Don-t-export-basebackup.c-s-sendTablespace.patch (+19/-29)
v1-0006-Convert-libpq-related-code-to-a-bbsink.patch (+388/-234)
v1-0009-Introduce-bbarchiver-abstraction.patch (+315/-1)
v1-0008-Convert-progress-reporting-code-to-a-bbsink.patch (+304/-97)
v1-0010-Create-and-use-bbarchiver-implementations-for-tar.patch (+454/-242)
v1-0007-Convert-throttling-related-code-to-a-bbsink.patch (+217/-120)
v1-0011-WIP-Convert-backup-manifest-generation-to-a-bbarc.patch (+153/-72)
#3Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#1)
Re: refactoring basebackup.c

Hi,

On 2020-05-08 16:53:09 -0400, Robert Haas wrote:

They represent closely-related concepts, so much so that I initially
thought we could get by with just one new abstraction layer. I found
on experimentation that this did not work well, so I split it up into
two and that worked a lot better. The distinction is this: a bbsink is
something to which you can send a bunch of archives -- currently, each
would be a tarfile -- and also a backup manifest. A bbarchiver is
something to which you send every file in the data directory
individually, or at least the ones that are getting backed up, plus
any that are being injected into the backup (e.g. the backup_label).
Commonly, a bbsink will do something with the data and then forward it
to a subsequent bbsink, or a bbarchiver will do something with the
data and then forward it to a subsequent bbarchiver or bbsink. For
example, there's a bbarchiver_tar object which, like any bbarchiver,
sees all the files and their contents as input. The output is a
tarfile, which gets sent to a bbsink. As things stand in the patch set
now, the tar archives are ultimately sent to the "libpq" bbsink, which
sends them to the client.

Hm.

I wonder if there are cases where recursively forwarding like this will
cause noticeable performance effects. The only operation that seems
frequent enough to potentially be noticeable would be "chunks" of the
file. So perhaps it'd be good to make sure we read in large enough
chunks?

0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
bbarchiver, and refactors basebackup.c to make use of them. The tar
bbarchiver puts the files it sees into tar archives and forwards the
resulting archives to a bbsink. The tarsize bbarchiver is used to
support the PROGRESS option to the BASE_BACKUP command. It just
estimates the size of the backup by summing up the file sizes without
reading them. This approach is good for a couple of reasons. First,
without something like this, it's impossible to keep basebackup.c from
knowing something about the tar format, because the PROGRESS option
doesn't just figure out how big the files to be backed up are: it
figures out how big it thinks the archives will be, and that involves
tar-specific considerations.

ISTM that it's not actually good to have the progress calculations
include the tar overhead. As you say:

This area needs more work, as the whole idea of measuring progress by
estimating the archive size is going to break down as soon as
server-side compression is in the picture.

This, to me, indicates that we should measure the progress solely based
on how much of the "source" data was processed. The overhead of tar, the
reduction due to compression, shouldn't be included.

What do you all think?

I've not thought enough about the specifics, but I think it looks like
it's going roughly in a better direction.

One thing I wonder about is how stateful the interface is. Archivers
will pretty much always track which file is currently open etc. Somehow
such a repeating state machine seems a bit ugly - but I don't really
have a better answer.

Greetings,

Andres Freund

#4Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#3)
Re: refactoring basebackup.c

On Fri, May 8, 2020 at 5:27 PM Andres Freund <andres@anarazel.de> wrote:

I wonder if there are cases where recursively forwarding like this will
cause noticeable performance effects. The only operation that seems
frequent enough to potentially be noticeable would be "chunks" of the
file. So perhaps it'd be good to make sure we read in large enough
chunks?

Yeah, that needs to be tested. Right now the chunk size is 32kB but it
might be a good idea to go larger. Another thing is that right now the
chunk size is tied to the protocol message size, and I'm not sure
whether the size that's optimal for disk reads is also optimal for
protocol messages.

This, to me, indicates that we should measure the progress solely based
on how much of the "source" data was processed. The overhead of tar, the
reduction due to compression, shouldn't be included.

I don't think it's a particularly bad thing that we include a small
amount of progress for sending an empty file, a directory, or a
symlink. That could make the results more meaningful if you have a
database with lots of empty relations in it. However, I agree that the
effect of compression shouldn't be included. To get there, I think we
need to redesign the wire protocol. Right now, the server has no way
of letting the client know how many uncompressed bytes it's sent, and
the client has no way of figuring it out without uncompressing, which
seems like something we want to avoid.

There are some other problems with the current wire protocol, too:

1. The syntax for the BASE_BACKUP command is large and unwieldy. We
really ought to adopt an extensible options syntax, like COPY, VACUUM,
EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each
with bespoke lexer and parser support.

2. The client is sent a list of tablespaces and is supposed to use
that to expect an equal number of archives, computing the name for
each one on the client side from the tablespace info. However, I think
we should be able to support modes like "put all the tablespaces in a
single archive" or "send a separate archive for every 256GB" or "ship
it all to the cloud and don't send me any archives". To get there, I
think we should have the server send the archive name to the clients,
and the client should just keep receiving the next archive until it's
told that there are no more. Then if there's one archive or ten
archives or no archives, the client doesn't have to care. It just
receives what the server sends until it hears that there are no more.
It also doesn't know how the server is naming the archives; the server
can, for example, adjust the archive names based on which compression
format is being chosen, without knowledge of the server's naming
conventions needing to exist on the client side.

I think we should keep support for the current BASE_BACKUP command but
either add a new variant using an extensible options, or else invent a
whole new command with a different name (BACKUP, SEND_BACKUP,
whatever) that takes extensible options. This command should send back
all the archives and the backup manifest using a single COPY stream
rather than multiple COPY streams. Within the COPY stream, we'll
invent a sub-protocol, e.g. based on the first letter of the message,
e.g.:

t = Tablespace boundary. No further message payload. Indicates, for
progress reporting purposes, that we are advancing to the next
tablespace.
f = Filename. The remainder of the message payload is the name of the
next file that will be transferred.
d = Data. The next four bytes contain the number of uncompressed bytes
covered by this message, for progress reporting purposes. The rest of
the message is payload, possibly compressed. Could be empty, if the
data is being shipped elsewhere, and these messages are only being
sent to update the client's notion of progress.
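
To make the proposed framing concrete, a client-side decoder for such a sub-protocol might classify messages like this. Everything here is a sketch of the proposal in this message, not committed code, and the type names are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message kinds for the proposed sub-protocol. */
typedef enum
{
	MSG_TABLESPACE,				/* 't': advancing to the next tablespace */
	MSG_FILENAME,				/* 'f': name of the next file transferred */
	MSG_DATA,					/* 'd': data, with uncompressed byte count */
	MSG_INVALID
} msg_type;

/*
 * Classify one CopyData payload per the sketched framing: the first byte
 * selects the message type; for 'd', the next four bytes (network byte
 * order, per the usual wire-protocol convention) carry the number of
 * uncompressed bytes covered, for progress reporting.
 */
static msg_type
classify_message(const uint8_t *buf, size_t len, uint32_t *uncompressed_len)
{
	if (len < 1)
		return MSG_INVALID;
	switch (buf[0])
	{
		case 't':
			return MSG_TABLESPACE;
		case 'f':
			return MSG_FILENAME;
		case 'd':
			if (len < 5)
				return MSG_INVALID;
			*uncompressed_len = ((uint32_t) buf[1] << 24) |
				((uint32_t) buf[2] << 16) |
				((uint32_t) buf[3] << 8) |
				(uint32_t) buf[4];
			return MSG_DATA;
		default:
			return MSG_INVALID;
	}
}
```

A client loop would dispatch on the returned type, updating its progress counter from the 'd' messages without needing to decompress anything.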

I've not thought enough about the specifics, but I think it looks like
it's going roughly in a better direction.

Good to hear.

One thing I wonder about is how stateful the interface is. Archivers
will pretty much always track which file is currently open etc. Somehow
such a repeating state machine seems a bit ugly - but I don't really
have a better answer.

I thought about that a bit, too. There might be some way to unify that
by having some common context object that's defined by basebackup.c
and all archivers get it, so that they have some commonly-desired
details without needing bespoke code, but I'm not sure at this point
whether that will actually produce a nicer result. Even if we don't
have it initially, it seems like it wouldn't be very hard to add it
later, so I'm not too stressed about it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Sumanta Mukherjee
sumanta.mukherjee@enterprisedb.com
In reply to: Robert Haas (#4)
Re: refactoring basebackup.c

Hi Robert,
Please see my comments inline below.

On Tue, May 12, 2020 at 12:33 AM Robert Haas <robertmhaas@gmail.com> wrote:

Yeah, that needs to be tested. Right now the chunk size is 32kB but it
might be a good idea to go larger. Another thing is that right now the
chunk size is tied to the protocol message size, and I'm not sure
whether the size that's optimal for disk reads is also optimal for
protocol messages.

One needs to balance the number of packets sent across the network, so
if the size of the disk reads and the network packet size could be
unified, then it might provide a better optimization.

I don't think it's a particularly bad thing that we include a small
amount of progress for sending an empty file, a directory, or a
symlink. That could make the results more meaningful if you have a
database with lots of empty relations in it. However, I agree that the
effect of compression shouldn't be included. To get there, I think we
need to redesign the wire protocol. Right now, the server has no way
of letting the client know how many uncompressed bytes it's sent, and
the client has no way of figuring it out without uncompressing, which
seems like something we want to avoid.

I agree here too, except that if we have too many tar files one might
cringe, but sending the extra amount from these tar files looks okay to
me.

There are some other problems with the current wire protocol, too:

1. The syntax for the BASE_BACKUP command is large and unwieldy. We
really ought to adopt an extensible options syntax, like COPY, VACUUM,
EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each
with bespoke lexer and parser support.

2. The client is sent a list of tablespaces and is supposed to use
that to expect an equal number of archives, computing the name for
each one on the client side from the tablespace info. However, I think
we should be able to support modes like "put all the tablespaces in a
single archive" or "send a separate archive for every 256GB" or "ship
it all to the cloud and don't send me any archives". To get there, I
think we should have the server send the archive name to the clients,
and the client should just keep receiving the next archive until it's
told that there are no more. Then if there's one archive or ten
archives or no archives, the client doesn't have to care. It just
receives what the server sends until it hears that there are no more.
It also doesn't know how the server is naming the archives; the server
can, for example, adjust the archive names based on which compression
format is being chosen, without knowledge of the server's naming
conventions needing to exist on the client side.

One thing to remember here could be that a balance would need to be
struck between the number of options we provide and people coming back
and saying which combinations do not work. For example, if a user script
has "put all the tablespaces in a single archive" and later on somebody
makes some script changes to break it down at "256 GB" and there is a
conflict, then which one takes precedence needs to be chosen. When the
number of options like this becomes very large, this could lead to some
complications.

I think we should keep support for the current BASE_BACKUP command but
either add a new variant using an extensible options, or else invent a
whole new command with a different name (BACKUP, SEND_BACKUP,
whatever) that takes extensible options. This command should send back
all the archives and the backup manifest using a single COPY stream
rather than multiple COPY streams. Within the COPY stream, we'll
invent a sub-protocol, e.g. based on the first letter of the message,
e.g.:

t = Tablespace boundary. No further message payload. Indicates, for
progress reporting purposes, that we are advancing to the next
tablespace.
f = Filename. The remainder of the message payload is the name of the
next file that will be transferred.
d = Data. The next four bytes contain the number of uncompressed bytes
covered by this message, for progress reporting purposes. The rest of
the message is payload, possibly compressed. Could be empty, if the
data is being shipped elsewhere, and these messages are only being
sent to update the client's notion of progress.

Here I support this.

I thought about that a bit, too. There might be some way to unify that
by having some common context object that's defined by basebackup.c
and all archivers get it, so that they have some commonly-desired
details without needing bespoke code, but I'm not sure at this point
whether that will actually produce a nicer result. Even if we don't
have it initially, it seems like it wouldn't be very hard to add it
later, so I'm not too stressed about it.

--Sumanta Mukherjee
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#6Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#1)
Re: refactoring basebackup.c

On Sat, May 9, 2020 at 2:23 AM Robert Haas <robertmhaas@gmail.com> wrote:

Hi,

I'd like to propose a fairly major refactoring of the server's
basebackup.c. The current code isn't horrific or anything, but the
base backup mechanism has grown quite a few features over the years
and all of the code knows about all of the features. This is going to
make it progressively more difficult to add additional features, and I
have a few in mind that I'd like to add, as discussed below and also
on several other recent threads.[1][2] The attached patch set shows
what I have in mind. It needs more work, but I believe that there's
enough here for someone to review the overall direction, and even some
of the specifics, and hopefully give me some useful feedback.

This patch set is built around the idea of creating two new
abstractions, a base backup sink -- or bbsink -- and a base backup
archiver -- or bbarchiver. Each of these works like a foreign data
wrapper or custom scan or TupleTableSlot. That is, there's a table of
function pointers that act like method callbacks. Every implementation
can allocate a struct of sufficient size for its own bookkeeping data,
and the first member of the struct is always the same, and basically
holds the data that all implementations must store, including a
pointer to the table of function pointers. If we were using C++,
bbarchiver and bbsink would be abstract base classes.

They represent closely-related concepts, so much so that I initially
thought we could get by with just one new abstraction layer. I found
on experimentation that this did not work well, so I split it up into
two and that worked a lot better. The distinction is this: a bbsink is
something to which you can send a bunch of archives -- currently, each
would be a tarfile -- and also a backup manifest. A bbarchiver is
something to which you send every file in the data directory
individually, or at least the ones that are getting backed up, plus
any that are being injected into the backup (e.g. the backup_label).
Commonly, a bbsink will do something with the data and then forward it
to a subsequent bbsink, or a bbarchiver will do something with the
data and then forward it to a subsequent bbarchiver or bbsink. For
example, there's a bbarchiver_tar object which, like any bbarchiver,
sees all the files and their contents as input. The output is a
tarfile, which gets sent to a bbsink. As things stand in the patch set
now, the tar archives are ultimately sent to the "libpq" bbsink, which
sends them to the client.

In the future, we could have other bbarchivers. For example, we could
add "pax", "zip", or "cpio" bbarchiver which produces archives of that
format, and any given backup could choose which one to use. Or, we
could have a bbarchiver that runs each individual file through a
compression algorithm and then forwards the resulting data to a
subsequent bbarchiver. That would make it easy to produce a tarfile of
individually compressed files, which is one possible way of creating a
seekable archive.[3] Likewise, we could have other bbsinks. For
example, we could have a "localdisk" bbsink that causes the server to
write the backup somewhere in the local filesystem instead of
streaming it out over libpq. Or, we could have an "s3" bbsink that
writes the archives to S3. We could also have bbsinks that compress
the input archives using some compressor (e.g. lz4, zstd, bzip2, ...)
and forward the resulting compressed archives to the next bbsink in
the chain. I'm not trying to pass judgement on whether any of these
particular things are things we want to do, nor am I saying that this
patch set solves all the problems with doing them. However, I believe
it will make such things a whole lot easier to implement, because all
of the knowledge about whatever new functionality is being added is
centralized in one place, rather than being spread across the entirety
of basebackup.c. As an example of this, look at how 0010 changes
basebackup.c and basebackup_tar.c: afterwards, basebackup.c no longer
knows anything that is tar-specific, whereas right now it knows about
tar-specific things in many places.

Here's an overview of this patch set:

0001-0003 are cleanup patches that I have posted for review on
separate threads.[4][5] They are included here to make it easy to
apply this whole series if someone wishes to do so.

0004 is a minor refactoring that reduces by 1 the number of functions
in basebackup.c that know about the specifics of tarfiles. It is just
a preparatory patch and probably not very interesting.

0005 invents the bbsink abstraction.

0006 creates basebackup_libpq.c and moves all code that knows about
the details of sending archives via libpq there. The functionality is
exposed for use by basebackup.c as a new type of bbsink, bbsink_libpq.

0007 creates basebackup_throttle.c and moves all code that knows about
throttling backups there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_throttle. This means that
the throttling logic could be reused to throttle output to any final
destination. Essentially, this is a bbsink that just passes everything
it gets through to the next bbsink, but with a rate limit. If
throttling's not enabled, no bbsink_throttle object is created, so all
of the throttling code is completely out of the execution pipeline.

0008 creates basebackup_progress.c and moves all code that knows about
progress reporting there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_progress. Since the
abstraction doesn't fit perfectly in this case, some extra functions
are added to work around the problem. This is not entirely elegant,
but I still think it's an improvement over what we have now, and
I don't have a better idea.

0009 invents the bbarchiver abstraction.

0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
bbarchiver, and refactors basebackup.c to make use of them. The tar
bbarchiver puts the files it sees into tar archives and forwards the
resulting archives to a bbsink. The tarsize bbarchiver is used to
support the PROGRESS option to the BASE_BACKUP command. It just
estimates the size of the backup by summing up the file sizes without
reading them. This approach is good for a couple of reasons. First,
without something like this, it's impossible to keep basebackup.c from
knowing something about the tar format, because the PROGRESS option
doesn't just figure out how big the files to be backed up are: it
figures out how big it thinks the archives will be, and that involves
tar-specific considerations. This area needs more work, as the whole
idea of measuring progress by estimating the archive size is going to
break down as soon as server-side compression is in the picture.
Second, this makes the code path that we use for figuring out the
backup size details much more similar to the path we use for
performing the actual backup. For instance, with this patch, we
include the exact same files in the calculation that we will include
in the backup, and in the same order, something that's not true today.
The basebackup_tar.c file added by this patch is sadly lacking in
comments, which I will add in a future version of the patch set. I
think, though, that it will not be too unclear what's going on here.

0011 invents another new kind of bbarchiver. This bbarchiver just
eavesdrops on the stream of files to facilitate backup manifest
construction, and then forwards everything through to a subsequent
bbarchiver. Like bbsink_throttle, it can be entirely omitted if not
used. This patch is a bit clunky at the moment and needs some polish,
but it is another demonstration of how these abstractions can be used
to simplify basebackup.c, so that basebackup.c only has to worry about
determining what should be backed up and not have to worry much about
all the specific things that need to be done as part of that.

Although this patch set adds quite a bit of code on net, it makes
basebackup.c considerably smaller and simpler, removing more than 400
lines of code from that file, about 20% of the current total. There
are some gratifying changes vs. the status quo. For example, in
master, we have this:

sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
bool sendtblspclinks, backup_manifest_info *manifest,
const char *spcoid)

Notably, the sizeonly flag makes the function not do what the name of
the function suggests that it does. Also, we've got to pass some extra
fields through to enable specific features. With the patch set, the
equivalent function looks like this:

archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
List *tablespaces, bool sendtblspclinks)

The question "what should I do with the directories and files we find
as we recurse?" is now answered by the choice of which bbarchiver to
pass to the function, rather than by the values of sizeonly, manifest,
and spcoid. That's not night and day, but I think it's better,
especially as you imagine adding more features in the future. The
really important part, for me, is that you can make the bbarchiver do
anything you like without needing to make any more changes to this
function. It just arranges to invoke your callbacks. You take it from
there.
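
To make that concrete, here is a hedged sketch of the idea: a
traversal function that stays feature-agnostic, and a "tarsize"-style
archiver that estimates archive size from file sizes alone. All names
here are invented for the example; the actual patch's interfaces
differ:

```c
#include <stddef.h>

typedef struct bbarchiver bbarchiver;

typedef struct bbarchiver_ops
{
    void        (*archive_file) (bbarchiver *archiver, const char *path,
                                 long size);
} bbarchiver_ops;

struct bbarchiver
{
    const bbarchiver_ops *ops;
};

/* Estimates the archive size without reading any file contents. */
typedef struct size_estimator
{
    bbarchiver  base;           /* must be first */
    long        total;
} size_estimator;

static void
estimator_archive_file(bbarchiver *archiver, const char *path, long size)
{
    /* tar: a 512-byte header plus contents padded to a 512-byte boundary */
    ((size_estimator *) archiver)->total += 512 + ((size + 511) / 512) * 512;
}

static const bbarchiver_ops estimator_ops = {estimator_archive_file};

/*
 * The traversal code needs no tar-specific or feature-specific
 * knowledge; it just invokes whatever archiver it was handed.
 */
static void
archive_files(bbarchiver *archiver, const char **paths, const long *sizes,
              int n)
{
    for (int i = 0; i < n; i++)
        archiver->ops->archive_file(archiver, paths[i], sizes[i]);
}
```

Swapping in a real tar-writing archiver changes what happens to each
file without touching archive_files() at all.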

One pretty major question that this patch set doesn't address is what
the user interface for any of the hypothetical features mentioned
above ought to look like, or how basebackup.c ought to support them.
The syntax for the BASE_BACKUP command, like the contents of
basebackup.c, has grown organically, and doesn't seem to be very
scalable. Also, the wire protocol - a series of CopyData results which
the client is entirely responsible for knowing how to interpret and
about which the server provides only minimal information - doesn't
much lend itself to extensibility. Some careful design work is likely
needed in both areas, and this patch does not try to do any of it. I
am quite interested in discussing those questions, but I felt that
they weren't the most important problems to solve first.

What do you all think?

#6Dilip Kumar
dilipbalaut@gmail.com
Re: refactoring basebackup.c

The overall idea looks quite nice. I had a look at some of the
patches at least 0005 and 0006. At first look, I have one comment.

+/*
+ * Each archive is set as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_libpq_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}

Some of the bbsink_libpq_* functions call pq_* functions directly,
e.g. bbsink_libpq_begin_backup, whereas others call SendCopy*
functions, which in turn call the pq_* functions. I think the
bbsink_libpq_* functions could call the pq_* functions directly
instead of adding one more level of function call.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#7Robert Haas
robertmhaas@gmail.com
In reply to: Dilip Kumar (#6)
Re: refactoring basebackup.c

On Tue, May 12, 2020 at 4:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Some of the bbsink_libpq_* functions call pq_* functions directly,
e.g. bbsink_libpq_begin_backup, whereas others call SendCopy*
functions, which in turn call the pq_* functions. I think the
bbsink_libpq_* functions could call the pq_* functions directly
instead of adding one more level of function call.

I think all the helper functions have more than one caller, though.
That's why I created them - to avoid duplicating code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Dilip Kumar
dilipbalaut@gmail.com
In reply to: Robert Haas (#7)
Re: refactoring basebackup.c

On Wed, May 13, 2020 at 1:56 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, May 12, 2020 at 4:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:

Some of the bbsink_libpq_* functions call pq_* functions directly,
e.g. bbsink_libpq_begin_backup, whereas others call SendCopy*
functions, which in turn call the pq_* functions. I think the
bbsink_libpq_* functions could call the pq_* functions directly
instead of adding one more level of function call.

I think all the helper functions have more than one caller, though.
That's why I created them - to avoid duplicating code.

You are right, somehow I missed that part. Sorry for the noise.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#9Suraj Kharage
suraj.kharage@enterprisedb.com
In reply to: Andres Freund (#3)
Re: refactoring basebackup.c

Hi,

Did some performance testing by varying TAR_SEND_SIZE with Robert's
refactor patch and without the patch to check the impact.

Below are the details:

*Backup type*: local backup using pg_basebackup
*Data size*: Around 200GB (200 tables - each table around 1.05 GB)
*different TAR_SEND_SIZE values*: 8kb, 32kb (default value), 128kB, 1MB (
1024kB)

*Server details:*
RAM: 500 GB
CPU: x86_64 (op-mode(s): 32-bit, 64-bit; Byte Order: Little Endian; CPU(s): 128)
Filesystem: ext4

                          8kB               32kB (default)    128kB             1024kB
Without refactor patch    real 10m22.718s   real 8m36.245s    real 6m54.299s    real 18m3.511s
                          user 1m23.629s    user 1m8.471s     user 0m55.690s    user 1m38.197s
                          sys  8m51.410s    sys  7m21.520s    sys  5m46.502s    sys  9m36.517s
With refactor patch       real 10m11.350s   real 8m56.226s    real 7m26.678s    real 18m17.230s
                          user 1m25.038s    user 1m9.774s     user 0m54.833s    user 1m42.749s
                          sys  8m39.226s    sys  7m41.032s    sys  6m20.057s    sys  9m53.704s

The above numbers are taken from the minimum of two runs of each scenario.

I can see that with TAR_SEND_SIZE of 32kB or 128kB we get good
performance, whereas with 1MB it takes about 2.5x more time.

Please let me know your thoughts/suggestions on the same.

--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.

#10Sumanta Mukherjee
sumanta.mukherjee@enterprisedb.com
In reply to: Suraj Kharage (#9)
Re: refactoring basebackup.c

Hi Suraj,

Two points I wanted to mention.

1. When TAR_SEND_SIZE is 128kB, the transfer rate works out to at
most 0.48 GB/sec. Is there a way to find out what buffer size is
being used? That could help explain part of the puzzle.
2. Taking just the minimum of two runs is a bit problematic: how do
we justify the performance numbers and establish that the differences
are not just noise? It might be better to do several runs of each
kind, fit a basic linear model, and report the standard deviation.
Order statistics such as min(X1, X2, ..., Xn) are generally biased
estimators, and calculating the variance of a biased statistic is
tricky, so the results could be corrupted by noise.

With Regards,
Sumanta Mukherjee.
EnterpriseDB: http://www.enterprisedb.com

On Wed, May 13, 2020 at 9:31 AM Suraj Kharage <
suraj.kharage@enterprisedb.com> wrote:


#11Robert Haas
robertmhaas@gmail.com
In reply to: Suraj Kharage (#9)
Re: refactoring basebackup.c

On Wed, May 13, 2020 at 12:01 AM Suraj Kharage <
suraj.kharage@enterprisedb.com> wrote:

                          8kB               32kB (default)    128kB             1024kB
Without refactor patch    real 10m22.718s   real 8m36.245s    real 6m54.299s    real 18m3.511s
                          user 1m23.629s    user 1m8.471s     user 0m55.690s    user 1m38.197s
                          sys  8m51.410s    sys  7m21.520s    sys  5m46.502s    sys  9m36.517s
With refactor patch       real 10m11.350s   real 8m56.226s    real 7m26.678s    real 18m17.230s
                          user 1m25.038s    user 1m9.774s     user 0m54.833s    user 1m42.749s
                          sys  8m39.226s    sys  7m41.032s    sys  6m20.057s    sys  9m53.704s

The above numbers are taken from the minimum of two runs of each scenario.

I can see, when we have TAR_SEND_SIZE as 32kb or 128kb, it is giving us a
good performance whereas, for 1Mb it is taking 2.5x more time.

Please let me know your thoughts/suggestions on the same.

So the patch came out slightly faster at 8kB and slightly slower in the
other tests. That's kinda strange. I wonder if it's just noise. How much do
the results vary run to run?

I would've expected (and I think Andres thought the same) that a smaller
block size would be bad for the patch and a larger block size would be
good, but that's not what these numbers show.

I wouldn't worry too much about the regression at 1MB. Probably what's
happening there is that we're losing some concurrency - perhaps with
smaller block sizes the OS can buffer the entire chunk in the pipe
connecting pg_basebackup to the server and start on the next one, but when
you go up to 1MB it doesn't fit and ends up spending a lot of time waiting
for data to be read from the pipe. Wait event profiling might tell you
more. Probably what this suggests is that you want the largest buffer size
that doesn't cause you to overrun the network/pipe buffer and no larger.
Unfortunately, I have no idea how we'd figure that out dynamically, and I
don't see a reason to believe that everyone will have the same size buffers.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Suraj Kharage
suraj.kharage@enterprisedb.com
In reply to: Robert Haas (#11)
Re: refactoring basebackup.c

Hi,

On Wed, May 13, 2020 at 7:49 PM Robert Haas <robertmhaas@gmail.com> wrote:

So the patch came out slightly faster at 8kB and slightly slower in the
other tests. That's kinda strange. I wonder if it's just noise. How much do
the results vary run to run?

It is not varying much except for the 8kB run. Please see the details
below for both runs of each scenario.

                          8kB               32kB (default)    128kB             1024kB
Without refactor patch
  1st run                 real 10m50.924s   real 8m36.245s    real 7m8.690s     real 18m16.898s
                          user 1m29.774s    user 1m8.471s     user 0m54.840s    user 1m39.105s
                          sys  9m13.058s    sys  7m21.520s    sys  6m1.725s     sys  9m42.803s
  2nd run                 real 10m22.718s   real 8m44.455s    real 6m54.299s    real 18m3.511s
                          user 1m23.629s    user 1m7.896s     user 0m55.690s    user 1m38.197s
                          sys  8m51.410s    sys  7m28.909s    sys  5m46.502s    sys  9m36.517s
With refactor patch
  1st run                 real 10m11.350s   real 8m56.226s    real 7m26.678s    real 19m5.218s
                          user 1m25.038s    user 1m9.774s     user 0m54.833s    user 1m44.122s
                          sys  8m39.226s    sys  7m41.032s    sys  6m20.057s    sys  10m17.623s
  2nd run                 real 11m30.500s   real 9m4.103s     real 7m26.713s    real 18m17.230s
                          user 1m45.221s    user 1m6.893s     user 0m54.868s    user 1m42.749s
                          sys  9m37.815s    sys  7m49.393s    sys  6m19.652s    sys  9m53.704s

--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.

#13Dipesh Pandit
dipesh.pandit@gmail.com
In reply to: Suraj Kharage (#12)
Re: refactoring basebackup.c

Hi,

I have repeated the experiment with 8K block size and found that the
results are not varying much after applying the patch.
Please find the details below.

*Backup type*: local backup using pg_basebackup
*Data size*: Around 200GB (200 tables - each table around 1.05 GB)
*TAR_SEND_SIZE value*: 8kb

*Server details:*
RAM: 500 GB
CPU: x86_64 (op-mode(s): 32-bit, 64-bit; Byte Order: Little Endian; CPU(s): 128)
Filesystem: ext4

*Results:*

Iteration   Without refactor patch   With refactor patch
1st run     real 10m19.001s          real 9m45.291s
            user 1m37.895s           user 1m23.192s
            sys  8m33.008s           sys  8m14.993s
2nd run     real 9m33.970s           real 9m30.560s
            user 1m19.490s           user 1m22.124s
            sys  8m6.062s            sys  8m0.979s
3rd run     real 9m19.327s           real 8m59.241s
            user 1m21.772s           user 1m19.001s
            sys  7m50.613s           sys  7m32.645s
4th run     real 9m56.873s           real 9m52.290s
            user 1m22.370s           user 1m22.175s
            sys  8m27.054s           sys  8m23.052s
5th run     real 9m45.343s           real 9m49.633s
            user 1m23.113s           user 1m23.122s
            sys  8m15.418s           sys  8m19.240s

Later I connected with Suraj to validate the experiment details and
confirmed that the setup and steps followed in this experiment are
exactly the same as in the previous one.

Thanks,
Dipesh

On Thu, May 14, 2020 at 7:50 AM Suraj Kharage <
suraj.kharage@enterprisedb.com> wrote:


#14Suraj Kharage
suraj.kharage@enterprisedb.com
In reply to: Dipesh Pandit (#13)
Re: refactoring basebackup.c

On Tue, Jun 30, 2020 at 10:45 AM Dipesh Pandit <dipesh.pandit@gmail.com>
wrote:

Hi,

I have repeated the experiment with 8K block size and found that the
results are not varying much after applying the patch.
Please find the details below.

Later I connected with Suraj to validate the experiment details and found
that the setup and steps followed are exactly the same in this
experiment when compared with the previous experiment.

Thanks Dipesh.
It looks like the results are not varying much in your runs, since
you followed the same steps. One of my runs with 8kB took more time
than the others, which might be because of noise at that time.

--

Thanks & Regards,
Suraj kharage,

edbpostgres.com

#15Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#2)
Re: refactoring basebackup.c

On Fri, May 8, 2020 at 4:55 PM Robert Haas <robertmhaas@gmail.com> wrote:

So it might be good if I'd remembered to attach the patches. Let's try
that again.

Here's an updated patch set. This is now rebased over master and
includes as 0001 the patch I posted separately at
/messages/by-id/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
but drops some other patches that were committed meanwhile. 0002-0009
of this series are basically the same as 0004-0011 from the previous
series, except for rebasing and fixing a bug I discovered in what's
now 0006. 0010 does a refactoring of pg_basebackup along similar lines
to the server-side refactoring from patches earlier in the series.
0011 is a really terrible, hacky, awful demonstration of how this
infrastructure can support server-side compression. If you apply it
and take a tar-format backup without -R, you will get .tar files that
are actually .tar.gz files. You can rename them, decompress them, and
use pg_verifybackup to check that everything is OK. If you try to do
anything else with 0011 applied, everything will break.

In the process of working on this, I learned a lot about how
pg_basebackup actually works, and found out about a number of things
that, with the benefit of hindsight, seem like they might not have
been the best way to go.

1. pg_basebackup -R injects recovery.conf (on older versions) or
injects standby.signal and appends to postgresql.auto.conf (on newer
versions) by parsing the tar file sent by the server and editing it on
the fly. From the point of view of server-side compression, this is
not ideal, because if you want to make these kinds of changes when
server-side compression is in use, you'd have to decompress the stream
on the client side in order to figure out where in the stream you ought
to inject your changes. But having to do that is a major expense. If
the client instead told the server what to change when generating the
archive, and the server did it, this expense could be avoided. It
would have the additional advantage that the backup manifest could
reflect the effects of those changes; right now it doesn't, and
pg_verifybackup just knows to expect differences in those files.

2. According to the comments, some tar programs require two tar blocks
(i.e. 512-byte blocks) of zero bytes at the end of an archive. The
server does not generate these blocks of zero bytes, so it basically
creates a tar file that works fine with my copy of tar but might break
with somebody else's. Instead, the client appends 1024 zero bytes to
the end of every file it receives from the server. That is an odd way
of fixing this problem, and it makes things rather inflexible. If the
server sends you any kind of a file OTHER THAN a tar file with the
last 1024 zero bytes stripped off, then adding 1024 zero bytes will be
the wrong thing to do. It would be better if the server just generated
fully correct tar files (whatever we think that means) and the client
wrote out exactly what it got from the server. Then, we could have the
server generate cpio archives or zip files or gzip-compressed tar
files or lz4-compressed tar files or anything we like, and the client
wouldn't really need to care as long as it didn't need to extract
those archives. That seems a lot cleaner.
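
For reference, the trailer in question is just two 512-byte blocks of
zero bytes, which the tar format requires at end-of-archive, so
generating it on the server would be trivial. An illustrative helper
(invented for this example, not code from the patch set):

```c
#include <string.h>
#include <stddef.h>

#define TAR_BLOCK_SIZE 512

/*
 * Write the end-of-archive marker required by the tar format: two
 * blocks of zero bytes. Returns the number of bytes written, or 0 if
 * the buffer is too small.
 */
size_t
tar_write_trailer(char *buf, size_t buflen)
{
    if (buflen < 2 * TAR_BLOCK_SIZE)
        return 0;
    memset(buf, 0, 2 * TAR_BLOCK_SIZE);
    return 2 * TAR_BLOCK_SIZE;
}
```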

3. The way that progress reporting works relies on the server knowing
exactly how large the archive sent to the client is going to be.
Progress as reckoned by the client is equal to the number of archive
payload bytes the client has received. This works OK with a tar
because we know how big the tar file is going to be based on the size
of the input files we intend to send, but it's unsuitable for any sort
of compressed archive (tar.gz, zip, whatever) because the compression
ratio cannot be predicted in advance. It would be better if the server
sent the payload bytes (possibly compressed) interleaved with progress
indicators, so that the client could correctly indicate that, say, the
backup is 30% complete because 30GB of 100GB has been processed on the
server side, even though the amount of data actually received by the
client might be 25GB or 20GB or 10GB or whatever because it got
compressed before transmission.

4. A related consideration is that we might want to have an option to
do something with the backup other than send it to the client. For
example, it might be useful to have an option for pg_basebackup to
tell the server to write the backup files to some specified server
directory, or to, say, S3. There are security concerns there, and I'm
not proposing to do anything about this immediately, but it seems like
something we might eventually want to have. In such a case, we are not
going to send any payload to the client, but the client probably still
wants progress indicators, so the current system of coupling progress
to the number of bytes received by the client breaks down for that
reason also.

5. As things stand today, the client must know exactly how many
archives it should expect to receive from the server and what each one
is. It can do that, because it knows to expect one archive per
tablespace, and the archive must be an uncompressed tarfile, so there
is no ambiguity. But, if the server could send archives to other
places, or send other kinds of archives to the client, then this would
become more complex. There is no intrinsic reason why the logic on the
client side can't simply be made more complicated in order to cope,
but it doesn't seem like great design, because then every time you
enhance the server, you've also got to enhance the client, and that
limits cross-version compatibility, and also seems more fragile. I
would rather that the server advertise the number of archives and the
names of each archive to the client explicitly, allowing the client to
be dumb unless it needs to post-process (e.g. extract) those archives.

Putting all of the above together, what I propose - but have not yet
tried to implement - is a new COPY sub-protocol for taking base
backups. Instead of sending a COPY stream per archive, the server
would send a single COPY stream where the first byte of each message
is a type indicator, like we do with the replication sub-protocol
today. For example, if the first byte is 'a' that could indicate that
we're beginning a new archive and the rest of the message would
indicate the archive name and perhaps some flags or options. If the
first byte is 'p' that could indicate that we're sending archive
payload, perhaps with the first four bytes of the message being
progress, i.e. the number of newly-processed bytes on the server side
prior to any compression, and the remaining bytes being payload. On
receipt of such a message, the client would increment the progress
indicator by the value indicated in those first four bytes, and then
process the remaining bytes by writing them to a file or whatever
behavior the user selected via -Fp, -Ft, -Z, etc. To be clear, I'm not
saying that this specific thing is the right thing, just something of
this sort. The server would need to continue supporting the current
multi-copy protocol for compatibility with older pg_basebackup
versions, and pg_basebackup would need to continue to support it for
compatibility with older server versions, but we could use the new
approach going forward. (Or, we could break compatibility, but that
would probably be unpopular and seems unnecessary and even risky to me
at this point.)
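
To illustrate the shape of such a message, and with the same caveat as
above that this specific layout is hypothetical and not a settled
design, a 'p' (payload) message might be parsed on the client like
this:

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical payload message: byte 0 is the type indicator, bytes
 * 1-4 are a big-endian count of newly-processed server-side bytes
 * (pre-compression), and the rest is (possibly compressed) payload.
 */
typedef struct backup_msg
{
    char        type;
    uint32_t    progress_delta;
    const char *payload;
    size_t      payload_len;
} backup_msg;

/* Returns 1 on success, 0 if the message is too short to be valid. */
int
parse_backup_msg(const char *buf, size_t len, backup_msg *out)
{
    if (len < 5)
        return 0;
    out->type = buf[0];
    out->progress_delta =
        ((uint32_t) (unsigned char) buf[1] << 24) |
        ((uint32_t) (unsigned char) buf[2] << 16) |
        ((uint32_t) (unsigned char) buf[3] << 8) |
        (uint32_t) (unsigned char) buf[4];
    out->payload = buf + 5;
    out->payload_len = len - 5;
    return 1;
}
```

On receipt, the client would add progress_delta to its progress
indicator and then write the payload bytes out according to -Fp, -Ft,
-Z, etc.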

The ideas in the previous paragraph would address #3-#5 directly, but
they also indirectly address #2 because while we're switching
protocols we could easily move the padding with zero bytes to the
server side, and I think we should. #1 is a bit of a separate
consideration. To tackle #1 along the lines proposed above, the client
needs a way to send the recovery.conf contents to the server so that
the server can inject them into the tar file. It's not exactly clear
to me what the best way of permitting this is, and maybe there's a
totally different approach that would be altogether better. One thing
to consider is that we might well want the client to be able to send
*multiple* chunks of data to the server at the start of a backup. For
instance, suppose we want to support incremental backups. I think the
right approach is for the client to send the backup_manifest file from
the previous full backup to the server. What exactly the server does
with it afterward depends on your preferred approach, but the
necessary information is there. Maybe incremental backup is based on
comparing cryptographic checksums, so the server looks at all the
files and sends to the client those where the checksum (hopefully
SHA-something!) does not match. I wouldn't favor this approach myself,
but I know some people like it. Or maybe it's based on finding blocks
modified since the LSN of the previous backup; the manifest has enough
information for that to work, too. In such an approach, there can be
altogether new files with old LSNs, because files can be flat-copied
without changing block LSNs, so it's important to have the complete
list of files from the previous backup, and that too is in the
manifest. There are even timestamps for the bold among you. Anyway, my
point is to advocate for a design where the client says (1) I want a
backup with these options and then (2) here are 0, 1, or >1 files
(recovery parameters and/or backup manifest and/or other things) in
support of that and then the server hands back a stream of archives
which the client may or may not choose to post-process.

It's tempting to think about solving this problem by appealing to
CopyBoth, but I think that might be the wrong idea. The reason we use
CopyBoth for the replication subprotocol is because there's periodic
messages flowing in both directions that are only loosely coupled to
each other. Apart from reading frequently enough to avoid a deadlock
because both sides have full write buffers, each end of the connection
can kind of do whatever it wants. But for the kinds of use cases I'm
talking about here, that's not so. First the client talks and the
server acknowledges, then the reverse. That doesn't mean we couldn't
use CopyBoth, but maybe a CopyIn followed by a CopyOut would be more
straightforward. I am not sure of the details here and am happy to
hear the ideas of others.

One final thought is that the options framework for pg_basebackup is a
little unfortunate. As of today, what the client receives, always, is
a series of tar files. If you say -Fp, it doesn't change the backup
format; it just extracts the tar files. If you say -Ft, it doesn't. If
you say -Ft but also -Z, it compresses the tar files. Thinking just
about server-side compression and ignoring for the moment more remote
features like alternate archive formats (e.g. zip) or things like
storing the backup to an alternate location rather than returning it
to the client, you probably want the client to be able to specify at
least (1) server-side compression (perhaps with one of several
algorithms) and the client just writes the results, (2) server-side
compression (still with a choice of algorithm) and the client
decompresses but does not extract, (3) server-side compression (still
with a choice of algorithms) and the client decompresses and extracts,
(4) client-side compression (with a choice of algorithms), and (5)
client-side extraction. You might also want (6) server-side
compression (with a choice of algorithms) and client-side decompresses
and then re-compresses with a different algorithm (e.g. lz4 on the
server to save bandwidth at moderate CPU cost into parallel bzip2 on
the client for minimum archival storage). Or, as also discussed
upthread, you might want (7) server-side compression of each file
individually, so that you get a seekable archive of individually
compressed files (e.g. to support fast delta restore). I think that
with these refactoring patches - and the wire protocol redesign
mentioned above - it is very reasonable to offer as many of these
options as we believe users will find useful, but it is not very clear
how to extend the current command-line option framework to support
them. It probably would have been better if pg_basebackup, instead of
having -Fp and -Ft, just had an --extract option that you could either
specify or omit, because that would not have presumed anything about
the archive format, but the existing structure is well-baked at this
point.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments:

v2-0001-Flexible-options-for-BASE_BACKUP-and-CREATE_REPLI.patch (+220/-74)
v2-0002-Recast-_tarWriteDirectory-as-convert_link_to_dire.patch (+14/-13)
v2-0003-Introduce-bbsink-abstraction.patch (+249/-1)
v2-0004-Convert-libpq-related-code-to-a-bbsink.patch (+388/-234)
v2-0005-Convert-throttling-related-code-to-a-bbsink.patch (+217/-120)
v2-0006-Convert-progress-reporting-code-to-a-bbsink.patch (+308/-97)
v2-0007-Introduce-bbarchiver-abstraction.patch (+315/-1)
v2-0008-Create-and-use-bbarchiver-implementations-for-tar.patch (+454/-242)
v2-0009-WIP-Convert-backup-manifest-generation-to-a-bbarc.patch (+153/-72)
v2-0010-WIP-Introduce-bbstreamer-abstration-and-adapt-pg_.patch (+1568/-698)
v2-0011-POC-Embarrassingly-bad-server-side-compression-pa.patch (+177/-4)
#16Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#15)
Re: refactoring basebackup.c

Hi,

On 2020-07-29 11:31:26 -0400, Robert Haas wrote:

Here's an updated patch set. This is now rebased over master and
includes as 0001 the patch I posted separately at
/messages/by-id/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
but drops some other patches that were committed meanwhile. 0002-0009
of this series are basically the same as 0004-0011 from the previous
series, except for rebasing and fixing a bug I discovered in what's
now 0006. 0012 does a refactoring of pg_basebackup along similar lines
to the server-side refactoring from patches earlier in the series.

Have you tested whether this still works against older servers? Or do
you think we should not have that as a goal?

1. pg_basebackup -R injects recovery.conf (on older versions) or
injects standby.signal and appends to postgresql.auto.conf (on newer
versions) by parsing the tar file sent by the server and editing it on
the fly. From the point of view of server-side compression, this is
not ideal, because if you want to make these kinds of changes when
server-side compression is in use, you'd have to decompress the stream
on the client side in order to figure out where in the stream you ought
to inject your changes. But having to do that is a major expense. If
the client instead told the server what to change when generating the
archive, and the server did it, this expense could be avoided. It
would have the additional advantage that the backup manifest could
reflect the effects of those changes; right now it doesn't, and
pg_verifybackup just knows to expect differences in those files.

Hm. I don't think I terribly like the idea of things like -R having to
be processed server side. That'll be awfully annoying to keep working
across versions, for one. But perhaps the config file should just not be
in the main tar file going forward?

I think we should eventually be able to use one archive for multiple
purposes, e.g. to set up a standby as well as using it for a base
backup. Or multiple standbys with different tablespace remappings.

2. According to the comments, some tar programs require two tar blocks
(i.e. 512-byte blocks) of zero bytes at the end of an archive. The
server does not generate these blocks of zero bytes, so it basically
creates a tar file that works fine with my copy of tar but might break
with somebody else's. Instead, the client appends 1024 zero bytes to
the end of every file it receives from the server. That is an odd way
of fixing this problem, and it makes things rather inflexible. If the
server sends you any kind of a file OTHER THAN a tar file with the
last 1024 zero bytes stripped off, then adding 1024 zero bytes will be
the wrong thing to do. It would be better if the server just generated
fully correct tar files (whatever we think that means) and the client
wrote out exactly what it got from the server. Then, we could have the
server generate cpio archives or zip files or gzip-compressed tar
files or lz4-compressed tar files or anything we like, and the client
wouldn't really need to care as long as it didn't need to extract
those archives. That seems a lot cleaner.

Yea.
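As an aside, the trailer convention in question (two 512-byte blocks of zeros at the end of the archive) is easy to demonstrate. This sketch uses Python's tarfile module purely for illustration; a standard-conforming writer, like the one proposed for the server, always emits such a trailer:

```python
import io
import tarfile

# Build a small tar archive in memory. Python's tarfile writes the
# standard end-of-archive marker (two 512-byte zero blocks) and then
# pads the archive out to a multiple of the record size on close.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    data = b"hello"
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

raw = buf.getvalue()
# The archive length is a multiple of the 512-byte block size, and the
# last two blocks are all zeros -- exactly what the server currently
# omits and the client currently appends.
assert len(raw) % 512 == 0
assert raw[-1024:] == b"\x00" * 1024
```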

5. As things stand today, the client must know exactly how many
archives it should expect to receive from the server and what each one
is. It can do that, because it knows to expect one archive per
tablespace, and the archive must be an uncompressed tarfile, so there
is no ambiguity. But, if the server could send archives to other
places, or send other kinds of archives to the client, then this would
become more complex. There is no intrinsic reason why the logic on the
client side can't simply be made more complicated in order to cope,
but it doesn't seem like great design, because then every time you
enhance the server, you've also got to enhance the client, and that
limits cross-version compatibility, and also seems more fragile. I
would rather that the server advertise the number of archives and the
names of each archive to the client explicitly, allowing the client to
be dumb unless it needs to post-process (e.g. extract) those archives.

ISTM that that can help to some degree, but things like tablespace
remapping etc IMO aren't best done server side, so I think the client
will continue to need to know about the contents to a significant
degree?

Putting all of the above together, what I propose - but have not yet
tried to implement - is a new COPY sub-protocol for taking base
backups. Instead of sending a COPY stream per archive, the server
would send a single COPY stream where the first byte of each message
is a type indicator, like we do with the replication sub-protocol
today. For example, if the first byte is 'a' that could indicate that
we're beginning a new archive and the rest of the message would
indicate the archive name and perhaps some flags or options. If the
first byte is 'p' that could indicate that we're sending archive
payload, perhaps with the first four bytes of the message being
progress, i.e. the number of newly-processed bytes on the server side
prior to any compression, and the remaining bytes being payload. On
receipt of such a message, the client would increment the progress
indicator by the value indicated in those first four bytes, and then
process the remaining bytes by writing them to a file or whatever
behavior the user selected via -Fp, -Ft, -Z, etc.

Wonder if there's a way to get this to be less stateful. It seems a bit
ugly that the client would know what the last 'a' was for a 'p'? Perhaps
we could actually make 'a' include an identifier for each archive, and
then 'p' would append to a specific archive? Which would then also
allow for concurrent processing of those archives on the server side.

I'd personally rather have a separate message type for progress and
payload. Seems odd to have to send payload messages with 0 payload just
because we want to update progress (in case of uploading to
e.g. S3). And I think it'd be nice if we could have a more extensible
progress measurement approach than a fixed length prefix. E.g. it might
be nice to allow it to report both the overall progress, as well as a
per archive progress. Or we might want to send progress when uploading
to S3, even when not having pre-calculated the total size of the data
directory.
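For illustration, one possible framing along these lines (the exact byte layout here is hypothetical, not the protocol as later committed) might look like:

```python
import struct

# Hypothetical framing for the proposed COPY sub-protocol: each message
# begins with a one-byte type indicator. 'a' introduces a new archive
# (its name follows); 'p' carries payload, prefixed with a 4-byte
# big-endian count of newly-processed pre-compression bytes.
def make_archive_msg(name: str) -> bytes:
    return b"a" + name.encode("utf-8")

def make_payload_msg(progress_delta: int, payload: bytes) -> bytes:
    return b"p" + struct.pack("!I", progress_delta) + payload

def parse(msg: bytes):
    kind = msg[:1]
    if kind == b"a":
        return ("new_archive", msg[1:].decode("utf-8"))
    if kind == b"p":
        (delta,) = struct.unpack("!I", msg[1:5])
        return ("payload", delta, msg[5:])
    raise ValueError("unknown message type")

assert parse(make_archive_msg("base.tar")) == ("new_archive", "base.tar")
assert parse(make_payload_msg(8192, b"tar-bytes")) == ("payload", 8192, b"tar-bytes")
```

A separate progress message type, as suggested above, would just be another one-byte indicator with its own body, rather than overloading the payload prefix.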

Greetings,

Andres Freund

#17Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#16)
Re: refactoring basebackup.c

On Fri, Jul 31, 2020 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:

Have you tested whether this still works against older servers? Or do
you think we should not have that as a goal?

I haven't tested that recently but I intended to keep it working. I'll
make sure to nail that down before I get to the point of committing
anything, but I don't expect big problems. It's kind of annoying to
have so much backward compatibility stuff here but I think ripping any
of that out should wait for another time.

Hm. I don't think I terribly like the idea of things like -R having to
be processed server side. That'll be awfully annoying to keep working
across versions, for one. But perhaps the config file should just not be
in the main tar file going forward?

That'd be a user-visible change, though, whereas what I'm proposing
isn't. Instead of directly injecting stuff, the client can just send
it to the server and have the server inject it, provided the server is
new enough. Cross-version issues don't seem to be any worse than now.
That being said, I don't love it, either. We could just suggest to
people that using -R together with server compression is not a good
idea.
I think we should eventually be able to use one archive for multiple
purposes, e.g. to set up a standby as well as using it for a base
backup. Or multiple standbys with different tablespace remappings.

I don't think I understand your point here.

ISTM that that can help to some degree, but things like tablespace
remapping etc IMO aren't best done server side, so I think the client
will continue to need to know about the contents to a significant
degree?

If I'm not mistaken, those mappings are only applied with -Fp i.e. if
we're extracting. And it's no problem to jigger things in that case;
we can only do this if we understand the archive in the first place.
The problem is when you have to decompress and recompress to jigger
things.

Wonder if there's a way to get this to be less stateful. It seems a bit
ugly that the client would know what the last 'a' was for a 'p'? Perhaps
we could actually make 'a' include an identifier for each archive, and
then 'p' would append to a specific archive? Which would then also
allow for concurrent processing of those archives on the server side.

...says the guy working on asynchronous I/O. I don't know, it's not a
bad idea, but I think we'd have to change a LOT of code to make it
actually do something useful. I feel like this could be added as a
later extension of the protocol, rather than being something that we
necessarily need to do now.

I'd personally rather have a separate message type for progress and
payload. Seems odd to have to send payload messages with 0 payload just
because we want to update progress (in case of uploading to
e.g. S3). And I think it'd be nice if we could have a more extensible
progress measurement approach than a fixed length prefix. E.g. it might
be nice to allow it to report both the overall progress, as well as a
per archive progress. Or we might want to send progress when uploading
to S3, even when not having pre-calculated the total size of the data
directory.

I don't mind a separate message type here, but if you want merging of
short messages with adjacent longer messages to generate a minimal
number of system calls, that might have some implications for the
other thread where we're talking about how to avoid extra memory
copies when generating protocol messages. If you don't mind them going
out as separate network packets, then it doesn't matter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Mark Dilger
mark.dilger@enterprisedb.com
In reply to: Robert Haas (#15)
Re: refactoring basebackup.c

On Jul 29, 2020, at 8:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, May 8, 2020 at 4:55 PM Robert Haas <robertmhaas@gmail.com> wrote:

So it might be good if I'd remembered to attach the patches. Let's try
that again.

Here's an updated patch set.

Hi Robert,

v2-0001 through v2-0009 still apply cleanly, but v2-0010 no longer applies. It seems to be conflicting with Heikki's work from August. Could you rebase please?


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Robert Haas
robertmhaas@gmail.com
In reply to: Mark Dilger (#18)
Re: refactoring basebackup.c

On Wed, Oct 21, 2020 at 12:14 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

v2-0001 through v2-0009 still apply cleanly, but v2-0010 no longer applies. It seems to be conflicting with Heikki's work from August. Could you rebase please?

Here at last is a new version. I've dropped the "bbarchiver" patch for
now, added a new patch that I'll talk about below, and revised the
others. I'm pretty happy with the code now, so I guess the main things
that I'd like feedback on are (1) whether design changes seem to be
needed and (2) the UI. Once we have that stuff hammered out, I'll work
on adding documentation, which is missing at present. The interesting
patches in terms of functionality are 0006 and 0007; the rest is
preparatory refactoring.

0006 adds a concept of base backup "targets," which means that it lets
you send the base backup to someplace other than the client. You
specify the target using a new "-t" option to pg_basebackup. By way of
example, 0006 adds a "blackhole" target which throws the backup away
instead of sending it anywhere, and also a "server" target which
stores the backup to the server filesystem in lieu of streaming it to
the client. So you can say something like "pg_basebackup -Xnone -Ft -t
server:/backup/2021-07-08" and, provided that you're superuser, the
server will try to drop the backup there. At present, you can't use
-Fp or -Xfetch or -Xstream with a backup target, because that
functionality is implemented on the client side. I think that's an
acceptable restriction. Eventually I imagine we will want to have
targets like "aws" or "s3" or maybe some kind of plug-in system for
new targets. I haven't designed anything like that yet, but I think
it's probably not all that hard to generalize what I've got.
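To illustrate the shape of a target specification like "server:/backup/2021-07-08", here is a hypothetical parser (the function and its behavior are assumptions for illustration, not the committed implementation):

```python
# Split a -t target specification into a target name and an optional
# detail string, e.g. "server:/backup/2021-07-08" -> ("server", path),
# while a bare "blackhole" has no detail part.
def parse_target(spec: str):
    name, sep, detail = spec.partition(":")
    return (name, detail if sep else None)

assert parse_target("blackhole") == ("blackhole", None)
assert parse_target("server:/backup/2021-07-08") == ("server", "/backup/2021-07-08")
```

A plug-in system for new targets would presumably dispatch on the name portion, leaving the detail string for the target to interpret.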

0007 adds server-side compression; currently, it only supports
server-side compression using gzip, but I hope that it won't be hard
to generalize that to support LZ4 as well, and Andres told me he
thinks we should aim to support zstd since that library has built-in
parallel compression which is very appealing in this context. So you
say something like "pg_basebackup -Ft --server-compression=gzip -D
/backup/2021-07-08" or, if you want that compressed backup stored on
the server and compressed as hard as possible, you could say
"pg_basebackup -Xnone -Ft --server-compression=gzip9 -t
server:/backup/2021-07-08". Unfortunately, here again there are a
number of features that are implemented on the client side, and they
don't work in combination with this. -Fp could be made to work by
teaching the client to decompress; I just haven't written the code to
do that. It's probably not very useful in general, but maybe there's a
use case if you're really tight on network bandwidth. Making -R work
looks outright useless, because the client would have to get the whole
compressed tarfile from the server and then uncompress it, edit the
tar file, and recompress. That seems like a thing no one can possibly
want. Also, if you say pg_basebackup -Ft -D- >whatever.tar, the server
injects the backup manifest into the tarfile, which if you used
--server-compression would require decompressing and recompressing the
whole thing, so it doesn't seem worth supporting. It's more likely to
be a footgun than to help anybody. This option can be used with
-Xstream or -Xfetch, but it doesn't compress pg_wal.tar, because
that's generated on the client side.
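The streaming nature of server-side compression can be sketched as follows. This is a toy model, not the patch's C implementation; zlib's gzip container stands in for the server's gzip support, and level 9 corresponds to the "gzip9" example above:

```python
import zlib

# Feed archive chunks through a compressor incrementally, the way a
# compression sink would, without buffering the whole archive in memory.
# wbits=31 selects the gzip container format.
def compress_stream(chunks, level=9):
    comp = zlib.compressobj(level=level, wbits=31)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    yield comp.flush()

data = [b"block-one ", b"block-two ", b"block-three"]
gz = b"".join(compress_stream(data))
# Decompressing round-trips the original byte stream.
assert zlib.decompress(gz, wbits=31) == b"".join(data)
```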

The thing I'm really unhappy with here is the -F option to
pg_basebackup, which presently allows only p for plain or t for tar.
For purposes of these patches, I've essentially treated this as if -Fp
means "I want the tar files the server sends to be extracted" and
"-Ft" as if it means "I'm happy with them the way they are." Under
that interpretation, it's fine for --server-compression to cause e.g.
base.tar.gz to be written, because that's what the server sent. But
it's not really a "tar" output format; it's a "tar.gz" output format.
However, it doesn't seem to make any sense to define -Fz to mean "I
want tar.gz output" because -Z or -z already produces tar.gz output
when used with -Ft, and also because it would be redundant to make
people specify both -Fz and --server-compression. Similarly, when you
use --target, the output format is arguably, well, nothing. I mean,
some tar files got stored to the target, but you don't have them, but
again it seems redundant to have people specify --target and then also
have to change the argument to -F. Hindsight being 20-20, I think we
would have been better off not having a -Ft or -Fp option at all, and
having an --extract option that says you want to extract what the
server sends you, but it's probably too late to make that change now.
Or maybe it isn't, and we should just break command-line argument
compatibility for v15. I don't know. Opinions appreciated, especially
if they are nuanced.

If you're curious about what the other patches in the series do,
here's a very fast recap; see commit messages for more. 0001 revises
the grammar for some replication commands to use an extensible-options
syntax. 0002 is a trivial refactoring of basebackup.c. 0003 and 0004
refactor the server's basebackup.c and the client's pg_basebackup.c,
respectively, by introducing abstractions called bbsink and
bbstreamer. 0005 introduces a new COPY sub-protocol for taking base
backups. I think it's worth mentioning that I believe that this
refactoring is quite powerful and could let us do a bunch of other
things that this patch set doesn't attempt. For instance, since this
makes it pretty easy to implement server-side compression, it could
probably also pretty easily be made to do server-side encryption, if
you're brave enough to want to have a discussion on pgsql-hackers
about how to design an encryption feature.

Thanks to my colleague Tushar Ahuja for helping test some of this code.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments:

- v3-0001-Flexible-options-for-BASE_BACKUP-and-CREATE_REPLI.patch (+273, -81)
- v3-0002-Refactor-basebackup.c-s-_tarWriteDir-function.patch (+14, -13)
- v3-0003-Introduce-bbsink-abstraction-to-modularize-base-b.patch (+1357, -515)
- v3-0004-Introduce-bbstreamer-abstraction-to-modularize-pg.patch (+1691, -722)
- v3-0005-Modify-pg_basebackup-to-use-a-new-COPY-subprotoco.patch (+712, -54)
- v3-0006-Support-base-backup-targets.patch (+560, -60)
- v3-0007-WIP-Server-side-gzip-compression.patch (+374, -4)
#20tushar
tushar.ahuja@enterprisedb.com
In reply to: Robert Haas (#19)
Re: refactoring basebackup.c

On 7/8/21 9:26 PM, Robert Haas wrote:

Here at last is a new version.

Please refer to this scenario, where using a backup target with
--server-compression closes the server connection unexpectedly
if we don't provide the --no-manifest option:

[tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4  -t
server:/tmp/data_1  -Xnone
NOTICE:  WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
pg_basebackup: error: could not read COPY data: server closed the
connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

If we try the same scenario with -Ft instead, it works:

[tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4  -Ft
-D data_0 -Xnone
NOTICE:  WAL archiving is not enabled; you must ensure that all required
WAL segments are copied through other means to complete the backup
[tushar@localhost bin]$

--
regards,tushar
EnterpriseDB https://www.enterprisedb.com/
The Enterprise PostgreSQL Company
