WIP/PoC for parallel backup

Started by Asif Rehman, over 6 years ago, 120 messages, pgsql-hackers
#1 Asif Rehman
asifr.rehman@gmail.com

Hi Hackers,

I have been looking into adding a parallel backup feature to pg_basebackup.
Currently, pg_basebackup sends the BASE_BACKUP command to take a full
backup; the server scans PGDATA and sends the files to pg_basebackup. In
general, the server takes the following steps on a BASE_BACKUP command:

- do pg_start_backup
- scan PGDATA, create and send a header containing information about
tablespaces
- send each tablespace to pg_basebackup
- do pg_stop_backup

All these steps are executed sequentially by a single process. The idea I
am working on is to separate these steps into multiple commands in the
replication grammar, and to add worker processes to pg_basebackup so that
they can copy the contents of PGDATA in parallel.

The command-line interface syntax would be:
pg_basebackup --jobs=WORKERS

Replication commands:

- BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA.
If the parallel option is given, it will only do pg_start_backup, scan
PGDATA, and send a list of file names.

- SEND_FILES_CONTENTS (file1, file2, ...) - returns the files in the given
list. pg_basebackup will send back a list of filenames in this command.
This command will be sent by each worker, and that worker will receive the
listed files.

- STOP_BACKUP
When all workers finish, pg_basebackup will send the STOP_BACKUP command.

pg_basebackup can start by sending the "BASE_BACKUP PARALLEL" command and
getting a list of filenames from the server in response. It should then
divide this list as per the --jobs parameter (this division can be based
on file sizes). Each worker process will issue a SEND_FILES_CONTENTS
(file1, file2, ...) command, and in response the server will send the
files in that list back to the requesting worker process.

Once all the files are copied, pg_basebackup will send the STOP_BACKUP
command. A similar idea has been discussed by Robert on the incremental
backup thread a while ago. This is similar to that, but instead of
START_BACKUP and SEND_FILE_LIST, I have combined them into BASE_BACKUP
PARALLEL.
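As a sketch of the size-based division mentioned above, here is an
illustrative Python model (the function name and the file list are
hypothetical, not from the patch). It uses a greedy split: each file,
largest first, goes to the worker with the fewest bytes assigned so far:

```python
import heapq

def divide_by_size(files, jobs):
    # Greedy "longest processing time" split: biggest file first,
    # always handed to the worker with the fewest bytes assigned so far.
    heap = [(0, w) for w in range(jobs)]        # (bytes assigned, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(jobs)]
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        assigned, w = heapq.heappop(heap)
        assignment[w].append(name)
        heapq.heappush(heap, (assigned + size, w))
    return assignment

# Hypothetical file list: (relative path, size in bytes)
files = [("base/1/16384", 1 << 30), ("base/1/16385", 800 << 20),
         ("base/1/16386", 100 << 20), ("global/pg_control", 8192)]
groups = divide_by_size(files, 2)
```

Each sublist in `groups` would then be handed to one worker's
SEND_FILES_CONTENTS command.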

I have done a basic proof of concept (POC), which is also attached. I
would appreciate some input on this. So far, I am simply dividing the list
equally and assigning the pieces to worker processes. I intend to
fine-tune this by taking file sizes into consideration. Further, to add
tar format support, I am considering that each worker process handles all
files belonging to a tablespace in its list (i.e. creates and copies the
tar file) before it moves on to the next tablespace. As a result, this
will create tar files that are disjoint with respect to tablespace data.
For example:

Say tablespace t1 has 20 files, tablespace t2 has 10, and we have 5 worker
processes. Ignoring all other factors for the sake of this example, each
worker process will get a group of 4 files from t1 and 2 files from t2.
Each process will create 2 tar files, one for t1 containing 4 files and
another for t2 containing 2 files.
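The grouping arithmetic in this example can be checked with a tiny sketch
(the function name is illustrative, not from the patch):

```python
def per_worker_tar_groups(tablespace_files, workers):
    # Equal split of each tablespace's file count across workers,
    # ignoring file sizes, as the example in the text does.
    return {ts: n // workers for ts, n in tablespace_files.items()}

groups = per_worker_tar_groups({"t1": 20, "t2": 10}, 5)
# Each worker produces one tar per tablespace in its list,
# so every worker here ends up with len(groups) == 2 tar files.
```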

Regards,
Asif

Attachments:

0001-Initial-POC-on-parallel-backup.patch (application/octet-stream, +865 -251)
#2 Asim R P
apraveen@pivotal.io
In reply to: Asif Rehman (#1)
Re: WIP/PoC for parallel backup

Hi Asif

Interesting proposal. The bulk of the work in a backup is transferring
files from the source data directory to the destination. Your patch is
breaking this task down into multiple sets of files and transferring each
set in parallel. This seems correct; however, your patch is also creating
a new process to handle each set. Is that necessary? I think we should try
to achieve this using multiple asynchronous libpq connections from a
single basebackup process, that is, to use the PQconnectStartParams()
interface instead of PQconnectdbParams(), which is currently used by
basebackup. On the server side, it may still result in multiple backend
processes per connection, and an attempt should be made to avoid that as
well, but it seems complicated.

What do you think?

Asim
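The single-process multiplexing idea can be illustrated with a small,
self-contained Python sketch. It uses plain sockets and a selector rather
than libpq, so it only models the event-loop shape that asynchronous
connections (e.g. via PQconnectStartParams() and non-blocking mode) would
give pg_basebackup; all names here are hypothetical:

```python
import selectors
import socket
import threading

def serve(conn, payload):
    # Stand-in for a walsender streaming one file's contents.
    conn.sendall(payload)
    conn.close()

def fetch_all(n):
    # One process drains n "connections" concurrently with a selector,
    # the way a single basebackup process could multiplex async libpq conns.
    sel = selectors.DefaultSelector()
    received = {}
    for i in range(n):
        a, b = socket.socketpair()
        threading.Thread(target=serve, args=(b, b"file-%d" % i),
                         daemon=True).start()
        a.setblocking(False)
        sel.register(a, selectors.EVENT_READ, data=i)
        received[i] = b""
    open_conns = n
    while open_conns:
        for key, _ in sel.select():
            chunk = key.fileobj.recv(4096)
            if chunk:
                received[key.data] += chunk
            else:                         # EOF: this stream is finished
                sel.unregister(key.fileobj)
                key.fileobj.close()
                open_conns -= 1
    return received
```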

#3 Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Asim R P (#2)
Re: WIP/PoC for parallel backup

On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:

Hi Asif

Interesting proposal. The bulk of the work in a backup is transferring
files from the source data directory to the destination. Your patch is
breaking this task down into multiple sets of files and transferring each
set in parallel. This seems correct; however, your patch is also creating
a new process to handle each set. Is that necessary? I think we should try
to achieve this using multiple asynchronous libpq connections from a
single basebackup process, that is, to use the PQconnectStartParams()
interface instead of PQconnectdbParams(), which is currently used by
basebackup. On the server side, it may still result in multiple backend
processes per connection, and an attempt should be made to avoid that as
well, but it seems complicated.

What do you think?

The main question is what we really want to solve here. What is the
bottleneck, and which hardware do we want to saturate? I ask because
multiple pieces of hardware are involved while taking a backup
(network/CPU/disk). If we have already saturated the disk, there is no
need to add parallelism because we will be blocked on disk I/O anyway. I
implemented parallel backup in a separate application and had wonderful
results. I just skimmed through the code and have some reservations:
creating a separate process only for copying data is overkill. There are
two options: non-blocking calls, or worker threads. But before doing that
we need to find pg_basebackup's bottleneck; after that, we can see the
best way to solve it. Some numbers may help to understand the actual
benefit.

--
Ibrar Ahmed

#4 Asif Rehman
asifr.rehman@gmail.com
In reply to: Asim R P (#2)
Re: WIP/PoC for parallel backup

On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:

Hi Asif

Interesting proposal. Bulk of the work in a backup is transferring files
from source data directory to destination. Your patch is breaking this
task down in multiple sets of files and transferring each set in parallel.
This seems correct, however, your patch is also creating a new process to
handle each set. Is that necessary? I think we should try to achieve this
using multiple asynchronous libpq connections from a single basebackup
process. That is to use PQconnectStartParams() interface instead of
PQconnectdbParams(), wich is currently used by basebackup. On the server
side, it may still result in multiple backend processes per connection, and
an attempt should be made to avoid that as well, but it seems complicated.

What do you think?

Asim

Thanks Asim for the feedback. This is a good suggestion. The main idea I
wanted to discuss is the design where we can open multiple backend
connections to get the data instead of a single connection.
On the client side we can have multiple approaches: one is to use
asynchronous APIs (as suggested by you), and the other is to decide
between multi-process and multi-threaded. The main point is that we can
extract a lot of performance benefit by using multiple connections, and I
built this POC to float the idea of how parallel backup can work, since
the core logic of getting the files using multiple connections will remain
the same whether we use an asynchronous, multi-process, or multi-threaded
approach.

I am going to address the division of files to be distributed evenly among
multiple workers based on file sizes. That would allow us to get some
concrete numbers, and it will also allow us to gauge the benefits of the
async versus multi-process/thread approach on the client side.

Regards,
Asif

#5 Stephen Frost
sfrost@snowman.net
In reply to: Asif Rehman (#4)
Re: WIP/PoC for parallel backup

Greetings,

* Asif Rehman (asifr.rehman@gmail.com) wrote:

On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:

Interesting proposal. The bulk of the work in a backup is transferring
files from the source data directory to the destination. Your patch is
breaking this task down into multiple sets of files and transferring each
set in parallel. This seems correct; however, your patch is also creating
a new process to handle each set. Is that necessary? I think we should try
to achieve this using multiple asynchronous libpq connections from a
single basebackup process, that is, to use the PQconnectStartParams()
interface instead of PQconnectdbParams(), which is currently used by
basebackup. On the server side, it may still result in multiple backend
processes per connection, and an attempt should be made to avoid that as
well, but it seems complicated.

Thanks Asim for the feedback. This is a good suggestion. The main idea I
wanted to discuss is the design where we can open multiple backend
connections to get the data instead of a single connection.
On the client side we can have multiple approaches: one is to use
asynchronous APIs (as suggested by you), and the other is to decide
between multi-process and multi-threaded. The main point is that we can
extract a lot of performance benefit by using multiple connections, and I
built this POC to float the idea of how parallel backup can work, since
the core logic of getting the files using multiple connections will remain
the same whether we use an asynchronous, multi-process, or multi-threaded
approach.

I am going to address the division of files to be distributed evenly among
multiple workers based on file sizes. That would allow us to get some
concrete numbers, and it will also allow us to gauge the benefits of the
async versus multi-process/thread approach on the client side.

I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.

Thanks,

Stephen

#6 Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Stephen Frost (#5)
Re: WIP/PoC for parallel backup

On Fri, Aug 23, 2019 at 10:26 PM Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Asif Rehman (asifr.rehman@gmail.com) wrote:

On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:

Interesting proposal. The bulk of the work in a backup is transferring
files from the source data directory to the destination. Your patch is
breaking this task down into multiple sets of files and transferring each
set in parallel. This seems correct; however, your patch is also creating
a new process to handle each set. Is that necessary? I think we should try
to achieve this using multiple asynchronous libpq connections from a
single basebackup process, that is, to use the PQconnectStartParams()
interface instead of PQconnectdbParams(), which is currently used by
basebackup. On the server side, it may still result in multiple backend
processes per connection, and an attempt should be made to avoid that as
well, but it seems complicated.

Thanks Asim for the feedback. This is a good suggestion. The main idea I
wanted to discuss is the design where we can open multiple backend
connections to get the data instead of a single connection.
On the client side we can have multiple approaches: one is to use
asynchronous APIs (as suggested by you), and the other is to decide
between multi-process and multi-threaded. The main point is that we can
extract a lot of performance benefit by using multiple connections, and I
built this POC to float the idea of how parallel backup can work, since
the core logic of getting the files using multiple connections will remain
the same whether we use an asynchronous, multi-process, or multi-threaded
approach.

I am going to address the division of files to be distributed evenly among
multiple workers based on file sizes. That would allow us to get some
concrete numbers, and it will also allow us to gauge the benefits of the
async versus multi-process/thread approach on the client side.

I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.

+1 for compression and encryption, but I think parallelism will give us
the benefit with and without compression.

Thanks,

Stephen

--
Ibrar Ahmed

#7 Ahsan Hadi
ahsan.hadi@gmail.com
In reply to: Stephen Frost (#5)
Re: WIP/PoC for parallel backup

On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:

Greetings,

* Asif Rehman (asifr.rehman@gmail.com) wrote:

On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:

Interesting proposal. The bulk of the work in a backup is transferring
files from the source data directory to the destination. Your patch is
breaking this task down into multiple sets of files and transferring each
set in parallel. This seems correct; however, your patch is also creating
a new process to handle each set. Is that necessary? I think we should try
to achieve this using multiple asynchronous libpq connections from a
single basebackup process, that is, to use the PQconnectStartParams()
interface instead of PQconnectdbParams(), which is currently used by
basebackup. On the server side, it may still result in multiple backend
processes per connection, and an attempt should be made to avoid that as
well, but it seems complicated.

Thanks Asim for the feedback. This is a good suggestion. The main idea I
wanted to discuss is the design where we can open multiple backend
connections to get the data instead of a single connection.
On the client side we can have multiple approaches: one is to use
asynchronous APIs (as suggested by you), and the other is to decide
between multi-process and multi-threaded. The main point is that we can
extract a lot of performance benefit by using multiple connections, and I
built this POC to float the idea of how parallel backup can work, since
the core logic of getting the files using multiple connections will remain
the same whether we use an asynchronous, multi-process, or multi-threaded
approach.

I am going to address the division of files to be distributed evenly among
multiple workers based on file sizes. That would allow us to get some
concrete numbers, and it will also allow us to gauge the benefits of the
async versus multi-process/thread approach on the client side.

I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.

It would be interesting to see the benefits of compression (before the
data is transferred over the network) on top of parallelism, since there
is also some overhead associated with performing the compression. I agree
with your suggestion of trying to add parallelism first and then trying
compression before the data is sent across the network.


Thanks,

Stephen

#8 Stephen Frost
sfrost@snowman.net
In reply to: Ahsan Hadi (#7)
Re: WIP/PoC for parallel backup

Greetings,

* Ahsan Hadi (ahsan.hadi@gmail.com) wrote:

On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:

I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.

It would be interesting to see the benefits of compression (before the
data is transferred over the network) on top of parallelism, since there
is also some overhead associated with performing the compression. I agree
with your suggestion of trying to add parallelism first and then trying
compression before the data is sent across the network.

You're welcome to take a look at pgbackrest for insight and to play with
regarding compression-before-transfer, how best to split up the files
and order them, encryption, et al. We've put quite a bit of effort into
figuring all of that out.

Thanks!

Stephen

#9 Robert Haas
robertmhaas@gmail.com
In reply to: Asif Rehman (#1)
Re: WIP/PoC for parallel backup

On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

- BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
If the parallel option is there, then it will only do pg_start_backup, scans PGDATA and sends a list of file names.

So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.

- SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.

Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

- STOP_BACKUP
when all workers finish then, pg_basebackup will send STOP_BACKUP command.

This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASEBACKUP PARALLEL.

I have done a basic proof of concept (POC), which is also attached. I would appreciate some input on this. So far, I am simply dividing the list equally and assigning them to worker processes. I intend to fine tune this by taking into consideration file sizes. Further to add tar format support, I am considering that each worker process, processes all files belonging to a tablespace in its list (i.e. creates and copies tar file), before it processes the next tablespace. As a result, this will create tar files that are disjoint with respect to tablespace data. For example:

Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file. That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even. If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.

If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones. Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.
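The pull-from-a-shared-list scheme described above can be sketched as
follows; this is an illustrative Python model (with `fetch()` standing in
for the per-file replication command), not code from the patch:

```python
import queue
import threading

def parallel_fetch(files, jobs, fetch):
    # Each worker pulls the next file name off a shared queue when it
    # finishes the previous one, so faster connections do more of the work.
    todo = queue.Queue()
    for name, _size in sorted(files, key=lambda f: f[1], reverse=True):
        todo.put(name)                      # optional: biggest files first
    results, lock = {}, threading.Lock()

    def worker():
        while True:
            try:
                name = todo.get_nowait()
            except queue.Empty:
                return                      # queue drained: worker is done
            data = fetch(name)              # stand-in for the file transfer
            with lock:
                results[name] = data

    threads = [threading.Thread(target=worker) for _ in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because assignment happens at fetch time rather than up front, no worker
sits idle while another still has a long tail of files.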

Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace t2 has 10. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.

This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:

1. Error! Tar format parallel backups are not supported.

2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.

3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers. Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).

There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.

I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.
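Option (3) above can be sketched as a bounded producer/consumer pipeline.
This is an illustrative Python model (threads plus an in-memory tar, with
file contents passed in directly), not the actual pg_basebackup code:

```python
import io
import queue
import tarfile
import threading

def parallel_tar(files, jobs):
    # Option (3): N reader threads buffer whole files; one writer thread
    # serializes them into a single tar archive.
    work = queue.Queue()
    for name, data in files:
        work.put((name, data))
    full = queue.Queue(maxsize=jobs + 1)    # N+1 buffered files in flight

    def reader():
        while True:
            try:
                name, data = work.get_nowait()
            except queue.Empty:
                return
            full.put((name, data))          # pretend we read it from disk

    out = io.BytesIO()

    def writer(n):
        with tarfile.open(fileobj=out, mode="w") as tar:
            for _ in range(n):              # one complete file at a time
                name, data = full.get()
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

    readers = [threading.Thread(target=reader) for _ in range(jobs)]
    w = threading.Thread(target=writer, args=(len(files),))
    for t in readers:
        t.start()
    w.start()
    for t in readers:
        t.join()
    w.join()
    return out.getvalue()
```

The bounded queue is the synchronization point: readers block once N+1
files are buffered, which is where the single writer can become the
bottleneck that Robert describes.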

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10 Asif Rehman
asifr.rehman@gmail.com
In reply to: Robert Haas (#9)
Re: WIP/PoC for parallel backup

Hi Robert,

Thanks for the feedback. Please see the comments below:

On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com>
wrote:

- BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
If the parallel option is there, then it will only do pg_start_backup,

scans PGDATA and sends a list of file names.

So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.

Sure. I will add a separate command (START_BACKUP) for parallel.

- SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given

list.

pg_basebackup will then send back a list of filenames in this command.

This commands will be send by each worker and that worker will be getting
the said files.

Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

I considered this approach initially; however, I adopted the current
strategy to avoid multiple round trips between the server and clients and
to save on query-processing time by issuing a single command rather than
multiple ones. Furthermore, fetching multiple files at once will also aid
in supporting the tar format by utilising the existing ReceiveTarFile()
function, and will make it possible to create one tarball per tablespace
per worker.

- STOP_BACKUP
when all workers finish then, pg_basebackup will send STOP_BACKUP

command.

This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASEBACKUP PARALLEL.

I have done a basic proof of concept (POC), which is also attached. I

would appreciate some input on this. So far, I am simply dividing the list
equally and assigning them to worker processes. I intend to fine tune this
by taking into consideration file sizes. Further to add tar format support,
I am considering that each worker process, processes all files belonging to
a tablespace in its list (i.e. creates and copies tar file), before it
processes the next tablespace. As a result, this will create tar files that
are disjoint with respect to tablespace data. For example:

Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file. That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even. If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.

If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones. Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.

Ideally, I would like to support the tar format as well, which would be
much easier to implement when fetching multiple files at once, since that
would let the existing functionality be reused without much change.

Your idea of sorting the files in descending order of size seems very
appealing. I think we can do this and divide the files among the workers
one by one, i.e. the first file in the list goes to worker 1, the second
to worker 2, and so on.
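The round-robin assignment described here can be sketched as follows
(illustrative only, with hypothetical names):

```python
def round_robin_by_size(files, jobs):
    # Sort descending by size, then deal the files out like cards:
    # the i-th largest file goes to worker i % jobs.
    order = sorted(files, key=lambda f: f[1], reverse=True)
    return [[name for i, (name, _size) in enumerate(order) if i % jobs == w]
            for w in range(jobs)]
```

Unlike pulling from a shared queue at fetch time, this still fixes the
assignment up front; it only balances the expected load, not the actual
transfer speeds.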

Say, tablespace t1 has 20 files and we have 5 worker processes and

tablespace t2 has 10. Ignoring all other factors for the sake of this
example, each worker process will get a group of 4 files of t1 and 2 files
of t2. Each process will create 2 tar files, one for t1 containing 4 files
and another for t2 containing 2 files.

This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:

1. Error! Tar format parallel backups are not supported.

2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.

3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers. Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).

There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.

I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.

Yes for the tar format support, approach (2) is what I had in
mind. Currently I'm working on the implementation and will share the patch
in a couple of days.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

#11 Jeevan Chalke
jeevan.chalke@enterprisedb.com
In reply to: Asif Rehman (#10)
Re: WIP/PoC for parallel backup

Hi Asif,

I was looking at the patch and tried compiling it. However, I got a few
errors and warnings.

I have fixed those in the attached patch.

On Fri, Sep 27, 2019 at 9:30 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Hi Robert,

Thanks for the feedback. Please see the comments below:

On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com>
wrote:

On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com>
wrote:

- BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
If the parallel option is there, then it will only do pg_start_backup,

scans PGDATA and sends a list of file names.

So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.

Sure. I will add a separate command (START_BACKUP) for parallel.

- SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given

list.

pg_basebackup will then send back a list of filenames in this command.

This commands will be send by each worker and that worker will be getting
the said files.

Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

I considered this approach initially, however, I adopted the current
strategy to avoid multiple round trips between the server and clients and
save on query processing time by issuing a single command rather than
multiple ones. Further fetching multiple files at once will also aid in
supporting the tar format by utilising the existing ReceiveTarFile()
function and will be able to create a tarball per tablespace per worker.

- STOP_BACKUP
when all workers finish then, pg_basebackup will send STOP_BACKUP

command.

This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASEBACKUP PARALLEL.

I have done a basic proof of concept (POC), which is also attached. I

would appreciate some input on this. So far, I am simply dividing the list
equally and assigning them to worker processes. I intend to fine tune this
by taking into consideration file sizes. Further to add tar format support,
I am considering that each worker process, processes all files belonging to
a tablespace in its list (i.e. creates and copies tar file), before it
processes the next tablespace. As a result, this will create tar files that
are disjoint with respect to tablespace data. For example:

Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file. That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even. If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.

If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones. Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.

Ideally, I would like to support the tar format as well, which would be
much easier to implement when fetching multiple files at once since that
would enable using the existent functionality to be used without much
change.

Your idea of sorting the files in descending order of size seems very
appealing. I think we can do this and have the file divided among the
workers one by one i.e. the first file in the list goes to worker 1, the
second to worker 2, and so on.

Say, tablespace t1 has 20 files and we have 5 worker processes and

tablespace t2 has 10. Ignoring all other factors for the sake of this
example, each worker process will get a group of 4 files of t1 and 2 files
of t2. Each process will create 2 tar files, one for t1 containing 4 files
and another for t2 containing 2 files.

This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:

1. Error! Tar format parallel backups are not supported.

2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.

3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers. Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).
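A toy sketch of option (3) in Python, using in-memory payloads in place of real file reads (all names here are invented; this only shows the reader-threads-plus-single-writer shape):

```python
import io
import queue
import tarfile
import threading

def write_single_tar(files, out, n_readers=2):
    # Work list of (name, bytes) pairs; in reality each reader would
    # fetch the file contents from the server.
    work = queue.Queue()
    for item in files:
        work.put(item)
    # N readers share N+1 buffer slots with one writer.
    buffers = queue.Queue(maxsize=n_readers + 1)

    def reader():
        while True:
            try:
                name, data = work.get_nowait()
            except queue.Empty:
                return
            buffers.put((name, data))  # blocks if all buffers are full

    def writer():
        # Only this thread touches the tar file, so no locking is
        # needed around the archive itself.
        with tarfile.open(fileobj=out, mode="w") as tar:
            for _ in range(len(files)):
                name, data = buffers.get()
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

    threads = [threading.Thread(target=reader) for _ in range(n_readers)]
    threads.append(threading.Thread(target=writer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

out = io.BytesIO()
write_single_tar([("x.txt", b"hello"), ("y.txt", b"world!")], out)
```

As the message notes, the single writer can become the bottleneck: if it cannot drain the buffers as fast as the readers fill them, the readers block and the backup runs no faster than the writer.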

There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.

I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.

Yes, for the tar format support, approach (2) is what I had in
mind. Currently I'm working on the implementation and will share the patch
in a couple of days.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachments:

0001-Initial-POC-on-parallel-backup_fix_errors_warnings_delta.patch (text/x-patch)
#12Robert Haas
robertmhaas@gmail.com
In reply to: Asif Rehman (#10)
Re: WIP/PoC for parallel backup

On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

- SEND_FILES_CONTENTS (file1, file2,...) - returns the files in the given list.
pg_basebackup will then send back a list of filenames in this command. This command will be sent by each worker, and that worker will receive the said files.

Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

I considered this approach initially; however, I adopted the current strategy to avoid multiple round trips between the server and clients and to save on query-processing time by issuing a single command rather than multiple ones. Further, fetching multiple files at once will also aid in supporting the tar format by utilising the existing ReceiveTarFile() function, making it possible to create one tarball per tablespace per worker.

I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.

However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.

To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.
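The point about the slowest worker can be made concrete with a toy calculation (the numbers are invented): total elapsed time is the maximum over the workers, not the average.

```python
# Two ways to split the same 40 GB across four workers.
even = [10, 10, 10, 10]   # GB per worker, perfectly balanced
uneven = [4, 4, 4, 28]    # same total, one badly overloaded worker

rate = 1.0  # GB/s, assumed identical for every worker
t_even = max(gb / rate for gb in even)      # finishes when the slowest does
t_uneven = max(gb / rate for gb in uneven)  # nearly triple the wall-clock time
```

Both splits move the same 40 GB, but the uneven one takes 28 seconds instead of 10, which is why evening out completion times matters.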

Ideally, I would like to support the tar format as well, which would be much easier to implement when fetching multiple files at once, since that would allow the existing functionality to be reused without much change.

I think we should just have the client generate the tarfile. It'll
require duplicating some code, but it's not actually that much code or
that complicated from what I can see.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13Asif Rehman
asifr.rehman@gmail.com
In reply to: Robert Haas (#12)
Re: WIP/PoC for parallel backup

On Thu, Oct 3, 2019 at 6:40 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com>
wrote:

- SEND_FILES_CONTENTS (file1, file2,...) - returns the files in the given list.
pg_basebackup will then send back a list of filenames in this command. This command will be sent by each worker, and that worker will receive the said files.

Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

I considered this approach initially; however, I adopted the current strategy to avoid multiple round trips between the server and clients and to save on query-processing time by issuing a single command rather than multiple ones. Further, fetching multiple files at once will also aid in supporting the tar format by utilising the existing ReceiveTarFile() function, making it possible to create one tarball per tablespace per worker.

I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.

However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.

To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.

I have updated the patch (see the attached patch) to include tablespace
support, tar format support, and all other base backup options, so that
they work in parallel mode as well. As previously suggested, I have removed
BASE_BACKUP [PARALLEL] and have added START_BACKUP instead to start the
backup. The tar format will write multiple tar files depending upon the
number of workers specified. I have also made all commands
(START_BACKUP/SEND_FILES_CONTENT/STOP_BACKUP) accept the
base_backup_opt_list, so that the command-line options can also be
provided to these commands. Since the command-line options don't change
once the backup initiates, I went this way instead of storing them in
shared state.

The START_BACKUP command will now return a list of files sorted in
descending order of file size, so the larger files are at the top of the
list. These files will be assigned to the workers one by one, so that the
larger files are copied before the others.

Based on my understanding, your main concern is that the files won't be
distributed fairly, i.e. one worker might get a big file and take more time
while others finish early with smaller files? In this approach I have
created a list of files in descending order of size, so all the big files
come at the top. The maximum file size in PG is 1GB, so if we have four
workers picking up files from the list one by one, the worst-case scenario
is that one worker gets a 1GB file to process while the others get smaller
files. With this approach of sorting files by descending size and handing
them out to workers one by one, there is a very high likelihood of the work
being distributed evenly. Does this address your concerns?

Furthermore, the patch also includes a regression test. As the
t/010_pg_basebackup.pl test case tests base backup comprehensively, I have
duplicated it as t/040_pg_basebackup_parallel.pl and added the parallel
option to all of its tests, to make sure parallel mode works as expected.
The one thing that differs from base backup is the checksum-failure
reporting. In parallel mode, the total number of checksum failures is not
reported correctly; however, the backup is aborted whenever a checksum
failure occurs. This is because the processes do not maintain any shared
state. I assume that reporting the total number of failures is less
important than noticing a failure and aborting.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

Attachments:

0001-parallel-backup.patch (application/octet-stream)
#14Robert Haas
robertmhaas@gmail.com
In reply to: Asif Rehman (#13)
Re: WIP/PoC for parallel backup

On Fri, Oct 4, 2019 at 7:02 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

Based on my understanding, your main concern is that the files won't be distributed fairly, i.e. one worker might get a big file and take more time while others finish early with smaller files? In this approach I have created a list of files in descending order of size, so all the big files come at the top. The maximum file size in PG is 1GB, so if we have four workers picking up files from the list one by one, the worst-case scenario is that one worker gets a 1GB file to process while the others get smaller files. With this approach of sorting files by descending size and handing them out to workers one by one, there is a very high likelihood of the work being distributed evenly. Does this address your concerns?

Somewhat, but I'm not sure it's good enough. There are lots of reasons
why two processes that are started at the same time with the same
amount of work might not finish at the same time.

I'm also not particularly excited about having the server do the
sorting based on file size. Seems like that ought to be the client's
job, if the client needs the sorting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Rushabh Lathia
rushabh.lathia@gmail.com
In reply to: Asif Rehman (#13)
Re: WIP/PoC for parallel backup

Thanks Asif for the patch. I am opting to review it. The patch is a
bit big, so here are some very initial comments to make the review process
easier.

1) The patch seems to be doing a lot of code shuffling. I think it would be
easier to review if you break the cleanup changes into a separate patch.

Example:
a: setup_throttle
b: include_wal_files

2) As I can see, this patch basically has three major phases:

a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP.
b) Implementation of actual parallel backup.
c) Testcase

I would suggest breaking this out into three separate patches; that
will make reviewing easier.

3) In your patch you are preparing the backup manifest (a file giving
information about the data files). Robert Haas submitted
the backup manifests patch on another thread [1], and I think we
should use that patch to get the backup manifests for parallel backup.

Further, I will continue to review the patch, but meanwhile, if you can
break up the patches, the review process will be easier.

[1]: /messages/by-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com

Thanks,


--
Rushabh Lathia

#16Asif Rehman
asifr.rehman@gmail.com
In reply to: Rushabh Lathia (#15)
Re: WIP/PoC for parallel backup

On Mon, Oct 7, 2019 at 1:52 PM Rushabh Lathia <rushabh.lathia@gmail.com>
wrote:

Thanks Asif for the patch. I am opting to review it. The patch is a
bit big, so here are some very initial comments to make the review process
easier.

Thanks Rushabh for reviewing the patch.

1) The patch seems to be doing a lot of code shuffling. I think it would be
easier to review if you break the cleanup changes into a separate patch.

Example:
a: setup_throttle
b: include_wal_files

2) As I can see, this patch basically has three major phases:

a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP.
b) Implementation of actual parallel backup.
c) Testcase

I would suggest breaking this out into three separate patches; that
will make reviewing easier.

Sure, why not. I will break them into multiple patches.

3) In your patch you are preparing the backup manifest (a file giving
information about the data files). Robert Haas submitted
the backup manifests patch on another thread [1], and I think we
should use that patch to get the backup manifests for parallel backup.

Sure. Though the backup manifest patch calculates and includes the checksum
of backup files, and this is done while the file is being transferred to
the frontend. The manifest file itself is copied at the very end of the
backup. In parallel backup, I need the list of filenames before the file
contents are transferred, in order to divide them among multiple workers.
For that, the manifest file has to be available when START_BACKUP is
called.

That means the backup manifest should support its creation while excluding
the checksum during START_BACKUP().
I also need the directory information, for two reasons:

- In plain format, the base path has to exist before we can write the
file. We can extract the base path from the file name, but doing that for
all files does not seem a good idea.
- Base backup does not include the contents of some directories, but those
directories, although empty, are still expected in PGDATA.

I can make these changes part of parallel backup (which would be on top of
the backup manifest patch), or they can be done as part of the manifest
patch and then parallel backup can use them.

Robert what do you suggest?

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

#17Robert Haas
robertmhaas@gmail.com
In reply to: Asif Rehman (#16)
Re: WIP/PoC for parallel backup

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

Sure. Though the backup manifest patch calculates and includes the checksum of backup files, and this is done while the file is being transferred to the frontend. The manifest file itself is copied at the very end of the backup. In parallel backup, I need the list of filenames before the file contents are transferred, in order to divide them among multiple workers. For that, the manifest file has to be available when START_BACKUP is called.

That means the backup manifest should support its creation while excluding the checksum during START_BACKUP(). I also need the directory information, for two reasons:

- In plain format, the base path has to exist before we can write the file. We can extract the base path from the file name, but doing that for all files does not seem a good idea.
- Base backup does not include the contents of some directories, but those directories, although empty, are still expected in PGDATA.

I can make these changes part of parallel backup (which would be on top of the backup manifest patch), or they can be done as part of the manifest patch and then parallel backup can use them.

Robert, what do you suggest?

I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit. I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

I still think that the files should be requested one at a time, not a
huge long list in a single command.
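As a sketch of how a client might consume the proposed four-column result set and do the size sorting itself, here is a small Python illustration (the rows and names are invented, not real server output):

```python
# Hypothetical rows as returned by START_BACKUP:
# (file name, file type, file size, file mtime).
rows = [
    ("base/1/1249", "f", 737280, 1570000000),
    ("base", "d", 0, 1570000000),
    ("base/1/2619", "f", 122880, 1570000001),
]

# Directories only need to be created; plain files are sorted by
# descending size on the client before being requested one at a time.
dirs = [name for name, ftype, _, _ in rows if ftype == "d"]
files = sorted((r for r in rows if r[1] == "f"),
               key=lambda r: r[2], reverse=True)
```

Keeping the sort on the client side, as suggested above, also lets other tools order or filter the list however suits them.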

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#18Asif Rehman
asifr.rehman@gmail.com
In reply to: Robert Haas (#17)
Re: WIP/PoC for parallel backup

On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

Sure. Though the backup manifest patch calculates and includes the checksum
of backup files, and this is done while the file is being transferred to
the frontend. The manifest file itself is copied at the very end of the
backup. In parallel backup, I need the list of filenames before the file
contents are transferred, in order to divide them among multiple workers.
For that, the manifest file has to be available when START_BACKUP is
called.

That means the backup manifest should support its creation while excluding
the checksum during START_BACKUP(). I also need the directory information,
for two reasons:

- In plain format, the base path has to exist before we can write the
file. We can extract the base path from the file name, but doing that for
all files does not seem a good idea.
- Base backup does not include the contents of some directories, but those
directories, although empty, are still expected in PGDATA.

I can make these changes part of parallel backup (which would be on top of
the backup manifest patch), or they can be done as part of the manifest
patch and then parallel backup can use them.

Robert, what do you suggest?

I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.

Okay.

I suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

Yes, the current patch already returns a result set; I will add the
additional information.

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

Currently pg_basebackup does not enter exclusive backup mode, and other
tools have to use the pg_start_backup() and pg_stop_backup() functions to
achieve that. Since we are breaking the backup into multiple commands, I
believe it would be a good idea to have this option. I will include it in
the next revision of this patch.

I still think that the files should be requested one at a time, not a
huge long list in a single command.

Sure, I will make the change.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

#19Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Robert Haas (#17)
Re: WIP/PoC for parallel backup

On Mon, Oct 7, 2019 at 6:06 PM Robert Haas <robertmhaas@gmail.com> wrote:


I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit. I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

I still think that the files should be requested one at a time, not a
huge long list in a single command.

What about having an API to get a single file or a list of files? We would
use a single file in our application, and other tools can get the benefit
of a list of files.


--
Ibrar Ahmed

#20Robert Haas
robertmhaas@gmail.com
In reply to: Ibrar Ahmed (#19)
Re: WIP/PoC for parallel backup

On Mon, Oct 7, 2019 at 9:43 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:

What about having an API to get a single file or a list of files? We would use a single file in
our application, and other tools can get the benefit of a list of files.

That sounds a bit speculative to me. Who is to say that anyone will
find that useful? I mean, I think it's fine and good to build the
functionality that we need in a way that maximizes the likelihood that
other tools can reuse that functionality, and I think we should do
that. But I don't think it's smart to build functionality that we
don't really need in the hope that somebody else will find it useful
unless we're pretty sure that they actually will. I don't see that as
being the case here; YMMV.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Asif Rehman
asifr.rehman@gmail.com
In reply to: Asif Rehman (#18)
#22Jeevan Ladhe
jeevan.ladhe@enterprisedb.com
In reply to: Asif Rehman (#21)
#23Asif Rehman
asifr.rehman@gmail.com
In reply to: Jeevan Ladhe (#22)
#24Jeevan Chalke
jeevan.chalke@enterprisedb.com
In reply to: Asif Rehman (#23)
#25Ibrar Ahmed
ibrar.ahmad@gmail.com
In reply to: Jeevan Chalke (#24)
#26Asif Rehman
asifr.rehman@gmail.com
In reply to: Ibrar Ahmed (#25)
#27Asif Rehman
asifr.rehman@gmail.com
In reply to: Asif Rehman (#26)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Asif Rehman (#27)
#29Asif Rehman
asifr.rehman@gmail.com
In reply to: Robert Haas (#28)
#30Asif Rehman
asifr.rehman@gmail.com
In reply to: Asif Rehman (#29)
#31Robert Haas
robertmhaas@gmail.com
In reply to: Asif Rehman (#29)
#32Asif Rehman
asifr.rehman@gmail.com
In reply to: Robert Haas (#31)
#33Asif Rehman
asifr.rehman@gmail.com
In reply to: Asif Rehman (#32)
#34Asif Rehman
asifr.rehman@gmail.com
In reply to: Asif Rehman (#33)
#35Jeevan Chalke
jeevan.chalke@enterprisedb.com
In reply to: Asif Rehman (#34)
#36Robert Haas
robertmhaas@gmail.com
In reply to: Jeevan Chalke (#35)
#37Asif Rehman
asifr.rehman@gmail.com
In reply to: Jeevan Chalke (#35)
#38Asif Rehman
asifr.rehman@gmail.com
In reply to: Robert Haas (#36)
#39Asif Rehman
asifr.rehman@gmail.com
In reply to: Asif Rehman (#38)
#40Robert Haas
robertmhaas@gmail.com
In reply to: Asif Rehman (#39)
#41Asif Rehman
asifr.rehman@gmail.com
In reply to: Robert Haas (#40)
#42Asif Rehman
asifr.rehman@gmail.com
In reply to: Asif Rehman (#41)
#43Jeevan Chalke
jeevan.chalke@enterprisedb.com
In reply to: Asif Rehman (#42)
#44Asif Rehman
asifr.rehman@gmail.com
In reply to: Jeevan Chalke (#43)
#45Asif Rehman
asifr.rehman@gmail.com
In reply to: Asif Rehman (#44)
#46Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Asif Rehman (#45)
#47Asif Rehman
asifr.rehman@gmail.com
In reply to: Rajkumar Raghuwanshi (#46)
#48Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Asif Rehman (#47)
#49Asif Rehman
asifr.rehman@gmail.com
In reply to: Rajkumar Raghuwanshi (#48)
#50Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Asif Rehman (#49)
#51Jeevan Chalke
jeevan.chalke@enterprisedb.com
In reply to: Asif Rehman (#47)
#52Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Jeevan Chalke (#51)
#53Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Rajkumar Raghuwanshi (#52)
#54Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Rajkumar Raghuwanshi (#53)
#55Asif Rehman
asifr.rehman@gmail.com
In reply to: Rajkumar Raghuwanshi (#54)
#56Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Asif Rehman (#55)
#57Ahsan Hadi
ahsan.hadi@gmail.com
In reply to: Rajkumar Raghuwanshi (#56)
#58Rajkumar Raghuwanshi
rajkumar.raghuwanshi@enterprisedb.com
In reply to: Ahsan Hadi (#57)
#59Kashif Zeeshan
kashif.zeeshan@enterprisedb.com
In reply to: Rajkumar Raghuwanshi (#58)
#60Robert Haas
robertmhaas@gmail.com
In reply to: Asif Rehman (#55)
#61 Robert Haas <robertmhaas@gmail.com>, in reply to Kashif Zeeshan (#59)
#62 Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, in reply to Robert Haas (#61)
#63 Robert Haas <robertmhaas@gmail.com>, in reply to Kashif Zeeshan (#62)
#64 Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, in reply to Robert Haas (#63)
#65 Robert Haas <robertmhaas@gmail.com>, in reply to Kashif Zeeshan (#64)
#66 Asif Rehman <asifr.rehman@gmail.com>, in reply to Robert Haas (#60)
#67 Robert Haas <robertmhaas@gmail.com>, in reply to Asif Rehman (#66)
#68 Asif Rehman <asifr.rehman@gmail.com>, in reply to Robert Haas (#67)
#69 Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, in reply to Asif Rehman (#68)
#70 Jeevan Chalke <jeevan.chalke@enterprisedb.com>, in reply to Kashif Zeeshan (#69)
#71 Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, in reply to Kashif Zeeshan (#69)
#72 Asif Rehman <asifr.rehman@gmail.com>, in reply to Kashif Zeeshan (#71)
#73 Jeevan Chalke <jeevan.chalke@enterprisedb.com>, in reply to Asif Rehman (#72)
#74 Asif Rehman <asifr.rehman@gmail.com>, in reply to Jeevan Chalke (#73)
#75 Robert Haas <robertmhaas@gmail.com>, in reply to Asif Rehman (#68)
#76 Robert Haas <robertmhaas@gmail.com>, in reply to Asif Rehman (#74)
#77 Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, in reply to Robert Haas (#76)
#78 Asif Rehman <asifr.rehman@gmail.com>, in reply to Rajkumar Raghuwanshi (#77)
#79 Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, in reply to Asif Rehman (#72)
#80 Asif Rehman <asifr.rehman@gmail.com>, in reply to Kashif Zeeshan (#79)
#81 Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, in reply to Asif Rehman (#80)
#82 Asif Rehman <asifr.rehman@gmail.com>, in reply to Kashif Zeeshan (#81)
#83 Robert Haas <robertmhaas@gmail.com>, in reply to Asif Rehman (#82)
#84 Ahsan Hadi <ahsan.hadi@gmail.com>, in reply to Robert Haas (#83)
#85 Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, in reply to Ahsan Hadi (#84)
#86 Robert Haas <robertmhaas@gmail.com>, in reply to Ahsan Hadi (#84)
#87 Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, in reply to Asif Rehman (#82)
#88 Kashif Zeeshan <kashif.zeeshan@enterprisedb.com>, in reply to Asif Rehman (#80)
#89 Amit Kapila <amit.kapila16@gmail.com>, in reply to Asif Rehman (#82)
#90 Asif Rehman <asifr.rehman@gmail.com>, in reply to Amit Kapila (#89)
#91 Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>, in reply to Asif Rehman (#90)
#92 Asif Rehman <asifr.rehman@gmail.com>, in reply to Jeevan Ladhe (#91)
#93 Amit Kapila <amit.kapila16@gmail.com>, in reply to Asif Rehman (#90)
#94 Amit Kapila <amit.kapila16@gmail.com>, in reply to Amit Kapila (#93)
#95 Ahsan Hadi <ahsan.hadi@gmail.com>, in reply to Amit Kapila (#94)
#96 Amit Kapila <amit.kapila16@gmail.com>, in reply to Ahsan Hadi (#95)
#97 Dipesh Pandit <dipesh.pandit@gmail.com>, in reply to Amit Kapila (#96)
#98 Asif Rehman <asifr.rehman@gmail.com>, in reply to Robert Haas (#83)
#99 Robert Haas <robertmhaas@gmail.com>, in reply to Asif Rehman (#98)
#100 Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, in reply to Asif Rehman (#98)
#101 Asif Rehman <asifr.rehman@gmail.com>, in reply to Rajkumar Raghuwanshi (#100)
#102 Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com>, in reply to Asif Rehman (#101)
#103 David Zhang <david.zhang@highgo.ca>, in reply to Amit Kapila (#96)
#104 Amit Kapila <amit.kapila16@gmail.com>, in reply to David Zhang (#103)
#105 Suraj Kharage <suraj.kharage@enterprisedb.com>, in reply to Amit Kapila (#104)
#106 David Zhang <david.zhang@highgo.ca>, in reply to Suraj Kharage (#105)
#107 Sumanta Mukherjee <sumanta.mukherjee@enterprisedb.com>, in reply to David Zhang (#106)
#108 Amit Kapila <amit.kapila16@gmail.com>, in reply to Suraj Kharage (#105)
#109 Amit Kapila <amit.kapila16@gmail.com>, in reply to Amit Kapila (#108)
#110 David Zhang <david.zhang@highgo.ca>, in reply to Sumanta Mukherjee (#107)
#111 Rushabh Lathia <rushabh.lathia@gmail.com>, in reply to Amit Kapila (#108)
#112 Ahsan Hadi <ahsan.hadi@gmail.com>, in reply to Rushabh Lathia (#111)
#113 Rushabh Lathia <rushabh.lathia@gmail.com>, in reply to Ahsan Hadi (#112)
#114 Amit Kapila <amit.kapila16@gmail.com>, in reply to Rushabh Lathia (#113)
#115 Robert Haas <robertmhaas@gmail.com>, in reply to Rushabh Lathia (#113)
#116 Suraj Kharage <suraj.kharage@enterprisedb.com>, in reply to Robert Haas (#115)
#117 Hamid Akhtar <hamid.akhtar@gmail.com>, in reply to Suraj Kharage (#116)
#118 Robert Haas <robertmhaas@gmail.com>, in reply to Hamid Akhtar (#117)
#119 Daniel Gustafsson <daniel@yesql.se>, in reply to Robert Haas (#118)
#120 Hamid Akhtar <hamid.akhtar@gmail.com>, in reply to Daniel Gustafsson (#119)