Parallel copy
This work is to parallelize the copy command, in particular the "Copy
<table_name> from 'filename' Where <condition>;" command.
Before going into how and what portion of 'copy command' processing we
can parallelize, let us briefly review the top-level operations
we perform while copying from a file into a table. We read the file
in 64KB chunks, then find the line endings and process that data line
by line, where each line corresponds to one tuple. We first form the
tuple (in the form of a values/nulls array) from that line, check
whether it satisfies the WHERE condition, and if it does, perform
constraint checks and a few other checks and then finally store it in a
local tuple array. Once we have accumulated 1000 tuples or consumed 64KB
(whichever happens first), we insert them together via the
table_multi_insert API and then, for each tuple, insert into the
index(es) and execute after-row triggers.
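To make that flow concrete, here is a minimal standalone sketch of the
serial loop (function names here are illustrative stand-ins, not actual
copy.c symbols; the real code also flushes a batch when it crosses the
64KB byte limit and carries a partial last line into the next chunk):

#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE   (64 * 1024)
#define BATCH_TUPLES 1000

/* Stand-in for: form the tuple (values/nulls array), evaluate the
 * WHERE clause, run constraint and other checks, buffer the tuple. */
static int
process_line(const char *line, size_t len)
{
    (void) line;
    (void) len;
    return 1;                           /* 1 = tuple buffered */
}

/* Stand-in for table_multi_insert() plus the per-tuple index
 * insertions and after-row triggers. */
static void
flush_batch(int ntuples)
{
    (void) ntuples;
}

int
main(int argc, char **argv)
{
    static char buf[CHUNK_SIZE];
    FILE   *fp;
    size_t  nread;
    int     ntuples = 0;

    if (argc < 2 || (fp = fopen(argv[1], "r")) == NULL)
        return 1;

    /* Read the file in 64KB chunks; within each chunk, walk line by
     * line, where each line corresponds to one tuple.  (This sketch
     * ignores the partial-line-at-chunk-end problem, which is exactly
     * what the rest of this email is about.) */
    while ((nread = fread(buf, 1, CHUNK_SIZE, fp)) > 0)
    {
        size_t  off = 0;

        while (off < nread)
        {
            char   *nl = memchr(buf + off, '\n', nread - off);
            size_t  len = nl ? (size_t) (nl - (buf + off)) : nread - off;

            ntuples += process_line(buf + off, len);
            if (ntuples >= BATCH_TUPLES)
            {
                flush_batch(ntuples);
                ntuples = 0;
            }
            off += len + 1;
        }
    }
    if (ntuples > 0)
        flush_batch(ntuples);
    fclose(fp);
    return 0;
}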
So, as we can see, we do a lot of work after reading each 64KB chunk.
We can read the next chunk only after all the tuples from
the previous chunk are processed. This gives us an opportunity to
parallelize the processing of each 64KB chunk. I think we can do this
in more than one way.
The first idea is that we allocate each chunk to a worker and once the
worker has finished processing the current chunk, it can start with
the next unprocessed chunk. Here, we need to see how to handle the
partial tuples at the end or beginning of each chunk. We can read the
chunks into dsa/dsm instead of a local buffer for processing.
Alternatively, if we think that accessing shared memory can be costly
we can read the entire chunk in local memory, but copy the partial
tuple at the beginning of a chunk (if any) to a dsa. We mainly need
partial tuple in the shared memory area. The worker which has found
the initial part of the partial tuple will be responsible to process
the entire tuple. Now, to detect whether there is a partial tuple at
the beginning of a chunk, we always start reading one byte prior to
the start of the current chunk and if that byte is not a terminating
line byte, we know that it is a partial tuple. Now, while processing
the chunk, we will ignore this first line and start after the first
terminating line.
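A small sketch of that detection rule (assuming a plain '\n' line
terminator; the CSV quoting complications come up later in this thread):

#include <stddef.h>

/* 'data' is the file contents (or the dsa area) and
 * [chunk_start, chunk_end) is this worker's chunk.  Peek at the byte
 * just before the chunk: if it is not a line terminator, the chunk
 * begins with the tail of a tuple owned by the previous chunk's
 * worker, so skip past the first terminating line byte. */
static size_t
first_owned_byte(const char *data, size_t chunk_start, size_t chunk_end)
{
    size_t  off = chunk_start;

    if (chunk_start > 0 && data[chunk_start - 1] != '\n')
    {
        while (off < chunk_end && data[off] != '\n')
            off++;                      /* skip the partial first line */
        if (off < chunk_end)
            off++;                      /* step past the terminator */
    }
    return off;                         /* first byte this worker owns */
}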
To connect the partial tuple in two consecutive chunks, we need to
have another data structure (for the ease of reference in this email,
I call it CTM (chunk-tuple-map)) in shared memory where we store some
per-chunk information like the chunk-number, dsa location of that
chunk and a variable which indicates whether we can free/reuse the
current entry. Whenever we encounter a partial tuple at the
beginning of a chunk, we note down its chunk number and dsa location
in the CTM. Next, whenever we encounter a partial tuple at the end of
a chunk, we search the CTM for the next chunk-number and read from the
corresponding dsa location till we encounter the terminating line byte.
Once we have read and processed this partial tuple, we can mark the
entry as available for reuse. There are some loose ends here, like how
many entries we should allocate in this data structure. It depends on
whether we want to allow a worker to start reading the next chunk
before the partial tuple of the previous chunk is processed. To keep
it simple, we can allow the worker to process the next chunk only when
the partial tuple in the previous chunk is processed. This will allow
us to keep the number of entries in the CTM equal to the number of
workers. I think we can easily improve this if we want, but I don't
think it will matter too much, as in most cases, by the time we have
processed the tuples in that chunk, the partial tuple would have been
consumed by the other worker.
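In C terms, a CTM entry could look something like this (just a sketch;
the field names are invented for this email, and dsa_pointer is only a
stand-in typedef for a relative pointer into a dynamic shared area):

#include <stdint.h>
#include <stdbool.h>

typedef uint64_t dsa_pointer;           /* stand-in, see above */

typedef struct ChunkTupleMapEntry
{
    uint64_t    chunk_number;           /* which chunk this describes */
    dsa_pointer chunk_data;             /* dsa location of the chunk */
    bool        reusable;               /* set once the partial tuple
                                         * spilling into this chunk has
                                         * been read and processed */
} ChunkTupleMapEntry;

With the simple rule above, an array of one entry per worker would be
enough.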
Another approach that came up during an offlist discussion with Robert
is that we have one dedicated worker for reading the chunks from the
file; it copies the complete tuples of one chunk into shared memory
and, once that is done, hands over that chunk to another worker which
can process the tuples in that area. We can imagine that the reader
worker is responsible for forming some sort of work queue that can be
processed by the other workers. In this idea, we won't be able to get
the benefit of initial tokenization (forming tuple boundaries) via
parallel workers and might need some additional memory processing as,
after the reader worker has handed over the initial shared memory
segment, we need to somehow identify tuple boundaries and then process
them.
Another thing we need to figure out is how many workers to use for
the copy command. I think we can decide it based on the file size (which
needs some experiments) or maybe based on user input.
I think we have two related problems to solve for this: (a) the relation
extension lock (required for extending the relation), which won't
conflict among workers due to group locking; we are working on a
solution for this in another thread [1]; (b) the use of page locks in Gin
indexes; we can probably disallow parallelism if the table has a Gin
index, which is not a great thing but not bad either.
To be clear, this work is for PG14.
Thoughts?
[1]: /messages/by-id/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Feb 14, 2020 at 9:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
This work is to parallelize the copy command, in particular the "Copy
<table_name> from 'filename' Where <condition>;" command.
Nice project, and a great stepping stone towards parallel DML.
The first idea is that we allocate each chunk to a worker and once the
worker has finished processing the current chunk, it can start with
the next unprocessed chunk. Here, we need to see how to handle the
partial tuples at the end or beginning of each chunk. We can read the
chunks into dsa/dsm instead of a local buffer for processing.
Alternatively, if we think that accessing shared memory can be costly
we can read the entire chunk in local memory, but copy the partial
tuple at the beginning of a chunk (if any) to a dsa. We mainly need
partial tuple in the shared memory area. The worker which has found
the initial part of the partial tuple will be responsible to process
the entire tuple. Now, to detect whether there is a partial tuple at
the beginning of a chunk, we always start reading one byte prior to
the start of the current chunk and if that byte is not a terminating
line byte, we know that it is a partial tuple. Now, while processing
the chunk, we will ignore this first line and start after the first
terminating line.
That's quite similar to the approach I took with a parallel file_fdw
patch [1], which mostly consisted of parallelising the reading part of
copy.c, except that...
To connect the partial tuple in two consecutive chunks, we need to
have another data structure (for the ease of reference in this email,
I call it CTM (chunk-tuple-map)) in shared memory where we store some
per-chunk information like the chunk-number, dsa location of that
chunk and a variable which indicates whether we can free/reuse the
current entry. Whenever we encounter the partial tuple at the
beginning of a chunk we note down its chunk number, and dsa location
in CTM. Next, whenever we encounter any partial tuple at the end of
the chunk, we search CTM for next chunk-number and read from
corresponding dsa location till we encounter terminating line byte.
Once we have read and processed this partial tuple, we can mark the
entry as available for reuse. There are some loose ends here like how
many entries shall we allocate in this data structure. It depends on
whether we want to allow the worker to start reading the next chunk
before the partial tuple of the previous chunk is processed. To keep
it simple, we can allow the worker to process the next chunk only when
the partial tuple in the previous chunk is processed. This will allow
us to keep the entries equal to a number of workers in CTM. I think
we can easily improve this if we want but I don't think it will matter
too much as in most cases by the time we processed the tuples in that
chunk, the partial tuple would have been consumed by the other worker.
... I didn't use a shm 'partial tuple' exchanging mechanism, I just
had each worker follow the final tuple in its chunk into the next
chunk, and have each worker ignore the first tuple in chunk after
chunk 0 because it knows someone else is looking after that. That
means that there was some double reading going on near the boundaries,
and considering how much I've been complaining about bogus extra
system calls on this mailing list lately, yeah, your idea of doing a
bit more coordination is a better idea. If you go this way, you might
at least find the copy.c part of the patch I wrote useful as stand-in
scaffolding code in the meantime while you prototype the parallel
writing side, if you don't already have something better for this?
Another approach that came up during an offlist discussion with Robert
is that we have one dedicated worker for reading the chunks from file
and it copies the complete tuples of one chunk in the shared memory
and, once that is done, hands over that chunk to another worker which
can process tuples in that area. We can imagine that the reader
worker is responsible to form some sort of work queue that can be
processed by the other workers. In this idea, we won't be able to get
the benefit of initial tokenization (forming tuple boundaries) via
parallel workers and might need some additional memory processing as
after reader worker has handed the initial shared memory segment, we
need to somehow identify tuple boundaries and then process them.
Yeah, I have also wondered about something like this in a slightly
different context. For parallel query in general, I wondered if there
should be a Parallel Scatter node, that can be put on top of any
parallel-safe plan, and it runs it in a worker process that just
pushes tuples into a single-producer multi-consumer shm queue, and
then other workers read from that whenever they need a tuple. Hmm,
but for COPY, I suppose you'd want to push the raw lines with minimal
examination, not tuples, into a shm queue, so I guess that's a bit
different.
Another thing we need to figure out is how many workers to use for
the copy command. I think we can decide it based on the file size (which
needs some experiments) or maybe based on user input.
It seems like we don't even really have a general model for that sort
of thing in the rest of the system yet, and I guess some kind of
fairly dumb explicit system would make sense in the early days...
Thoughts?
This is cool.
[1]: /messages/by-id/CA+hUKGKZu8fpZo0W=POmQEN46kXhLedzqqAnt5iJZy7tD0x6sw@mail.gmail.com
On Fri, Feb 14, 2020 at 3:36 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Feb 14, 2020 at 9:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
This work is to parallelize the copy command, in particular the "Copy
<table_name> from 'filename' Where <condition>;" command.
Nice project, and a great stepping stone towards parallel DML.
Thanks.
The first idea is that we allocate each chunk to a worker and once the
worker has finished processing the current chunk, it can start with
the next unprocessed chunk. Here, we need to see how to handle the
partial tuples at the end or beginning of each chunk. We can read the
chunks into dsa/dsm instead of a local buffer for processing.
Alternatively, if we think that accessing shared memory can be costly
we can read the entire chunk in local memory, but copy the partial
tuple at the beginning of a chunk (if any) to a dsa. We mainly need
partial tuple in the shared memory area. The worker which has found
the initial part of the partial tuple will be responsible to process
the entire tuple. Now, to detect whether there is a partial tuple at
the beginning of a chunk, we always start reading one byte prior to
the start of the current chunk and if that byte is not a terminating
line byte, we know that it is a partial tuple. Now, while processing
the chunk, we will ignore this first line and start after the first
terminating line.
That's quite similar to the approach I took with a parallel file_fdw
patch [1], which mostly consisted of parallelising the reading part of
copy.c, except that...
To connect the partial tuple in two consecutive chunks, we need to
have another data structure (for the ease of reference in this email,
I call it CTM (chunk-tuple-map)) in shared memory where we store some
per-chunk information like the chunk-number, dsa location of that
chunk and a variable which indicates whether we can free/reuse the
current entry. Whenever we encounter the partial tuple at the
beginning of a chunk we note down its chunk number, and dsa location
in CTM. Next, whenever we encounter any partial tuple at the end of
the chunk, we search CTM for next chunk-number and read from
corresponding dsa location till we encounter terminating line byte.
Once we have read and processed this partial tuple, we can mark the
entry as available for reuse. There are some loose ends here like how
many entries shall we allocate in this data structure. It depends on
whether we want to allow the worker to start reading the next chunk
before the partial tuple of the previous chunk is processed. To keep
it simple, we can allow the worker to process the next chunk only when
the partial tuple in the previous chunk is processed. This will allow
us to keep the entries equal to a number of workers in CTM. I think
we can easily improve this if we want but I don't think it will matter
too much as in most cases by the time we processed the tuples in that
chunk, the partial tuple would have been consumed by the other worker.
... I didn't use a shm 'partial tuple' exchanging mechanism, I just
had each worker follow the final tuple in its chunk into the next
chunk, and have each worker ignore the first tuple in chunk after
chunk 0 because it knows someone else is looking after that. That
means that there was some double reading going on near the boundaries,
Right, and especially if the part in the second chunk is bigger, then
we might need to read most of the second chunk.
and considering how much I've been complaining about bogus extra
system calls on this mailing list lately, yeah, your idea of doing a
bit more coordination is a better idea. If you go this way, you might
at least find the copy.c part of the patch I wrote useful as stand-in
scaffolding code in the meantime while you prototype the parallel
writing side, if you don't already have something better for this?
No, I haven't started writing anything yet, but I have some ideas on
how to achieve this. I quickly skimmed through your patch and I think
it can be used as a starting point, though if we decide to go with
accumulating the partial tuple or all the data in shm, then things
might differ.
Another approach that came up during an offlist discussion with Robert
is that we have one dedicated worker for reading the chunks from file
and it copies the complete tuples of one chunk in the shared memory
and, once that is done, hands over that chunk to another worker which
can process tuples in that area. We can imagine that the reader
worker is responsible to form some sort of work queue that can be
processed by the other workers. In this idea, we won't be able to get
the benefit of initial tokenization (forming tuple boundaries) via
parallel workers and might need some additional memory processing as
after reader worker has handed the initial shared memory segment, we
need to somehow identify tuple boundaries and then process them.Yeah, I have also wondered about something like this in a slightly
different context. For parallel query in general, I wondered if there
should be a Parallel Scatter node, that can be put on top of any
parallel-safe plan, and it runs it in a worker process that just
pushes tuples into a single-producer multi-consumer shm queue, and
then other workers read from that whenever they need a tuple.
The idea sounds great, but past experience shows that shoving all
the tuples through a queue might add significant overhead. However, I
don't know exactly how you are planning to use it.
Hmm,
but for COPY, I suppose you'd want to push the raw lines with minimal
examination, not tuples, into a shm queue, so I guess that's a bit
different.
Yeah.
Another thing we need to figure out is how many workers to use for
the copy command. I think we can decide it based on the file size (which
needs some experiments) or maybe based on user input.
It seems like we don't even really have a general model for that sort
of thing in the rest of the system yet, and I guess some kind of
fairly dumb explicit system would make sense in the early days...
makes sense.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, 14 Feb 2020 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 14, 2020 at 3:36 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Feb 14, 2020 at 9:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
...
Another approach that came up during an offlist discussion with Robert
is that we have one dedicated worker for reading the chunks from file
and it copies the complete tuples of one chunk in the shared memory
and, once that is done, hands over that chunk to another worker which
can process tuples in that area. We can imagine that the reader
worker is responsible to form some sort of work queue that can be
processed by the other workers. In this idea, we won't be able to get
the benefit of initial tokenization (forming tuple boundaries) via
parallel workers and might need some additional memory processing as
after reader worker has handed the initial shared memory segment, we
need to somehow identify tuple boundaries and then process them.
Parsing rows from the raw input (the work done by CopyReadLine()) in a
single process would accommodate line returns in quoted fields. I don't
think there's a way of getting parallel workers to manage the
in-quote/out-of-quote state required. A single worker could also process a
stream without having to reread/rewind so it would be able to process input
from STDIN or PROGRAM sources, making the improvements applicable to load
operations done by third party tools and scripted \copy in psql.
...
Another thing we need to figure out is how many workers to use for
the copy command. I think we can decide it based on the file size (which
needs some experiments) or maybe based on user input.
It seems like we don't even really have a general model for that sort
of thing in the rest of the system yet, and I guess some kind of
fairly dumb explicit system would make sense in the early days...
makes sense.
The ratio between chunking or line parsing processes and the parallel
worker pool would vary with the width of the table, complexity of the data
or file (dates, encoding conversions), complexity of constraints and
acceptable impact of the load. Being able to control it through user input
would be great.
--
Alastair
On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner <minion@decodable.me> wrote:
On Fri, 14 Feb 2020 at 11:57, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 14, 2020 at 3:36 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Fri, Feb 14, 2020 at 9:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
...
Another approach that came up during an offlist discussion with Robert
is that we have one dedicated worker for reading the chunks from file
and it copies the complete tuples of one chunk in the shared memory
and, once that is done, hands over that chunk to another worker which
can process tuples in that area. We can imagine that the reader
worker is responsible to form some sort of work queue that can be
processed by the other workers. In this idea, we won't be able to get
the benefit of initial tokenization (forming tuple boundaries) via
parallel workers and might need some additional memory processing as
after reader worker has handed the initial shared memory segment, we
need to somehow identify tuple boundaries and then process them.
Parsing rows from the raw input (the work done by CopyReadLine()) in a single process would accommodate line returns in quoted fields. I don't think there's a way of getting parallel workers to manage the in-quote/out-of-quote state required.
AFAIU, the whole of this in-quote/out-of-quote state is managed inside
CopyReadLineText, which will be done by each of the parallel workers,
something on the lines of what Thomas did in his patch [1].
Basically, we need to invent a mechanism to allocate chunks to
individual workers and then the whole processing will be done as we
are doing now, except for special handling for partial tuples which I
have explained in my previous email. Am I missing something here?
...
Another thing we need to figure out is how many workers to use for
the copy command. I think we can decide it based on the file size (which
needs some experiments) or maybe based on user input.
It seems like we don't even really have a general model for that sort
of thing in the rest of the system yet, and I guess some kind of
fairly dumb explicit system would make sense in the early days...
makes sense.
The ratio between chunking or line parsing processes and the parallel worker pool would vary with the width of the table, complexity of the data or file (dates, encoding conversions), complexity of constraints and acceptable impact of the load. Being able to control it through user input would be great.
Okay, I think one simple way could be that we compute the number of
workers based on file size (some experiments are required to determine
this) unless the user has given input. If the user has provided
input, then we can use that, with an upper limit of
max_parallel_workers.
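A trivial sketch of what such a policy could look like (the size
thresholds are placeholders that would need the experiments mentioned
above; 'requested' is the user-supplied count, 0 if none was given):

#include <stdint.h>

static int
choose_copy_workers(int64_t file_size, int requested,
                    int max_parallel_workers)
{
    int     nworkers;

    if (requested > 0)
        nworkers = requested;
    else if (file_size < 16 * 1024 * 1024)
        nworkers = 0;                   /* small file: not worth it */
    else
        nworkers = (int) (file_size / (256 * 1024 * 1024)) + 1;

    if (nworkers > max_parallel_workers)
        nworkers = max_parallel_workers;
    return nworkers;
}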
[1]: /messages/by-id/CA+hUKGKZu8fpZo0W=POmQEN46kXhLedzqqAnt5iJZy7tD0x6sw@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, 15 Feb 2020 at 04:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner <minion@decodable.me> wrote:
...
Parsing rows from the raw input (the work done by CopyReadLine()) in a single process would accommodate line returns in quoted fields. I don't think there's a way of getting parallel workers to manage the in-quote/out-of-quote state required.
AFAIU, the whole of this in-quote/out-of-quote state is managed inside
CopyReadLineText, which will be done by each of the parallel workers,
something on the lines of what Thomas did in his patch [1].
Basically, we need to invent a mechanism to allocate chunks to
individual workers and then the whole processing will be done as we
are doing now, except for special handling for partial tuples which I
have explained in my previous email. Am I missing something here?
The problem case that I see is the chunk boundary falling in the
middle of a quoted field where
- The quote opens in chunk 1
- The quote closes in chunk 2
- There is an EoL character between the start of chunk 2 and the closing quote
When the worker processing chunk 2 starts, it believes itself to be in
out-of-quote state, so only data between the start of the chunk and
the EoL is regarded as belonging to the partial line. From that point
on the parsing of the rest of the chunk goes off track.
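To make it concrete, here is a made-up input, with the chunk boundary
falling just after the first line of a quoted value:

    1,"first line of value      <- chunk 1 ends after this line
    middle line of value        <- chunk 2 starts here; note the EoL
    last line of value"         <- the quote actually closes here
    2,next record

The chunk-2 worker, believing itself to be out-of-quote, treats only
the text up to the first EoL as the tail of the previous tuple. It then
resumes at 'last line of value"', takes the closing quote for an
opening one, and misparses every record after it.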
Some of the resulting errors can be avoided by, for instance,
requiring a quote to be preceded by a delimiter or EoL. That answer
fails when fields end with EoL characters, which happens often enough
in the wild.
Recovering from an incorrect in-quote/out-of-quote state assumption at
the start of parsing a chunk just seems like a hole with no bottom. So
it looks to me like it's best done in a single process which can keep
track of that state reliably.
--
Alastair
On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner <minion@decodable.me> wrote:
On Sat, 15 Feb 2020 at 04:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner <minion@decodable.me> wrote:
...
Parsing rows from the raw input (the work done by CopyReadLine()) in a single process would accommodate line returns in quoted fields. I don't think there's a way of getting parallel workers to manage the in-quote/out-of-quote state required.
AFAIU, the whole of this in-quote/out-of-quote state is managed inside
CopyReadLineText, which will be done by each of the parallel workers,
something on the lines of what Thomas did in his patch [1].
Basically, we need to invent a mechanism to allocate chunks to
individual workers and then the whole processing will be done as we
are doing now, except for special handling for partial tuples which I
have explained in my previous email. Am I missing something here?
The problem case that I see is the chunk boundary falling in the
middle of a quoted field where
- The quote opens in chunk 1
- The quote closes in chunk 2
- There is an EoL character between the start of chunk 2 and the closing quote
When the worker processing chunk 2 starts, it believes itself to be in
out-of-quote state, so only data between the start of the chunk and
the EoL is regarded as belonging to the partial line. From that point
on the parsing of the rest of the chunk goes off track.
Some of the resulting errors can be avoided by, for instance,
requiring a quote to be preceded by a delimiter or EoL. That answer
fails when fields end with EoL characters, which happens often enough
in the wild.
Recovering from an incorrect in-quote/out-of-quote state assumption at
the start of parsing a chunk just seems like a hole with no bottom. So
it looks to me like it's best done in a single process which can keep
track of that state reliably.
Good point and I agree with you that having a single process would
avoid any such stuff. However, I will think some more on it and if
you/anyone else gets some idea on how to deal with this in a
multi-worker system (where we can allow each worker to read and
process the chunk) then feel free to share your thoughts.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Feb 15, 2020 at 06:02:06PM +0530, Amit Kapila wrote:
On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner <minion@decodable.me> wrote:
On Sat, 15 Feb 2020 at 04:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner <minion@decodable.me> wrote:
...
Parsing rows from the raw input (the work done by CopyReadLine()) in a single process would accommodate line returns in quoted fields. I don't think there's a way of getting parallel workers to manage the in-quote/out-of-quote state required.
AFAIU, the whole of this in-quote/out-of-quote state is managed inside
CopyReadLineText, which will be done by each of the parallel workers,
something on the lines of what Thomas did in his patch [1].
Basically, we need to invent a mechanism to allocate chunks to
individual workers and then the whole processing will be done as we
are doing now, except for special handling for partial tuples which I
have explained in my previous email. Am I missing something here?
The problem case that I see is the chunk boundary falling in the
middle of a quoted field where
- The quote opens in chunk 1
- The quote closes in chunk 2
- There is an EoL character between the start of chunk 2 and the closing quote
When the worker processing chunk 2 starts, it believes itself to be in
out-of-quote state, so only data between the start of the chunk and
the EoL is regarded as belonging to the partial line. From that point
on the parsing of the rest of the chunk goes off track.
Some of the resulting errors can be avoided by, for instance,
requiring a quote to be preceded by a delimiter or EoL. That answer
fails when fields end with EoL characters, which happens often enough
in the wild.
Recovering from an incorrect in-quote/out-of-quote state assumption at
the start of parsing a chunk just seems like a hole with no bottom. So
it looks to me like it's best done in a single process which can keep
track of that state reliably.
Good point and I agree with you that having a single process would
avoid any such stuff. However, I will think some more on it and if
you/anyone else gets some idea on how to deal with this in a
multi-worker system (where we can allow each worker to read and
process the chunk) then feel free to share your thoughts.
I see two pieces of this puzzle: input formats we control, and the
ones we don't.
In the former case, we could encode all fields with base85 (or
something similar that reduces the input alphabet efficiently), then
reserve bytes that denote delimiters of various types. ASCII has
separators for file, group, record, and unit that we could use as
inspiration.
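As a toy illustration (the separator byte values below are standard
ASCII; the data and framing are made up):

#include <stdio.h>

#define ASCII_GS 0x1D                   /* group separator  */
#define ASCII_RS 0x1E                   /* record separator */
#define ASCII_US 0x1F                   /* unit separator   */

int
main(void)
{
    /* Two records of two fields each.  With base85-encoded field
     * contents these bytes could never appear inside a field, so a
     * chunk could be split at any RS without tracking quote state. */
    const char  data[] = "f1\x1F" "f2\x1E" "f3\x1F" "f4\x1E";

    for (const char *p = data; *p; p++)
    {
        if (*p == ASCII_US)
            printf(" | ");              /* field boundary  */
        else if (*p == ASCII_RS)
            printf("\n");               /* record boundary */
        else
            putchar(*p);
    }
    return 0;
}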
I don't have anything to offer for free-form input other than to agree
that it looks like a hole with no bottom, and maybe we should just
keep that process serial, at least until someone finds a bottom.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2/15/20 7:32 AM, Amit Kapila wrote:
On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner <minion@decodable.me> wrote:
On Sat, 15 Feb 2020 at 04:55, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Feb 14, 2020 at 7:16 PM Alastair Turner <minion@decodable.me> wrote:
...
Parsing rows from the raw input (the work done by CopyReadLine()) in a single process would accommodate line returns in quoted fields. I don't think there's a way of getting parallel workers to manage the in-quote/out-of-quote state required.
AFAIU, the whole of this in-quote/out-of-quote state is managed inside
CopyReadLineText, which will be done by each of the parallel workers,
something on the lines of what Thomas did in his patch [1].
Basically, we need to invent a mechanism to allocate chunks to
individual workers and then the whole processing will be done as we
are doing now, except for special handling for partial tuples which I
have explained in my previous email. Am I missing something here?
The problem case that I see is the chunk boundary falling in the
middle of a quoted field where
- The quote opens in chunk 1
- The quote closes in chunk 2
- There is an EoL character between the start of chunk 2 and the closing quote
When the worker processing chunk 2 starts, it believes itself to be in
out-of-quote state, so only data between the start of the chunk and
the EoL is regarded as belonging to the partial line. From that point
on the parsing of the rest of the chunk goes off track.
Some of the resulting errors can be avoided by, for instance,
requiring a quote to be preceded by a delimiter or EoL. That answer
fails when fields end with EoL characters, which happens often enough
in the wild.
Recovering from an incorrect in-quote/out-of-quote state assumption at
the start of parsing a chunk just seems like a hole with no bottom. So
it looks to me like it's best done in a single process which can keep
track of that state reliably.
Good point and I agree with you that having a single process would
avoid any such stuff. However, I will think some more on it and if
you/anyone else gets some idea on how to deal with this in a
multi-worker system (where we can allow each worker to read and
process the chunk) then feel free to share your thoughts.
IIRC, in_quote only matters here in CSV mode (because CSV fields can
have embedded newlines). So why not just forbid parallel copy in CSV
mode, at least for now? I guess it depends on the actual use case. If we
expect to be parallel loading humungous CSVs then that won't fly.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Feb 16, 2020 at 12:21 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
On 2/15/20 7:32 AM, Amit Kapila wrote:
On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner <minion@decodable.me> wrote:
The problem case that I see is the chunk boundary falling in the
middle of a quoted field where
- The quote opens in chunk 1
- The quote closes in chunk 2
- There is an EoL character between the start of chunk 2 and the closing quote
When the worker processing chunk 2 starts, it believes itself to be in
out-of-quote state, so only data between the start of the chunk and
the EoL is regarded as belonging to the partial line. From that point
on the parsing of the rest of the chunk goes off track.
Some of the resulting errors can be avoided by, for instance,
requiring a quote to be preceded by a delimiter or EoL. That answer
fails when fields end with EoL characters, which happens often enough
in the wild.
Recovering from an incorrect in-quote/out-of-quote state assumption at
the start of parsing a chunk just seems like a hole with no bottom. So
it looks to me like it's best done in a single process which can keep
track of that state reliably.
Good point and I agree with you that having a single process would
avoid any such stuff. However, I will think some more on it and if
you/anyone else gets some idea on how to deal with this in a
multi-worker system (where we can allow each worker to read and
process the chunk) then feel free to share your thoughts.
IIRC, in_quote only matters here in CSV mode (because CSV fields can
have embedded newlines).
AFAIU, that is correct.
So why not just forbid parallel copy in CSV
mode, at least for now? I guess it depends on the actual use case. If we
expect to be parallel loading humungous CSVs then that won't fly.
I am not sure about this part. However, I guess we should at the very
least have some extendable solution that can deal with csv; otherwise,
we might end up re-designing everything if someday we want to deal
with CSV. One naive idea is that in csv mode, we can set up
things slightly differently, like the worker won't start processing
the chunk unless the previous chunk is completely parsed. So each
worker would first parse and tokenize the entire chunk and then start
writing it. So, this will make the reading/parsing part serialized,
but writes can still be parallel. Now, I don't know if it is a good
idea to process in a different way for csv mode.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, 15 Feb 2020 at 14:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
Good point and I agree with you that having a single process would
avoid any such stuff. However, I will think some more on it and if
you/anyone else gets some idea on how to deal with this in a
multi-worker system (where we can allow each worker to read and
process the chunk) then feel free to share your thoughts.
I think having a single process handle splitting the input into tuples makes
most sense. It's possible to parse csv at multiple GB/s rates [1];
finding tuple boundaries is a subset of that task.
My first thought for a design would be to have two shared memory ring buffers,
one for data and one for tuple start positions. Reader process reads the CSV
data into the main buffer, finds tuple start locations in there and writes
those to the secondary buffer.
Worker processes claim a chunk of tuple positions from the secondary buffer and
update their "keep this data around" position with the first position. Then
proceed to parse and insert the tuples, updating their position until they find
the end of the last tuple in the chunk.
Buffer size, maximum and minimum chunk size could be tunable. Ideally the
buffers would be at least big enough to absorb one of the workers getting
scheduled out for a timeslice, which could be up to tens of megabytes.
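In rough C terms, the shared area could look like this (a sketch;
sizes and field names are made up for this email):

#include <stdint.h>

#define DATA_RING_SIZE  (8 * 1024 * 1024)       /* tunable */
#define TUPLE_RING_SIZE (DATA_RING_SIZE / 64)   /* start offsets */
#define MAX_WORKERS     32

typedef struct ParallelCopyShared
{
    /* ring 1: raw input bytes, filled by the reader process */
    char        data[DATA_RING_SIZE];
    uint64_t    data_write_pos;         /* reader's fill position */

    /* ring 2: tuple start offsets into ring 1, written by the reader
     * as it finds the line boundaries */
    uint64_t    tuple_start[TUPLE_RING_SIZE];
    uint64_t    tuple_write_pos;
    uint64_t    tuple_claim_pos;        /* workers claim chunks of
                                         * entries from here */

    /* per-worker "keep this data around" positions */
    uint64_t    keep_pos[MAX_WORKERS];
} ParallelCopyShared;

Positions would increase monotonically and be taken modulo the ring
size, so comparing reader and worker progress stays simple.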
[1] https://github.com/geofflangdale/simdcsv/
Regards,
Ants Aasma
At Mon, 17 Feb 2020 16:49:22 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Sun, Feb 16, 2020 at 12:21 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
On 2/15/20 7:32 AM, Amit Kapila wrote:
On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner <minion@decodable.me> wrote:
...
So why not just forbid parallel copy in CSV
mode, at least for now? I guess it depends on the actual use case. If we
expect to be parallel loading humungous CSVs then that won't fly.
I am not sure about this part. However, I guess we should at the very
least have some extendable solution that can deal with csv, otherwise,
we might end up re-designing everything if someday we want to deal
with CSV. One naive idea is that in csv mode, we can set up
things slightly differently, like the worker won't start processing
the chunk unless the previous chunk is completely parsed. So each
worker would first parse and tokenize the entire chunk and then start
writing it. So, this will make the reading/parsing part serialized,
but writes can still be parallel. Now, I don't know if it is a good
idea to process in a different way for csv mode.
In an extreme case, if we didn't see a QUOTE in a chunk, we cannot
know whether the chunk is in a quoted section or not until all the past
chunks are parsed. After all, we are forced to parse fully
sequentially as long as we allow QUOTE.
On the other hand, if we allowed "COPY t FROM f WITH (FORMAT CSV,
QUOTE '')" in order to signal that there's no quoted section in the
file then all chunks would be fully concurrently parsable.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, Feb 18, 2020 at 4:04 AM Ants Aasma <ants@cybertec.at> wrote:
On Sat, 15 Feb 2020 at 14:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
Good point and I agree with you that having a single process would
avoid any such stuff. However, I will think some more on it and if
you/anyone else gets some idea on how to deal with this in a
multi-worker system (where we can allow each worker to read and
process the chunk) then feel free to share your thoughts.
I think having a single process handle splitting the input into tuples makes
most sense. It's possible to parse csv at multiple GB/s rates [1];
finding tuple boundaries is a subset of that task.
Yeah, this is compelling. Even though it has to read the file
serially, the real gains from parallel COPY should come from doing the
real work in parallel: data-type parsing, tuple forming, WHERE clause
filtering, partition routing, buffer management, insertion and
associated triggers, FKs and index maintenance.
The reason I used the other approach for the file_fdw patch is that I
was trying to make it look as much as possible like parallel
sequential scan and not create an extra worker, because I didn't feel
like an FDW should be allowed to do that (what if executor nodes all
over the query tree created worker processes willy-nilly?). Obviously
it doesn't work correctly for embedded newlines, and even if you
decree that multi-line values aren't allowed in parallel COPY, the
stuff about tuples crossing chunk boundaries is still a bit unpleasant
(whether solved by double reading as I showed, or a bunch of tap
dancing in shared memory) and creates overheads.
My first thought for a design would be to have two shared memory ring buffers,
one for data and one for tuple start positions. Reader process reads the CSV
data into the main buffer, finds tuple start locations in there and writes
those to the secondary buffer.
Worker processes claim a chunk of tuple positions from the secondary buffer and
update their "keep this data around" position with the first position. Then
proceed to parse and insert the tuples, updating their position until they find
the end of the last tuple in the chunk.
+1. That sort of two-queue scheme is exactly how I sketched out a
multi-consumer queue for a hypothetical Parallel Scatter node. It
probably gets a bit trickier when the payload has to be broken up into
fragments to wrap around the "data" buffer N times.
On Tue, 18 Feb 2020 at 04:40, Thomas Munro <thomas.munro@gmail.com> wrote:
+1. That sort of two-queue scheme is exactly how I sketched out a
multi-consumer queue for a hypothetical Parallel Scatter node. It
probably gets a bit trickier when the payload has to be broken up into
fragments to wrap around the "data" buffer N times.
At least for copy it should be easy enough - it already has to handle reading
data block by block. If a worker updates its position while doing so, the reader
can wrap around the data buffer.
There will be no parallelism while one worker is buffering up a line larger
than the data buffer, but that doesn't seem like a major issue. Once the line is
buffered and the worker begins inserting, the next worker can start buffering
the next tuple.
Regards,
Ants Aasma
On Mon, Feb 17, 2020 at 8:34 PM Ants Aasma <ants@cybertec.at> wrote:
On Sat, 15 Feb 2020 at 14:32, Amit Kapila <amit.kapila16@gmail.com> wrote:
Good point and I agree with you that having a single process would
avoid any such stuff. However, I will think some more on it and if
you/anyone else gets some idea on how to deal with this in a
multi-worker system (where we can allow each worker to read and
process the chunk) then feel free to share your thoughts.I think having a single process handle splitting the input into tuples makes
most sense. It's possible to parse csv at multiple GB/s rates [1];
finding tuple boundaries is a subset of that task.
My first thought for a design would be to have two shared memory ring buffers,
one for data and one for tuple start positions. Reader process reads the CSV
data into the main buffer, finds tuple start locations in there and writes
those to the secondary buffer.
Worker processes claim a chunk of tuple positions from the secondary buffer and
update their "keep this data around" position with the first position. Then
proceed to parse and insert the tuples, updating their position until they find
the end of the last tuple in the chunk.
This is something similar to what I had also in mind for this idea. I
had thought of handing over a complete chunk (64K or whatever we
decide). The one thing that slightly bothers me is that we will add
some additional overhead of copying to and from shared memory which
was earlier done in local process memory. And, the tokenization (finding
line boundaries) would be serial. I think that tokenization should be
a small part of the overall work we do during the copy operation, but
I will do some measurements to ascertain the same.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Feb 18, 2020 at 7:28 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
At Mon, 17 Feb 2020 16:49:22 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Sun, Feb 16, 2020 at 12:21 PM Andrew Dunstan
<andrew.dunstan@2ndquadrant.com> wrote:
On 2/15/20 7:32 AM, Amit Kapila wrote:
On Sat, Feb 15, 2020 at 4:08 PM Alastair Turner <minion@decodable.me> wrote:
...
So why not just forbid parallel copy in CSV
mode, at least for now? I guess it depends on the actual use case. If we
expect to be parallel loading humungous CSVs then that won't fly.
I am not sure about this part. However, I guess we should at the very
least have some extendable solution that can deal with csv, otherwise,
we might end up re-designing everything if someday we want to deal
with CSV. One naive idea is that in csv mode, we can set up
things slightly differently, like the worker won't start processing
the chunk unless the previous chunk is completely parsed. So each
worker would first parse and tokenize the entire chunk and then start
writing it. So, this will make the reading/parsing part serialized,
but writes can still be parallel. Now, I don't know if it is a good
idea to process in a different way for csv mode.
In an extreme case, if we didn't see a QUOTE in a chunk, we cannot
know whether the chunk is in a quoted section or not until all the past
chunks are parsed. After all, we are forced to parse fully
sequentially as long as we allow QUOTE.
Right, I think the benefits of this as compared to the single-reader idea
would be (a) we can avoid accessing shared memory for most of
the chunk, and (b) for non-csv mode, even the tokenization (finding line
boundaries) would also be parallel. OTOH, doing the processing
differently for csv and non-csv mode might not be good.
On the other hand, if we allowed "COPY t FROM f WITH (FORMAT CSV,
QUOTE '')" in order to signal that there's no quoted section in the
file then all chunks would be fully concurrently parsable.
Yeah, if we can provide such an option, we can probably make parallel
csv processing equivalent to non-csv. However, users might not like
this, as I think in some cases it won't be easy for them to tell
whether the file has quoted fields or not. I am not very sure of this
point.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
At Tue, 18 Feb 2020 15:59:36 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
On Tue, Feb 18, 2020 at 7:28 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
In an extreme case, if we didn't see a QUOTE in a chunk, we cannot
know whether the chunk is in a quoted section or not until all the past
chunks are parsed. After all, we are forced to parse fully
sequentially as long as we allow QUOTE.
Right, I think the benefits of this as compared to the single-reader idea
would be (a) we can avoid accessing shared memory for most of
the chunk, and (b) for non-csv mode, even the tokenization (finding line
boundaries) would also be parallel. OTOH, doing the processing
differently for csv and non-csv mode might not be good.
Agreed. So I think it's a good compromise.
On the other hand, if we allowed "COPY t FROM f WITH (FORMAT CSV,
QUOTE '')" in order to signal that there's no quoted section in the
file then all chunks would be fully concurrently parsable.
Yeah, if we can provide such an option, we can probably make parallel
csv processing equivalent to non-csv. However, users might not like
this, as I think in some cases it won't be easy for them to tell
whether the file has quoted fields or not. I am not very sure of this
point.
I'm not sure how large a portion of the usage contains quoted sections,
so I'm not sure how useful it is.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
This is something similar to what I had also in mind for this idea. I
had thought of handing over complete chunk (64K or whatever we
decide). The one thing that slightly bothers me is that we will add
some additional overhead of copying to and from shared memory which
was earlier from local process memory. And, the tokenization (finding
line boundaries) would be serial. I think that tokenization should be
a small part of the overall work we do during the copy operation, but
will do some measurements to ascertain the same.
I don't think any extra copying is needed. The reader can directly
fread()/pq_copymsgbytes() into shared memory, and the workers can run
CopyReadLineText() inner loop directly off of the buffer in shared memory.
For serial performance of tokenization into lines, I really think a SIMD-based
approach will be fast enough for quite some time. I hacked up the code in
the simdcsv project to only tokenize on line endings and it was able to
tokenize a CSV file with short lines at 8+ GB/s. There are going to be many
other bottlenecks before this one starts limiting. Patch attached if you'd
like to try that out.
Regards,
Ants Aasma
Attachment: simdcsv-find-only-lineendings.diff
diff --git a/src/main.cpp b/src/main.cpp
index 9d33a85..2cf775c 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -185,7 +185,6 @@ bool find_indexes(const uint8_t * buf, size_t len, ParsedCSV & pcsv) {
#endif
simd_input in = fill_input(buf+internal_idx);
uint64_t quote_mask = find_quote_mask(in, prev_iter_inside_quote);
- uint64_t sep = cmp_mask_against_input(in, ',');
#ifdef CRLF
uint64_t cr = cmp_mask_against_input(in, 0x0d);
uint64_t cr_adjusted = (cr << 1) | prev_iter_cr_end;
@@ -195,7 +194,7 @@ bool find_indexes(const uint8_t * buf, size_t len, ParsedCSV & pcsv) {
#else
uint64_t end = cmp_mask_against_input(in, 0x0a);
#endif
- fields[b] = (end | sep) & ~quote_mask;
+ fields[b] = (end) & ~quote_mask;
}
for(size_t b = 0; b < SIMDCSV_BUFFERSIZE; b++){
size_t internal_idx = 64 * b + idx;
@@ -211,7 +210,6 @@ bool find_indexes(const uint8_t * buf, size_t len, ParsedCSV & pcsv) {
#endif
simd_input in = fill_input(buf+idx);
uint64_t quote_mask = find_quote_mask(in, prev_iter_inside_quote);
- uint64_t sep = cmp_mask_against_input(in, ',');
#ifdef CRLF
uint64_t cr = cmp_mask_against_input(in, 0x0d);
uint64_t cr_adjusted = (cr << 1) | prev_iter_cr_end;
@@ -226,7 +224,7 @@ bool find_indexes(const uint8_t * buf, size_t len, ParsedCSV & pcsv) {
// then outside the quotes with LF so it's OK to "and off"
// the quoted bits here. Some other quote convention would
// need to be thought about carefully
- uint64_t field_sep = (end | sep) & ~quote_mask;
+ uint64_t field_sep = (end) & ~quote_mask;
flatten_bits(base_ptr, base, idx, field_sep);
}
#undef SIMDCSV_BUFFERSIZE
On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants@cybertec.at> wrote:
On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
This is something similar to what I had also in mind for this idea. I
had thought of handing over complete chunk (64K or whatever we
decide). The one thing that slightly bothers me is that we will add
some additional overhead of copying to and from shared memory which
was earlier from local process memory. And, the tokenization (finding
line boundaries) would be serial. I think that tokenization should be
a small part of the overall work we do during the copy operation, but
I will do some measurements to ascertain the same.
I don't think any extra copying is needed.
I am talking about access to shared memory instead of the process
local memory. I understand that an extra copy won't be required.
The reader can directly
fread()/pq_copymsgbytes() into shared memory, and the workers can run
CopyReadLineText() inner loop directly off of the buffer in shared memory.
I am slightly confused here. AFAIU, the for(;;) loop in
CopyReadLineText is about finding the line endings which we thought
that the reader process will do.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Feb 16, 2020 at 12:51 AM Andrew Dunstan <
andrew.dunstan@2ndquadrant.com> wrote:
IIRC, in_quote only matters here in CSV mode (because CSV fields can
have embedded newlines). So why not just forbid parallel copy in CSV
mode, at least for now? I guess it depends on the actual use case. If we
expect to be parallel loading humungous CSVs then that won't fly.
Loading large CSV files is pretty common here. I hope this can be
supported.
MIKE BLACKWELL
<Mike.Blackwell@rrd.com>
On Tue, 18 Feb 2020 at 15:21, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants@cybertec.at> wrote:
On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
This is something similar to what I had also in mind for this idea. I
had thought of handing over complete chunk (64K or whatever we
decide). The one thing that slightly bothers me is that we will add
some additional overhead of copying to and from shared memory which
was earlier from local process memory. And, the tokenization (finding
line boundaries) would be serial. I think that tokenization should be
a small part of the overall work we do during the copy operation, but
I will do some measurements to ascertain the same.
I don't think any extra copying is needed.
I am talking about access to shared memory instead of the process
local memory. I understand that an extra copy won't be required.
The reader can directly
fread()/pq_copymsgbytes() into shared memory, and the workers can run
CopyReadLineText() inner loop directly off of the buffer in shared memory.
I am slightly confused here. AFAIU, the for(;;) loop in
CopyReadLineText is about finding the line endings which we thought
that the reader process will do.
Indeed, I somehow misread the code while scanning over it. So CopyReadLineText
currently copies data from cstate->raw_buf to the StringInfo in
cstate->line_buf. In parallel mode it would copy it from the shared data buffer
to local line_buf until it hits the line end found by the data reader. The
amount of copying done is still exactly the same as it is now.
Regards,
Ants Aasma
On Tue, Feb 18, 2020 at 06:51:29PM +0530, Amit Kapila wrote:
On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants@cybertec.at> wrote:
On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
This is something similar to what I had also in mind for this idea. I
had thought of handing over complete chunk (64K or whatever we
decide). The one thing that slightly bothers me is that we will add
some additional overhead of copying to and from shared memory which
was earlier from local process memory. And, the tokenization (finding
line boundaries) would be serial. I think that tokenization should be
a small part of the overall work we do during the copy operation, but
I will do some measurements to ascertain the same.
I don't think any extra copying is needed.
I am talking about access to shared memory instead of the process
local memory. I understand that an extra copy won't be required.
Isn't accessing shared memory from different pieces of execution what
threads were designed to do?
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Tue, Feb 18, 2020 at 8:41 PM David Fetter <david@fetter.org> wrote:
On Tue, Feb 18, 2020 at 06:51:29PM +0530, Amit Kapila wrote:
On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants@cybertec.at> wrote:
On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
This is something similar to what I had also in mind for this idea. I
had thought of handing over complete chunk (64K or whatever we
decide). The one thing that slightly bothers me is that we will add
some additional overhead of copying to and from shared memory which
was earlier from local process memory. And, the tokenization (finding
line boundaries) would be serial. I think that tokenization should be
a small part of the overall work we do during the copy operation, but
I will do some measurements to ascertain the same.
I don't think any extra copying is needed.
I am talking about access to shared memory instead of the process
local memory. I understand that an extra copy won't be required.
Isn't accessing shared memory from different pieces of execution what
threads were designed to do?
Sorry, but I don't understand what you mean by the above. We are
going to use background workers (which are processes) for parallel
workers. In general, it might not make a big difference in accessing
shared memory as compared to local memory, especially because the cost
of the other work in the copy is relatively higher. But still, it is a
point to consider.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Feb 18, 2020 at 8:08 PM Ants Aasma <ants@cybertec.at> wrote:
On Tue, 18 Feb 2020 at 15:21, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 18, 2020 at 5:59 PM Ants Aasma <ants@cybertec.at> wrote:
On Tue, 18 Feb 2020 at 12:20, Amit Kapila <amit.kapila16@gmail.com> wrote:
This is something similar to what I had also in mind for this idea. I
had thought of handing over complete chunk (64K or whatever we
decide). The one thing that slightly bothers me is that we will add
some additional overhead of copying to and from shared memory which
was earlier from local process memory. And, the tokenization (finding
line boundaries) would be serial. I think that tokenization should be
a small part of the overall work we do during the copy operation, but
I will do some measurements to ascertain the same.
I don't think any extra copying is needed.
I am talking about access to shared memory instead of the process
local memory. I understand that an extra copy won't be required.
The reader can directly
fread()/pq_copymsgbytes() into shared memory, and the workers can run
CopyReadLineText() inner loop directly off of the buffer in shared memory.
I am slightly confused here. AFAIU, the for(;;) loop in
CopyReadLineText is about finding the line endings which we thought
that the reader process will do.
Indeed, I somehow misread the code while scanning over it. So CopyReadLineText
currently copies data from cstate->raw_buf to the StringInfo in
cstate->line_buf. In parallel mode it would copy it from the shared data buffer
to local line_buf until it hits the line end found by the data reader. The
amount of copying done is still exactly the same as it is now.
Yeah, on a broader level it will be something like that, but the actual
details might vary during implementation. BTW, have you given any
thought to the other approach I have shared above [1]? We might not
go with that idea, but it is better to discuss different ideas and
evaluate their pros and cons.
[1]: /messages/by-id/CAA4eK1LyAyPCtBk4rkwomeT6=yTse5qWws-7i9EFwnUFZhvu5w@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Feb 18, 2020 at 7:51 PM Mike Blackwell <mike.blackwell@rrd.com> wrote:
On Sun, Feb 16, 2020 at 12:51 AM Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

IIRC, in_quote only matters here in CSV mode (because CSV fields can
have embedded newlines). So why not just forbid parallel copy in CSV
mode, at least for now? I guess it depends on the actual use case. If we
expect to be parallel loading humungous CSVs then that won't fly.

Loading large CSV files is pretty common here. I hope this can be
supported.
Thank you for your input. It is important and valuable.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, 19 Feb 2020 at 06:22, Amit Kapila <amit.kapila16@gmail.com> wrote:
...
Yeah, on a broader level it will be something like that, but actual
details might vary during implementation. BTW, have you given any
thoughts on one other approach I have shared above [1]? We might not
go with that idea, but it is better to discuss different ideas and
evaluate their pros and cons.

[1] - /messages/by-id/CAA4eK1LyAyPCtBk4rkwomeT6=yTse5qWws-7i9EFwnUFZhvu5w@mail.gmail.com
It seems to me that at least for the general CSV case the tokenization to
tuples is an inherently serial task. Adding thread synchronization to that path
for coordinating between multiple workers is only going to make it slower. It
may be possible to enforce limitations on the input (e.g. no quotes allowed) or
do some speculative tokenization (e.g. if we encounter a quote before a
newline, assume the chunk started in a quoted section) to make it possible to
do the tokenization in parallel. But given that the simpler and more fully
featured approach of handling it in a single reader process looks to be fast
enough, I don't see the point. I rather think that the next big step would be
to overlap reading input and tokenization, hopefully by utilizing Andres's
work on asynchronous I/O.
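For concreteness, a minimal sketch of what speculative tokenization could look
like (entirely illustrative; the names and the simplified quoting rules are
mine, not from any patch): scan a chunk under both possible starting quote
states, and let a later merge step pick the valid result once the previous
chunk's end state is known.

/*
 * Illustrative sketch of speculative CSV tokenization; simplified
 * quoting rules ("" toggles the quote state twice, so the net state is
 * unchanged).  Not from any posted patch.
 */
#include <stdbool.h>
#include <stddef.h>

typedef struct ChunkScan
{
    size_t  nlines;             /* newlines counted under this hypothesis */
    size_t  line_ends[1024];    /* offsets of the first 1024 line endings */
    bool    end_in_quote;       /* quote state at the end of the chunk */
} ChunkScan;

static void
scan_chunk(const char *buf, size_t len, bool start_in_quote, ChunkScan *out)
{
    bool    in_quote = start_in_quote;

    out->nlines = 0;
    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == '"')
            in_quote = !in_quote;
        else if (buf[i] == '\n' && !in_quote)
        {
            if (out->nlines < 1024)
                out->line_ends[out->nlines] = i;
            out->nlines++;
        }
    }
    out->end_in_quote = in_quote;
}

/* Each worker scans its chunk both ways; the merge step picks one result. */
void
scan_chunk_speculatively(const char *buf, size_t len,
                         ChunkScan *if_unquoted, ChunkScan *if_quoted)
{
    scan_chunk(buf, len, false, if_unquoted);
    scan_chunk(buf, len, true, if_quoted);
}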
Regards,
Ants Aasma
On Wed, Feb 19, 2020 at 11:02:15AM +0200, Ants Aasma wrote:
...
It seems to me that at least for the general CSV case the tokenization to
tuples is an inherently serial task. ... But given that the simpler and more
fully featured approach of handling it in a single reader process looks to be
fast enough, I don't see the point. I rather think that the next big step
would be to overlap reading input and tokenization, hopefully by utilizing
Andres's work on asynchronous I/O.
I generally agree with the impression that parsing CSV is tricky and
unlikely to benefit from parallelism in general. There may be cases with
restrictions making it easier (e.g. restrictions on the format) but that
might be a bit too complex to start with.
For example, I had an idea to parallelise the parsing by splitting it
into two phases:

1) indexing

Split the CSV file into equally-sized chunks and make each worker just
scan through its chunk, storing positions of delimiters, quotes,
newlines etc. This is probably the most expensive part of the parsing
(essentially going char by char), and we'd speed it up linearly.
2) merge
Combine the information from (1) in a single process, and actually parse
the CSV data - we would not have to inspect each character, because we'd
know positions of interesting chars, so this should be fast. We might
have to recheck some stuff (e.g. escaping) but it should still be much
faster.
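To make phase (1) concrete, a rough sketch of the per-worker indexing pass
(purely hypothetical; all names are invented) could be:

/*
 * Hypothetical phase-1 "indexing" pass: each worker scans its own chunk
 * and records the positions of the characters the phase-2 merge would
 * care about.
 */
#include <stddef.h>

typedef enum SpecialKind { SPECIAL_DELIM, SPECIAL_QUOTE, SPECIAL_NEWLINE } SpecialKind;

typedef struct SpecialChar
{
    size_t      offset;     /* position within the chunk */
    SpecialKind kind;
} SpecialChar;

/* Returns the number of entries written to out[], up to out_cap. */
size_t
index_chunk(const char *buf, size_t len, char delim,
            SpecialChar *out, size_t out_cap)
{
    size_t  n = 0;

    for (size_t i = 0; i < len && n < out_cap; i++)
    {
        if (buf[i] == delim)
            out[n++] = (SpecialChar) { i, SPECIAL_DELIM };
        else if (buf[i] == '"')
            out[n++] = (SpecialChar) { i, SPECIAL_QUOTE };
        else if (buf[i] == '\n')
            out[n++] = (SpecialChar) { i, SPECIAL_NEWLINE };
    }
    return n;
}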
But yes, this may be a bit complex and I'm not sure it's worth it.
The one piece of information I'm missing here is at least a very rough
quantification of the individual steps of CSV processing - for example
if parsing takes only 10% of the time, it's pretty pointless to start by
parallelising this part and we should focus on the rest. If it's 50% it
might be a different story. Has anyone done any measurements?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Feb 19, 2020 at 4:08 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
The one piece of information I'm missing here is at least a very rough
quantification of the individual steps of CSV processing - for example
if parsing takes only 10% of the time, it's pretty pointless to start by
parallelising this part and we should focus on the rest. If it's 50% it
might be a different story.
Right, this is important information to know.
Has anyone done any measurements?
Not yet, but planning to work on it.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Feb 14, 2020 at 01:41:54PM +0530, Amit Kapila wrote:
This work is to parallelize the copy command and in particular "Copy
<table_name> from 'filename' Where <condition>;" command.
Apropos of the initial parsing issue generally, there's an interesting
approach taken here: https://github.com/robertdavidgraham/wc2
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, Feb 20, 2020 at 5:12 AM David Fetter <david@fetter.org> wrote:
On Fri, Feb 14, 2020 at 01:41:54PM +0530, Amit Kapila wrote:
This work is to parallelize the copy command and in particular "Copy
<table_name> from 'filename' Where <condition>;" command.Apropos of the initial parsing issue generally, there's an interesting
approach taken here: https://github.com/robertdavidgraham/wc2
Thanks for sharing. I might be missing something, but I can't figure
out how this can help here. Does this in some way help to allow
multiple workers to read and tokenize the chunks?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Feb 20, 2020 at 04:11:39PM +0530, Amit Kapila wrote:
On Thu, Feb 20, 2020 at 5:12 AM David Fetter <david@fetter.org> wrote:
...
Apropos of the initial parsing issue generally, there's an interesting
approach taken here: https://github.com/robertdavidgraham/wc2

Thanks for sharing. I might be missing something, but I can't figure
out how this can help here. Does this in some way help to allow
multiple workers to read and tokenize the chunks?
I think wc2 is showing that maybe instead of parallelizing the
parsing, we might try using a different tokenizer/parser and
make the implementation more efficient rather than just throwing more
CPUs at it.

I don't know if our code is similar to what wc does; maybe parsing
CSV is more complicated than what wc does.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote:
...
I think wc2 is showing that maybe instead of parallelizing the
parsing, we might try using a different tokenizer/parser and
make the implementation more efficient rather than just throwing more
CPUs at it.
That was what I had in mind.
I don't know if our code is similar to what wc does; maybe parsing
CSV is more complicated than what wc does.
CSV parsing differs from wc in that there are more states in the state
machine, but I don't see anything fundamentally different.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, 20 Feb 2020 at 18:43, David Fetter <david@fetter.org> wrote:
...
CSV parsing differs from wc in that there are more states in the state
machine, but I don't see anything fundamentally different.
The trouble with a state machine based approach is that the state
transitions form a dependency chain, which means that at best the
processing rate will be 4-5 cycles per byte (L1 latency to fetch the
next state).
I whipped together a quick prototype that uses SIMD and bitmap
manipulations to do the equivalent of CopyReadLineText() in csv mode,
including quotes and escape handling; this runs at 0.25-0.5 cycles per
byte.
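The prototype itself is in the attachment and not reproduced here, but a
minimal sketch of the underlying technique (assuming SSE2; this is not the
actual prototype code) is to classify 16 bytes at a time and turn the
comparison results into bitmaps instead of branching per byte:

/*
 * Sketch only: build per-16-byte bitmaps of newline and quote positions
 * with SSE2, instead of walking a state machine byte by byte.
 */
#include <emmintrin.h>
#include <stdint.h>
#include <stddef.h>

/* For a 16-byte block at p, set bit i if p[i] equals c. */
static inline uint32_t
match_mask16(const char *p, char c)
{
    __m128i block = _mm_loadu_si128((const __m128i *) p);
    __m128i needle = _mm_set1_epi8(c);

    return (uint32_t) _mm_movemask_epi8(_mm_cmpeq_epi8(block, needle));
}

/* Collect newline and quote bitmaps for a chunk (len a multiple of 16). */
void
build_bitmaps(const char *buf, size_t len,
              uint32_t *newline_bits, uint32_t *quote_bits)
{
    for (size_t i = 0; i < len / 16; i++)
    {
        newline_bits[i] = match_mask16(buf + i * 16, '\n');
        quote_bits[i] = match_mask16(buf + i * 16, '"');
    }
}

The quote bitmap can then be turned into an in-quote mask with bit tricks
(for example a running prefix XOR), which avoids the per-byte state
dependency entirely.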
Regards,
Ants Aasma
On Fri, Feb 21, 2020 at 02:54:31PM +0200, Ants Aasma wrote:
...
The trouble with a state machine based approach is that the state
transitions form a dependency chain, which means that at best the
processing rate will be 4-5 cycles per byte (L1 latency to fetch the
next state).

I whipped together a quick prototype that uses SIMD and bitmap
manipulations to do the equivalent of CopyReadLineText() in csv mode,
including quotes and escape handling; this runs at 0.25-0.5 cycles per
byte.
Interesting. How does that compare to what we currently have?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 18, 2020 at 6:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I am talking about access to shared memory instead of the process
local memory. I understand that an extra copy won't be required.
You make it sound like there is some performance penalty for accessing
shared memory, but I don't think that's true. It's true that
*contended* access to shared memory can be slower, because if multiple
processes are trying to access the same memory, and especially if
multiple processes are trying to write the same memory, then the cache
lines have to be shared and that has a cost. However, I don't think
that would create any noticeable effect in this case. First, there's
presumably only one writer process. Second, you wouldn't normally have
multiple readers working on the same part of the data at the same
time.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-02-19 11:38:45 +0100, Tomas Vondra wrote:
I generally agree with the impression that parsing CSV is tricky and
unlikely to benefit from parallelism in general. There may be cases with
restrictions making it easier (e.g. restrictions on the format) but that
might be a bit too complex to start with.

For example, I had an idea to parallelise the parsing by splitting it
into two phases:
FWIW, I think we ought to rewrite our COPY parsers before we go for
complex schemes. They're way slower than a decent green-field
CSV/... parser.
The one piece of information I'm missing here is at least a very rough
quantification of the individual steps of CSV processing - for example
if parsing takes only 10% of the time, it's pretty pointless to start by
parallelising this part and we should focus on the rest. If it's 50% it
might be a different story. Has anyone done any measurements?
Not recently, but I'm pretty sure that I've observed CSV parsing to be
way more than 10%.
Greetings,
Andres Freund
On Sun, Feb 23, 2020 at 05:09:51PM -0800, Andres Freund wrote:
Hi,
On 2020-02-19 11:38:45 +0100, Tomas Vondra wrote:
...
FWIW, I think we ought to rewrite our COPY parsers before we go for
complex schemes. They're way slower than a decent green-field
CSV/... parser.
Yep, that's quite possible.
...
Not recently, but I'm pretty sure that I've observed CSV parsing to be
way more than 10%.
Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
so I still think we need to do some measurements first. I'm willing to
do that, but (a) I doubt I'll have time for that until after 2020-03,
and (b) it'd be good to agree on some set of typical CSV files.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 25, 2020 at 9:30 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
...
Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
so I still think we need to do some measurements first.
Agreed.
I'm willing to
do that, but (a) I doubt I'll have time for that until after 2020-03,
and (b) it'd be good to agree on some set of typical CSV files.
Right, I don't know what is the best way to define that. I can think
of the below tests.
1. A table with 10 columns (with datatypes as integers, date, text).
It has one index (unique/primary). Load with 1 million rows (basically
the data should be probably 5-10 GB).
2. A table with 10 columns (with datatypes as integers, date, text).
It has three indexes, one index can be (unique/primary). Load with 1
million rows (basically the data should be probably 5-10 GB).
3. A table with 10 columns (with datatypes as integers, date, text).
It has three indexes, one index can be (unique/primary). It has before
and after triggers. Load with 1 million rows (basically the data
should be probably 5-10 GB).
4. A table with 10 columns (with datatypes as integers, date, text).
It has five or six indexes, one index can be (unique/primary). Load
with 1 million rows (basically the data should be probably 5-10 GB).
Among all these tests, we can check how much time we spend in reading
and parsing the csv files vs. the rest of the execution?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, 26 Feb 2020 at 10:54, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Feb 25, 2020 at 9:30 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
...
Among all these tests, we can check how much time we spend in reading
and parsing the csv files vs. the rest of the execution?
That's a good set of tests of what happens after the parse. Two
simpler test runs may provide useful baselines - no
constraints/indexes with all columns varchar and no
constraints/indexes with columns correctly typed.
For testing the impact of various parts of the parse process, my idea would be:
- A base dataset with 10 columns including int, date and text. One
text field quoted and containing both delimiters and line terminators
- A derivative to measure just line parsing - strip the quotes around
the text field and quote the whole row as one text field
- A derivative to measure the impact of quoted fields - clean up the
text field so it doesn't require quoting
- A derivative to measure the impact of row length - run ten rows
together to make 100 column rows, but only a tenth as many rows
If that sounds reasonable, I'll try to knock up a generator.
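A minimal sketch of what such a generator could look like (hypothetical; the
row count and exact column values only illustrate the base dataset described
above, including the quoted text field with embedded delimiters and line
terminators):

/* Hypothetical base-dataset generator: 10 columns (ints, a date, text),
 * one quoted text field containing both delimiters and newlines. */
#include <stdio.h>

int
main(void)
{
    for (long i = 0; i < 1000000; i++)
    {
        /* int columns */
        printf("%ld,%ld,%ld,%ld,", i, i % 100, i % 7, i * 3);
        /* date column */
        printf("2020-%02ld-%02ld,", i % 12 + 1, i % 28 + 1);
        /* quoted text column with embedded delimiter and newline */
        printf("\"value,%ld\nsecond line\",", i);
        /* plain text columns */
        printf("abc%ld,def%ld,ghi%ld,jkl%ld\n", i, i, i, i);
    }
    return 0;
}

The derivatives would then be produced by post-processing this output
(stripping or adding quotes, concatenating rows) rather than by separate
generators.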
--
Alastair
On Tue, 25 Feb 2020 at 18:00, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Perhaps. I guess it'll depend on the CSV file (number of fields, ...),
so I still think we need to do some measurements first. I'm willing to
do that, but (a) I doubt I'll have time for that until after 2020-03,
and (b) it'd be good to agree on some set of typical CSV files.
I agree that getting a nice varied dataset would be good, including
things like narrow integer-only tables, strings with newlines and
escapes in them, and extremely wide rows.
I tried to capture a quick profile just to see what it looks like.
Grabbed a random open data set from the web, about 800MB of narrow
rows CSV [1].
Script:
CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count text);
COPY census FROM '.../Data8277.csv' WITH (FORMAT 'csv', HEADER true);
Profile:
# Samples: 59K of event 'cycles:u'
# Event count (approx.): 57644269486
#
# Overhead Command Shared Object Symbol
# ........ ........ ..................
.......................................
#
18.24% postgres postgres [.] CopyReadLine
9.23% postgres postgres [.] NextCopyFrom
8.87% postgres postgres [.] NextCopyFromRawFields
5.82% postgres postgres [.] pg_verify_mbstr_len
5.45% postgres postgres [.] pg_strtoint32
4.16% postgres postgres [.] heap_fill_tuple
4.03% postgres postgres [.] heap_compute_data_size
3.83% postgres postgres [.] CopyFrom
3.78% postgres postgres [.] AllocSetAlloc
3.53% postgres postgres [.] heap_form_tuple
2.96% postgres postgres [.] InputFunctionCall
2.89% postgres libc-2.30.so [.] __memmove_avx_unaligned_erms
1.82% postgres libc-2.30.so [.] __strlen_avx2
1.72% postgres postgres [.] AllocSetReset
1.72% postgres postgres [.] RelationPutHeapTuple
1.47% postgres postgres [.] heap_prepare_insert
1.31% postgres postgres [.] heap_multi_insert
1.25% postgres postgres [.] textin
1.24% postgres postgres [.] int4in
1.05% postgres postgres [.] tts_buffer_heap_clear
0.85% postgres postgres [.] pg_any_to_server
0.80% postgres postgres [.] pg_comp_crc32c_sse42
0.77% postgres postgres [.] cstring_to_text_with_len
0.69% postgres postgres [.] AllocSetFree
0.60% postgres postgres [.] appendBinaryStringInfo
0.55% postgres postgres [.] tts_buffer_heap_materialize.part.0
0.54% postgres postgres [.] palloc
0.54% postgres libc-2.30.so [.] __memmove_avx_unaligned
0.51% postgres postgres [.] palloc0
0.51% postgres postgres [.] pg_encoding_max_length
0.48% postgres postgres [.] enlargeStringInfo
0.47% postgres postgres [.] ExecStoreVirtualTuple
0.45% postgres postgres [.] PageAddItemExtended
So that confirms that the parsing is a huge chunk of overhead with
current splitting into lines being the largest portion. Amdahl's law
says that splitting into tuples needs to be made fast before
parallelizing makes any sense.

[1] https://www3.stats.govt.nz/2018census/Age-sex-by-ethnic-group-grouped-total-responses-census-usually-resident-population-counts-2006-2013-2018-Censuses-RC-TA-SA2-DHB.zip
Regards,
Ants Aasma
On Wed, Feb 26, 2020 at 8:47 PM Ants Aasma <ants@cybertec.at> wrote:
...
So that confirms that the parsing is a huge chunk of overhead with
current splitting into lines being the largest portion. Amdahl's law
says that splitting into tuples needs to be made fast before
parallelizing makes any sense.
I ran a very simple case on a table with 2 indexes and I can see that a
lot of time is spent in index insertion. I agree that there is a good
amount of time spent in tokenizing, but it is not very large compared to
index insertion.

I have expanded the time spent in the CopyFrom function from my perf
report and pasted it here. We can see that a lot of time is spent in
ExecInsertIndexTuples (77%). I agree that we need to further evaluate
how much of that is I/O vs. CPU operations. But the point I
want to make is that it is not true for all cases that parsing
takes the maximum amount of time.
- 99.50% CopyFrom
- 82.90% CopyMultiInsertInfoFlush
- 82.85% CopyMultiInsertBufferFlush
+ 77.68% ExecInsertIndexTuples
+ 3.74% table_multi_insert
+ 0.89% ExecClearTuple
- 12.54% NextCopyFrom
- 7.70% NextCopyFromRawFields
- 5.72% CopyReadLine
3.96% CopyReadLineText
+ 1.49% pg_any_to_server
1.86% CopyReadAttributesCSV
+ 3.68% InputFunctionCall
+ 2.11% ExecMaterializeSlot
+ 0.94% MemoryContextReset
My test:
-- Prepare:
CREATE TABLE t (a int, b int, c varchar);
insert into t select i,i, 'aaaaaaaaaaaaaaaaaaaaaaaa' from
generate_series(1,10000000) as i;
copy t to '/home/dilipkumar/a.csv' WITH (FORMAT 'csv', HEADER true);
truncate table t;
create index idx on t(a);
create index idx1 on t(c);
-- Test CopyFrom and measure with perf:
copy t from '/home/dilipkumar/a.csv' WITH (FORMAT 'csv', HEADER true);
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Wed, Feb 26, 2020 at 4:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
...
I have tried to capture the execution time taken for 3 scenarios which I
felt could give a fair idea:
Test1 (Table with 3 indexes and 1 trigger)
Test2 (Table with 2 indexes)
Test3 (Table without indexes/triggers)
I have captured the following details:
File Read Time - time taken to read the file from the CopyGetData function.
Read Line Time - time taken to read a line from the NextCopyFrom function
(read time & tokenize time excluded).
Tokenize Time - time taken to tokenize the contents in the
NextCopyFromRawFields function.
Data Execution Time - remaining execution time out of the total time.
The execution breakdown for the tests is given below (times in seconds):

Test   Total Time  File Read Time  Read Line/Buffer Read Time  Tokenize Time  Data Execution Time
Test1  1693.369    0.256           34.173                      5.578          1653.362
Test2  736.096     0.288           39.762                      6.525          689.521
Test3  112.06      0.266           39.189                      6.433          66.172
Steps for the scenarios:
Test1(Table with 3 indexes and 1 trigger):
CREATE TABLE census2 (year int,age int,ethnic int,sex int,area text,count
text);
CREATE TABLE census3(year int,age int,ethnic int,sex int,area text,count
text);
CREATE INDEX idx1_census2 on census2(year);
CREATE INDEX idx2_census2 on census2(age);
CREATE INDEX idx3_census2 on census2(ethnic);
CREATE or REPLACE FUNCTION census2_afterinsert()
RETURNS TRIGGER
AS $$
BEGIN
INSERT INTO census3 SELECT * FROM census2 limit 1;
RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER census2_trigger AFTER INSERT ON census2 FOR EACH ROW
EXECUTE PROCEDURE census2_afterinsert();
COPY census2 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);
Test2 (Table with 2 indexes):
CREATE TABLE census1 (year int,age int,ethnic int,sex int,area text,count
text);
CREATE INDEX idx1_census1 on census1(year);
CREATE INDEX idx2_census1 on census1(age);
COPY census1 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);
Test3 (Table without indexes/triggers):
CREATE TABLE census (year int,age int,ethnic int,sex int,area text,count
text);
COPY census FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);
Note: The Data8277.csv used was the same data that Ants Aasma had used.
From the above results we can infer that the read-line step will have to be
done sequentially. Read line time takes about 2.01%, 5.40% and 34.97% of the
total time. I felt we will be able to parallelise the remaining phases of
the copy. The performance improvement will vary based on the
scenario (indexes/triggers); it will be proportionate to the number of
indexes and triggers. Reading lines can also be parallelised in txt format
(non-csv). I feel parallelising copy could give a significant improvement
in quite a few scenarios.
Further, I'm planning to see how the execution looks for a toast table. I'm
also planning to test on a RAM disk, where I will place the data on the RAM
disk so that we can further eliminate the I/O cost.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Feb 26, 2020 at 8:47 PM Ants Aasma <ants@cybertec.at> wrote:
...
So that confirms that the parsing is a huge chunk of overhead with
current splitting into lines being the largest portion. Amdahl's law
says that splitting into tuples needs to be made fast before
parallelizing makes any sense.
I had taken a perf report with the same test data that you had used, and I
was getting the following results:

.....
+   99.61%  0.00%  postgres  postgres  [.] PortalRun
+   99.61%  0.00%  postgres  postgres  [.] PortalRunMulti
+   99.61%  0.00%  postgres  postgres  [.] PortalRunUtility
+   99.61%  0.00%  postgres  postgres  [.] ProcessUtility
+   99.61%  0.00%  postgres  postgres  [.] standard_ProcessUtility
+   99.61%  0.00%  postgres  postgres  [.] DoCopy
+   99.30%  0.94%  postgres  postgres  [.] CopyFrom
+   51.61%  7.76%  postgres  postgres  [.] NextCopyFrom
+   23.66%  0.01%  postgres  postgres  [.] CopyMultiInsertInfoFlush
+   23.61%  0.28%  postgres  postgres  [.] CopyMultiInsertBufferFlush
+   21.99%  1.02%  postgres  postgres  [.] NextCopyFromRawFields
+   19.79%  0.01%  postgres  postgres  [.] table_multi_insert
+   19.32%  3.00%  postgres  postgres  [.] heap_multi_insert
+   18.27%  2.44%  postgres  postgres  [.] InputFunctionCall
+   15.19%  0.89%  postgres  postgres  [.] CopyReadLine
+   13.05%  0.18%  postgres  postgres  [.] ExecMaterializeSlot
+   13.00%  0.55%  postgres  postgres  [.] tts_buffer_heap_materialize
+   12.31%  1.77%  postgres  postgres  [.] heap_form_tuple
+   10.43%  0.45%  postgres  postgres  [.] int4in
+   10.18%  8.92%  postgres  postgres  [.] CopyReadLineText
......
In my results I observed that table_multi_insert was nearly 20%. Also,
I felt that once we have made a few tuples from CopyReadLine, the parallel
workers should be able to start consuming and processing the data. We
need not wait for the complete tokenisation to be finished: once a few
tuples are tokenised, parallel workers should start consuming the data in
parallel while tokenisation continues simultaneously. In this way, once
the copy is done in parallel, the total execution time should be the
CopyReadLine time plus a delta of processing time.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
I have got the execution breakdown for a few scenarios with a normal disk
and a RAM disk.
Execution breakup on normal disk (times in seconds):

Test                          Total Time  File Read Time  CopyReadLine Time  Remaining Execution Time  Read Line %
Test1 (3 index + 1 trigger)   2099.017    0.311           10.256             2088.45                   0.4886096682
Test2 (2 index)               657.994     0.303           10.171             647.52                    1.545758776
Test3 (no index, no trigger)  112.41      0.296           10.996             101.118                   9.782047861
Test4 (toast)                 360.028     1.43            46.556             312.042                   12.93121646

Execution breakup on RAM disk (times in seconds):

Test                          Total Time  File Read Time  CopyReadLine Time  Remaining Execution Time  Read Line %
Test1 (3 index + 1 trigger)   1571.558    0.259           6.986              1564.313                  0.4445270235
Test2 (2 index)               369.942     0.263           6.848              362.831                   1.851100983
Test3 (no index, no trigger)  54.077      0.239           6.805              47.033                    12.58390813
Test4 (toast)                 96.323      0.918           26.603             68.802                    27.61853348
Steps for the scenarios:
Test1 (Table with 3 indexes and 1 trigger):

CREATE TABLE census2 (year int, age int, ethnic int, sex int, area text, count text);
CREATE TABLE census3 (year int, age int, ethnic int, sex int, area text, count text);
CREATE INDEX idx1_census2 on census2(year);
CREATE INDEX idx2_census2 on census2(age);
CREATE INDEX idx3_census2 on census2(ethnic);
CREATE or REPLACE FUNCTION census2_afterinsert()
RETURNS TRIGGER
AS $$
BEGIN
  INSERT INTO census3 SELECT * FROM census2 limit 1;
  RETURN NEW;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER census2_trigger AFTER INSERT ON census2 FOR EACH ROW
EXECUTE PROCEDURE census2_afterinsert();
COPY census2 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Test2 (Table with 2 indexes):

CREATE TABLE census1 (year int, age int, ethnic int, sex int, area text, count text);
CREATE INDEX idx1_census1 on census1(year);
CREATE INDEX idx2_census1 on census1(age);
COPY census1 FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

Test3 (Table without indexes/triggers):

CREATE TABLE census (year int, age int, ethnic int, sex int, area text, count text);
COPY census FROM 'Data8277.csv' WITH (FORMAT 'csv', HEADER true);

The random open data set from the web, about 800MB of narrow-rows CSV [1],
was used in the above tests; it is the same data that Ants Aasma had used.

Test4 (Toast table):

CREATE TABLE indtoasttest (descr text, cnt int DEFAULT 0, f1 text, f2 text);
alter table indtoasttest alter column f1 set storage external;
alter table indtoasttest alter column f2 set storage external;
-- inserted 262144 records
copy indtoasttest to '/mnt/magnetic/vignesh.c/postgres/toast_data3.csv' WITH (FORMAT 'csv', HEADER true);
CREATE TABLE indtoasttest1 (descr text, cnt int DEFAULT 0, f1 text, f2 text);
alter table indtoasttest1 alter column f1 set storage external;
alter table indtoasttest1 alter column f2 set storage external;
copy indtoasttest1 from '/mnt/magnetic/vignesh.c/postgres/toast_data3.csv' WITH (FORMAT 'csv', HEADER true);
We can infer that the read-line time cannot be parallelized, mainly
because if the data has a quote present we will not be able to tell
whether it is part of the previous record or of the current record. The
rest of the execution time can be parallelized. Read-line time takes about
0.5%, 1.5%, 9.8% and 12.9% of the total time in these tests, so we could
parallelize the remaining phases of the copy. The performance improvement
will vary based on the scenario (indexes/triggers); it will be
proportionate to the number of indexes and triggers. Reading lines can
also be parallelized in txt format (non-csv). We feel parallelizing copy
could give a significant improvement in many scenarios.
Attached, for reference, is the patch that was used to capture the
execution time breakup.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 3, 2020 at 11:44 AM vignesh C <vignesh21@gmail.com> wrote:
...
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
copy_execution_time_v2.patch (text/x-patch)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e79ede4..ea0cc6e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -61,6 +61,8 @@
#define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
#define OCTVALUE(c) ((c) - '0')
+double copyreadlineTime, readTime, totalTime;
+
/*
* Represents the different source/dest cases we need to worry about at
* the bottom level
@@ -610,10 +612,12 @@ static int
CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
int bytesread = 0;
-
+ struct timespec before, after;
switch (cstate->copy_dest)
{
case COPY_FILE:
+ INSTR_TIME_SET_CURRENT(before);
+
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
if (ferror(cstate->copy_file))
ereport(ERROR,
@@ -621,6 +625,10 @@ CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
errmsg("could not read from COPY file: %m")));
if (bytesread == 0)
cstate->reached_eof = true;
+
+ INSTR_TIME_SET_CURRENT(after);
+ INSTR_TIME_SUBTRACT(after, before);
+ readTime += INSTR_TIME_GET_MILLISEC(after);
break;
case COPY_OLD_FE:
@@ -1059,8 +1067,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ struct timespec before, after;
+ INSTR_TIME_SET_CURRENT(before);
Assert(rel);
+ /* Reset the variables before every copy operation */
+ readTime = 0;
+ copyreadlineTime = 0;
+ totalTime = 0;
+
/* check read-only transaction and parallel mode */
if (XactReadOnly && !rel->rd_islocaltemp)
PreventCommandIfReadOnly("COPY FROM");
@@ -1070,6 +1085,22 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate->whereClause = whereClause;
*processed = CopyFrom(cstate); /* copy from file to database */
EndCopyFrom(cstate);
+
+ INSTR_TIME_SET_CURRENT(after);
+ INSTR_TIME_SUBTRACT(after, before);
+ totalTime += INSTR_TIME_GET_MILLISEC(after);
+
+ elog(LOG, "Total file read time for copying operation : %.3f ms\n",
+ readTime);
+
+ /* Read time is included in copyreadlinetime*/
+ elog(LOG, "Total copyreadline time for copying operation: %.3f ms\n",
+ copyreadlineTime - readTime);
+
+ elog(LOG, "Remaining execution time for copying operation: %.3f ms\n",
+ totalTime - copyreadlineTime);
+ elog(LOG, "Total time for copying: %.3f ms\n", totalTime);
+
}
else
{
@@ -3890,12 +3921,14 @@ static bool
CopyReadLine(CopyState cstate)
{
bool result;
+ struct timespec before, after;
resetStringInfo(&cstate->line_buf);
cstate->line_buf_valid = true;
/* Mark that encoding conversion hasn't occurred yet */
cstate->line_buf_converted = false;
+ INSTR_TIME_SET_CURRENT(before);
/* Parse data and transfer into line_buf */
result = CopyReadLineText(cstate);
@@ -3949,6 +3982,10 @@ CopyReadLine(CopyState cstate)
}
}
+ INSTR_TIME_SET_CURRENT(after);
+ INSTR_TIME_SUBTRACT(after, before);
+ copyreadlineTime += INSTR_TIME_GET_MILLISEC(after);
+
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
On Thu, Mar 12, 2020 at 6:39 PM vignesh C <vignesh21@gmail.com> wrote:
Existing copy code flow: Copy supports the copy operation from csv, txt
& bin format files. For processing csv & text formats, the server will
read a 64kb chunk, or less if the input file has less data. The server
will then read one tuple of data and process that tuple. If the tuple
that was generated used less than the 64kb of data, the server will try
to generate another tuple for processing from the remaining unprocessed
data. If it is not able to generate one tuple from the unprocessed data,
it will read a further 64kb of data, or whatever remains in the file,
and generate the tuple for processing. This process is repeated till the
complete file is processed. For processing a bin format file the flow is
slightly different: the server will read the number of columns that are
present, then read the column size data and then the actual column
contents, repeating this for all the columns. The server will then
process the tuple that is generated, and this is repeated for all the
remaining tuples in the bin file. The tuple processing flow is the same
in all the formats. Currently all these operations happen sequentially.
This project will help in parallelizing the copy operation.
I'm planning to do the POC of parallel copy with the below design:
Proposed Syntax:
COPY table_name FROM 'copy_file' WITH (FORMAT 'format', PARALLEL 'workers');
Users can specify the number of workers that must be used for copying
the data in parallel. Here 'workers' is the number of workers to be used
for the parallel copy operation apart from the leader. The leader is
responsible for reading the data from the input file and generating the
work for the workers. The leader will start a transaction and share this
transaction with the workers; all workers will use the same transaction
to insert the records.

The leader will create a circular queue and share it across the workers.
The circular queue will be present in DSM. The leader will use a fixed
size queue to share the contents between the leader and the workers;
currently we will have 100 elements present in the queue. This will be
created before the workers are started and shared with the workers. The
data structures that are required by the parallel workers will be
initialized by the leader, the size required in DSM will be calculated,
and the necessary keys will be loaded in the DSM. The specified number of
workers will then be launched.

The leader will read the table data from the file and copy the contents
to the queue element by element. Each element in the queue will have a
64K DSA, which will be used to store tuple contents from the file. The
leader will try to copy as much content as possible within one 64K DSA
queue element; we intend to store at least one tuple in each queue
element. There are some cases where the 64K space may not be enough to
store a single tuple, mostly in cases where the table has toast data
present and a single tuple can be more than 64K in size. In these
scenarios we will extend the DSA space accordingly. We cannot change the
size of the DSM once the workers are launched, whereas in the case of DSA
we can free the dsa pointer and reallocate it based on the memory size
required. This is the very reason for choosing DSA over DSM for storing
the data that must be inserted into the relation.

The leader will keep loading data into the queue till the queue becomes
full. Once the queue is full, the leader will switch its role to become a
worker, and it will continue to act as a worker till 25% of the elements
in the queue have been consumed by all the workers. Once there is at
least 25% space available in the queue, the leader who was working as a
worker will switch its role back to become the leader again. The above
process of filling the queue will be continued by the leader until the
whole file is processed, after which the leader will wait until the
respective workers finish processing the queue elements.

The copy-from functionality is also used during initdb operations, where
the copy is intended to be performed in single (non-parallel) mode, and
the user can also still choose to run in non-parallel mode. In the
non-parallel case, memory allocation will happen using palloc instead of
DSM/DSA, and most of the flow will be the same in both the parallel and
non-parallel cases.
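To summarize, the shared state this design implies could look roughly like
the following sketch (all names are invented for illustration and do not
come from a posted patch):

/*
 * Rough sketch of the proposed shared state: a fixed-size ring of queue
 * elements in DSM, where each element points to a DSA chunk holding raw
 * tuple data for the workers to parse and insert.  Illustrative only.
 */
#include <stdint.h>

#define PARALLEL_COPY_QUEUE_SIZE 100    /* elements in the ring */
#define DATA_BLOCK_SIZE (64 * 1024)     /* initial DSA chunk size */

typedef enum ChunkState
{
    CHUNK_EMPTY,        /* free for the leader to fill */
    CHUNK_FILLED,       /* leader done, ready for a worker */
    CHUNK_PROCESSED     /* worker done, can be reused */
} ChunkState;

typedef struct ParallelCopyChunk
{
    ChunkState  state;
    uint64_t    dsa_pointer;    /* stand-in for a dsa_pointer to the data */
    uint32_t    used_bytes;     /* bytes of tuple data stored */
    uint32_t    ntuples;        /* tuples packed into this chunk */
} ParallelCopyChunk;

typedef struct ParallelCopyShared
{
    uint32_t            shared_xid;     /* transaction shared by all */
    uint32_t            head;           /* next element the leader fills */
    uint32_t            tail;           /* next element a worker claims */
    ParallelCopyChunk   queue[PARALLEL_COPY_QUEUE_SIZE];
} ParallelCopyShared;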
We had a couple of options for the way in which queue elements can be stored.

Option 1: Each element (DSA chunk) will contain tuples such that each
tuple will be preceded by the length of the tuple. So the tuples will
be arranged like (Length of tuple-1, tuple-1), (Length of tuple-2,
tuple-2), ....

Option 2: Each element (DSA chunk) will contain only tuples: (tuple-1),
(tuple-2), ..... And we will have a second ring-buffer which contains a
start-offset or length of each tuple.

The old design used to generate one tuple of data and process it tuple by
tuple. In the new design, the server will generate multiple tuples of
data per queue element; the worker will then process the data tuple by
tuple. As we are processing the data tuple by tuple, I felt both of
the options are almost the same. However, Option 1 was chosen over
Option 2 as we can save some space that would be required by another
variable in each element of the queue.
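A small sketch of the Option 1 layout (hypothetical helper names,
simplified and without validation of the input):

/* Illustrative helpers for length-prefixed tuples packed into a chunk. */
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

/* Append one tuple's raw text to the chunk; false if it doesn't fit. */
static bool
chunk_append_tuple(char *chunk, uint32_t chunk_size, uint32_t *used,
                   const char *tuple, uint32_t tuple_len)
{
    if (*used + sizeof(uint32_t) + tuple_len > chunk_size)
        return false;
    memcpy(chunk + *used, &tuple_len, sizeof(uint32_t));
    memcpy(chunk + *used + sizeof(uint32_t), tuple, tuple_len);
    *used += sizeof(uint32_t) + tuple_len;
    return true;
}

/* Walk the chunk tuple by tuple; returns the tuple and advances *off. */
static const char *
chunk_next_tuple(const char *chunk, uint32_t used, uint32_t *off,
                 uint32_t *tuple_len)
{
    const char *tuple;

    if (*off >= used)
        return NULL;            /* no more tuples in this chunk */
    memcpy(tuple_len, chunk + *off, sizeof(uint32_t));
    tuple = chunk + *off + sizeof(uint32_t);
    *off += sizeof(uint32_t) + *tuple_len;
    return tuple;
}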
The parallel workers will read the tuples from the queue and perform the
following operations: a) where clause handling, b) convert tuple to
columns, c) add default null values for the missing columns that are not
present in that record, d) find the partition if it is a partitioned
table, e) before row insert triggers and constraints, f) insertion of the
data. The rest of the flow is the same as the existing code.
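Putting the pieces together, a worker's main loop might look like the
sketch below (building on the structures sketched above;
claim_next_filled_chunk(), dsa_chunk_address() and
process_and_insert_tuple() are assumed helpers that are not shown):

/* Hypothetical worker loop: claim a filled chunk, process its tuples
 * one by one, then mark the chunk reusable. */
extern ParallelCopyChunk *claim_next_filled_chunk(ParallelCopyShared *shared);
extern char *dsa_chunk_address(ParallelCopyChunk *chunk);
extern void process_and_insert_tuple(const char *tuple, uint32_t len);

void
parallel_copy_worker(ParallelCopyShared *shared)
{
    ParallelCopyChunk *chunk;

    /* claim filled chunks until the leader signals end of input */
    while ((chunk = claim_next_filled_chunk(shared)) != NULL)
    {
        char       *data = dsa_chunk_address(chunk);
        uint32_t    off = 0;
        uint32_t    len;
        const char *tuple;

        /* walk the length-prefixed tuples packed into this chunk */
        while ((tuple = chunk_next_tuple(data, chunk->used_bytes,
                                         &off, &len)) != NULL)
        {
            /* where clause, column conversion, defaults, partition
             * routing, triggers/constraints, then the insert itself */
            process_and_insert_tuple(tuple, len);
        }

        chunk->state = CHUNK_PROCESSED;     /* hand back for reuse */
    }
}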
Enhancements after the POC is done:

Initially we plan to use the number of workers the user has specified;
later we will do some experiments and think of an approach to choose the
worker count automatically after processing sample contents from the
file.

Initially we plan to use 100 elements in the queue; later we will
experiment to find the right size for the queue once the basic patch
is ready.

Initially we plan to generate the transaction from the leader and
share it across the workers. Later we will change this in such a
way that the first process that does an insert operation will
generate the transaction and share it with the rest of them.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Tue, 7 Apr 2020 at 08:24, vignesh C <vignesh21@gmail.com> wrote:
Leader will create a circular queue
and share it across the workers. The circular queue will be present in
DSM. Leader will be using a fixed size queue to share the contents
between the leader and the workers. Currently we will have 100
elements present in the queue. This will be created before the workers
are started and shared with the workers. The data structures that are
required by the parallel workers will be initialized by the leader,
the size required in dsm will be calculated and the necessary keys
will be loaded in the DSM. The specified number of workers will then
be launched. Leader will read the table data from the file and copy
the contents to the queue element by element. Each element in the
queue will have 64K size DSA. This DSA will be used to store tuple
contents from the file. The leader will try to copy as much content as
possible within one 64K DSA queue element. We intend to store at least
one tuple in each queue element. There are some cases where the 64K
space may not be enough to store a single tuple. Mostly in cases where
the table has toast data present and the single tuple can be more than
64K size. In these scenarios we will extend the DSA space accordingly.
We cannot change the size of the dsm once the workers are launched.
Whereas in case of DSA we can free the dsa pointer and reallocate the
dsa pointer based on the memory size required. This is the very reason
for choosing DSA over DSM for storing the data that must be inserted
into the relation.
I think the element based approach and requirement that all tuples fit
into the queue makes things unnecessarily complex. The approach I
detailed earlier allows for tuples to be bigger than the buffer. In
that case a worker will claim the long tuple from the ring queue of
tuple start positions, and start copying it into its local line_buf.
This can wrap around the buffer multiple times until the next start
position shows up. At that point this worker can proceed with
inserting the tuple and the next worker will claim the next tuple.
This way nothing needs to be resized, there is no risk of a file with
huge tuples running the system out of memory because each element will
be reallocated to be huge and the number of elements is not something
that has to be tuned.
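A rough sketch of the worker side of this scheme, to make it concrete
(all identifiers are invented; the three ring helpers are
hypothetical, and for tuples larger than the buffer the copy would
interleave with waiting instead of learning the end position up
front):

typedef struct SharedByteRing
{
    uint64  size;       /* capacity of data[] */
    char    data[FLEXIBLE_ARRAY_MEMBER];
} SharedByteRing;

/* Invented helpers: atomic claim/publish of tuple start offsets. */
extern uint64 claim_next_tuple_start(SharedByteRing *ring);
extern uint64 wait_for_next_tuple_start(SharedByteRing *ring);
extern void advance_consumed_pointer(SharedByteRing *ring, uint64 upto);

static void
worker_claim_and_copy(SharedByteRing *ring, StringInfo line_buf)
{
    uint64  start = claim_next_tuple_start(ring);
    uint64  end = wait_for_next_tuple_start(ring); /* next tuple's start */

    resetStringInfo(line_buf);
    while (start < end)
    {
        uint64  pos = start % ring->size;
        uint64  n = Min(end - start, ring->size - pos);

        appendBinaryStringInfo(line_buf, ring->data + pos, (int) n);
        start += n;
        advance_consumed_pointer(ring, start);  /* free space for producer */
    }
    /* line_buf now holds one complete input line; insert it as usual. */
}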
We had a couple of options for the way in which queue elements can be stored.
Option 1: Each element (DSA chunk) will contain tuples such that each
tuple will be preceded by the length of the tuple. So the tuples will
be arranged like (Length of tuple-1, tuple-1), (Length of tuple-2,
tuple-2), .... Or Option 2: Each element (DSA chunk) will contain only
tuples (tuple-1), (tuple-2), ..... And we will have a second
ring-buffer which contains a start-offset or length of each tuple. The
old design used to generate one tuple of data and process tuple by
tuple. In the new design, the server will generate multiple tuples of
data per queue element. The worker will then process data tuple by
tuple. As we are processing the data tuple by tuple, I felt both of
the options are almost the same. However Design1 was chosen over
Design 2 as we can save up on some space that was required by another
variable in each element of the queue.
With option 1 it's not possible to read input data into shared memory
and there needs to be an extra memcpy in the time critical sequential
flow of the leader. With option 2 data could be read directly into the
shared memory buffer. With future async io support, reading and
looking for tuple boundaries could be performed concurrently.
Regards,
Ants Aasma
Cybertec
On Tue, Apr 7, 2020 at 7:08 PM Ants Aasma <ants@cybertec.at> wrote:
On Tue, 7 Apr 2020 at 08:24, vignesh C <vignesh21@gmail.com> wrote:
Leader will create a circular queue
and share it across the workers. The circular queue will be present in
DSM. Leader will be using a fixed size queue to share the contents
between the leader and the workers. Currently we will have 100
elements present in the queue. This will be created before the workers
are started and shared with the workers. The data structures that are
required by the parallel workers will be initialized by the leader,
the size required in dsm will be calculated and the necessary keys
will be loaded in the DSM. The specified number of workers will then
be launched. Leader will read the table data from the file and copy
the contents to the queue element by element. Each element in the
queue will have 64K size DSA. This DSA will be used to store tuple
contents from the file. The leader will try to copy as much content as
possible within one 64K DSA queue element. We intend to store at least
one tuple in each queue element. There are some cases where the 64K
space may not be enough to store a single tuple. Mostly in cases where
the table has toast data present and the single tuple can be more than
64K size. In these scenarios we will extend the DSA space accordingly.
We cannot change the size of the dsm once the workers are launched.
Whereas in case of DSA we can free the dsa pointer and reallocate the
dsa pointer based on the memory size required. This is the very reason
for choosing DSA over DSM for storing the data that must be inserted
into the relation.

I think the element based approach and requirement that all tuples fit
into the queue makes things unnecessarily complex. The approach I
detailed earlier allows for tuples to be bigger than the buffer. In
that case a worker will claim the long tuple from the ring queue of
tuple start positions, and starts copying it into its local line_buf.
This can wrap around the buffer multiple times until the next start
position shows up. At that point this worker can proceed with
inserting the tuple and the next worker will claim the next tuple.
IIUC, with the fixed-size buffer, parallelism might suffer a bit
because, until the worker copies the data from the shared buffer to
its local buffer, the reader process won't be able to continue. I
think somewhat more leader-worker coordination will be required with
the fixed buffer size. However, as you pointed out, we can't allow
the buffer to grow to the maximum possible size for all tuples, as
that might require a lot of memory. One idea could be that we allow
it for the first such tuple, and then if any other element/chunk in
the queue requires more memory than the default 64KB, we always fall
back to using the memory we allocated for that first chunk. This way
we don't use extra memory for more than one tuple, and it won't hurt
parallelism much, as in many cases not all tuples will be so large.
I think in the proposed approach a queue element is just a way to
divide the work among workers based on size rather than on the number
of tuples. Dividing the work among workers based on start offsets
would be more tricky: it could lead either to a lot of contention if
we choose, say, one offset per worker (basically copy the data for
one tuple, process it and then pick the next tuple), or to an unequal
division of work because some tuples can be smaller and others
bigger. I guess division based on size would be a better idea. OTOH,
I see the advantage of your approach as well and I will think more on
it.
We had a couple of options for the way in which queue elements can be stored.
Option 1: Each element (DSA chunk) will contain tuples such that each
tuple will be preceded by the length of the tuple. So the tuples will
be arranged like (Length of tuple-1, tuple-1), (Length of tuple-2,
tuple-2), .... Or Option 2: Each element (DSA chunk) will contain only
tuples (tuple-1), (tuple-2), ..... And we will have a second
ring-buffer which contains a start-offset or length of each tuple. The
old design used to generate one tuple of data and process tuple by
tuple. In the new design, the server will generate multiple tuples of
data per queue element. The worker will then process data tuple by
tuple. As we are processing the data tuple by tuple, I felt both of
the options are almost the same. However Design1 was chosen over
Design 2 as we can save up on some space that was required by another
variable in each element of the queue.

With option 1 it's not possible to read input data into shared memory
and there needs to be an extra memcpy in the time critical sequential
flow of the leader. With option 2 data could be read directly into the
shared memory buffer. With future async io support, reading and
looking for tuple boundaries could be performed concurrently.
Yeah, option-2 sounds better.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma <ants@cybertec.at> wrote:
I think the element based approach and requirement that all tuples fit
into the queue makes things unnecessarily complex. The approach I
detailed earlier allows for tuples to be bigger than the buffer. In
that case a worker will claim the long tuple from the ring queue of
tuple start positions, and starts copying it into its local line_buf.
This can wrap around the buffer multiple times until the next start
position shows up. At that point this worker can proceed with
inserting the tuple and the next worker will claim the next tuple.

This way nothing needs to be resized, there is no risk of a file with
huge tuples running the system out of memory because each element will
be reallocated to be huge and the number of elements is not something
that has to be tuned.
+1. This seems like the right way to do it.
We had a couple of options for the way in which queue elements can be stored.
Option 1: Each element (DSA chunk) will contain tuples such that each
tuple will be preceded by the length of the tuple. So the tuples will
be arranged like (Length of tuple-1, tuple-1), (Length of tuple-2,
tuple-2), .... Or Option 2: Each element (DSA chunk) will contain only
tuples (tuple-1), (tuple-2), ..... And we will have a second
ring-buffer which contains a start-offset or length of each tuple. The
old design used to generate one tuple of data and process tuple by
tuple. In the new design, the server will generate multiple tuples of
data per queue element. The worker will then process data tuple by
tuple. As we are processing the data tuple by tuple, I felt both of
the options are almost the same. However Design1 was chosen over
Design 2 as we can save up on some space that was required by another
variable in each element of the queue.

With option 1 it's not possible to read input data into shared memory
and there needs to be an extra memcpy in the time critical sequential
flow of the leader. With option 2 data could be read directly into the
shared memory buffer. With future async io support, reading and
looking for tuple boundaries could be performed concurrently.
But option 2 still seems significantly worse than your proposal above, right?
I really think we don't want a single worker in charge of finding
tuple boundaries for everybody. That adds a lot of unnecessary
inter-process communication and synchronization. Each process should
just get the next tuple starting after where the last one ended, and
then advance the end pointer so that the next process can do the same
thing. Vignesh's proposal involves having a leader process that has to
switch roles - he picks an arbitrary 25% threshold - and if it doesn't
switch roles at the right time, performance will be impacted. If the
leader doesn't get scheduled in time to refill the queue before it
runs completely empty, workers will have to wait. Ants's scheme avoids
that risk: whoever needs the next tuple reads the next line. There's
no need to ever wait for the leader because there is no leader.
I think it's worth enumerating some of the other ways that a project
in this area can fail to achieve good speedups, so that we can try to
avoid those that are avoidable and be aware of the others:
- If we're unable to supply data to the COPY process as fast as the
workers could load it, then speed will be limited at that point. We
know reading the file from disk is pretty fast compared to what a
single process can do. I'm not sure we've tested what happens with a
network socket. It will depend on the network speed some, but it might
be useful to know how many MB/s we can pump through over a UNIX
socket.
- The portion of the time that is used to split the lines is not
easily parallelizable. That seems to be a fairly small percentage for
a reasonably wide table, but it looks significant (13-18%) for a
narrow table. Such cases will gain less performance and be limited to
a smaller number of workers. I think we also need to be careful about
files whose lines are longer than the size of the buffer. If we're not
careful, we could get a significant performance drop-off in such
cases. We should make sure to pick an algorithm that seems like it
will handle such cases without serious regressions and check that a
file composed entirely of such long lines is handled reasonably
efficiently.
- There could be index contention. Let's suppose that we can read data
super fast and break it up into lines super fast. Maybe the file we're
reading is fully RAM-cached and the lines are long. Now all of the
backends are inserting into the indexes at the same time, and they
might be trying to insert into the same pages. If so, lock contention
could become a factor that hinders performance.
- There could also be similar contention on the heap. Say the tuples
are narrow, and many backends are trying to insert tuples into the
same heap page at the same time. This would lead to many lock/unlock
cycles. This could be avoided if the backends avoid targeting the same
heap pages, but I'm not sure there's any reason to expect that they
would do so unless we make some special provision for it.
- These problems could also arise with respect to TOAST table
insertions, either on the TOAST table itself or on its index. This
would only happen if the table contains a lot of toastable values, but
that could be the case: imagine a table with a bunch of columns each
of which contains a long string that isn't very compressible.
- What else? I bet the above list is not comprehensive.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, 8 Apr 2020 at 22:30, Robert Haas <robertmhaas@gmail.com> wrote:
- If we're unable to supply data to the COPY process as fast as the
workers could load it, then speed will be limited at that point. We
know reading the file from disk is pretty fast compared to what a
single process can do. I'm not sure we've tested what happens with a
network socket. It will depend on the network speed some, but it might
be useful to know how many MB/s we can pump through over a UNIX
socket.
This raises a good point. If at some point we want to minimize the
amount of memory copies then we might want to allow for RDMA to
directly write incoming network traffic into a distributing ring
buffer, which would include the protocol level headers. But at this
point we are so far off from network reception becoming a bottleneck
that I don't think it's worth holding anything up for the sake of
zero-copy transfers.
- The portion of the time that is used to split the lines is not
easily parallelizable. That seems to be a fairly small percentage for
a reasonably wide table, but it looks significant (13-18%) for a
narrow table. Such cases will gain less performance and be limited to
a smaller number of workers. I think we also need to be careful about
files whose lines are longer than the size of the buffer. If we're not
careful, we could get a significant performance drop-off in such
cases. We should make sure to pick an algorithm that seems like it
will handle such cases without serious regressions and check that a
file composed entirely of such long lines is handled reasonably
efficiently.
I don't have a proof, but my gut feel tells me that it's fundamentally
impossible to ingest csv without a serial line-ending/comment
tokenization pass. The current line splitting algorithm is terrible.
I'm currently working with some scientific data where on ingestion
CopyReadLineText() is about 25% on profiles. I prototyped a
replacement that can do ~8GB/s on narrow rows, more on wider ones.
For rows that are consistently wider than the input buffer I think
parallelism will still give a win - the serial phase is just memcpy
through a ringbuffer, after which a worker goes away to perform the
actual insert, letting the next worker read the data. The memcpy is
already happening today, CopyReadLineText() copies the input buffer
into a StringInfo, so the only extra work is synchronization between
leader and worker.
- There could be index contention. Let's suppose that we can read data
super fast and break it up into lines super fast. Maybe the file we're
reading is fully RAM-cached and the lines are long. Now all of the
backends are inserting into the indexes at the same time, and they
might be trying to insert into the same pages. If so, lock contention
could become a factor that hinders performance.
Different data distribution strategies can have an effect on that.
Dealing out input data in larger or smaller chunks will have a
considerable effect on contention, btree page splits and all kinds of
things. I think the common theme would be a push to increase chunk
size to reduce contention.
- There could also be similar contention on the heap. Say the tuples
are narrow, and many backends are trying to insert tuples into the
same heap page at the same time. This would lead to many lock/unlock
cycles. This could be avoided if the backends avoid targeting the same
heap pages, but I'm not sure there's any reason to expect that they
would do so unless we make some special provision for it.
I thought there already was a provision for that. Am I mis-remembering?
- What else? I bet the above list is not comprehensive.
I think the parallel copy patch needs to concentrate on splitting input
data to workers. After that any performance issues would be basically
the same as a normal parallel insert workload. There may well be
bottlenecks there, but those could be tackled independently.
Regards,
Ants Aasma
Cybertec
On Thu, Apr 9, 2020 at 1:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma <ants@cybertec.at> wrote:
With option 1 it's not possible to read input data into shared memory
and there needs to be an extra memcpy in the time critical sequential
flow of the leader. With option 2 data could be read directly into the
shared memory buffer. With future async io support, reading and
looking for tuple boundaries could be performed concurrently.

But option 2 still seems significantly worse than your proposal above, right?
I really think we don't want a single worker in charge of finding
tuple boundaries for everybody. That adds a lot of unnecessary
inter-process communication and synchronization. Each process should
just get the next tuple starting after where the last one ended, and
then advance the end pointer so that the next process can do the same
thing. Vignesh's proposal involves having a leader process that has to
switch roles - he picks an arbitrary 25% threshold - and if it doesn't
switch roles at the right time, performance will be impacted. If the
leader doesn't get scheduled in time to refill the queue before it
runs completely empty, workers will have to wait. Ants's scheme avoids
that risk: whoever needs the next tuple reads the next line. There's
no need to ever wait for the leader because there is no leader.
Hmm, I think in his scheme also there is a single reader process. See
the email above [1] where he described how it should work. I think
the difference is in the division of work. AFAIU, in Ants's scheme,
the worker picks its work from the tuple-offset queue, whereas in
Vignesh's scheme it is based on size (each worker will probably get
64KB of work). In Ants's scheme the main thing to figure out is how
many tuple offsets should be assigned to each worker in one go, so
that we don't unnecessarily add contention around finding the work
unit. I think we need to find the right balance between size and
number of tuples. I am considering size here because larger tuples
will probably require more time, as we need to allocate more space
for them and they probably also require more processing time. One way
to achieve that could be that each worker tries to claim 500 tuples
(or some other threshold number), but if their size is greater than
64KB (or some other threshold size), the worker claims fewer tuples,
such that the total size of the claimed tuples stays below the
threshold.
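That heuristic could look something like the following (the
thresholds and all struct/field names are made up for illustration):

#define MAX_TUPLES_PER_CLAIM    500
#define MAX_BYTES_PER_CLAIM     (64 * 1024)
#define QUEUE_SIZE              8192

typedef struct TupleOffsetQueue
{
    slock_t     mutex;
    uint32      next_ready;         /* offsets published by the reader */
    uint32      next_unclaimed;     /* next offset not yet claimed */
    uint32      tuple_len[QUEUE_SIZE];
} TupleOffsetQueue;

/*
 * Claim a batch of tuples: stop at 500 tuples or once the batch size
 * exceeds 64KB, whichever comes first. Returns the number claimed.
 */
static uint32
claim_tuple_batch(TupleOffsetQueue *q, uint32 *first_tuple)
{
    uint32      n = 0;
    Size        bytes = 0;

    SpinLockAcquire(&q->mutex);
    *first_tuple = q->next_unclaimed;
    while (n < MAX_TUPLES_PER_CLAIM &&
           q->next_unclaimed + n < q->next_ready &&
           bytes < MAX_BYTES_PER_CLAIM)
    {
        bytes += q->tuple_len[(q->next_unclaimed + n) % QUEUE_SIZE];
        n++;
    }
    q->next_unclaimed += n;
    SpinLockRelease(&q->mutex);

    return n;
}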
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 9, 2020 at 4:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 9, 2020 at 1:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma <ants@cybertec.at> wrote:
With option 1 it's not possible to read input data into shared memory
and there needs to be an extra memcpy in the time critical sequential
flow of the leader. With option 2 data could be read directly into the
shared memory buffer. With future async io support, reading and
looking for tuple boundaries could be performed concurrently.

But option 2 still seems significantly worse than your proposal above, right?
I really think we don't want a single worker in charge of finding
tuple boundaries for everybody. That adds a lot of unnecessary
inter-process communication and synchronization. Each process should
just get the next tuple starting after where the last one ended, and
then advance the end pointer so that the next process can do the same
thing. Vignesh's proposal involves having a leader process that has to
switch roles - he picks an arbitrary 25% threshold - and if it doesn't
switch roles at the right time, performance will be impacted. If the
leader doesn't get scheduled in time to refill the queue before it
runs completely empty, workers will have to wait. Ants's scheme avoids
that risk: whoever needs the next tuple reads the next line. There's
no need to ever wait for the leader because there is no leader.

Hmm, I think in his scheme also there is a single reader process. See
the email above [1] where he described how it should work.
oops, I forgot to specify the link to the email. See
/messages/by-id/CANwKhkO87A8gApobOz_o6c9P5auuEG1W2iCz0D5CfOeGgAnk3g@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 9, 2020 at 3:55 AM Ants Aasma <ants@cybertec.at> wrote:
On Wed, 8 Apr 2020 at 22:30, Robert Haas <robertmhaas@gmail.com> wrote:
- The portion of the time that is used to split the lines is not
easily parallelizable. That seems to be a fairly small percentage for
a reasonably wide table, but it looks significant (13-18%) for a
narrow table. Such cases will gain less performance and be limited to
a smaller number of workers. I think we also need to be careful about
files whose lines are longer than the size of the buffer. If we're not
careful, we could get a significant performance drop-off in such
cases. We should make sure to pick an algorithm that seems like it
will handle such cases without serious regressions and check that a
file composed entirely of such long lines is handled reasonably
efficiently.

I don't have a proof, but my gut feel tells me that it's fundamentally
impossible to ingest csv without a serial line-ending/comment
tokenization pass.
I think even if we try to do it via multiple workers it might not be
better. In such a scheme, every worker needs to update the end
boundary, and the next worker has to keep checking whether the
previous one has updated the end pointer. This can add a significant
synchronization effort for cases where tuples are around 100 bytes,
which will be a common case.
The current line splitting algorithm is terrible.
I'm currently working with some scientific data where on ingestion
CopyReadLineText() is about 25% on profiles. I prototyped a
replacement that can do ~8GB/s on narrow rows, more on wider ones.
Good to hear. I think that will be a good project on its own and that
might give a boost to parallel copy as with that we can further reduce
the non-parallelizable work unit.
For rows that are consistently wider than the input buffer I think
parallelism will still give a win - the serial phase is just memcpy
through a ringbuffer, after which a worker goes away to perform the
actual insert, letting the next worker read the data. The memcpy is
already happening today, CopyReadLineText() copies the input buffer
into a StringInfo, so the only extra work is synchronization between
leader and worker.

- There could also be similar contention on the heap. Say the tuples
are narrow, and many backends are trying to insert tuples into the
same heap page at the same time. This would lead to many lock/unlock
cycles. This could be avoided if the backends avoid targeting the same
heap pages, but I'm not sure there's any reason to expect that they
would do so unless we make some special provision for it.

I thought there already was a provision for that. Am I mis-remembering?
The copy uses heap_multi_insert to insert a batch of tuples, and I
think each batch should ideally use a different page, mostly a new
one. So I am not sure whether this will be a problem, or a problem of
a level that needs some special handling. But if this turns out to be
a problem, we definitely need some better way to deal with it.
- What else? I bet the above list is not comprehensive.
I think parallel copy patch needs to concentrate on splitting input
data to workers. After that any performance issues would be basically
the same as a normal parallel insert workload. There may well be
bottlenecks there, but those could be tackled independently.
I agree.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 9, 2020 at 1:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 7, 2020 at 9:38 AM Ants Aasma <ants@cybertec.at> wrote:
I think the element based approach and requirement that all tuples fit
into the queue makes things unnecessarily complex. The approach I
detailed earlier allows for tuples to be bigger than the buffer. In
that case a worker will claim the long tuple from the ring queue of
tuple start positions, and starts copying it into its local line_buf.
This can wrap around the buffer multiple times until the next start
position shows up. At that point this worker can proceed with
inserting the tuple and the next worker will claim the next tuple.

This way nothing needs to be resized, there is no risk of a file with
huge tuples running the system out of memory because each element will
be reallocated to be huge and the number of elements is not something
that has to be tuned.

+1. This seems like the right way to do it.
We had a couple of options for the way in which queue elements can be stored.
Option 1: Each element (DSA chunk) will contain tuples such that each
tuple will be preceded by the length of the tuple. So the tuples will
be arranged like (Length of tuple-1, tuple-1), (Length of tuple-2,
tuple-2), .... Or Option 2: Each element (DSA chunk) will contain only
tuples (tuple-1), (tuple-2), ..... And we will have a second
ring-buffer which contains a start-offset or length of each tuple. The
old design used to generate one tuple of data and process tuple by
tuple. In the new design, the server will generate multiple tuples of
data per queue element. The worker will then process data tuple by
tuple. As we are processing the data tuple by tuple, I felt both of
the options are almost the same. However Design1 was chosen over
Design 2 as we can save up on some space that was required by another
variable in each element of the queue.

With option 1 it's not possible to read input data into shared memory
and there needs to be an extra memcpy in the time critical sequential
flow of the leader. With option 2 data could be read directly into the
shared memory buffer. With future async io support, reading and
looking for tuple boundaries could be performed concurrently.

But option 2 still seems significantly worse than your proposal above, right?
I really think we don't want a single worker in charge of finding
tuple boundaries for everybody. That adds a lot of unnecessary
inter-process communication and synchronization. Each process should
just get the next tuple starting after where the last one ended, and
then advance the end pointer so that the next process can do the same
thing. Vignesh's proposal involves having a leader process that has to
switch roles - he picks an arbitrary 25% threshold - and if it doesn't
switch roles at the right time, performance will be impacted. If the
leader doesn't get scheduled in time to refill the queue before it
runs completely empty, workers will have to wait. Ants's scheme avoids
that risk: whoever needs the next tuple reads the next line. There's
no need to ever wait for the leader because there is no leader.
I agree that if the leader switches roles, then it is possible that
sometimes the leader might not produce work before the queue is
empty. OTOH, the problem with the approach you are suggesting is that
the work will be generated on demand, i.e. there is no dedicated
process generating the data while workers are busy inserting it. So
IMHO, if we have a dedicated leader process, there will always be
work available for all the workers. I agree that we need to find the
correct point at which the leader should work as a worker. One idea
could be that when the queue is full and there is no space to push
more work into it, the leader itself processes that work.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 9, 2020 at 7:49 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
I agree that if the leader switches the role, then it is possible that
sometimes the leader might not produce the work before the queue is
empty. OTOH, the problem with the approach you are suggesting is that
the work will be generated on-demand, i.e. there is no specific
process who is generating the data while workers are busy inserting
the data.
I think you have a point. The way I think things could go wrong if we
don't have a leader is if it tends to happen that everyone wants new
work at the same time. In that case, everyone will wait at once,
whereas if there is a designated process that aggressively queues up
work, we could perhaps avoid that. Note that you really have to have
the case where everyone wants new work at the exact same moment,
because otherwise they just all take turns finding work for
themselves, and everything is fine, because nobody's waiting for
anybody else to do any work, so everyone is always making forward
progress.
Now on the other hand, if we do have a leader, and for some reason
it's slow in responding, everyone will have to wait. That could happen
either because the leader also has other responsibilities, like
reading data or helping with the main work when the queue is full, or
just because the system is really busy and the leader doesn't get
scheduled on-CPU for a while. I am inclined to think that's likely to
be a more serious problem.
The thing is, the problem of everyone needing new work at the same
time can't really keep on repeating. Say that everyone finishes
processing their first chunk at the same time. Now everyone needs a
second chunk, and in a leaderless system, they must take turns getting
it. So they will go in some order. The ones who go later will
presumably also finish later, so the end times for the second and
following chunks will be scattered. You shouldn't get repeated
pile-ups with everyone finishing at the same time, because each time
it happens, it will force a little bit of waiting that will spread
things out. If they clump up again, that will happen again, but it
shouldn't happen every time.
But in the case where there is a leader, I don't think there's any
similar protection. Suppose we go with the design Vignesh proposes
where the leader switches to processing chunks when the queue is more
than 75% full. If the leader has a "hiccup" where it gets swapped out
or is busy with processing a chunk for a longer-than-normal time, all
of the other processes have to wait for it. Now we can probably tune
this to some degree by adjusting the queue size and fullness
thresholds, but the optimal values for those parameters might be quite
different on different systems, depending on load, I/O performance,
CPU architecture, etc. If there's a system or configuration where the
leader tends not to respond fast enough, it will probably just keep
happening, because nothing in the algorithm will tend to shake it out
of that bad pattern.
I'm not 100% certain that my analysis here is right, so it will be
interesting to hear from other people. However, as a general rule, I
think we want to minimize the amount of work that can only be done by
one process (the leader) and maximize the amount that can be done by
any process with whichever one is available taking on the job. In the
case of COPY FROM STDIN, the reads from the network socket can only be
done by the one process connected to it. In the case of COPY from a
file, even that could be rotated around, if all processes open the
file individually and seek to the appropriate offset.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On April 9, 2020 4:01:43 AM PDT, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Apr 9, 2020 at 3:55 AM Ants Aasma <ants@cybertec.at> wrote:
On Wed, 8 Apr 2020 at 22:30, Robert Haas <robertmhaas@gmail.com> wrote:
- The portion of the time that is used to split the lines is not
easily parallelizable. That seems to be a fairly small percentage for
a reasonably wide table, but it looks significant (13-18%) for a
narrow table. Such cases will gain less performance and be limited to
a smaller number of workers. I think we also need to be careful about
files whose lines are longer than the size of the buffer. If we're not
careful, we could get a significant performance drop-off in such
cases. We should make sure to pick an algorithm that seems like it
will handle such cases without serious regressions and check that a
file composed entirely of such long lines is handled reasonably
efficiently.

I don't have a proof, but my gut feel tells me that it's fundamentally
impossible to ingest csv without a serial line-ending/comment
tokenization pass.
I can't quite see a way either. But even if it were, I have a hard time seeing parallelizing that path as the right thing.
I think even if we try to do it via multiple workers it might not be
better. In such a scheme, every worker needs to update the end
boundaries and the next worker to keep a check if the previous has
updated the end pointer. I think this can add a significant
synchronization effort for cases where tuples are of 100 or so bytes
which will be a common case.
It seems like it'd also have terrible caching and instruction level parallelism behavior. By constantly switching the process that analyzes boundaries, the current data will have to be brought into l1/register, rather than staying there.
I'm fairly certain that we do *not* want to distribute input data between processes on a single tuple basis. Probably not even below a few hundred kb. If there's any sort of natural clustering in the loaded data - extremely common, think timestamps - splitting on a granular basis will make indexing much more expensive. And have a lot more contention.
The current line splitting algorithm is terrible.
I'm currently working with some scientific data where on ingestion
CopyReadLineText() is about 25% on profiles. I prototyped a
replacement that can do ~8GB/s on narrow rows, more on wider ones.
We should really replace the entire copy parsing code. It's terrible.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Thu, Apr 9, 2020 at 2:55 PM Andres Freund <andres@anarazel.de> wrote:
I'm fairly certain that we do *not* want to distribute input data between processes on a single tuple basis. Probably not even below a few hundred kb. If there's any sort of natural clustering in the loaded data - extremely common, think timestamps - splitting on a granular basis will make indexing much more expensive. And have a lot more contention.
That's a fair point. I think the solution ought to be that once any
process starts finding line endings, it continues until it's grabbed
at least a certain amount of data for itself. Then it stops and lets
some other process grab a chunk of data.
Or are you arguing that there should be only one process that's
allowed to find line endings for the entire duration of the load?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On April 9, 2020 12:29:09 PM PDT, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 9, 2020 at 2:55 PM Andres Freund <andres@anarazel.de> wrote:
I'm fairly certain that we do *not* want to distribute input data
between processes on a single tuple basis. Probably not even below a
few hundred kb. If there's any sort of natural clustering in the loaded
data - extremely common, think timestamps - splitting on a granular
basis will make indexing much more expensive. And have a lot more
contention.

That's a fair point. I think the solution ought to be that once any
process starts finding line endings, it continues until it's grabbed
at least a certain amount of data for itself. Then it stops and lets
some other process grab a chunk of data.

Or are you arguing that there should be only one process that's
allowed to find line endings for the entire duration of the load?
I've not yet read the whole thread. So I'm probably restating ideas.
Imo, yes, there should be only one process doing the chunking. For ilp, cache efficiency, but also because the leader is the only process with access to the network socket. It should load input data into one large buffer that's shared across processes. There should be a separate ringbuffer with tuple/partial tuple (for huge tuples) offsets. Worker processes should grab large chunks of offsets from the offset ringbuffer. If the ringbuffer is not full, the worker chunks should be reduced in size.
Given that everything stalls if the leader doesn't accept further input data, as well as when there are no available split chunks, it doesn't seem like a good idea to have the leader do other work.
I don't think optimizing/targeting copy from local files, where multiple processes could read, is useful. COPY STDIN is the only thing that practically matters.
Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Thu, Apr 9, 2020 at 4:00 PM Andres Freund <andres@anarazel.de> wrote:
I've not yet read the whole thread. So I'm probably restating ideas.
Yeah, but that's OK.
Imo, yes, there should be only one process doing the chunking. For ilp, cache efficiency, but also because the leader is the only process with access to the network socket. It should load input data into one large buffer that's shared across processes. There should be a separate ringbuffer with tuple/partial tuple (for huge tuples) offsets. Worker processes should grab large chunks of offsets from the offset ringbuffer. If the ringbuffer is not full, the worker chunks should be reduced in size.
My concern here is that it's going to be hard to avoid processes going
idle. If the leader does nothing at all once the ring buffer is full,
it's wasting time that it could spend processing a chunk. But if it
picks up a chunk, then it might not get around to refilling the buffer
before other processes are idle with no work to do.
Still, it might be the case that having the process that is reading
the data also find the line endings is so fast that it makes no sense
to split those two tasks. After all, whoever just read the data must
have it in cache, and that helps a lot.
Given that everything stalls if the leader doesn't accept further input data, as well as when there are no available splitted chunks, it doesn't seem like a good idea to have the leader do other work.
I don't think optimizing/targeting copy from local files, where multiple processes could read, is useful. COPY STDIN is the only thing that practically matters.
Yeah, I think Amit has been thinking primarily in terms of COPY from
files, and I've been encouraging him to at least consider the STDIN
case. But I think you're right, and COPY FROM STDIN should be the
design center for this feature.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-04-10 07:40:06 -0400, Robert Haas wrote:
On Thu, Apr 9, 2020 at 4:00 PM Andres Freund <andres@anarazel.de> wrote:
Imo, yes, there should be only one process doing the chunking. For ilp, cache efficiency, but also because the leader is the only process with access to the network socket. It should load input data into one large buffer that's shared across processes. There should be a separate ringbuffer with tuple/partial tuple (for huge tuples) offsets. Worker processes should grab large chunks of offsets from the offset ringbuffer. If the ringbuffer is not full, the worker chunks should be reduced in size.
My concern here is that it's going to be hard to avoid processes going
idle. If the leader does nothing at all once the ring buffer is full,
it's wasting time that it could spend processing a chunk. But if it
picks up a chunk, then it might not get around to refilling the buffer
before other processes are idle with no work to do.
An idle process doesn't cost much. Processes that use CPU inefficiently
however...
Still, it might be the case that having the process that is reading
the data also find the line endings is so fast that it makes no sense
to split those two tasks. After all, whoever just read the data must
have it in cache, and that helps a lot.
Yea. And if it's not fast enough to split lines, then we have a problem
regardless of which process does the splitting.
Greetings,
Andres Freund
On Fri, Apr 10, 2020 at 2:26 PM Andres Freund <andres@anarazel.de> wrote:
Still, it might be the case that having the process that is reading
the data also find the line endings is so fast that it makes no sense
to split those two tasks. After all, whoever just read the data must
have it in cache, and that helps a lot.

Yea. And if it's not fast enough to split lines, then we have a problem
regardless of which process does the splitting.
Still, if the reader does the splitting, then you don't need as much
IPC, right? The shared memory data structure is just a ring of bytes,
and whoever reads from it is responsible for the rest.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-04-13 14:13:46 -0400, Robert Haas wrote:
On Fri, Apr 10, 2020 at 2:26 PM Andres Freund <andres@anarazel.de> wrote:
Still, it might be the case that having the process that is reading
the data also find the line endings is so fast that it makes no sense
to split those two tasks. After all, whoever just read the data must
have it in cache, and that helps a lot.Yea. And if it's not fast enough to split lines, then we have a problem
regardless of which process does the splitting.

Still, if the reader does the splitting, then you don't need as much
IPC, right? The shared memory data structure is just a ring of bytes,
and whoever reads from it is responsible for the rest.
I don't think so. If only one process does the splitting, the
exclusively locked section is just popping a bunch of offsets off the
ring. And that could fairly easily be done with atomic ops (since what
we need is basically a single-producer multiple-consumer queue, which
can be done lock-free fairly easily). Whereas in the case of each
process doing the splitting, the exclusively locked part is splitting
along lines - which takes considerably longer than just popping off a
few offsets.
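For illustration, the consumer side of such a pop could be sketched
with the existing atomics API roughly as follows (the OffsetRing
layout is invented, and wrap-around handling and sleeping are
omitted):

#include "port/atomics.h"

typedef struct OffsetRing
{
    pg_atomic_uint64    nready;     /* offsets published by the splitter */
    pg_atomic_uint64    nclaimed;   /* offsets claimed by consumers */
    uint64      offsets[8192];      /* tuple start positions */
} OffsetRing;

/* Claim up to 'want' consecutive offsets; returns how many we got. */
static uint64
claim_offsets(OffsetRing *ring, uint64 want, uint64 *first)
{
    for (;;)
    {
        uint64  ready = pg_atomic_read_u64(&ring->nready);
        uint64  old = pg_atomic_read_u64(&ring->nclaimed);
        uint64  new_val = Min(old + want, ready);

        if (new_val == old)
            return 0;       /* nothing available; caller may wait */
        if (pg_atomic_compare_exchange_u64(&ring->nclaimed, &old, new_val))
        {
            *first = old;   /* we own offsets[old .. new_val) */
            return new_val - old;
        }
        /* CAS lost a race against another consumer; retry. */
    }
}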
Greetings,
Andres Freund
On Mon, Apr 13, 2020 at 4:16 PM Andres Freund <andres@anarazel.de> wrote:
I don't think so. If only one process does the splitting, the
exclusively locked section is just popping off a bunch of offsets of the
ring. And that could fairly easily be done with atomic ops (since what
we need is basically a single producer multiple consumer queue, which
can be done lock free fairly easily ). Whereas in the case of each
process doing the splitting, the exclusively locked part is splitting
along lines - which takes considerably longer than just popping off a
few offsets.
Hmm, that does seem believable.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hello,
I was going through some literature on parsing CSV files in a fully
parallelized way and found (from [1]) an interesting approach
implemented in the open-source project ParaText [2]. The algorithm
follows a two-phase approach: the first pass identifies the adjusted
chunks in parallel by exploiting the simplicity of CSV formats, and
the second phase processes complete records within each adjusted
chunk by one of the available workers. Here is the sketch:
1. Each worker scans a distinct fixed sized chunk of the CSV file and
collects the following three stats from the chunk:
a) number of quotes
b) position of the first new line after even number of quotes
c) position of the first new line after odd number of quotes
2. Once stats from all the chunks are collected, the leader identifies
the adjusted chunk boundaries by iterating over the stats linearly:
- For the k-th chunk, the leader adds the number of quotes in k-1 chunks.
- If the number is even, then the k-th chunk does not start in the
middle of a quoted field, and the first newline after an even number
of quotes (the second collected information) is the first record
delimiter in this chunk.
- Otherwise, if the number is odd, the first newline after an odd
number of quotes (the third collected information) is the first record
delimiter.
- The end position of the adjusted chunk is obtained based on the
starting position of the next adjusted chunk.
3. Once the boundaries of the chunks are determined (forming adjusted
chunks), an individual worker may take up one adjusted chunk and
process its tuples independently.
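A sketch of the per-chunk statistics pass (step 1), with invented
names and the escape-character complication mentioned below ignored:

typedef struct ChunkStats
{
    uint64  nquotes;        /* (a) quote count in this chunk */
    int64   first_nl_even;  /* (b) first newline after an even number
                             * of quotes, or -1 if none */
    int64   first_nl_odd;   /* (c) first newline after an odd number
                             * of quotes, or -1 if none */
} ChunkStats;

static void
collect_chunk_stats(const char *buf, Size len, ChunkStats *st)
{
    uint64  quotes = 0;
    Size    i;

    st->first_nl_even = st->first_nl_odd = -1;
    for (i = 0; i < len; i++)
    {
        if (buf[i] == '"')
            quotes++;
        else if (buf[i] == '\n')
        {
            if (quotes % 2 == 0 && st->first_nl_even < 0)
                st->first_nl_even = (int64) i;
            else if (quotes % 2 == 1 && st->first_nl_odd < 0)
                st->first_nl_odd = (int64) i;
        }
    }
    st->nquotes = quotes;
}

In step 2 the leader keeps a running quote count across chunks: if
the count before chunk k is even, first_nl_even is that chunk's first
record delimiter, otherwise first_nl_odd is.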
Although this approach parses the CSV in parallel, it requires two
scans over the CSV file. So, given a system with a spinning hard disk
and small RAM, as per my understanding, the algorithm will perform
very poorly. But if we use this algorithm to parse a CSV file on a
multi-core system with a large RAM, the performance might be improved
significantly [1].
Hence, I was trying to think whether we can leverage this idea for
implementing parallel COPY in PG. We can design an algorithm similar
to parallel hash-join where the workers pass through different phases.
1. Phase 1 - Read fixed size chunks in parallel, store the chunks and
the small stats about each chunk in the shared memory. If the shared
memory is full, go to phase 2.
2. Phase 2 - Allow a single worker to process the stats and decide the
actual chunk boundaries so that no tuple spans across two different
chunks. Go to phase 3.
3. Phase 3 - Each worker picks one adjusted chunk and parses and
processes tuples from it. Once done with one chunk, it picks the next
one, and so on.
4. If there are still some unread contents, go back to phase 1.
We can probably use separate workers for phase 1 and phase 3 so that
they can work concurrently.
Advantages:
1. Each worker spends some significant time in each phase. Gets
benefit of the instruction cache - at least in phase 1.
2. It also has the same advantage as parallel hash join - fast
workers get to work more.
3. We can extend this solution for reading data from STDIN. Of course,
the phase 1 and phase 2 must be performed by the leader process who
can read from the socket.
Disadvantages:
1. Surely doesn't work if we don't have enough shared memory.
2. Probably, this approach is just impractical for PG due to certain
limitations.
Thoughts?
[1]: https://www.microsoft.com/en-us/research/uploads/prod/2019/04/chunker-sigmod19.pdf
[2]: ParaText, https://github.com/wiseio/paratext
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Tue, 14 Apr 2020 at 22:40, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
1. Each worker scans a distinct fixed sized chunk of the CSV file and
collects the following three stats from the chunk:
a) number of quotes
b) position of the first new line after even number of quotes
c) position of the first new line after odd number of quotes
2. Once stats from all the chunks are collected, the leader identifies
the adjusted chunk boundaries by iterating over the stats linearly:
- For the k-th chunk, the leader adds the number of quotes in k-1 chunks.
- If the number is even, then the k-th chunk does not start in the
middle of a quoted field, and the first newline after an even number
of quotes (the second collected information) is the first record
delimiter in this chunk.
- Otherwise, if the number is odd, the first newline after an odd
number of quotes (the third collected information) is the first record
delimiter.
- The end position of the adjusted chunk is obtained based on the
starting position of the next adjusted chunk.
The trouble is that, at least with current coding, the number of
quotes in a chunk can depend on whether the chunk started in a quote
or not. That's because escape characters only count inside quotes. See
for example the following csv:
foo,\"bar
baz",\"xyz"
This currently parses as one line and the number of parsed quotes
doesn't change if you add a quote in front.
But the general approach of doing the tokenization in parallel and
then a serial pass over the tokenization would still work. The quote
counting and new line finding just has to be done for both the
starting-in-quote and the not-starting-in-quote case.
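Concretely, the statistics pass would be run under both assumptions,
along these lines (names invented, escape handling simplified):

typedef struct DualChunkStats
{
    /* index 0: chunk assumed to start outside quotes; 1: inside */
    int64   first_nl[2];        /* first record delimiter, or -1 */
    bool    ends_in_quote[2];   /* parser state at end of chunk */
} DualChunkStats;

static void
collect_dual_stats(const char *buf, Size len, DualChunkStats *st)
{
    int     s;

    for (s = 0; s < 2; s++)
    {
        bool    in_quote = (s == 1);
        Size    i;

        st->first_nl[s] = -1;
        for (i = 0; i < len; i++)
        {
            char    c = buf[i];

            if (in_quote && c == '\\' && i + 1 < len)
                i++;            /* escapes only count inside quotes */
            else if (c == '"')
                in_quote = !in_quote;
            else if (c == '\n' && !in_quote && st->first_nl[s] < 0)
                st->first_nl[s] = (int64) i;
        }
        st->ends_in_quote[s] = in_quote;
    }
}

The serial pass then threads the actual starting state through the
chunks, picking first_nl[] and ends_in_quote[] accordingly.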
Using phases doesn't look like the correct approach - the tokenization
can be prepared just in time for the serial pass and processing the
chunk can proceed immediately after. This could all be done by having
the data in a single ringbuffer with a processing pipeline where one
process does the reading, then workers grab tokenization chunks as
they become available, then one process handles determining the chunk
boundaries, after which the chunks are processed.
But I still don't think this is something to worry about for the first
version. Just a better line splitting algorithm should go a looong way
in feeding a large number of workers, even when inserting to an
unindexed unlogged table. If we get the SIMD line splitting in, it
will be enough to overwhelm most I/O subsystems available today.
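To illustrate the kind of scan that makes this fast, here is a
portable SWAR ("SIMD within a register") newline finder - this is not
the prototype mentioned above, just the well-known zero-byte bit
trick applied to '\n'; a real version would also track quotes and
escapes and use proper vector instructions:

#include <stdint.h>
#include <string.h>

/*
 * Return the index of the first '\n' in buf[0..len), or -1 if none,
 * scanning 8 bytes at a time. The ctz step assumes little-endian.
 */
static int64_t
find_newline_swar(const char *buf, size_t len)
{
    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t high = 0x8080808080808080ULL;
    const uint64_t pat = ones * (uint8_t) '\n';
    size_t  i = 0;

    for (; i + 8 <= len; i += 8)
    {
        uint64_t    word,
                    x,
                    hit;

        memcpy(&word, buf + i, 8);
        x = word ^ pat;                 /* '\n' bytes become zero */
        hit = (x - ones) & ~x & high;   /* lowest set high bit marks
                                         * the first zero byte */
        if (hit)
            return (int64_t) (i + (__builtin_ctzll(hit) >> 3));
    }
    for (; i < len; i++)                /* remaining tail bytes */
        if (buf[i] == '\n')
            return (int64_t) i;
    return -1;
}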
Regards,
Ants Aasma
On Mon, 13 Apr 2020 at 23:16, Andres Freund <andres@anarazel.de> wrote:
Still, if the reader does the splitting, then you don't need as much
IPC, right? The shared memory data structure is just a ring of bytes,
and whoever reads from it is responsible for the rest.

I don't think so. If only one process does the splitting, the
exclusively locked section is just popping off a bunch of offsets of the
ring. And that could fairly easily be done with atomic ops (since what
we need is basically a single producer multiple consumer queue, which
can be done lock free fairly easily ). Whereas in the case of each
process doing the splitting, the exclusively locked part is splitting
along lines - which takes considerably longer than just popping off a
few offsets.
I see the benefit of having one process responsible for splitting as
being able to run ahead of the workers to queue up work when many of
them need new data at the same time. I don't think the locking
benefits of a ring are important in this case. At current rather
conservative chunk sizes we are looking at ~100k chunks per second at
best; normal locking should be perfectly adequate. And chunk size can
easily be increased. I see the main value in it being simple.
But there is a point that having a layer of indirection instead of a
linear buffer allows for some workers to fall behind. Either because
the kernel scheduled them out for a time slice, or they need to do I/O
or because inserting some tuple hits a unique conflict and needs to
wait for a tx to complete or abort to resolve. With a ring buffer
reading has to wait on the slowest worker reading its chunk. Having
workers copy the data to a local buffer as the first step would reduce
the probability of hitting any issues. But still, at GB/s rates,
hiding a 10ms timeslice of delay would need tens of megabytes of
buffer.
FWIW, I think just increasing the buffer is good enough - the CPUs
processing this workload are likely to have tens to hundreds of
megabytes of cache on board.
On Wed, Apr 15, 2020 at 1:10 AM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
Hence, I was trying to think whether we can leverage this idea for
implementing parallel COPY in PG. We can design an algorithm similar
to parallel hash-join where the workers pass through different phases.
1. Phase 1 - Read fixed size chunks in parallel, store the chunks and
the small stats about each chunk in the shared memory. If the shared
memory is full, go to phase 2.
2. Phase 2 - Allow a single worker to process the stats and decide the
actual chunk boundaries so that no tuple spans across two different
chunks. Go to phase 3.
3. Phase 3 - Each worker picks one adjusted chunk, parse and process
tuples from the same. Once done with one chunk, it picks the next one
and so on.
4. If there are still some unread contents, go back to phase 1.
We can probably use separate workers for phase 1 and phase 3 so that
they can work concurrently.
Advantages:
1. Each worker spends some significant time in each phase. Gets
benefit of the instruction cache - at least in phase 1.
2. It also has the same advantage of parallel hash join - fast workers
get to work more.
3. We can extend this solution for reading data from STDIN. Of course,
the phase 1 and phase 2 must be performed by the leader process who
can read from the socket.
Disadvantages:
1. Surely doesn't work if we don't have enough shared memory.
2. Probably, this approach is just impractical for PG due to certain
limitations.
As I understand this, it needs to parse the lines twice (the second
time in phase 3), and until the first two phases are over, we can't
start the tuple-processing work that is done in phase 3. So even if
the tokenization is done a bit faster, we will lose some time on
processing the tuples, which might not be an overall win; in fact, it
can be worse compared to the single-reader approach being discussed.
Now, if the work done in tokenization were a major (or significant)
portion of the copy, then thinking of such a technique might be
useful, but that is not the case, as seen in the data shared above in
this email (the tokenize time is much less than the data processing
time).
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 15, 2020 at 7:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
As I understand this, it needs to parse the lines twice (second time
in phase-3) and till the first two phases are over, we can't start the
tuple processing work which is done in phase-3. So even if the
tokenization is done a bit faster but we will lose some on processing
the tuples which might not be an overall win and in fact, it can be
worse as compared to the single reader approach being discussed.
Now, if the work done in tokenization is a major (or significant)
portion of the copy then thinking of such a technique might be useful
but that is not the case as seen in the data shared above (the
tokenize time is very less as compared to data processing time) in
this email.
It seems to me that a good first step here might be to forget about
parallelism for a minute and just write a patch to make the line
splitting as fast as possible.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Apr 15, 2020 at 2:15 PM Ants Aasma <ants@cybertec.at> wrote:
On Tue, 14 Apr 2020 at 22:40, Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
1. Each worker scans a distinct fixed sized chunk of the CSV file and
collects the following three stats from the chunk:
a) number of quotes
b) position of the first new line after even number of quotes
c) position of the first new line after odd number of quotes
2. Once stats from all the chunks are collected, the leader identifies
the adjusted chunk boundaries by iterating over the stats linearly:
- For the k-th chunk, the leader sums the number of quotes over the first k-1 chunks.
- If the number is even, then the k-th chunk does not start in the
middle of a quoted field, and the first newline after an even number
of quotes (the second collected information) is the first record
delimiter in this chunk.
- Otherwise, if the number is odd, the first newline after an odd
number of quotes (the third collected information) is the first record
delimiter.
- The end position of the adjusted chunk is obtained based on the
starting position of the next adjusted chunk.
The trouble is that, at least with current coding, the number of
quotes in a chunk can depend on whether the chunk started in a quote
or not. That's because escape characters only count inside quotes. See
for example the following csv:
foo,\"bar
baz",\"xyz"
This currently parses as one line and the number of parsed quotes
doesn't change if you add a quote in front.
But the general approach of doing the tokenization in parallel and
then a serial pass over the tokenization would still work. The quote
counting and new line finding just has to be done for both starting in
quote and not starting in quote case.
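To make that concrete, here is a rough sketch of collecting the
per-chunk stats under both hypotheses (illustrative only, not from any
patch; the names and the escape handling are simplified assumptions):

#include <stdbool.h>
#include <stdint.h>

/*
 * Illustrative sketch: collect per-chunk stats under both hypotheses,
 * index 0 assuming the chunk starts outside a quote, index 1 assuming
 * it starts inside one.  The serial pass later picks the right set
 * once the true starting state of each chunk is known.
 */
typedef struct ChunkStats
{
    uint32_t nquotes[2];      /* quote count under each hypothesis */
    int64_t  first_nl[2];     /* first newline outside a quote, or -1 */
} ChunkStats;

static void
collect_chunk_stats(const char *buf, int64_t len,
                    char quotec, char escapec, ChunkStats *st)
{
    for (int h = 0; h < 2; h++)
    {
        bool     in_quote = (h == 1);
        uint32_t nquotes = 0;
        int64_t  first_nl = -1;

        for (int64_t i = 0; i < len; i++)
        {
            char c = buf[i];

            /* escape characters only count inside quotes */
            if (in_quote && c == escapec && escapec != quotec && i + 1 < len)
            {
                i++;
                continue;
            }
            if (c == quotec)
            {
                nquotes++;
                in_quote = !in_quote;
            }
            else if (c == '\n' && !in_quote && first_nl == -1)
                first_nl = i;
        }
        st->nquotes[h] = nquotes;
        st->first_nl[h] = first_nl;
    }
}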
Yeah, right.
Using phases doesn't look like the correct approach - the tokenization
can be prepared just in time for the serial pass and processing the
chunk can proceed immediately after. This could all be done by having
the data in a single ringbuffer with a processing pipeline where one
process does the reading, then workers grab tokenization chunks as
they become available, then one process handles determining the chunk
boundaries, after which the chunks are processed.
I was thinking from this point of view - the sooner we introduce
parallelism in the process, the greater the benefits. Probably there
isn't any way to avoid a single pass over the data (phase 2 in the
above case) to tokenise the chunks. So yeah, if the reading and
tokenisation phase doesn't take much time, parallelising it would
just be overkill. As pointed out by Andres and you, using a lock-free
circular buffer implementation sounds like the way to go forward. AFAIK,
a CAS-based FIFO circular queue implementation suffers from two
problems - 1. (as pointed out by you) slow workers may block producers;
2. since it doesn't partition the queue among the workers, it does not
achieve good locality and cache-friendliness, which limits its
scalability on NUMA systems.
But I still don't think this is something to worry about for the first
version. Just a better line splitting algorithm should go a looong way
in feeding a large number of workers, even when inserting to an
unindexed unlogged table. If we get the SIMD line splitting in, it
will be enough to overwhelm most I/O subsystems available today.
Yeah. Parsing text is a great use case for data parallelism, which can
be achieved by SIMD instructions. Consider processing 8-bit ASCII
characters in a 512-bit SIMD word. A lot of code and complexity from
CopyReadLineText will surely go away. And further (I'm not sure on
this point), if we can use the schema of the table, perhaps JIT can
generate machine code to efficiently read fields based on their
types.
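As a rough illustration of the idea, here is a 128-bit SSE2 sketch (the
512-bit AVX-512 version would be analogous; none of this is from any
posted patch):

#include <emmintrin.h>      /* SSE2 intrinsics */

/*
 * Illustrative only: return the offset of the first '\n' in buf, or -1,
 * scanning 16 bytes per iteration.  A real CopyReadLineText replacement
 * would also have to track quotes and escapes as discussed upthread.
 */
static long
find_newline_simd(const char *buf, long len)
{
    const __m128i nl = _mm_set1_epi8('\n');
    long i = 0;

    for (; i + 16 <= len; i += 16)
    {
        __m128i block = _mm_loadu_si128((const __m128i *) (buf + i));
        int     mask = _mm_movemask_epi8(_mm_cmpeq_epi8(block, nl));

        if (mask != 0)
            return i + __builtin_ctz(mask);  /* lowest set bit = first match */
    }
    for (; i < len; i++)                     /* scalar tail */
        if (buf[i] == '\n')
            return i;
    return -1;
}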
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On 2020-04-15 10:12:14 -0400, Robert Haas wrote:
On Wed, Apr 15, 2020 at 7:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
As I understand this, it needs to parse the lines twice (the second time
in phase-3), and till the first two phases are over, we can't start the
tuple processing work which is done in phase-3. So even if the
tokenization is done a bit faster, we will lose some time in processing
the tuples, which might not be an overall win and, in fact, can be
worse as compared to the single reader approach being discussed.
Now, if the work done in tokenization is a major (or significant)
portion of the copy then thinking of such a technique might be useful,
but that is not the case as seen in the data shared above (the
tokenization time is much less as compared to the data processing time)
in this email.
It seems to me that a good first step here might be to forget about
parallelism for a minute and just write a patch to make the line
splitting as fast as possible.
+1
Compared to all the rest of the efforts during COPY a fast "split rows"
implementation should not be a bottleneck anymore.
Hi,
On 2020-04-15 20:36:39 +0530, Kuntal Ghosh wrote:
I was thinking from this point of view - the sooner we introduce
parallelism in the process, the greater the benefits.
I don't really agree. Sure, that's true from a theoretical perspective,
but the incremental gains may be very small, and the cost in complexity
very high. If we can get single threaded splitting of rows to be >4GB/s,
which should very well be attainable, the rest of the COPY work is going
to dominate the time. We shouldn't add complexity to parallelize more
of the line splitting, caring too much about scalable datastructures,
etc when the bottleneck after some straightforward optimization is
usually still in the parallelized part.
I'd expect that for now we'd likely hit scalability issues in other
parts of the system first (e.g. extension locks, buffer mapping).
Greetings,
Andres Freund
Hi,
On 2020-04-15 12:05:47 +0300, Ants Aasma wrote:
I see the benefit of having one process responsible for splitting as
being able to run ahead of the workers to queue up work when many of
them need new data at the same time.
Yea, I agree.
I don't think the locking benefits of a ring are important in this
case. At current rather conservative chunk sizes we are looking at
~100k chunks per second at best, normal locking should be perfectly
adequate. And chunk size can easily be increased. I see the main value
in it being simple.
I think the locking benefits of not needing to hold a lock *while*
splitting (as we'd need in some proposal floated earlier) is likely to
already be beneficial. I don't think we need to worry about lock
scalability protecting the queue of already split data, for now.
I don't think we really want to have a much larger chunk size,
btw. It makes it more likely for the data handed to workers to take an
uneven amount of time to process.
But there is a point that having a layer of indirection instead of a
linear buffer allows for some workers to fall behind.
Yea. It'd probably make sense to read the input data into an array of
evenly sized blocks, and have the datastructure (still think a
ringbuffer makes sense) of split boundaries point into those entries. If
we don't require the input blocks to be in-order in that array, we can
reuse blocks therein that are fully processed, even if "earlier" data in
the input has not yet been fully processed.
With a ring buffer reading has to wait on the slowest worker reading
its chunk.
To be clear, I was only thinking of using a ringbuffer to indicate split
boundaries. And that workers would just pop entries from it before they
actually process the data (stored outside of the ringbuffer). Since the
split boundaries will always be read in order by workers, and the
entries will be tiny, there's no need to avoid copying out entries.
So basically what I was thinking we *eventually* may want (I'd forgo some
of this initially) is something like:
struct InputBlock
{
    uint32 unprocessed_chunk_parts;
    uint32 following_block;
    char data[INPUT_BLOCK_SIZE];
};

// array of input data, with > 2*nworkers entries
InputBlock *input_blocks;

struct ChunkedInputBoundary
{
    uint32 firstblock;
    uint32 startoff;
};

struct ChunkedInputBoundaries
{
    uint32 read_pos;
    uint32 write_end;
    ChunkedInputBoundary ring[RINGSIZE];
};
Where the leader would read data into InputBlocks with
unprocessed_chunk_parts == 0. Then it'd split the read input data into
chunks (presumably with chunk size << input block size), putting
identified chunks into ChunkedInputBoundaries. For each
ChunkedInputBoundary it'd increment the unprocessed_chunk_parts of each
InputBlock containing parts of the chunk. For chunks across >1
InputBlocks each InputBlock's following_block would be set accordingly.
Workers would just pop an entry from the ringbuffer (making that entry
reusable), and process the chunk. The underlying data would not be
copied out of the InputBlocks, but obviously readers would need to take
care to handle InputBlock boundaries. Whenever a chunk is fully read, or
when crossing a InputBlock boundary, the InputBlock's
unprocessed_chunk_parts would be decremented.
Recycling of InputBlocks could probably just be an occasional linear
search for buffers with unprocessed_chunk_parts == 0.
Something roughly like this should not be too complicated to
implement. Unless extremely unlucky (very wide input data spanning many
InputBlocks), a straggling reader would not prevent global progress; it'd
just prevent reuse of the InputBlocks with data for its chunk (normally
that'd be two InputBlocks, not more).
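For illustration, the worker side of that could look roughly like this
(a sketch against the structs above; pop_chunk_boundary and
process_chunk are hypothetical helpers, and locking/waiting is
omitted):

/* hypothetical helpers, sketched for illustration only */
static bool pop_chunk_boundary(ChunkedInputBoundaries *bounds,
                               ChunkedInputBoundary *chunk);
static void process_chunk(InputBlock *blocks,
                          uint32 firstblock, uint32 startoff);

static void
worker_loop(ChunkedInputBoundaries *bounds, InputBlock *blocks)
{
    ChunkedInputBoundary chunk;

    /*
     * pop_chunk_boundary copies the tiny entry out under a lock and
     * advances read_pos, so the ring slot becomes reusable immediately;
     * it returns false once the leader signals end of input.
     */
    while (pop_chunk_boundary(bounds, &chunk))
    {
        /*
         * process_chunk parses the tuple in place in the InputBlocks,
         * following following_block across block boundaries.  Whenever
         * it is done with a block (crossing a boundary or completing
         * the chunk), it decrements that block's
         * unprocessed_chunk_parts; blocks that reach zero can then be
         * recycled by the leader.
         */
        process_chunk(blocks, chunk.firstblock, chunk.startoff);
    }
}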
Having workers copy the data to a local buffer as the first
step would reduce the probability of hitting any issues. But still, at
GB/s rates, hiding a 10ms timeslice of delay would need 10's of
megabytes of buffer.
Yea. Given the likelihood of blocking on resources (reading in index
data, writing out dirty buffers for reclaim, row locks for uniqueness
checks, extension locks, ...), as well as non uniform per-row costs
(partial indexes, index splits, ...) I think we ought to try to cope
well with that. IMO/IME it'll be common to see stalls that are much
longer than 10ms for processes that do COPY, even when the system is not
overloaded.
FWIW. I think just increasing the buffer is good enough - the CPUs
processing this workload are likely to have tens to hundreds of
megabytes of cache on board.
It'll not necessarily be a cache shared between leader / workers though,
and some of the cache-cache transfers will be more expensive even within
a socket (between core complexes for AMD, multi chip processors for
Intel).
Greetings,
Andres Freund
On Wed, Apr 15, 2020 at 10:45 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-04-15 20:36:39 +0530, Kuntal Ghosh wrote:
I was thinking from this point of view - the sooner we introduce
parallelism in the process, the greater the benefits.
I don't really agree. Sure, that's true from a theoretical perspective,
but the incremental gains may be very small, and the cost in complexity
very high. If we can get single threaded splitting of rows to be >4GB/s,
which should very well be attainable, the rest of the COPY work is going
to dominate the time. We shouldn't add complexity to parallelize more
of the line splitting, caring too much about scalable datastructures,
etc when the bottleneck after some straightforward optimization is
usually still in the parallelized part.
I'd expect that for now we'd likely hit scalability issues in other
parts of the system first (e.g. extension locks, buffer mapping).
Got your point. In this particular case, a single producer is fast
enough (or probably we can make it fast enough) to generate enough
chunks for multiple consumers so that they don't stay idle and wait
for work.
--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 15, 2020 at 11:49 PM Andres Freund <andres@anarazel.de> wrote:
To be clear, I was only thinking of using a ringbuffer to indicate split
boundaries. And that workers would just pop entries from it before they
actually process the data (stored outside of the ringbuffer). Since the
split boundaries will always be read in order by workers, and the
entries will be tiny, there's no need to avoid copying out entries.
I think the binary mode processing will be slightly different because,
unlike the text and csv formats, the data is stored in Length, Value
format for each column and there are no line markers. I don't think
there will be a big difference, but we still need to keep the
information somewhere about the format of the data in the ring buffers.
Basically, we can copy the data in Length, Value format and, once the
workers know about the format, they will parse the data appropriately.
We currently also have a different way of parsing the binary format,
see NextCopyFrom. I think we need to be careful about avoiding
duplicate work as much as possible.
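For reference, in the binary format each tuple carries a 16-bit field
count, and each field is a 32-bit big-endian length followed by that
many value bytes, with -1 denoting NULL. Reading one field out of a
shared buffer could look roughly like this (illustrative sketch only;
read_bytes() is a hypothetical stand-in for pulling data out of the
ring buffer, and error handling is omitted):

#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>

extern void read_bytes(char *dst, int len);   /* hypothetical */

/*
 * Return the field value (malloc'd) and its length, or NULL for a NULL
 * field; the real logic lives in NextCopyFrom/CopyGetInt32.
 */
static char *
read_binary_field(int32_t *len_out)
{
    uint32_t netlen;
    int32_t  len;
    char    *value;

    read_bytes((char *) &netlen, sizeof(netlen));
    len = (int32_t) ntohl(netlen);      /* lengths are big-endian */
    *len_out = len;

    if (len == -1)
        return NULL;                    /* NULL field: no value bytes follow */

    value = malloc(len);
    read_bytes(value, len);
    return value;
}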
Apart from this, we have analyzed the other cases as mentioned below
where we need to decide whether we can allow parallelism for the copy
command.
Case-1:
Do we want to enable parallelism for a copy when transition tables are
involved? Basically, during the copy, we do capture tuples in
transition tables for certain cases, like when an after statement
trigger accesses the same relation on which we have a trigger. See the
example below [1]. We decide this in function
MakeTransitionCaptureState. For such cases, we collect minimal tuples
in the tuple store after processing them so that the after statement
triggers can access them later. Now, if we want to enable parallelism
for such cases, we instead need to store and access tuples from a
shared tuple store (sharedtuplestore.c/sharedtuplestore.h). However, it
doesn't have the facility to store tuples in memory, so we always need
to store and access from a file, which could be costly unless we also
have an additional way to store minimal tuples in shared memory up to
work_mem and then in the shared tuple store. It is possible to do all
this or part of this work to enable parallel copy for such cases, but I
am not sure if it is worth it. We can decide not to enable parallelism
for such cases and later allow it if we see demand for the same; that
will also help us avoid introducing additional work/complexity in the
first version of the patch.
Case-2:
The Single Insertion mode (CIM_SINGLE) is used in various
scenarios, and whether we can allow parallelism for those depends on
the specific case, as discussed below:
a. When there are BEFORE/INSTEAD OF triggers on the table. We don't
allow multi-inserts in such cases because such triggers might query
the table we're inserting into and act differently if the tuples that
have already been processed and prepared for insertion are not there.
Now, if we allow parallelism with such triggers, the behavior would
depend on whether a parallel worker has already inserted that
particular row or not. I guess such functions should ideally be marked
as parallel-unsafe. So, in short, in this case whether to allow
parallelism or not depends upon the parallel-safety marking of the
function.
b. For partitioned tables, we can't support multi-inserts when there
are any statement-level insert triggers. This is because as of now,
we expect that any before row insert and statement-level insert
triggers are on the same relation. Now, there is no harm in allowing
parallelism for such cases, but it depends upon whether we have the
infrastructure (basically, allowing tuples to be collected in a shared
tuple store) to support statement-level insert triggers.
c. For inserts into foreign tables. We can't allow the parallelism in
this case because each worker needs to establish the FDW connection
and operate in a separate transaction. Now unless we have a
capability to provide a two-phase commit protocol for "Transactions
involving multiple postgres foreign servers" (which is being discussed
in a separate thread [2]), we can't allow this.
d. If there are volatile default expressions, or the where clause
contains a volatile expression. Here, we can check whether the
expression is parallel-safe; if it is, we can allow parallelism.
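For what it's worth, a check of that sort could be built from existing
planner helpers; a minimal sketch (the walker itself is hypothetical
and not from any posted patch, while check_functions_in_node(),
func_parallel() and PROPARALLEL_SAFE are existing facilities) might
look like:

#include "postgres.h"
#include "catalog/pg_proc.h"
#include "nodes/nodeFuncs.h"
#include "utils/lsyscache.h"

/* callback for check_functions_in_node: true if the function is not
 * marked parallel safe in pg_proc */
static bool
func_not_parallel_safe(Oid func_id, void *context)
{
    return func_parallel(func_id) != PROPARALLEL_SAFE;
}

static bool
unsafe_expr_walker(Node *node, void *context)
{
    if (node == NULL)
        return false;
    if (check_functions_in_node(node, func_not_parallel_safe, context))
        return true;            /* found a parallel-unsafe function */
    return expression_tree_walker(node, unsafe_expr_walker, context);
}

/* Hypothetical helper: parallel copy would be allowed for an expression
 * (a column default or the WHERE clause) only when this returns true. */
static bool
copy_expr_parallel_safe(Node *expr)
{
    return !unsafe_expr_walker(expr, NULL);
}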
Case-3:
In the copy command, for performing foreign key checks, we take a KEY
SHARE lock on primary key table rows, which in turn will increment the
command counter and update the snapshot. Now, as we share the snapshot
at the beginning of the command, we can't allow it to change later.
So, unless we do something special for it, I think we can't allow
parallelism in such cases.
I couldn't think of many problems if we allow parallelism in such
cases. One inconsistency, if we allow FK checks via workers, would be
that at the end of COPY the value of the command counter will not be
what we expect, as we wouldn't have accounted for the increments from
workers. Now, if COPY is being done in a transaction, it will not
assign the correct values to the next commands. Also, for executing
deferred triggers, we use the transaction snapshot, so if anything is
changed in the snapshot via parallel workers, ideally the changed
snapshot should be synced across the workers.
Now, the other concern could be that different workers can try to
acquire KEY SHARE lock on the same tuples which they will be able to
acquire due to group locking or otherwise but I don't see any problem
with it.
I am not sure if the above leads to any user-visible problem, but I
might be missing something here. I think if we can think of any real
problems we can try to design a better solution to address those.
Case-4:
For Deferred Triggers, it seems we record CTIDs of tuples (via
ExecARInsertTriggers->AfterTriggerSaveEvent) and then execute deferred
triggers at transaction end using AfterTriggerFireDeferred or at end
of the statement. The challenge to allow parallelism for such cases
is we need to capture the CTID events in shared memory. For that, we
either need to invent a new infrastructure for event capturing in
shared memory which will be a huge task on its own. The other idea is
to get CTIDs via shared memory and then add those to event queues via
leader but I think in that case we need to ensure the order of CTIDs
(basically it should be in the same order in which we have processed
them).
[1]:
create or replace function dump_insert() returns trigger language plpgsql as
$$
begin
raise notice 'trigger = %, new table = %',
TG_NAME,
(select string_agg(new_table::text, ', ' order by a)
from new_table);
return null;
end;
$$;
create table test (a int);
create trigger trg1_test after insert on test referencing new table
as new_table for each statement execute procedure dump_insert();
copy test (a) from stdin;
1
2
3
\.
[2]: /messages/by-id/20191206.173215.1818665441859410805.horikyota.ntt@gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
I wonder why you're still looking at this instead of looking at just
speeding up the current code, especially the line splitting, per
previous discussion. And then coming back to study this issue more
after that's done.
On Mon, May 11, 2020 at 8:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Apart from this, we have analyzed the other cases as mentioned below
where we need to decide whether we can allow parallelism for the copy
command.
Case-1:
Do we want to enable parallelism for a copy when transition tables are
involved?
I think it would be OK not to support this.
Case-2:
a. When there are BEFORE/INSTEAD OF triggers on the table.
b. For partitioned tables, we can't support multi-inserts when there
are any statement-level insert triggers.
c. For inserts into foreign tables.
d. If there are volatile default expressions or the where clause
contains a volatile expression. Here, we can check if the expression
is parallel-safe, then we can allow parallelism.
This all sounds fine.
Case-3:
In the copy command, for performing foreign key checks, we take a KEY
SHARE lock on primary key table rows, which in turn will increment the
command counter and update the snapshot. Now, as we share the snapshot
at the beginning of the command, we can't allow it to change later.
So, unless we do something special for it, I think we can't allow
parallelism in such cases.
This sounds like much more of a problem to me; it'd be a significant
restriction that would kick in in routine cases where the user isn't
doing anything particularly exciting. The command counter presumably
only needs to be updated once per command, so maybe we could do that
before we start parallelism. However, I think we would need to have
some kind of dynamic memory structure to which new combo CIDs can be
added by any member of the group, and then discovered by other members
of the group later. At the end of the parallel operation, the leader
must discover any combo CIDs added by others to that table before
destroying it, even if it has no immediate use for the information. We
can't allow a situation where the group members have inconsistent
notions of which combo CIDs exist or what their mappings are, and if
KEY SHARE locks are being taken, new combo CIDs could be created.
Case-4:
For Deferred Triggers, it seems we record CTIDs of tuples (via
ExecARInsertTriggers->AfterTriggerSaveEvent) and then execute deferred
triggers at transaction end using AfterTriggerFireDeferred or at end
of the statement.
I think this could be left for the future.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, May 11, 2020 at 11:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
I wonder why you're still looking at this instead of looking at just
speeding up the current code, especially the line splitting,
Because the line splitting is just 1-2% of the overall work in common
cases. See the data shared by Vignesh for various workloads [1]. The
time it takes is approximately in the range of 0.5-12%, and for cases
like a table with a few indexes, it is not more than 1-2%.
[1]: /messages/by-id/CALDaNm3r8cPsk0Vo_-6AXipTrVwd0o9U2S0nCmRdku1Dn-Tpqg@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, May 11, 2020 at 11:52 PM Robert Haas <robertmhaas@gmail.com> wrote:
Case-3:
In the copy command, for performing foreign key checks, we take a KEY
SHARE lock on primary key table rows, which in turn will increment the
command counter and update the snapshot. Now, as we share the snapshot
at the beginning of the command, we can't allow it to change later.
So, unless we do something special for it, I think we can't allow
parallelism in such cases.
This sounds like much more of a problem to me; it'd be a significant
restriction that would kick in in routine cases where the user isn't
doing anything particularly exciting. The command counter presumably
only needs to be updated once per command, so maybe we could do that
before we start parallelism. However, I think we would need to have
some kind of dynamic memory structure to which new combo CIDs can be
added by any member of the group, and then discovered by other members
of the group later. At the end of the parallel operation, the leader
must discover any combo CIDs added by others to that table before
destroying it, even if it has no immediate use for the information. We
can't allow a situation where the group members have inconsistent
notions of which combo CIDs exist or what their mappings are, and if
KEY SHARE locks are being taken, new combo CIDs could be created.
AFAIU, we don't generate combo CIDs for this case. See below code in
heap_lock_tuple():
/*
* Store transaction information of xact locking the tuple.
*
* Note: Cmax is meaningless in this context, so don't set it; this avoids
* possibly generating a useless combo CID. Moreover, if we're locking a
* previously updated tuple, it's important to preserve the Cmax.
*
* Also reset the HOT UPDATE bit, but only if there's no update; otherwise
* we would break the HOT chain.
*/
tuple->t_data->t_infomask &= ~HEAP_XMAX_BITS;
tuple->t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
tuple->t_data->t_infomask |= new_infomask;
tuple->t_data->t_infomask2 |= new_infomask2;
I don't understand why we need to do something special for combo CIDs
if they are not generated during this operation?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, May 12, 2020 at 1:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't understand why we need to do something special for combo CIDs
if they are not generated during this operation?
Hmm. Well I guess if they're not being generated then we don't need to
do anything about them, but I still think we should try to work around
having to disable parallelism for a table which is referenced by
foreign keys.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, May 14, 2020 at 12:39 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, May 12, 2020 at 1:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't understand why we need to do something special for combo CIDs
if they are not generated during this operation?
Hmm. Well I guess if they're not being generated then we don't need to
do anything about them, but I still think we should try to work around
having to disable parallelism for a table which is referenced by
foreign keys.
Okay, just to be clear, we want to allow parallelism for a table that
has foreign keys. Basically, a parallel copy should work while
loading data into tables having FK references.
To support that, we need to consider a few things.
a. Currently, we increment the command counter each time we take a key
share lock on a tuple during trigger execution. I am really not sure
if this is required during Copy command execution or we can just
increment it once for the copy. If we need to increment the command
counter just once for copy command then for the parallel copy we can
ensure that we do it just once at the end of the parallel copy but if
not then we might need some special handling.
b. Another point is that after inserting rows we record CTIDs of the
tuples in the event queue and then once all tuples are processed we
call FK trigger for each CTID. Now, with parallelism, the FK checks
will be processed once the worker processed one chunk. I don't see
any problem with it but still, this will be a bit different from what
we do in serial case. Do you see any problem with this?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 14, 2020 at 11:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, May 14, 2020 at 12:39 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, May 12, 2020 at 1:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I don't understand why we need to do something special for combo CIDs
if they are not generated during this operation?
Hmm. Well I guess if they're not being generated then we don't need to
do anything about them, but I still think we should try to work around
having to disable parallelism for a table which is referenced by
foreign keys.
Okay, just to be clear, we want to allow parallelism for a table that
has foreign keys. Basically, a parallel copy should work while
loading data into tables having FK references.
To support that, we need to consider a few things.
a. Currently, we increment the command counter each time we take a key
share lock on a tuple during trigger execution. I am really not sure
if this is required during Copy command execution or we can just
increment it once for the copy. If we need to increment the command
counter just once for copy command then for the parallel copy we can
ensure that we do it just once at the end of the parallel copy but if
not then we might need some special handling.
b. Another point is that after inserting rows we record CTIDs of the
tuples in the event queue and then once all tuples are processed we
call FK trigger for each CTID. Now, with parallelism, the FK checks
will be processed once the worker processed one chunk. I don't see
any problem with it but still, this will be a bit different from what
we do in serial case. Do you see any problem with this?
IMHO, it should not be a problem, because without parallelism also we
trigger the foreign key check when we detect EOF / end of data from
STDIN. And, with parallel workers also, a worker will assume that it
has completed all the work and can go for the foreign key check only
after the leader receives EOF / end of data from STDIN.
The only difference is that each worker is not waiting for all the
data (from all workers) to get inserted before checking the
constraint. Moreover, we are not supporting external triggers with
the parallel copy; otherwise, we might have to worry that those
triggers could do something on the primary table before we check the
constraint. I am not sure if there are any other factors that I am
missing.
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 14, 2020 at 2:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
To support that, we need to consider a few things.
a. Currently, we increment the command counter each time we take a key
share lock on a tuple during trigger execution. I am really not sure
if this is required during Copy command execution or we can just
increment it once for the copy. If we need to increment the command
counter just once for copy command then for the parallel copy we can
ensure that we do it just once at the end of the parallel copy but if
not then we might need some special handling.
My sense is that it would be a lot more sensible to do it at the
*beginning* of the parallel operation. Once we do it once, we
shouldn't ever do it again; that's how it works now. Deferring it
until later seems much more likely to break things.
b. Another point is that after inserting rows we record CTIDs of the
tuples in the event queue and then once all tuples are processed we
call FK trigger for each CTID. Now, with parallelism, the FK checks
will be processed once the worker processed one chunk. I don't see
any problem with it but still, this will be a bit different from what
we do in serial case. Do you see any problem with this?
I think there could be some problems here. For instance, suppose that
there are two entries for different workers for the same CTID. If the
leader were trying to do all the work, they'd be handled
consecutively. If they were from completely unrelated processes,
locking would serialize them. But group locking won't, so there you
have an issue, I think. Also, it's not ideal from a work-distribution
perspective: one worker could finish early and be unable to help the
others.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 15, 2020 at 1:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 14, 2020 at 2:18 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
To support that, we need to consider a few things.
a. Currently, we increment the command counter each time we take a key
share lock on a tuple during trigger execution. I am really not sure
if this is required during Copy command execution or we can just
increment it once for the copy. If we need to increment the command
counter just once for copy command then for the parallel copy we can
ensure that we do it just once at the end of the parallel copy but if
not then we might need some special handling.
My sense is that it would be a lot more sensible to do it at the
*beginning* of the parallel operation. Once we do it once, we
shouldn't ever do it again; that's how it works now. Deferring it
until later seems much more likely to break things.
AFAIU, we always increment the command counter after executing the
command. Why do we want to do it differently here?
b. Another point is that after inserting rows we record CTIDs of the
tuples in the event queue and then once all tuples are processed we
call FK trigger for each CTID. Now, with parallelism, the FK checks
will be processed once the worker processed one chunk. I don't see
any problem with it but still, this will be a bit different from what
we do in serial case. Do you see any problem with this?
I think there could be some problems here. For instance, suppose that
there are two entries for different workers for the same CTID.
First, let me clarify that the CTIDs I have used in my email are for
the table in which insertion is happening, which means the FK table.
So, in such a case, we can't have the same CTIDs queued for different
workers. Basically, we use the CTID to fetch the row from the FK table
later and form a query to lock (in KEY SHARE mode) the corresponding
tuple in the PK table. Now, it is possible that two different workers
try to lock the same row of the PK table. I am not clear what problem
group locking can have in this case, because these are non-conflicting
locks. Can you please elaborate a bit more?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, May 15, 2020 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
My sense is that it would be a lot more sensible to do it at the
*beginning* of the parallel operation. Once we do it once, we
shouldn't ever do it again; that's how it works now. Deferring it
until later seems much more likely to break things.
AFAIU, we always increment the command counter after executing the
command. Why do we want to do it differently here?
Hmm, now I'm starting to think that I'm confused about what is under
discussion here. Which CommandCounterIncrement() are we talking about
here?
First, let me clarify that the CTIDs I have used in my email are for
the table in which insertion is happening, which means the FK table.
So, in such a case, we can't have the same CTIDs queued for different
workers. Basically, we use the CTID to fetch the row from the FK table
later and form a query to lock (in KEY SHARE mode) the corresponding
tuple in the PK table. Now, it is possible that two different workers
try to lock the same row of the PK table. I am not clear what problem
group locking can have in this case, because these are non-conflicting
locks. Can you please elaborate a bit more?
I'm concerned about two workers trying to take the same lock at the
same time. If that's prevented by the buffer locking then I think it's
OK, but if it's prevented by a heavyweight lock then it's not going to
work in this case.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 15, 2020 at 6:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, May 15, 2020 at 12:19 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
My sense is that it would be a lot more sensible to do it at the
*beginning* of the parallel operation. Once we do it once, we
shouldn't ever do it again; that's how it works now. Deferring it
until later seems much more likely to break things.
AFAIU, we always increment the command counter after executing the
command. Why do we want to do it differently here?
Hmm, now I'm starting to think that I'm confused about what is under
discussion here. Which CommandCounterIncrement() are we talking about
here?
The one we do after executing a non-readonly command. Let me try to
explain by example:
CREATE TABLE tab_fk_referenced_chk(refindex INTEGER PRIMARY KEY,
height real, weight real);
insert into tab_fk_referenced_chk values( 1, 1.1, 100);
CREATE TABLE tab_fk_referencing_chk(index INTEGER REFERENCES
tab_fk_referenced_chk(refindex), height real, weight real);
COPY tab_fk_referencing_chk(index,height,weight) FROM stdin WITH(
DELIMITER ',');
1,1.1,100
1,2.1,200
1,3.1,300
\.
In the above case, even though we are executing a single command from
the user's perspective, the currentCommandId will be four after the
command. One increment will be for the copy command and the other
three increments are for locking tuples in the PK table
(tab_fk_referenced_chk) for the three tuples in the FK table
(tab_fk_referencing_chk). Now, for parallel workers, it is
(theoretically) possible that the three tuples are processed by three
different workers which don't get synced as of now. The question was:
do we see any kind of problem with this, and if so, can we just sync it
up at the end of parallelism?
First, let me clarify that the CTIDs I have used in my email are for
the table in which insertion is happening, which means the FK table.
So, in such a case, we can't have the same CTIDs queued for different
workers. Basically, we use the CTID to fetch the row from the FK table
later and form a query to lock (in KEY SHARE mode) the corresponding
tuple in the PK table. Now, it is possible that two different workers
try to lock the same row of the PK table. I am not clear what problem
group locking can have in this case, because these are non-conflicting
locks. Can you please elaborate a bit more?
I'm concerned about two workers trying to take the same lock at the
same time. If that's prevented by the buffer locking then I think it's
OK, but if it's prevented by a heavyweight lock then it's not going to
work in this case.
We do take the buffer lock in exclusive mode before trying to acquire a
KEY SHARE lock on the tuple, so two workers shouldn't try to acquire it
at the same time. I think you are trying to see whether, in any case,
two workers would try to acquire a heavyweight lock, like a tuple lock
or something like that, to perform the operation; that would create a
problem because, due to group locking, such an operation would be
allowed where it should not have been. But I don't think anything of
that sort is feasible in the COPY operation, and if it is, then we
probably need to carefully block it or find some solution for it.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi.
We have made a patch on the lines that were discussed in the previous
mails. We could achieve up to 9.87X performance improvement. The
improvement varies from case to case.
Execution time (seconds), with the speedup relative to 0 workers in
parentheses; the best result (9.87X) is with 20 workers:

Workers | copy from file,      | copy from stdin,     | copy from file, | copy from file,  | copy from stdin,
        | 2 indexes on integer | 2 indexes on integer | 1 gist index    | 3 indexes on     | 3 indexes on
        | columns, 1 index on  | columns, 1 index on  | on text column  | integer columns  | integer columns
        | text column          | text column          |                 |                  |
--------+----------------------+----------------------+-----------------+------------------+------------------
      0 | 1162.772 (1X)        | 1176.035 (1X)        | 827.669 (1X)    | 216.171 (1X)     | 217.376 (1X)
      1 | 1110.288 (1.05X)     | 1120.556 (1.05X)     | 747.384 (1.11X) | 174.242 (1.24X)  | 163.492 (1.33X)
      2 | 635.249 (1.83X)      | 668.18 (1.76X)       | 435.673 (1.9X)  | 133.829 (1.61X)  | 126.516 (1.72X)
      4 | 336.835 (3.45X)      | 346.768 (3.39X)      | 236.406 (3.5X)  | 105.767 (2.04X)  | 107.382 (2.02X)
      8 | 188.577 (6.17X)      | 194.491 (6.04X)      | 148.962 (5.56X) | 100.708 (2.15X)  | 107.72 (2.01X)
     16 | 126.819 (9.17X)      | 146.402 (8.03X)      | 119.923 (6.9X)  | 97.996 (2.2X)    | 106.531 (2.04X)
     20 | 117.845 (9.87X)      | 149.203 (7.88X)      | 138.741 (5.96X) | 97.94 (2.21X)    | 107.5 (2.02X)
     30 | 127.554 (9.11X)      | 161.218 (7.29X)      | 172.443 (4.8X)  | 98.232 (2.2X)    | 108.778 (1.99X)
Posting the initial patch to get feedback.
Design of the Parallel Copy: The backend to which the "COPY FROM" query
is submitted acts as the leader, with the responsibility of reading
data from the file/stdin and launching at most n workers, as specified
with the PARALLEL 'n' option in the "COPY FROM" query. The leader
populates the common data required for the workers' execution in the
DSM and shares it with the workers. The leader then executes the before
statement triggers, if any exist.

The leader populates the DSM chunks, which include the start offset and
chunk size; while populating the chunks, it reads as many blocks as
required into the DSM data blocks from the file. Each block is 64K in
size. The leader parses the data to identify a chunk; the existing
logic from CopyReadLineText, which identifies the chunks, was used for
this with some changes. The leader checks if a free chunk is available
to copy the information into; if there is no free chunk, it waits till
the required chunk is freed up by a worker and then copies the
identified chunk's information (offset & chunk size) into the DSM
chunks. This process is repeated till the complete file is processed.
Simultaneously, the workers cache the chunks (50 at a time) locally
into local memory and release the chunks to the leader for further
populating. Each worker processes the chunks it has cached and inserts
them into the table. The leader waits till all the populated chunks are
processed by the workers and then exits.
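With the patch applied, a parallel load would then be invoked along
these lines (table and file names here are illustrative):

COPY mytable FROM '/path/to/data.csv' WITH (FORMAT csv, PARALLEL 4);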
We would like to add support for parallel copy with referential
integrity constraints, and for parallelizing copy from binary format
files, in the future. The above-mentioned tests were run with the CSV
format, a file size of 5.1GB and 10 million records in the table. The
postgres configuration and the system configuration used are attached
in config.txt.
One of my colleagues, Bharath, and I have developed this patch. We would
like to thank Amit, Dilip, Robert, Andres, Ants, Kuntal, Alastair, Tomas,
David, Thomas, Andrew & Kyotaro for their thoughts/discussions/suggestions.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Support-parallel-copy.patch (application/x-patch)
From 4a1febf53529c4ea29312660c7a7b633f829c342 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 3 Jun 2020 09:29:58 +0530
Subject: [PATCH] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from file/STDIN to a table. This adds a PARALLEL option to COPY FROM
command where the user can specify the number of workers that can be used
to perform the COPY FROM command. Specifying zero as number of workers will
disable parallelism.
---
doc/src/sgml/ref/copy.sgml | 16 +
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/parallel.c | 4 +
src/backend/access/transam/xact.c | 13 +
src/backend/commands/copy.c | 2729 ++++++++++++++++++++++-----
src/backend/optimizer/util/clauses.c | 2 +-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/access/xact.h | 1 +
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 10 +
10 files changed, 2329 insertions(+), 465 deletions(-)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all. This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 94eb37d..6991b9f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690..09e7a19 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index cd30b62..d43902c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,19 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, all the workers must use the same transaction id.
+ */
+void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc4..2a49255 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -96,6 +99,154 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
/*
+ * State of the chunk.
+ */
+typedef enum ChunkState
+{
+ CHUNK_INIT, /* initial state of chunk */
+ CHUNK_LEADER_POPULATING, /* leader processing chunk */
+ CHUNK_LEADER_POPULATED, /* leader completed populating chunk */
+ CHUNK_WORKER_PROCESSING, /* worker processing chunk */
+ CHUNK_WORKER_PROCESSED /* worker completed processing chunk */
+}ChunkState;
+
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* should be mod of RINGSIZE */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
+/*
+ * Copy data block information.
+ */
+typedef struct CopyDataBlock
+{
+ /* The number of unprocessed chunks in the current block. */
+ pg_atomic_uint32 unprocessed_chunk_parts;
+
+ /*
+ * If the current chunk data is continued into another block,
+ * following_block will have the position where the remaining data need to
+ * be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag will be set, when the leader finds out this block can be read
+ * safely by the worker. This helps the worker to start processing the chunk
+ * early where the chunk will be spread across many blocks and the worker
+ * need not wait for the complete chunk to be processed.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE + 1]; /* data read from file */
+}CopyDataBlock;
+
+/*
+ * Individual Chunk information.
+ */
+typedef struct ChunkBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the chunk */
+
+ /*
+ * Size of the current chunk -1 means chunk is yet to be filled completely,
+ * 0 means empty chunk, >0 means chunk filled with chunk size data.
+ */
+ pg_atomic_uint32 chunk_size;
+ pg_atomic_uint32 chunk_state; /* chunk state */
+ uint64 cur_lineno; /* line number for error messages */
+}ChunkBoundary;
+
+/*
+ * Array of the chunk.
+ */
+typedef struct ChunkBoundaries
+{
+ /* Position for the leader to populate a chunk. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ChunkBoundary ring[RINGSIZE];
+}ChunkBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ShmCopyInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual Chunks inserted by worker (some records will be filtered based on
+ * where condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* Chunks populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ CopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ChunkBoundaries chunk_boundaries; /* chunk array */
+} ShmCopyInfo;
+
+/*
+ * This structure maintains the state of the buffer information.
+ */
+typedef struct CopyBufferState
+{
+ char *copy_raw_buf;
+ int raw_buf_ptr; /* current offset */
+ int copy_buf_len; /* total size available */
+
+ /* For parallel copy */
+ CopyDataBlock *data_blk_ptr;
+ CopyDataBlock *curr_data_blk_ptr;
+ uint32 chunk_size;
+ bool block_switched;
+}CopyBufferState;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ShmCopyInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* chunk position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the chunks
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -219,12 +370,61 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * Common information that need to be copied to shared memory.
+ */
+typedef struct CopyWorkerCommonData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}CopyWorkerCommonData;
+
+/* List information */
+typedef struct ListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ListInfo;
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -254,6 +454,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
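+/*
+ * Each key above identifies one entry in the shm_toc: the leader stores a
+ * value with shm_toc_insert(pcxt->toc, key, ptr) in BeginParallelCopy, and a
+ * worker retrieves it with shm_toc_lookup(toc, key, true) in ParallelCopyMain.
+ */
+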
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -295,104 +512,1484 @@ typedef struct CopyMultiInsertInfo
*/
/*
- * This keeps the character read at the top of the loop in the buffer
- * even if there is more than one read-ahead.
+ * This keeps the character read at the top of the loop in the buffer
+ * even if there is more than one read-ahead.
+ */
+#define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
+if (1) \
+{ \
+ if (copy_buff_state.raw_buf_ptr + (extralen) >= copy_buff_state.copy_buf_len && !hit_eof) \
+ { \
+ if (IsParallelCopy()) \
+ { \
+   copy_buff_state.chunk_size = prev_chunk_size; /* restore previous chunk size */ \
+ if (copy_buff_state.block_switched) \
+ { \
+ pg_atomic_sub_fetch_u32(©_buff_state.data_blk_ptr->unprocessed_chunk_parts, 1); \
+ copy_buff_state.copy_buf_len = prev_copy_buf_len; \
+ } \
+ } \
+ copy_buff_state.raw_buf_ptr = prev_raw_ptr; /* undo fetch */ \
+ need_data = true; \
+ continue; \
+ } \
+} else ((void) 0)
+
+/* This consumes the remainder of the buffer and breaks */
+#define IF_NEED_REFILL_AND_EOF_BREAK(extralen) \
+if (1) \
+{ \
+ if (copy_buff_state.raw_buf_ptr + (extralen) >= copy_buff_state.copy_buf_len && hit_eof) \
+ { \
+ if (extralen) \
+ copy_buff_state.raw_buf_ptr = copy_buff_state.copy_buf_len; /* consume the partial character */ \
+ /* backslash just before EOF, treat as data char */ \
+ result = true; \
+ break; \
+ } \
+} else ((void) 0)
+
+/*
+ * Transfer any approved data to line_buf; must do this to be sure
+ * there is some room in raw_buf.
+ */
+#define REFILL_LINEBUF \
+if (1) \
+{ \
+ if (copy_buff_state.raw_buf_ptr > cstate->raw_buf_index && !IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ copy_buff_state.raw_buf_ptr - cstate->raw_buf_index); \
+ cstate->raw_buf_index = copy_buff_state.raw_buf_ptr; \
+} else ((void) 0)
+
+/* Undo any read-ahead and jump out of the block. */
+#define NO_END_OF_COPY_GOTO \
+if (1) \
+{ \
+ if (!IsParallelCopy()) \
+ copy_buff_state.raw_buf_ptr = prev_raw_ptr + 1; \
+ else \
+ { \
+ copy_buff_state.chunk_size = prev_chunk_size + 1; \
+ if (copy_buff_state.block_switched) \
+ { \
+ pg_atomic_sub_fetch_u32(©_buff_state.data_blk_ptr->unprocessed_chunk_parts, 1); \
+ cstate->raw_buf = copy_buff_state.data_blk_ptr->data; \
+ copy_buff_state.copy_buf_len = prev_copy_buf_len; \
+ } \
+ copy_buff_state.raw_buf_ptr = (prev_raw_ptr + 1) % DATA_BLOCK_SIZE; \
+ } \
+ goto not_end_of_copy; \
+} else ((void) 0)
+
+/*
+ * SEEK_COPY_BUFF_POS - Advance the buffer position and update the buffer state.
+ */
+#define SEEK_COPY_BUFF_POS(cstate, add_size, copy_buff_state) \
+{ \
+ if (IsParallelCopy()) \
+ { \
+ copy_buff_state.chunk_size += add_size; \
+ if (copy_buff_state.raw_buf_ptr + add_size >= DATA_BLOCK_SIZE) \
+ { \
+   /* Increment the unprocessed chunks for the block on which we are working */ \
+ if (copy_buff_state.copy_raw_buf == copy_buff_state.data_blk_ptr->data) \
+ pg_atomic_add_fetch_u32(©_buff_state.data_blk_ptr->unprocessed_chunk_parts, 1); \
+ else \
+ pg_atomic_add_fetch_u32(©_buff_state.curr_data_blk_ptr->unprocessed_chunk_parts, 1); \
+ cstate->raw_buf = copy_buff_state.curr_data_blk_ptr->data; \
+ copy_buff_state.copy_buf_len -= DATA_BLOCK_SIZE; \
+ copy_buff_state.raw_buf_ptr = 0; \
+ copy_buff_state.block_switched = true; \
+ } \
+ else \
+ copy_buff_state.raw_buf_ptr += add_size; \
+ } \
+ else \
+ copy_buff_state.raw_buf_ptr += add_size; \
+}
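+
+/*
+ * For example, assuming DATA_BLOCK_SIZE of 64KB: with raw_buf_ptr = 65530 and
+ * add_size = 10 the position would cross the block boundary, so the macro
+ * counts one more unprocessed chunk part against the appropriate block,
+ * points raw_buf at curr_data_blk_ptr->data, reduces copy_buf_len by
+ * DATA_BLOCK_SIZE and resets raw_buf_ptr to 0 within the new block.
+ */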
+
+/*
+ * BEGIN_READ_LINE - Initializes the buff state for read line.
+ */
+#define BEGIN_READ_LINE(cstate, chunk_first_block) \
+{ \
+ copy_buff_state.copy_raw_buf = cstate->raw_buf; \
+ copy_buff_state.raw_buf_ptr = cstate->raw_buf_index; \
+ copy_buff_state.copy_buf_len = cstate->raw_buf_len; \
+ /* \
+  * There is some data that was read earlier, which needs to be \
+ * processed. \
+ */ \
+ if (IsParallelCopy()) \
+ { \
+ copy_buff_state.chunk_size = 0; \
+ if ((copy_buff_state.copy_buf_len - copy_buff_state.raw_buf_ptr) > 0) \
+ { \
+ ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+ uint32 cur_block_pos = pcshared_info->cur_block_pos; \
+ chunk_first_block = pcshared_info->cur_block_pos; \
+ copy_buff_state.data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos]; \
+ copy_buff_state.curr_data_blk_ptr = copy_buff_state.data_blk_ptr; \
+ } \
+ } \
+}
+
+/*
+ * SET_RAWBUF_FOR_LOAD - Point raw_buf at the shared memory block into which
+ * the file data is to be read.
+ */
+#define SET_RAWBUF_FOR_LOAD() \
+{ \
+ ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+ uint32 cur_block_pos; \
+ /* \
+  * Mark the previous block as completed so that a worker can start copying its data. \
+ */ \
+ if (copy_buff_state.data_blk_ptr != copy_buff_state.curr_data_blk_ptr && \
+ copy_buff_state.data_blk_ptr->curr_blk_completed == false) \
+ copy_buff_state.data_blk_ptr->curr_blk_completed = true; \
+ \
+ copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+ cur_block_pos = WaitGetFreeCopyBlock(pcshared_info); \
+ copy_buff_state.curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos]; \
+ \
+ if (!copy_buff_state.data_blk_ptr) \
+ { \
+ copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+ chunk_first_block = cur_block_pos; \
+ } \
+ else if (need_data == false) \
+ copy_buff_state.data_blk_ptr->following_block = cur_block_pos; \
+ \
+ cstate->raw_buf = copy_buff_state.curr_data_blk_ptr->data; \
+ copy_buff_state.copy_raw_buf = cstate->raw_buf; \
+}
+
+/*
+ * END_CHUNK_PARALLEL_COPY - Update the chunk information in shared memory.
+ */
+#define END_CHUNK_PARALLEL_COPY() \
+{ \
+ if (!IsHeaderLine()) \
+ { \
+ ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+ ChunkBoundaries *chunkBoundaryPtr = &pcshared_info->chunk_boundaries; \
+ if (copy_buff_state.chunk_size) \
+ { \
+ ChunkBoundary *chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+ /* \
+   * If raw_buf_ptr is zero, unprocessed_chunk_parts was already \
+   * incremented in SEEK_COPY_BUFF_POS; that happens when the whole \
+   * chunk finishes exactly at the end of the current block. If \
+   * new_line_size >= raw_buf_ptr, the new block contains only the \
+   * line-terminator characters. The unprocessed count should not be \
+   * increased in either case. \
+ */ \
+ if (copy_buff_state.raw_buf_ptr != 0 && \
+ copy_buff_state.raw_buf_ptr > new_line_size) \
+ pg_atomic_add_fetch_u32(©_buff_state.curr_data_blk_ptr->unprocessed_chunk_parts, 1); \
+ \
+ /* Update chunk size. */ \
+ pg_atomic_write_u32(&chunkInfo->chunk_size, copy_buff_state.chunk_size); \
+ pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_LEADER_POPULATED); \
+ elog(DEBUG1, "[Leader] After adding - chunk position:%d, chunk_size:%d", \
+ chunk_pos, copy_buff_state.chunk_size); \
+ pcshared_info->populated++; \
+ } \
+ else if (new_line_size) \
+ { \
+ /* \
+   * The chunk contains only a new line char; an empty record \
+   * should be inserted. \
+ */ \
+ ChunkBoundary *chunkInfo; \
+ chunk_pos = UpdateBlockInChunkInfo(cstate, -1, -1, 0, \
+ CHUNK_LEADER_POPULATED); \
+ chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+ elog(DEBUG1, "[Leader] Added empty chunk with offset:%d, chunk position:%d, chunk size:%d", \
+ chunkInfo->start_offset, chunk_pos, \
+ pg_atomic_read_u32(&chunkInfo->chunk_size)); \
+ pcshared_info->populated++; \
+ } \
+ }\
+ \
+ /*\
+ * All of the read data has been processed; reset index & len. In the\
+ * subsequent read, we will get a new block and copy data into the\
+ * new block.\
+ */\
+ if (copy_buff_state.raw_buf_ptr == copy_buff_state.copy_buf_len)\
+ {\
+ cstate->raw_buf_index = 0;\
+ cstate->raw_buf_len = 0;\
+ }\
+ else\
+ cstate->raw_buf_len = copy_buff_state.copy_buf_len;\
+}
+
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || \
+ (IsParallelCopy() && !IsLeader()))) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_CHUNK_NON_PARALLEL - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_CHUNK_NON_PARALLEL(cstate) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(cstate->line_buf.len >= 1); \
+ Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n'); \
+ cstate->line_buf.len--; \
+ cstate->line_buf.data[cstate->line_buf.len] = '\0'; \
+ break; \
+ case EOL_CR: \
+ Assert(cstate->line_buf.len >= 1); \
+ Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r'); \
+ cstate->line_buf.len--; \
+ cstate->line_buf.data[cstate->line_buf.len] = '\0'; \
+ break; \
+ case EOL_CRNL: \
+ Assert(cstate->line_buf.len >= 2); \
+ Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r'); \
+ Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n'); \
+ cstate->line_buf.len -= 2; \
+ cstate->line_buf.data[cstate->line_buf.len] = '\0'; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ new_line_size = ClearEOLFromParallelChunk(cstate, ©_buff_state); \
+ else \
+ CLEAR_EOL_CHUNK_NON_PARALLEL(cstate) \
+ } \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
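+
+/*
+ * The 1ms WaitLatch timeout avoids busy-spinning while the leader or a worker
+ * waits on shared state, and CHECK_FOR_INTERRUPTS keeps query cancel
+ * responsive during the wait.
+ */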
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
+static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+
+/* non-export function prototypes */
+static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
+ RawStmt *raw_query, Oid queryRelId, List *attnamelist,
+ List *options);
+static void EndCopy(CopyState cstate);
+static void ClosePipeToProgram(CopyState cstate);
+static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
+ Oid queryRelId, const char *filename, bool is_program,
+ List *attnamelist, List *options);
+static void EndCopyTo(CopyState cstate);
+static uint64 DoCopyTo(CopyState cstate);
+static uint64 CopyTo(CopyState cstate);
+static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
+static bool CopyReadLine(CopyState cstate);
+static bool CopyReadLineText(CopyState cstate);
+static int CopyReadAttributesText(CopyState cstate);
+static int CopyReadAttributesCSV(CopyState cstate);
+static Datum CopyReadBinaryAttribute(CopyState cstate,
+ int column_no, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod,
+ bool *isnull);
+static void CopyAttributeOutText(CopyState cstate, char *string);
+static void CopyAttributeOutCSV(CopyState cstate, char *string,
+ bool use_quote, bool single_attr);
+static List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
+ List *attnamelist);
+static char *limit_printout_length(const char *str);
+
+/* Low-level communications functions */
+static void SendCopyBegin(CopyState cstate);
+static void ReceiveCopyBegin(CopyState cstate);
+static void SendCopyEnd(CopyState cstate);
+static void CopySendData(CopyState cstate, const void *databuf, int datasize);
+static void CopySendString(CopyState cstate, const char *str);
+static void CopySendChar(CopyState cstate, char c);
+static void CopySendEndOfRow(CopyState cstate);
+static int CopyGetData(CopyState cstate, void *databuf,
+ int minread, int maxread);
+static void CopySendInt32(CopyState cstate, int32 val);
+static bool CopyGetInt32(CopyState cstate, int32 *val);
+static void CopySendInt16(CopyState cstate, int16 val);
+static bool CopyGetInt16(CopyState cstate, int16 *val);
+
+static ParallelContext *BeginParallelCopy(int nworkers, CopyState cstate,
+ List *attlist, Oid relid);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static pg_attribute_always_inline bool IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc);
+static pg_attribute_always_inline bool IsParallelCopyAllowed(CopyState cstate);
+static void CheckCopyFromValidity(CopyState cstate);
+static pg_attribute_always_inline bool CheckExprParallelSafety(CopyState cstate);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+static void ParallelCopyLeader(CopyState cstate);
+static void ParallelWorkerInitialization(CopyWorkerCommonData *shared_cstate,
+ CopyState cstate, List *attnamelist);
+static bool CacheChunkInfo(CopyState cstate, uint32 buff_count);
+static void PopulateAttributes(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetChunkPosition(CopyState cstate);
+
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char *LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+
+/*
+ * CopyCommonInfoForWorker - Fill shared_cstate from the cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, CopyWorkerCommonData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ListInfo *listinformation = (ListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
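+
+/*
+ * For example, the list ("a", "bc") is serialized as count = 2 followed by
+ * the bytes "a\0bc\0", so ComputeListSize returns sizeof(int) + 2 + 3.
+ */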
+
+/*
+ * EstimateChunkKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateChunkKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateChunkKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateChunkKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ListInfo *sharedlistinfo = (ListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * IsTriggerFunctionParallelSafe - Check whether all of the relation's trigger
+ * functions are parallel safe. Return false if any trigger has a
+ * parallel-unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+  /* Even if the trigger function is parallel safe, disallow RI triggers. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine the parallel safety of volatile
+ * expressions in column default clauses or in the WHERE clause; return true
+ * if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (!is_parallel_safe(NULL, (Node *)cstate->whereClause))
+ return false;
+ }
+
+ if (cstate->volatile_defexprs && cstate->defexprs != NULL &&
+ cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ if (!is_parallel_safe(NULL, (Node *) cstate->defexprs[i]->expr))
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine the insert method: single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy is not allowed with the freeze and binary options. */
+ if (cstate->freeze || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return false;
+
+ /* Check if copy is into a temporary table. */
+ if (RELATION_IS_LOCAL(cstate->rel) || RELATION_IS_OTHER_TEMP(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions, if any, are parallel safe. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Determine the number of workers to use for the parallel copy, initialize the
+ * data structures required by the workers, estimate the size required in the
+ * DSM, load the necessary keys into the DSM, and launch the requested number
+ * of workers.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ShmCopyInfo *shared_info_ptr;
+ CopyWorkerCommonData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ CheckCopyFromValidity(cstate);
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ /*
+ * The user has chosen parallel copy. Determine whether it is actually
+ * allowed; if not, fall back to the non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ return NULL;
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ShmCopyInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(CopyWorkerCommonData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateChunkKeysStr(pcxt, cstate->null_print);
+ EstimateChunkKeysStr(pcxt, cstate->null_print_client);
+ EstimateChunkKeysStr(pcxt, cstate->delim);
+ EstimateChunkKeysStr(pcxt, cstate->quote);
+ EstimateChunkKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateChunkKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateChunkKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateChunkKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateChunkKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateChunkKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateChunkKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateChunkKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ShmCopyInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ChunkBoundary *chunkInfo = &shared_info_ptr->chunk_boundaries.ring[count];
+ pg_atomic_init_u32(&(chunkInfo->chunk_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (CopyWorkerCommonData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if (cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(CopyWorkerCommonData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateAttributes(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheChunkInfo - Cache the chunk information to local memory.
+ */
+static bool
+CacheChunkInfo(CopyState cstate, uint32 buff_count)
+{
+ ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ CopyDataBlock *data_blk_ptr;
+ ChunkBoundary *chunkInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetChunkPosition(cstate);
+ if (write_pos == -1)
+ return true;
+
+ /* Get the current chunk information. */
+ chunkInfo = &pcshared_info->chunk_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&chunkInfo->chunk_size) == 0)
+ goto empty_data_chunk_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[chunkInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = chunkInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = chunkInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - chunk position:%d, block:%d, unprocessed chunks:%d, offset:%d, chunk size:%d",
+ write_pos, chunkInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_chunk_parts),
+ offset, pg_atomic_read_u32(&chunkInfo->chunk_size));
+
+ for (;;)
+ {
+ /*
+  * The wait loop at the bottom of this loop body may have exited
+  * because data_blk_ptr->curr_blk_completed was set, in which case
+  * the dataSize read there could be stale: once the chunk itself is
+  * complete, chunk_size is also set. Re-read chunk_size to be sure
+  * whether the chunk is complete or only spans a partial block.
+ */
+ dataSize = pg_atomic_read_u32(&chunkInfo->chunk_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole chunk is in current block. */
+ if (remainingSize + offset < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf, &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_chunk_parts, 1);
+ break;
+ }
+ else
+ {
+ /* Chunk is spread across the blocks. */
+ int chunkInCurrentBlock = DATA_BLOCK_SIZE - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ chunkInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_chunk_parts, 1);
+ copiedSize += chunkInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ CopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+
+ /*
+     * If the rest of the data fits in the current block,
+     * copy dataSize - copiedSize bytes; otherwise copy the
+     * whole block.
+ */
+ int currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_chunk_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+   /* Copy the remainder of this block, starting at the current offset. */
+ int chunkInCurrentBlock = DATA_BLOCK_SIZE - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ chunkInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_chunk_parts, 1);
+ copiedSize += chunkInCurrentBlock;
+
+ /*
+    * Reset the offset: only the first copy starts at the chunk's
+    * offset; each subsequent copy takes a complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this chunk */
+ dataSize = pg_atomic_read_u32(&chunkInfo->chunk_size);
+
+ /*
+    * If the data lies entirely within the current block,
+    * chunkInfo.chunk_size will be updated. If it is spread across
+    * blocks, either chunkInfo.chunk_size or
+    * data_blk_ptr->curr_blk_completed can be updated: chunk_size is
+    * set once the complete read is finished, while
+    * curr_blk_completed is set when the current block is finished
+    * but the data is not yet fully read.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_chunk_update:
+ elog(DEBUG1, "[Worker] Completed processing chunk:%d", write_pos);
+ pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_WORKER_PROCESSED);
+ pg_atomic_write_u32(&chunkInfo->chunk_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
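+
+/*
+ * For example, a 10-byte chunk that starts 4 bytes before the end of its
+ * first block copies 4 bytes from that block and the remaining 6 bytes from
+ * the block named by following_block, decrementing unprocessed_chunk_parts
+ * once on each block it reads from.
+ */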
+
+/*
+ * GetWorkerChunk - Returns a chunk for worker to process.
+ */
+static bool
+GetWorkerChunk(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the chunk data to line_buf and release the chunk position so that the
+ * leader can continue loading data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_chunk;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheChunkInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_chunk;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_chunk:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * The worker applies the where clause, converts each line to column values,
+ * adds default/null values for columns missing from the record, finds the
+ * partition if the table is partitioned, invokes before row insert triggers,
+ * handles constraints and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ShmCopyInfo *pcshared_info;
+ CopyWorkerCommonData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ShmCopyInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+
+ shared_cstate = (CopyWorkerCommonData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+
+/*
+ * UpdateBlockInChunkInfo - Update the chunk information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInChunkInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 chunk_size, uint32 chunk_state)
+{
+ ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ChunkBoundaries *chunkBoundaryPtr = &pcshared_info->chunk_boundaries;
+ ChunkBoundary *chunkInfo;
+ int chunk_pos = chunkBoundaryPtr->leader_pos;
+
+ /* Update the chunk information for the worker to pick and process. */
+ chunkInfo = &chunkBoundaryPtr->ring[chunk_pos];
+ while (pg_atomic_read_u32(&chunkInfo->chunk_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ chunkInfo->first_block = blk_pos;
+ chunkInfo->start_offset = offset;
+ chunkInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&chunkInfo->chunk_size, chunk_size);
+ pg_atomic_write_u32(&chunkInfo->chunk_state, chunk_state);
+ chunkBoundaryPtr->leader_pos = (chunkBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return chunk_pos;
+}
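+
+/*
+ * A ring entry whose chunk_size is -1 is free: entries are initialized to -1
+ * in BeginParallelCopy and reset to -1 by a worker in CacheChunkInfo once the
+ * chunk is processed. The wait above therefore blocks the leader whenever the
+ * ring is full, providing back-pressure against slow workers.
+ */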
+
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared queue for the workers. It executes the
+ * before statement trigger, if present, and then reads the input file into
+ * shared data blocks, loading blocks as and when required. It scans the data
+ * blocks for line breaks to identify chunks, and records each chunk's
+ * boundaries in ChunkBoundary; if no free ChunkBoundary entry is available,
+ * it waits until a worker frees one. Workers pick up the populated chunks and
+ * insert them into the table. This is repeated until the complete file has
+ * been processed, after which the leader waits for the workers to process all
+ * populated chunks and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("running parallel copy leader")));
+
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
+/*
+ * GetChunkPosition - return the chunk position that worker should process.
+ */
+static uint32
+GetChunkPosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ShmCopyInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ChunkBoundary *chunkInfo;
+ CopyDataBlock *data_blk_ptr;
+ ChunkState chunk_state = CHUNK_LEADER_POPULATED;
+ ChunkState curr_chunk_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current chunk information. */
+ chunkInfo = &pcshared_info->chunk_boundaries.ring[write_pos];
+ curr_chunk_state = pg_atomic_read_u32(&chunkInfo->chunk_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_chunk_state == CHUNK_WORKER_PROCESSED ||
+ curr_chunk_state == CHUNK_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this chunk. */
+ dataSize = pg_atomic_read_u32(&chunkInfo->chunk_size);
+
+ if (dataSize != 0) /* If not an empty chunk. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[chunkInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current chunk or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+  /* Atomically claim this chunk, making sure no other worker has taken it. */
+ if (pg_atomic_compare_exchange_u32(&chunkInfo->chunk_state,
+ &chunk_state, CHUNK_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
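+
+/*
+ * When the first chunk of a WORKER_CHUNK_COUNT-sized group has already been
+ * claimed, the search skips the whole group, so each worker tends to claim
+ * WORKER_CHUNK_COUNT consecutive chunks, the same number that GetWorkerChunk
+ * caches in worker_line_buf.
+ */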
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ShmCopyInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /* Get a new block for copying data. */
+ while (count < MAX_BLOCKS_COUNT)
+ {
+ CopyDataBlock *inputBlk = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_chunk_parts = pg_atomic_read_u32(&inputBlk->unprocessed_chunk_parts);
+ if (unprocessed_chunk_parts == 0)
+ {
+ inputBlk->curr_blk_completed = false;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
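+
+/*
+ * The scan starts just past cur_block_pos and wraps around modulo
+ * MAX_BLOCKS_COUNT; a block is free once its unprocessed_chunk_parts count
+ * has dropped to zero, i.e. the workers have copied out every chunk part
+ * stored in it.
+ */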
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
*/
-#define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
-if (1) \
-{ \
- if (raw_buf_ptr + (extralen) >= copy_buf_len && !hit_eof) \
- { \
- raw_buf_ptr = prev_raw_ptr; /* undo fetch */ \
- need_data = true; \
- continue; \
- } \
-} else ((void) 0)
+static uint32
+WaitGetFreeCopyBlock(ShmCopyInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
-/* This consumes the remainder of the buffer and breaks */
-#define IF_NEED_REFILL_AND_EOF_BREAK(extralen) \
-if (1) \
-{ \
- if (raw_buf_ptr + (extralen) >= copy_buf_len && hit_eof) \
- { \
- if (extralen) \
- raw_buf_ptr = copy_buf_len; /* consume the partial character */ \
- /* backslash just before EOF, treat as data char */ \
- result = true; \
- break; \
- } \
-} else ((void) 0)
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
/*
- * Transfer any approved data to line_buf; must do this to be sure
- * there is some room in raw_buf.
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
-#define REFILL_LINEBUF \
-if (1) \
-{ \
- if (raw_buf_ptr > cstate->raw_buf_index) \
- { \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
- cstate->raw_buf_index = raw_buf_ptr; \
- } \
-} else ((void) 0)
-
-/* Undo any read-ahead and jump out of the block. */
-#define NO_END_OF_COPY_GOTO \
-if (1) \
-{ \
- raw_buf_ptr = prev_raw_ptr + 1; \
- goto not_end_of_copy; \
-} else ((void) 0)
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
-static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
-/* non-export function prototypes */
-static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
- RawStmt *raw_query, Oid queryRelId, List *attnamelist,
- List *options);
-static void EndCopy(CopyState cstate);
-static void ClosePipeToProgram(CopyState cstate);
-static CopyState BeginCopyTo(ParseState *pstate, Relation rel, RawStmt *query,
- Oid queryRelId, const char *filename, bool is_program,
- List *attnamelist, List *options);
-static void EndCopyTo(CopyState cstate);
-static uint64 DoCopyTo(CopyState cstate);
-static uint64 CopyTo(CopyState cstate);
-static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
-static bool CopyReadLine(CopyState cstate);
-static bool CopyReadLineText(CopyState cstate);
-static int CopyReadAttributesText(CopyState cstate);
-static int CopyReadAttributesCSV(CopyState cstate);
-static Datum CopyReadBinaryAttribute(CopyState cstate,
- int column_no, FmgrInfo *flinfo,
- Oid typioparam, int32 typmod,
- bool *isnull);
-static void CopyAttributeOutText(CopyState cstate, char *string);
-static void CopyAttributeOutCSV(CopyState cstate, char *string,
- bool use_quote, bool single_attr);
-static List *CopyGetAttnums(TupleDesc tupDesc, Relation rel,
- List *attnamelist);
-static char *limit_printout_length(const char *str);
+/*
+ * LookupParallelCopyFnStr - Look up the function name for a function pointer.
+ */
+static pg_attribute_always_inline char *
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
-/* Low-level communications functions */
-static void SendCopyBegin(CopyState cstate);
-static void ReceiveCopyBegin(CopyState cstate);
-static void SendCopyEnd(CopyState cstate);
-static void CopySendData(CopyState cstate, const void *databuf, int datasize);
-static void CopySendString(CopyState cstate, const char *str);
-static void CopySendChar(CopyState cstate, char c);
-static void CopySendEndOfRow(CopyState cstate);
-static int CopyGetData(CopyState cstate, void *databuf,
- int minread, int maxread);
-static void CopySendInt32(CopyState cstate, int32 val);
-static bool CopyGetInt32(CopyState cstate, int32 *val);
-static void CopySendInt16(CopyState cstate, int16 val);
-static bool CopyGetInt16(CopyState cstate, int16 *val);
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
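+
+/*
+ * The data source callback is passed by name rather than by address because a
+ * function pointer from the leader may not be valid in a worker process (for
+ * example in EXEC_BACKEND builds); the worker resolves the name against
+ * InternalParallelCopyFuncPtrs instead.
+ */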
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -611,7 +2208,6 @@ static int
CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
int bytesread = 0;
-
switch (cstate->copy_dest)
{
case COPY_FILE:
@@ -790,17 +2386,17 @@ CopyGetInt16(CopyState cstate, int16 *val)
* bufferload boundary.
*/
static bool
-CopyLoadRawBuf(CopyState cstate)
+CopyLoadRawBuf(CopyState cstate, int raw_buf_len, int *raw_buf_index)
{
int nbytes;
int inbytes;
- if (cstate->raw_buf_index < cstate->raw_buf_len)
+ if (!IsParallelCopy() && *raw_buf_index < raw_buf_len)
{
- /* Copy down the unprocessed data */
- nbytes = cstate->raw_buf_len - cstate->raw_buf_index;
- memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
- nbytes);
+ /* Copy down the unprocessed data. */
+ nbytes = raw_buf_len - *raw_buf_index;
+ memmove(cstate->raw_buf, cstate->raw_buf + *raw_buf_index,
+ nbytes);
}
else
nbytes = 0; /* no data need be saved */
@@ -809,12 +2405,16 @@ CopyLoadRawBuf(CopyState cstate)
1, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
- cstate->raw_buf_index = 0;
+ if (!IsParallelCopy())
+ {
+ cstate->raw_buf_index = 0;
+ *raw_buf_index = 0;
+ }
cstate->raw_buf_len = nbytes;
+
return (inbytes > 0);
}
-
/*
* DoCopy executes the SQL COPY statement
*
@@ -1060,6 +2660,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1069,7 +2670,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1118,6 +2736,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1286,6 +2905,26 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+ cstate->nworkers = atoi(defGetString(defel));
+ if (cstate->nworkers < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a non-negative integer",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1377,67 +3016,194 @@ ProcessCopyOptions(ParseState *pstate,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("COPY quote must be a single one-byte character")));
- if (cstate->csv_mode && cstate->delim[0] == cstate->quote[0])
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
- errmsg("COPY delimiter and quote must be different")));
+ if (cstate->csv_mode && cstate->delim[0] == cstate->quote[0])
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("COPY delimiter and quote must be different")));
+
+ /* Check escape */
+ if (!cstate->csv_mode && cstate->escape != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape available only in CSV mode")));
+
+ if (cstate->csv_mode && strlen(cstate->escape) != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY escape must be a single one-byte character")));
+
+ /* Check force_quote */
+ if (!cstate->csv_mode && (cstate->force_quote || cstate->force_quote_all))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY force quote available only in CSV mode")));
+ if ((cstate->force_quote || cstate->force_quote_all) && is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY force quote only available using COPY TO")));
+
+ /* Check force_notnull */
+ if (!cstate->csv_mode && cstate->force_notnull != NIL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY force not null available only in CSV mode")));
+ if (cstate->force_notnull != NIL && !is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY force not null only available using COPY FROM")));
+
+ /* Check force_null */
+ if (!cstate->csv_mode && cstate->force_null != NIL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY force null available only in CSV mode")));
+
+ if (cstate->force_null != NIL && !is_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY force null only available using COPY FROM")));
+
+ /* Don't allow the delimiter to appear in the null string. */
+ if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("COPY delimiter must not appear in the NULL specification")));
+
+ /* Don't allow the CSV quote char to appear in the null string. */
+ if (cstate->csv_mode &&
+ strchr(cstate->null_print, cstate->quote[0]) != NULL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("CSV quote character must not appear in the NULL specification")));
+}
+
+/*
+ * PopulateAttributes - Convert the attribute name lists to per-column flags
+ * and set up encoding conversion info.
+ */
+void PopulateAttributes(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
+{
+ int num_phys_attrs;
+
+ /* Generate or convert list of attributes to process */
+ cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
+
+ num_phys_attrs = tupDesc->natts;
+
+ /* Convert FORCE_QUOTE name list to per-column flags, check validity */
+ cstate->force_quote_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
+ if (cstate->force_quote_all)
+ {
+ int i;
+
+ for (i = 0; i < num_phys_attrs; i++)
+ cstate->force_quote_flags[i] = true;
+ }
+ else if (cstate->force_quote)
+ {
+ List *attnums;
+ ListCell *cur;
+
+ attnums = CopyGetAttnums(tupDesc, cstate->rel, cstate->force_quote);
+
+ foreach(cur, attnums)
+ {
+ int attnum = lfirst_int(cur);
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+ if (!list_member_int(cstate->attnumlist, attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("FORCE_QUOTE column \"%s\" not referenced by COPY",
+ NameStr(attr->attname))));
+ cstate->force_quote_flags[attnum - 1] = true;
+ }
+ }
+
+ /* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
+ cstate->force_notnull_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
+ if (cstate->force_notnull)
+ {
+ List *attnums;
+ ListCell *cur;
+
+ attnums = CopyGetAttnums(tupDesc, cstate->rel, cstate->force_notnull);
+
+ foreach(cur, attnums)
+ {
+ int attnum = lfirst_int(cur);
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
+
+ if (!list_member_int(cstate->attnumlist, attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("FORCE_NOT_NULL column \"%s\" not referenced by COPY",
+ NameStr(attr->attname))));
+ cstate->force_notnull_flags[attnum - 1] = true;
+ }
+ }
+
+ /* Convert FORCE_NULL name list to per-column flags, check validity */
+ cstate->force_null_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
+ if (cstate->force_null)
+ {
+ List *attnums;
+ ListCell *cur;
+
+ attnums = CopyGetAttnums(tupDesc, cstate->rel, cstate->force_null);
- /* Check escape */
- if (!cstate->csv_mode && cstate->escape != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape available only in CSV mode")));
+ foreach(cur, attnums)
+ {
+ int attnum = lfirst_int(cur);
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- if (cstate->csv_mode && strlen(cstate->escape) != 1)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY escape must be a single one-byte character")));
+ if (!list_member_int(cstate->attnumlist, attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("FORCE_NULL column \"%s\" not referenced by COPY",
+ NameStr(attr->attname))));
+ cstate->force_null_flags[attnum - 1] = true;
+ }
+ }
- /* Check force_quote */
- if (!cstate->csv_mode && (cstate->force_quote || cstate->force_quote_all))
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY force quote available only in CSV mode")));
- if ((cstate->force_quote || cstate->force_quote_all) && is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY force quote only available using COPY TO")));
+ /* Convert convert_selectively name list to per-column flags */
+ if (cstate->convert_selectively)
+ {
+ List *attnums;
+ ListCell *cur;
- /* Check force_notnull */
- if (!cstate->csv_mode && cstate->force_notnull != NIL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY force not null available only in CSV mode")));
- if (cstate->force_notnull != NIL && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY force not null only available using COPY FROM")));
+ cstate->convert_select_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
- /* Check force_null */
- if (!cstate->csv_mode && cstate->force_null != NIL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY force null available only in CSV mode")));
+ attnums = CopyGetAttnums(tupDesc, cstate->rel, cstate->convert_select);
- if (cstate->force_null != NIL && !is_from)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY force null only available using COPY FROM")));
+ foreach(cur, attnums)
+ {
+ int attnum = lfirst_int(cur);
+ Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
- /* Don't allow the delimiter to appear in the null string. */
- if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("COPY delimiter must not appear in the NULL specification")));
+ if (!list_member_int(cstate->attnumlist, attnum))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg_internal("selected column \"%s\" not referenced by COPY",
+ NameStr(attr->attname))));
+ cstate->convert_select_flags[attnum - 1] = true;
+ }
+ }
- /* Don't allow the CSV quote char to appear in the null string. */
- if (cstate->csv_mode &&
- strchr(cstate->null_print, cstate->quote[0]) != NULL)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("CSV quote character must not appear in the NULL specification")));
-}
+ /* Use client encoding when ENCODING option is not specified. */
+ if (cstate->file_encoding < 0)
+ cstate->file_encoding = pg_get_client_encoding();
+ /*
+ * Set up encoding conversion info. Even if the file and server encodings
+ * are the same, we must apply pg_any_to_server() to validate data in
+ * multibyte encodings.
+ */
+ cstate->need_transcoding =
+ (cstate->file_encoding != GetDatabaseEncoding() ||
+ pg_database_encoding_max_length() > 1);
+ /* See Multibyte encoding comment above */
+ cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
+}
/*
* Common setup routines used by BeginCopyFrom and BeginCopyTo.
*
@@ -1464,7 +3230,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1630,126 +3395,7 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
- /* Generate or convert list of attributes to process */
- cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
-
- num_phys_attrs = tupDesc->natts;
-
- /* Convert FORCE_QUOTE name list to per-column flags, check validity */
- cstate->force_quote_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
- if (cstate->force_quote_all)
- {
- int i;
-
- for (i = 0; i < num_phys_attrs; i++)
- cstate->force_quote_flags[i] = true;
- }
- else if (cstate->force_quote)
- {
- List *attnums;
- ListCell *cur;
-
- attnums = CopyGetAttnums(tupDesc, cstate->rel, cstate->force_quote);
-
- foreach(cur, attnums)
- {
- int attnum = lfirst_int(cur);
- Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
- if (!list_member_int(cstate->attnumlist, attnum))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg("FORCE_QUOTE column \"%s\" not referenced by COPY",
- NameStr(attr->attname))));
- cstate->force_quote_flags[attnum - 1] = true;
- }
- }
-
- /* Convert FORCE_NOT_NULL name list to per-column flags, check validity */
- cstate->force_notnull_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
- if (cstate->force_notnull)
- {
- List *attnums;
- ListCell *cur;
-
- attnums = CopyGetAttnums(tupDesc, cstate->rel, cstate->force_notnull);
-
- foreach(cur, attnums)
- {
- int attnum = lfirst_int(cur);
- Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
- if (!list_member_int(cstate->attnumlist, attnum))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg("FORCE_NOT_NULL column \"%s\" not referenced by COPY",
- NameStr(attr->attname))));
- cstate->force_notnull_flags[attnum - 1] = true;
- }
- }
-
- /* Convert FORCE_NULL name list to per-column flags, check validity */
- cstate->force_null_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
- if (cstate->force_null)
- {
- List *attnums;
- ListCell *cur;
-
- attnums = CopyGetAttnums(tupDesc, cstate->rel, cstate->force_null);
-
- foreach(cur, attnums)
- {
- int attnum = lfirst_int(cur);
- Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
- if (!list_member_int(cstate->attnumlist, attnum))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg("FORCE_NULL column \"%s\" not referenced by COPY",
- NameStr(attr->attname))));
- cstate->force_null_flags[attnum - 1] = true;
- }
- }
-
- /* Convert convert_selectively name list to per-column flags */
- if (cstate->convert_selectively)
- {
- List *attnums;
- ListCell *cur;
-
- cstate->convert_select_flags = (bool *) palloc0(num_phys_attrs * sizeof(bool));
-
- attnums = CopyGetAttnums(tupDesc, cstate->rel, cstate->convert_select);
-
- foreach(cur, attnums)
- {
- int attnum = lfirst_int(cur);
- Form_pg_attribute attr = TupleDescAttr(tupDesc, attnum - 1);
-
- if (!list_member_int(cstate->attnumlist, attnum))
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
- errmsg_internal("selected column \"%s\" not referenced by COPY",
- NameStr(attr->attname))));
- cstate->convert_select_flags[attnum - 1] = true;
- }
- }
-
- /* Use client encoding when ENCODING option is not specified. */
- if (cstate->file_encoding < 0)
- cstate->file_encoding = pg_get_client_encoding();
-
- /*
- * Set up encoding conversion info. Even if the file and server encodings
- * are the same, we must apply pg_any_to_server() to validate data in
- * multibyte encodings.
- */
- cstate->need_transcoding =
- (cstate->file_encoding != GetDatabaseEncoding() ||
- pg_database_encoding_max_length() > 1);
- /* See Multibyte encoding comment above */
- cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
+ PopulateAttributes(cstate, tupDesc, attnamelist);
cstate->copy_dest = COPY_FILE; /* default */
MemoryContextSwitchTo(oldcontext);
@@ -2638,41 +4284,67 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
/* Store the line number so we can properly report any errors later */
buffer->linenos[buffer->nused] = lineno;
- /* Record this slot as being used */
- buffer->nused++;
+ /* Record this slot as being used */
+ buffer->nused++;
+
+ /* Update how many tuples are stored and their size */
+ miinfo->bufferedTuples++;
+ miinfo->bufferedBytes += tuplen;
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the BEFORE STATEMENT trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
- /* Update how many tuples are stored and their size */
- miinfo->bufferedTuples++;
- miinfo->bufferedBytes += tuplen;
+ FreeExecutorState(estate);
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in COPY FROM is a valid target.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckCopyFromValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2708,6 +4380,44 @@ CopyFrom(CopyState cstate)
errmsg("cannot copy to non-table relation \"%s\"",
RelationGetRelationName(cstate->rel))));
}
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = IsParallelCopy() ? cstate->pcdata->pcshared_info->mycid :
+ GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ /*
+	 * Perform this check if it is not a parallel copy. In case of parallel
+	 * copy, this check is done by the leader, so that if any invalid case
+	 * exists, the COPY FROM command will error out in the leader itself,
+	 * avoiding the launch of workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckCopyFromValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -2934,13 +4644,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3262,7 +4975,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3317,30 +5030,14 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - Populate the per-attribute input function and
+ * default expression information from the catalogs.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+void PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3350,31 +5047,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3452,6 +5126,55 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist,
+ options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3588,26 +5311,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerChunk(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -3828,6 +5560,61 @@ EndCopyFrom(CopyState cstate)
}
/*
+ * ClearEOLFromParallelChunk - Clear the EOL marker from the copied data.
+ */
+static int
+ClearEOLFromParallelChunk(CopyState cstate, CopyBufferState *copy_buff_state)
+{
+	/* raw_buf_ptr points to the next char that needs to be read. */
+ int cur_pos = (copy_buff_state->raw_buf_ptr == 0) ? RAW_BUF_SIZE - 1: copy_buff_state->raw_buf_ptr - 1;
+ CopyDataBlock *data_blk_ptr = copy_buff_state->data_blk_ptr;
+ CopyDataBlock *curr_data_blk_ptr = copy_buff_state->curr_data_blk_ptr;
+ int new_line_size = 0;
+ PG_USED_FOR_ASSERTS_ONLY char ch;
+
+ /*
+	 * If we didn't hit EOF, then the EOL marker must have been transferred
+	 * into the data block along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
+ {
+ case EOL_NL:
+ Assert(copy_buff_state->chunk_size >= 1);
+ Assert(curr_data_blk_ptr->data[cur_pos] == '\n');
+ copy_buff_state->chunk_size -= 1;
+ curr_data_blk_ptr->data[cur_pos] = '\0';
+ new_line_size = 1;
+ break;
+ case EOL_CR:
+ Assert(copy_buff_state->chunk_size >= 1);
+ Assert(curr_data_blk_ptr->data[cur_pos] == '\r');
+ copy_buff_state->chunk_size -= 1;
+ curr_data_blk_ptr->data[cur_pos] = '\0';
+ new_line_size = 1;
+ break;
+ case EOL_CRNL:
+ Assert(copy_buff_state->chunk_size >= 2);
+
+ if (cur_pos >= 1)
+ ch = curr_data_blk_ptr->data[cur_pos - 1];
+ else
+ ch = data_blk_ptr->data[RAW_BUF_SIZE - 1];
+ Assert(ch == '\r');
+ Assert(curr_data_blk_ptr->data[cur_pos] == '\n');
+ copy_buff_state->chunk_size -= 2;
+ curr_data_blk_ptr->data[cur_pos] = '\0';
+ new_line_size = 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
+ }
+
+ return new_line_size;
+}
+
+/*
* Read the next input line and stash it in line_buf, with conversion to
* server encoding.
*
@@ -3839,7 +5626,6 @@ static bool
CopyReadLine(CopyState cstate)
{
bool result;
-
resetStringInfo(&cstate->line_buf);
cstate->line_buf_valid = true;
@@ -3858,66 +5644,40 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
- } while (CopyLoadRawBuf(cstate));
- }
- }
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /* Get a new block if it is the first time */
+ if (bIsFirst)
+ {
+ ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ bIsFirst = false;
+ }
+ else
+ {
+ /*
+							 * On subsequent iterations, reset the index and
+							 * re-use the same block.
+ */
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+ }
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
+ } while (CopyLoadRawBuf(cstate, cstate->raw_buf_len, &cstate->raw_buf_index));
}
}
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
-
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -3927,9 +5687,6 @@ CopyReadLine(CopyState cstate)
static bool
CopyReadLineText(CopyState cstate)
{
- char *copy_raw_buf;
- int raw_buf_ptr;
- int copy_buf_len;
bool need_data = false;
bool hit_eof = false;
bool result = false;
@@ -3942,6 +5699,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ int chunk_pos = 0;
+ uint32 chunk_first_block = 0;
+ uint32 new_line_size = 0;
+ CopyBufferState copy_buff_state = {0};
+
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -3974,14 +5736,13 @@ CopyReadLineText(CopyState cstate)
* For a little extra speed within the loop, we copy raw_buf and
* raw_buf_len into local variables.
*/
- copy_raw_buf = cstate->raw_buf;
- raw_buf_ptr = cstate->raw_buf_index;
- copy_buf_len = cstate->raw_buf_len;
-
+ BEGIN_READ_LINE(cstate, chunk_first_block)
for (;;)
{
int prev_raw_ptr;
char c;
+ uint32 prev_chunk_size = copy_buff_state.chunk_size;
+ int prev_copy_buf_len = copy_buff_state.copy_buf_len;
/*
* Load more data if needed. Ideally we would just force four bytes
@@ -3993,35 +5754,70 @@ CopyReadLineText(CopyState cstate)
* cstate->copy_dest != COPY_OLD_FE, but it hardly seems worth it,
* considering the size of the buffer.
*/
- if (raw_buf_ptr >= copy_buf_len || need_data)
+ if (copy_buff_state.raw_buf_ptr >= copy_buff_state.copy_buf_len ||
+ need_data)
{
+ uint32 remaining_data;
REFILL_LINEBUF;
+ /* In parallel mode, read as much data as possible to a new block */
+ if (IsParallelCopy())
+ SET_RAWBUF_FOR_LOAD()
+
+ remaining_data = copy_buff_state.copy_buf_len - copy_buff_state.raw_buf_ptr;
+
/*
* Try to read some more data. This will certainly reset
* raw_buf_index to zero, and raw_buf_ptr must go with it.
*/
- if (!CopyLoadRawBuf(cstate))
+ if (!CopyLoadRawBuf(cstate, copy_buff_state.copy_buf_len,
+								&copy_buff_state.raw_buf_ptr))
hit_eof = true;
- raw_buf_ptr = 0;
- copy_buf_len = cstate->raw_buf_len;
+
+ remaining_data += cstate->raw_buf_len;
/*
- * If we are completely out of data, break out of the loop,
- * reporting EOF.
+			 * If all the data of the previous block is not consumed, set raw_buf
+			 * back to the previous block.
*/
- if (copy_buf_len <= 0)
+ if (IsParallelCopy() && need_data &&
+ (copy_buff_state.curr_data_blk_ptr != copy_buff_state.data_blk_ptr))
+ {
+ copy_buff_state.copy_buf_len += cstate->raw_buf_len;
+ cstate->raw_buf = copy_buff_state.data_blk_ptr->data;
+ }
+ else
+ copy_buff_state.copy_buf_len = cstate->raw_buf_len;
+
+ need_data = false;
+ if (remaining_data <= 0)
{
+ /*
+ * If we are completely out of data, break out of the loop,
+ * reporting EOF.
+ */
result = true;
break;
}
- need_data = false;
}
- /* OK to fetch a character */
- prev_raw_ptr = raw_buf_ptr;
- c = copy_raw_buf[raw_buf_ptr++];
+ /*
+	 * Store the current position; if we find while reading that enough data
+	 * is not present, we will reset to this saved position and load more
+	 * data.
+ */
+ prev_raw_ptr = copy_buff_state.raw_buf_ptr;
+ if (IsParallelCopy())
+ {
+ prev_copy_buf_len = copy_buff_state.copy_buf_len;
+ prev_chunk_size = copy_buff_state.chunk_size;
+ copy_buff_state.block_switched = false;
+ }
+ /* OK to fetch a character */
+ c = copy_buff_state.copy_raw_buf[copy_buff_state.raw_buf_ptr];
+ SEEK_COPY_BUFF_POS(cstate, 1, copy_buff_state)
if (cstate->csv_mode)
{
/*
@@ -4079,11 +5875,10 @@ CopyReadLineText(CopyState cstate)
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
/* get next char */
- c = copy_raw_buf[raw_buf_ptr];
-
+ c = copy_buff_state.copy_raw_buf[copy_buff_state.raw_buf_ptr];
if (c == '\n')
{
- raw_buf_ptr++; /* eat newline */
+ SEEK_COPY_BUFF_POS(cstate, 1, copy_buff_state) /* eat newline */
cstate->eol_type = EOL_CRNL; /* in case not set yet */
}
else
@@ -4153,11 +5948,11 @@ CopyReadLineText(CopyState cstate)
* through and continue processing for file encoding.
* -----
*/
- c2 = copy_raw_buf[raw_buf_ptr];
+ c2 = copy_buff_state.copy_raw_buf[copy_buff_state.raw_buf_ptr];
if (c2 == '.')
{
- raw_buf_ptr++; /* consume the '.' */
+ SEEK_COPY_BUFF_POS(cstate, 1, copy_buff_state) /* consume the '.' */
/*
* Note: if we loop back for more data here, it does not
@@ -4169,7 +5964,8 @@ CopyReadLineText(CopyState cstate)
/* Get the next character */
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
/* if hit_eof, c2 will become '\0' */
- c2 = copy_raw_buf[raw_buf_ptr++];
+ c2 = copy_buff_state.copy_raw_buf[copy_buff_state.raw_buf_ptr];
+ SEEK_COPY_BUFF_POS(cstate, 1, copy_buff_state)
if (c2 == '\n')
{
@@ -4194,7 +5990,8 @@ CopyReadLineText(CopyState cstate)
/* Get the next character */
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
/* if hit_eof, c2 will become '\0' */
- c2 = copy_raw_buf[raw_buf_ptr++];
+ c2 = copy_buff_state.copy_raw_buf[copy_buff_state.raw_buf_ptr];
+ SEEK_COPY_BUFF_POS(cstate, 1, copy_buff_state)
if (c2 != '\r' && c2 != '\n')
{
@@ -4220,14 +6017,21 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
- cstate->raw_buf_index = raw_buf_ptr;
+ else
+ copy_buff_state.chunk_size = prev_chunk_size;
+ }
+
+ cstate->raw_buf_index = copy_buff_state.raw_buf_ptr;
result = true; /* report EOF */
break;
}
else if (!cstate->csv_mode)
+ {
/*
* If we are here, it means we found a backslash followed by
@@ -4240,7 +6044,8 @@ CopyReadLineText(CopyState cstate)
* character after the backslash just like a normal character,
* so we don't increment in those cases.
*/
- raw_buf_ptr++;
+ SEEK_COPY_BUFF_POS(cstate, 1, copy_buff_state)
+ }
}
/*
@@ -4272,8 +6077,27 @@ not_end_of_copy:
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(mblen - 1);
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
- raw_buf_ptr += mblen - 1;
+ SEEK_COPY_BUFF_POS(cstate, mblen - 1, copy_buff_state)
+ }
+
+ /*
+	 * Skip the header line. Update the chunk here; this cannot be done at
+	 * the beginning, as the file may contain empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ChunkBoundary *chunkInfo;
+ chunk_pos = UpdateBlockInChunkInfo(cstate,
+ chunk_first_block,
+ cstate->raw_buf_index, -1,
+ CHUNK_LEADER_POPULATING);
+ chunkInfo = &pcshared_info->chunk_boundaries.ring[chunk_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, chunk position:%d",
+ chunk_first_block, chunkInfo->start_offset, chunk_pos);
}
+
first_char_in_line = false;
} /* end of outer loop */
@@ -4281,6 +6105,11 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
+
+ /* Skip the header line */
+ if (IsParallelCopy())
+ END_CHUNK_PARALLEL_COPY()
return result;
}
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 0c6fe01..3faadb8 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -865,7 +865,7 @@ is_parallel_safe(PlannerInfo *root, Node *node)
* planning, because those are parallel-restricted and there might be one
* in this expression. But otherwise we don't need to look.
*/
- if (root->glob->maxParallelHazard == PROPARALLEL_SAFE &&
+ if (root != NULL && root->glob->maxParallelHazard == PROPARALLEL_SAFE &&
root->glob->paramExecTypes == NIL)
return true;
/* Else use max_parallel_hazard's search logic, but stop on RESTRICTED */
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 7ee04ba..6933ade 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 05c5e9c..ad5fbd0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -337,6 +337,9 @@ CheckpointStatsData
CheckpointerRequest
CheckpointerShmemStruct
Chromosome
+ChunkBoundaries
+ChunkBoundary
+ChunkState
CkptSortItem
CkptTsStatus
ClientAuthentication_hook_type
@@ -419,6 +422,8 @@ ConvProcInfo
ConversionLocation
ConvertRowtypeExpr
CookedConstraint
+CopyBufferState
+CopyDataBlock
CopyDest
CopyInsertMethod
CopyMultiInsertBuffer
@@ -426,6 +431,7 @@ CopyMultiInsertInfo
CopyState
CopyStateData
CopyStmt
+CopyWorkerCommonData
Cost
CostSelector
Counters
@@ -1278,6 +1284,7 @@ LimitStateCond
List
ListCell
ListDictionary
+ListInfo
ListParsedLex
ListenAction
ListenActionKind
@@ -1699,6 +1706,8 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyData
+ParallelCopyLineBuf
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2264,6 +2273,7 @@ Sharedsort
ShellTypeInfo
ShippableCacheEntry
ShippableCacheKey
+ShmCopyInfo
ShmemIndexEnt
ShutdownForeignScan_function
ShutdownInformation
--
1.8.3.1
On Mon, May 18, 2020 at 12:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
In the above case, even though we are executing a single command from
the user perspective, the currentCommandId will be four after the
command. One increment will be for the copy command and the other
three increments are for locking tuple in PK table
(tab_fk_referenced_chk) for three tuples in FK table
(tab_fk_referencing_chk). Now, for parallel workers, it is
(theoretically) possible that the three tuples are processed by three
different workers which don't get synced as of now. The question was
do we see any kind of problem with this and if so can we just sync it
up at the end of parallelism.
I strongly disagree with the idea of "just sync(ing) it up at the end
of parallelism". That seems like a completely unprincipled approach to
the problem. Either the command counter increment is important or it's
not. If it's not important, maybe we can arrange to skip it in the
first place. If it is important, then it's probably not OK for each
backend to be doing it separately.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2020-06-03 12:13:14 -0400, Robert Haas wrote:
On Mon, May 18, 2020 at 12:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
In the above case, even though we are executing a single command from
the user perspective, the currentCommandId will be four after the
command. One increment will be for the copy command and the other
three increments are for locking tuple in PK table
(tab_fk_referenced_chk) for three tuples in FK table
(tab_fk_referencing_chk). Now, for parallel workers, it is
(theoretically) possible that the three tuples are processed by three
different workers which don't get synced as of now. The question was
do we see any kind of problem with this and if so can we just sync it
up at the end of parallelism.
I strongly disagree with the idea of "just sync(ing) it up at the end
of parallelism". That seems like a completely unprincipled approach to
the problem. Either the command counter increment is important or it's
not. If it's not important, maybe we can arrange to skip it in the
first place. If it is important, then it's probably not OK for each
backend to be doing it separately.
That scares me too. These command counter increments definitely aren't
unnecessary in the general case.
Even in the example you share above, aren't we potentially going to
actually lock rows multiple times from within the same transaction,
instead of once? If the command counter increments from within
ri_trigger.c aren't visible to other parallel workers/leader, we'll not
correctly understand that a locked row is invisible to heap_lock_tuple,
because we're not using a new enough snapshot (by dint of not having a
new enough cid).
I've not dug through everything that'd potentially cause issues, but it seems
pretty clearly a no-go from here.
Greetings,
Andres Freund
Hi,
On 2020-06-03 15:53:24 +0530, vignesh C wrote:
Exec time (seconds):

Workers | copy from file,      | copy from stdin,     | copy from file, | copy from file,  | copy from stdin,
        | 2 indexes on integer | 2 indexes on integer | 1 gist index on | 3 indexes on     | 3 indexes on
        | columns, 1 index on  | columns, 1 index on  | text column     | integer columns  | integer columns
        | text column          | text column          |                 |                  |
--------|----------------------|----------------------|-----------------|------------------|-----------------
      0 | 1162.772(1X)         | 1176.035(1X)         | 827.669(1X)     | 216.171(1X)      | 217.376(1X)
      1 | 1110.288(1.05X)      | 1120.556(1.05X)      | 747.384(1.11X)  | 174.242(1.24X)   | 163.492(1.33X)
      2 | 635.249(1.83X)       | 668.18(1.76X)        | 435.673(1.9X)   | 133.829(1.61X)   | 126.516(1.72X)
      4 | 336.835(3.45X)       | 346.768(3.39X)       | 236.406(3.5X)   | 105.767(2.04X)   | 107.382(2.02X)
      8 | 188.577(6.17X)       | 194.491(6.04X)       | 148.962(5.56X)  | 100.708(2.15X)   | 107.72(2.01X)
     16 | 126.819(9.17X)       | 146.402(8.03X)       | 119.923(6.9X)   | 97.996(2.2X)     | 106.531(2.04X)
     20 | *117.845(9.87X)*     | 149.203(7.88X)       | 138.741(5.96X)  | 97.94(2.21X)     | 107.5(2.02X)
     30 | 127.554(9.11X)       | 161.218(7.29X)       | 172.443(4.8X)   | 98.232(2.2X)     | 108.778(1.99X)
Hm, you don't explicitly mention that in your design, but given how
small the benefit of going from 0 to 1 workers is, I assume the leader
doesn't do any "chunk processing" on its own?
Design of the Parallel Copy: The backend to which the "COPY FROM" query
is submitted acts as the leader, with the responsibility of reading data
from the file/stdin and launching at most n workers, as specified with
the PARALLEL 'n' option in the "COPY FROM" query. The leader populates
the common data required for the workers' execution in the DSM and
shares it with the workers. The leader then executes before statement
triggers if any exist. The leader populates DSM chunks, which include
the start offset and chunk size; while populating the chunks it reads as
many blocks as required into the DSM data blocks from the file. Each
block is of 64K size. The leader parses the data to identify a chunk;
the existing logic from CopyReadLineText, which identifies the chunks,
was used for this with some changes. The leader checks if a free chunk
entry is available to copy the information into; if there is none, it
waits till the required entry is freed up by a worker and then copies
the identified chunk's information (offset & chunk size) into the DSM
chunks. This process is repeated till the complete file is processed.
Simultaneously, the workers cache the chunks (50 at a time) in local
memory and release the entries to the leader for further populating.
Each worker processes the chunks it has cached and inserts them into the
table. The leader waits till all the populated chunks are processed by
the workers and exits.
Why do we need the local copy of 50 chunks? Copying memory around is far
from free. I don't see why it'd be better to add per-process caching,
rather than making the DSM bigger? I can see some benefit in marking
multiple chunks as being processed with one lock acquisition, but I
don't think adding a memory copy is a good idea.
This patch *desperately* needs to be split up. It imo is close to
unreviewable, due to a large amount of changes that just move code
around without other functional changes, being mixed in with the actual
new stuff.
/*
+ * State of the chunk.
+ */
+typedef enum ChunkState
+{
+	CHUNK_INIT,					/* initial state of chunk */
+	CHUNK_LEADER_POPULATING,	/* leader processing chunk */
+	CHUNK_LEADER_POPULATED,		/* leader completed populating chunk */
+	CHUNK_WORKER_PROCESSING,	/* worker processing chunk */
+	CHUNK_WORKER_PROCESSED		/* worker completed processing chunk */
+}ChunkState;
+
+#define RAW_BUF_SIZE 65536		/* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50	/* should be mod of RINGSIZE */
+
+#define IsParallelCopy()	(cstate->is_parallel)
+#define IsLeader()			(cstate->pcdata->is_leader)
+#define IsHeaderLine()		(cstate->header_line && cstate->cur_lineno == 1)
+
+/*
+ * Copy data block information.
+ */
+typedef struct CopyDataBlock
+{
+	/* The number of unprocessed chunks in the current block. */
+	pg_atomic_uint32 unprocessed_chunk_parts;
+
+	/*
+	 * If the current chunk data is continued into another block,
+	 * following_block will have the position where the remaining data need to
+	 * be read.
+	 */
+	uint32 following_block;
+
+	/*
+	 * This flag will be set, when the leader finds out this block can be read
+	 * safely by the worker. This helps the worker to start processing the chunk
+	 * early where the chunk will be spread across many blocks and the worker
+	 * need not wait for the complete chunk to be processed.
+	 */
+	bool curr_blk_completed;
+	char data[DATA_BLOCK_SIZE + 1];	/* data read from file */
+}CopyDataBlock;
What's the + 1 here about?
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+	StringInfoData	line_buf;
+	uint64			cur_lineno;	/* line number for error messages */
+}ParallelCopyLineBuf;
Why do we need separate infrastructure for this? We shouldn't duplicate
infrastructure unnecessarily.
+/*
+ * Common information that need to be copied to shared memory.
+ */
+typedef struct CopyWorkerCommonData
+{
Why is parallel specific stuff here suddenly not named ParallelCopy*
anymore? If you introduce a naming like that it imo should be used
consistently.
+	/* low-level state data */
+	CopyDest	copy_dest;		/* type of copy source/destination */
+	int			file_encoding;	/* file or remote side's character encoding */
+	bool		need_transcoding;	/* file encoding diff from server? */
+	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+
+	/* parameters from the COPY command */
+	bool		csv_mode;		/* Comma Separated Value format? */
+	bool		header_line;	/* CSV header line? */
+	int			null_print_len; /* length of same */
+	bool		force_quote_all;	/* FORCE_QUOTE *? */
+	bool		convert_selectively;	/* do selective binary conversion? */
+
+	/* Working state for COPY FROM */
+	AttrNumber	num_defaults;
+	Oid			relid;
+}CopyWorkerCommonData;
But I actually think we shouldn't have this information in two different
structs. This should exist once, independent of using parallel /
non-parallel copy.
+/* List information */
+typedef struct ListInfo
+{
+	int		count;	/* count of attributes */
+
+	/* string info in the form info followed by info1, info2... infon */
+	char	info[1];
+} ListInfo;
Based on these comments I have no idea what this could be for.
/*
- * This keeps the character read at the top of the loop in the buffer
- * even if there is more than one read-ahead.
+ * This keeps the character read at the top of the loop in the buffer
+ * even if there is more than one read-ahead.
+ */
+#define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
+if (1) \
+{ \
+	if (copy_buff_state.raw_buf_ptr + (extralen) >= copy_buff_state.copy_buf_len && !hit_eof) \
+	{ \
+		if (IsParallelCopy()) \
+		{ \
+			copy_buff_state.chunk_size = prev_chunk_size; /* update previous chunk size */ \
+			if (copy_buff_state.block_switched) \
+			{ \
+				pg_atomic_sub_fetch_u32(&copy_buff_state.data_blk_ptr->unprocessed_chunk_parts, 1); \
+				copy_buff_state.copy_buf_len = prev_copy_buf_len; \
+			} \
+		} \
+		copy_buff_state.raw_buf_ptr = prev_raw_ptr;	/* undo fetch */ \
+		need_data = true; \
+		continue; \
+	} \
+} else ((void) 0)
I think it's an absolutely clear no-go to add new branches to
these. They're *really* hot already, and this is going to sprinkle a
significant amount of new instructions over a lot of places.
+/*
+ * SET_RAWBUF_FOR_LOAD - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+#define SET_RAWBUF_FOR_LOAD() \
+{ \
+	ShmCopyInfo	*pcshared_info = cstate->pcdata->pcshared_info; \
+	uint32 cur_block_pos; \
+	/* \
+	 * Mark the previous block as completed, worker can start copying this data. \
+	 */ \
+	if (copy_buff_state.data_blk_ptr != copy_buff_state.curr_data_blk_ptr && \
+		copy_buff_state.data_blk_ptr->curr_blk_completed == false) \
+		copy_buff_state.data_blk_ptr->curr_blk_completed = true; \
+	\
+	copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+	cur_block_pos = WaitGetFreeCopyBlock(pcshared_info); \
+	copy_buff_state.curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos]; \
+	\
+	if (!copy_buff_state.data_blk_ptr) \
+	{ \
+		copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+		chunk_first_block = cur_block_pos; \
+	} \
+	else if (need_data == false) \
+		copy_buff_state.data_blk_ptr->following_block = cur_block_pos; \
+	\
+	cstate->raw_buf = copy_buff_state.curr_data_blk_ptr->data; \
+	copy_buff_state.copy_raw_buf = cstate->raw_buf; \
+}
+
+/*
+ * END_CHUNK_PARALLEL_COPY - Update the chunk information in shared memory.
+ */
+#define END_CHUNK_PARALLEL_COPY() \
+{ \
+	if (!IsHeaderLine()) \
+	{ \
+		ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+		ChunkBoundaries *chunkBoundaryPtr = &pcshared_info->chunk_boundaries; \
+		if (copy_buff_state.chunk_size) \
+		{ \
+			ChunkBoundary *chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+			/* \
+			 * If raw_buf_ptr is zero, unprocessed_chunk_parts would have been \
+			 * incremented in SEEK_COPY_BUFF_POS. This will happen if the whole \
+			 * chunk finishes at the end of the current block. If the \
+			 * new_line_size > raw_buf_ptr, then the new block has only new line \
+			 * char content. The unprocessed count should not be increased in \
+			 * this case. \
+			 */ \
+			if (copy_buff_state.raw_buf_ptr != 0 && \
+				copy_buff_state.raw_buf_ptr > new_line_size) \
+				pg_atomic_add_fetch_u32(&copy_buff_state.curr_data_blk_ptr->unprocessed_chunk_parts, 1); \
+			\
+			/* Update chunk size. */ \
+			pg_atomic_write_u32(&chunkInfo->chunk_size, copy_buff_state.chunk_size); \
+			pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_LEADER_POPULATED); \
+			elog(DEBUG1, "[Leader] After adding - chunk position:%d, chunk_size:%d", \
+				 chunk_pos, copy_buff_state.chunk_size); \
+			pcshared_info->populated++; \
+		} \
+		else if (new_line_size) \
+		{ \
+			/* \
+			 * This means only new line char, empty record should be \
+			 * inserted. \
+			 */ \
+			ChunkBoundary *chunkInfo; \
+			chunk_pos = UpdateBlockInChunkInfo(cstate, -1, -1, 0, \
+											   CHUNK_LEADER_POPULATED); \
+			chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+			elog(DEBUG1, "[Leader] Added empty chunk with offset:%d, chunk position:%d, chunk size:%d", \
+				 chunkInfo->start_offset, chunk_pos, \
+				 pg_atomic_read_u32(&chunkInfo->chunk_size)); \
+			pcshared_info->populated++; \
+		} \
+	} \
+	\
+	/* \
+	 * All of the read data is processed, reset index & len. In the \
+	 * subsequent read, we will get a new block and copy data in to the \
+	 * new block. \
+	 */ \
+	if (copy_buff_state.raw_buf_ptr == copy_buff_state.copy_buf_len) \
+	{ \
+		cstate->raw_buf_index = 0; \
+		cstate->raw_buf_len = 0; \
+	} \
+	else \
+		cstate->raw_buf_len = copy_buff_state.copy_buf_len; \
+}
Why are these macros? They are way way way above a length where that
makes any sort of sense.
Greetings,
Andres Freund
On Thu, Jun 4, 2020 at 12:09 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-06-03 12:13:14 -0400, Robert Haas wrote:
On Mon, May 18, 2020 at 12:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
In the above case, even though we are executing a single command from
the user perspective, but the currentCommandId will be four after the
command. One increment will be for the copy command and the other
three increments are for locking tuple in PK table
(tab_fk_referenced_chk) for three tuples in FK table
(tab_fk_referencing_chk). Now, for parallel workers, it is
(theoretically) possible that the three tuples are processed by three
different workers which don't get synced as of now. The question was
do we see any kind of problem with this and if so can we just sync it
up at the end of parallelism.

I strongly disagree with the idea of "just sync(ing) it up at the end
of parallelism". That seems like a completely unprincipled approach to
the problem. Either the command counter increment is important or it's
not. If it's not important, maybe we can arrange to skip it in the
first place. If it is important, then it's probably not OK for each
backend to be doing it separately.

That scares me too. These command counter increments definitely aren't
unnecessary in the general case.
Yeah, this is what we want to understand. Can you explain how they
are useful here? AFAIU, heap_lock_tuple doesn't use the commandid when
storing the transaction information of the xact that locks the tuple.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi,
On 2020-06-04 08:10:07 +0530, Amit Kapila wrote:
On Thu, Jun 4, 2020 at 12:09 AM Andres Freund <andres@anarazel.de> wrote:
I strongly disagree with the idea of "just sync(ing) it up at the end
of parallelism". That seems like a completely unprincipled approach to
the problem. Either the command counter increment is important or it's
not. If it's not important, maybe we can arrange to skip it in the
first place. If it is important, then it's probably not OK for each
backend to be doing it separately.

That scares me too. These command counter increments definitely aren't
unnecessary in the general case.

Yeah, this is what we want to understand. Can you explain how they
are useful here? AFAIU, heap_lock_tuple doesn't use the commandid when
storing the transaction information of the xact that locks the tuple.

But the HeapTupleSatisfiesUpdate() call does use it?
And even if that weren't an issue, I don't see how it's defensible to
just randomly break the commandid coherency for parallel copy.
Greetings,
Andres Freund
On Thu, Jun 4, 2020 at 9:10 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-06-04 08:10:07 +0530, Amit Kapila wrote:
On Thu, Jun 4, 2020 at 12:09 AM Andres Freund <andres@anarazel.de> wrote:
I strongly disagree with the idea of "just sync(ing) it up at the end
of parallelism". That seems like a completely unprincipled approach to
the problem. Either the command counter increment is important or it's
not. If it's not important, maybe we can arrange to skip it in the
first place. If it is important, then it's probably not OK for each
backend to be doing it separately.

That scares me too. These command counter increments definitely aren't
unnecessary in the general case.

Yeah, this is what we want to understand. Can you explain how they
are useful here? AFAIU, heap_lock_tuple doesn't use the commandid when
storing the transaction information of the xact that locks the tuple.

But the HeapTupleSatisfiesUpdate() call does use it?

It won't use 'cid' for the lockers or multi-lockers case (AFAICS, there
is special-case handling for lockers/multi-lockers). I think it is
used for updates/deletes.
And even if that weren't an issue, I don't see how it's defensible to
just randomly break the commandid coherency for parallel copy.
At this stage, we are evaluating whether there is any need to
increment the command counter for foreign key checks, or whether it is
just happening because we are using some common code to execute the
"Select ... For Key Share" statement during these checks.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jun 4, 2020 at 12:44 AM Andres Freund <andres@anarazel.de> wrote:
Hm, you don't explicitly mention that in your design, but given how
small the benefit of going from 0 to 1 workers is, I assume the leader
doesn't do any "chunk processing" on its own?
Yes, you are right: the leader does not do any chunk processing. The
leader's work is mainly to populate the shared memory with the offset
information for each record (a rough sketch follows).
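
Roughly, the leader loop looks like the sketch below. This is not the
actual patch code: LeaderIdentifyChunk() is a placeholder for the
CopyReadLineText-based boundary detection, and waiting for a free ring
entry is assumed to happen inside UpdateBlockInChunkInfo():

/*
 * Sketch of the leader's populate loop, built from the structures in
 * the patch; error handling and EOF details elided.
 */
static void
LeaderPopulateChunks(CopyState cstate)
{
	ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info;

	for (;;)
	{
		uint32		start_block;
		uint32		start_offset;
		uint32		size;
		int			chunk_pos;
		ChunkBoundary *chunkInfo;

		/* Read file data into DSM blocks and find the next record boundary. */
		if (!LeaderIdentifyChunk(cstate, &start_block, &start_offset, &size))
			break;				/* whole file consumed */

		/* Claim a ring entry and record where the chunk starts. */
		chunk_pos = UpdateBlockInChunkInfo(cstate, start_block, start_offset,
										   -1, CHUNK_LEADER_POPULATING);
		chunkInfo = &pcshared_info->chunk_boundaries.ring[chunk_pos];

		/* Publish the size last; the workers poll chunk_state. */
		pg_atomic_write_u32(&chunkInfo->chunk_size, size);
		pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_LEADER_POPULATED);
		pcshared_info->populated++;
	}
}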
Design of the Parallel Copy: The backend to which the "COPY FROM" query
is submitted acts as the leader, with the responsibility of reading data
from the file/stdin and launching at most n workers, as specified with
the PARALLEL 'n' option in the "COPY FROM" query. The leader populates
the common data required for the workers' execution in the DSM and
shares it with the workers. The leader then executes before statement
triggers if any exist. The leader populates DSM chunks, which include
the start offset and chunk size; while populating the chunks it reads as
many blocks as required into the DSM data blocks from the file. Each
block is of 64K size. The leader parses the data to identify a chunk;
the existing logic from CopyReadLineText, which identifies the chunks,
was used for this with some changes. The leader checks if a free chunk
entry is available to copy the information into; if there is none, it
waits till the required entry is freed up by a worker and then copies
the identified chunk's information (offset & chunk size) into the DSM
chunks. This process is repeated till the complete file is processed.
Simultaneously, the workers cache the chunks (50 at a time) in local
memory and release the entries to the leader for further populating.
Each worker processes the chunks it has cached and inserts them into the
table. The leader waits till all the populated chunks are processed by
the workers and exits.

Why do we need the local copy of 50 chunks? Copying memory around is far
from free. I don't see why it'd be better to add per-process caching,
rather than making the DSM bigger? I can see some benefit in marking
multiple chunks as being processed with one lock acquisition, but I
don't think adding a memory copy is a good idea.
We ran a performance test with a 5.1GB csv data file, 10 million tuples,
and 2 indexes on integer columns; the results are given below. We
noticed that in some cases performance is better if we copy the 50
records locally and release the shared memory, and the benefit grows as
the number of workers increases (a sketch of this caching approach
follows the table). Thoughts?
------------------------------------------------------------------------------------------------
Workers | Exec time (With local copying | Exec time (Without copying,
| 50 records & release the | processing record by record)
| shared memory) |
------------------------------------------------------------------------------------------------
0 | 1162.772(1X) | 1152.684(1X)
2 | 635.249(1.83X) | 647.894(1.78X)
4 | 336.835(3.45X) | 335.534(3.43X)
8 | 188.577(6.17 X) | 189.461(6.08X)
16 | 126.819(9.17X) | 142.730(8.07X)
20 | 117.845(9.87X) | 146.533(7.87X)
30 | 127.554(9.11X) | 160.307(7.19X)
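
The "with local copying" variant works roughly as in the sketch below.
This is not the actual patch code: NextRingEntryForWorker() and
CopyChunkDataToLocal() are placeholder names, and the assumption that
pcdata carries an array of the ParallelCopyLineBuf entries quoted
earlier is ours:

/*
 * Sketch of worker-side chunk caching: drain up to WORKER_CHUNK_COUNT
 * ring entries, copy their bytes into local line buffers, and mark the
 * shared entries processed so the leader can reuse them.
 */
static int
WorkerCacheChunks(CopyState cstate)
{
	ParallelCopyData *pcdata = cstate->pcdata;
	int			ncached = 0;

	while (ncached < WORKER_CHUNK_COUNT)
	{
		ChunkBoundary *chunkInfo = NextRingEntryForWorker(pcdata->pcshared_info);

		if (pg_atomic_read_u32(&chunkInfo->chunk_state) != CHUNK_LEADER_POPULATED)
			break;				/* leader has not published more chunks yet */

		/* Copy the chunk's bytes out of the shared data blocks ... */
		CopyChunkDataToLocal(cstate, chunkInfo,
							 &pcdata->line_bufs[ncached].line_buf);

		/* ... and release the shared entry back to the leader. */
		pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_WORKER_PROCESSED);
		ncached++;
	}

	return ncached;
}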
This patch *desperately* needs to be split up. It imo is close to
unreviewable, due to a large amount of changes that just move code
around without other functional changes, being mixed in with the actual
new stuff.
I have split the patch; the new split patches are attached.
+/*
+ * State of the chunk.
+ */
+typedef enum ChunkState
+{
+	CHUNK_INIT,					/* initial state of chunk */
+	CHUNK_LEADER_POPULATING,	/* leader processing chunk */
+	CHUNK_LEADER_POPULATED,		/* leader completed populating chunk */
+	CHUNK_WORKER_PROCESSING,	/* worker processing chunk */
+	CHUNK_WORKER_PROCESSED		/* worker completed processing chunk */
+}ChunkState;
+
+#define RAW_BUF_SIZE 65536		/* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50	/* should be mod of RINGSIZE */
+
+#define IsParallelCopy()	(cstate->is_parallel)
+#define IsLeader()			(cstate->pcdata->is_leader)
+#define IsHeaderLine()		(cstate->header_line && cstate->cur_lineno == 1)
+
+/*
+ * Copy data block information.
+ */
+typedef struct CopyDataBlock
+{
+	/* The number of unprocessed chunks in the current block. */
+	pg_atomic_uint32 unprocessed_chunk_parts;
+
+	/*
+	 * If the current chunk data is continued into another block,
+	 * following_block will have the position where the remaining data need to
+	 * be read.
+	 */
+	uint32 following_block;
+
+	/*
+	 * This flag will be set, when the leader finds out this block can be read
+	 * safely by the worker. This helps the worker to start processing the chunk
+	 * early where the chunk will be spread across many blocks and the worker
+	 * need not wait for the complete chunk to be processed.
+	 */
+	bool curr_blk_completed;
+	char data[DATA_BLOCK_SIZE + 1]; /* data read from file */
+}CopyDataBlock;

What's the + 1 here about?
Fixed this, removed +1. That is not needed.
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+	StringInfoData	line_buf;
+	uint64			cur_lineno;	/* line number for error messages */
+}ParallelCopyLineBuf;

Why do we need separate infrastructure for this? We shouldn't duplicate
infrastructure unnecessarily.
This was required for copying multiple records locally and releasing the
shared memory. I have not changed this; I will decide on it based on the
decision taken for one of the previous comments.
+/*
+ * Common information that needs to be copied to shared memory.
+ */
+typedef struct CopyWorkerCommonData
+{

Why is parallel-specific stuff here suddenly not named ParallelCopy*
anymore? If you introduce a naming like that it imo should be used
consistently.
Fixed; changed to use the ParallelCopy prefix in all structs.
+	/* low-level state data */
+	CopyDest	copy_dest;		/* type of copy source/destination */
+	int			file_encoding;	/* file or remote side's character encoding */
+	bool		need_transcoding;	/* file encoding diff from server? */
+	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+
+	/* parameters from the COPY command */
+	bool		csv_mode;		/* Comma Separated Value format? */
+	bool		header_line;	/* CSV header line? */
+	int			null_print_len; /* length of same */
+	bool		force_quote_all;	/* FORCE_QUOTE *? */
+	bool		convert_selectively;	/* do selective binary conversion? */
+
+	/* Working state for COPY FROM */
+	AttrNumber	num_defaults;
+	Oid			relid;
+}CopyWorkerCommonData;

But I actually think we shouldn't have this information in two different
structs. This should exist once, independent of using parallel /
non-parallel copy.
This structure helps in storing the common data from CopyStateData that is
required by the workers. This information is allocated and stored in the DSM
for the workers to retrieve and copy back into their own CopyStateData.
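
A rough sketch of that flow (illustrative only, not the patch's code; the
struct name follows the renamed ParallelCopyCommonKeyData, and the helper
name here is invented):

/*
 * Leader side: snapshot the common CopyStateData fields into the structure
 * that is placed in the DSM; a worker runs the mirror-image assignments to
 * rebuild its own CopyStateData.
 */
static void
SerializeCommonCopyState(ParallelCopyCommonKeyData *shared_cstate,
						 CopyState cstate)
{
	shared_cstate->copy_dest = cstate->copy_dest;
	shared_cstate->file_encoding = cstate->file_encoding;
	shared_cstate->need_transcoding = cstate->need_transcoding;
	shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
	shared_cstate->csv_mode = cstate->csv_mode;
	shared_cstate->header_line = cstate->header_line;
	shared_cstate->null_print_len = cstate->null_print_len;
	shared_cstate->force_quote_all = cstate->force_quote_all;
	shared_cstate->convert_selectively = cstate->convert_selectively;
	shared_cstate->num_defaults = cstate->num_defaults;
	shared_cstate->relid = RelationGetRelid(cstate->rel);
}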
+/* List information */
+typedef struct ListInfo
+{
+	int		count;		/* count of attributes */
+
+	/* string info in the form info followed by info1, info2... infon */
+	char	info[1];
+} ListInfo;

Based on these comments I have no idea what this could be for.
Have added better comments for this. The new comment explains that this
structure helps in converting a List data type into the flat format above,
with count holding the number of elements in the list and info holding the
List elements appended contiguously. The converted structure is allocated in
shared memory and stored in the DSM for the worker to retrieve and later
convert back into a List data type.
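
To make the description concrete, the conversion could look like this (an
illustrative sketch only; FlattenList is a hypothetical helper, not from the
patch):

/*
 * Flatten a List of C strings into one contiguous ListInfo allocation so
 * that it can be copied into the DSM in a single step.
 */
static ListInfo *
FlattenList(List *names)
{
	Size		size = offsetof(ListInfo, info);
	ListInfo   *result;
	ListCell   *lc;
	char	   *ptr;

	/* Account for each element plus its terminating NUL. */
	foreach(lc, names)
		size += strlen((char *) lfirst(lc)) + 1;

	result = (ListInfo *) palloc0(size);
	result->count = list_length(names);

	/* Append the elements contiguously after the count. */
	ptr = result->info;
	foreach(lc, names)
	{
		strcpy(ptr, (char *) lfirst(lc));
		ptr += strlen(ptr) + 1;
	}

	return result;
}

The worker would walk info count times, re-adding each NUL-terminated
element to a fresh List.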
 /*
- * This keeps the character read at the top of the loop in the buffer
- * even if there is more than one read-ahead.
+ * This keeps the character read at the top of the loop in the buffer
+ * even if there is more than one read-ahead.
+ */
+#define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
+if (1) \
+{ \
+	if (copy_buff_state.raw_buf_ptr + (extralen) >= copy_buff_state.copy_buf_len && !hit_eof) \
+	{ \
+		if (IsParallelCopy()) \
+		{ \
+			copy_buff_state.chunk_size = prev_chunk_size;	/* update previous chunk size */ \
+			if (copy_buff_state.block_switched) \
+			{ \
+				pg_atomic_sub_fetch_u32(&copy_buff_state.data_blk_ptr->unprocessed_chunk_parts, 1); \
+				copy_buff_state.copy_buf_len = prev_copy_buf_len; \
+			} \
+		} \
+		copy_buff_state.raw_buf_ptr = prev_raw_ptr;	/* undo fetch */ \
+		need_data = true; \
+		continue; \
+	} \
+} else ((void) 0)

I think it's an absolutely clear no-go to add new branches to
these. They're *really* hot already, and this is going to sprinkle a
significant amount of new instructions over a lot of places.
Fixed, removed this.
+/*
+ * SET_RAWBUF_FOR_LOAD - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+#define SET_RAWBUF_FOR_LOAD() \
+{ \
+	ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+	uint32 cur_block_pos; \
+	/* \
+	 * Mark the previous block as completed, worker can start copying this data. \
+	 */ \
+	if (copy_buff_state.data_blk_ptr != copy_buff_state.curr_data_blk_ptr && \
+		copy_buff_state.data_blk_ptr->curr_blk_completed == false) \
+		copy_buff_state.data_blk_ptr->curr_blk_completed = true; \
+	\
+	copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+	cur_block_pos = WaitGetFreeCopyBlock(pcshared_info); \
+	copy_buff_state.curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos]; \
+	\
+	if (!copy_buff_state.data_blk_ptr) \
+	{ \
+		copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+		chunk_first_block = cur_block_pos; \
+	} \
+	else if (need_data == false) \
+		copy_buff_state.data_blk_ptr->following_block = cur_block_pos; \
+	\
+	cstate->raw_buf = copy_buff_state.curr_data_blk_ptr->data; \
+	copy_buff_state.copy_raw_buf = cstate->raw_buf; \
+}
+
+/*
+ * END_CHUNK_PARALLEL_COPY - Update the chunk information in shared memory.
+ */
+#define END_CHUNK_PARALLEL_COPY() \
+{ \
+	if (!IsHeaderLine()) \
+	{ \
+		ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+		ChunkBoundaries *chunkBoundaryPtr = &pcshared_info->chunk_boundaries; \
+		if (copy_buff_state.chunk_size) \
+		{ \
+			ChunkBoundary *chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+			/* \
+			 * If raw_buf_ptr is zero, unprocessed_chunk_parts would have been \
+			 * incremented in SEEK_COPY_BUFF_POS. This will happen if the whole \
+			 * chunk finishes at the end of the current block. If the \
+			 * new_line_size > raw_buf_ptr, then the new block has only new line \
+			 * char content. The unprocessed count should not be increased in \
+			 * this case. \
+			 */ \
+			if (copy_buff_state.raw_buf_ptr != 0 && \
+				copy_buff_state.raw_buf_ptr > new_line_size) \
+				pg_atomic_add_fetch_u32(&copy_buff_state.curr_data_blk_ptr->unprocessed_chunk_parts, 1); \
+			\
+			/* Update chunk size. */ \
+			pg_atomic_write_u32(&chunkInfo->chunk_size, copy_buff_state.chunk_size); \
+			pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_LEADER_POPULATED); \
+			elog(DEBUG1, "[Leader] After adding - chunk position:%d, chunk_size:%d", \
+				 chunk_pos, copy_buff_state.chunk_size); \
+			pcshared_info->populated++; \
+		} \
+		else if (new_line_size) \
+		{ \
+			/* \
+			 * This means only new line char, empty record should be \
+			 * inserted. \
+			 */ \
+			ChunkBoundary *chunkInfo; \
+			chunk_pos = UpdateBlockInChunkInfo(cstate, -1, -1, 0, \
+											   CHUNK_LEADER_POPULATED); \
+			chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+			elog(DEBUG1, "[Leader] Added empty chunk with offset:%d, chunk position:%d, chunk size:%d", \
+				 chunkInfo->start_offset, chunk_pos, \
+				 pg_atomic_read_u32(&chunkInfo->chunk_size)); \
+			pcshared_info->populated++; \
+		} \
+	} \
+	\
+	/* \
+	 * All of the read data is processed, reset index & len. In the \
+	 * subsequent read, we will get a new block and copy data in to the \
+	 * new block. \
+	 */ \
+	if (copy_buff_state.raw_buf_ptr == copy_buff_state.copy_buf_len) \
+	{ \
+		cstate->raw_buf_index = 0; \
+		cstate->raw_buf_len = 0; \
+	} \
+	else \
+		cstate->raw_buf_len = copy_buff_state.copy_buf_len; \
+}

Why are these macros? They are way way way above a length where that
makes any sort of sense.
Converted these macros to functions.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Copy-code-readjustment-to-support-parallel-copy.patch (text/x-patch)
From 97204eb6abafe891a654b34ff84cf9812e6c1fef Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 10 Jun 2020 06:07:17 +0530
Subject: [PATCH 1/4] Copy code readjustment to support parallel copy.
This patch readjusts the copy code slightly so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is
moved from CopyReadLine to CopyReadLineText. This change was required because,
in the case of parallel copy, the chunk identification and chunk update are
done in CopyReadLineText; before the chunk information is updated in shared
memory, the newline characters should be removed.
---
src/backend/commands/copy.c | 320 ++++++++++++++++++++++++++------------------
1 file changed, 191 insertions(+), 129 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc4..eaf0f78 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -393,6 +477,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateAttributes(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -1464,7 +1550,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1630,6 +1715,22 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateAttributes(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateAttributes - Populate the attributes.
+ */
+static void
+PopulateAttributes(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1749,12 +1850,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2647,32 +2742,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckCopyFromValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2708,6 +2782,36 @@ CopyFrom(CopyState cstate)
errmsg("cannot copy to non-table relation \"%s\"",
RelationGetRelationName(cstate->rel))));
}
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckCopyFromValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3262,7 +3366,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3317,30 +3421,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3350,31 +3439,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3452,6 +3518,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3839,7 +3953,6 @@ static bool
CopyReadLine(CopyState cstate)
{
bool result;
-
resetStringInfo(&cstate->line_buf);
cstate->line_buf_valid = true;
@@ -3864,60 +3977,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4281,6 +4342,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
0003-Allow-copy-from-command-to-process-data-from-file-ST.patch (text/x-patch)
From 4ff785c888e93a8dd33d4e48cb4f804e204cb739 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 10 Jun 2020 07:18:33 +0530
Subject: [PATCH 3/4] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command, with which the user can specify the number of workers that can be
used to perform the COPY FROM command. Specifying zero as the number of
workers disables parallelism.

The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
the before statement triggers, if any exist. The leader populates the DSM
chunks, which include the start offset and chunk size; while populating the
chunks it reads as many blocks as required into the DSM data blocks from the
file. Each block is 64K in size. The leader parses the data to identify a
chunk; the existing logic from CopyReadLineText, which identifies the chunks,
was used for this with some changes. The leader checks if a free chunk is
available to copy the information; if there is no free chunk, it waits till
the required chunk is freed up by a worker and then copies the identified
chunk's information (offset & chunk size) into the DSM chunks. This process
is repeated till the complete file is processed. Simultaneously, the workers
cache the chunks (50 of them) locally into local memory and release the
chunks to the leader for further populating. Each worker processes the chunks
that it cached and inserts them into the table. The leader does not
participate in the insertion of data; the leader's only responsibility is to
identify the chunks as fast as possible for the workers to do the actual copy
operation. The leader waits till all the chunks populated are processed by
the workers and exits.
---
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 13 +
src/backend/commands/copy.c | 875 +++++++++++++++++++++++++++++++++--
src/backend/optimizer/util/clauses.c | 2 +-
src/include/access/xact.h | 1 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 853 insertions(+), 52 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 94eb37d..6991b9f 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index cd30b62..d43902c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,19 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, all the workers must use the same transaction id.
+ */
+void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d930644..b1e2e71 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the chunk.
+ */
+typedef enum ParallelCopyChunkState
+{
+ CHUNK_INIT, /* initial state of chunk */
+ CHUNK_LEADER_POPULATING, /* leader processing chunk */
+ CHUNK_LEADER_POPULATED, /* leader completed populating chunk */
+ CHUNK_WORKER_PROCESSING, /* worker processing chunk */
+ CHUNK_WORKER_PROCESSED /* worker completed processing chunk */
+}ParallelCopyChunkState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -527,9 +542,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ chunk_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -542,13 +561,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || \
+ (IsParallelCopy() && !IsLeader()))) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -607,22 +653,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, chunk_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -671,8 +733,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckCopyFromValidity(CopyState cstate);
static void PopulateAttributes(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetChunkPosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -826,6 +892,130 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check if the trigger function is parallel
+ * safe for the triggers. Return false if any one of the trigger has parallel
+ * unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine parallel safety of volatile expressions
+ * in default clause of column definition or in where clause and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (!is_parallel_safe(NULL, (Node *)cstate->whereClause))
+ return false;
+ }
+
+ if (cstate->volatile_defexprs && cstate->defexprs != NULL &&
+ cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ if (!is_parallel_safe(NULL, (Node *) cstate->defexprs[i]->expr))
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for freeze & binary option. */
+ if (cstate->freeze || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return false;
+
+ /* Check if copy is into a temporary table. */
+ if (RELATION_IS_LOCAL(cstate->rel) || RELATION_IS_OTHER_TEMP(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -855,6 +1045,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckCopyFromValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -864,6 +1056,15 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * User chosen parallel copy. Determine if the parallel copy is actually
+ * allowed. If not, go with the non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ return NULL;
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1090,7 +1291,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
}
+
+/*
+ * CacheChunkInfo - Cache the chunk information to local memory.
+ */
+static bool
+CacheChunkInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyChunkBoundary *chunkInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetChunkPosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current chunk information. */
+ chunkInfo = &pcshared_info->chunk_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&chunkInfo->chunk_size) == 0)
+ goto empty_data_chunk_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[chunkInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = chunkInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = chunkInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - chunk position:%d, block:%d, unprocessed chunks:%d, offset:%d, chunk size:%d",
+ write_pos, chunkInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_chunk_parts),
+ offset, pg_atomic_read_u32(&chunkInfo->chunk_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+		 * There is a possibility that the wait loop at the end of the previous
+		 * iteration exited because data_blk_ptr->curr_blk_completed was set,
+		 * but the dataSize read might be an old value; if curr_blk_completed
+		 * is set and the chunk is completed, chunk_size will be set. Read
+		 * chunk_size again to be sure whether it is a complete or partial block.
+ */
+ dataSize = pg_atomic_read_u32(&chunkInfo->chunk_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole chunk is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_chunk_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Chunk is spread across the blocks. */
+ uint32 chunkInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ chunkInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_chunk_parts, 1);
+ copiedSize += chunkInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If complete data is present in current block use
+ * dataSize - copiedSize, or copy the whole block from
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_chunk_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 chunkInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ chunkInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_chunk_parts, 1);
+ copiedSize += chunkInCurrentBlock;
+
+ /*
+ * Reset the offset. For the first copy, copy from the offset. For
+ * the subsequent copy the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this chunk */
+ dataSize = pg_atomic_read_u32(&chunkInfo->chunk_size);
+
+ /*
+ * If the data is present in current block chunkInfo.chunk_size
+ * will be updated. If the data is spread across the blocks either
+ * of chunkInfo.chunk_size or data_blk_ptr->curr_blk_completed can
+ * be updated. chunkInfo.chunk_size will be updated if the complete
+ * read is finished. data_blk_ptr->curr_blk_completed will be
+ * updated if processing of current block is finished and data
+ * processing is not finished.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_chunk_update:
+ elog(DEBUG1, "[Worker] Completed processing chunk:%d", write_pos);
+ pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_WORKER_PROCESSED);
+ pg_atomic_write_u32(&chunkInfo->chunk_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerChunk - Returns a chunk for worker to process.
+ */
+static bool
+GetWorkerChunk(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+	 * Copy the chunk data to line_buf and release the chunk position so that
+	 * the leader can continue loading data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_chunk;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheChunkInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_chunk;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_chunk:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
+}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1136,6 +1541,7 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1188,6 +1594,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInChunkInfo - Update the chunk information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInChunkInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 chunk_size, uint32 chunk_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyChunkBoundaries *chunkBoundaryPtr = &pcshared_info->chunk_boundaries;
+ ParallelCopyChunkBoundary *chunkInfo;
+ int chunk_pos = chunkBoundaryPtr->leader_pos;
+
+ /* Update the chunk information for the worker to pick and process. */
+ chunkInfo = &chunkBoundaryPtr->ring[chunk_pos];
+ while (pg_atomic_read_u32(&chunkInfo->chunk_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ chunkInfo->first_block = blk_pos;
+ chunkInfo->start_offset = offset;
+ chunkInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&chunkInfo->chunk_size, chunk_size);
+ pg_atomic_write_u32(&chunkInfo->chunk_state, chunk_state);
+ chunkBoundaryPtr->leader_pos = (chunkBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return chunk_pos;
+}
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1213,9 +1647,158 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetChunkPosition - return the chunk position that worker should process.
+ */
+static uint32
+GetChunkPosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyChunkBoundary *chunkInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyChunkState chunk_state = CHUNK_LEADER_POPULATED;
+ ParallelCopyChunkState curr_chunk_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current chunk information. */
+ chunkInfo = &pcshared_info->chunk_boundaries.ring[write_pos];
+ curr_chunk_state = pg_atomic_read_u32(&chunkInfo->chunk_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_chunk_state == CHUNK_WORKER_PROCESSED ||
+ curr_chunk_state == CHUNK_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this chunk. */
+ dataSize = pg_atomic_read_u32(&chunkInfo->chunk_size);
+
+ if (dataSize != 0) /* If not an empty chunk. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[chunkInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current chunk or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&chunkInfo->chunk_state,
+ &chunk_state, CHUNK_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /* Get a new block for copying data. */
+ while (count < MAX_BLOCKS_COUNT)
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_chunk_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_chunk_parts);
+ if (unprocessed_chunk_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1251,6 +1834,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 chunk_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && chunk_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_chunk_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndChunkParallelCopy - Update the chunk information in shared memory.
+ */
+static void
+EndChunkParallelCopy(CopyState cstate, uint32 chunk_pos, uint32 chunk_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyChunkBoundaries *chunkBoundaryPtr = &pcshared_info->chunk_boundaries;
+ SET_NEWLINE_SIZE()
+ if (chunk_size)
+ {
+ ParallelCopyChunkBoundary *chunkInfo = &chunkBoundaryPtr->ring[chunk_pos];
+ /*
+ * If the new_line_size > raw_buf_ptr, then the new block has only
+ * new line char content. The unprocessed count should not be
+ * increased in this case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_chunk_parts, 1);
+ }
+
+ /* Update chunk size. */
+ pg_atomic_write_u32(&chunkInfo->chunk_size, chunk_size);
+ pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - chunk position:%d, chunk_size:%d",
+ chunk_pos, chunk_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* This means only new line char, empty record should be inserted.*/
+ ParallelCopyChunkBoundary *chunkInfo;
+ chunk_pos = UpdateBlockInChunkInfo(cstate, -1, -1, 0,
+ CHUNK_LEADER_POPULATED);
+ chunkInfo = &chunkBoundaryPtr->ring[chunk_pos];
+ elog(DEBUG1, "[Leader] Added empty chunk with offset:%d, chunk position:%d, chunk size:%d",
+ chunkInfo->start_offset, chunk_pos,
+ pg_atomic_read_u32(&chunkInfo->chunk_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger, this will be
+ * executed for parallel copy by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3611,7 +4334,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? cstate->pcdata->pcshared_info->mycid :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3621,7 +4345,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckCopyFromValidity(cstate);
+ /*
+	 * Perform this check if it is not parallel copy. In the case of parallel
+	 * copy, this check is done by the leader, so that if any invalid case
+	 * exists, the COPY FROM command will error out from the leader itself,
+	 * avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckCopyFromValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3848,13 +4579,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -4368,7 +5102,7 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
@@ -4512,26 +5246,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerChunk(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4781,9 +5524,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+					/*
+					 * Get a new block the first time through; on subsequent
+					 * iterations, reset the index and reuse the same block.
+					 */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4813,6 +5578,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 chunk_size = 0;
+ int chunk_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4867,6 +5637,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, chunk_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5091,9 +5863,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ chunk_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5145,6 +5923,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+	 * Skip the header line. Update the chunk here; this cannot be done at
+	 * the beginning, as there is a possibility that the file contains empty
+ * lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyChunkBoundary *chunkInfo;
+ uint32 chunk_first_block = pcshared_info->cur_block_pos;
+ chunk_pos = UpdateBlockInChunkInfo(cstate,
+ chunk_first_block,
+ cstate->raw_buf_index, -1,
+ CHUNK_LEADER_POPULATING);
+ chunkInfo = &pcshared_info->chunk_boundaries.ring[chunk_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, chunk position:%d",
+ chunk_first_block, chunkInfo->start_offset, chunk_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5153,6 +5951,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndChunkParallelCopy(cstate, chunk_pos, chunk_size, raw_buf_ptr);
return result;
}
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 0c6fe01..3faadb8 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -865,7 +865,7 @@ is_parallel_safe(PlannerInfo *root, Node *node)
* planning, because those are parallel-restricted and there might be one
* in this expression. But otherwise we don't need to look.
*/
- if (root->glob->maxParallelHazard == PROPARALLEL_SAFE &&
+ if (root != NULL && root->glob->maxParallelHazard == PROPARALLEL_SAFE &&
root->glob->paramExecTypes == NIL)
return true;
/* Else use max_parallel_hazard's search logic, but stop on RESTRICTED */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 88025b1..f8bdcc3 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3373894..30eb49d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1700,6 +1700,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyChunkBoundaries
ParallelCopyChunkBoundary
+ParallelCopyChunkState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
1.8.3.1
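As background for the block recycling used in 0001 above: a minimal sketch of what a WaitGetFreeCopyBlock-style routine has to do, assuming the shared structures defined in these patches. The scan order and the wait primitive here are illustrative, not the patch's exact code.

/*
 * Minimal sketch, not the patch's code: find a DSM data block whose chunk
 * parts have all been consumed by workers, waiting until one frees up.
 */
static uint32
WaitGetFreeCopyBlockSketch(ParallelCopyShmInfo *pcshared_info)
{
	for (;;)
	{
		uint32		block_pos;

		for (block_pos = 0; block_pos < MAX_BLOCKS_COUNT; block_pos++)
		{
			ParallelCopyDataBlock *blk = &pcshared_info->data_blocks[block_pos];

			/* A block is reusable once all its chunk parts are drained. */
			if (pg_atomic_read_u32(&blk->unprocessed_chunk_parts) == 0)
				return block_pos;
		}

		/* Workers are behind; yield and retry (illustrative back-off). */
		CHECK_FOR_INTERRUPTS();
		pg_usleep(1000L);
	}
}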
Attachment: 0002-Framework-for-leader-worker-in-parallel-copy.patch
From 8a4d7943545a16d980ae06dcc9f25b6a6b0b5a92 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 10 Jun 2020 06:53:04 +0530
Subject: [PATCH 2/4] Framework for leader/worker in parallel copy.
This patch has the framework for the data structures used in parallel copy:
leader initialization, worker initialization, shared memory updates, starting
workers, waiting for workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 812 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 828 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690..09e7a19 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index eaf0f78..d930644 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,127 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* should divide RINGSIZE evenly */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed chunks in the current block. */
+ pg_atomic_uint32 unprocessed_chunk_parts;
+
+ /*
+ * If the current chunk data is continued into another block,
+ * following_block will have the position from which the remaining data needs to
+ * be read.
+ */
+ uint32 following_block;
+
+ /*
+	 * This flag will be set when the leader finds that this block can be read
+	 * safely by a worker. This helps a worker start processing a chunk early
+	 * when the chunk is spread across many blocks, so that it need not wait
+	 * for the complete chunk to be populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual Chunk information.
+ */
+typedef struct ParallelCopyChunkBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the chunk */
+
+ /*
+	 * Size of the current chunk: -1 means the chunk is yet to be filled
+	 * completely, 0 means an empty chunk, and >0 means the chunk is filled
+	 * with that many bytes of data.
+ */
+ pg_atomic_uint32 chunk_size;
+ pg_atomic_uint32 chunk_state; /* chunk state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyChunkBoundary;
+
+/*
+ * Array of the chunk.
+ */
+typedef struct ParallelCopyChunkBoundaries
+{
+ /* Position for the leader to populate a chunk. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyChunkBoundary ring[RINGSIZE];
+}ParallelCopyChunkBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+	 * Chunks actually inserted by the workers (some records will be filtered
+	 * out based on the WHERE condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* Chunks populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyChunkBoundaries chunk_boundaries; /* chunk array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* chunk position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the chunks
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -225,8 +343,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure helps in storing the common data from CopyStateData that is
+ * required by the workers. This information will then be allocated and stored
+ * into the DSM for the worker to retrieve and copy it to CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure will help in converting a List data type into the below
+ * structure format with the count having the number of elements in the list and
+ * the info having the List elements appended contiguously. This converted
+ * structure will be allocated in shared memory and stored in DSM for the worker
+ * to retrieve and later convert it back to List data type.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +432,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -477,10 +670,588 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateAttributes(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
/*
+ * CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateChunkKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateChunkKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateChunkKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateChunkKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ *
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateChunkKeysStr(pcxt, cstate->null_print);
+ EstimateChunkKeysStr(pcxt, cstate->null_print_client);
+ EstimateChunkKeysStr(pcxt, cstate->delim);
+ EstimateChunkKeysStr(pcxt, cstate->quote);
+ EstimateChunkKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateChunkKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateChunkKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateChunkKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateChunkKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateChunkKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateChunkKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateChunkKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyChunkBoundary *chunkInfo = &shared_info_ptr->chunk_boundaries.ring[count];
+ pg_atomic_init_u32(&(chunkInfo->chunk_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateAttributes(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * Where clause handling, convert tuple to columns, add default null values for
+ * the missing columns that are not present in that record. Find the partition
+ * if it is partitioned table, invoke before row insert Triggers, handle
+ * constraints and insert the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared queue and shares it across the workers.
+ * The leader reads the table data from the file and copies the contents
+ * into data blocks. It then scans those contents and identifies the data
+ * based on line breaks; this information is called a chunk. The chunk
+ * information is populated in ParallelCopyChunkBoundary. Workers then pick
+ * up this information and insert the data into the table. The leader does
+ * this till it completes processing the file.
+ * The leader executes the BEFORE STATEMENT trigger if one is present. It
+ * reads the data from the input file and loads it into data blocks, block
+ * by block, as and when required. It traverses through the data blocks to
+ * identify one chunk at a time, and gets a free chunk entry to copy the
+ * identified chunk's information into; if there is no free entry, it waits
+ * till one is freed. This process is repeated till the complete file is
+ * processed.
+ * The leader waits till all the populated chunks are processed by the
+ * workers, and then exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -1146,6 +1917,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1155,7 +1927,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1204,6 +1993,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1372,6 +2162,26 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+ cstate->nworkers = atoi(defGetString(defel));
+ if (cstate->nworkers < 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a non-negative integer",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c65a552..3373894 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1698,6 +1698,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyChunkBoundaries
+ParallelCopyChunkBoundary
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
1.8.3.1
Attachment: 0004-Documentation-for-parallel-copy.patch
From a45985a66ead7f31e6f885b8406da14f261d7021 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 10 Jun 2020 07:21:10 +0530
Subject: [PATCH 4/4] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+      Perform <command>COPY FROM</command> in parallel using <replaceable
+      class="parameter">integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all. This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
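For anyone trying out the option documented above, usage would look along the following lines (the table and file names are illustrative only, not taken from the patches):

COPY mytable FROM '/path/to/data.csv' WITH (FORMAT csv, PARALLEL 4);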
Hi All,
I've spent a little bit of time going through the project discussion that has
happened in this email thread and, to start with, I have a few questions which
I would like to put here:
Q1) Are we also planning to read the input data in parallel or is it only
about performing the multi-insert operation in parallel? AFAIU, the data
reading part will be done by the leader process alone so no parallelism is
involved there.
Q2) How are we going to deal with partitioned tables? I mean, will there
be some worker process dedicated to each partition, or how is it? Further,
the challenge that I see in case of partitioned tables is that we would have
a single input file containing data to be inserted into multiple tables
(aka partitions), unlike the normal case where all the tuples in the input
file would belong to the same table.
Q3) In case of TOAST tables, there is a possibility of having a single tuple
in the input file which could be of a very big size (probably in GB),
eventually resulting in a bigger file size. So, in this case, how are we
going to decide the number of worker processes to be launched? I mean,
although the file size is big, the number of tuples to be processed is
just one or a few of them; so, can we decide the number of worker
processes to be launched based on the file size?
Q4) Who is going to process constraints (particularly deferred
constraints) that are supposed to be executed at COMMIT time? I mean, is
it the leader process or the worker process, or will we not be choosing
parallelism at all in such cases?
Q5) Do we have any risk of table bloat when the data is loaded in
parallel? I am just asking this because in case of parallelism there would
be multiple processes performing bulk inserts into a table. There is a
chance that the table file might get extended even if there is some free
space in the file being written into, because that space is locked by some
other worker process, and hence that might result in the creation of a new
block for that table. Sorry if I am missing something here.
Please note that I haven't gone through all the emails in this thread so
there is a possibility that I might have repeated the question that has
already been raised and answered here. If that is the case, I am sorry for
that, but it would be very helpful if someone could point out that thread
so that I can go through it. Thank you.
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
On Fri, Jun 12, 2020 at 11:01 AM vignesh C <vignesh21@gmail.com> wrote:
On Thu, Jun 4, 2020 at 12:44 AM Andres Freund <andres@anarazel.de> wrote:
Hm. you don't explicitly mention that in your design, but given how
small the benefits going from 0-1 workers is, I assume the leader
doesn't do any "chunk processing" on its own?Yes you are right, the leader does not do any processing, Leader's
work is mainly to populate the shared memory with the offset
information for each record.Design of the Parallel Copy: The backend, to which the "COPY FROM"
query is
submitted acts as leader with the responsibility of reading data from
the
file/stdin, launching at most n number of workers as specified with
PARALLEL 'n' option in the "COPY FROM" query. The leader populates the
common data required for the workers execution in the DSM and shares it
with the workers. The leader then executes before statement triggers if
there exists any. Leader populates DSM chunks which includes the start
offset and chunk size, while populating the chunks it reads as manyblocks
as required into the DSM data blocks from the file. Each block is of
64K
size. The leader parses the data to identify a chunk, the existing
logic
from CopyReadLineText which identifies the chunks with some changes was
used for this. Leader checks if a free chunk is available to copy the
information, if there is no free chunk it waits till the requiredchunk is
freed up by the worker and then copies the identified chunks
information
(offset & chunk size) into the DSM chunks. This process is repeated
till
the complete file is processed. Simultaneously, the workers cache the
chunks(50) locally into the local memory and release the chunks to the
leader for further populating. Each worker processes the chunk that it
cached and inserts it into the table. The leader waits till all thechunks
populated are processed by the workers and exits.
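To make that flow concrete, here is a minimal sketch of the leader loop, assuming the shared structures from the attached patches; every helper named here except WaitGetFreeCopyBlock is hypothetical, standing in for logic that in the patches lives inside CopyReadLineText and its callers.

/*
 * Minimal sketch, not the patches' code: the leader reads 64K blocks into
 * the DSM, identifies tuple boundaries, and publishes chunk entries.
 */
static void
ParallelCopyLeaderLoopSketch(CopyState cstate)
{
	ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;

	while (!ReachedEndOfInput(cstate))	/* hypothetical */
	{
		/* Blocks if all MAX_BLOCKS_COUNT data blocks are still in use. */
		uint32		block_pos = WaitGetFreeCopyBlock(pcshared_info);

		/* Read the next 64K of input into the free DSM data block. */
		ReadIntoDataBlock(cstate, &pcshared_info->data_blocks[block_pos]); /* hypothetical */

		/* Split the block at line breaks and publish each chunk. */
		while (FindNextLineBoundary(cstate))	/* hypothetical */
		{
			/* Blocks if all RINGSIZE chunk entries are still being processed. */
			uint32		chunk_pos = WaitGetFreeChunkEntry(pcshared_info);	/* hypothetical */

			PublishChunkBoundary(pcshared_info, chunk_pos, block_pos);	/* hypothetical */
		}
	}

	pcshared_info->is_read_in_progress = false; /* signal workers: no more input */
}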
Why do we need the local copy of 50 chunks? Copying memory around is far
from free. I don't see why it'd be better to add per-process caching,
rather than making the DSM bigger? I can see some benefit in marking
multiple chunks as being processed with one lock acquisition, but I
don't think adding a memory copy is a good idea.

We had run performance tests with a csv data file, 5.1GB, 10 million tuples,
2 indexes on integer columns; results for the same are given below. We
noticed in some cases the performance is better if we copy the 50
records locally and release the shared memory. We will get better
benefits as the workers increase. Thoughts?

Workers | Exec time (with local copy of 50   | Exec time (without copying,
        | records & releasing shared memory) | processing record by record)
--------+------------------------------------+-----------------------------
 0      | 1162.772 (1X)                      | 1152.684 (1X)
 2      |  635.249 (1.83X)                   |  647.894 (1.78X)
 4      |  336.835 (3.45X)                   |  335.534 (3.43X)
 8      |  188.577 (6.17X)                   |  189.461 (6.08X)
16      |  126.819 (9.17X)                   |  142.730 (8.07X)
20      |  117.845 (9.87X)                   |  146.533 (7.87X)
30      |  127.554 (9.11X)                   |  160.307 (7.19X)

This patch *desperately* needs to be split up. It imo is close to
unreviewable, due to a large amount of changes that just move code
around without other functional changes being mixed in with the actual
new stuff.

I have split the patch; the new split patches are attached.
+/*
+ * State of the chunk.
+ */
+typedef enum ChunkState
+{
+	CHUNK_INIT,					/* initial state of chunk */
+	CHUNK_LEADER_POPULATING,	/* leader processing chunk */
+	CHUNK_LEADER_POPULATED,		/* leader completed populating chunk */
+	CHUNK_WORKER_PROCESSING,	/* worker processing chunk */
+	CHUNK_WORKER_PROCESSED		/* worker completed processing chunk */
+}ChunkState;
+
+#define RAW_BUF_SIZE 65536		/* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50	/* should be mod of RINGSIZE */
+
+#define IsParallelCopy()	(cstate->is_parallel)
+#define IsLeader()			(cstate->pcdata->is_leader)
+#define IsHeaderLine()		(cstate->header_line && cstate->cur_lineno == 1)
+
+/*
+ * Copy data block information.
+ */
+typedef struct CopyDataBlock
+{
+	/* The number of unprocessed chunks in the current block. */
+	pg_atomic_uint32 unprocessed_chunk_parts;
+
+	/*
+	 * If the current chunk data is continued into another block,
+	 * following_block will have the position where the remaining data need
+	 * to be read.
+	 */
+	uint32 following_block;
+
+	/*
+	 * This flag will be set, when the leader finds out this block can be
+	 * read safely by the worker. This helps the worker to start processing
+	 * the chunk early where the chunk will be spread across many blocks and
+	 * the worker need not wait for the complete chunk to be processed.
+	 */
+	bool curr_blk_completed;
+	char data[DATA_BLOCK_SIZE + 1];	/* data read from file */
+}CopyDataBlock;

What's the + 1 here about?

Fixed this, removed +1. That is not needed.
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+	StringInfoData line_buf;
+	uint64 cur_lineno;	/* line number for error messages */
+}ParallelCopyLineBuf;
Why do we need separate infrastructure for this? We shouldn't duplicate
infrastructure unnecessarily.

This was required for copying the multiple records locally and
releasing the shared memory. I have not changed this; will decide on
this based on the decision taken for one of the previous comments.

+/*
+ * Common information that needs to be copied to shared memory.
+ */
+typedef struct CopyWorkerCommonData
+{

Why is parallel specific stuff here suddenly not named ParallelCopy*
anymore? If you introduce a naming like that it imo should be used
consistently.

Fixed, changed to maintain ParallelCopy in all structs.
+	/* low-level state data */
+	CopyDest	copy_dest;		/* type of copy source/destination */
+	int			file_encoding;	/* file or remote side's character encoding */
+	bool		need_transcoding;	/* file encoding diff from server? */
+	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+
+	/* parameters from the COPY command */
+	bool		csv_mode;		/* Comma Separated Value format? */
+	bool		header_line;	/* CSV header line? */
+	int			null_print_len; /* length of same */
+	bool		force_quote_all;	/* FORCE_QUOTE *? */
+	bool		convert_selectively;	/* do selective binary conversion? */
+
+	/* Working state for COPY FROM */
+	AttrNumber	num_defaults;
+	Oid			relid;
+}CopyWorkerCommonData;

But I actually think we shouldn't have this information in two different
structs. This should exist once, independent of using parallel /
non-parallel copy.

This structure helps in storing the common data from CopyStateData
that is required by the workers. This information will then be
allocated and stored into the DSM for the worker to retrieve and copy
it to CopyStateData.

+/* List information */
+typedef struct ListInfo
+{
+	int		count;	/* count of attributes */
+
+	/* string info in the form info followed by info1, info2... infon */
+	char	info[1];
+} ListInfo;

Based on these comments I have no idea what this could be for.
Have added better comments for this. The following is added: This
structure will help in converting a List data type into the below
structure format with the count having the number of elements in the
list and the info having the List elements appended contiguously. This
converted structure will be allocated in shared memory and stored in
DSM for the worker to retrieve and later convert it back to List data
type.

 /*
- * This keeps the character read at the top of the loop in the buffer
- * even if there is more than one read-ahead.
+ * This keeps the character read at the top of the loop in the buffer
+ * even if there is more than one read-ahead.
+ */
+#define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
+if (1) \
+{ \
+	if (copy_buff_state.raw_buf_ptr + (extralen) >= copy_buff_state.copy_buf_len && !hit_eof) \
+	{ \
+		if (IsParallelCopy()) \
+		{ \
+			copy_buff_state.chunk_size = prev_chunk_size; /* update previous chunk size */ \
+			if (copy_buff_state.block_switched) \
+			{ \
+				pg_atomic_sub_fetch_u32(&copy_buff_state.data_blk_ptr->unprocessed_chunk_parts, 1); \
+				copy_buff_state.copy_buf_len = prev_copy_buf_len; \
+			} \
+		} \
+		copy_buff_state.raw_buf_ptr = prev_raw_ptr;	/* undo fetch */ \
+		need_data = true; \
+		continue; \
+	} \
+} else ((void) 0)

I think it's an absolutely clear no-go to add new branches to
these. They're *really* hot already, and this is going to sprinkle a
significant amount of new instructions over a lot of places.

Fixed, removed this.
+/*
+ * SET_RAWBUF_FOR_LOAD - Set raw_buf to the shared memory where the file data
+ * must be read.
+ */
+#define SET_RAWBUF_FOR_LOAD() \
+{ \
+	ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+	uint32 cur_block_pos; \
+	/* \
+	 * Mark the previous block as completed, worker can start copying this data. \
+	 */ \
+	if (copy_buff_state.data_blk_ptr != copy_buff_state.curr_data_blk_ptr && \
+		copy_buff_state.data_blk_ptr->curr_blk_completed == false) \
+		copy_buff_state.data_blk_ptr->curr_blk_completed = true; \
+	\
+	copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+	cur_block_pos = WaitGetFreeCopyBlock(pcshared_info); \
+	copy_buff_state.curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos]; \
+	\
+	if (!copy_buff_state.data_blk_ptr) \
+	{ \
+		copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+		chunk_first_block = cur_block_pos; \
+	} \
+	else if (need_data == false) \
+		copy_buff_state.data_blk_ptr->following_block = cur_block_pos; \
+	\
+	cstate->raw_buf = copy_buff_state.curr_data_blk_ptr->data; \
+	copy_buff_state.copy_raw_buf = cstate->raw_buf; \
+}
+
+/*
+ * END_CHUNK_PARALLEL_COPY - Update the chunk information in shared memory.
+ */
+#define END_CHUNK_PARALLEL_COPY() \
+{ \
+	if (!IsHeaderLine()) \
+	{ \
+		ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+		ChunkBoundaries *chunkBoundaryPtr = &pcshared_info->chunk_boundaries; \
+		if (copy_buff_state.chunk_size) \
+		{ \
+			ChunkBoundary *chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+			/* \
+			 * If raw_buf_ptr is zero, unprocessed_chunk_parts would have been \
+			 * incremented in SEEK_COPY_BUFF_POS. This will happen if the whole \
+			 * chunk finishes at the end of the current block. If the \
+			 * new_line_size > raw_buf_ptr, then the new block has only new line \
+			 * char content. The unprocessed count should not be increased in \
+			 * this case. \
+			 */ \
+			if (copy_buff_state.raw_buf_ptr != 0 && \
+				copy_buff_state.raw_buf_ptr > new_line_size) \
+				pg_atomic_add_fetch_u32(&copy_buff_state.curr_data_blk_ptr->unprocessed_chunk_parts, 1); \
+			\
+			/* Update chunk size. */ \
+			pg_atomic_write_u32(&chunkInfo->chunk_size, copy_buff_state.chunk_size); \
+			pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_LEADER_POPULATED); \
+			elog(DEBUG1, "[Leader] After adding - chunk position:%d, chunk_size:%d", \
+				 chunk_pos, copy_buff_state.chunk_size); \
+			pcshared_info->populated++; \
+		} \
+		else if (new_line_size) \
+		{ \
+			/* \
+			 * This means only new line char, empty record should be \
+			 * inserted. \
+			 */ \
+			ChunkBoundary *chunkInfo; \
+			chunk_pos = UpdateBlockInChunkInfo(cstate, -1, -1, 0, \
+											   CHUNK_LEADER_POPULATED); \
+			chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+			elog(DEBUG1, "[Leader] Added empty chunk with offset:%d, chunk position:%d, chunk size:%d", \
+				 chunkInfo->start_offset, chunk_pos, \
+				 pg_atomic_read_u32(&chunkInfo->chunk_size)); \
+			pcshared_info->populated++; \
+		} \
+	} \
+	\
+	/* \
+	 * All of the read data is processed, reset index & len. In the \
+	 * subsequent read, we will get a new block and copy data in to the \
+	 * new block. \
+	 */ \
+	if (copy_buff_state.raw_buf_ptr == copy_buff_state.copy_buf_len) \
+	{ \
+		cstate->raw_buf_index = 0; \
+		cstate->raw_buf_len = 0; \
+	} \
+	else \
+		cstate->raw_buf_len = copy_buff_state.copy_buf_len; \
+}

Why are these macros? They are way way way above a length where that
makes any sort of sense.

Converted these macros to functions.
Vignesh
EnterpriseDB: http://www.enterprisedb.com
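As a footnote to the 50-record caching discussed above, here is a minimal sketch of the worker-side loop implied by 0002's ParallelCopyData fields; the two helpers marked hypothetical stand in for the chunk-claiming and data-block-reading logic in the patch.

/*
 * Minimal sketch, not the patch's code: cache up to WORKER_CHUNK_COUNT
 * leader-populated chunks into local line buffers, releasing the ring
 * entries so the leader can refill them while the cached lines are parsed
 * and inserted.
 */
static void
WorkerCacheChunksSketch(CopyState cstate)
{
	ParallelCopyData *pcdata = cstate->pcdata;
	uint32		count;

	pcdata->worker_line_buf_count = 0;
	pcdata->worker_line_buf_pos = 0;

	for (count = 0; count < WORKER_CHUNK_COUNT; count++)
	{
		ParallelCopyChunkBoundary *chunkInfo =
			ClaimNextPopulatedChunk(pcdata->pcshared_info);	/* hypothetical */

		if (chunkInfo == NULL)
			break;				/* leader finished and the ring is drained */

		/* Copy the chunk's bytes out of the shared data blocks. */
		CopyChunkToLocalBuf(cstate,
							&pcdata->worker_line_buf[count].line_buf,
							chunkInfo);						/* hypothetical */
		pcdata->worker_line_buf[count].cur_lineno = chunkInfo->cur_lineno;

		/* Release the entry so the leader can reuse it immediately. */
		pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_WORKER_PROCESSED);
		pcdata->worker_line_buf_count++;
	}
}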
On Fri, Jun 12, 2020 at 4:57 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi All,
I've spent little bit of time going through the project discussion that has happened in this email thread and to start with I have few questions which I would like to put here:
Q1) Are we also planning to read the input data in parallel or is it only about performing the multi-insert operation in parallel? AFAIU, the data reading part will be done by the leader process alone so no parallelism is involved there.
Yes, your understanding is correct.
Q2) How are we going to deal with the partitioned tables?
I haven't studied the patch but my understanding is that we will
support parallel copy for partitioned tables with a few restrictions
as explained in my earlier email [1]. See Case-2 (b) in that email.
I mean will there be some worker process dedicated for each partition or how is it?
No, the split is just based on the input; otherwise, each worker
should insert just as we would have done without any workers.
Q3) Incase of toast tables, there is a possibility of having a single tuple in the input file which could be of a very big size (probably in GB) eventually resulting in a bigger file size. So, in this case, how are we going to decide the number of worker processes to be launched. I mean, although the file size is big, but the number of tuples to be processed is just one or few of them, so, can we decide the number of the worker processes to be launched based on the file size?
Yeah, such situations would be tricky, so we should have an option for
the user to specify the number of workers.
Q4) Who is going to process constraints (preferably the deferred constraint) that is supposed to be executed at the COMMIT time? I mean is it the leader process or the worker process or in such cases we won't be choosing the parallelism at all?
In the first version, we won't do parallelism for this. Again, see
one of my earlier emails [1] where I have explained this and other
cases where we won't be supporting parallel copy.
Q5) Do we have any risk of table bloating when the data is loaded in parallel. I am just asking this because incase of parallelism there would be multiple processes performing bulk insert into a table. There is a chance that the table file might get extended even if there is some space into the file being written into, but that space is locked by some other worker process and hence that might result in a creation of a new block for that table. Sorry, if I am missing something here.
Hmm, each worker will operate at the page level; after the first insertion,
the same worker will try to insert into the same page into which it
inserted last, so there shouldn't be such a problem.
Please note that I haven't gone through all the emails in this thread so there is a possibility that I might have repeated the question that has already been raised and answered here. If that is the case, I am sorry for that, but it would be very helpful if someone could point out that thread so that I can go through it. Thank you.
No problem, I understand it is sometimes difficult to go through each
and every email, especially when the discussion is long. Anyway,
thanks for showing interest in the patch.
[1]: /messages/by-id/CAA4eK1+ANNEaMJCCXm4naweP5PLY6LhJMvGo_V7-Pnfbh6GsOA@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi,
Attached is the patch supporting parallel copy for binary format files.
The performance improvement achieved with different numbers of workers is
shown below. The dataset used has 10 million tuples and is 5.3GB in size.
Test cases (exec time in sec; * marks the best time for each test case):
test case 1: copy from binary file, 2 indexes on integer columns and 1 index on text column
test case 2: copy from binary file, 1 gist index on text column
test case 3: copy from binary file, 3 indexes on integer columns

parallel workers | test case 1       | test case 2       | test case 3
-----------------+-------------------+-------------------+-----------------
 0               | 1106.899 (1X)     |  772.758 (1X)     | 171.338 (1X)
 1               | 1094.165 (1.01X)  |  757.365 (1.02X)  | 163.018 (1.05X)
 2               |  618.397 (1.79X)  |  428.304 (1.8X)   | 117.508 (1.46X)
 4               |  320.511 (3.45X)  |  231.938 (3.33X)  |  80.297 (2.13X)
 8               |  172.462 (6.42X)  |  150.212 (5.14X)  | *71.518 (2.39X)*
 16              |  110.460 (10.02X) | *124.929 (6.18X)* |  91.308 (1.88X)
 20              | *98.470 (11.24X)* |  137.313 (5.63X)  |  95.289 (1.79X)
 30              |  109.229 (10.13X) |  173.54 (4.45X)   |  95.799 (1.78X)
Design followed for developing this patch:
Leader reads data from the file into DSM data blocks, each of 64K size.
It also identifies, for each tuple, its data block id, start offset, end
offset and tuple size, and updates this information in the ring data
structure. Workers read the tuple information from the ring data structure
and the actual tuple data from the data blocks in parallel, and insert the
tuples into the table in parallel.
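For quick reference, here is a condensed sketch of the shared structures
this design relies on; the field names follow the csv/text framework patch
(the binary patch uses the analogous chunk structures), and the complete
definitions are in the attached patches:

typedef struct ParallelCopyDataBlock
{
	pg_atomic_uint32 unprocessed_line_parts; /* tuples still referencing this block */
	uint32		following_block;	/* continuation block, if a tuple spills over */
	char		data[DATA_BLOCK_SIZE];	/* 64K of raw input read by the leader */
} ParallelCopyDataBlock;

typedef struct ParallelCopyLineBoundary
{
	uint32		first_block;	/* index into the data block array */
	uint32		start_offset;	/* tuple start within that block */
	pg_atomic_uint32 line_size;	/* -1 until the leader fills it in */
	pg_atomic_uint32 line_state;	/* LINE_LEADER_POPULATED, etc. */
} ParallelCopyLineBoundary;

A worker claims a ring entry with an atomic compare-and-exchange, so
exactly one worker processes each tuple:

uint32		expected = LINE_LEADER_POPULATED;

if (pg_atomic_compare_exchange_u32(&entry->line_state, &expected,
								   LINE_WORKER_PROCESSING))
{
	/* This worker now owns the tuple at (first_block, start_offset). */
}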
Please note that this patch can be applied on the series of patches that
were posted previously [1] for parallel copy for csv/text files.
The correct order to apply all the patches is -
0001-Copy-code-readjustment-to-support-parallel-copy.patch
0002-Framework-for-leader-worker-in-parallel-copy.patch
0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
0004-Documentation-for-parallel-copy.patch
and
0005-Parallel-Copy-For-Binary-Format-Files.patch
The above tests were run with the attached configuration (config.txt), which
is the same one used for the performance tests of csv/text files posted
earlier in this mail chain.
Request the community to take this patch up for review along with the
parallel copy for csv/text file patches and provide feedback.
[1]: /messages/by-id/CALDaNm3uyHpD9sKoFtB0EnMO8DLuD6H9pReFm=tm=9ccEWuUVQ@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0005-Parallel-Copy-For-Binary-Format-Files.patch
From 5f3f9c365e5ba75a293f9685247a1a6c19762c51 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 15 Jun 2020 15:41:06 +0530
Subject: [PATCH v3] Parallel Copy For Binary Format Files
Leader reads data from the file into DSM data blocks, each of 64K size.
It also identifies, for each tuple, its data block id, start offset, end
offset and tuple size, and updates this information in the ring data
structure. Workers read the tuple information from the ring data structure
and the actual tuple data from the data blocks in parallel, and insert the
tuples into the table in parallel.
---
src/backend/commands/copy.c | 667 ++++++++++++++++++++++++++++++++----
1 file changed, 599 insertions(+), 68 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b1e2e71a7c..5b9508d27b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -231,6 +231,16 @@ typedef struct ParallelCopyData
uint32 worker_line_buf_pos;
}ParallelCopyData;
+/*
+ * Tuple boundary information used for parallel copy
+ * for binary format files.
+ */
+typedef struct ParallelCopyTupleInfo
+{
+ uint32 offset;
+ uint32 block_id;
+}ParallelCopyTupleInfo;
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -361,6 +371,16 @@ typedef struct CopyStateData
int nworkers;
bool is_parallel;
ParallelCopyData *pcdata;
+
+ /*
+ * Parallel copy for binary formatted files
+ */
+ ParallelCopyDataBlock *curr_data_block;
+ ParallelCopyDataBlock *prev_data_block;
+ uint32 curr_data_offset;
+ uint32 curr_block_pos;
+ ParallelCopyTupleInfo curr_tuple_start_info;
+ ParallelCopyTupleInfo curr_tuple_end_info;
} CopyStateData;
/*
@@ -386,6 +406,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -741,6 +762,14 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetChunkPosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static pg_attribute_always_inline bool CopyReadBinaryTupleLeader(CopyState cstate);
+static pg_attribute_always_inline void CopyReadBinaryAttributeLeader(CopyState cstate,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod,
+ uint32 *new_block_pos);
+static pg_attribute_always_inline bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static pg_attribute_always_inline Datum CopyReadBinaryAttributeWorker(CopyState cstate, int column_no,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod, bool *isnull);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -759,6 +788,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -977,8 +1007,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for freeze & binary option. */
- if (cstate->freeze || cstate->binary)
+ /* Parallel copy not allowed for freeze. */
+ if (cstate->freeze)
return false;
/* Check if copy is into foreign table. */
@@ -1270,6 +1300,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateAttributes(cstate, tup_desc, attnamelist);
@@ -1302,6 +1333,15 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->max_fields = attr_count;
cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
}
+
+ cstate->curr_data_block = NULL;
+ cstate->prev_data_block = NULL;
+ cstate->curr_data_offset = 0;
+ cstate->curr_block_pos = 0;
+ cstate->curr_tuple_start_info.block_id = -1;
+ cstate->curr_tuple_start_info.offset = -1;
+ cstate->curr_tuple_end_info.block_id = -1;
+ cstate->curr_tuple_end_info.offset = -1;
}
/*
@@ -1650,38 +1690,515 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
{
- bool done;
- cstate->cur_lineno++;
+ cstate->curr_data_block = NULL;
+ cstate->prev_data_block = NULL;
+ cstate->curr_data_offset = 0;
+ cstate->curr_block_pos = 0;
+ cstate->curr_tuple_start_info.block_id = -1;
+ cstate->curr_tuple_start_info.offset = -1;
+ cstate->curr_tuple_end_info.block_id = -1;
+ cstate->curr_tuple_end_info.offset = -1;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- break;
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from binary formatted file
+ * to data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data blocks' contents.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ int readbytes = 0;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+ int16 fld_count;
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ uint32 new_block_pos;
+ uint32 chunk_size;
+
+ if (cstate->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->curr_block_pos = block_pos;
+
+ cstate->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->curr_data_offset = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if (cstate->curr_data_offset + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ uint8 movebytes = 0;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ movebytes = DATA_BLOCK_SIZE - cstate->curr_data_offset;
+
+ cstate->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->curr_data_block->data[cstate->curr_data_offset],
+ movebytes);
+
+ elog(DEBUG1, "LEADER - field count is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, cstate->curr_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, (DATA_BLOCK_SIZE - movebytes));
+
+ elog(DEBUG1, "LEADER - bytes read from file after field count is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ cstate->curr_data_block = data_block;
+ cstate->curr_data_offset = 0;
+ cstate->curr_block_pos = block_pos;
+ }
+
+ memcpy(&fld_count, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_count));
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ if (fld_count == -1)
+ {
+ /*
+ * Received EOF marker. In a V3-protocol copy, wait for the
+ * protocol-level EOF, and complain if it doesn't come
+ * immediately. This ensures that we correctly handle CopyFail,
+ * if client chooses to send that now.
+ *
+ * Note that we MUST NOT try to read more data in an old-protocol
+ * copy, since there is no protocol-level EOF marker then. We
+ * could go either way for copy from file, but choose to throw
+ * error if there's data after the EOF marker, for consistency
+ * with the new-protocol case.
+ */
+ char dummy;
+
+ if (cstate->copy_dest != COPY_OLD_FE &&
+ CopyGetData(cstate, &dummy, 1, 1) > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("received copy data after EOF marker")));
+ return true;
+ }
+
+ if (fld_count != attr_count)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ cstate->curr_tuple_start_info.block_id = cstate->curr_block_pos;
+ cstate->curr_tuple_start_info.offset = cstate->curr_data_offset;
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_count);
+ new_block_pos = cstate->curr_block_pos;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ CopyReadBinaryAttributeLeader(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &new_block_pos);
+ }
+
+ cstate->curr_tuple_end_info.block_id = new_block_pos;
+ cstate->curr_tuple_end_info.offset = cstate->curr_data_offset - 1;
+
+ if (cstate->curr_tuple_start_info.block_id == cstate->curr_tuple_end_info.block_id)
+ {
+ elog(DEBUG1,"LEADER - tuple lies in a single data block");
+
+ chunk_size = cstate->curr_tuple_end_info.offset - cstate->curr_tuple_start_info.offset + 1;
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].unprocessed_chunk_parts, 1);
+ }
+ else
+ {
+ uint32 following_block_id = pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].following_block;
+
+ elog(DEBUG1,"LEADER - tuple is spread across data blocks");
+
+ chunk_size = DATA_BLOCK_SIZE - cstate->curr_tuple_start_info.offset-
+ pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].unprocessed_chunk_parts, 1);
+
+ while (following_block_id != cstate->curr_tuple_end_info.block_id)
+ {
+ chunk_size = chunk_size + DATA_BLOCK_SIZE - pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_chunk_parts, 1);
+
+ following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_chunk_parts, 1);
+
+ chunk_size = chunk_size + cstate->curr_tuple_end_info.offset + 1;
+ }
+
+ if (chunk_size > 0)
+ {
+ int chunk_pos = UpdateBlockInChunkInfo(cstate,
+ cstate->curr_tuple_start_info.block_id,
+ cstate->curr_tuple_start_info.offset,
+ chunk_size,
+ CHUNK_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, chunk size:%u chunk position:%d",
+ cstate->curr_tuple_start_info.block_id,
+ cstate->curr_tuple_start_info.offset,
+ chunk_size, chunk_pos);
+ }
+
+ cstate->curr_tuple_start_info.block_id = -1;
+ cstate->curr_tuple_start_info.offset = -1;
+ cstate->curr_tuple_end_info.block_id = -1;
+ cstate->curr_tuple_end_info.offset = -1;
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeLeader - leader identifies boundaries/offsets
+ * for each attribute/column, it moves on to next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos)
+{
+ int32 fld_size;
+ int readbytes;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+
+ if ((cstate->curr_data_offset + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ uint8 movebytes = DATA_BLOCK_SIZE - cstate->curr_data_offset;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->curr_data_block->data[cstate->curr_data_offset], movebytes);
+
+ elog(DEBUG1, "LEADER - field size is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, cstate->curr_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, DATA_BLOCK_SIZE-movebytes);
+
+ elog(DEBUG1, "LEADER - bytes read from file after field size is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ cstate->prev_data_block = cstate->curr_data_block;
+ cstate->prev_data_block->following_block = block_pos;
+ cstate->curr_data_block = data_block;
+ cstate->curr_block_pos = block_pos;
+
+ if (cstate->prev_data_block->curr_blk_completed == false)
+ cstate->prev_data_block->curr_blk_completed = true;
+
+ cstate->curr_data_offset = 0;
+ *new_block_pos = block_pos;
+ }
+
+ memcpy(&fld_size, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_size));
+
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_size);
+
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ if ((DATA_BLOCK_SIZE-cstate->curr_data_offset) >= fld_size)
+ {
+ cstate->curr_data_offset = cstate->curr_data_offset + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ readbytes = CopyGetData(cstate, &data_block->data[0], 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file after detecting that tuple is spread across data blocks %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+
+ cstate->prev_data_block = cstate->curr_data_block;
+ cstate->prev_data_block->following_block = block_pos;
+
+ if (cstate->prev_data_block->curr_blk_completed == false)
+ cstate->prev_data_block->curr_blk_completed = true;
+
+ cstate->curr_data_block = data_block;
+ cstate->curr_data_offset = fld_size - (DATA_BLOCK_SIZE - cstate->curr_data_offset);
+ cstate->curr_block_pos = block_pos;
+ *new_block_pos = block_pos;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 chunk_pos;
+ ParallelCopyChunkBoundary *chunk_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ int i;
+
+ chunk_pos = GetChunkPosition(cstate);
+
+ if (chunk_pos == -1)
+ return true;
+
+ chunk_info = &pcshared_info->chunk_boundaries.ring[chunk_pos];
+ cstate->curr_data_block = &pcshared_info->data_blocks[chunk_info->first_block];
+ cstate->curr_data_offset = chunk_info->start_offset;
+
+ if (cstate->curr_data_offset + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ {
+ /*
+ * The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+
+ memcpy(&fld_count, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ if (fld_count == -1)
+ {
+ return true;
+ }
+
+ if (fld_count != attr_count)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_count);
+ i = 0;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ i++;
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ i,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->curr_data_block->unprocessed_chunk_parts, 1);
+ chunk_info->start_offset = -1;
+ pg_atomic_write_u32(&chunk_info->chunk_state, CHUNK_WORKER_PROCESSED);
+ pg_atomic_write_u32(&chunk_info->chunk_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads and converts the
+ * attribute/column data, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, int column_no,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->curr_data_offset + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+
+ cstate->curr_data_block = &pcshared_info->data_blocks[cstate->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_chunk_parts, 1);
+ cstate->curr_data_offset = 0;
+ }
+
+ memcpy(&fld_size, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE-cstate->curr_data_offset) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+
+ memcpy(&cstate->attribute_buf.data[0],&cstate->curr_data_block->data[cstate->curr_data_offset], fld_size);
+ cstate->curr_data_offset = cstate->curr_data_offset + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->curr_data_block;
+
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+
+ memcpy(&cstate->attribute_buf.data[0], &cstate->curr_data_block->data[cstate->curr_data_offset],
+ (DATA_BLOCK_SIZE - cstate->curr_data_offset));
+ cstate->curr_data_block = &pcshared_info->data_blocks[cstate->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_chunk_parts, 1);
+ memcpy(&cstate->attribute_buf.data[DATA_BLOCK_SIZE - cstate->curr_data_offset],
+ &cstate->curr_data_block->data[0], (fld_size - (DATA_BLOCK_SIZE - cstate->curr_data_offset)));
+ cstate->curr_data_offset = fld_size - (DATA_BLOCK_SIZE - cstate->curr_data_offset);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
/*
* GetChunkPosition - return the chunk position that worker should process.
*/
@@ -5402,63 +5919,77 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
+ if (!IsParallelCopy())
+ {
+ int16 fld_count;
+ ListCell *cur;
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- if (!CopyGetInt16(cstate, &fld_count))
- {
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ if (fld_count == -1)
+ {
+ /*
+ * Received EOF marker. In a V3-protocol copy, wait for the
+ * protocol-level EOF, and complain if it doesn't come
+ * immediately. This ensures that we correctly handle CopyFail,
+ * if client chooses to send that now.
+ *
+ * Note that we MUST NOT try to read more data in an old-protocol
+ * copy, since there is no protocol-level EOF marker then. We
+ * could go either way for copy from file, but choose to throw
+ * error if there's data after the EOF marker, for consistency
+ * with the new-protocol case.
+ */
+ char dummy;
+
+ if (cstate->copy_dest != COPY_OLD_FE &&
+ CopyGetData(cstate, &dummy, 1, 1) > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("received copy data after EOF marker")));
+ return false;
+ }
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
+ if (fld_count != attr_count)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ i = 0;
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ i++;
+ values[m] = CopyReadBinaryAttribute(cstate,
+ i,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
}
+ else
+ {
+ bool eof = false;
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ cstate->cur_lineno++;
- i = 0;
- foreach(cur, cstate->attnumlist)
- {
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
- cstate->cur_attname = NameStr(att->attname);
- i++;
- values[m] = CopyReadBinaryAttribute(cstate,
- i,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ if (eof)
+ return false;
}
}
--
2.25.1
Thanks Amit for the clarifications. Regarding partitioned tables, one of
the questions was - if we are loading data into a partitioned table using
the COPY command, then the input file would contain tuples for different
tables (partitions), unlike the normal table case where all the tuples in
the input file would belong to the same table. So, in such a case, how are
we going to accumulate tuples into the DSM? I mean, will the leader process
check which tuple needs to be routed to which partition and accumulate them
into the DSM accordingly? For e.g., let's say in the input data file we
have 10 tuples where the 1st tuple belongs to partition1, the 2nd belongs
to partition2, and likewise. So, in such cases, will the leader process
accumulate all the tuples belonging to partition1 into one DSM and the
tuples belonging to partition2 into some other DSM and assign them to the
worker processes, or have we taken some other approach to handle this
scenario?
Further, I haven't got much time to look into the links that you have
shared in your previous response. Will have a look into those and will also
slowly start looking into the patches as and when I get some time. Thank
you.
--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
On Mon, Jun 15, 2020 at 7:41 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Thanks Amit for the clarifications. Regarding partitioned table, one of the question was - if we are loading data into a partitioned table using COPY command, then the input file would contain tuples for different tables (partitions) unlike the normal table case where all the tuples in the input file would belong to the same table. So, in such a case, how are we going to accumulate tuples into the DSM? I mean will the leader process check which tuple needs to be routed to which partition and accordingly accumulate them into the DSM. For e.g. let's say in the input data file we have 10 tuples where the 1st tuple belongs to partition1, 2nd belongs to partition2 and likewise. So, in such cases, will the leader process accumulate all the tuples belonging to partition1 into one DSM and tuples belonging to partition2 into some other DSM and assign them to the worker process or we have taken some other approach to handle this scenario?
No, all the tuples (for all partitions) will be accumulated in a
single DSM, and the workers/leader will route each tuple to the
appropriate partition.
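To sketch what that routing could look like on the insert side (an
illustration, not the patch's code): each tuple pulled from the shared
ring is routed to its leaf partition individually via the existing
ExecFindPartition() API. NextTupleFromSharedRing() is a hypothetical
helper, and mtstate, rootResultRelInfo, proute and estate are assumed to
be set up as in a regular COPY into a partitioned table.

while ((slot = NextTupleFromSharedRing(cstate)) != NULL)
{
	ResultRelInfo *partRelInfo;

	/* Find the leaf partition this tuple belongs to. */
	partRelInfo = ExecFindPartition(mtstate, rootResultRelInfo,
									proute, slot, estate);

	/* ... convert the slot to the partition's rowtype and insert ... */
}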
Further, I haven't got much time to look into the links that you have shared in your previous response. Will have a look into those and will also slowly start looking into the patches as and when I get some time. Thank you.
Yeah, it will be good if you go through all the emails once because
most of the decisions (and the design) in the patch are supposed to be
based on the discussion in this thread.
Note - Please don't top post, try to give inline replies.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Hi,
I have included tests for the parallel copy feature, and a few bugs that
were identified during testing have been fixed. Attached are the patches
for the same.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0002-Framework-for-leader-worker-in-parallel-copy.patch
From df3439e3292ff04a471274a3e5f385c7773e1916 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 17 Jun 2020 07:23:14 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy.
This patch has the framework for the data structures used in parallel copy,
leader initialization, worker initialization, shared memory updates, starting
workers, waiting for workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 831 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 847 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690..09e7a19 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f2310ab..9977aa6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,138 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* RINGSIZE should be a multiple of this */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block, following_block
+ * will have the position from which the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag will be set when the leader finds that this block can be read
+ * safely by a worker. This helps a worker start processing a line early
+ * when the line is spread across many blocks, so the worker need not wait
+ * for the complete line to be read.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * this is protected by the following sequence in the leader & worker.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32; a worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled completely,
+ * 0 means empty line, >0 means line filled with line size data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by workers (some records will be filtered out
+ * based on the WHERE condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -225,8 +354,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure helps in storing the common data from CopyStateData that are
+ * required by the workers. This information will then be allocated and stored
+ * into the DSM for the worker to retrieve and copy it to CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure will help in converting a List data type into the below
+ * structure format with the count having the number of elements in the list and
+ * the info having the List elements appended contiguously. This converted
+ * structure will be allocated in shared memory and stored in DSM for the worker
+ * to retrieve and later convert it back to List data type.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +443,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -477,10 +681,596 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateAttributes(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
/*
+ * CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ *
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ {
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "DSM segments not available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ /* Fetch the xid of the copy statement that the workers will use. */
+ full_transaction_id = GetCurrentFullTransactionId();
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateAttributes(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * Handle the where clause, convert the line to columns, and add default/null
+ * values for columns missing in the record. Find the partition if the table
+ * is partitioned, invoke before row insert triggers, handle constraints and
+ * insert the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it with the workers. It
+ * executes the before statement trigger, if one is present, and then reads
+ * the input file into data blocks, loading blocks as and when required. The
+ * leader traverses each data block to identify line breaks; each identified
+ * line is recorded in a ParallelCopyLineBoundary entry. If no free entry is
+ * available, the leader waits until a worker frees one up. Workers pick up
+ * the populated line information and insert the lines into the table. This
+ * is repeated until the complete file is processed, after which the leader
+ * waits for the workers to process all populated lines, and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -1150,6 +1940,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1159,7 +1950,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1208,6 +2016,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1376,6 +2185,26 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+ cstate->nworkers = atoi(defGetString(defel));
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c65a552..8a79794 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1698,6 +1698,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
1.8.3.1
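As a side note on the shared-memory plumbing above: the InsertStringShm()/
RetrieveSharedString() pairing boils down to the pattern sketched below. This
is not part of the patch; the key value and function names are hypothetical,
and the TOC space is assumed to have been accounted for with
shm_toc_estimate_chunk()/shm_toc_estimate_keys() during estimation. Only
shm_toc_allocate/shm_toc_insert/shm_toc_lookup are the real APIs.

#include "postgres.h"
#include "access/parallel.h"
#include "storage/shm_toc.h"

#define EXAMPLE_TOC_KEY		42		/* hypothetical key */

/* Leader side: allocate TOC space, copy the string, publish the key. */
static void
example_store_string(ParallelContext *pcxt, const char *str)
{
	char	   *shared = (char *) shm_toc_allocate(pcxt->toc, strlen(str) + 1);

	strcpy(shared, str);
	shm_toc_insert(pcxt->toc, EXAMPLE_TOC_KEY, shared);
}

/* Worker side: noError = true, since the leader may have skipped the key. */
static char *
example_fetch_string(shm_toc *toc)
{
	char	   *shared = (char *) shm_toc_lookup(toc, EXAMPLE_TOC_KEY, true);

	return shared ? pstrdup(shared) : NULL;
}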
Attachment: 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch (text/x-patch; charset=US-ASCII)
From 1f0b6d47ffe5ef7d1c08edbfb62c527e122494ab Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 17 Jun 2020 07:31:30 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command, with which the user can specify the number of workers to be used
to perform the COPY FROM command. Specifying zero workers disables
parallelism.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching
at most n workers, as specified with the PARALLEL 'n' option in the
"COPY FROM" query. The leader populates the common data required for the
workers' execution in the DSM and shares it with the workers. The leader
then executes before statement triggers, if any exist. The leader populates
the DSM lines, each of which holds a start offset and line size; while
populating the lines it reads as many blocks as required from the file into
the DSM data blocks. Each block is 64K in size. The leader parses the data
to identify a line, reusing (with some changes) the existing logic from
CopyReadLineText. The leader checks whether a free line entry is available
to copy the information into; if there is none, it waits till the required
entry is freed up by a worker, and then copies the identified line's
information (offset & line size) into the DSM lines. This process is
repeated till the complete file is processed. Simultaneously, the workers
cache the lines (50 at a time) into local memory and release the entries to
the leader for further populating. Each worker processes the lines it
cached and inserts them into the table. The leader does not participate in
the insertion of data; the leader's only responsibility is to identify the
lines as fast as possible for the workers to do the actual copy operation.
The leader waits till all the populated lines are processed by the workers
and then exits.
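To make the ring handshake described above concrete, here is a reduced
sketch (not the patch code): the slot is cut down to the two atomics that
carry the handshake, the busy-wait stands in for the WaitLatch-based
COPY_WAIT_TO_PROCESS macro, and the line-state values are the
ParallelCopyLineState constants the patch introduces.

#include "postgres.h"
#include "port/atomics.h"

typedef struct ExampleLineSlot
{
	pg_atomic_uint32 line_size;		/* -1 = free, 0 = empty line, >0 = bytes */
	pg_atomic_uint32 line_state;	/* ParallelCopyLineState values */
} ExampleLineSlot;

/* Leader: wait until the slot has been recycled, then publish a line. */
static void
example_leader_publish(ExampleLineSlot *slot, uint32 size)
{
	while (pg_atomic_read_u32(&slot->line_size) != (uint32) -1)
		;						/* the patch waits on the latch here */

	pg_atomic_write_u32(&slot->line_size, size);
	pg_atomic_write_u32(&slot->line_state, LINE_LEADER_POPULATED);
}

/* Worker: claim the slot; the CAS guarantees single ownership. */
static bool
example_worker_claim(ExampleLineSlot *slot)
{
	uint32		expected = LINE_LEADER_POPULATED;

	return pg_atomic_compare_exchange_u32(&slot->line_state, &expected,
										  LINE_WORKER_PROCESSING);
}

A worker that loses the race sees 'expected' overwritten with the current
state and simply moves on to the next slot, which is what GetLinePosition()
does in the patch.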
---
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 13 +
src/backend/commands/copy.c | 896 ++++++++++++++++++++++++++++++++++++--
src/include/access/xact.h | 1 +
src/tools/pgindent/typedefs.list | 1 +
5 files changed, 873 insertions(+), 51 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 537913d..28f3a98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index cd30b62..d43902c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,19 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, all the workers must use the same transaction id.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9977aa6..f19e991 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+}ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -538,9 +553,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -553,13 +572,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || \
+ (IsParallelCopy() && !IsLeader()))) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -618,22 +664,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, line_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -682,8 +744,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckCopyFromValidity(CopyState cstate);
static void PopulateAttributes(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -837,6 +903,137 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check if the trigger functions are parallel
+ * safe. Return false if any one of the triggers has a parallel unsafe
+ * function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine the parallel safety of volatile
+ * expressions in column default clauses or in the where clause, and return
+ * true if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any column has a volatile default expression; if so, and it is
+ * not parallel safe, then parallelism is not allowed. For instance, if
+ * there are any serial/bigserial columns, the associated nextval() default
+ * expression is parallel unsafe, so parallelism should not be allowed.
+ * Non-parallel copy does not perform this volatile-function check for
+ * nextval().
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine the insert method: single, multi or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -866,6 +1063,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckCopyFromValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -879,6 +1078,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1109,7 +1321,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * The wait loop at the bottom may have exited because
+ * data_blk_ptr->curr_blk_completed was set while the dataSize it read
+ * was still stale. If curr_blk_completed is set and the line is also
+ * complete, line_size will have been set, so read line_size again to
+ * be sure whether the block is complete or partial.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the remaining data is present in the current block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole of the
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: only the first copy starts at the saved offset;
+ * each subsequent copy takes the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is present in the current block, lineInfo.line_size
+ * will be updated. If the data is spread across blocks, either
+ * lineInfo.line_size or data_blk_ptr->curr_blk_completed can be
+ * updated: lineInfo.line_size is updated once the complete read is
+ * finished, while data_blk_ptr->curr_blk_completed is updated when
+ * the current block has been processed but the data as a whole is
+ * not yet finished.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1155,6 +1571,7 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1207,6 +1624,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->leader_pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->leader_pos = (lineBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1232,9 +1677,158 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetLinePosition - return the line position that worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /* Get a new block for copying data. */
+ while (count < MAX_BLOCKS_COUNT)
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1270,6 +1864,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Point raw_buf at the shared memory block into which the
+ * file data must be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If raw_buf_ptr does not exceed new_line_size, the new block has
+ * only new line char content; the unprocessed count should not be
+ * increased in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline char: an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3673,7 +4407,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? cstate->pcdata->pcshared_info->mycid :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3683,7 +4418,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckCopyFromValidity(cstate);
+ /*
+ * Perform this check if it is not parallel copy. In case of parallel
+ * copy, this check is done by the leader, so that if any invalid case
+ * exist the copy from command will error out from the leader itself,
+ * avoiding launching workers, just to throw error.
+ */
+ if (!IsParallelCopy())
+ CheckCopyFromValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3872,13 +4614,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3978,6 +4723,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * have BEFORE/INSTEAD OF triggers, we can't perform the copy
+ * operations with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4392,7 +5147,7 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
@@ -4536,26 +5291,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4805,9 +5569,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /* Get a new block the first time; on subsequent
+ * iterations, reset the index and re-use the same
+ * block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4837,6 +5623,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4891,6 +5682,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5115,9 +5908,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5169,6 +5968,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line here; this cannot be done at
+ * the beginning, as the file may contain empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5177,6 +5996,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 88025b1..f8bdcc3 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8a79794..86b5c62 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1700,6 +1700,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyLineBoundaries
ParallelCopyLineBoundary
+ParallelCopyLineState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
1.8.3.1
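Before the binary-format patch below, one more illustrative sketch: the
worker-side caching that GetWorkerLine() implements reduces to the helper
shown here. This is not part of the patch; EXAMPLE_CHUNK_COUNT stands in for
WORKER_CHUNK_COUNT, the fetch callback is hypothetical, and the StringInfo
buffers are assumed to have been set up with initStringInfo().

#include "postgres.h"
#include "lib/stringinfo.h"

#define EXAMPLE_CHUNK_COUNT 50		/* stand-in for WORKER_CHUNK_COUNT */

typedef struct ExampleWorkerCache
{
	StringInfoData line_buf[EXAMPLE_CHUNK_COUNT];
	uint32		count;			/* lines currently cached */
	uint32		pos;			/* next cached line to hand out */
} ExampleWorkerCache;

/*
 * Return the next cached line, refilling the cache from shared memory
 * (via 'fetch', which returns true on success) only when the cache is
 * exhausted. Returns NULL once the leader is done and the ring is drained.
 */
static StringInfo
example_next_line(ExampleWorkerCache *cache, bool (*fetch) (StringInfo buf))
{
	if (cache->pos >= cache->count)
	{
		cache->pos = cache->count = 0;
		while (cache->count < EXAMPLE_CHUNK_COUNT &&
			   fetch(&cache->line_buf[cache->count]))
			cache->count++;

		if (cache->count == 0)
			return NULL;
	}

	return &cache->line_buf[cache->pos++];
}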
Attachment: 0006-Parallel-Copy-For-Binary-Format-Files.patch (text/x-patch; charset=US-ASCII)
From 770204d2eb526a975b832ddae63abaddf181cde5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>, Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 08:00:31 +0530
Subject: [PATCH 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into the DSM data blocks, each 64K in
size. It also identifies each tuple's data block id, start offset, end
offset and tuple size, and updates this information in the ring data
structure. Workers read the tuple information from the ring data structure
and the actual tuple data from the data blocks, and insert the tuples into
the table in parallel.
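As a sketch of the framing the leader has to parse here (simplified, not
the patch code): each tuple in binary COPY starts with a 16-bit big-endian
field count, with -1 marking end of data. The helper below assumes the
count does not straddle a block boundary; the patch instead memmove()s the
trailing bytes into a fresh 64K block when it does.

#include "postgres.h"
#include "port/pg_bswap.h"

/*
 * Read the field count that prefixes a binary-COPY tuple at 'offset' in
 * 'data'. Returns false when the end-of-data marker (-1) is seen.
 */
static bool
example_read_fld_count(const char *data, uint32 offset, int16 *fld_count)
{
	memcpy(fld_count, data + offset, sizeof(int16));
	*fld_count = (int16) pg_ntoh16((uint16) *fld_count);

	return *fld_count != -1;
}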
---
src/backend/commands/copy.c | 669 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 600 insertions(+), 69 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f19e991..093b836 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -243,6 +243,16 @@ typedef struct ParallelCopyData
}ParallelCopyData;
/*
+ * Tuple boundary information used for parallel copy
+ * for binary format files.
+ */
+typedef struct ParallelCopyTupleInfo
+{
+ uint32 offset;
+ uint32 block_id;
+}ParallelCopyTupleInfo;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -372,6 +382,16 @@ typedef struct CopyStateData
int nworkers;
bool is_parallel;
ParallelCopyData *pcdata;
+
+ /*
+ * Parallel copy for binary formatted files
+ */
+ ParallelCopyDataBlock *curr_data_block;
+ ParallelCopyDataBlock *prev_data_block;
+ uint32 curr_data_offset;
+ uint32 curr_block_pos;
+ ParallelCopyTupleInfo curr_tuple_start_info;
+ ParallelCopyTupleInfo curr_tuple_end_info;
} CopyStateData;
/*
@@ -397,6 +417,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -752,6 +773,14 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static pg_attribute_always_inline bool CopyReadBinaryTupleLeader(CopyState cstate);
+static pg_attribute_always_inline void CopyReadBinaryAttributeLeader(CopyState cstate,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod,
+ uint32 *new_block_pos);
+static pg_attribute_always_inline bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static pg_attribute_always_inline Datum CopyReadBinaryAttributeWorker(CopyState cstate, int column_no,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod, bool *isnull);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -770,6 +799,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -998,8 +1028,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1300,6 +1330,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateAttributes(cstate, tup_desc, attnamelist);
@@ -1315,7 +1346,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1332,6 +1363,15 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->max_fields = attr_count;
cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
}
+
+ cstate->curr_data_block = NULL;
+ cstate->prev_data_block = NULL;
+ cstate->curr_data_offset = 0;
+ cstate->curr_block_pos = 0;
+ cstate->curr_tuple_start_info.block_id = -1;
+ cstate->curr_tuple_start_info.offset = -1;
+ cstate->curr_tuple_end_info.block_id = -1;
+ cstate->curr_tuple_end_info.offset = -1;
}
/*
@@ -1680,32 +1720,59 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
{
- bool done;
- cstate->cur_lineno++;
+ cstate->curr_data_block = NULL;
+ cstate->prev_data_block = NULL;
+ cstate->curr_data_offset = 0;
+ cstate->curr_block_pos = 0;
+ cstate->curr_tuple_start_info.block_id = -1;
+ cstate->curr_tuple_start_info.offset = -1;
+ cstate->curr_tuple_end_info.block_id = -1;
+ cstate->curr_tuple_end_info.offset = -1;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- break;
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1713,6 +1780,456 @@ ParallelCopyLeader(CopyState cstate)
}
/*
+ * CopyReadBinaryTupleLeader - leader reads data from binary formatted file
+ * to data blocks and identifies tuple boundaries/offsets so that workers
+ * can process the data in those blocks.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ int readbytes = 0;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+ int16 fld_count;
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ uint32 new_block_pos;
+ uint32 line_size;
+
+ if (cstate->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->curr_block_pos = block_pos;
+
+ cstate->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->curr_data_offset = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if (cstate->curr_data_offset + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ uint8 movebytes = 0;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ movebytes = DATA_BLOCK_SIZE - cstate->curr_data_offset;
+
+ cstate->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->curr_data_block->data[cstate->curr_data_offset],
+ movebytes);
+
+ elog(DEBUG1, "LEADER - field count is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, cstate->curr_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, (DATA_BLOCK_SIZE - movebytes));
+
+ elog(DEBUG1, "LEADER - bytes read from file after field count is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ cstate->curr_data_block = data_block;
+ cstate->curr_data_offset = 0;
+ cstate->curr_block_pos = block_pos;
+ }
+
+ memcpy(&fld_count, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_count));
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ if (fld_count == -1)
+ {
+ /*
+ * Received EOF marker. In a V3-protocol copy, wait for the
+ * protocol-level EOF, and complain if it doesn't come
+ * immediately. This ensures that we correctly handle CopyFail,
+ * if client chooses to send that now.
+ *
+ * Note that we MUST NOT try to read more data in an old-protocol
+ * copy, since there is no protocol-level EOF marker then. We
+ * could go either way for copy from file, but choose to throw
+ * error if there's data after the EOF marker, for consistency
+ * with the new-protocol case.
+ */
+ char dummy;
+
+ if (cstate->copy_dest != COPY_OLD_FE &&
+ CopyGetData(cstate, &dummy, 1, 1) > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("received copy data after EOF marker")));
+ return true;
+ }
+
+ if (fld_count != attr_count)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ cstate->curr_tuple_start_info.block_id = cstate->curr_block_pos;
+ cstate->curr_tuple_start_info.offset = cstate->curr_data_offset;
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_count);
+ new_block_pos = cstate->curr_block_pos;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ CopyReadBinaryAttributeLeader(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &new_block_pos);
+ }
+
+ cstate->curr_tuple_end_info.block_id = new_block_pos;
+ cstate->curr_tuple_end_info.offset = cstate->curr_data_offset - 1;
+
+ if (cstate->curr_tuple_start_info.block_id == cstate->curr_tuple_end_info.block_id)
+ {
+ elog(DEBUG1,"LEADER - tuple lies in a single data block");
+
+ line_size = cstate->curr_tuple_end_info.offset - cstate->curr_tuple_start_info.offset + 1;
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].unprocessed_line_parts, 1);
+ }
+ else
+ {
+ uint32 following_block_id = pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].following_block;
+
+ elog(DEBUG1,"LEADER - tuple is spread across data blocks");
+
+ line_size = DATA_BLOCK_SIZE - cstate->curr_tuple_start_info.offset -
+ pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].unprocessed_line_parts, 1);
+
+ while (following_block_id != cstate->curr_tuple_end_info.block_id)
+ {
+ line_size = line_size + DATA_BLOCK_SIZE - pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ line_size = line_size + cstate->curr_tuple_end_info.offset + 1;
+ }
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ cstate->curr_tuple_start_info.block_id,
+ cstate->curr_tuple_start_info.offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ cstate->curr_tuple_start_info.block_id,
+ cstate->curr_tuple_start_info.offset,
+ line_size, line_pos);
+ }
+
+ cstate->curr_tuple_start_info.block_id = -1;
+ cstate->curr_tuple_start_info.offset = -1;
+ cstate->curr_tuple_end_info.block_id = -1;
+ cstate->curr_tuple_end_info.offset = -1;
+
+ return false;
+}
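To make the tuple-start decoding above concrete: the only byte-level work at the start of each binary tuple is pulling a network-order int16 field count out of the shared data block. A minimal standalone sketch of that step (function and parameter names are illustrative, not from the patch):

#include <stddef.h>
#include <stdint.h>

/*
 * Decode the big-endian (network-order) int16 field count that begins
 * every binary COPY tuple, starting at byte 'off' of a data block.
 * This is what the patch's memcpy + pg_ntoh16 pair achieves; a value
 * of -1 is the end-of-data marker.
 */
static int16_t
read_field_count(const unsigned char *block, size_t off)
{
	return (int16_t) (((uint16_t) block[off] << 8) | block[off + 1]);
}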
+
+/*
+ * CopyReadBinaryAttributeLeader - leader identifies boundaries/offsets
+ * for each attribute/column, it moves on to next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos)
+{
+ int32 fld_size;
+ int readbytes;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+
+ if ((cstate->curr_data_offset + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ uint8 movebytes = DATA_BLOCK_SIZE - cstate->curr_data_offset;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->curr_data_block->data[cstate->curr_data_offset], movebytes);
+
+ elog(DEBUG1, "LEADER - field size is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, cstate->curr_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, DATA_BLOCK_SIZE-movebytes);
+
+ elog(DEBUG1, "LEADER - bytes read from file after field size is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ cstate->prev_data_block = cstate->curr_data_block;
+ cstate->prev_data_block->following_block = block_pos;
+ cstate->curr_data_block = data_block;
+ cstate->curr_block_pos = block_pos;
+
+ if (cstate->prev_data_block->curr_blk_completed == false)
+ cstate->prev_data_block->curr_blk_completed = true;
+
+ cstate->curr_data_offset = 0;
+ *new_block_pos = block_pos;
+ }
+
+ memcpy(&fld_size, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_size));
+
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_size);
+
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ if ((DATA_BLOCK_SIZE-cstate->curr_data_offset) >= fld_size)
+ {
+ cstate->curr_data_offset = cstate->curr_data_offset + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ readbytes = CopyGetData(cstate, &data_block->data[0], 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file after detecting that tuple is spread across data blocks %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+
+ cstate->prev_data_block = cstate->curr_data_block;
+ cstate->prev_data_block->following_block = block_pos;
+
+ if (cstate->prev_data_block->curr_blk_completed == false)
+ cstate->prev_data_block->curr_blk_completed = true;
+
+ cstate->curr_data_block = data_block;
+ cstate->curr_data_offset = fld_size - (DATA_BLOCK_SIZE - cstate->curr_data_offset);
+ cstate->curr_block_pos = block_pos;
+ *new_block_pos = block_pos;
+ }
+}
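The handoff that both leader functions perform when a fixed-width header would straddle a block boundary (move the unread tail of the current block to the head of a fresh block and record it as skip_bytes) can be sketched in isolation as follows; DemoBlock, BLOCK_SIZE and hand_over_tail are illustrative stand-ins, not names from the patch:

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 65536		/* stand-in for DATA_BLOCK_SIZE */

typedef struct DemoBlock
{
	unsigned char data[BLOCK_SIZE];
	uint32_t	skip_bytes;	/* tail bytes replayed at the next block's head */
} DemoBlock;

/*
 * Move the unread tail of 'cur' (everything from 'offset' onward) to the
 * start of 'next' and record how much was moved, so readers walking 'cur'
 * know to stop that many bytes early. The caller then refills
 * next->data + moved from the input.
 */
static size_t
hand_over_tail(DemoBlock *cur, DemoBlock *next, size_t offset)
{
	size_t		moved = BLOCK_SIZE - offset;

	cur->skip_bytes = (uint32_t) moved;
	if (moved > 0)
		memmove(next->data, cur->data + offset, moved);
	return moved;
}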
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ int i;
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->curr_data_offset = line_info->start_offset;
+
+ if (cstate->curr_data_offset + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ {
+ /*
+ * The field count should never be spread across data blocks here, as
+ * the leader would have moved it to the next data block already.
+ */
+ elog(DEBUG1, "WORKER - field count spread across data blocks should never occur");
+ }
+
+ memcpy(&fld_count, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ if (fld_count == -1)
+ {
+ return true;
+ }
+
+ if (fld_count != attr_count)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_count);
+ i = 0;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ i++;
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ i,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
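GetLinePosition (see below) is where a worker claims one leader-populated entry from the shared ring before the processing above starts. The claiming step amounts to a compare-and-swap on the entry's state; here is a rough standalone equivalent using C11 atomics in place of the pg_atomic_* wrappers (state values and struct layout are illustrative):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_LEADER_POPULATED	1
#define LINE_WORKER_PROCESSING	2

typedef struct DemoLineBoundary
{
	_Atomic uint32_t line_state;
	uint32_t	first_block;	/* block holding the start of the line */
	uint32_t	start_offset;	/* offset of the line within that block */
} DemoLineBoundary;

/*
 * Atomically flip a populated entry to "processing". Exactly one worker's
 * CAS succeeds, so each line is handed to exactly one worker even when
 * several of them scan the ring concurrently.
 */
static bool
try_claim_line(DemoLineBoundary *entry)
{
	uint32_t	expected = LINE_LEADER_POPULATED;

	return atomic_compare_exchange_strong(&entry->line_state, &expected,
										  LINE_WORKER_PROCESSING);
}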
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads and converts the data of
+ * each attribute/column, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, int column_no,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->curr_data_offset + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+
+ cstate->curr_data_block = &pcshared_info->data_blocks[cstate->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->curr_data_offset = 0;
+ }
+
+ memcpy(&fld_size, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE-cstate->curr_data_offset) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+
+ memcpy(&cstate->attribute_buf.data[0],&cstate->curr_data_block->data[cstate->curr_data_offset], fld_size);
+ cstate->curr_data_offset = cstate->curr_data_offset + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->curr_data_block;
+
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+
+ memcpy(&cstate->attribute_buf.data[0], &cstate->curr_data_block->data[cstate->curr_data_offset],
+ (DATA_BLOCK_SIZE - cstate->curr_data_offset));
+ cstate->curr_data_block = &pcshared_info->data_blocks[cstate->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ memcpy(&cstate->attribute_buf.data[DATA_BLOCK_SIZE - cstate->curr_data_offset],
+ &cstate->curr_data_block->data[0], (fld_size - (DATA_BLOCK_SIZE - cstate->curr_data_offset)));
+ cstate->curr_data_offset = fld_size - (DATA_BLOCK_SIZE - cstate->curr_data_offset);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
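The two-memcpy branch above is the worker-side counterpart of the leader's block chaining: a field whose bytes straddle two data blocks is reassembled into the contiguous attribute_buf. Reduced to its essentials (illustrative names; this assumes fld_size really exceeds the bytes left in the first block, as the enclosing else branch guarantees):

#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 65536		/* stand-in for DATA_BLOCK_SIZE */

/*
 * Copy a fld_size-byte field that starts at byte 'off' of cur_block and
 * continues at the start of next_block into one contiguous buffer.
 */
static void
copy_split_field(char *dest, const char *cur_block, const char *next_block,
				 size_t off, size_t fld_size)
{
	size_t		head = BLOCK_SIZE - off;	/* bytes still in the first block */

	memcpy(dest, cur_block + off, head);
	memcpy(dest + head, next_block, fld_size - head);
}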
+
+/*
* GetLinePosition - return the line position that worker should process.
*/
static uint32
@@ -5447,63 +5964,77 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
+ if (!IsParallelCopy())
+ {
+ int16 fld_count;
+ ListCell *cur;
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- if (!CopyGetInt16(cstate, &fld_count))
- {
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ if (fld_count == -1)
+ {
+ /*
+ * Received EOF marker. In a V3-protocol copy, wait for the
+ * protocol-level EOF, and complain if it doesn't come
+ * immediately. This ensures that we correctly handle CopyFail,
+ * if client chooses to send that now.
+ *
+ * Note that we MUST NOT try to read more data in an old-protocol
+ * copy, since there is no protocol-level EOF marker then. We
+ * could go either way for copy from file, but choose to throw
+ * error if there's data after the EOF marker, for consistency
+ * with the new-protocol case.
+ */
+ char dummy;
+
+ if (cstate->copy_dest != COPY_OLD_FE &&
+ CopyGetData(cstate, &dummy, 1, 1) > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("received copy data after EOF marker")));
+ return false;
+ }
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
+ if (fld_count != attr_count)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ i = 0;
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ i++;
+ values[m] = CopyReadBinaryAttribute(cstate,
+ i,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
}
+ else
+ {
+ bool eof = false;
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ cstate->cur_lineno++;
- i = 0;
- foreach(cur, cstate->attnumlist)
- {
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
- cstate->cur_attname = NameStr(att->attname);
- i++;
- values[m] = CopyReadBinaryAttribute(cstate,
- i,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ if (eof)
+ return false;
}
}
--
1.8.3.1
Attachment: 0005-Tests-for-parallel-copy.patch (text/x-patch; charset=US-ASCII)
From d06328623b7ecb051bfa23b2e54a439eed7f073f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:37:26 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..a088f72 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...llel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...allel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...l_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...llel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..13104f4 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 2);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL '2');
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: 0004-Documentation-for-parallel-copy.patch (text/x-patch; charset=US-ASCII)
From 59fc88e4ce32d955e874d6ea83d1976e7b148b9e Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:34:56 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all. This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: 0001-Copy-code-readjustment-to-support-parallel-copy.patch (text/x-patch; charset=US-ASCII)
From c7b9bb834716b1ba83f641dbb4932a592a9310fc Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:13:53 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch slightly readjusts the copy code so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText. This change was required because, in
parallel copy, chunk identification and chunk updates happen in
CopyReadLineText, and the newline characters must be removed before the chunk
information is updated in shared memory.
---
src/backend/commands/copy.c | 371 ++++++++++++++++++++++++++------------------
1 file changed, 219 insertions(+), 152 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc4..f2310ab 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -393,6 +477,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateAttributes(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -794,6 +880,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -805,8 +892,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1464,7 +1554,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1630,6 +1719,22 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateAttributes(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateAttributes - Populate the attributes.
+ */
+static void
+PopulateAttributes(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1749,12 +1854,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2647,32 +2746,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckCopyFromValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2709,27 +2787,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2767,9 +2824,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckCopyFromValidity(cstate);
+
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3262,7 +3371,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3317,30 +3426,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3350,31 +3444,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3452,6 +3523,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3839,7 +3958,6 @@ static bool
CopyReadLine(CopyState cstate)
{
bool result;
-
resetStringInfo(&cstate->line_buf);
cstate->line_buf_valid = true;
@@ -3864,60 +3982,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4281,6 +4347,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
On Mon, Jun 15, 2020 at 4:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
The above tests were run with the attached configuration (config.txt), which is the same as that used for the performance tests of csv/text files posted earlier in this mail chain.
Request the community to take this patch up for review along with the parallel copy for csv/text file patches and provide feedback.
I have reviewed the patch; a few comments:
+
+ /*
+ * Parallel copy for binary formatted files
+ */
+ ParallelCopyDataBlock *curr_data_block;
+ ParallelCopyDataBlock *prev_data_block;
+ uint32 curr_data_offset;
+ uint32 curr_block_pos;
+ ParallelCopyTupleInfo curr_tuple_start_info;
+ ParallelCopyTupleInfo curr_tuple_end_info;
} CopyStateData;
The newly added members should be present in ParallelCopyData.
+	if (cstate->curr_tuple_start_info.block_id ==
+		cstate->curr_tuple_end_info.block_id)
+	{
+		elog(DEBUG1, "LEADER - tuple lies in a single data block");
+
+		line_size = cstate->curr_tuple_end_info.offset -
+			cstate->curr_tuple_start_info.offset + 1;
+		pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].unprocessed_line_parts, 1);
+	}
+	else
+	{
+		uint32 following_block_id = pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].following_block;
+
+		elog(DEBUG1, "LEADER - tuple is spread across data blocks");
+
+		line_size = DATA_BLOCK_SIZE - cstate->curr_tuple_start_info.offset -
+			pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].skip_bytes;
+
+		pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].unprocessed_line_parts, 1);
+
+		while (following_block_id != cstate->curr_tuple_end_info.block_id)
+		{
+			line_size = line_size + DATA_BLOCK_SIZE -
+				pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+			pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+			following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+			if (following_block_id == -1)
+				break;
+		}
+
+		if (following_block_id != -1)
+			pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+		line_size = line_size + cstate->curr_tuple_end_info.offset + 1;
+	}
line_size can be accumulated as and when we process the tuple in
CopyReadBinaryTupleLeader, and set once at the end. That way the
above code can be removed.
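For illustration, a rough sketch of the idea; treating line_size as a field
of CopyStateData is my assumption, only the surrounding names come from the
quoted code:

/*
 * Sketch only: bump the tuple size wherever the leader consumes bytes of
 * the current tuple, so the block-walking recomputation above becomes
 * unnecessary.  line_size would be reset to 0 at the start of each tuple
 * and published once the tuple end is reached.
 */
static inline void
ConsumeTupleBytes(CopyState cstate, uint32 nbytes)
{
	cstate->curr_data_offset += nbytes;
	cstate->line_size += nbytes;
}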
+
+ /*
+ * Parallel copy for binary formatted files
+ */
+ ParallelCopyDataBlock *curr_data_block;
+ ParallelCopyDataBlock *prev_data_block;
+ uint32 curr_data_offset;
+ uint32 curr_block_pos;
+ ParallelCopyTupleInfo curr_tuple_start_info;
+ ParallelCopyTupleInfo curr_tuple_end_info;
} CopyStateData;
The curr_block_pos variable is present in ParallelCopyShmInfo; we could
use it and remove it from here. For curr_data_offset, a similar variable
raw_buf_index is present in CopyStateData; we could use it and remove
this one.
+	if (cstate->curr_data_offset + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+	{
+		ParallelCopyDataBlock *data_block = NULL;
+		uint8 movebytes = 0;
+
+		block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+		movebytes = DATA_BLOCK_SIZE - cstate->curr_data_offset;
+
+		cstate->curr_data_block->skip_bytes = movebytes;
+
+		data_block = &pcshared_info->data_blocks[block_pos];
+
+		if (movebytes > 0)
+			memmove(&data_block->data[0],
+					&cstate->curr_data_block->data[cstate->curr_data_offset],
+					movebytes);
+
+		elog(DEBUG1, "LEADER - field count is spread across data blocks - moved %d bytes from current block %u to %u block",
+			 movebytes, cstate->curr_block_pos, block_pos);
+
+		readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1,
+								(DATA_BLOCK_SIZE - movebytes));
+
+		elog(DEBUG1, "LEADER - bytes read from file after field count is moved to next data block %d", readbytes);
+
+		if (cstate->reached_eof)
+			ereport(ERROR,
+					(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+					 errmsg("unexpected EOF in COPY data")));
+
+		cstate->curr_data_block = data_block;
+		cstate->curr_data_offset = 0;
+		cstate->curr_block_pos = block_pos;
+	}
This code is duplicated in CopyReadBinaryTupleLeader &
CopyReadBinaryAttributeLeader. We could make a function and re-use it.
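For illustration, a sketch of the extracted helper; the function name and
the shared-info type name are my assumptions, the body just factors out the
quoted logic:

/*
 * Sketch only: factor the block-switch logic out of
 * CopyReadBinaryTupleLeader and CopyReadBinaryAttributeLeader.  Moves the
 * partial bytes at the end of the current block to a fresh block and
 * refills the remainder from the file.
 */
static void
MoveToNextDataBlock(CopyState cstate, ParallelCopyShmInfo *pcshared_info)
{
	ParallelCopyDataBlock *data_block;
	uint8		movebytes;
	uint32		block_pos;
	int			readbytes;

	block_pos = WaitGetFreeCopyBlock(pcshared_info);
	movebytes = DATA_BLOCK_SIZE - cstate->curr_data_offset;
	cstate->curr_data_block->skip_bytes = movebytes;
	data_block = &pcshared_info->data_blocks[block_pos];

	if (movebytes > 0)
		memmove(&data_block->data[0],
				&cstate->curr_data_block->data[cstate->curr_data_offset],
				movebytes);

	readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1,
							(DATA_BLOCK_SIZE - movebytes));
	elog(DEBUG1, "LEADER - moved %d bytes and read %d more into block %u",
		 movebytes, readbytes, block_pos);

	if (cstate->reached_eof)
		ereport(ERROR,
				(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
				 errmsg("unexpected EOF in COPY data")));

	cstate->curr_data_block = data_block;
	cstate->curr_data_offset = 0;
	cstate->curr_block_pos = block_pos;
}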
+/*
+ * CopyReadBinaryAttributeWorker - leader identifies boundaries/offsets
+ * for each attribute/column, it moves on to next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, int column_no,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
column_no is not used; it can be removed.
+		if (fld_count == -1)
+		{
+			/*
+			 * Received EOF marker.  In a V3-protocol copy, wait for the
+			 * protocol-level EOF, and complain if it doesn't come
+			 * immediately.  This ensures that we correctly handle CopyFail,
+			 * if client chooses to send that now.
+			 *
+			 * Note that we MUST NOT try to read more data in an old-protocol
+			 * copy, since there is no protocol-level EOF marker then.  We
+			 * could go either way for copy from file, but choose to throw
+			 * error if there's data after the EOF marker, for consistency
+			 * with the new-protocol case.
+			 */
+			char		dummy;
+
+			if (cstate->copy_dest != COPY_OLD_FE &&
+				CopyGetData(cstate, &dummy, 1, 1) > 0)
+				ereport(ERROR,
+						(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+						 errmsg("received copy data after EOF marker")));
+			return true;
+		}
+
+ if (fld_count != attr_count)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ cstate->curr_tuple_start_info.block_id = cstate->curr_block_pos;
+ cstate->curr_tuple_start_info.offset = cstate->curr_data_offset;
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_count);
+ new_block_pos = cstate->curr_block_pos;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
The above code is present in NextCopyFrom & CopyReadBinaryTupleLeader;
check if we can make a common function, or we could use NextCopyFrom as
it is.
+		memcpy(&fld_count,
+			   &cstate->curr_data_block->data[cstate->curr_data_offset],
+			   sizeof(fld_count));
+		fld_count = (int16) pg_ntoh16(fld_count);
+
+		if (fld_count == -1)
+		{
+			return true;
+		}
Should this be an Assert in the CopyReadBinaryTupleWorker function, as
this check is already done in the leader?
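For illustration, a rough sketch of what that might look like, assuming the
leader alone consumes the EOF marker so the worker never sees -1:

/*
 * Sketch only: in CopyReadBinaryTupleWorker the leader has already handled
 * the EOF marker, so the worker-side check could become an assertion.
 */
memcpy(&fld_count,
	   &cstate->curr_data_block->data[cstate->curr_data_offset],
	   sizeof(fld_count));
fld_count = (int16) pg_ntoh16(fld_count);

/* The leader never hands an EOF marker to a worker. */
Assert(fld_count != -1);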
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Hi,
I just got some time to review the first patch in the list, i.e.
0001-Copy-code-readjustment-to-support-parallel-copy.patch. As the patch
name suggests, it is just reshuffling the existing code for the COPY
command here and there. There are no extra changes added in the patch as
such, but I still do have some review comments; please have a look:
1) Can you please add some comments atop the new function
PopulateAttributes() describing its functionality in detail? Further, this
new function contains the code from BeginCopy() to set attribute-level
options used with COPY FROM, such as FORCE_QUOTE, FORCE_NOT_NULL, FORCE_NULL
etc., in cstate, and along with that it also copies the code from BeginCopy()
to set other info such as the client encoding type, encoding conversion etc.
Hence, I think it would be good to give it a better name, basically
something that matches what it is actually doing.
2) Again, the name of the new function CheckCopyFromValidity() doesn't
look good to me. From the function name it appears as if it does a sanity
check of the entire COPY FROM command, but actually it is just doing the
sanity check for the target relation specified with COPY FROM. So, probably
something like CheckTargetRelValidity would look more sensible, I think?
TBH, I am not good at naming functions, so you can always ignore my
suggestions about function and variable names :)
3) Any reason for not making CheckCopyFromValidity a macro instead of a
new function? It is just doing the sanity check for the target relation.
4) Earlier, in the CopyReadLine() function, while trying to clear the EOL
marker from cstate->line_buf.data (the copied data), we were not checking
if the line read by the CopyReadLineText() function is a header line or
not, but I can see that your patch checks that before clearing the EOL
marker. Any reason for this extra check?
5) I noticed the below spurious line removal in the patch.
@@ -3839,7 +3953,6 @@ static bool
CopyReadLine(CopyState cstate)
{
bool result;
-
Please note that I haven't got a chance to look into other patches as of
now. I will do that whenever possible. Thank you.
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
On Fri, Jun 12, 2020 at 11:01 AM vignesh C <vignesh21@gmail.com> wrote:
On Thu, Jun 4, 2020 at 12:44 AM Andres Freund <andres@anarazel.de> wrote:

Hm. you don't explicitly mention that in your design, but given how
small the benefits going from 0-1 workers is, I assume the leader
doesn't do any "chunk processing" on its own?

Yes, you are right, the leader does not do any processing. The leader's
work is mainly to populate the shared memory with the offset
information for each record.

Design of the Parallel Copy: The backend to which the "COPY FROM" query is
submitted acts as the leader, with the responsibility of reading data from
the file/stdin and launching at most n workers, as specified with the
PARALLEL 'n' option in the "COPY FROM" query. The leader populates the
common data required for the workers' execution in the DSM and shares it
with the workers. The leader then executes before statement triggers if
any exist. The leader populates the DSM chunks, which include the start
offset and chunk size; while populating the chunks it reads as many blocks
as required into the DSM data blocks from the file. Each block is 64K in
size. The leader parses the data to identify a chunk; the existing logic
from CopyReadLineText, which identifies the chunks, was used for this with
some changes. The leader checks if a free chunk is available to copy the
information; if there is no free chunk it waits till the required chunk is
freed up by the worker, and then copies the identified chunk's information
(offset & chunk size) into the DSM chunks. This process is repeated till
the complete file is processed. Simultaneously, the workers cache the
chunks (50) locally into local memory and release the chunks to the
leader for further populating. Each worker processes the chunks that it
cached and inserts them into the table. The leader waits till all the
chunks populated are processed by the workers and exits.
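To make that flow concrete, a pseudocode sketch of the leader loop; every
helper name below is illustrative only, not the patch's actual API:

/*
 * Illustrative leader loop for parallel COPY FROM; the helper names are
 * assumptions, only the overall flow comes from the description above.
 */
static void
ParallelCopyLeaderLoop(CopyState cstate)
{
	for (;;)
	{
		/* Read up to 64K of the file into a free DSM data block. */
		if (!LoadFileIntoFreeDataBlock(cstate))
			break;				/* reached EOF */

		/* Find record boundaries, as CopyReadLineText does. */
		while (IdentifyNextChunk(cstate))
		{
			/* Wait until a slot in the chunk ring is free... */
			uint32		chunk_pos = WaitGetFreeChunkSlot(cstate);

			/* ...then publish (start offset, chunk size) for the workers. */
			PublishChunkInfo(cstate, chunk_pos);
		}
	}

	/* Wait till the workers have processed every populated chunk. */
	WaitForWorkersToFinish(cstate);
}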
Why do we need the local copy of 50 chunks? Copying memory around is far
from free. I don't see why it'd be better to add per-process caching,
rather than making the DSM bigger? I can see some benefit in marking
multiple chunks as being processed with one lock acquisition, but I
don't think adding a memory copy is a good idea.

We had run performance tests with a csv data file, 5.1GB, 10 million
tuples, 2 indexes on integer columns; results for the same are given
below. We noticed in some cases the performance is better if we copy the
50 records locally and release the shared memory. We will get better
benefits as the workers increase. Thoughts?

Workers | Exec time (with local copying  | Exec time (without copying,
        | 50 records & releasing the     | processing record by record)
        | shared memory)                 |
--------+--------------------------------+------------------------------
0       | 1162.772 (1X)                  | 1152.684 (1X)
2       |  635.249 (1.83X)               |  647.894 (1.78X)
4       |  336.835 (3.45X)               |  335.534 (3.43X)
8       |  188.577 (6.17X)               |  189.461 (6.08X)
16      |  126.819 (9.17X)               |  142.730 (8.07X)
20      |  117.845 (9.87X)               |  146.533 (7.87X)
30      |  127.554 (9.11X)               |  160.307 (7.19X)

This patch *desperately* needs to be split up. It imo is close to
unreviewable, due to a large amount of changes that just move code
around without other functional changes being mixed in with the actual
new stuff.

I have split the patch; the new split patches are attached.
+/*
+ * State of the chunk.
+ */
+typedef enum ChunkState
+{
+	CHUNK_INIT,					/* initial state of chunk */
+	CHUNK_LEADER_POPULATING,	/* leader processing chunk */
+	CHUNK_LEADER_POPULATED,		/* leader completed populating chunk */
+	CHUNK_WORKER_PROCESSING,	/* worker processing chunk */
+	CHUNK_WORKER_PROCESSED		/* worker completed processing chunk */
+} ChunkState;
+
+#define RAW_BUF_SIZE 65536		/* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50	/* should be mod of RINGSIZE */
+
+#define IsParallelCopy()	(cstate->is_parallel)
+#define IsLeader()			(cstate->pcdata->is_leader)
+#define IsHeaderLine()		(cstate->header_line && cstate->cur_lineno == 1)
+
+/*
+ * Copy data block information.
+ */
+typedef struct CopyDataBlock
+{
+	/* The number of unprocessed chunks in the current block. */
+	pg_atomic_uint32 unprocessed_chunk_parts;
+
+	/*
+	 * If the current chunk data is continued into another block,
+	 * following_block will have the position where the remaining data need
+	 * to be read.
+	 */
+	uint32 following_block;
+
+	/*
+	 * This flag will be set, when the leader finds out this block can be
+	 * read safely by the worker. This helps the worker to start processing
+	 * the chunk early where the chunk will be spread across many blocks and
+	 * the worker need not wait for the complete chunk to be processed.
+	 */
+	bool curr_blk_completed;
+	char data[DATA_BLOCK_SIZE + 1];	/* data read from file */
+} CopyDataBlock;

What's the + 1 here about?

Fixed this, removed +1. That is not needed.
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+	StringInfoData line_buf;
+	uint64 cur_lineno;			/* line number for error messages */
+} ParallelCopyLineBuf;

Why do we need separate infrastructure for this? We shouldn't duplicate
infrastructure unnecessarily.

This was required for copying the multiple records locally and
releasing the shared memory. I have not changed this, will decide on
this based on the decision taken for one of the previous comments.

+/*
+ * Common information that need to be copied to shared memory.
+ */
+typedef struct CopyWorkerCommonData
+{

Why is parallel specific stuff here suddenly not named ParallelCopy*
anymore? If you introduce a naming like that it imo should be used
consistently.

Fixed, changed to maintain ParallelCopy in all structs.
+	/* low-level state data */
+	CopyDest copy_dest;			/* type of copy source/destination */
+	int file_encoding;			/* file or remote side's character encoding */
+	bool need_transcoding;		/* file encoding diff from server? */
+	bool encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+
+	/* parameters from the COPY command */
+	bool csv_mode;				/* Comma Separated Value format? */
+	bool header_line;			/* CSV header line? */
+	int null_print_len;			/* length of same */
+	bool force_quote_all;		/* FORCE_QUOTE *? */
+	bool convert_selectively;	/* do selective binary conversion? */
+
+	/* Working state for COPY FROM */
+	AttrNumber num_defaults;
+	Oid relid;
+} CopyWorkerCommonData;

But I actually think we shouldn't have this information in two different
structs. This should exist once, independent of using parallel /
non-parallel copy.

This structure helps in storing the common data from CopyStateData
that are required by the workers. This information will then be
allocated and stored into the DSM for the worker to retrieve and copy
it to CopyStateData.

+/* List information */
+typedef struct ListInfo
+{
+	int	count;		/* count of attributes */
+
+	/* string info in the form info followed by info1, info2... infon */
+	char info[1];
+} ListInfo;

Based on these comments I have no idea what this could be for.
Have added better comments for this. The following is added: This
structure will help in converting a List data type into the below
structure format, with the count having the number of elements in the
list and the info having the List elements appended contiguously. This
converted structure will be allocated in shared memory and stored in the
DSM for the worker to retrieve and later convert it back to List data
type.
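For illustration, a sketch of how a List might be flattened into that
layout; only the ListInfo struct shape comes from the quoted patch, the
helper and the assumption that the list holds plain C strings are mine:

/*
 * Sketch only: flatten a List of C strings into the ListInfo layout so it
 * can be copied into the DSM; the worker walks 'info' count times to
 * rebuild the List.
 */
static ListInfo *
SerializeList(List *strlist, Size *size)
{
	ListCell   *lc;
	Size		len = offsetof(ListInfo, info);
	ListInfo   *result;
	char	   *ptr;

	foreach(lc, strlist)
		len += strlen((char *) lfirst(lc)) + 1;

	result = (ListInfo *) palloc0(len);
	result->count = list_length(strlist);

	ptr = result->info;
	foreach(lc, strlist)
	{
		char	   *str = (char *) lfirst(lc);

		strcpy(ptr, str);
		ptr += strlen(str) + 1;
	}

	*size = len;
	return result;				/* caller copies this into the DSM */
}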
 /*
- * This keeps the character read at the top of the loop in the buffer
- * even if there is more than one read-ahead.
+ * This keeps the character read at the top of the loop in the buffer
+ * even if there is more than one read-ahead.
+ */
+#define IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(extralen) \
+if (1) \
+{ \
+	if (copy_buff_state.raw_buf_ptr + (extralen) >= copy_buff_state.copy_buf_len && !hit_eof) \
+	{ \
+		if (IsParallelCopy()) \
+		{ \
+			copy_buff_state.chunk_size = prev_chunk_size; /* update previous chunk size */ \
+			if (copy_buff_state.block_switched) \
+			{ \
+				pg_atomic_sub_fetch_u32(&copy_buff_state.data_blk_ptr->unprocessed_chunk_parts, 1); \
+				copy_buff_state.copy_buf_len = prev_copy_buf_len; \
+			} \
+		} \
+		copy_buff_state.raw_buf_ptr = prev_raw_ptr;	/* undo fetch */ \
+		need_data = true; \
+		continue; \
+	} \
+} else ((void) 0)

I think it's an absolutely clear no-go to add new branches to
these. They're *really* hot already, and this is going to sprinkle a
significant amount of new instructions over a lot of places.

Fixed, removed this.
+/*
+ * SET_RAWBUF_FOR_LOAD - Set raw_buf to the shared memory where the file
+ * data must be read.
+ */
+#define SET_RAWBUF_FOR_LOAD() \
+{ \
+	ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+	uint32 cur_block_pos; \
+	/* \
+	 * Mark the previous block as completed, worker can start copying this data. \
+	 */ \
+	if (copy_buff_state.data_blk_ptr != copy_buff_state.curr_data_blk_ptr && \
+		copy_buff_state.data_blk_ptr->curr_blk_completed == false) \
+		copy_buff_state.data_blk_ptr->curr_blk_completed = true; \
+	\
+	copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+	cur_block_pos = WaitGetFreeCopyBlock(pcshared_info); \
+	copy_buff_state.curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos]; \
+	\
+	if (!copy_buff_state.data_blk_ptr) \
+	{ \
+		copy_buff_state.data_blk_ptr = copy_buff_state.curr_data_blk_ptr; \
+		chunk_first_block = cur_block_pos; \
+	} \
+	else if (need_data == false) \
+		copy_buff_state.data_blk_ptr->following_block = cur_block_pos; \
+	\
+	cstate->raw_buf = copy_buff_state.curr_data_blk_ptr->data; \
+	copy_buff_state.copy_raw_buf = cstate->raw_buf; \
+}
+
+/*
+ * END_CHUNK_PARALLEL_COPY - Update the chunk information in shared memory.
+ */
+#define END_CHUNK_PARALLEL_COPY() \
+{ \
+	if (!IsHeaderLine()) \
+	{ \
+		ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info; \
+		ChunkBoundaries *chunkBoundaryPtr = &pcshared_info->chunk_boundaries; \
+		if (copy_buff_state.chunk_size) \
+		{ \
+			ChunkBoundary *chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+			/* \
+			 * If raw_buf_ptr is zero, unprocessed_chunk_parts would have been \
+			 * incremented in SEEK_COPY_BUFF_POS. This will happen if the whole \
+			 * chunk finishes at the end of the current block. If the \
+			 * new_line_size > raw_buf_ptr, then the new block has only new line \
+			 * char content. The unprocessed count should not be increased in \
+			 * this case. \
+			 */ \
+			if (copy_buff_state.raw_buf_ptr != 0 && \
+				copy_buff_state.raw_buf_ptr > new_line_size) \
+				pg_atomic_add_fetch_u32(&copy_buff_state.curr_data_blk_ptr->unprocessed_chunk_parts, 1); \
+			\
+			/* Update chunk size. */ \
+			pg_atomic_write_u32(&chunkInfo->chunk_size, copy_buff_state.chunk_size); \
+			pg_atomic_write_u32(&chunkInfo->chunk_state, CHUNK_LEADER_POPULATED); \
+			elog(DEBUG1, "[Leader] After adding - chunk position:%d, chunk_size:%d", \
+				 chunk_pos, copy_buff_state.chunk_size); \
+			pcshared_info->populated++; \
+		} \
+		else if (new_line_size) \
+		{ \
+			/* \
+			 * This means only new line char, empty record should be \
+			 * inserted. \
+			 */ \
+			ChunkBoundary *chunkInfo; \
+			chunk_pos = UpdateBlockInChunkInfo(cstate, -1, -1, 0, \
+											   CHUNK_LEADER_POPULATED); \
+			chunkInfo = &chunkBoundaryPtr->ring[chunk_pos]; \
+			elog(DEBUG1, "[Leader] Added empty chunk with offset:%d, chunk position:%d, chunk size:%d", \
+				 chunkInfo->start_offset, chunk_pos, \
+				 pg_atomic_read_u32(&chunkInfo->chunk_size)); \
+			pcshared_info->populated++; \
+		} \
+	} \
+	\
+	/* \
+	 * All of the read data is processed, reset index & len. In the \
+	 * subsequent read, we will get a new block and copy data in to the \
+	 * new block. \
+	 */ \
+	if (copy_buff_state.raw_buf_ptr == copy_buff_state.copy_buf_len) \
+	{ \
+		cstate->raw_buf_index = 0; \
+		cstate->raw_buf_len = 0; \
+	} \
+	else \
+		cstate->raw_buf_len = copy_buff_state.copy_buf_len; \
+}

Why are these macros? They are way way way above a length where that
makes any sort of sense.

Converted these macros to functions.
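For illustration, a sketch of what the conversion might look like for the
first macro; passing the loop state in a CopyBuffState struct and the
function shape are my assumptions:

/*
 * Sketch only: SET_RAWBUF_FOR_LOAD rewritten as a function, with the
 * loop-local state passed in explicitly instead of captured from the
 * enclosing scope by the macro.
 */
static void
SetRawBufForLoad(CopyState cstate, CopyBuffState *cbs, bool need_data,
				 uint32 *chunk_first_block)
{
	ShmCopyInfo *pcshared_info = cstate->pcdata->pcshared_info;
	uint32		cur_block_pos;

	/* Mark the previous block as completed; workers may now read it. */
	if (cbs->data_blk_ptr != cbs->curr_data_blk_ptr &&
		!cbs->data_blk_ptr->curr_blk_completed)
		cbs->data_blk_ptr->curr_blk_completed = true;

	cbs->data_blk_ptr = cbs->curr_data_blk_ptr;
	cur_block_pos = WaitGetFreeCopyBlock(pcshared_info);
	cbs->curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];

	if (!cbs->data_blk_ptr)
	{
		cbs->data_blk_ptr = cbs->curr_data_blk_ptr;
		*chunk_first_block = cur_block_pos;
	}
	else if (!need_data)
		cbs->data_blk_ptr->following_block = cur_block_pos;

	cstate->raw_buf = cbs->curr_data_blk_ptr->data;
	cbs->copy_raw_buf = cstate->raw_buf;
}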
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Thanks Ashutosh for your review; my comments are inline.
On Fri, Jun 19, 2020 at 5:41 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi,
I just got some time to review the first patch in the list i.e. 0001-Copy-code-readjustment-to-support-parallel-copy.patch. As the patch name suggests, it is just trying to reshuffle the existing code for COPY command here and there. There is no extra changes added in the patch as such, but still I do have some review comments, please have a look:
1) Can you please add some comments atop the new function PopulateAttributes() describing its functionality in detail. Further, this new function contains the code from BeginCopy() to set attribute level options used with COPY FROM such as FORCE_QUOTE, FORCE_NOT_NULL, FORCE_NULL etc. in cstate and along with that it also copies the code from BeginCopy() to set other infos such as client encoding type, encoding conversion etc. Hence, I think it would be good to give it some better name, basically something that matches with what actually it is doing.
There is no new code added in this function; some part of the code from
BeginCopy was made into a new function, as this part of the code will also
be required for the parallel copy workers before the workers start the
actual copy operation. This code was made into a function to avoid
duplication. Changed the function name to PopulateGlobalsForCopyFrom and
added a few comments.
2) Again, the name for the new function CheckCopyFromValidity() doesn't look good to me. From the function name it appears as if it does the sanity check of the entire COPY FROM command, but actually it is just doing the sanity check for the target relation specified with COPY FROM. So, probably something like CheckTargetRelValidity would look more sensible, I think? TBH, I am not good at naming the functions so you can always ignore my suggestions about function and variable names :)
Changed as suggested.
3) Any reason for not making CheckCopyFromValidity as a macro instead of a new function. It is just doing the sanity check for the target relation.
I felt there is a reasonable number of lines in the function and it is not
in a performance-intensive path, so I preferred a function over a macro.
Your thoughts?
4) Earlier in CopyReadLine() function while trying to clear the EOL marker from cstate->line_buf.data (copied data), we were not checking if the line read by CopyReadLineText() function is a header line or not, but I can see that your patch checks that before clearing the EOL marker. Any reason for this extra check?
If you see the caller of CopyReadLine, i.e. NextCopyFromRawFields, it does
nothing for the header line; the server basically calls CopyReadLine
again, so this is a kind of small optimization. Anyway, the server is not
going to do anything with the header line, so I felt no need to clear the
EOL marker for header lines.
/* on input just throw the header line away */
if (cstate->cur_lineno == 0 && cstate->header_line)
{
cstate->cur_lineno++;
if (CopyReadLine(cstate))
return false; /* done */
}
cstate->cur_lineno++;
/* Actually read the line into memory here */
done = CopyReadLine(cstate);
I think no need to make a fix for this. Your thoughts?
5) I noticed the below spurious line removal in the patch.
@@ -3839,7 +3953,6 @@ static bool
CopyReadLine(CopyState cstate)
{
bool result;
-
Fixed.
I have attached the patch for the same with the fixes.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Copy-code-readjustment-to-support-parallel-copy.patch (text/x-patch)
From 4455d3e067bda56316bb292e5d010bdf40254fec Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 23 Jun 2020 06:49:22 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText. This change was required because in case
of parallel copy the record identification and record update are done in
CopyReadLineText; before the record information is updated in shared memory the
new line characters should be removed.
---
src/backend/commands/copy.c | 372 ++++++++++++++++++++++++++------------------
1 file changed, 221 insertions(+), 151 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d..65a504f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -392,6 +476,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -793,6 +879,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -804,8 +891,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1463,7 +1553,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1629,6 +1718,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1748,12 +1855,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2646,32 +2747,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2708,27 +2788,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2766,9 +2825,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3261,7 +3372,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3316,30 +3427,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3349,31 +3445,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3451,6 +3524,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3860,60 +3981,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4277,6 +4346,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
On Tue, Jun 23, 2020 at 8:07 AM vignesh C <vignesh21@gmail.com> wrote:
I have attached the patch for the same with the fixes.
The patches were not applying on head; attached are the patches that can be
applied on head.
I have added a commitfest entry [1] for this feature.
[1]: https://commitfest.postgresql.org/28/2610/
Attachments:
0001-Copy-code-readjustment-to-support-parallel-copy.patch (application/x-patch)
From 4455d3e067bda56316bb292e5d010bdf40254fec Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 23 Jun 2020 06:49:22 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated to functions/macros, these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText, this change was required because in case
of parallel copy the record identification and record updation is done in
CopyReadLineText, before record information is updated in shared memory the new
line characters should be removed.
---
src/backend/commands/copy.c | 372 ++++++++++++++++++++++++++------------------
1 file changed, 221 insertions(+), 151 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d..65a504f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -392,6 +476,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -793,6 +879,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -804,8 +891,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1463,7 +1553,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1629,6 +1718,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1748,12 +1855,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2646,32 +2747,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2708,27 +2788,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2766,9 +2825,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3261,7 +3372,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3316,30 +3427,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the per-attribute input function and
+ * default expression information from the catalogs.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3349,31 +3445,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3451,6 +3524,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3860,60 +3981,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4277,6 +4346,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
Attachment: 0004-Documentation-for-parallel-copy.patch (application/x-patch)
From f06311aab275234125ed5b2c1a5c4740db55df3f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:34:56 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note that
+ the number of workers actually used is not guaranteed to match
+ <replaceable class="parameter">integer</replaceable>; the copy may run
+ with fewer workers than specified, or even with no workers at all.
+ This option is allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
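For reference, the option documented above is invoked as, for example,
COPY mytable FROM '/path/to/data.csv' WITH (FORMAT csv, PARALLEL 4);
here mytable and the file path are placeholders, and, as the documentation
notes, the copy may end up running with fewer workers than requested, or
with none at all.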
Attachment: 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch (application/x-patch)
From efc6dcb1ad7285c4e7486cd152c5908a48a7544d Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 23 Jun 2020 07:15:43 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs to copy data from
a file or STDIN to a table. It adds a PARALLEL option to the COPY FROM
command, with which the user can specify the number of workers to use.
Specifying zero workers disables parallelism.

The backend to which the COPY FROM query is submitted acts as the leader:
it reads the data from the file or stdin and launches at most n workers,
as specified with the PARALLEL 'n' option. The leader populates the common
data required for the workers' execution in the DSM and shares it with
them, and then executes any BEFORE STATEMENT triggers. The leader
populates the DSM line entries, each holding a start offset and a line
size; while populating them it reads as many 64KB blocks as required from
the file into the DSM data blocks. To identify the lines, the leader
reuses the existing CopyReadLineText logic with some changes. Before
copying a line's information (offset and line size) into a DSM line
entry, the leader checks whether a free entry is available; if there is
none, it waits until the required entry is freed up by a worker. This
process is repeated until the complete file is processed. Meanwhile, the
workers cache lines (50 at a time) in local memory, releasing the entries
back to the leader for further population, and then parse and insert the
cached lines into the table. The leader does not participate in the
insertion itself; its only responsibility is to identify the lines as
fast as possible so that the workers can do the actual copy operation.
The leader waits until all the populated lines have been processed by the
workers and then exits.
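To make the line hand-off concrete, the following is a minimal,
self-contained sketch of the claim protocol described above, using C11
atomics in place of PostgreSQL's pg_atomic_* API; the state names mirror
the patch, while the ring size, the values, and the single-process setup
are simplifications for illustration only:

    #include <stdatomic.h>
    #include <stdio.h>

    #define RINGSIZE 8

    enum { LINE_INIT, LINE_LEADER_POPULATED, LINE_WORKER_PROCESSING };

    typedef struct
    {
        atomic_uint line_state;     /* claim state, written last by leader */
        unsigned    line_size;      /* bytes in this line */
    } LineBoundary;

    static LineBoundary ring[RINGSIZE];

    int
    main(void)
    {
        /* Leader publishes a line: fill the payload first, state last. */
        ring[0].line_size = 42;
        atomic_store(&ring[0].line_state, LINE_LEADER_POPULATED);

        /* Worker claims it: the CAS ensures only one worker wins. */
        unsigned expected = LINE_LEADER_POPULATED;

        if (atomic_compare_exchange_strong(&ring[0].line_state,
                                           &expected,
                                           LINE_WORKER_PROCESSING))
            printf("claimed a line of %u bytes\n", ring[0].line_size);
        return 0;
    }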
---
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 13 +
src/backend/commands/copy.c | 896 ++++++++++++++++++++++++++++++++++++--
src/include/access/xact.h | 1 +
src/tools/pgindent/typedefs.list | 1 +
5 files changed, 873 insertions(+), 51 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 537913d..28f3a98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 905dc7d..66f7236 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,19 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, all the workers must use the same transaction id.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 34c657c..e8a89a4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+}ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -538,9 +553,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -553,13 +572,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || !IsLeader())) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -618,22 +664,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, line_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -681,8 +743,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -836,6 +902,137 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check whether the relation's trigger
+ * functions are parallel safe. Return false if any trigger has a parallel
+ * unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* Even if the function is parallel safe, RI triggers are not allowed. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine whether the volatile expressions in the
+ * WHERE clause or in column default expressions are parallel safe, and return
+ * true if they are.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any column has a volatile default expression; if so, and it is
+ * not parallel safe, parallelism is not allowed. For instance,
+ * serial/bigserial columns have an associated nextval() default
+ * expression, which is parallel unsafe. In non-parallel copy, volatile
+ * functions such as nextval() are not checked.
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy is not allowed for the old frontend (2.0) protocol or for the binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -865,6 +1062,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -878,6 +1077,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1108,7 +1320,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (write_pos == -1)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
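+ /* A line_size of zero denotes an empty line; there is nothing to copy. */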
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * It is possible that the wait loop at the bottom of this loop exited
+ * because data_blk_ptr->curr_blk_completed was set, in which case the
+ * previously read dataSize may be stale. If curr_blk_completed is set
+ * and the line is complete, line_size will have been set, so read
+ * line_size again to be sure whether this is a complete or a partial
+ * block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the remaining data fits in the current block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole
+ * usable part of the current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: the first copy starts at the line's offset,
+ * while subsequent copies take the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data fits in the current block, lineInfo.line_size will
+ * be updated. If the data is spread across blocks, either
+ * lineInfo.line_size or data_blk_ptr->curr_blk_completed can be
+ * updated: line_size is set once the complete line has been read,
+ * while curr_blk_completed is set when the current block has been
+ * filled but the line is not yet complete.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1154,6 +1570,7 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1206,6 +1623,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInLineInfo - Update the next free line entry in the ring and
+ * return its position.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->leader_pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->leader_pos = (lineBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1231,9 +1676,158 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
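+ /* Resume the scan from the slot after the last line this worker processed. */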
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
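+ /*
+ * Lines are handed out to workers in chunks of WORKER_CHUNK_COUNT;
+ * if the first line of a chunk is already taken, skip the whole
+ * chunk.
+ */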
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /* Get a new block for copying data. */
+ while (count < MAX_BLOCKS_COUNT)
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1269,6 +1863,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory data block into which
+ * the file data will be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed so that a worker can start
+ * copying this data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
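+ /* The tail after raw_buf_ptr is carried over to the next block; workers skip it via skip_bytes. */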
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If new_line_size > raw_buf_ptr, the new block contains only the
+ * newline character's content; the unprocessed count should not be
+ * increased in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* The line contains only a newline character; insert an empty record. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the BEFORE STATEMENT trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3675,7 +4409,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? cstate->pcdata->pcshared_info->mycid :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3685,7 +4420,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In the parallel
+ * case the check is done by the leader, so that if any invalid case
+ * exists the COPY FROM command errors out in the leader itself,
+ * avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3874,13 +4616,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3980,6 +4725,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * have BEFORE/INSTEAD OF triggers, we can't perform copy
+ * operations with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4394,7 +5149,7 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
@@ -4538,26 +5293,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4805,9 +5569,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool is_first = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through; on subsequent
+ * iterations, reset the index and reuse the same block.
+ */
+ if (is_first)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ is_first = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4837,6 +5623,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4891,6 +5682,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5115,9 +5908,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5169,6 +5968,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. The line information is updated here rather
+ * than at the beginning of the loop, as the file may contain empty
+ * lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5177,6 +5996,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db19187..584b7ee 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8a79794..86b5c62 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1700,6 +1700,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyLineBoundaries
ParallelCopyLineBoundary
+ParallelCopyLineState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
1.8.3.1
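Before the framework patch below, here is a compact, self-contained model
of how a worker reassembles a line that spans data blocks, in the spirit
of CacheLineInfo above. The field names (following_block, skip_bytes)
mirror the patch, but the tiny block size and the concrete values are
illustrative assumptions:

    #include <stdio.h>
    #include <string.h>

    #define DATA_BLOCK_SIZE 16          /* tiny block size for illustration */

    typedef struct
    {
        char data[DATA_BLOCK_SIZE];
        int  following_block;           /* next block of this line, or -1 */
        int  skip_bytes;                /* unusable trailing bytes */
    } DataBlock;

    int
    main(void)
    {
        DataBlock blocks[2] = {
            {"abcdefghijklmnop", 1, 2},    /* last 2 bytes must be skipped */
            {"qrstuvwxyz......", -1, 0},
        };
        char line[64];
        int  line_size = 20;            /* as published by the leader */
        int  offset = 4;                /* start offset within the first block */
        int  copied = 0;
        int  blk = 0;

        while (copied < line_size)
        {
            int usable = DATA_BLOCK_SIZE - blocks[blk].skip_bytes - offset;
            int n = (line_size - copied < usable) ? line_size - copied : usable;

            memcpy(line + copied, blocks[blk].data + offset, n);
            copied += n;
            offset = 0;                 /* later blocks are copied from the start */
            blk = blocks[blk].following_block;
        }
        line[copied] = '\0';
        printf("%s\n", line);           /* prints efghijklmnqrstuvwxyz */
        return 0;
    }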
Attachment: 0002-Framework-for-leader-worker-in-parallel-copy.patch (application/x-patch)
From 447a954eed01432c170ad94e3ffffa30112f53aa Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>,Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 23 Jun 2020 07:04:46 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy.
This patch provides the framework for parallel copy: the data structures,
leader initialization, worker initialization, shared memory updates,
starting the workers, waiting for the workers, and worker exit.
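Several option lists are passed through the DSM by flattening them into a
count followed by the elements. The stand-alone sketch below models the
round trip performed by ComputeListSize, CopyListSharedMemory and
RetrieveSharedList in this patch, with a plain malloc'd buffer standing
in for the shm_toc allocation and hypothetical attribute names:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        const char *names[] = {"id", "name", "amount"};
        int         nnames = 3;
        int         i;

        /* Size: a count followed by NUL-terminated strings back to back. */
        size_t size = sizeof(int);
        for (i = 0; i < nnames; i++)
            size += strlen(names[i]) + 1;

        char *buf = malloc(size);
        int  *count = (int *) buf;
        char *p = buf + sizeof(int);

        *count = nnames;
        for (i = 0; i < nnames; i++)
        {
            strcpy(p, names[i]);
            p += strlen(names[i]) + 1;
        }

        /* "Worker" side: walk the buffer and rebuild the list of names. */
        p = buf + sizeof(int);
        for (i = 0; i < *count; i++)
        {
            printf("attribute: %s\n", p);
            p += strlen(p) + 1;
        }
        free(buf);
        return 0;
    }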
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 838 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 851 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690..09e7a19 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 65a504f..34c657c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,138 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* should divide RINGSIZE evenly */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line continues into another block, following_block
+ * holds the position from which the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader determines that this block can be
+ * read safely by a worker. It lets a worker start processing a line
+ * that spreads across many blocks early, instead of waiting for the
+ * complete line to be populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * this is protected by the following sequence in the leader & worker.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker may choose a line for processing; this is handled by
+ * using pg_atomic_compare_exchange_u32, which changes the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled
+ * completely, 0 means an empty line, and >0 means the line is filled
+ * with line_size bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by the workers (some records may be filtered
+ * out by the WHERE condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array; workers copy line data here and release the
+ * shared entries so that the leader can continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -225,8 +354,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure stores the data from CopyStateData that is common to all
+ * the workers. It is allocated and stored in the DSM, from which each worker
+ * retrieves it and copies it into its own CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure helps convert a List into a flat format: count holds the
+ * number of elements in the list, and info holds the List elements appended
+ * contiguously. The converted structure is allocated in shared memory and
+ * stored in the DSM, from which the worker retrieves it and converts it
+ * back into a List.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
+
+/*
+ * List of internal parallel copy function pointers.
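+ * The callback is passed through the DSM by name rather than by address,
+ * since function addresses need not match across processes (for example
+ * under EXEC_BACKEND); the worker looks the name up in this table again.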
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +443,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -476,10 +680,596 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
/*
+ * CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ /* Copy the name out of shared memory, as RetrieveSharedString does. */
+ char *attname = pstrdup((char *) (listinformation->info + length));
+
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ *
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ {
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for the full transaction id. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "DSM segments not available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ /* Assumption: the parallel copy runs under the current transaction. */
+ full_transaction_id = GetCurrentFullTransactionId();
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if (cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
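+/*
+ * Note that BeginParallelCopy returns NULL whenever it backs out to the
+ * serial path (no workers available, no DSM segment, or no worker could be
+ * launched); the caller (see DoCopy below) treats a NULL return as a request
+ * to run the plain single-process CopyFrom.
+ */
+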
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * The worker applies the WHERE clause, splits each record into column values,
+ * fills in default/null values for any columns missing from the record, finds
+ * the partition if the target is a partitioned table, invokes before-row
+ * insert triggers, checks constraints and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ cstate->whereClause = (Node *) stringToNode(whereClauseStr);
+
+ if (rangeTableStr)
+ cstate->range_table = (List *) stringToNode(rangeTableStr);
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it with the workers, and
+ * executes the before-statement trigger if one is present. It then reads the
+ * input file into data blocks, block by block, as required, and traverses the
+ * blocks to identify line breaks; each unit of data delimited this way is
+ * called a line. The line's boundaries are recorded in a free
+ * ParallelCopyLineBoundary entry, waiting for an entry to free up if none is
+ * available; the workers pick up this information and insert the data into
+ * the table. This is repeated until the complete file has been processed,
+ * after which the leader waits until all the populated lines have been
+ * processed by the workers, and then exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+ return NULL; /* keep compiler quiet */
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char *
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+ return NULL; /* keep compiler quiet */
+}
+
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -1149,6 +1939,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1158,7 +1949,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1207,6 +2015,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1375,6 +2184,26 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+ cstate->nworkers = atoi(defGetString(defel));
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1727,12 +2556,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for
+ * the copy from operation. This is a helper function for the BeginCopy and
+ * ParallelWorkerInitialization functions.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
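
A note on the two header changes above: ParallelCopyMain must be exported so
that the parallel worker machinery can launch it by name (see the
CreateParallelContext("postgres", "ParallelCopyMain", ...) call in
BeginParallelCopy), and copy_read_data loses its static qualifier so that the
InternalParallelCopyFuncPtrs table in copy.c can take its address.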
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c65a552..8a79794 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1698,6 +1698,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
1.8.3.1
Attachment: 0005-Tests-for-parallel-copy.patch (application/x-patch)
From 740f39b8a87eb74e74182ed2cc5e8c18bd2ce367 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:37:26 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..a088f72 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...llel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...allel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...l_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...llel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..13104f4 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 2);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL '2');
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: 0006-Parallel-Copy-For-Binary-Format-Files.patch (application/x-patch)
From 14cf6fcfaa9b49c8e8ab9e503e549d40f1eebafc Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>, Vignesh C <vignesh21@gmail.com>
Date: Tue, 23 Jun 2020 07:25:14 +0530
Subject: [PATCH 6/6] Parallel Copy For Binary Format Files
Leader reads data from the file into the DSM data blocks, each 64K in size.
It also identifies each tuple's data block id, start offset, end offset and
tuple size, and records this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks in parallel, and insert the tuples
into the table in parallel.
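
As an illustrative sketch (the field names below appear in the code in this
patch, but the actual ParallelCopyLineBoundary definition may carry more
state), each ring entry records roughly:

    typedef struct ParallelCopyLineBoundarySketch
    {
        uint32           first_block;  /* data block where the tuple starts */
        uint32           start_offset; /* tuple's start offset in that block */
        pg_atomic_uint32 line_size;    /* total tuple size; -1 until populated */
    } ParallelCopyLineBoundarySketch;

The leader fills an entry once it has located a complete tuple; a worker
claims an entry, reads line_size bytes starting at (first_block,
start_offset) - following the chain of data blocks if the tuple spans more
than one - and converts and inserts the tuple.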
---
src/backend/commands/copy.c | 663 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 597 insertions(+), 66 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e8a89a4..e22292c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -243,6 +243,16 @@ typedef struct ParallelCopyData
}ParallelCopyData;
/*
+ * Tuple boundary information used for parallel copy
+ * for binary format files.
+ */
+typedef struct ParallelCopyTupleInfo
+{
+ uint32 offset; /* offset of the tuple boundary within the data block */
+ uint32 block_id; /* data block holding the tuple boundary */
+} ParallelCopyTupleInfo;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -372,6 +382,16 @@ typedef struct CopyStateData
int nworkers;
bool is_parallel;
ParallelCopyData *pcdata;
+
+ /*
+ * Parallel copy state for binary formatted files.
+ */
+ ParallelCopyDataBlock *curr_data_block; /* block currently being processed */
+ ParallelCopyDataBlock *prev_data_block; /* previous block, for data spanning blocks */
+ uint32 curr_data_offset; /* next byte to process in curr_data_block */
+ uint32 curr_block_pos; /* position of curr_data_block in the block pool */
+ ParallelCopyTupleInfo curr_tuple_start_info; /* start of the current tuple */
+ ParallelCopyTupleInfo curr_tuple_end_info; /* end of the current tuple */
} CopyStateData;
/*
@@ -397,6 +417,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -751,6 +772,14 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static pg_attribute_always_inline bool CopyReadBinaryTupleLeader(CopyState cstate);
+static pg_attribute_always_inline void CopyReadBinaryAttributeLeader(CopyState cstate,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod,
+ uint32 *new_block_pos);
+static pg_attribute_always_inline bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static pg_attribute_always_inline Datum CopyReadBinaryAttributeWorker(CopyState cstate, int column_no,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod, bool *isnull);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -769,6 +798,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -997,8 +1027,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1299,6 +1329,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1314,7 +1345,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1331,6 +1362,15 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->max_fields = attr_count;
cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
}
+
+ cstate->curr_data_block = NULL;
+ cstate->prev_data_block = NULL;
+ cstate->curr_data_offset = 0;
+ cstate->curr_block_pos = 0;
+ cstate->curr_tuple_start_info.block_id = -1;
+ cstate->curr_tuple_start_info.offset = -1;
+ cstate->curr_tuple_end_info.block_id = -1;
+ cstate->curr_tuple_end_info.offset = -1;
}
/*
@@ -1679,32 +1719,59 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
{
- bool done;
- cstate->cur_lineno++;
+ cstate->curr_data_block = NULL;
+ cstate->prev_data_block = NULL;
+ cstate->curr_data_offset = 0;
+ cstate->curr_block_pos = 0;
+ cstate->curr_tuple_start_info.block_id = -1;
+ cstate->curr_tuple_start_info.offset = -1;
+ cstate->curr_tuple_end_info.block_id = -1;
+ cstate->curr_tuple_end_info.offset = -1;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- break;
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1712,6 +1779,456 @@ ParallelCopyLeader(CopyState cstate)
}
/*
+ * CopyReadBinaryTupleLeader - leader reads data from a binary formatted file
+ * into data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data blocks' contents.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ int readbytes = 0;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+ int16 fld_count;
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ uint32 new_block_pos;
+ uint32 line_size;
+
+ if (cstate->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->curr_block_pos = block_pos;
+
+ cstate->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->curr_data_offset = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if (cstate->curr_data_offset + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ uint8 movebytes = 0;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ movebytes = DATA_BLOCK_SIZE - cstate->curr_data_offset;
+
+ cstate->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->curr_data_block->data[cstate->curr_data_offset],
+ movebytes);
+
+ elog(DEBUG1, "LEADER - field count is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, cstate->curr_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, (DATA_BLOCK_SIZE - movebytes));
+
+ elog(DEBUG1, "LEADER - bytes read from file after field count is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ cstate->curr_data_block = data_block;
+ cstate->curr_data_offset = 0;
+ cstate->curr_block_pos = block_pos;
+ }
+
+ memcpy(&fld_count, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_count));
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ if (fld_count == -1)
+ {
+ /*
+ * Received EOF marker. In a V3-protocol copy, wait for the
+ * protocol-level EOF, and complain if it doesn't come
+ * immediately. This ensures that we correctly handle CopyFail,
+ * if client chooses to send that now.
+ *
+ * Note that we MUST NOT try to read more data in an old-protocol
+ * copy, since there is no protocol-level EOF marker then. We
+ * could go either way for copy from file, but choose to throw
+ * error if there's data after the EOF marker, for consistency
+ * with the new-protocol case.
+ */
+ char dummy;
+
+ if (cstate->copy_dest != COPY_OLD_FE &&
+ CopyGetData(cstate, &dummy, 1, 1) > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("received copy data after EOF marker")));
+ return true;
+ }
+
+ if (fld_count != attr_count)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ cstate->curr_tuple_start_info.block_id = cstate->curr_block_pos;
+ cstate->curr_tuple_start_info.offset = cstate->curr_data_offset;
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_count);
+ new_block_pos = cstate->curr_block_pos;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ CopyReadBinaryAttributeLeader(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &new_block_pos);
+ }
+
+ cstate->curr_tuple_end_info.block_id = new_block_pos;
+ cstate->curr_tuple_end_info.offset = cstate->curr_data_offset - 1;
+
+ if (cstate->curr_tuple_start_info.block_id == cstate->curr_tuple_end_info.block_id)
+ {
+ elog(DEBUG1, "LEADER - tuple lies in a single data block");
+
+ line_size = cstate->curr_tuple_end_info.offset - cstate->curr_tuple_start_info.offset + 1;
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].unprocessed_line_parts, 1);
+ }
+ else
+ {
+ uint32 following_block_id = pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].following_block;
+
+ elog(DEBUG1, "LEADER - tuple is spread across data blocks");
+
+ line_size = DATA_BLOCK_SIZE - cstate->curr_tuple_start_info.offset -
+ pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[cstate->curr_tuple_start_info.block_id].unprocessed_line_parts, 1);
+
+ while (following_block_id != cstate->curr_tuple_end_info.block_id)
+ {
+ line_size = line_size + DATA_BLOCK_SIZE - pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ line_size = line_size + cstate->curr_tuple_end_info.offset + 1;
+ }
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ cstate->curr_tuple_start_info.block_id,
+ cstate->curr_tuple_start_info.offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ cstate->curr_tuple_start_info.block_id,
+ cstate->curr_tuple_start_info.offset,
+ line_size, line_pos);
+ }
+
+ cstate->curr_tuple_start_info.block_id = -1;
+ cstate->curr_tuple_start_info.offset = -1;
+ cstate->curr_tuple_end_info.block_id = -1;
+ cstate->curr_tuple_end_info.offset = -1;
+
+ return false;
+}
+
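+/*
+ * Note on the unprocessed_line_parts bookkeeping above: the leader bumps the
+ * counter once for every data block a tuple touches; each worker is expected
+ * to decrement it as it finishes consuming the tuple, so that a block can be
+ * recycled only once its counter drops back to zero.
+ */
+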
+/*
+ * CopyReadBinaryAttributeLeader - leader identifies the boundaries/offsets
+ * of each attribute/column, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos)
+{
+ int32 fld_size;
+ int readbytes;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+
+ if ((cstate->curr_data_offset + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ uint8 movebytes = DATA_BLOCK_SIZE - cstate->curr_data_offset;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->curr_data_block->data[cstate->curr_data_offset], movebytes);
+
+ elog(DEBUG1, "LEADER - field size is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, cstate->curr_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, DATA_BLOCK_SIZE-movebytes);
+
+ elog(DEBUG1, "LEADER - bytes read from file after field size is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ cstate->prev_data_block = cstate->curr_data_block;
+ cstate->prev_data_block->following_block = block_pos;
+ cstate->curr_data_block = data_block;
+ cstate->curr_block_pos = block_pos;
+
+ if (cstate->prev_data_block->curr_blk_completed == false)
+ cstate->prev_data_block->curr_blk_completed = true;
+
+ cstate->curr_data_offset = 0;
+ *new_block_pos = block_pos;
+ }
+
+ memcpy(&fld_size, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_size));
+
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_size);
+
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ if ((DATA_BLOCK_SIZE-cstate->curr_data_offset) >= fld_size)
+ {
+ cstate->curr_data_offset = cstate->curr_data_offset + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ readbytes = CopyGetData(cstate, &data_block->data[0], 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file after detecting that tuple is spread across data blocks %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+
+ cstate->prev_data_block = cstate->curr_data_block;
+ cstate->prev_data_block->following_block = block_pos;
+
+ if (cstate->prev_data_block->curr_blk_completed == false)
+ cstate->prev_data_block->curr_blk_completed = true;
+
+ cstate->curr_data_block = data_block;
+ cstate->curr_data_offset = fld_size - (DATA_BLOCK_SIZE - cstate->curr_data_offset);
+ cstate->curr_block_pos = block_pos;
+ *new_block_pos = block_pos;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ int i;
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->curr_data_offset = line_info->start_offset;
+
+ if (cstate->curr_data_offset + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ {
+ /*
+  * The case where the field count is spread across data blocks should
+  * never occur, as the leader would have moved it to the next block.
+  */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+
+ memcpy(&fld_count, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ if (fld_count == -1)
+ {
+ return true;
+ }
+
+ if (fld_count != attr_count)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_count);
+ i = 0;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ i++;
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ i,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads the data for each
+ * attribute/column; it moves on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, int column_no,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->curr_data_offset + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+
+ cstate->curr_data_block = &pcshared_info->data_blocks[cstate->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->curr_data_offset = 0;
+ }
+
+ memcpy(&fld_size, &cstate->curr_data_block->data[cstate->curr_data_offset], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ cstate->curr_data_offset = cstate->curr_data_offset + sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE-cstate->curr_data_offset) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+
+ memcpy(&cstate->attribute_buf.data[0],&cstate->curr_data_block->data[cstate->curr_data_offset], fld_size);
+ cstate->curr_data_offset = cstate->curr_data_offset + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->curr_data_block;
+
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+
+ memcpy(&cstate->attribute_buf.data[0], &cstate->curr_data_block->data[cstate->curr_data_offset],
+ (DATA_BLOCK_SIZE - cstate->curr_data_offset));
+ cstate->curr_data_block = &pcshared_info->data_blocks[cstate->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ memcpy(&cstate->attribute_buf.data[DATA_BLOCK_SIZE - cstate->curr_data_offset],
+ &cstate->curr_data_block->data[0], (fld_size - (DATA_BLOCK_SIZE - cstate->curr_data_offset)));
+ cstate->curr_data_offset = fld_size - (DATA_BLOCK_SIZE - cstate->curr_data_offset);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
* GetLinePosition - return the line position that worker should process.
*/
static uint32
@@ -5449,60 +5966,74 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
+ if (!IsParallelCopy())
+ {
+ int16 fld_count;
+ ListCell *cur;
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- if (!CopyGetInt16(cstate, &fld_count))
- {
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ if (fld_count == -1)
+ {
+ /*
+ * Received EOF marker. In a V3-protocol copy, wait for the
+ * protocol-level EOF, and complain if it doesn't come
+ * immediately. This ensures that we correctly handle CopyFail,
+ * if client chooses to send that now.
+ *
+ * Note that we MUST NOT try to read more data in an old-protocol
+ * copy, since there is no protocol-level EOF marker then. We
+ * could go either way for copy from file, but choose to throw
+ * error if there's data after the EOF marker, for consistency
+ * with the new-protocol case.
+ */
+ char dummy;
+
+ if (cstate->copy_dest != COPY_OLD_FE &&
+ CopyGetData(cstate, &dummy, 1, 1) > 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("received copy data after EOF marker")));
+ return false;
+ }
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
+ if (fld_count != attr_count)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
+ errmsg("row field count is %d, expected %d",
+ (int) fld_count, attr_count)));
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
}
+ else
+ {
+ bool eof = false;
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ cstate->cur_lineno++;
- foreach(cur, cstate->attnumlist)
- {
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ if (eof)
+ return false;
}
}
--
1.8.3.1
Hi,
Thanks Vignesh for reviewing the parallel copy for binary format files
patch. I tried to address the comments in the attached patch
(0006-Parallel-Copy-For-Binary-Format-Files.patch).
On Thu, Jun 18, 2020 at 6:42 PM vignesh C <vignesh21@gmail.com> wrote:
On Mon, Jun 15, 2020 at 4:39 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
The above tests were run with the attached configuration (config.txt), which
is the same one used for the performance tests of csv/text files posted
earlier in this mail chain.
Request the community to take this patch up for review along with the parallel copy for csv/text file patches and provide feedback.
I have reviewed the patch; a few comments:
The new members added should be present in ParallelCopyData
Added to ParallelCopyData.
line_size can be set as and when we process the tuple from
CopyReadBinaryTupleLeader and this can be set at the end. That way the
above code can be removed.
The curr_tuple_start_info and curr_tuple_end_info variables are now local
variables in CopyReadBinaryTupleLeader, and the line size calculation
code has moved to CopyReadBinaryAttributeLeader.
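To picture the arithmetic now done in CopyReadBinaryAttributeLeader: the
first block contributes its tail minus the skip_bytes carried into the next
block, each middle block contributes DATA_BLOCK_SIZE minus its skip_bytes,
and the last block contributes end offset + 1 bytes. A minimal standalone
sketch, where the DemoBlock type, tuple_size helper, and sample values are
hypothetical stand-ins rather than the patch's actual structures:

#include <stdio.h>
#include <stdint.h>

#define DATA_BLOCK_SIZE 65536

typedef struct DemoBlock
{
    uint32_t skip_bytes;      /* header bytes moved to the next block */
    int32_t  following_block; /* index of the next block, -1 if none */
} DemoBlock;

/* Size of a tuple spanning start..end blocks, mirroring the leader's loop. */
static uint32_t
tuple_size(DemoBlock *blocks, int32_t start_blk, uint32_t start_off,
           int32_t end_blk, uint32_t end_off)
{
    uint32_t size;
    int32_t  blk;

    if (start_blk == end_blk)
        return end_off - start_off + 1;

    /* tail of the first block, minus the bytes carried forward */
    size = DATA_BLOCK_SIZE - start_off - blocks[start_blk].skip_bytes;

    /* whole middle blocks */
    for (blk = blocks[start_blk].following_block;
         blk != end_blk && blk != -1;
         blk = blocks[blk].following_block)
        size += DATA_BLOCK_SIZE - blocks[blk].skip_bytes;

    /* head of the last block */
    return size + end_off + 1;
}

int main(void)
{
    DemoBlock blocks[3] = {{10, 1}, {0, 2}, {0, -1}};

    /* tuple starts at offset 60000 in block 0, ends at offset 99 in block 2 */
    printf("%u\n", tuple_size(blocks, 0, 60000, 2, 99)); /* prints 71162 */
    return 0;
}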
The curr_block_pos variable is present in ParallelCopyShmInfo; we could
use it and remove it from here.
For curr_data_offset, a similar variable raw_buf_index is present in
CopyStateData; we could use it and remove curr_data_offset from here.
Yes, making use of them now.
This code is duplicated in CopyReadBinaryTupleLeader &
CopyReadBinaryAttributeLeader. We could make a function and reuse it.
Added a new function AdjustFieldInfo.
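The gist of AdjustFieldInfo is: when only part of a 2-byte field count or a
4-byte field size fits at the tail of the current block, move those tail
bytes to the head of a freshly acquired block, record them as the old
block's skip_bytes, and continue filling the new block from the file. A toy
standalone sketch of just that move, with hypothetical sizes rather than the
patch code:

#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define BLK_SIZE 16 /* tiny block for illustration */

int main(void)
{
    char     cur[BLK_SIZE], next[BLK_SIZE];
    uint32_t offset = 14;                  /* only 2 of 4 size bytes fit */
    uint8_t  movebytes = BLK_SIZE - offset;

    memset(cur, 'x', sizeof(cur));
    memset(next, 0, sizeof(next));

    /* Move the partial field header to the head of the new block; in the
     * patch, cur's skip_bytes would be set to movebytes so that readers
     * ignore the old block's tail. */
    memmove(next, cur + offset, movebytes);

    /* The leader would now read from the file into next + movebytes. */
    printf("moved %u bytes, refill from offset %u\n",
           (unsigned) movebytes, (unsigned) movebytes);
    return 0;
}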
column_no is not used; it can be removed.
Removed.
The above code is present in NextCopyFrom & CopyReadBinaryTupleLeader;
check if we can make a common function, or we could use NextCopyFrom as
it is.
Added a macro CHECK_FIELD_COUNT.
+ if (fld_count == -1)
+ {
+     return true;
+ }
Should this be an assert in the CopyReadBinaryTupleWorker function, as this
check is already done in the leader?
This check in the leader signifies the end of the file. For the workers,
EOF is when GetLinePosition() returns -1.
line_pos = GetLinePosition(cstate);
if (line_pos == -1)
return true;
If the worker encounters fld_count == -1, it should just return true
from CopyReadBinaryTupleWorker, marking EOF. Having this as an assert
doesn't serve the purpose, I feel.
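In other words, the worker has two independent end-of-input signals, and
both are simply treated as "stop", which is why an assert would be wrong.
A condensed standalone sketch of the two checks (hypothetical helper, for
illustration only):

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

/* Condensation of the worker-side EOF checks described above. */
static bool
worker_tuple_eof(int64_t line_pos, int16_t fld_count)
{
    if (line_pos == -1)   /* ring drained: no more leader-populated lines */
        return true;
    if (fld_count == -1)  /* EOF marker that the leader stored in a line */
        return true;
    return false;
}

int main(void)
{
    printf("%d %d %d\n",
           worker_tuple_eof(-1, 3),   /* 1 */
           worker_tuple_eof(7, -1),   /* 1 */
           worker_tuple_eof(7, 3));   /* 0 */
    return 0;
}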
Along with the review-comments-addressed patch
(0006-Parallel-Copy-For-Binary-Format-Files.patch), I am also attaching
the latest series of all the other patches (0001 to 0005) from [1]; the
order of applying the patches is from 0001 to 0006.
[1]: /messages/by-id/CALDaNm0H3N9gK7CMheoaXkO99g=uAPA93nSZXu0xDarPyPY6sg@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
Attachment: 0006-Parallel-Copy-For-Binary-Format-Files.patch (application/octet-stream)
From be5da9036b223be38d0df4617781eb02634ecdac Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 24 Jun 2020 13:12:55 +0530
Subject: [PATCH 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks, each 64K in size.
It also identifies each tuple's data block id, start offset, end offset, and
tuple size, and updates this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks, and insert the tuples into the
table in parallel.
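For context, boundary identification has to walk every field because COPY
BINARY has no line terminator: each tuple is a 16-bit big-endian field count
followed, per field, by a 32-bit big-endian byte length (-1 for NULL) and
the raw bytes. A standalone sketch of that layout (illustrative helper, not
patch code):

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <arpa/inet.h> /* htons, htonl */

/* Append one single-column tuple in COPY BINARY layout; returns its size. */
static size_t
emit_tuple(unsigned char *buf, const void *field, int32_t fld_size)
{
    uint16_t count = htons(1);                  /* field count */
    uint32_t size = htonl((uint32_t) fld_size); /* field length, -1 = NULL */
    size_t   off = 0;

    memcpy(buf + off, &count, sizeof(count));
    off += sizeof(count);
    memcpy(buf + off, &size, sizeof(size));
    off += sizeof(size);
    if (fld_size > 0)
    {
        memcpy(buf + off, field, (size_t) fld_size);
        off += (size_t) fld_size;
    }
    return off;
}

int main(void)
{
    unsigned char buf[64];

    printf("tuple bytes: %zu\n", emit_tuple(buf, "abc", 3)); /* 2+4+3 = 9 */
    return 0;
}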
---
src/backend/commands/copy.c | 642 ++++++++++++++++++++++++++++++++----
1 file changed, 572 insertions(+), 70 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e8a89a40a0..e1f03241e8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -219,6 +219,16 @@ typedef struct ParallelCopyLineBuf
uint64 cur_lineno; /* line number for error messages */
}ParallelCopyLineBuf;
+/*
+ * Tuple boundary information used for parallel copy
+ * for binary format files.
+ */
+typedef struct ParallelCopyTupleInfo
+{
+ uint32 offset;
+ uint32 block_id;
+}ParallelCopyTupleInfo;
+
/*
* Parallel copy data information.
*/
@@ -240,6 +250,11 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /*
+ * For binary formatted files
+ */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -397,6 +412,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -697,6 +713,51 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ if (IsParallelCopy() && \
+ IsLeader()) \
+ return true; \
+ else \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -751,6 +812,16 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static pg_attribute_always_inline bool CopyReadBinaryTupleLeader(CopyState cstate);
+static pg_attribute_always_inline void CopyReadBinaryAttributeLeader(CopyState cstate,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod,
+ uint32 *new_block_pos, int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size);
+static pg_attribute_always_inline bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static pg_attribute_always_inline Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void AdjustFieldInfo(CopyState cstate, uint8 mode);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -769,6 +840,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -997,8 +1069,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1299,6 +1371,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1314,7 +1387,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1679,32 +1752,55 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
{
- bool done;
- cstate->cur_lineno++;
+ /* binary format */
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- break;
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1712,7 +1808,425 @@ ParallelCopyLeader(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * AdjustFieldInfo - gets a new block, updates the
+ * current offset, and calculates the skip bytes.
+ * Works in two modes: 1 for the field count,
+ * 2 for the field size.
+ */
+static void
+AdjustFieldInfo(CopyState cstate, uint8 mode)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 movebytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int readbytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+
+ cstate->pcdata->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ movebytes);
+
+ elog(DEBUG1, "LEADER - field info is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, prev_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, (DATA_BLOCK_SIZE - movebytes));
+
+ elog(DEBUG1, "LEADER - bytes read from file after field info is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (mode == 1)
+ {
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+ }
+ else if (mode == 2)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+ cstate->pcdata->curr_data_block = data_block;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->raw_buf_index = 0;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from a binary-format file
+ * into data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data blocks' data.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ int readbytes = 0;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ uint32 new_block_pos;
+ uint32 line_size = -1;
+ ParallelCopyTupleInfo curr_tuple_start_info;
+ ParallelCopyTupleInfo curr_tuple_end_info;
+
+ curr_tuple_start_info.block_id = -1;
+ curr_tuple_start_info.offset = -1;
+ curr_tuple_end_info.block_id = -1;
+ curr_tuple_end_info.offset = -1;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->raw_buf_index = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->pcdata->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ AdjustFieldInfo(cstate, 1);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+ new_block_pos = pcshared_info->cur_block_pos;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ CopyReadBinaryAttributeLeader(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &new_block_pos,
+ m,
+ &curr_tuple_start_info,
+ &curr_tuple_end_info,
+ &line_size);
+ }
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeLeader - leader identifies boundaries/offsets
+ * for each attribute/column; it moves on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos,
+ int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
+{
+ int32 fld_size;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if (m == 0)
+ {
+ tuple_start_info_ptr->block_id = pcshared_info->cur_block_pos;
+ /*
+  * raw_buf_index would have been advanced by the size of the field count
+  * bytes in the caller, so move back to store the tuple start offset.
+  */
+ tuple_start_info_ptr->offset = cstate->raw_buf_index - sizeof(int16);
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ AdjustFieldInfo(cstate, 2);
+ *new_block_pos = pcshared_info->cur_block_pos;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ if ((DATA_BLOCK_SIZE-cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ uint32 block_pos;
+ int readbytes;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ readbytes = CopyGetData(cstate, &data_block->data[0], 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file after detecting that tuple is spread across data blocks %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ *new_block_pos = block_pos;
+ }
+
+ if (m == cstate->max_fields - 1)
+ {
+ tuple_end_info_ptr->block_id = *new_block_pos;
+ tuple_end_info_ptr->offset = cstate->raw_buf_index - 1;
+
+ if (tuple_start_info_ptr->block_id == tuple_end_info_ptr->block_id)
+ {
+ elog(DEBUG1,"LEADER - tuple lies in a single data block");
+
+ *line_size = tuple_end_info_ptr->offset - tuple_start_info_ptr->offset + 1;
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+ }
+ else
+ {
+ uint32 following_block_id = pcshared_info->data_blocks[tuple_start_info_ptr->block_id].following_block;
+
+ elog(DEBUG1,"LEADER - tuple is spread across data blocks");
+
+ *line_size = DATA_BLOCK_SIZE - tuple_start_info_ptr->offset -
+ pcshared_info->data_blocks[tuple_start_info_ptr->block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+
+ while (following_block_id != tuple_end_info_ptr->block_id)
+ {
+ *line_size = *line_size + DATA_BLOCK_SIZE - pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ *line_size = *line_size + tuple_end_info_ptr->offset + 1;
+ }
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ /*
+  * The case where the field count is spread across data blocks should
+  * never occur, as the leader would have moved it to the next block.
+  */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads the data for each
+ * attribute/column; it moves on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+
+ memcpy(&cstate->attribute_buf.data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ (DATA_BLOCK_SIZE - cstate->raw_buf_index));
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ memcpy(&cstate->attribute_buf.data[DATA_BLOCK_SIZE - cstate->raw_buf_index],
+ &cstate->pcdata->curr_data_block->data[0], (fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index)));
+ cstate->raw_buf_index = fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1797,6 +2311,7 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
return block_pos;
}
@@ -5449,60 +5964,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
--
2.25.1
Attachment: 0001-Copy-code-readjustment-to-support-parallel-copy.patch (application/octet-stream)
From 4455d3e067bda56316bb292e5d010bdf40254fec Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 23 Jun 2020 06:49:22 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch slightly readjusts the copy code so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is
moved from CopyReadLine to CopyReadLineText. This change was required
because, in the case of parallel copy, record identification and record
update are done in CopyReadLineText, and the newline characters should be
removed before the record information is updated in shared memory.
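The EOL removal itself is just terminator stripping; a standalone sketch of
the three cases the copy code distinguishes (\n, \r, \r\n), using a
hypothetical helper rather than the patch's macro:

#include <stdio.h>
#include <string.h>

/* Strip a trailing \r\n, \n, or \r in place; returns the new length. */
static size_t
strip_eol(char *line, size_t len)
{
    if (len >= 2 && line[len - 2] == '\r' && line[len - 1] == '\n')
        len -= 2;
    else if (len >= 1 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        len -= 1;
    line[len] = '\0';
    return len;
}

int main(void)
{
    char line[] = "a,b,c\r\n";

    printf("%zu\n", strip_eol(line, strlen(line))); /* prints 5 */
    return 0;
}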
---
src/backend/commands/copy.c | 372 ++++++++++++++++++++++++++------------------
1 file changed, 221 insertions(+), 151 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d..65a504f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -392,6 +476,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -793,6 +879,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -804,8 +891,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1463,7 +1553,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1629,6 +1718,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1748,12 +1855,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2646,32 +2747,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2708,27 +2788,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2766,9 +2825,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3261,7 +3372,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3316,30 +3427,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3349,31 +3445,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3451,6 +3524,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3860,60 +3981,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4277,6 +4346,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
Attachment: 0002-Framework-for-leader-worker-in-parallel-copy.patch (application/octet-stream)
From 447a954eed01432c170ad94e3ffffa30112f53aa Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 23 Jun 2020 07:04:46 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy.
This patch has the framework for the data structures in parallel copy:
leader initialization, worker initialization, shared memory updates,
starting workers, waiting for workers, and worker exit.
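At the heart of the leader/worker handshake over the ring (see the
ParallelCopyLineBoundary comments below), a worker claims a line by
atomically flipping its state from leader-populated to worker-processing,
so exactly one worker wins each line. A minimal C11-atomics sketch of that
claim step, with illustrative names rather than the patch's API:

#include <stdio.h>
#include <stdbool.h>
#include <stdatomic.h>

enum { LINE_EMPTY, LINE_LEADER_POPULATED, LINE_WORKER_PROCESSING };

typedef struct DemoLine
{
    _Atomic unsigned line_state;
} DemoLine;

/* A worker tries to claim one line; only one concurrent caller succeeds. */
static bool
claim_line(DemoLine *line)
{
    unsigned expected = LINE_LEADER_POPULATED;

    return atomic_compare_exchange_strong(&line->line_state, &expected,
                                          LINE_WORKER_PROCESSING);
}

int main(void)
{
    DemoLine line = { LINE_LEADER_POPULATED };

    printf("first claim: %d, second claim: %d\n",
           claim_line(&line), claim_line(&line)); /* prints 1, 0 */
    return 0;
}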
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 838 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 851 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690..09e7a19 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 65a504f..34c657c 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,138 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* RINGSIZE should be a multiple of this */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line's data continues into another block, following_block
+ * holds the position from which the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader determines that this block can be read
+ * safely by a worker. It lets a worker start processing early when a line
+ * spans many blocks, instead of waiting for the complete line to be
+ * populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is a common data structure between the leader &
+ * the workers; it is protected by the following access sequence in the
+ * leader & worker.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker should choose one line for processing; this is handled
+ * by using pg_atomic_compare_exchange_u32, where a worker changes the state
+ * to LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled
+ * completely, 0 means an empty line, and >0 means the line is filled with
+ * line_size bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
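+
+/*
+ * A minimal sketch of the worker-side claim step described above, assuming
+ * the LINE_* states introduced in a later patch of this series:
+ *
+ *     uint32 line_state = LINE_LEADER_POPULATED;
+ *
+ *     if (pg_atomic_compare_exchange_u32(&lineInfo->line_state, &line_state,
+ *                                        LINE_WORKER_PROCESSING))
+ *     {
+ *         ... this worker now owns the line and may read first_block,
+ *         start_offset and cur_lineno ...
+ *     }
+ */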
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual number of lines inserted by the workers (some records may be
+ * filtered out by the WHERE condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total records processed by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
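+
+/*
+ * Illustrative example of how the two arrays relate: a line that starts at
+ * offset o near the end of data_blocks[i] and spills into data_blocks[j] is
+ * published as ring[k] = {first_block = i, start_offset = o, line_size = n},
+ * with data_blocks[i].following_block = j.
+ */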
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* position of the line that the worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -225,8 +354,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure stores the common data from CopyStateData that is required
+ * by the workers. This information is allocated and stored in the DSM for
+ * the workers to retrieve and copy into their own CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure helps in converting a List into the format below: count
+ * holds the number of elements in the list, and info holds the List elements
+ * appended contiguously. The converted structure is allocated in shared
+ * memory and stored in the DSM for the workers to retrieve and later convert
+ * back into a List.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
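+
+/*
+ * For example, an attribute list ("id", "val") would be stored here as
+ * count = 2 and info = "id\0val\0" (names are illustrative).
+ */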
+
+/*
+ * List of internal parallel copy function pointers. A function pointer
+ * cannot be passed through shared memory to another process, so the leader
+ * shares the callback's name and the worker maps it back to the pointer.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +443,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -476,10 +680,596 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
/*
+ * CopyCommonInfoForWorker - Populate shared_cstate from the cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ *
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ {
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "DSM segments not available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * The worker handles the WHERE clause, converts the line to column values,
+ * adds default/null values for columns missing from the record, finds the
+ * partition if the table is partitioned, invokes before-row insert triggers,
+ * handles constraints and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ cstate->whereClause = (Node *) stringToNode(whereClauseStr);
+
+ if (rangeTableStr)
+ cstate->range_table = (List *) stringToNode(rangeTableStr);
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it with the workers.
+ * The leader executes the before statement trigger, if one is present. It
+ * then reads the table data from the file and loads it into data blocks,
+ * block by block, as required. The leader traverses the data blocks to
+ * identify line breaks; each identified piece of data is called a line. The
+ * line is populated in ParallelCopyLineBoundary, from where the workers pick
+ * it up and insert it into the table. To copy a line's information, the
+ * leader gets a free entry; if there is no free entry, it waits until one
+ * becomes free. This process is repeated until the complete file is
+ * processed. The leader waits until all the populated lines are processed
+ * by the workers and then exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -1149,6 +1939,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1158,7 +1949,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1207,6 +2015,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1375,6 +2184,26 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+ cstate->nworkers = atoi(defGetString(defel));
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1727,12 +2556,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for
+ * copy from operation. This is a helper function for BeginCopy &
+ * ParallelWorkerInitialization.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c65a552..8a79794 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1698,6 +1698,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
1.8.3.1
Attachment: 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch (application/octet-stream)
From efc6dcb1ad7285c4e7486cd152c5908a48a7544d Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 23 Jun 2020 07:15:43 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command, with which the user can specify the number of workers to be used to
perform the COPY FROM command. Specifying zero as the number of workers
disables parallelism.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
before statement triggers, if any exist. The leader populates the DSM lines,
which include the start offset and line size; while populating the lines, it
reads as many blocks as required from the file into the DSM data blocks. Each
block is 64KB in size. The leader parses the data to identify a line; the
existing logic from CopyReadLineText, which identifies the lines, was reused
for this with some changes. The leader checks whether a free line is
available to copy the information; if there is no free line, it waits until
the required line is freed up by a worker and then copies the identified
line's information (offset & line size) into the DSM lines. This process is
repeated until the complete file is processed. Simultaneously, the workers
cache the lines (50 at a time) into local memory and release the lines to the
leader for further populating. Each worker processes the lines it cached and
inserts them into the table. The leader does not participate in the insertion
of data; the leader's only responsibility is to identify the lines as fast as
possible for the workers to do the actual copy operation. The leader waits
until all the populated lines are processed by the workers and then exits.
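As an illustration (table and file names are hypothetical), the option added
here is used as:

    COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv, PARALLEL 3);

This runs the leader in the backend that executes the statement and launches
up to three workers, subject to max_worker_processes.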
---
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 13 +
src/backend/commands/copy.c | 896 ++++++++++++++++++++++++++++++++++++--
src/include/access/xact.h | 1 +
src/tools/pgindent/typedefs.list | 1 +
5 files changed, 873 insertions(+), 51 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 537913d..28f3a98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 905dc7d..66f7236 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,19 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, all the workers must use the same transaction id, so
+ * that the tuples inserted by all the workers commit or abort together.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 34c657c..e8a89a4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+}ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -538,9 +553,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -553,13 +572,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || \
+ (IsParallelCopy() && !IsLeader()))) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -618,22 +664,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, line_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -681,8 +743,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -836,6 +902,137 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check if the trigger functions are
+ * parallel safe. Returns false if any one of the triggers has a
+ * parallel-unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine the parallel safety of volatile
+ * expressions in the default clauses of column definitions or in the WHERE
+ * clause; returns true if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If so,
+ * and it is not parallel safe, parallelism is not allowed. For instance,
+ * if any serial/bigserial column has an associated nextval() default
+ * expression, which is parallel unsafe, parallelism should not be allowed.
+ * (In non-parallel copy, volatile functions such as nextval() are not
+ * checked.)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -865,6 +1062,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -878,6 +1077,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1108,7 +1320,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * It is possible that the wait loop below exited because
+ * data_blk_ptr->curr_blk_completed was set, while the dataSize read may
+ * still be a stale value. If curr_blk_completed is set and the line is
+ * completed, line_size will have been set, so read line_size again to
+ * determine whether this is a complete or a partial block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If complete data is present in current block use
+ * dataSize - copiedSize, or copy the whole block from
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset. For the first copy, copy from the offset. For
+ * the subsequent copy the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is present in current block lineInfo.line_size
+ * will be updated. If the data is spread across the blocks either
+ * of lineInfo.line_size or data_blk_ptr->curr_blk_completed can
+ * be updated. lineInfo.line_size will be updated if the complete
+ * read is finished. data_blk_ptr->curr_blk_completed will be
+ * updated if processing of current block is finished and data
+ * processing is not finished.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue loading data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1154,6 +1570,7 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1206,6 +1623,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->leader_pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->leader_pos = (lineBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1231,9 +1676,158 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetLinePosition - return the position of the line that the worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
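+
+ /*
+ * Lines are handed out in chunks of WORKER_CHUNK_COUNT. If the first
+ * line of a chunk is already claimed (processing or processed), another
+ * worker owns that chunk, so skip ahead a whole chunk.
+ */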
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /* Get a new block for copying data. */
+ while (count < MAX_BLOCKS_COUNT)
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1269,6 +1863,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory block into which the
+ * file data will be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
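+ /*
+ * raw_buf is NULL until the first block has been assigned; in that
+ * case there is no previous block to mark as completed below.
+ */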
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed so that a worker can start
+ * copying this data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If the new_line_size > raw_buf_ptr, then the new block has only
+ * new line char content. The unprocessed count should not be
+ * increased in this case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline char; an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3675,7 +4409,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? cstate->pcdata->pcshared_info->mycid :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3685,7 +4420,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In case of
+ * parallel copy, this check is done by the leader, so that if any
+ * invalid case exists, the COPY FROM command errors out in the leader
+ * itself, avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3874,13 +4616,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3980,6 +4725,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * has BEFORE/INSTEAD OF triggers, we can't perform copy
+ * operations with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4394,7 +5149,7 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
@@ -4538,26 +5293,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4805,9 +5569,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time; on subsequent
+ * occasions, reset the index and reuse the same
+ * block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4837,6 +5623,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4891,6 +5682,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
&copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5115,9 +5908,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5169,6 +5968,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line here; this cannot be done at
+ * the beginning, as there is a possibility that the file contains empty
+ * lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5177,6 +5996,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db19187..584b7ee 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8a79794..86b5c62 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1700,6 +1700,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyLineBoundaries
ParallelCopyLineBoundary
+ParallelCopyLineState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
1.8.3.1
Attachment: 0004-Documentation-for-parallel-copy.patch (application/octet-stream)
From f06311aab275234125ed5b2c1a5c4740db55df3f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:34:56 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note
+ that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all. This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: 0005-Tests-for-parallel-copy.patch (application/octet-stream)
From 740f39b8a87eb74e74182ed2cc5e8c18bd2ce367 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:37:26 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..a088f72 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...llel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...allel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...l_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...llel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..13104f4 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 2);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL '2');
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Hi,
It looks like the parsing of the newly introduced "PARALLEL" option for
the COPY FROM command has an issue (in
0002-Framework-for-leader-worker-in-parallel-copy.patch):
specifying ....PARALLEL '4ar2eteid'); would pass with 4 workers, since
atoi() is used to convert the string to an integer and simply
returns 4, ignoring the trailing characters.
I used strtol(), added error checks and introduced the error
"improper use of argument to option "parallel"" for such cases:
parallel '4ar2eteid');
ERROR: improper use of argument to option "parallel"
LINE 5: parallel '1\');
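
For illustration, here is a minimal self-contained sketch of the kind of
strict validation strtol() enables (this is not the code from the patch;
the helper name parse_parallel_option() and the bounds checked here are
assumptions made for the example):

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse an option value as a strictly positive
 * integer, rejecting empty input, trailing garbage such as '4ar2eteid',
 * overflow, and non-positive values -- all cases atoi() would miss.
 */
static int
parse_parallel_option(const char *value, long *nworkers)
{
	char *endptr;
	long result;

	errno = 0;
	result = strtol(value, &endptr, 10);

	if (endptr == value || *endptr != '\0' ||
		errno == ERANGE || result <= 0 || result > INT_MAX)
		return -1;		/* improper use of argument */

	*nworkers = result;
	return 0;
}

int
main(void)
{
	long nworkers;

	if (parse_parallel_option("4ar2eteid", &nworkers) != 0)
		printf("ERROR: improper use of argument to option \"parallel\"\n");

	if (parse_parallel_option("4", &nworkers) == 0)
		printf("parallel workers: %ld\n", nworkers);

	return 0;
}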
Along with the updated patch
0002-Framework-for-leader-worker-in-parallel-copy.patch, also
attaching all the latest patches from [1].
[1]: /messages/by-id/CALj2ACW94icER3WrWapon7JkcX8j0TGRue5ycWMTEvgA3X7fOg@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 23, 2020 at 12:22 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Jun 23, 2020 at 8:07 AM vignesh C <vignesh21@gmail.com> wrote:
I have attached the patch for the same with the fixes.
The patches were not applying on head; attached are the patches that can be applied on head.
I have added a commitfest entry[1] for this feature.
[1] - https://commitfest.postgresql.org/28/2610/
On Tue, Jun 23, 2020 at 8:07 AM vignesh C <vignesh21@gmail.com> wrote:
Thanks Ashutosh for your review; my comments are inline.
On Fri, Jun 19, 2020 at 5:41 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi,
I just got some time to review the first patch in the list i.e. 0001-Copy-code-readjustment-to-support-parallel-copy.patch. As the patch name suggests, it is just trying to reshuffle the existing code for COPY command here and there. There is no extra changes added in the patch as such, but still I do have some review comments, please have a look:
1) Can you please add some comments atop the new function PopulateAttributes() describing its functionality in detail. Further, this new function contains the code from BeginCopy() to set attribute level options used with COPY FROM such as FORCE_QUOTE, FORCE_NOT_NULL, FORCE_NULL etc. in cstate and along with that it also copies the code from BeginCopy() to set other infos such as client encoding type, encoding conversion etc. Hence, I think it would be good to give it some better name, basically something that matches with what actually it is doing.
There is no new code added in this function; some part of the code from
BeginCopy was made into a new function, as this part of the code will also
be required by the parallel copy workers before they start the
actual copy operation. This code was made into a function to avoid
duplication. Changed the function name to PopulateGlobalsForCopyFrom and
added a few comments.

2) Again, the name for the new function CheckCopyFromValidity() doesn't look good to me. From the function name it appears as if it does the sanity check of the entire COPY FROM command, but actually it is just doing the sanity check for the target relation specified with COPY FROM. So, probably something like CheckTargetRelValidity would look more sensible, I think? TBH, I am not good at naming the functions so you can always ignore my suggestions about function and variable names :)
Changed as suggested.
3) Any reason for not making CheckCopyFromValidity a macro instead of a new function? It is just doing the sanity check for the target relation.

I felt there is a reasonable number of lines in the function and it is not
in a performance-intensive path, so I preferred a function over a macro.
Your thoughts?

4) Earlier, in the CopyReadLine() function, while trying to clear the EOL marker from cstate->line_buf.data (copied data), we were not checking whether the line read by the CopyReadLineText() function is a header line or not, but I can see that your patch checks that before clearing the EOL marker. Any reason for this extra check?
If you look at the caller of CopyReadLine, i.e. NextCopyFromRawFields, it does
nothing with the header line; the server basically calls CopyReadLine
again, so this is a kind of small optimization. Since the server is not going
to do anything with the header line, I felt there is no need to clear the EOL
marker for header lines.
/* on input just throw the header line away */
if (cstate->cur_lineno == 0 && cstate->header_line)
{
cstate->cur_lineno++;
if (CopyReadLine(cstate))
return false; /* done */
}

cstate->cur_lineno++;
/* Actually read the line into memory here */
done = CopyReadLine(cstate);
I think there is no need to make a fix for this. Your thoughts?

5) I noticed the below spurious line removal in the patch.
@@ -3839,7 +3953,6 @@ static bool
CopyReadLine(CopyState cstate)
{
bool result;
-

Fixed.
I have attached the patch for the same with the fixes.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jun 24, 2020 at 2:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Hi,
It looks like the parsing of the newly introduced "PARALLEL" option for
the COPY FROM command has an issue (in
0002-Framework-for-leader-worker-in-parallel-copy.patch):
specifying ....PARALLEL '4ar2eteid'); would pass with 4 workers, since
atoi() is used to convert the string to an integer and simply
returns 4, ignoring the trailing characters.

I used strtol(), added error checks and introduced the error
"improper use of argument to option "parallel"" for such cases:

parallel '4ar2eteid');
ERROR: improper use of argument to option "parallel"
LINE 5: parallel '1\');

Along with the updated patch
0002-Framework-for-leader-worker-in-parallel-copy.patch, also
attaching all the latest patches from [1].

[1] - /messages/by-id/CALj2ACW94icER3WrWapon7JkcX8j0TGRue5ycWMTEvgA3X7fOg@mail.gmail.com
I'm sorry, I forgot to attach the patches. Here are the latest series
of patches.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
Attachment: 0001-Copy-code-readjustment-to-support-parallel-copy.patch (application/octet-stream)
From 4455d3e067bda56316bb292e5d010bdf40254fec Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 23 Jun 2020 06:49:22 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText. This change was required because, in the
case of parallel copy, record identification and record updating are done in
CopyReadLineText, and the newline characters should be removed before the
record information is updated in shared memory.
---
src/backend/commands/copy.c | 372 ++++++++++++++++++++++++++------------------
1 file changed, 221 insertions(+), 151 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d..65a504f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -392,6 +476,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -793,6 +879,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -804,8 +891,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1463,7 +1553,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1629,6 +1718,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1748,12 +1855,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2646,32 +2747,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2708,27 +2788,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2766,9 +2825,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3261,7 +3372,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3316,30 +3427,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3349,31 +3445,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3451,6 +3524,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3860,60 +3981,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4277,6 +4346,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
0002-Framework-for-leader-worker-in-parallel-copy.patch
From d8e1447263289fe2ad5ed40f620ab00b5f0bd407 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 24 Jun 2020 13:51:56 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy
This patch adds the framework for parallel copy: the shared data structures,
leader initialization, worker initialization, shared memory updates, starting
the workers, waiting for the workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 865 +++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 878 insertions(+), 5 deletions(-)
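For reference, every DSM key in this patch follows PostgreSQL's usual shm_toc
estimate/allocate/insert pattern; below is a minimal sketch of that pattern
(MySharedState and MY_COPY_KEY are illustrative names, not from the patch):

typedef struct MySharedState { int nworkers; } MySharedState; /* illustrative */
#define MY_COPY_KEY 99 /* illustrative */

static MySharedState *
setup_shared_state(ParallelContext *pcxt)
{
	MySharedState *shared;

	/* Leader: reserve space for one chunk and one toc key. */
	shm_toc_estimate_chunk(&pcxt->estimator, sizeof(MySharedState));
	shm_toc_estimate_keys(&pcxt->estimator, 1);
	InitializeParallelDSM(pcxt);

	/* Leader: allocate from the DSM and publish the chunk under the key. */
	shared = (MySharedState *) shm_toc_allocate(pcxt->toc,
												sizeof(MySharedState));
	shm_toc_insert(pcxt->toc, MY_COPY_KEY, shared);
	return shared;
}

/* Worker side (e.g. in ParallelCopyMain), looked up by the same key: */
/*     shared = (MySharedState *) shm_toc_lookup(toc, MY_COPY_KEY, false); */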
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690019..09e7a191d3 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 65a504fe96..c906655d0b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,8 +96,137 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* must divide RINGSIZE evenly */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line's data continues into another block, following_block
+ * holds the position from which the remaining data is to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader determines that this block can be read
+ * safely by a worker. It lets the worker start processing a line that is
+ * spread across many blocks early, without having to wait for the complete
+ * line to be populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is a data structure shared between the leader and
+ * the workers; it is protected by the following access sequence.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker must pick a given line for processing; this is enforced
+ * using pg_atomic_compare_exchange_u32: a worker changes the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is not yet completely
+ * filled, 0 means an empty line, and >0 means the size of the line's data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
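To make the ordering above concrete, the worker-side claim step would look
roughly like this (a sketch only; the real logic is in GetLinePosition() of
patch 0003, which also introduces the LINE_* states):

/* Returns true if this worker won the line; a sketch, not patch code. */
static bool
claim_line(ParallelCopyLineBoundary *lineInfo)
{
	uint32 expected = LINE_LEADER_POPULATED;

	/* Exactly one worker can win this compare-exchange. */
	if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
									   &expected,
									   LINE_WORKER_PROCESSING))
	{
		/*
		 * We own the line now: first_block, start_offset and cur_lineno
		 * may be read safely, in any order.
		 */
		return true;
	}
	return false;
}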
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by the workers (some records may be filtered out
+ * by the WHERE condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -225,8 +354,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure stores the common data from CopyStateData that is required
+ * by the workers. This information is allocated in the DSM and stored there
+ * for the workers to retrieve and copy into their CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure helps in converting a List into the format below, with
+ * count holding the number of elements in the list and info holding the
+ * List elements appended contiguously. The converted structure is
+ * allocated in shared memory and stored in the DSM for the workers to
+ * retrieve and later convert back into a List.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
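As an illustration (not part of the patch), the attribute list ("a", "bb")
is flattened as count = 2 and info = "a\0bb\0"; RetrieveSharedList() above
rebuilds the List by walking info in strlen() + 1 steps, roughly:

static void
walk_flattened_list(ParallelCopyKeyListInfo *listinfo)
{
	int length = 0;
	int i;

	for (i = 0; i < listinfo->count; i++)
	{
		char *name = listinfo->info + length; /* "a", then "bb" */

		length += strlen(name) + 1; /* step past the terminating NUL */
		(void) name; /* e.g. lappend(list, makeString(name)), as the patch does */
	}
}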
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +443,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -476,9 +680,595 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+
+/*
+ * CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. Initialize
+ * the data structures required by the parallel workers, calculate the size
+ * required in the DSM, and load the necessary keys into the DSM. Then launch
+ * the specified number of workers.
+ *
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ {
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the space for the full transaction id. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "DSM segments not available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * Handles the WHERE clause, converts each line into columns, and adds
+ * default/null values for columns missing from the record. Finds the
+ * partition if the table is partitioned, invokes before-row insert triggers,
+ * handles constraints, and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it with the workers. It
+ * executes the before-statement trigger, if one is present, then reads the
+ * input data from the file block by block into the data blocks, scanning
+ * each block to identify lines based on line breaks. For each line it finds
+ * a free entry in ParallelCopyLineBoundary (waiting until one is free if
+ * necessary) and copies the identified line's information into it; the
+ * workers then pick up this information and insert the line into the table.
+ * This is repeated until the complete file has been processed. Finally the
+ * leader waits until all populated lines have been processed by the workers,
+ * and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1149,6 +1939,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1158,7 +1949,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1207,6 +2015,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1375,6 +2184,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
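For reference, with this option in place a user would write something like
COPY mytable FROM '/tmp/data.csv' WITH (PARALLEL '2'); (table name and path
are illustrative). The checks above reject non-integer and non-positive
worker counts.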
@@ -1727,12 +2583,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for
+ * the copy from operation. This is a helper function for BeginCopy and
+ * ParallelWorkerInitialization.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970589..b3787c1c7f 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..5dc95ac3f5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c65a55257d..8a7979412b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1698,6 +1698,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
2.25.1
0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
From efc6dcb1ad7285c4e7486cd152c5908a48a7544d Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 23 Jun 2020 07:15:43 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file or STDIN to a table. It adds a PARALLEL option to the COPY
FROM command, with which the user can specify the number of workers to use
for the COPY FROM command. Specifying zero workers disables parallelism.
The backend to which the COPY FROM query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the COPY FROM
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then
executes before-statement triggers, if any exist. The leader populates the
DSM line entries, each of which holds a start offset and line size, reading
as many blocks as required from the file into the DSM data blocks as it
goes. Each block is 64K in size. The leader parses the data to identify
lines, reusing (with some changes) the existing logic from CopyReadLineText.
The leader checks whether a free line entry is available to copy the
information into; if there is no free entry, it waits until the required
entry is freed up by a worker and then copies the identified line's
information (offset and line size) into the DSM line entry. This process is
repeated until the complete file is processed. Simultaneously, the workers
cache lines (50 at a time) into local memory and release the line entries to
the leader for further populating. Each worker processes the lines it cached
and inserts them into the table. The leader does not participate in the
insertion of data; its only responsibility is to identify the lines as fast
as possible for the workers to do the actual copy operation. The leader
waits until all the populated lines have been processed by the workers and
exits.
---
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 13 +
src/backend/commands/copy.c | 896 ++++++++++++++++++++++++++++++++++++--
src/include/access/xact.h | 1 +
src/tools/pgindent/typedefs.list | 1 +
5 files changed, 873 insertions(+), 51 deletions(-)
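To make the handoff described above easier to follow, here is a standalone,
compilable sketch of the same single-producer ring idea. It deliberately uses
a pthread mutex and condition variable for brevity where the patch uses
pg_atomic_* operations and latch waits; nothing below is patch code:

#include <pthread.h>
#include <stdio.h>

#define RINGSIZE 4

typedef struct
{
	char line[64];
	int  filled; /* 0 = free for the leader, 1 = populated for a worker */
} Slot;

static Slot ring[RINGSIZE];
static int  done;
static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

static void *
worker(void *arg)
{
	int pos = 0;

	(void) arg;
	for (;;)
	{
		pthread_mutex_lock(&lk);
		while (!ring[pos].filled && !done)
			pthread_cond_wait(&cv, &lk);
		if (!ring[pos].filled && done)
		{
			pthread_mutex_unlock(&lk);
			return NULL;
		}
		printf("insert: %s\n", ring[pos].line); /* "insert into the table" */
		ring[pos].filled = 0;                   /* release the slot */
		pthread_cond_broadcast(&cv);
		pthread_mutex_unlock(&lk);
		pos = (pos + 1) % RINGSIZE;
	}
}

int
main(void)
{
	pthread_t th;
	int       i;

	pthread_create(&th, NULL, worker, NULL);
	for (i = 0; i < 10; i++) /* leader: populate line entries */
	{
		int pos = i % RINGSIZE;

		pthread_mutex_lock(&lk);
		while (ring[pos].filled) /* wait for a free slot */
			pthread_cond_wait(&cv, &lk);
		snprintf(ring[pos].line, sizeof(ring[pos].line), "line %d", i);
		ring[pos].filled = 1;
		pthread_cond_broadcast(&cv);
		pthread_mutex_unlock(&lk);
	}
	pthread_mutex_lock(&lk);
	done = 1; /* leader: EOF reached */
	pthread_cond_broadcast(&cv);
	pthread_mutex_unlock(&lk);
	pthread_join(th, NULL);
	return 0;
}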
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 537913d..28f3a98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 905dc7d..66f7236 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,19 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, all the workers must use the same transaction id.
+ */
+void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 34c657c..e8a89a4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+}ParallelCopyLineState;
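Putting these states together, the intended lifecycle of a line entry is
roughly the following (a sketch inferred from the ordering rules in patch
0002 and the code below, not patch text):

/*
 * LINE_INIT
 *   -> LINE_LEADER_POPULATING  leader fills first_block/start_offset
 *   -> LINE_LEADER_POPULATED   leader publishes line_size and line_state
 *   -> LINE_WORKER_PROCESSING  a worker wins the compare-exchange
 *   -> LINE_WORKER_PROCESSED   worker done; line_size is reset to -1 so
 *                              the leader can reuse the ring slot
 */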
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -538,9 +553,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -553,13 +572,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || !IsLeader())) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -618,22 +664,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, line_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -681,8 +743,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -836,6 +902,137 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check whether the trigger functions are
+ * parallel safe. Return false if any one of the triggers has a
+ * parallel-unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine parallel safety of volatile expressions
+ * in the default clauses of column definitions or in the WHERE clause, and
+ * return true if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If yes,
+ * and it is not parallel safe, then parallelism is not allowed. For
+ * instance, if any serial/bigserial column has the parallel-unsafe
+ * nextval() default expression associated with it, parallelism should not
+ * be allowed. (In non-parallel copy, volatile functions such as nextval()
+ * are not checked.)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there are after-statement, instead-of-row, or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -865,6 +1062,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -878,6 +1077,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user chose parallel copy. Determine whether parallel copy is actually
+ * allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1108,7 +1320,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * The wait loop below may have exited because
+ * data_blk_ptr->curr_blk_completed was set, in which case the dataSize
+ * read earlier might be stale. If the block is completed and the line
+ * is complete, line_size will have been set, so read line_size again
+ * to determine whether we have the complete line or a partial block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If complete data is present in current block use
+ * dataSize - copiedSize, or copy the whole block from
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset. For the first copy, copy from the offset. For
+ * the subsequent copy the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is present in the current block, lineInfo.line_size
+ * will be updated. If the data is spread across blocks, either
+ * lineInfo.line_size or data_blk_ptr->curr_blk_completed can be
+ * updated: line_size is updated once the complete read of the line
+ * is finished, while curr_blk_completed is updated when processing
+ * of the current block is finished but processing of the line is
+ * not yet finished.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1154,6 +1570,7 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1206,6 +1623,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->leader_pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->leader_pos = (lineBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
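
UpdateBlockInLineInfo is the producer half of the ring protocol, and
GetLinePosition below is the consumer half. A compact model of the slot
lifecycle, with simplified types for illustration only: the leader waits
for line_size == -1 before reusing a slot, publishes it as
LINE_LEADER_POPULATED, and each worker claims a slot with a
compare-and-swap, so exactly one worker wins:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { RING_SIZE = 16 };			/* the patch uses RINGSIZE */

typedef enum
{
	LINE_LEADER_POPULATING,
	LINE_LEADER_POPULATED,
	LINE_WORKER_PROCESSING,
	LINE_WORKER_PROCESSED
} LineState;

typedef struct
{
	_Atomic int32_t line_size;		/* -1 marks a free/consumed slot */
	_Atomic uint32_t line_state;
} Slot;

static Slot ring[RING_SIZE];

/* Leader: publish a finished line into the given slot. */
static void
publish(int pos, int32_t size)
{
	while (atomic_load(&ring[pos].line_size) != -1)
		;						/* wait for a worker to release the slot */
	atomic_store(&ring[pos].line_size, size);
	atomic_store(&ring[pos].line_state, LINE_LEADER_POPULATED);
}

/* Worker: try to claim a populated slot; only one worker can win the CAS. */
static bool
claim(int pos)
{
	uint32_t	expected = LINE_LEADER_POPULATED;

	return atomic_compare_exchange_strong(&ring[pos].line_state,
										  &expected, LINE_WORKER_PROCESSING);
}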
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1231,9 +1676,158 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetLinePosition - return the line position that worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /* Get a new block for copying data. */
+ while (count < MAX_BLOCKS_COUNT)
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
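
The blocks handed out here are recycled by reference counting: the
leader increments unprocessed_line_parts once for every line (or line
fragment) stored in a block, workers decrement it as they finish, and
GetFreeCopyBlock treats a count of zero as free. A minimal sketch of
that scheme (simplified types, illustrative only):

#include <stdatomic.h>
#include <stdint.h>

enum { MAX_BLOCKS = 8 };			/* the patch uses MAX_BLOCKS_COUNT */
#define BLOCK_SIZE 65536			/* the patch's DATA_BLOCK_SIZE */

typedef struct
{
	_Atomic uint32_t unprocessed_line_parts;	/* live references */
	char		data[BLOCK_SIZE];
} Block;

static Block blocks[MAX_BLOCKS];

/* Leader: find a block no line still references; -1 if none is free. */
static int
get_free_block(void)
{
	for (int i = 0; i < MAX_BLOCKS; i++)
		if (atomic_load(&blocks[i].unprocessed_line_parts) == 0)
			return i;
	return -1;
}

/* Leader: one more line (part) now lives in this block. */
static void
ref_block(int i)
{
	atomic_fetch_add(&blocks[i].unprocessed_line_parts, 1);
}

/* Worker: done with its part; the block may now become reusable. */
static void
unref_block(int i)
{
	atomic_fetch_sub(&blocks[i].unprocessed_line_parts, 1);
}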
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1269,6 +1863,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read into.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed so that a worker can start
+ * copying this data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If new_line_size >= raw_buf_ptr, the new block contains only the
+ * line-terminator bytes; the unprocessed count should not be
+ * increased in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a line terminator was found; an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger, this will be
+ * executed for parallel copy by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3675,7 +4409,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? cstate->pcdata->pcshared_info->mycid :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3685,7 +4420,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In the parallel
+ * case the check is done by the leader, so that an invalid COPY FROM
+ * errors out in the leader itself, avoiding launching workers just to
+ * throw the error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3874,13 +4616,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3980,6 +4725,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * have BEFORE/INSTEAD OF triggers, we can't perform the copy
+ * with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4394,7 +5149,7 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
@@ -4538,26 +5293,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4805,9 +5569,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time; on subsequent iterations,
+ * reset the index and reuse the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4837,6 +5623,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4891,6 +5682,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5115,9 +5908,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5169,6 +5968,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line info here rather than at the
+ * beginning, as the file may contain empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5177,6 +5996,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db19187..584b7ee 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8a79794..86b5c62 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1700,6 +1700,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyLineBoundaries
ParallelCopyLineBoundary
+ParallelCopyLineState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
1.8.3.1
Attachment: 0004-Documentation-for-parallel-copy.patch
From f06311aab275234125ed5b2c1a5c4740db55df3f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:34:56 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note that
+ the number of workers specified in <replaceable
+ class="parameter">integer</replaceable> is not guaranteed to be used
+ during execution; the copy may run with fewer workers than specified,
+ or even with none at all. This option is allowed only in
+ <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: 0005-Tests-for-parallel-copy.patch
From 740f39b8a87eb74e74182ed2cc5e8c18bd2ce367 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:37:26 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..a088f72 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...llel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...allel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...l_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...llel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..13104f4 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 2);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL '2');
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: 0006-Parallel-Copy-For-Binary-Format-Files.patch
From be5da9036b223be38d0df4617781eb02634ecdac Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 24 Jun 2020 13:12:55 +0530
Subject: [PATCH 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks of 64K each.
It also identifies each tuple's data block id, start offset, end offset
and tuple size, and updates this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks, and insert the tuples into the
table in parallel.
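
For readers skimming the diff below, the block chaining this relies on
can be sketched as follows; the types and field names mirror the patch
but are reconstructed here for illustration only:

#include <stdint.h>
#include <string.h>

#define DATA_BLOCK_SIZE 65536

typedef struct
{
	uint32_t	following_block;	/* next block of a spanning tuple */
	uint32_t	skip_bytes;			/* unused bytes at the end of this block */
	char		data[DATA_BLOCK_SIZE];
} Block;

/*
 * Copy one tuple of tuple_size bytes that starts at 'offset' in block
 * 'start' into 'out', following the chain across blocks.
 */
static void
read_spanning_tuple(Block *blocks, uint32_t start, uint32_t offset,
					uint32_t tuple_size, char *out)
{
	uint32_t	blk = start;
	uint32_t	copied = 0;

	while (copied < tuple_size)
	{
		uint32_t	avail = DATA_BLOCK_SIZE - offset - blocks[blk].skip_bytes;

		if (avail > tuple_size - copied)
			avail = tuple_size - copied;
		memcpy(out + copied, blocks[blk].data + offset, avail);
		copied += avail;
		blk = blocks[blk].following_block;	/* continue in the next block */
		offset = 0;
	}
}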
---
src/backend/commands/copy.c | 642 ++++++++++++++++++++++++++++++++----
1 file changed, 572 insertions(+), 70 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index e8a89a40a0..e1f03241e8 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -219,6 +219,16 @@ typedef struct ParallelCopyLineBuf
uint64 cur_lineno; /* line number for error messages */
}ParallelCopyLineBuf;
+/*
+ * Tuple boundary information used for parallel copy
+ * for binary format files.
+ */
+typedef struct ParallelCopyTupleInfo
+{
+ uint32 offset;
+ uint32 block_id;
+}ParallelCopyTupleInfo;
+
/*
* Parallel copy data information.
*/
@@ -240,6 +250,11 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /*
+ * For binary formatted files
+ */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -397,6 +412,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -697,6 +713,51 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ if (IsParallelCopy() && \
+ IsLeader()) \
+ return true; \
+ else \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -751,6 +812,16 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static pg_attribute_always_inline bool CopyReadBinaryTupleLeader(CopyState cstate);
+static pg_attribute_always_inline void CopyReadBinaryAttributeLeader(CopyState cstate,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod,
+ uint32 *new_block_pos, int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size);
+static pg_attribute_always_inline bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static pg_attribute_always_inline Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void AdjustFieldInfo(CopyState cstate, uint8 mode);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -769,6 +840,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -997,8 +1069,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1299,6 +1371,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1314,7 +1387,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1679,32 +1752,55 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
{
- bool done;
- cstate->cur_lineno++;
+ /* binary format */
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- break;
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1712,7 +1808,425 @@ ParallelCopyLeader(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * AdjustFieldInfo - gets a new block, updates the current offset and
+ * calculates the skip bytes. Works in two modes: mode 1 for the
+ * field count, mode 2 for the field size.
+ */
+static void
+AdjustFieldInfo(CopyState cstate, uint8 mode)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 movebytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int readbytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+
+ cstate->pcdata->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ movebytes);
+
+ elog(DEBUG1, "LEADER - field info is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, prev_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, (DATA_BLOCK_SIZE - movebytes));
+
+ elog(DEBUG1, "LEADER - bytes read from file after field info is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (mode == 1)
+ {
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+ }
+ else if (mode == 2)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+ cstate->pcdata->curr_data_block = data_block;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->raw_buf_index = 0;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from a binary formatted file
+ * into data blocks and identifies tuple boundaries/offsets so that workers
+ * can process the data in those blocks.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ int readbytes = 0;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ uint32 new_block_pos;
+ uint32 line_size = -1;
+ ParallelCopyTupleInfo curr_tuple_start_info;
+ ParallelCopyTupleInfo curr_tuple_end_info;
+
+ curr_tuple_start_info.block_id = -1;
+ curr_tuple_start_info.offset = -1;
+ curr_tuple_end_info.block_id = -1;
+ curr_tuple_end_info.offset = -1;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->raw_buf_index = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->pcdata->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ AdjustFieldInfo(cstate, 1);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+ new_block_pos = pcshared_info->cur_block_pos;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ CopyReadBinaryAttributeLeader(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &new_block_pos,
+ m,
+ &curr_tuple_start_info,
+ &curr_tuple_end_info,
+ &line_size);
+ }
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size, line_pos);
+ }
+
+ return false;
+}
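
The binary COPY format being scanned here is, per tuple, a 16-bit
big-endian field count followed by, for each field, a 32-bit big-endian
length (-1 denoting NULL) and that many raw bytes; a field count of -1
marks end of data. A minimal single-buffer parser for one tuple, for
illustration only (unlike the patch, it does not handle tuples spanning
data blocks):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>	/* ntohs/ntohl stand in for pg_ntoh16/pg_ntoh32 */

/*
 * Parse one binary-format COPY tuple starting at buf. Returns the number
 * of bytes consumed, or -1 on the end-of-data marker (field count == -1).
 */
static long
parse_binary_tuple(const char *buf, int expected_fields)
{
	const char *p = buf;
	int16_t		fld_count;

	memcpy(&fld_count, p, sizeof(fld_count));
	fld_count = (int16_t) ntohs((uint16_t) fld_count);
	p += sizeof(fld_count);

	if (fld_count == -1)
		return -1;				/* EOF marker */

	for (int i = 0; i < fld_count && i < expected_fields; i++)
	{
		int32_t		fld_size;

		memcpy(&fld_size, p, sizeof(fld_size));
		fld_size = (int32_t) ntohl((uint32_t) fld_size);
		p += sizeof(fld_size);

		if (fld_size != -1)		/* -1 means NULL: no payload bytes */
			p += fld_size;		/* skip over the field payload */
	}
	return p - buf;
}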
+
+/*
+ * CopyReadBinaryAttributeLeader - leader identifies boundaries/offsets
+ * for each attribute/column, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos,
+ int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
+{
+ int32 fld_size;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if (m == 0)
+ {
+ tuple_start_info_ptr->block_id = pcshared_info->cur_block_pos;
+ /*
+ * raw_buf_index was advanced past the field count bytes in the
+ * caller, so move back to record the tuple start offset.
+ */
+ tuple_start_info_ptr->offset = cstate->raw_buf_index - sizeof(int16);
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ AdjustFieldInfo(cstate, 2);
+ *new_block_pos = pcshared_info->cur_block_pos;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ if ((DATA_BLOCK_SIZE-cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ uint32 block_pos;
+ int readbytes;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ readbytes = CopyGetData(cstate, &data_block->data[0], 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file after detecting that tuple is spread across data blocks %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ *new_block_pos = block_pos;
+ }
+
+ if (m == cstate->max_fields - 1)
+ {
+ tuple_end_info_ptr->block_id = *new_block_pos;
+ tuple_end_info_ptr->offset = cstate->raw_buf_index - 1;
+
+ if (tuple_start_info_ptr->block_id == tuple_end_info_ptr->block_id)
+ {
+ elog(DEBUG1,"LEADER - tuple lies in a single data block");
+
+ *line_size = tuple_end_info_ptr->offset - tuple_start_info_ptr->offset + 1;
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+ }
+ else
+ {
+ uint32 following_block_id = pcshared_info->data_blocks[tuple_start_info_ptr->block_id].following_block;
+
+ elog(DEBUG1,"LEADER - tuple is spread across data blocks");
+
+ *line_size = DATA_BLOCK_SIZE - tuple_start_info_ptr->offset -
+ pcshared_info->data_blocks[tuple_start_info_ptr->block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+
+ while (following_block_id != tuple_end_info_ptr->block_id)
+ {
+ *line_size = *line_size + DATA_BLOCK_SIZE - pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ *line_size = *line_size + tuple_end_info_ptr->offset + 1;
+ }
+ }
+}
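
As a worked example of the size arithmetic above (with made-up numbers):
take DATA_BLOCK_SIZE = 65536 and a tuple that starts at offset 65000 in
block A (skip_bytes = 16), runs through one full middle block B
(skip_bytes = 0) and ends at offset 1199 in block C. Its line_size comes
out to (65536 - 65000 - 16) + (65536 - 0) + (1199 + 1) = 520 + 65536 +
1200 = 67256 bytes, and unprocessed_line_parts is incremented once for
each of the three blocks.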
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting the leader-identified tuple offsets from the ring data structure.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ /*
+ * The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block.
+ */
+ elog(DEBUG1, "WORKER - field count spread across data blocks should never occur");
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
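[GetLinePosition appears only as a stub in this hunk; going by the ordering rules documented for ParallelCopyLineBoundary elsewhere in the patch set, the claim step presumably looks roughly like the sketch below. LINE_LEADER_POPULATED / LINE_WORKER_PROCESSING and the ring layout are the patch's; the loop shape and the busy-wait are my simplifications, not the patch's actual code:

/* Sketch of a worker claiming one leader-populated line from the ring. */
static uint32
ClaimNextLine(ParallelCopyShmInfo *pcshared_info, uint32 pos)
{
	for (;;)
	{
		ParallelCopyLineBoundary *line_info =
			&pcshared_info->line_boundaries.ring[pos];
		uint32		expected = LINE_LEADER_POPULATED;

		/* 1) Read line_size first; -1 means not yet populated. */
		if (pg_atomic_read_u32(&line_info->line_size) == (uint32) -1)
		{
			if (!pcshared_info->is_read_in_progress)
				return (uint32) -1;	/* leader is done; no more lines */
			continue;				/* real code would wait, not spin */
		}

		/* 2) Exactly one worker wins this CAS and owns the line. */
		if (pg_atomic_compare_exchange_u32(&line_info->line_state,
										   &expected,
										   LINE_WORKER_PROCESSING))
			return pos;		/* 3) now safe to read first_block et al. */

		pos = (pos + 1) % RINGSIZE;	/* already taken; try the next slot */
	}
}
]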
+
+/*
+ * CopyReadBinaryAttributeWorker - read one attribute using the
+ * boundaries/offsets identified by the leader, moving on to the next data
+ * block if the attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+
+ memcpy(&cstate->attribute_buf.data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ (DATA_BLOCK_SIZE - cstate->raw_buf_index));
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ memcpy(&cstate->attribute_buf.data[DATA_BLOCK_SIZE - cstate->raw_buf_index],
+ &cstate->pcdata->curr_data_block->data[0], (fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index)));
+ cstate->raw_buf_index = fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
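[A quick worked example of the split-field arithmetic in the else branch above, with my own numbers:

/*
 * Illustration: DATA_BLOCK_SIZE = 65536, raw_buf_index = 65530 after the
 * field size has been consumed, fld_size = 20.  The first memcpy takes the
 * remaining 65536 - 65530 = 6 bytes from the current block, the second
 * takes 20 - 6 = 14 bytes from the start of the following block, and
 * raw_buf_index ends up at 14 -- exactly the expressions above.
 */
]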
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1797,6 +2311,7 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
return block_pos;
}
@@ -5449,60 +5964,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
--
2.25.1
Hi,
The 0006 patch has some code cleanup and fixes for issues found during internal testing.
The latest patches are attached.
The order of applying the patches remains the same, i.e. from 0001 to 0006.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Copy-code-readjustment-to-support-parallel-copy.patch
From 4455d3e067bda56316bb292e5d010bdf40254fec Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 23 Jun 2020 06:49:22 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch slightly readjusts the copy code so that the common code is
separated out into functions/macros; these functions/macros will be used by
the workers in the parallel copy code of the upcoming patches. EOL removal is
moved from CopyReadLine to CopyReadLineText. This change is required because,
in the parallel copy case, record identification and record updates are done
in CopyReadLineText, and the newline characters must be removed before the
record information is stored in shared memory.
---
src/backend/commands/copy.c | 372 ++++++++++++++++++++++++++------------------
1 file changed, 221 insertions(+), 151 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d..65a504f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
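[These two macros look pointless in isolation; presumably the point is that a later patch in the series redefines them so that, under parallelism, the count lives in shared memory. My guess at the parallel-aware form, using the shared atomic counter added in 0002 (pcshared_info->processed) -- not the actual patch text:

#define INCREMENTPROCESSED(processed) \
{ \
	if (!IsParallelCopy()) \
		processed++; \
	else \
		pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
}
]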
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -392,6 +476,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -793,6 +879,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -804,8 +891,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1463,7 +1553,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1629,6 +1718,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1748,12 +1855,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2646,32 +2747,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2708,27 +2788,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2766,9 +2825,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3261,7 +3372,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3316,30 +3427,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3349,31 +3445,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3451,6 +3524,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3860,60 +3981,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4277,6 +4346,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
0002-Framework-for-leader-worker-in-parallel-copy.patch
From d8e1447263289fe2ad5ed40f620ab00b5f0bd407 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 24 Jun 2020 13:51:56 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy
This patch adds the framework for parallel copy: the data structures, leader
initialization, worker initialization, shared memory updates, starting the
workers, waiting for the workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 865 +++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 878 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690019..09e7a191d3 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 65a504fe96..c906655d0b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,8 +96,137 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* should be mod of RINGSIZE */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block, following_block
+ * will have the position where the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader finds that this block can be read
+ * safely by a worker. It helps a worker start processing a line that is
+ * spread across many blocks early, without having to wait for the
+ * complete line to be populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
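[unprocessed_line_parts is effectively a per-block reference count: the leader increments it once per line part it places in the block (as seen in the leader code earlier in the thread), and a worker decrements it as each part is consumed. A block can only be handed out again once the count drains. A minimal sketch of that rule -- my helper for illustration, not patch code:

/* Sketch: a data block is recyclable only when no line still references it. */
static bool
BlockIsReusable(ParallelCopyDataBlock *blk)
{
	return pg_atomic_read_u32(&blk->unprocessed_line_parts) == 0;
}
]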
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * this is protected by the following sequence in the leader & worker.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32, worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled
+ * completely, 0 means an empty line, >0 means the line is filled with
+ * line_size bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
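[The ordering contract above is the whole synchronization story for a line, so it may help to see the leader's publish side spelled out. A sketch under the patch's definitions; memory-ordering details are elided (pg_atomic_write_u32 by itself does not fence, so the real code may need more care), and this is not the patch's actual populate routine:

/* Sketch: leader publishes one line in the documented order. */
static void
PublishLine(ParallelCopyLineBoundary *line_info, uint32 first_block,
			uint32 start_offset, uint64 lineno, uint32 size)
{
	/* 1) payload fields, in any order */
	line_info->first_block = first_block;
	line_info->start_offset = start_offset;
	line_info->cur_lineno = lineno;

	/* 2) then the size ... */
	pg_atomic_write_u32(&line_info->line_size, size);

	/* 3) ... and last the state that makes the entry visible to workers */
	pg_atomic_write_u32(&line_info->line_state, LINE_LEADER_POPULATED);
}
]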
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by worker (some records will be filtered based on
+ * where condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -225,8 +354,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure helps in storing the common data from CopyStateData that are
+ * required by the workers. This information will then be allocated and stored
+ * into the DSM for the worker to retrieve and copy it to CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure helps in converting a List into the format below, with
+ * count holding the number of elements in the list and info holding the
+ * List elements appended contiguously. This converted structure will be
+ * allocated in shared memory and stored in the DSM for the worker to
+ * retrieve and later convert back to a List.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +443,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -476,9 +680,595 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+
+/*
+ * CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ *
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ {
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "DSM segments not available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ full_transaction_id = GetCurrentFullTransactionId();
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * Where clause handling, convert tuple to columns, add default null values for
+ * the missing columns that are not present in that record. Find the partition
+ * if it is partitioned table, invoke before row insert Triggers, handle
+ * constraints and insert the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
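[As an aside on the whereClause/range_table handling above: pointers are meaningless across processes, so the node trees cross the DSM as text via the stock outfuncs/readfuncs machinery. An illustration of the round trip (my wrapper, not patch code):

/* Illustration: flatten on the leader side, rebuild on the worker side. */
static Node *
RoundTripNode(Node *node)
{
	char	   *flat = nodeToString(node);	/* leader: node tree -> text */

	return (Node *) stringToNode(flat);		/* worker: text -> node tree */
}
]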
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it with the workers. It
+ * executes the before statement triggers, if any are present. It reads the
+ * table data from the input file, loading it into data blocks as and when
+ * required, block by block, and traverses the data blocks to identify lines
+ * based on line breaks. For each line it gets a free entry in
+ * ParallelCopyLineBoundary and copies the identified line's information
+ * there; if there is no free entry, it waits until one is freed up by a
+ * worker. The workers then pick up this information and insert the lines
+ * into the table. This process is repeated until the complete file is
+ * processed. Finally, the leader waits until all the populated lines are
+ * processed by the workers, and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1149,6 +1939,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1158,7 +1949,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1207,6 +2015,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1375,6 +2184,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1727,12 +2583,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy &
+ * ParallelWorkerInitialization function.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970589..b3787c1c7f 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833565..5dc95ac3f5 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c65a55257d..8a7979412b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1698,6 +1698,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
2.25.1
Attachment: 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch (application/octet-stream)
From efc6dcb1ad7285c4e7486cd152c5908a48a7544d Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 23 Jun 2020 07:15:43 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs when copying
data from a file or STDIN to a table. It adds a PARALLEL option to the
COPY FROM command, with which the user specifies the number of workers
to use for the copy. Specifying zero or a negative number of workers is
an error.
The backend to which the "COPY FROM" query is submitted acts as the
leader, with the responsibility of reading data from the file/stdin and
launching at most n workers, as specified with the PARALLEL 'n' option
in the "COPY FROM" query. The leader populates the common data required
for the workers' execution in the DSM and shares it with the workers.
The leader then executes BEFORE STATEMENT triggers, if any exist. The
leader populates the DSM line entries, each of which holds the start
offset and line size; while populating the lines, it reads as many
blocks as required from the file into the DSM data blocks. Each block
is 64KB in size. The leader parses the data to identify lines, reusing
the existing line-identification logic from CopyReadLineText with some
changes. The leader checks whether a free line entry is available to
copy the information into; if none is free, it waits until the required
entry is freed up by a worker and then copies the identified line's
information (offset & line size) into the DSM line entries. This
process is repeated until the complete file is processed. Meanwhile,
the workers cache the lines (50 at a time) into local memory and
release the entries back to the leader for further populating. Each
worker processes the lines it has cached and inserts them into the
table. The leader does not participate in the insertion of data; its
only responsibility is to identify the lines as fast as possible for
the workers to do the actual copy operation. The leader waits until all
the populated lines are processed by the workers and then exits.
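As a rough aid to reading the patch, here is a minimal C sketch of the
two shared-memory structures the description above refers to. It is
illustrative only: the type and field names mirror the patch
(ParallelCopyLineBoundary, ParallelCopyDataBlock), but the exact
declarations and the ring/block capacities used here are assumptions,
not the patch's verbatim definitions.
#include "postgres.h"
#include "port/atomics.h"
#define PC_DATA_BLOCK_SIZE	65536	/* 64KB, matches DATA_BLOCK_SIZE */
#define PC_RINGSIZE		10240	/* assumed capacity of the line ring */
#define PC_MAX_BLOCKS		1024	/* assumed number of DSM data blocks */
/* One line identified by the leader: where it starts and how big it is. */
typedef struct PCLineBoundarySketch
{
	uint32		first_block;	/* data block holding the line's start */
	uint32		start_offset;	/* offset of the line within that block */
	uint64		cur_lineno;	/* line number, kept for error context */
	pg_atomic_uint32 line_size;	/* -1 = entry free, 0 = empty line */
	pg_atomic_uint32 line_state;	/* LINE_INIT ... LINE_WORKER_PROCESSED */
} PCLineBoundarySketch;
/* One 64KB chunk of raw input, shared between the leader and workers. */
typedef struct PCDataBlockSketch
{
	pg_atomic_uint32 unprocessed_line_parts;	/* line parts not yet consumed */
	uint32		following_block;	/* next block when a line spans blocks */
	bool		curr_blk_completed;	/* leader is done filling this block */
	uint8		skip_bytes;	/* unused bytes at the block's tail */
	char		data[PC_DATA_BLOCK_SIZE];	/* raw file contents */
} PCDataBlockSketch;
In this scheme, line_size doubles as the synchronization point: the
leader waits for an entry's line_size to return to -1 before reusing
it, and a worker waits for a non-negative line_size (or
curr_blk_completed) before copying data out of the blocks.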
---
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 13 +
src/backend/commands/copy.c | 896 ++++++++++++++++++++++++++++++++++++--
src/include/access/xact.h | 1 +
src/tools/pgindent/typedefs.list | 1 +
5 files changed, 873 insertions(+), 51 deletions(-)
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 537913d..28f3a98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 905dc7d..66f7236 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,19 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, all the workers must use the same transaction id.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 34c657c..e8a89a4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -538,9 +553,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -553,13 +572,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || \
+ (IsParallelCopy() && !IsLeader()))) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -618,22 +664,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, line_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -681,8 +743,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -836,6 +902,137 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check whether the trigger functions are
+ * parallel safe. Return false if any of the triggers has a parallel-unsafe
+ * function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine the parallel safety of the volatile
+ * expressions in the default clauses of column definitions or in the WHERE
+ * clause; return true if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any column has a volatile default expression; if so, and it is
+ * not parallel safe, parallelism is not allowed. For instance, if there
+ * are serial/bigserial columns, their associated nextval() default
+ * expression is parallel unsafe, so parallelism should not be allowed.
+ * (Non-parallel copy does not need to check volatile functions such as
+ * nextval().)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine the insert mode: single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if the copy is into a foreign table or a temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check that any volatile expressions present are parallel safe. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -865,6 +1062,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -878,6 +1077,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1108,7 +1320,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
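+ /* A zero-size line has no data to copy; just mark it as processed. */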
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * The wait loop at the bottom of this loop may have exited because
+ * data_blk_ptr->curr_blk_completed was set, in which case the dataSize
+ * read there may be stale. If curr_blk_completed is set and the line
+ * is complete, line_size will have been updated, so read line_size
+ * again to be sure whether this is a complete or a partial block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If complete data is present in current block use
+ * dataSize - copiedSize, or copy the whole block from
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset. For the first copy, copy from the offset. For
+ * the subsequent copy the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is present in current block lineInfo.line_size
+ * will be updated. If the data is spread across the blocks either
+ * of lineInfo.line_size or data_blk_ptr->curr_blk_completed can
+ * be updated. lineInfo.line_size will be updated if the complete
+ * read is finished. data_blk_ptr->curr_blk_completed will be
+ * updated if processing of current block is finished and data
+ * processing is not finished.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for the worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the ring entries so that the
+ * leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1154,6 +1570,7 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1206,6 +1623,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->leader_pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
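+ /* Wait until a worker frees this ring entry (line_size becomes -1). */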
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->leader_pos = (lineBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1231,9 +1676,158 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
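+ /* Start scanning from the entry after the one this worker last processed. */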
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
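+ /*
+ * Lines are claimed in chunks of WORKER_CHUNK_COUNT; if the first
+ * entry of a chunk has already been taken by another worker, skip
+ * the whole chunk.
+ */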
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /* Get a new block for copying data. */
+ while (count < MAX_BLOCKS_COUNT)
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1269,6 +1863,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to point to the shared memory block into
+ * which the file data must be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If raw_buf_ptr <= new_line_size, the new block contains only the
+ * line-terminator characters; the unprocessed count should not be
+ * increased in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* The line contains only a newline char; an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3675,7 +4409,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
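+ /* In parallel copy, all participants must use the command id shared by the leader. */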
+ CommandId mycid = IsParallelCopy() ? cstate->pcdata->pcshared_info->mycid :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3685,7 +4420,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In the parallel
+ * case, the check is done by the leader, so that any invalid case makes
+ * the COPY FROM command error out in the leader itself, avoiding
+ * launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3874,13 +4616,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3980,6 +4725,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or have
+ * BEFORE/INSTEAD OF triggers, we can't perform the copy operation
+ * with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4394,7 +5149,7 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
@@ -4538,26 +5293,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4805,9 +5569,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through; on subsequent
+ * iterations, reset the index and reuse the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4837,6 +5623,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4891,6 +5682,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5115,9 +5908,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5169,6 +5968,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Set up the line information here, skipping the header line. This
+ * cannot be done at the beginning of the loop, as the file may
+ * contain empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5177,6 +5996,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db19187..584b7ee 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8a79794..86b5c62 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1700,6 +1700,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyLineBoundaries
ParallelCopyLineBoundary
+ParallelCopyLineState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
1.8.3.1
Attachment: 0004-Documentation-for-parallel-copy.patch (application/octet-stream)
From f06311aab275234125ed5b2c1a5c4740db55df3f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:34:56 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note
+ that the number of parallel workers specified in <replaceable
+ class="parameter">integer</replaceable> is not guaranteed to be used
+ during execution; a copy may run with fewer workers than specified,
+ or even with no workers at all. This option is allowed only in
+ <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: 0005-Tests-for-parallel-copy.patch (application/octet-stream)
From 740f39b8a87eb74e74182ed2cc5e8c18bd2ce367 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:37:26 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..a088f72 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 2);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL '2');
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...llel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...allel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...l_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...llel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..13104f4 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL '2');
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 2);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 2);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 2);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 2);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 2);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 2);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL '2');
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2');
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '2', FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL '2');
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2') WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', parallel '2');
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL '2');
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL '2');
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL '2');
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL '2');
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL '2') ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL '2');
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL '2');
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL '2') WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL '2') WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
0006-Parallel-Copy-For-Binary-Format-Files.patch (application/octet-stream)
From 19af24a893547e7cc87adf2384c462a9b2bea188 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 26 Jun 2020 11:43:48 +0530
Subject: [PATCH 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks, each of 64K size.
It also identifies, for each tuple, the data block id, start offset, end
offset and tuple size, and updates this information in the ring data
structure. Workers read the tuple information from the ring data structure
and the actual tuple data from the data blocks in parallel, and insert the
tuples into the table in parallel.
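
As a rough illustration of this handoff, below is a minimal standalone sketch
in plain C with pthreads rather than the patch's DSM machinery (TupleEntry,
the ENTRY_* states and the sizes are illustrative stand-ins, not the patch's
actual definitions): the leader fills data blocks and publishes one
(block id, offset, size) entry per tuple, setting the state last, and each
entry is claimed by exactly one worker via an atomic compare-and-swap.

#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DATA_BLOCK_SIZE 64
#define NTUPLES 16
#define NWORKERS 2

enum { ENTRY_EMPTY, ENTRY_POPULATED, ENTRY_PROCESSED };

typedef struct TupleEntry
{
	uint32_t	block_id;	/* data block the tuple starts in */
	uint32_t	offset;		/* start offset within that block */
	uint32_t	size;		/* tuple size in bytes */
	atomic_int	state;		/* ENTRY_* */
} TupleEntry;

static char blocks[NTUPLES][DATA_BLOCK_SIZE];	/* one block per tuple here */
static TupleEntry ring[NTUPLES];

/* Worker: scan the ring, claiming each unprocessed entry with a CAS. */
static void *
worker(void *arg)
{
	long	id = (long) arg;
	int		i;

	for (i = 0; i < NTUPLES; i++)
	{
		int		expected = ENTRY_POPULATED;

		/* Only one worker can win this compare-and-swap per entry. */
		if (atomic_compare_exchange_strong(&ring[i].state, &expected,
										   ENTRY_PROCESSED))
			printf("worker %ld: block %u offset %u size %u: \"%s\"\n",
				   id, (unsigned) ring[i].block_id, (unsigned) ring[i].offset,
				   (unsigned) ring[i].size,
				   blocks[ring[i].block_id] + ring[i].offset);
	}
	return NULL;
}

int
main(void)
{
	pthread_t	tid[NWORKERS];
	long		w;
	int			i;

	/*
	 * "Leader": fill the data blocks and publish one ring entry per tuple.
	 * state is set last, so a worker never sees a half-built entry.
	 */
	for (i = 0; i < NTUPLES; i++)
	{
		snprintf(blocks[i], DATA_BLOCK_SIZE, "tuple-%d", i);
		ring[i].block_id = (uint32_t) i;
		ring[i].offset = 0;
		ring[i].size = (uint32_t) strlen(blocks[i]);
		atomic_store(&ring[i].state, ENTRY_POPULATED);
	}

	for (w = 0; w < NWORKERS; w++)
		pthread_create(&tid[w], NULL, worker, (void *) w);
	for (w = 0; w < NWORKERS; w++)
		pthread_join(tid[w], NULL);
	return 0;
}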
---
src/backend/commands/copy.c | 697 ++++++++++++++++++++++++++++++++----
1 file changed, 627 insertions(+), 70 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 440ada868d..c4c8078fe1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -219,6 +219,16 @@ typedef struct ParallelCopyLineBuf
uint64 cur_lineno; /* line number for error messages */
}ParallelCopyLineBuf;
+/*
+ * Tuple boundary information used for parallel copy
+ * for binary format files.
+ */
+typedef struct ParallelCopyTupleInfo
+{
+ uint32 offset;
+ uint32 block_id;
+}ParallelCopyTupleInfo;
+
/*
* Parallel copy data information.
*/
@@ -240,6 +250,11 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /*
+ * For binary formatted files
+ */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -397,6 +412,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -697,6 +713,56 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -751,6 +817,16 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static pg_attribute_always_inline bool CopyReadBinaryTupleLeader(CopyState cstate);
+static pg_attribute_always_inline void CopyReadBinaryAttributeLeader(CopyState cstate,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod,
+ uint32 *new_block_pos, int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size);
+static pg_attribute_always_inline bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static pg_attribute_always_inline Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void AdjustFieldInfo(CopyState cstate, uint8 mode);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -769,6 +845,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -997,8 +1074,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1299,6 +1376,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1314,7 +1392,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1679,32 +1757,66 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
+ /* binary format */
+ /* For the parallel copy leader, fill in the error
+ * context information here so that, in case of any failure
+ * while determining tuple offsets, the leader
+ * throws the error with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1712,7 +1824,463 @@ ParallelCopyLeader(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * AdjustFieldInfo - gets a new block, updates the
+ * current offset and calculates the skip bytes.
+ * Works in two modes: mode 1 for the field count,
+ * mode 2 for the field size.
+ */
+static void
+AdjustFieldInfo(CopyState cstate, uint8 mode)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 movebytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int readbytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+
+ cstate->pcdata->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ movebytes);
+
+ elog(DEBUG1, "LEADER - field info is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, prev_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, (DATA_BLOCK_SIZE - movebytes));
+
+ elog(DEBUG1, "LEADER - bytes read from file after field info is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (mode == 1)
+ {
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+ }
+ else if(mode == 2)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+ cstate->pcdata->curr_data_block = data_block;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->raw_buf_index = 0;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from a binary formatted file
+ * into data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data block contents.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ int readbytes = 0;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ uint32 new_block_pos;
+ uint32 line_size = -1;
+ ParallelCopyTupleInfo curr_tuple_start_info;
+ ParallelCopyTupleInfo curr_tuple_end_info;
+
+ curr_tuple_start_info.block_id = -1;
+ curr_tuple_start_info.offset = -1;
+ curr_tuple_end_info.block_id = -1;
+ curr_tuple_end_info.offset = -1;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->raw_buf_index = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->pcdata->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ AdjustFieldInfo(cstate, 1);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+ new_block_pos = pcshared_info->cur_block_pos;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ CopyReadBinaryAttributeLeader(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &new_block_pos,
+ m,
+ &curr_tuple_start_info,
+ &curr_tuple_end_info,
+ &line_size);
+
+ cstate->cur_attname = NULL;
+ }
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeLeader - leader identifies boundaries/offsets
+ * for each attribute/column; it moves on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos,
+ int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
+{
+ int32 fld_size;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if (m == 0)
+ {
+ tuple_start_info_ptr->block_id = pcshared_info->cur_block_pos;
+ /* raw_buf_index would have been advanced past the field count bytes
+ in the caller, so move back to store the tuple start offset.
+ */
+ tuple_start_info_ptr->offset = cstate->raw_buf_index - sizeof(int16);
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ AdjustFieldInfo(cstate, 2);
+ *new_block_pos = pcshared_info->cur_block_pos;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ uint32 block_pos = *new_block_pos;
+ int readbytes;
+ int32 requiredblocks = 0;
+ int32 remainingbytesincurrdatablock = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+ int i = 0;
+
+ /* The field data can spread across multiple data blocks;
+ * calculate the number of data blocks required and try to get
+ * that many data blocks.
+ */
+ requiredblocks = (int32)(fld_size - remainingbytesincurrdatablock)/(int32)DATA_BLOCK_SIZE;
+
+ if ((fld_size - remainingbytesincurrdatablock)%DATA_BLOCK_SIZE != 0)
+ requiredblocks++;
+
+ i = requiredblocks;
+
+ while(i > 0)
+ {
+ data_block = NULL;
+ prev_data_block = NULL;
+ readbytes = 0;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ readbytes = CopyGetData(cstate, &data_block->data[0], 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file after detecting that tuple is spread across data blocks %d", readbytes);
+
+ /* EOF here most probably means that
+ * the field size is corrupted.
+ */
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->pcdata->curr_data_block = data_block;
+
+ i--;
+ }
+
+ cstate->raw_buf_index = fld_size - (((requiredblocks - 1) * DATA_BLOCK_SIZE) +
+ remainingbytesincurrdatablock);
+
+ /* raw_buf_index should never exceed the data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+
+ *new_block_pos = block_pos;
+ }
+
+ if (m == cstate->max_fields - 1)
+ {
+ tuple_end_info_ptr->block_id = *new_block_pos;
+ tuple_end_info_ptr->offset = cstate->raw_buf_index - 1;
+
+ if (tuple_start_info_ptr->block_id == tuple_end_info_ptr->block_id)
+ {
+ elog(DEBUG1,"LEADER - tuple lies in a single data block");
+
+ *line_size = tuple_end_info_ptr->offset - tuple_start_info_ptr->offset + 1;
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+ }
+ else
+ {
+ uint32 following_block_id = pcshared_info->data_blocks[tuple_start_info_ptr->block_id].following_block;
+
+ elog(DEBUG1,"LEADER - tuple is spread across data blocks");
+
+ *line_size = DATA_BLOCK_SIZE - tuple_start_info_ptr->offset -
+ pcshared_info->data_blocks[tuple_start_info_ptr->block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+
+ while (following_block_id != tuple_end_info_ptr->block_id)
+ {
+ *line_size = *line_size + DATA_BLOCK_SIZE - pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ *line_size = *line_size + tuple_end_info_ptr->offset + 1;
+ }
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ /* The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block. */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads the attribute data from the
+ * data blocks, moving on to the next data block if the attribute/column
+ * is spread across data blocks.
+ */
+static pg_attribute_always_inline Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+
+ memcpy(&cstate->attribute_buf.data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ (DATA_BLOCK_SIZE - cstate->raw_buf_index));
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ memcpy(&cstate->attribute_buf.data[DATA_BLOCK_SIZE - cstate->raw_buf_index],
+ &cstate->pcdata->curr_data_block->data[0], (fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index)));
+ cstate->raw_buf_index = fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1797,7 +2365,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -5476,60 +6046,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
--
2.25.1
Hi,
I have made a few changes in the 0003 & 0005 patches; there were a couple of
bugs in the 0003 patch & some random test failures in the 0005 patch.
Attached are new patches which include the fixes for the same.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Fri, Jun 26, 2020 at 2:34 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Hi,
The 0006 patch has some code cleanup and fixes for issues found during internal testing.
Attaching the latest patches herewith.
The order of applying the patches remains the same, i.e. from 0001 to 0006.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Copy-code-readjustment-to-support-parallel-copy.patch (text/x-patch; charset=US-ASCII)
From a6d82e18e41f5c5b9310699ccab4ce786984a8d6 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 23 Jun 2020 06:49:22 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch slightly readjusts the copy code so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is
moved from CopyReadLine to CopyReadLineText. This change was required because,
in the parallel copy case, record identification and record updates are done
in CopyReadLineText, and the newline characters must be removed before the
record information is updated in shared memory.
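
As a standalone illustration of the EOL trimming that moves with this change
(the patch itself does it via the CLEAR_EOL_FROM_COPIED_DATA macro below),
here is a minimal sketch in plain C; the EolType enum is an illustrative
stand-in for the eol_type values kept in CopyStateData:

#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { EOL_NL, EOL_CR, EOL_CRNL } EolType;

/* Trim the detected EOL marker in place, leaving only the line payload. */
static void
clear_eol(char *line, int *len, EolType eol_type)
{
	switch (eol_type)
	{
		case EOL_NL:
		case EOL_CR:
			assert(*len >= 1);
			line[--(*len)] = '\0';	/* drop the trailing \n or \r */
			break;
		case EOL_CRNL:
			assert(*len >= 2);
			*len -= 2;				/* drop the trailing \r\n */
			line[*len] = '\0';
			break;
	}
}

int
main(void)
{
	char	buf[] = "a,b,c\r\n";
	int		len = (int) strlen(buf);

	clear_eol(buf, &len, EOL_CRNL);
	printf("\"%s\" (%d bytes)\n", buf, len);	/* "a,b,c" (5 bytes) */
	return 0;
}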
---
src/backend/commands/copy.c | 372 ++++++++++++++++++++++++++------------------
1 file changed, 221 insertions(+), 151 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d..65a504f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -392,6 +476,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -793,6 +879,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -804,8 +891,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1463,7 +1553,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1629,6 +1718,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1748,12 +1855,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2646,32 +2747,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2708,27 +2788,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2766,9 +2825,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3261,7 +3372,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3316,30 +3427,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3349,31 +3445,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3451,6 +3524,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3860,60 +3981,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4277,6 +4346,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
0002-Framework-for-leader-worker-in-parallel-copy.patch (text/x-patch; charset=US-ASCII)
From 3a2b807acb145ab0dbd12c9c3270da955e82a72c Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 24 Jun 2020 13:51:56 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy
This patch has the framework for the parallel copy data structures: leader
initialization, worker initialization, shared memory updates, starting workers,
waiting for workers, and worker exit.
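
As a rough sketch of this lifecycle, below is a minimal standalone example in
plain C using pthreads in place of PostgreSQL's DSM/background-worker
machinery (SharedInfo and its fields are illustrative stand-ins that loosely
mirror ParallelCopyShmInfo's is_read_in_progress, populated and processed
fields): the leader publishes work and finally clears is_read_in_progress,
while workers claim lines and exit once the leader is done and the queue is
drained.

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NWORKERS 2
#define NLINES 10

typedef struct SharedInfo
{
	atomic_bool		is_read_in_progress;	/* is the leader still reading? */
	atomic_ulong	populated;				/* lines published by the leader */
	atomic_ulong	processed;				/* lines claimed by workers */
} SharedInfo;

static SharedInfo shm;

static void *
worker_main(void *arg)
{
	(void) arg;
	for (;;)						/* busy-waits, purely for brevity */
	{
		unsigned long done = atomic_load(&shm.processed);

		if (done < atomic_load(&shm.populated))
		{
			/* Claim exactly one line by advancing the processed counter. */
			if (atomic_compare_exchange_strong(&shm.processed, &done, done + 1))
				printf("processing line %lu\n", done + 1);
		}
		else if (!atomic_load(&shm.is_read_in_progress) &&
				 atomic_load(&shm.processed) >= atomic_load(&shm.populated))
			break;					/* leader finished and queue is drained */
	}
	return NULL;
}

int
main(void)
{
	pthread_t	tids[NWORKERS];
	int			i;

	atomic_store(&shm.is_read_in_progress, true);
	for (i = 0; i < NWORKERS; i++)
		pthread_create(&tids[i], NULL, worker_main, NULL);

	/* Leader: publish the lines, then signal end of input. */
	for (i = 0; i < NLINES; i++)
		atomic_fetch_add(&shm.populated, 1);
	atomic_store(&shm.is_read_in_progress, false);

	for (i = 0; i < NWORKERS; i++)
		pthread_join(tids[i], NULL);
	return 0;
}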
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 865 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 878 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690..09e7a19 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 65a504f..c906655 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,138 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* should be mod of RINGSIZE */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block, following_block
+ * will have the position from where the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader finds that this block can be read
+ * safely by the worker. It helps the worker start processing a line that
+ * is spread across many blocks early, so the worker need not wait for
+ * the complete line to be read.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is a common data structure between the leader &
+ * worker; it is protected by the following access sequence in the leader & worker.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker should choose one line for processing; this is handled
+ * using pg_atomic_compare_exchange_u32, the worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled completely,
+ * 0 means an empty line, >0 means the line is filled with line_size data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by worker (some records will be filtered based on
+ * where condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -225,8 +354,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure helps in storing the common data from CopyStateData that are
+ * required by the workers. This information will then be allocated and stored
+ * into the DSM for the worker to retrieve and copy it to CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure will help in converting a List data type into the below
+ * structure format with the count having the number of elements in the list and
+ * the info having the List elements appended contigously. This converted
+ * structure will be allocated in shared memory and stored in DSM for the worker
+ * to retrieve and later convert it back to List data type.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +443,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
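+
+/*
+ * Each variable-length field of CopyStateData (strings and flattened lists)
+ * gets its own TOC key, since the fixed-size ParallelCopyCommonKeyData
+ * copied into the DSM cannot carry pointers that remain valid in another
+ * process.
+ */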
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -476,10 +680,596 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char *LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+
+/*
+ * CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ {
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate space for the full transaction id shared with the workers. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "DSM segments not available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if (cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * Each worker applies the where clause, converts the line into column
+ * values, adds default/null values for columns missing from the record,
+ * finds the partition if the table is partitioned, invokes before row
+ * insert triggers, handles constraints and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it with the workers. It
+ * executes the before statement trigger, if one is present, and then reads
+ * the data from the input file into the shared data blocks, loading blocks
+ * as and when required. While loading, it parses the contents to identify
+ * each line based on line breaks and records the line's information in a
+ * free ParallelCopyLineBoundary entry; if no entry is free, it waits till a
+ * worker frees one up. The workers then pick up this line information and
+ * insert the lines into the table. This is repeated till the complete file
+ * is processed. Finally, the leader waits till all the populated lines are
+ * processed by the workers and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
+
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -1149,6 +1939,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1158,7 +1949,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1207,6 +2015,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1375,6 +2184,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1727,12 +2583,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for
+ * copy from operation. This is a helper function for the BeginCopy &
+ * ParallelWorkerInitialization functions.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c65a552..8a79794 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1698,6 +1698,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
1.8.3.1
Attachment: 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch (text/x-patch)
From 0286eb202922101b488e5b7414fb2fe2ac302cc8 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Tue, 23 Jun 2020 07:15:43 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from file/STDIN to a table. This adds a PARALLEL option to the COPY FROM
command where the user can specify the number of workers that can be used
to perform the COPY FROM command. Specifying zero as the number of workers
disables parallelism.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
the before statement triggers, if any exist. The leader populates the DSM
line entries, each of which includes a start offset and line size; while
populating the lines it reads as many blocks as required from the file into
the DSM data blocks. Each block is of 64K size. The leader parses the data to
identify lines, reusing the existing logic from CopyReadLineText with some
changes. The leader checks if a free line entry is available to copy the
information; if there is none, it waits till the required entry is freed up
by a worker, and then copies the identified line's information (offset & line
size) into the DSM line entries. This process is repeated till the complete
file is processed. Simultaneously, the workers cache the lines (50 at a time)
into local memory and release the line entries to the leader for further
populating. Each worker processes the lines that it cached and inserts them
into the table. The leader does not participate in the insertion of data; the
leader's only responsibility is to identify the lines as fast as possible for
the workers to do the actual copy operation. The leader waits till all the
populated lines are processed by the workers and then exits.
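As an illustrative example (the table and file names here are hypothetical),
a session could run:
    COPY mytable FROM '/tmp/data.csv' WITH (FORMAT csv, PARALLEL 3);
which would launch up to 3 parallel workers to load the file.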
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 900 ++++++++++++++++++++++++++--
src/backend/libpq/pqmq.c | 3 +-
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 908 insertions(+), 53 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In case of parallel copy, the
+ * command id would already have been set by AssignCommandIdForWorker, so
+ * in a parallel worker call GetCurrentCommandId with used = false; the
+ * command id has already been marked as used by the leader.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 537913d..28f3a98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 905dc7d..ed4009e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, the transaction id of the leader will be used by the
+ * workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, the command id of the leader will be used by the
+ * workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c906655..c7af30b 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
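+
+/*
+ * A line entry normally moves through these states in order: LINE_INIT ->
+ * LINE_LEADER_POPULATING -> LINE_LEADER_POPULATED ->
+ * LINE_WORKER_PROCESSING -> LINE_WORKER_PROCESSED, after which the entry is
+ * reused for a later line.
+ */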
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -538,9 +553,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -553,13 +572,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
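+
+/*
+ * Both the leader and the workers use COPY_WAIT_TO_PROCESS as a simple 1ms
+ * polling backoff whenever a needed ring entry or data block is not yet
+ * available, rather than a condition-variable style handshake.
+ */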
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || !IsLeader())) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -618,22 +664,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, line_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -681,8 +743,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -836,6 +902,137 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check if the trigger functions are
+ * parallel safe. Return false if any one of the triggers has a parallel
+ * unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine parallel safety of volatile expressions
+ * in default clause of column definition or in where clause and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If so,
+ * and it is not parallel safe, then parallelism is not allowed. For
+ * instance, if any serial/bigserial column has the parallel unsafe
+ * nextval() as its default expression, parallelism should not be allowed.
+ * (In non-parallel copy, volatile functions such as nextval() are not
+ * checked.)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
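+
+/*
+ * Note: this mirrors the insert-method decision made in CopyFrom; parallel
+ * copy is disallowed whenever CIM_SINGLE would be chosen there (see
+ * IsParallelCopyAllowed below).
+ */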
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there are after statement, instead of row, or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -865,6 +1062,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -878,6 +1077,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1108,7 +1320,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
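+ *
+ * As an illustrative example, a line that starts near the end of one 64KB
+ * data block and spills into two more blocks is assembled in three steps:
+ * copy from the start offset to the end of the first block, copy the
+ * complete middle block, and copy the remaining dataSize - copiedSize bytes
+ * from the final block.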
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * The wait loop below may have exited because
+ * data_blk_ptr->curr_blk_completed was set while the line_size read
+ * there was still an old value. If the block is completed and the line
+ * is complete, line_size will have been set by now, so read line_size
+ * again to be sure whether we have the complete line or only a partial
+ * block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If complete data is present in current block use
+ * dataSize - copiedSize, or copy the whole block from
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset. For the first copy, copy from the offset. For
+ * the subsequent copy the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is present in current block lineInfo.line_size
+ * will be updated. If the data is spread across the blocks either
+ * of lineInfo.line_size or data_blk_ptr->curr_blk_completed can
+ * be updated. lineInfo.line_size will be updated if the complete
+ * read is finished. data_blk_ptr->curr_blk_completed will be
+ * updated if processing of current block is finished and data
+ * processing is not finished.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue loading data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1154,6 +1570,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1206,6 +1624,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->leader_pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->leader_pos = (lineBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
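+
+/*
+ * Note: leader_pos is read and advanced without atomics because the leader
+ * is the only producer; workers synchronize on the atomic line_size and
+ * line_state fields instead.
+ */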
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1231,9 +1677,161 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetLinePosition - return the line position that worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
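+
+ /*
+ * Workers claim lines in chunks of WORKER_CHUNK_COUNT. If the first
+ * line of a chunk is already being processed (or has been processed)
+ * by another worker, skip over the entire chunk.
+ */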
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data. Skip the current block, as it will
+ * still have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1269,6 +1867,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
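+
+ /*
+ * The bytes after raw_buf_ptr were not consumed as part of a completed
+ * line in this block; record them as skip_bytes so that workers skip
+ * this trailing portion of the block when assembling lines.
+ */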
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If raw_buf_ptr <= new_line_size, the new block contains only the
+ * newline character(s), so the unprocessed count should not be
+ * increased in this case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline char is present; an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger, this will be
+ * executed for parallel copy by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3702,7 +4440,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3712,7 +4451,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In case of
+ * parallel copy, this check is done by the leader, so that if any
+ * invalid case exists, the COPY FROM command will error out from the
+ * leader itself, avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3901,13 +4647,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -4007,6 +4756,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * have BEFORE/INSTEAD OF triggers, we can't perform the copy
+ * operation with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4421,7 +5180,7 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
@@ -4565,26 +5324,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4832,9 +5600,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time; on subsequent iterations,
+ * reset the index and reuse the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4864,6 +5654,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4918,6 +5713,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ ©_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5142,9 +5939,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5196,6 +5999,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line information here; this cannot
+ * be done at the beginning, as the file may contain empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5204,6 +6027,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
index 743d24c..21ef87f 100644
--- a/src/backend/libpq/pqmq.c
+++ b/src/backend/libpq/pqmq.c
@@ -168,7 +168,8 @@ mq_putmessage(char msgtype, const char *s, size_t len)
if (result != SHM_MQ_WOULD_BLOCK)
break;
- (void) WaitLatch(MyLatch, WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
+ (void) WaitLatch(MyLatch,
+ WL_TIMEOUT | WL_LATCH_SET | WL_EXIT_ON_PM_DEATH, 0,
WAIT_EVENT_MQ_PUT_MESSAGE);
ResetLatch(MyLatch);
CHECK_FOR_INTERRUPTS();
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db19187..20dafb7 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 8a79794..86b5c62 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1700,6 +1700,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyLineBoundaries
ParallelCopyLineBoundary
+ParallelCopyLineState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
1.8.3.1
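
To make the leader/worker handoff in the 0003 patch above easier to review,
here is a minimal sketch of the line-boundary protocol. The struct fields and
the LINE_* state names are taken from the patch; the two helper functions and
the compare-and-swap claim step are illustrative assumptions, not patch code.

#include "postgres.h"
#include "port/atomics.h"

/* States a ring entry moves through (names as in the 0003 patch). */
typedef enum ParallelCopyLineState
{
	LINE_LEADER_POPULATING,		/* leader is still filling in this entry */
	LINE_LEADER_POPULATED,		/* entry is complete, a worker may take it */
	LINE_WORKER_PROCESSED		/* worker is done, entry can be reused */
} ParallelCopyLineState;

/* One entry of the shared ring (fields as referenced by the patch). */
typedef struct ParallelCopyLineBoundary
{
	uint32		first_block;	/* data block holding the start of the line */
	uint32		start_offset;	/* offset of the line within that block */
	pg_atomic_uint32 line_size; /* -1 until the leader has seen the full line */
	pg_atomic_uint32 line_state;	/* ParallelCopyLineState */
} ParallelCopyLineBoundary;

/*
 * Leader: once the end of a line is found, publish the size first and the
 * state last, so a worker can never observe a populated entry that still
 * carries a stale size.
 */
static void
LeaderPublishLine(ParallelCopyLineBoundary *entry, uint32 line_size)
{
	pg_atomic_write_u32(&entry->line_size, line_size);
	pg_atomic_write_u32(&entry->line_state, LINE_LEADER_POPULATED);
}

/*
 * Worker: claim a populated entry; the compare-and-swap guarantees each
 * line goes to exactly one worker.  (Hypothetical claim step - in the
 * patch, GetLinePosition does the real work.)
 */
static bool
WorkerTryClaimLine(ParallelCopyLineBoundary *entry, uint32 claimed_state)
{
	uint32		expected = LINE_LEADER_POPULATED;

	return pg_atomic_compare_exchange_u32(&entry->line_state, &expected,
										  claimed_state);
}

/*
 * Worker: after copying the line out of the data blocks, release the entry
 * for reuse (this mirrors what the patch does after processing a tuple).
 */
static void
WorkerFinishLine(ParallelCopyLineBoundary *entry)
{
	entry->start_offset = -1;
	pg_atomic_write_u32(&entry->line_size, -1);
	pg_atomic_write_u32(&entry->line_state, LINE_WORKER_PROCESSED);
}
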
Attachment: 0004-Documentation-for-parallel-copy.patch (text/x-patch)
From 8db83cde4430300d40b926a02d8c3cb6e9847fa5 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:34:56 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all. This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
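
As a quick usage illustration of the option documented above (the table and
file names here are hypothetical, invented only for the example):

-- Request up to 4 parallel workers; as documented above, the server may
-- silently use fewer workers, or fall back to a non-parallel copy.
COPY lineitem FROM '/tmp/lineitem.csv' WITH (FORMAT csv, PARALLEL 4);
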
Attachment: 0005-Tests-for-parallel-copy.patch (text/x-patch)
From 9eb6f57ba4e4bdf9f63eaab348c997fdfb7d0e98 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:37:26 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..ef75148 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..032cea9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: 0006-Parallel-Copy-For-Binary-Format-Files.patch (text/x-patch)
From 92156be05e28391d587a08cc11e318bcb8bab1ef Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 26 Jun 2020 11:43:48 +0530
Subject: [PATCH 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks, each of 64K size.
It also identifies each tuple's data block id, start offset, end offset and
tuple size, and updates this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks in parallel, and insert the tuples
into the table in parallel.
---
src/backend/commands/copy.c | 697 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 627 insertions(+), 70 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c7af30b..f1d5539 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -220,6 +220,16 @@ typedef struct ParallelCopyLineBuf
}ParallelCopyLineBuf;
/*
+ * Tuple boundary information used by parallel copy for
+ * binary format files.
+ */
+typedef struct ParallelCopyTupleInfo
+{
+ uint32 offset;
+ uint32 block_id;
+}ParallelCopyTupleInfo;
+
+/*
* Parallel copy data information.
*/
typedef struct ParallelCopyData
@@ -240,6 +250,11 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /*
+ * For binary formatted files
+ */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -397,6 +412,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -697,6 +713,56 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -751,6 +817,16 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static pg_attribute_always_inline bool CopyReadBinaryTupleLeader(CopyState cstate);
+static pg_attribute_always_inline void CopyReadBinaryAttributeLeader(CopyState cstate,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod,
+ uint32 *new_block_pos, int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size);
+static pg_attribute_always_inline bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static pg_attribute_always_inline Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void AdjustFieldInfo(CopyState cstate, uint8 mode);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -769,6 +845,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -997,8 +1074,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1299,6 +1376,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1314,7 +1392,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1680,32 +1758,66 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
+ /* binary format */
+ /*
+ * For the parallel copy leader, fill in the error context
+ * information here, so that in case of any failure while
+ * determining tuple offsets, the leader would throw the
+ * errors with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1713,7 +1825,463 @@ ParallelCopyLeader(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * AdjustFieldInfo - gets a new block, updates the current offset and
+ * calculates the skip bytes. Works in two modes: mode 1 for the
+ * field count and mode 2 for the field size.
+ */
+static void
+AdjustFieldInfo(CopyState cstate, uint8 mode)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 movebytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int readbytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+
+ cstate->pcdata->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ movebytes);
+
+ elog(DEBUG1, "LEADER - field info is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, prev_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, (DATA_BLOCK_SIZE - movebytes));
+
+ elog(DEBUG1, "LEADER - bytes read from file after field info is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (mode == 1)
+ {
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+ }
+ else if(mode == 2)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+ cstate->pcdata->curr_data_block = data_block;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->raw_buf_index = 0;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from a binary formatted file
+ * into data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data blocks' data.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ int readbytes = 0;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ uint32 new_block_pos;
+ uint32 line_size = -1;
+ ParallelCopyTupleInfo curr_tuple_start_info;
+ ParallelCopyTupleInfo curr_tuple_end_info;
+
+ curr_tuple_start_info.block_id = -1;
+ curr_tuple_start_info.offset = -1;
+ curr_tuple_end_info.block_id = -1;
+ curr_tuple_end_info.offset = -1;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->raw_buf_index = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->pcdata->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ AdjustFieldInfo(cstate, 1);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+ new_block_pos = pcshared_info->cur_block_pos;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ CopyReadBinaryAttributeLeader(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &new_block_pos,
+ m,
+ &curr_tuple_start_info,
+ &curr_tuple_end_info,
+ &line_size);
+
+ cstate->cur_attname = NULL;
+ }
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeLeader - leader identifies boundaries/offsets
+ * for each attribute/column, it moves on to next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos,
+ int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
+{
+ int32 fld_size;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if (m == 0)
+ {
+ tuple_start_info_ptr->block_id = pcshared_info->cur_block_pos;
+ /*
+ * raw_buf_index was advanced past the field count bytes in the
+ * caller, so move back to record the tuple start offset.
+ */
+ tuple_start_info_ptr->offset = cstate->raw_buf_index - sizeof(int16);
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ AdjustFieldInfo(cstate, 2);
+ *new_block_pos = pcshared_info->cur_block_pos;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ uint32 block_pos = *new_block_pos;
+ int readbytes;
+ int32 requiredblocks = 0;
+ int32 remainingbytesincurrdatablock = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+ int i = 0;
+
+ /*
+ * The field can be spread across multiple data blocks; calculate
+ * the number of data blocks required and try to get that many
+ * data blocks.
+ */
+ requiredblocks = (int32)(fld_size - remainingbytesincurrdatablock)/(int32)DATA_BLOCK_SIZE;
+
+ if ((fld_size - remainingbytesincurrdatablock)%DATA_BLOCK_SIZE != 0)
+ requiredblocks++;
+
+ i = requiredblocks;
+
+ while(i > 0)
+ {
+ data_block = NULL;
+ prev_data_block = NULL;
+ readbytes = 0;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ readbytes = CopyGetData(cstate, &data_block->data[0], 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file after detecting that tuple is spread across data blocks %d", readbytes);
+
+ /*
+ * EOF here most probably means that the field size was
+ * corrupted.
+ */
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->pcdata->curr_data_block = data_block;
+
+ i--;
+ }
+
+ cstate->raw_buf_index = fld_size - (((requiredblocks - 1) * DATA_BLOCK_SIZE) +
+ remainingbytesincurrdatablock);
+
+ /*
+ * raw_buf_index should never exceed the data block size, as the
+ * required number of data blocks was obtained in the while loop
+ * above.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+
+ *new_block_pos = block_pos;
+ }
+
+ if (m == cstate->max_fields - 1)
+ {
+ tuple_end_info_ptr->block_id = *new_block_pos;
+ tuple_end_info_ptr->offset = cstate->raw_buf_index - 1;
+
+ if (tuple_start_info_ptr->block_id == tuple_end_info_ptr->block_id)
+ {
+ elog(DEBUG1,"LEADER - tuple lies in a single data block");
+
+ *line_size = tuple_end_info_ptr->offset - tuple_start_info_ptr->offset + 1;
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+ }
+ else
+ {
+ uint32 following_block_id = pcshared_info->data_blocks[tuple_start_info_ptr->block_id].following_block;
+
+ elog(DEBUG1,"LEADER - tuple is spread across data blocks");
+
+ *line_size = DATA_BLOCK_SIZE - tuple_start_info_ptr->offset -
+ pcshared_info->data_blocks[tuple_start_info_ptr->block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+
+ while (following_block_id != tuple_end_info_ptr->block_id)
+ {
+ *line_size = *line_size + DATA_BLOCK_SIZE - pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ *line_size = *line_size + tuple_end_info_ptr->offset + 1;
+ }
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting the leader-identified tuple offsets from the ring data structure.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ /*
+ * The case where the field count is spread across data blocks
+ * should never occur, as the leader would have moved it to the
+ * next block.
+ */
+ elog(DEBUG1, "WORKER - field count spread across data blocks should never occur");
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads each attribute/column's
+ * data from the data blocks, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+
+ memcpy(&cstate->attribute_buf.data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ (DATA_BLOCK_SIZE - cstate->raw_buf_index));
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ memcpy(&cstate->attribute_buf.data[DATA_BLOCK_SIZE - cstate->raw_buf_index],
+ &cstate->pcdata->curr_data_block->data[0], (fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index)));
+ cstate->raw_buf_index = fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1801,7 +2369,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -5480,60 +6050,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
--
1.8.3.1
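For reference, the hunk above reassembles a binary field that spans two shared
data blocks: it copies the tail of the current block into attribute_buf,
follows following_block, and copies the remainder from the start of the next
block. Here is a minimal standalone sketch of that split copy in plain C; the
tiny block size, the block1/block2 buffers, and the offsets are invented for
illustration, and none of the patch's shared-memory structures or the
unprocessed_line_parts accounting are used:

#include <stdio.h>
#include <string.h>

#define DATA_BLOCK_SIZE 16		/* tiny block size, for illustration only */

int
main(void)
{
	/* Two consecutive "data blocks", as if chained via following_block. */
	char		block1[DATA_BLOCK_SIZE];
	char		block2[DATA_BLOCK_SIZE];
	char		attribute_buf[2 * DATA_BLOCK_SIZE + 1];

	/*
	 * A 10-byte field starting at offset 12 of block1: 4 bytes fit in
	 * block1, the remaining 6 bytes continue at offset 0 of block2.
	 */
	int			raw_buf_index = 12;
	int			fld_size = 10;
	int			part1;

	memcpy(block1 + raw_buf_index, "ABCD", 4);
	memcpy(block2, "EFGHIJ", 6);

	/* Copy whatever of the field remains in the current block ... */
	part1 = DATA_BLOCK_SIZE - raw_buf_index;
	memcpy(attribute_buf, block1 + raw_buf_index, part1);

	/* ... then switch to the following block and copy the rest. */
	memcpy(attribute_buf + part1, block2, fld_size - part1);
	raw_buf_index = fld_size - part1;	/* next unread byte in block2 */

	attribute_buf[fld_size] = '\0';
	printf("reassembled field: %s, new raw_buf_index: %d\n",
		   attribute_buf, raw_buf_index);
	return 0;
}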
On Wed, Jul 1, 2020 at 2:46 PM vignesh C <vignesh21@gmail.com> wrote:
Hi,
I have made a few changes in the 0003 & 0005 patches: there were a couple of
bugs in the 0003 patch & some random test failures in the 0005 patch.
Attached are new patches which include the fixes for the same.
I have made changes in the 0003 patch to remove the changes made in pqmq.c for
the parallel worker error handling hang issue; this is being discussed
separately in email [1], hence those changes were removed here.
[1]: /messages/by-id/CALDaNm1d1hHPZUg3xU4XjtWBOLCrA+-2cJcLpw-cePZ=GgDVfA@mail.gmail.com
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Copy-code-readjustment-to-support-parallel-copy.patch
From fe65142a5d461cf457ea87abdd62ab88feb62bea Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 23 Jun 2020 06:49:22 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated out into functions/macros; these functions/macros will be used by
the workers in the parallel copy code of the upcoming patches. EOL removal is
moved from CopyReadLine to CopyReadLineText. This change is required because,
in the case of parallel copy, record identification and record updates are
done in CopyReadLineText, and the newline characters must be removed before
the record information is updated in shared memory.
---
src/backend/commands/copy.c | 372 ++++++++++++++++++++++++++------------------
1 file changed, 221 insertions(+), 151 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3e199bd..33ee891 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -392,6 +476,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -793,6 +879,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -804,8 +891,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1463,7 +1553,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1629,6 +1718,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1748,12 +1855,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2642,32 +2743,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2704,27 +2784,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2762,9 +2821,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3257,7 +3368,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3312,30 +3423,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3345,31 +3441,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3447,6 +3520,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3856,60 +3977,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4273,6 +4342,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
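Patch 0001 above moves the EOL stripping into the CLEAR_EOL_FROM_COPIED_DATA
macro so that the terminator can be removed before a line's information is
published to shared memory. A rough sketch of that stripping logic, recast as
a plain C function with a simplified EolType enum standing in for the
CopyStateData field:

#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum
{
	EOL_NL,						/* line ends with \n */
	EOL_CR,						/* line ends with \r */
	EOL_CRNL					/* line ends with \r\n */
} EolType;

/* Strip the trailing EOL marker from buf in place; return the new length. */
static int
clear_eol(char *buf, int len, EolType eol_type)
{
	switch (eol_type)
	{
		case EOL_NL:
			assert(len >= 1 && buf[len - 1] == '\n');
			buf[--len] = '\0';
			break;
		case EOL_CR:
			assert(len >= 1 && buf[len - 1] == '\r');
			buf[--len] = '\0';
			break;
		case EOL_CRNL:
			assert(len >= 2 && buf[len - 2] == '\r' && buf[len - 1] == '\n');
			len -= 2;
			buf[len] = '\0';
			break;
	}
	return len;
}

int
main(void)
{
	char		line[] = "1,foo\r\n";
	int			len = clear_eol(line, (int) strlen(line), EOL_CRNL);

	printf("line=\"%s\" len=%d\n", line, len);	/* line="1,foo" len=5 */
	return 0;
}

Stripping the EOL before a line is published means the stored line size
already excludes the terminator, so workers never see trailing newline bytes.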
0002-Framework-for-leader-worker-in-parallel-copy.patch
From f852a2d53aa2107b90899df9b8c07824b131e498 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 24 Jun 2020 13:51:56 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy
This patch has the framework for parallel copy: the data structures, leader
initialization, worker initialization, shared memory updates, starting
workers, waiting for workers, and workers exiting.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 865 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 878 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690..09e7a19 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 33ee891..0a4d997 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,138 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* should evenly divide RINGSIZE */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed line parts in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data continues into another block, following_block
+ * holds the position from which the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader finds out that this block can be read
+ * safely by the worker. It helps the worker to start processing a line
+ * early when the line is spread across many blocks, so that the worker
+ * need not wait for the complete line to be populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * this is protected by the following sequence in the leader & worker.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32; the worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled
+ * completely, 0 means an empty line, and >0 means the line is filled
+ * with that many bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by worker (some records will be filtered based on
+ * where condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -225,8 +354,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure helps in storing the common data from CopyStateData that are
+ * required by the workers. This information will then be allocated and stored
+ * into the DSM for the worker to retrieve and copy it to CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure helps in converting a List data type into the structure
+ * format below, with count holding the number of elements in the list and
+ * info holding the List elements appended contiguously. This converted
+ * structure is allocated in shared memory and stored in the DSM for the
+ * worker to retrieve and later convert back to a List data type.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +443,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -476,10 +680,596 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+
+/*
+ * CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ *
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ {
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ /* Get the xid before entering parallel mode; no new xids afterwards. */
+ full_transaction_id = GetCurrentFullTransactionId();
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "DSM segments not available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * It handles the where clause, converts the tuple to columns, and adds
+ * default null values for the missing columns that are not present in the
+ * record. It finds the partition if it is a partitioned table, invokes
+ * before row insert triggers, handles constraints, and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it across the workers.
+ * It executes the before statement trigger, if one is present, and reads
+ * the data from the input file into data blocks, loading blocks as and
+ * when required. It traverses the data blocks to identify line breaks;
+ * each such unit of data is called a line. The line information is
+ * populated in ParallelCopyLineBoundary: the leader gets a free entry to
+ * copy the information into, and if there is no free entry it waits until
+ * one is released by a worker. The workers then pick up this information
+ * and insert the lines into the table. This process is repeated until the
+ * complete file is processed. Finally the leader waits until all the
+ * populated lines are processed by the workers, and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -1149,6 +1939,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1158,7 +1949,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1207,6 +2015,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1375,6 +2184,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1727,12 +2583,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy &
+ * ParallelWorkerInitialization function.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1..6a42ac4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1699,6 +1699,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
1.8.3.1
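The ParallelCopyLineBoundary comments in patch 0002 above describe how exactly
one worker claims a populated line by atomically flipping line_state from
LINE_LEADER_POPULATED to LINE_WORKER_PROCESSING. A minimal sketch of that
claim protocol, using C11 atomics in place of the pg_atomic_compare_exchange_u32
API and a struct reduced to what the example needs:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum
{
	LINE_INIT,					/* initial state of line */
	LINE_LEADER_POPULATED,		/* leader completed populating line */
	LINE_WORKER_PROCESSING		/* worker processing line */
};

typedef struct
{
	_Atomic uint32_t line_state;
	uint32_t	first_block;	/* read only after a successful claim */
	uint32_t	start_offset;
} LineBoundary;

/* Returns true if this worker won the race to process the line. */
static bool
claim_line(LineBoundary *line)
{
	uint32_t	expected = LINE_LEADER_POPULATED;

	/* Only one caller can move the state to LINE_WORKER_PROCESSING. */
	return atomic_compare_exchange_strong(&line->line_state, &expected,
										  LINE_WORKER_PROCESSING);
}

int
main(void)
{
	LineBoundary line = {LINE_LEADER_POPULATED, 3, 128};

	if (claim_line(&line))
		printf("claimed line at block %u, offset %u\n",
			   line.first_block, line.start_offset);

	/* A second attempt loses: the line is already being processed. */
	printf("second claim: %s\n", claim_line(&line) ? "won" : "lost");
	return 0;
}

Because the compare-and-swap succeeds for exactly one caller, the fields the
leader wrote before setting line_state (first_block, start_offset, cur_lineno)
are safe to read only after a successful claim, which is the ordering the
patch's comment block prescribes.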
0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
From 47d27d300dea6adf9af6d688044e04770d1d8650 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 6 Jul 2020 14:33:12 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command, with which the user can specify the number of workers to use to
perform the COPY FROM command. Specifying zero as the number of workers
disables parallelism.
The backend to which the COPY FROM query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the COPY FROM
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
before statement triggers, if any exist. The leader populates the DSM lines,
each of which includes the start offset and line size; while populating the
lines, it reads as many blocks as required from the file into the DSM data
blocks. Each block is 64K in size. The leader parses the data to identify a
line; the existing line-identification logic from CopyReadLineText is used
for this, with some changes. The leader checks if a free line entry is
available to copy the information into; if there is no free entry, it waits
until the required entry is freed up by a worker and then copies the
identified line's information (offset & line size) into the DSM lines. This
process is repeated until the complete file is processed. Simultaneously, the
workers cache the lines (50 at a time) into local memory and release the line
entries to the leader for further populating. Each worker processes the lines
it has cached and inserts them into the table. The leader does not
participate in the insertion of data; its only responsibility is to identify
the lines as fast as possible for the workers to do the actual copy
operation. The leader waits until all the populated lines are processed by
the workers and then exits.
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 900 ++++++++++++++++++++++++++--
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 906 insertions(+), 52 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In the case of parallel copy,
+ * the command id would have been set already by calling
+ * AssignCommandIdForWorker, so call GetCurrentCommandId with used set
+ * to false, as marking it used is taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 537913d..28f3a98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 905dc7d..ed4009e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, transaction id of leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, command id of leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0a4d997..048f2d2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+}ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -538,9 +553,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -553,13 +572,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || \
+ (IsParallelCopy() && !IsLeader()))) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -618,22 +664,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, line_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -681,8 +743,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -836,6 +902,137 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check if the trigger functions are
+ * parallel safe. Return false if any one of the triggers has a parallel
+ * unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine the parallel safety of volatile
+ * expressions in the default clauses of column definitions or in the where
+ * clause, and return true if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If yes,
+ * and it is not parallel safe, then parallelism is not allowed. For
+ * instance, if a serial/bigserial column has a parallel-unsafe nextval()
+ * default expression associated with it, parallelism should not be
+ * allowed. (Non-parallel copy does not check nextval() for volatility.)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -865,6 +1062,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -878,6 +1077,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1108,7 +1320,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * It is possible that we reach here because
+ * data_blk_ptr->curr_blk_completed was set, but the dataSize read
+ * earlier is a stale value: once curr_blk_completed is set and the line
+ * is complete, line_size will have been updated. Read line_size again
+ * to be sure whether this block holds the complete line or only part
+ * of it.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the remaining data fits within the current block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole of the
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: the first copy starts at the given offset,
+ * subsequent copies take the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data fits in the current block, lineInfo.line_size will be
+ * updated. If the data is spread across blocks, either
+ * lineInfo.line_size or data_blk_ptr->curr_blk_completed can be
+ * updated: line_size is updated once the complete read has finished,
+ * while curr_blk_completed is updated when processing of the current
+ * block is finished but the data is not yet complete.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so
+ * that the leader can continue populating data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1154,6 +1570,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1206,6 +1624,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->leader_pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->leader_pos = (lineBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1231,9 +1677,161 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetLinePosition - return the line position that worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data; skip the current block, since it
+ * still has some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1269,6 +1867,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory area into which the
+ * file data must be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If new_line_size >= raw_buf_ptr, then the new block holds only
+ * newline char content, and the unprocessed count should not be
+ * increased in this case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* The line contains only a newline char; insert an empty record. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger; for
+ * parallel copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3698,7 +4436,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3708,7 +4447,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check if it is not a parallel copy. In case of parallel
+ * copy, the check is done by the leader, so that if any invalid case
+ * exists the COPY FROM command errors out from the leader itself,
+ * avoiding the launch of workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3897,13 +4643,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -4003,6 +4752,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or have
+ * BEFORE/INSTEAD OF triggers, we can't perform the copy operation
+ * with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4417,7 +5176,7 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
@@ -4561,26 +5320,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4828,9 +5596,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through; on subsequent
+ * iterations, reset the index and re-use the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4860,6 +5650,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4914,6 +5709,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ ©_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5138,9 +5935,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5192,6 +5995,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line here, this cannot be done at
+ * the beginning, as there is a possibility that file contains empty
+ * lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5200,6 +6023,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db19187..20dafb7 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6a42ac4..86a7620 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1701,6 +1701,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyLineBoundaries
ParallelCopyLineBoundary
+ParallelCopyLineState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
1.8.3.1
Attachment: 0004-Documentation-for-parallel-copy.patch (application/x-patch)
From d04485ce9e4768cbab3f2722dadcf032586fda33 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:34:56 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all. This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
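
For reference, usage with the new option would look like this (a
hypothetical table name and file path, assuming the server can read the
file):

COPY lineitem FROM '/tmp/lineitem.dat' WITH (FORMAT csv, PARALLEL 4);

As the documentation text above says, 4 is only an upper bound here; the
copy may run with fewer workers, or entirely in the backend if parallelism
turns out not to be applicable.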
Attachment: 0005-Tests-for-parallel-copy.patch (application/x-patch)
From cd57bede56d817a95ae251ab4b3c543c73bcf9a8 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:37:26 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..ef75148 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..032cea9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
0006-Parallel-Copy-For-Binary-Format-Files.patchapplication/x-patch; name=0006-Parallel-Copy-For-Binary-Format-Files.patchDownload
From 4a1eb9d091eb8cf82e101a1e8db683ebe22cbe88 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 26 Jun 2020 11:43:48 +0530
Subject: [PATCH 6/6] Parallel Copy For Binary Format Files
Leader reads data from the file into the DSM data blocks, each of 64K size.
It also identifies each tuple's data block id, start offset, end offset and
tuple size, and updates this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks in parallel, and insert the tuples
into the table in parallel.
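To make the handoff concrete, here is a minimal single-process sketch of
the ring idea described above. TupleEntry, RING_SIZE, publish_entry and
claim_entry are illustrative names, not the patch's structures; the real
code keeps this state in shared memory and uses pg_atomic_* operations
across the leader and worker processes.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 8

enum { ENTRY_EMPTY, ENTRY_POPULATED, ENTRY_PROCESSING };

typedef struct TupleEntry
{
	uint32_t	block_id;		/* data block holding the tuple start */
	uint32_t	start_offset;	/* offset of the tuple within the block */
	uint32_t	size;			/* total tuple size in bytes */
	atomic_uint	state;			/* EMPTY -> POPULATED -> PROCESSING */
} TupleEntry;

static TupleEntry ring[RING_SIZE];

/*
 * Leader: publish boundary information, setting state last so that a
 * worker never sees a partially written entry.
 */
static void
publish_entry(int pos, uint32_t blk, uint32_t off, uint32_t size)
{
	ring[pos].block_id = blk;
	ring[pos].start_offset = off;
	ring[pos].size = size;
	atomic_store(&ring[pos].state, ENTRY_POPULATED);
}

/* Worker: claim an entry via CAS so exactly one worker processes it. */
static int
claim_entry(int pos)
{
	unsigned	expected = ENTRY_POPULATED;

	return atomic_compare_exchange_strong(&ring[pos].state, &expected,
										  ENTRY_PROCESSING);
}

int
main(void)
{
	publish_entry(0, 3, 128, 57);
	if (claim_entry(0))
		printf("claimed tuple: block %u offset %u size %u\n",
			   (unsigned) ring[0].block_id,
			   (unsigned) ring[0].start_offset,
			   (unsigned) ring[0].size);
	return 0;
}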
---
src/backend/commands/copy.c | 697 +++++++++++++++++++++++++++++++++++++++-----
1 file changed, 627 insertions(+), 70 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 048f2d2..5eafd5d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -220,6 +220,16 @@ typedef struct ParallelCopyLineBuf
}ParallelCopyLineBuf;
/*
+ * Tuple boundary information used for parallel copy
+ * for binary format files.
+ */
+typedef struct ParallelCopyTupleInfo
+{
+ uint32 offset;
+ uint32 block_id;
+}ParallelCopyTupleInfo;
+
+/*
* Parallel copy data information.
*/
typedef struct ParallelCopyData
@@ -240,6 +250,11 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /*
+ * For binary formatted files
+ */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -397,6 +412,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -697,6 +713,56 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -751,6 +817,16 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static pg_attribute_always_inline bool CopyReadBinaryTupleLeader(CopyState cstate);
+static pg_attribute_always_inline void CopyReadBinaryAttributeLeader(CopyState cstate,
+ FmgrInfo *flinfo, Oid typioparam, int32 typmod,
+ uint32 *new_block_pos, int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size);
+static pg_attribute_always_inline bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static pg_attribute_always_inline Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void AdjustFieldInfo(CopyState cstate, uint8 mode);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -769,6 +845,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -997,8 +1074,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1299,6 +1376,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1314,7 +1392,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1680,32 +1758,66 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
+ /* binary format */
+ /* For the parallel copy leader, fill in the error
+ * context information here so that, if any failure
+ * occurs while determining tuple offsets, the leader
+ * throws errors with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1713,7 +1825,463 @@ ParallelCopyLeader(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * AdjustFieldInfo - gets a new block, updates the
+ * current offset and calculates the skip bytes.
+ * Works in two modes: 1 for field count,
+ * 2 for field size.
+ */
+static void
+AdjustFieldInfo(CopyState cstate, uint8 mode)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 movebytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int readbytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+
+ cstate->pcdata->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ movebytes);
+
+ elog(DEBUG1, "LEADER - field info is spread across data blocks - moved %d bytes from current block %u to %u block",
+ movebytes, prev_block_pos, block_pos);
+
+ readbytes = CopyGetData(cstate, &data_block->data[movebytes], 1, (DATA_BLOCK_SIZE - movebytes));
+
+ elog(DEBUG1, "LEADER - bytes read from file after field info is moved to next data block %d", readbytes);
+
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (mode == 1)
+ {
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+ }
+ else if(mode == 2)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+ cstate->pcdata->curr_data_block = data_block;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->raw_buf_index = 0;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from binary formatted file
+ * to data blocks and identifies tuple boundaries/offsets so that workers
+ * can process the data in the data blocks.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ int readbytes = 0;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+ uint32 new_block_pos;
+ uint32 line_size = -1;
+ ParallelCopyTupleInfo curr_tuple_start_info;
+ ParallelCopyTupleInfo curr_tuple_end_info;
+
+ curr_tuple_start_info.block_id = -1;
+ curr_tuple_start_info.offset = -1;
+ curr_tuple_end_info.block_id = -1;
+ curr_tuple_end_info.offset = -1;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->raw_buf_index = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->pcdata->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ AdjustFieldInfo(cstate, 1);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+ new_block_pos = pcshared_info->cur_block_pos;
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ CopyReadBinaryAttributeLeader(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &new_block_pos,
+ m,
+ &curr_tuple_start_info,
+ &curr_tuple_end_info,
+ &line_size);
+
+ cstate->cur_attname = NULL;
+ }
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ curr_tuple_start_info.block_id,
+ curr_tuple_start_info.offset,
+ line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeLeader - leader identifies boundaries/offsets
+ * for each attribute/column, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos,
+ int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
+{
+ int32 fld_size;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if (m == 0)
+ {
+ tuple_start_info_ptr->block_id = pcshared_info->cur_block_pos;
+ /* raw_buf_index has already advanced past the field count bytes
+ in the caller, so move back to store the tuple start offset.
+ */
+ tuple_start_info_ptr->offset = cstate->raw_buf_index - sizeof(int16);
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ AdjustFieldInfo(cstate, 2);
+ *new_block_pos = pcshared_info->cur_block_pos;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ uint32 block_pos = *new_block_pos;
+ int readbytes;
+ int32 requiredblocks = 0;
+ int32 remainingbytesincurrdatablock = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+ int i = 0;
+
+ /* The field data can spread across multiple data blocks;
+ * calculate the number of data blocks required and acquire
+ * that many data blocks.
+ */
+ requiredblocks = (int32)(fld_size - remainingbytesincurrdatablock)/(int32)DATA_BLOCK_SIZE;
+
+ if ((fld_size - remainingbytesincurrdatablock)%DATA_BLOCK_SIZE != 0)
+ requiredblocks++;
+
+ i = requiredblocks;
+
+ while(i > 0)
+ {
+ data_block = NULL;
+ prev_data_block = NULL;
+ readbytes = 0;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ readbytes = CopyGetData(cstate, &data_block->data[0], 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file after detecting that tuple is spread across data blocks %d", readbytes);
+
+ /* EOF here most probably means that
+ * the field size was corrupted.
+ */
+ if (cstate->reached_eof)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->pcdata->curr_data_block = data_block;
+
+ i--;
+ }
+
+ cstate->raw_buf_index = fld_size - (((requiredblocks - 1) * DATA_BLOCK_SIZE) +
+ remainingbytesincurrdatablock);
+
+ /* raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+
+ *new_block_pos = block_pos;
+ }
+
+ if (m == cstate->max_fields - 1)
+ {
+ tuple_end_info_ptr->block_id = *new_block_pos;
+ tuple_end_info_ptr->offset = cstate->raw_buf_index - 1;
+
+ if (tuple_start_info_ptr->block_id == tuple_end_info_ptr->block_id)
+ {
+ elog(DEBUG1,"LEADER - tuple lies in a single data block");
+
+ *line_size = tuple_end_info_ptr->offset - tuple_start_info_ptr->offset + 1;
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+ }
+ else
+ {
+ uint32 following_block_id = pcshared_info->data_blocks[tuple_start_info_ptr->block_id].following_block;
+
+ elog(DEBUG1,"LEADER - tuple is spread across data blocks");
+
+ *line_size = DATA_BLOCK_SIZE - tuple_start_info_ptr->offset -
+ pcshared_info->data_blocks[tuple_start_info_ptr->block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+
+ while (following_block_id != tuple_end_info_ptr->block_id)
+ {
+ *line_size = *line_size + DATA_BLOCK_SIZE - pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ *line_size = *line_size + tuple_end_info_ptr->offset + 1;
+ }
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from the data blocks
+ * after getting leader-identified tuple offsets from the ring data structure.
+ */
+static pg_attribute_always_inline bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tupDesc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ /* The case where the field count is spread across data blocks should
+ never occur, as the leader would have moved it to the next block. */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads and converts the data
+ * for each attribute/column, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static pg_attribute_always_inline Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
+ else
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+
+ memcpy(&cstate->attribute_buf.data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ (DATA_BLOCK_SIZE - cstate->raw_buf_index));
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[cstate->pcdata->curr_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ memcpy(&cstate->attribute_buf.data[DATA_BLOCK_SIZE - cstate->raw_buf_index],
+ &cstate->pcdata->curr_data_block->data[0], (fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index)));
+ cstate->raw_buf_index = fld_size - (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1801,7 +2369,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -5476,60 +6046,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
--
1.8.3.1
On Wed, Jun 24, 2020 at 1:41 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Along with the review comments addressed
patch (0006-Parallel-Copy-For-Binary-Format-Files.patch), also attaching
all other latest series of patches (0001 to 0005) from [1]; the order
of applying patches is from 0001 to 0006.
[1] /messages/by-id/CALDaNm0H3N9gK7CMheoaXkO99g=uAPA93nSZXu0xDarPyPY6sg@mail.gmail.com
Some comments:
+ movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+
+ cstate->pcdata->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
+ memmove(&data_block->data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
+ movebytes);
we can create a local variable and use in place of
cstate->pcdata->curr_data_block.
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ AdjustFieldInfo(cstate, 1);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
Should this be like below, as the remaining size can fit in current block:
if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ AdjustFieldInfo(cstate, 2);
+ *new_block_pos = pcshared_info->cur_block_pos;
+ }
Same like above.
+ movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+
+ cstate->pcdata->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
Instead of the above check, we can have an assert check for movebytes.
+ if (mode == 1)
+ {
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+ }
+ else if(mode == 2)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+ cstate->pcdata->curr_data_block = data_block;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->raw_buf_index = 0;
+ }
This code is common for both, keep in common flow and remove if (mode == 1)
cstate->pcdata->curr_data_block = data_block;
cstate->raw_buf_index = 0;
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
We only copy sizeof(fld_count), Shouldn't we check fld_count !=
cstate->max_fields? Am I missing something here?
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ AdjustFieldInfo(cstate, 2);
+ *new_block_pos = pcshared_info->cur_block_pos;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_size);
+
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ if (fld_size == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("unexpected EOF in COPY data")));
+
+ if (fld_size < -1)
+ ereport(ERROR,
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+ errmsg("invalid field size")));
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
We can keep the check like cstate->raw_buf_index + fld_size < ..., for
better readability and consistency.
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos,
+ int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
flinfo, typioparam & typmod is not used, we can remove the parameter.
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos,
+ int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
I felt this function need not be an inline function.
+ /* binary format */
+ /* for paralle copy leader, fill in the error
There are some typos, run spell check
+ /* raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
There are few places, commenting style should be changed to postgres style
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->raw_buf_index = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->pcdata->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
There are many empty lines, these are not required.
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ AdjustFieldInfo(cstate, 1);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+ new_block_pos = pcshared_info->cur_block_pos;
You can run pg_indent once for the changes.
+ if (mode == 1)
+ {
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+ }
+ else if(mode == 2)
+ {
Could use macros for 1 & 2 for better readability.
+ if (tuple_start_info_ptr->block_id == tuple_end_info_ptr->block_id)
+ {
+ elog(DEBUG1,"LEADER - tuple lies in a single data block");
+
+ *line_size = tuple_end_info_ptr->offset - tuple_start_info_ptr->offset + 1;
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+ }
+ else
+ {
+ uint32 following_block_id = pcshared_info->data_blocks[tuple_start_info_ptr->block_id].following_block;
+
+ elog(DEBUG1,"LEADER - tuple is spread across data blocks");
+
+ *line_size = DATA_BLOCK_SIZE - tuple_start_info_ptr->offset -
+ pcshared_info->data_blocks[tuple_start_info_ptr->block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[tuple_start_info_ptr->block_id].unprocessed_line_parts, 1);
+
+ while (following_block_id != tuple_end_info_ptr->block_id)
+ {
+ *line_size = *line_size + DATA_BLOCK_SIZE - pcshared_info->data_blocks[following_block_id].skip_bytes;
+
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ following_block_id = pcshared_info->data_blocks[following_block_id].following_block;
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ *line_size = *line_size + tuple_end_info_ptr->offset + 1;
+ }
We could calculate the size as we parse and identify one record, if we
do that way this can be removed.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Thanks Vignesh for the review. Addressed the comments in 0006 patch.
we can create a local variable and use in place of
cstate->pcdata->curr_data_block.
Done.
+ if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
+ AdjustFieldInfo(cstate, 1);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
Should this be like below, as the remaining size can fit in current block:
if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
+ {
+ AdjustFieldInfo(cstate, 2);
+ *new_block_pos = pcshared_info->cur_block_pos;
+ }
Same like above.
Yes you are right. Changed.
+ movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;
+
+ cstate->pcdata->curr_data_block->skip_bytes = movebytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (movebytes > 0)
Instead of the above check, we can have an assert check for movebytes.
No, we can't use an assert here. In the edge case where the current data
block is filled exactly to DATA_BLOCK_SIZE, movebytes will be 0, but we
still need to get a new data block. The movebytes > 0 check lets us skip
the memmove in that case.
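A hypothetical standalone sketch (not the patch code) of that edge case:
when raw_buf_index == DATA_BLOCK_SIZE there is nothing to carry over, so
movebytes is 0 and an Assert(movebytes > 0) would fire on a legal state;
the guard simply skips the memmove instead.

#include <string.h>
#include <stdint.h>

#define DATA_BLOCK_SIZE 65536

static void
carry_over(char *next_block, const char *cur_block, uint32_t raw_buf_index)
{
	uint32_t	movebytes = DATA_BLOCK_SIZE - raw_buf_index;

	/* movebytes == 0 is valid: the block was consumed exactly. */
	if (movebytes > 0)
		memmove(next_block, cur_block + raw_buf_index, movebytes);
}

int
main(void)
{
	static char cur[DATA_BLOCK_SIZE], next[DATA_BLOCK_SIZE];

	carry_over(next, cur, DATA_BLOCK_SIZE);		/* edge case, no memmove */
	carry_over(next, cur, DATA_BLOCK_SIZE - 4);	/* carries 4 bytes */
	return 0;
}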
+ if (mode == 1)
+ {
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+ }
+ else if(mode == 2)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ prev_data_block->following_block = block_pos;
+ cstate->pcdata->curr_data_block = data_block;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ cstate->raw_buf_index = 0;
+ }
This code is common for both, keep in common flow and remove if (mode == 1)
cstate->pcdata->curr_data_block = data_block;
cstate->raw_buf_index = 0;
Done.
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
We only copy sizeof(fld_count), Shouldn't we check fld_count != cstate->max_fields? Am I missing something here?
The fld_count != cstate->max_fields check is done after the above checks.
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
+ }
We can keep the check like cstate->raw_buf_index + fld_size < ..., for better readability and consistency.
I think this is okay. It reads naturally: if the bytes available in the
current data block are greater than or equal to fld_size, then the tuple
lies in the current data block.
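A quick worked example of this check with hypothetical numbers (standalone
sketch, not the patch code): with a 64KB block and raw_buf_index at 65530,
6 bytes remain, so a 6-byte field fits in the current block while a 7-byte
field spills into the next one.

#include <stdio.h>
#include <stdint.h>

#define DATA_BLOCK_SIZE 65536

static int
fits_in_current_block(uint32_t raw_buf_index, int32_t fld_size)
{
	return (DATA_BLOCK_SIZE - raw_buf_index) >= (uint32_t) fld_size;
}

int
main(void)
{
	printf("%d\n", fits_in_current_block(65530, 6));	/* 1: fits */
	printf("%d\n", fits_in_current_block(65530, 7));	/* 0: spills */
	return 0;
}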
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos,
+ int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
flinfo, typioparam & typmod is not used, we can remove the parameter.
Done.
+static pg_attribute_always_inline void
+CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, uint32 *new_block_pos,
+ int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
+ ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
I felt this function need not be an inline function.
Yes. Changed.
+ /* binary format */
+ /* for paralle copy leader, fill in the error
There are some typos, run spell check
Done.
+ /* raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
There are few places, commenting style should be changed to postgres style
Changed.
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[block_pos];
+
+ cstate->raw_buf_index = 0;
+
+ readbytes = CopyGetData(cstate, &cstate->pcdata->curr_data_block->data, 1, DATA_BLOCK_SIZE);
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
+
+ if (cstate->reached_eof)
+ return true;
+ }
There are many empty lines, these are not required.
Removed.
+
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
+ new_block_pos = pcshared_info->cur_block_pos;
You can run pg_indent once for the changes.
I ran pg_indent and observed that there are many places getting
modified by it. If we need to run pg_indent on copy.c for parallel
copy alone, then we first need to run it on plain copy.c, take those
changes, and then run it for all the parallel copy files. I think we
had better run pg_indent on all the parallel copy patches once and
for all, maybe just before we finish up all the code reviews.
+ if (mode == 1)
+ {
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+ }
+ else if(mode == 2)
+ {
Could use macros for 1 & 2 for better readability.
Done.
+
+ if (following_block_id == -1)
+ break;
+ }
+
+ if (following_block_id != -1)
+ pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
+
+ *line_size = *line_size + tuple_end_info_ptr->offset + 1;
+ }
We could calculate the size as we parse and identify one record, if we do that way this can be removed.
Done.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Copy-code-readjustment-to-support-parallel-copy.patchapplication/octet-stream; name=0001-Copy-code-readjustment-to-support-parallel-copy.patchDownload
From fe65142a5d461cf457ea87abdd62ab88feb62bea Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 23 Jun 2020 06:49:22 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText. This change is required because, in
parallel copy, record identification and record updates are done in
CopyReadLineText; the newline characters must be removed before the record
information is updated in shared memory.
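For reference, the EOL handling that moves into CopyReadLineText amounts to
the following (a simplified, hypothetical standalone sketch of the same
stripping logic, not the patch's macro): the terminator must be removed
from the copied line before its boundary information is published.

#include <assert.h>
#include <string.h>

typedef enum { EOL_NL, EOL_CR, EOL_CRNL } EolType;

/* Strip the line terminator in place; returns the new length. */
static int
strip_eol(char *line, int len, EolType eol_type)
{
	switch (eol_type)
	{
		case EOL_NL:
		case EOL_CR:
			assert(len >= 1);
			line[--len] = '\0';
			break;
		case EOL_CRNL:
			assert(len >= 2);
			len -= 2;
			line[len] = '\0';
			break;
	}
	return len;
}

int
main(void)
{
	char		buf[] = "1\t11\tfoo\r\n";

	return strip_eol(buf, (int) strlen(buf), EOL_CRNL) == 8 ? 0 : 1;
}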
---
src/backend/commands/copy.c | 372 ++++++++++++++++++++++++++------------------
1 file changed, 221 insertions(+), 151 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3e199bd..33ee891 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -219,7 +222,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -347,6 +349,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -392,6 +476,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -793,6 +879,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -804,8 +891,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1463,7 +1553,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1629,6 +1718,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1748,12 +1855,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2642,32 +2743,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2704,27 +2784,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2762,9 +2821,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3257,7 +3368,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3312,30 +3423,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3345,31 +3441,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /* Set up variables to avoid per-attribute overhead. */
- initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3447,6 +3520,54 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3856,60 +3977,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4273,6 +4342,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
1.8.3.1
0002-Framework-for-leader-worker-in-parallel-copy.patchapplication/octet-stream; name=0002-Framework-for-leader-worker-in-parallel-copy.patchDownload
From f852a2d53aa2107b90899df9b8c07824b131e498 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 24 Jun 2020 13:51:56 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy
This patch adds the framework for the data structures used in parallel copy:
leader initialization, worker initialization, shared memory updates, starting
workers, waiting for workers, and worker exit.
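As a sketch of the lifecycle this framework wires up (it mirrors existing
users of the parallel infrastructure such as parallel vacuum; the struct and
key below are from this patch, error handling omitted):

    ParallelContext *pcxt;
    ParallelCopyShmInfo *shared;

    EnterParallelMode();
    pcxt = CreateParallelContext("postgres", "ParallelCopyMain", nworkers);

    /* Estimate the DSM space needed, then create the segment and TOC. */
    shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
    shm_toc_estimate_keys(&pcxt->estimator, 1);
    InitializeParallelDSM(pcxt);

    /* Allocate and publish the shared state under a well-known key. */
    shared = shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
    shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared);

    LaunchParallelWorkers(pcxt);
    WaitForParallelWorkersToAttach(pcxt);  /* don't hang on failed starts */

    /* ... leader populates lines, workers consume and insert ... */

    WaitForParallelWorkersToFinish(pcxt);
    DestroyParallelContext(pcxt);
    ExitParallelMode();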
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 865 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 878 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690..09e7a19 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 33ee891..0a4d997 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,138 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* RINGSIZE should be a multiple of this */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line continues into another block, following_block holds
+ * the position from which the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader determines that this block can be read
+ * safely by a worker. It lets a worker start processing a line that spreads
+ * across many blocks early, without waiting for the complete line to be
+ * populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
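To illustrate how a line that does not fit in one 64K block is consumed: the
leader chains blocks through following_block, and unprocessed_line_parts
counts how many line fragments still pin a block, so the leader knows when it
can recycle it. A sketch of a reader walking such a chain (variable names
hypothetical, fields as defined above):

    ParallelCopyDataBlock *blk = &data_blocks[first_block];
    uint32 off = start_offset;

    while (bytes_left > 0)
    {
        uint32 avail = (DATA_BLOCK_SIZE - blk->skip_bytes) - off;
        uint32 n = Min(bytes_left, avail);

        memcpy(dest, &blk->data[off], n);
        dest += n;
        bytes_left -= n;

        /* One fewer fragment pins this block; at 0 the leader may reuse it. */
        pg_atomic_sub_fetch_u32(&blk->unprocessed_line_parts, 1);

        blk = &data_blocks[blk->following_block];
        off = 0;    /* subsequent blocks are read from the start */
    }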
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * this is protected by the following sequence in the leader & worker.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32; the worker changes the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is not yet completely
+ * filled, 0 means an empty line, and >0 means the line is filled with that
+ * many bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
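Following the ordering rules above, a worker claims a line with a single
compare-and-swap, which guarantees that exactly one worker wins each entry.
A sketch (the LINE_* states are introduced in a later patch in this series):

    ParallelCopyLineBoundary *line = &ring[pos];
    uint32 expected = LINE_LEADER_POPULATED;

    /* Step 1: line_size != -1 proves the leader has finished its step 2. */
    if (pg_atomic_read_u32(&line->line_size) == -1)
        return false;       /* not yet populated; retry later */

    /* Step 2: exactly one worker flips POPULATED -> PROCESSING. */
    if (!pg_atomic_compare_exchange_u32(&line->line_state, &expected,
                                        LINE_WORKER_PROCESSING))
        return false;       /* another worker won the race */

    /* Step 3: first_block, start_offset and cur_lineno are now safe to read. */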
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by the workers (some records may be filtered out
+ * by the WHERE condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
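A rough back-of-the-envelope check of the fixed DSM footprint these constants
imply (sizes approximate, padding ignored; the elog is only illustrative):

    Size blocks = (Size) MAX_BLOCKS_COUNT * DATA_BLOCK_SIZE;          /* 1000 * 64KB ~= 62MB */
    Size ring   = (Size) RINGSIZE * sizeof(ParallelCopyLineBoundary); /* 10000 * ~24B ~= 240KB */

    elog(DEBUG1, "parallel copy shm: data blocks %zu bytes, ring %zu bytes",
         blocks, ring);

The data blocks dominate, so the segment is on the order of 64MB regardless of
the number of workers.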
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -225,8 +354,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure stores the common data from CopyStateData that is required
+ * by the workers. It is allocated and stored in the DSM, from where each
+ * worker retrieves it and copies it into its own CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure flattens a List into the format below: count holds the
+ * number of elements in the list, and info holds the List elements appended
+ * contiguously. The flattened structure is allocated in shared memory and
+ * stored in the DSM, from where the worker retrieves it and converts it
+ * back into a List.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
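Function addresses cannot be passed through the DSM, since a worker is a
separate process (and under EXEC_BACKEND may map the binary differently), so
the callback is serialized by name and resolved again in the worker. In
essence (shm plumbing elided):

    /* Leader: store the callback's name, not its address. */
    char *fn_name = LookupParallelCopyFnStr(cstate->data_source_cb);

    /* Worker: resolve the name back to a pointer in its own address space. */
    cstate->data_source_cb = LookupParallelCopyFnPtr(fn_name);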
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +443,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -476,10 +680,596 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+
+/*
+ * CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
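For example, an attribute list such as (a, b) round-trips through this
representation as {count = 2, info = "a\0b\0"}. A sketch using the helpers in
this patch (TOC estimation elided):

    /* Leader: flatten the list into the DSM. */
    InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST, attnamelist,
                  ComputeListSize(attnamelist));

    /* Worker: rebuild it as a List of String nodes. */
    List *attlist = NIL;
    RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);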
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Determine the number of workers to use for the parallel copy, initialize
+ * the data structures required by the parallel workers, calculate the size
+ * required in the DSM, load the necessary keys into the DSM, and launch the
+ * specified number of workers.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ {
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "DSM segments not available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * The worker handles the where clause, converts each line into column
+ * values, adds default/null values for columns missing from the record,
+ * finds the partition if the target is a partitioned table, invokes
+ * before-row insert triggers, handles constraints and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it with the workers. It
+ * executes the before-statement trigger, if one is present, and then reads
+ * the input data from the file/stdin into data blocks, block by block, as
+ * required. It scans the blocks to identify line breaks; each identified
+ * line is copied into a free ParallelCopyLineBoundary entry (waiting if no
+ * entry is free), from where the workers pick it up and insert it into the
+ * table. This is repeated till the complete file is processed, after which
+ * the leader waits till all the populated lines are processed by the
+ * workers, and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -1149,6 +1939,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1158,7 +1949,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1207,6 +2015,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1375,6 +2184,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1727,12 +2583,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for
+ * copy from operation. This is a helper function for the BeginCopy &
+ * ParallelWorkerInitialization functions.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1..6a42ac4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1699,6 +1699,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
1.8.3.1
Attachment: 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch (application/octet-stream)
From 47d27d300dea6adf9af6d688044e04770d1d8650 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Mon, 6 Jul 2020 14:33:12 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command, with which the user can specify the number of workers to use to
perform the COPY FROM command. Specifying zero as the number of workers will
disable parallelism.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
before-statement triggers, if any exist. The leader populates the DSM lines,
which include the start offset and line size; while populating the lines it
reads as many blocks as required from the file into the DSM data blocks. Each
block is 64K in size. The leader parses the data to identify lines, reusing
(with some changes) the existing line-identification logic from
CopyReadLineText. The leader checks whether a free line entry is available to
copy the information; if there is no free entry, it waits till the required
entry is freed up by a worker and then copies the identified line's
information (offset & line size) into the DSM lines. This process is repeated
till the complete file is processed. Simultaneously, the workers cache the
lines (50 at a time) in local memory and release the entries to the leader
for further population. Each worker processes the lines it has cached and
inserts them into the table. The leader does not participate in the insertion
of data; its only responsibility is to identify the lines as fast as possible
for the workers, which do the actual copy operation. The leader waits till
all the populated lines are processed by the workers and then exits.
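With this patch a user would write something like COPY mytable FROM
'/path/data.csv' WITH (PARALLEL '4'); each launched worker then runs, in
essence, the following consumption loop (a sketch; GetWorkerLine and
WORKER_CHUNK_COUNT are from this patch):

    for (;;)
    {
        /*
         * Returns the next cached line, refilling the local cache with up
         * to WORKER_CHUNK_COUNT (50) lines when it runs dry; returns true
         * once the leader has finished and no lines remain.
         */
        if (GetWorkerLine(cstate))
            break;

        /*
         * cstate->line_buf now holds one complete line in server encoding;
         * the regular COPY FROM path parses it, applies the WHERE clause,
         * checks constraints and inserts the tuple.
         */
    }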
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 900 ++++++++++++++++++++++++++--
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 906 insertions(+), 52 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In that case the command id has
+ * already been set by calling AssignCommandIdForWorker, so call
+ * GetCurrentCommandId with used passed as false, since marking it used was
+ * taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 537913d..28f3a98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 905dc7d..ed4009e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, transaction id of leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, command id of leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
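The handoff is symmetric: before launching the workers the leader captures
its transaction and command ids into the shared area, and each worker adopts
them at startup, so every insert (including toast inserts) is stamped
identically. Both sides as they appear in this patch series:

    /* Leader, before launching workers: */
    shared_info_ptr->full_transaction_id = GetCurrentFullTransactionId();
    shared_info_ptr->mycid = GetCurrentCommandId(true);

    /* Worker, at startup: */
    AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
    AssignCommandIdForWorker(pcshared_info->mycid, true);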
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 0a4d997..048f2d2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+}ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -538,9 +553,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -553,13 +572,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || \
+ (IsParallelCopy() && !IsLeader()))) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -618,22 +664,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, line_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -681,8 +743,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -836,6 +902,137 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check if all the trigger functions are
+ * parallel safe. Return false if any trigger has a parallel-unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine parallel safety of volatile expressions
+ * in default clause of column definition or in where clause and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any column has a volatile default expression; if so, and it is
+ * not parallel safe, then parallelism is not allowed. For instance, if any
+ * serial/bigserial column has the parallel-unsafe nextval() as its default
+ * expression, parallelism should not be allowed. (Non-parallel copy does
+ * not check volatile functions such as nextval().)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -865,6 +1062,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -878,6 +1077,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * User chosen parallel copy. Determine if the parallel copy is actually
+ * allowed. If not, go with the non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1108,7 +1320,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * We may have come out of the wait loop because
+ * data_blk_ptr->curr_blk_completed was set, in which case the dataSize
+ * read earlier might be stale. If the block is completed and the line is
+ * complete, line_size will have been set; read line_size again to be sure
+ * whether this is a complete or a partial block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If complete data is present in current block use
+ * dataSize - copiedSize, or copy the whole block from
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset. For the first copy, copy from the offset. For
+ * the subsequent copy the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is present in current block lineInfo.line_size
+ * will be updated. If the data is spread across the blocks either
+ * of lineInfo.line_size or data_blk_ptr->curr_blk_completed can
+ * be updated. lineInfo.line_size will be updated if the complete
+ * read is finished. data_blk_ptr->curr_blk_completed will be
+ * updated if processing of current block is finished and data
+ * processing is not finished.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that the
+ * leader can continue loading data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1154,6 +1570,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1206,6 +1624,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->leader_pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->leader_pos = (lineBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1231,9 +1677,161 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetLinePosition - return the line position that worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data; don't check the current block, as
+ * the current block still has some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
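
GetFreeCopyBlock works because unprocessed_line_parts acts as a reference
count: the leader increments it for every line part it publishes into a
block, each worker decrements it once that part is consumed, and the block
can be recycled only at zero. A standalone sketch of that invariant
(illustrative only; field names follow the patch):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct
    {
        _Atomic uint32_t unprocessed_line_parts; /* published minus consumed */
    } DataBlockRef;

    /* Leader: account for one more line part published into blk. */
    static void
    publish_part(DataBlockRef *blk)
    {
        atomic_fetch_add(&blk->unprocessed_line_parts, 1);
    }

    /* Worker: the part has been copied out of blk. */
    static void
    consume_part(DataBlockRef *blk)
    {
        atomic_fetch_sub(&blk->unprocessed_line_parts, 1);
    }

    /* Leader: safe to overwrite blk only when nothing is pending in it. */
    static bool
    block_reusable(DataBlockRef *blk)
    {
        return atomic_load(&blk->unprocessed_line_parts) == 0;
    }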
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1269,6 +1867,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed; a worker can start copying
+ * this data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
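
As I read SetRawBufForLoad, skip_bytes records how many bytes at the end of
a completed block belong to data that continues in (and is re-read from) the
following block, so the usable payload of a completed block is
DATA_BLOCK_SIZE - skip_bytes, and a spanning line continues via
following_block. A small standalone check of that arithmetic (the concrete
values are made up for illustration):

    #include <assert.h>
    #include <stdint.h>

    enum { DATA_BLOCK_SIZE = 65536 };   /* 64K blocks, as in the patch */

    int
    main(void)
    {
        uint32_t copy_buf_len = DATA_BLOCK_SIZE; /* block filled completely */
        uint32_t raw_buf_ptr = 65530;            /* parse position at switch */
        uint32_t skip_bytes = copy_buf_len - raw_buf_ptr;

        /* Six trailing bytes carry over; a worker reading a spanning line
         * takes only the first 65530 bytes of this block and then follows
         * following_block for the rest. */
        assert(skip_bytes == 6);
        assert(DATA_BLOCK_SIZE - skip_bytes == 65530);
        return 0;
    }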
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If new_line_size >= raw_buf_ptr, the new block contains only the
+ * newline char content; the unprocessed count should not be
+ * increased in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline char; an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger; for
+ * parallel copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3698,7 +4436,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3708,7 +4447,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In the case of
+ * parallel copy, this check is done by the leader, so that if any
+ * invalid case exists, the COPY FROM command errors out in the leader
+ * itself, avoiding the launch of workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3897,13 +4643,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -4003,6 +4752,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or have
+ * BEFORE/INSTEAD OF triggers, we can't perform the copy with
+ * parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if a partition has BEFORE/INSTEAD OF triggers, or if the partition is a foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4417,7 +5176,7 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
@@ -4561,26 +5320,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4828,9 +5596,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time; on subsequent passes, reset
+ * the index and reuse the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4860,6 +5650,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4914,6 +5709,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5138,9 +5935,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5192,6 +5995,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line information here; this cannot
+ * be done at the beginning, as the file may contain empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5200,6 +6023,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db19187..20dafb7 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6a42ac4..86a7620 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1701,6 +1701,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyLineBoundaries
ParallelCopyLineBoundary
+ParallelCopyLineState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
1.8.3.1
0004-Documentation-for-parallel-copy.patch
From d04485ce9e4768cbab3f2722dadcf032586fda33 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:34:56 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+       Perform <command>COPY FROM</command> in parallel using <replaceable
+       class="parameter">integer</replaceable> background workers. Note that
+       it is not guaranteed that the number of parallel workers specified in
+       <replaceable class="parameter">integer</replaceable> will be used
+       during execution; the copy may run with fewer workers than specified,
+       or even with none at all. This option is allowed only with
+       <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
0005-Tests-for-parallel-copy.patch
From cd57bede56d817a95ae251ab4b3c543c73bcf9a8 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:37:26 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..ef75148 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..032cea9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
0006-Parallel-Copy-For-Binary-Format-Files.patch
From dbcf9dcfc87d8e5eed4d56b2a079a64258c591c6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 11 Jul 2020 11:39:50 +0530
Subject: [PATCH v10] Parallel Copy For Binary Format Files
Leader reads data from the file into the DSM data blocks, each of 64K size.
It also identifies each tuple's data block id, start offset, end offset and
tuple size, and updates this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks in parallel, and insert the tuples
into the table in parallel.
---
src/backend/commands/copy.c | 655 +++++++++++++++++++++++++++++++-----
1 file changed, 574 insertions(+), 81 deletions(-)
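
For reference while reading the leader code below: a tuple in COPY binary
format is a big-endian int16 field count (-1 marks end of data) followed,
for each field, by an int32 byte length (-1 meaning NULL) and the raw bytes.
Here is a standalone sketch that walks one tuple known to sit wholly inside
a buffer; the real leader must additionally handle tuples spilling across
the 64K data blocks:

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* ntohs/ntohl stand in for pg_ntoh16/32 */

    /* Returns the byte length of the tuple starting at buf, or 0 for the
     * end-of-data marker (fld_count == -1). Illustrative only. */
    static uint32_t
    tuple_size(const char *buf)
    {
        uint32_t off = 0;
        int16_t fld_count;

        memcpy(&fld_count, buf + off, sizeof(fld_count));
        fld_count = (int16_t) ntohs((uint16_t) fld_count);
        if (fld_count == -1)
            return 0;                /* EOF marker, no fields follow */
        off += sizeof(fld_count);

        for (int16_t i = 0; i < fld_count; i++)
        {
            int32_t fld_size;

            memcpy(&fld_size, buf + off, sizeof(fld_size));
            fld_size = (int32_t) ntohl((uint32_t) fld_size);
            off += sizeof(fld_size);
            if (fld_size > 0)        /* -1 means NULL: no data bytes */
                off += (uint32_t) fld_size;
        }
        return off;
    }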
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 048f2d2cb6..a4e4163c32 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -219,6 +219,17 @@ typedef struct ParallelCopyLineBuf
uint64 cur_lineno; /* line number for error messages */
}ParallelCopyLineBuf;
+/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
/*
* Parallel copy data information.
*/
@@ -240,6 +251,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -397,6 +411,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -502,7 +517,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -697,8 +711,110 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
-static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * EOF_ERROR - Reports an unexpected-EOF error for binary format
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data is spread
+ * across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * A field can spread across multiple data blocks; calculate the \
+ * number of data blocks required and try to get that many. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * Check whether one more data block is needed for the trailing \
+ * field bytes that do not fill a whole data block. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
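
A worked example of the two macros above with the patch's 64K
DATA_BLOCK_SIZE: a field of 150000 bytes, of which 20000 fit in the current
block, needs two further blocks and ends 64464 bytes into the last one, so a
reader copies 20000 + 65536 + 64464 bytes in turn. A standalone check:

    #include <assert.h>
    #include <stdint.h>

    enum { DATA_BLOCK_SIZE = 65536 };

    int
    main(void)
    {
        int32_t fld_size = 150000;
        int32_t curr_blk_bytes = 20000;   /* room left in the current block */

        /* GET_REQUIRED_BLOCKS */
        int32_t required_blks = (fld_size - curr_blk_bytes) / DATA_BLOCK_SIZE;

        if ((fld_size - curr_blk_bytes) % DATA_BLOCK_SIZE != 0)
            required_blks++;              /* partial trailing block */
        assert(required_blks == 2);

        /* GET_RAW_BUF_INDEX: offset where the field ends in the last block */
        int32_t raw_buf_index = fld_size -
            (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes);

        assert(raw_buf_index == 64464);
        assert(curr_blk_bytes + DATA_BLOCK_SIZE + raw_buf_index == fld_size);
        return 0;
    }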
+static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
@@ -751,6 +867,13 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static bool CopyReadBinaryTupleLeader(CopyState cstate);
+static void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+static bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -769,6 +892,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -997,8 +1121,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1299,6 +1423,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1314,7 +1439,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1680,32 +1805,66 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+	 * Binary format files.
+	 * For the parallel copy leader, fill in the error
+	 * context information here so that, if any failure
+	 * occurs while determining tuple offsets, the leader
+	 * throws errors with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1713,7 +1872,357 @@ ParallelCopyLeader(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * CopyReadBinaryGetDataBlock - gets a new block, updates
+ * the current offset, calculates the skip bytes.
+ */
+static void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from block %u to block %u",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
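
A worked example of the header-straddle case this function handles: with two
bytes left in the current block and a 4-byte field length to read, the
leader carries those two bytes over to the head of the next block and
records them as the old block's skip_bytes (values made up for
illustration):

    #include <assert.h>
    #include <stdint.h>

    enum { DATA_BLOCK_SIZE = 65536 };

    int
    main(void)
    {
        uint32_t raw_buf_index = 65534;     /* two bytes left in this block */

        /* The 4-byte field length would straddle the block boundary... */
        assert(raw_buf_index + sizeof(int32_t) >= DATA_BLOCK_SIZE);

        /* ...so move the partial header to the next block. */
        uint32_t move_bytes = DATA_BLOCK_SIZE - raw_buf_index;

        assert(move_bytes == 2);            /* old block's skip_bytes */
        /* Reading resumes at offset 0 of the new block, where the two
         * moved bytes are followed by freshly read file data. */
        return 0;
    }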
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from a binary formatted
+ * file into data blocks and identifies tuple boundaries/offsets so that
+ * workers can work on the data blocks' contents.
+ */
+static bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One way to reach this
+ * point is a binary file that has a valid signature but
+ * nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ start_block_pos,
+ start_offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ start_block_pos, start_offset, line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize - leader identifies boundaries/
+ * offsets for each attribute/column and thereby arrives at the
+ * tuple/row size. It moves on to the next data block if an
+ * attribute/column is spread across data blocks.
+ */
+static void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - field lies in the same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while (i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+#if 0
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where field count spread across datablocks should never occur,
+ * The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block.
+ * Enable this if-block for debugging purposes only.
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+#endif
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads one attribute/column using
+ * the leader-identified boundaries/offsets, moving on to the next data
+ * block if the attribute/column is spread across data blocks.
+ */
+static Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i>0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * If the bytes that are yet to be copied into the attribute buffer
+ * exceed an entire data block, copy only DATA_BLOCK_SIZE bytes from
+ * this block.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1801,7 +2310,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -5476,60 +5987,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -6465,18 +6963,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -6484,9 +6979,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyGetData(cstate, cstate->attribute_buf.data,
fld_size, fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
--
2.25.1
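A note for readers of the hunks above: the GET_REQUIRED_BLOCKS and
GET_RAW_BUF_INDEX macro bodies are not visible in this mail. Going by how
they are used (and by the Assert that follows them), a plausible reading of
the arithmetic is the sketch below, where curr_blk_bytes is what is left in
the current block; treat it as an assumption, not the patch's actual code:

/* additional blocks needed to hold the rest of the field (assumed) */
#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
	(required_blks) = (((fld_size) - (curr_blk_bytes)) + \
					   DATA_BLOCK_SIZE - 1) / DATA_BLOCK_SIZE

/* offset just past the field in the last block it spills into (assumed) */
#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
	(raw_buf_index) = ((fld_size) - (curr_blk_bytes)) - \
					  (((required_blks) - 1) * DATA_BLOCK_SIZE)

With these definitions the computed index always lands in [1, DATA_BLOCK_SIZE],
which is consistent with the Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE)
above.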
On Sat, 11 Jul 2020 at 08:55, Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks Vignesh for the review. Addressed the comments in 0006 patch.
> we can create a local variable and use in place of
> cstate->pcdata->curr_data_block.

Done.

> + if (cstate->raw_buf_index + sizeof(fld_count) >= (DATA_BLOCK_SIZE - 1))
> + AdjustFieldInfo(cstate, 1);
> +
> + memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
>
> Should this be like below, as the remaining size can fit in the current
> block?
> if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
>
> + if ((cstate->raw_buf_index + sizeof(fld_size)) >= (DATA_BLOCK_SIZE - 1))
> + {
> + AdjustFieldInfo(cstate, 2);
> + *new_block_pos = pcshared_info->cur_block_pos;
> + }
>
> Same as above.

Yes, you are right. Changed.
> + movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;
> +
> + cstate->pcdata->curr_data_block->skip_bytes = movebytes;
> +
> + data_block = &pcshared_info->data_blocks[block_pos];
> +
> + if (movebytes > 0)
>
> Instead of the above check, we can have an assert check for movebytes.

No, we can't use an assert here. For the edge case where the current data
block is full to the size DATA_BLOCK_SIZE, movebytes will be 0, but we
still need to get a new data block. The movebytes > 0 check is how we
avoid the memmove.

> + if (mode == 1)
> + {
> + cstate->pcdata->curr_data_block = data_block;
> + cstate->raw_buf_index = 0;
> + }
> + else if(mode == 2)
> + {
> + ParallelCopyDataBlock *prev_data_block = NULL;
> + prev_data_block = cstate->pcdata->curr_data_block;
> + prev_data_block->following_block = block_pos;
> + cstate->pcdata->curr_data_block = data_block;
> +
> + if (prev_data_block->curr_blk_completed == false)
> + prev_data_block->curr_blk_completed = true;
> +
> + cstate->raw_buf_index = 0;
> + }
>
> This code is common for both, keep it in the common flow and remove
> if (mode == 1):
> cstate->pcdata->curr_data_block = data_block;
> cstate->raw_buf_index = 0;

Done.
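To make the edge case concrete, here is a minimal sketch of what the quoted
code is doing (the real logic lives in AdjustFieldInfo; the memmove line is
my assumption of the code that the movebytes > 0 check guards):

movebytes = DATA_BLOCK_SIZE - cstate->raw_buf_index;

/*
 * If the current block was filled exactly to DATA_BLOCK_SIZE, movebytes
 * is 0: there is nothing to carry over into the new block, but a fresh
 * block is still needed, so Assert(movebytes > 0) would fire spuriously.
 */
if (movebytes > 0)
	memmove(&data_block->data[0],
			&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index],
			movebytes);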
> +#define CHECK_FIELD_COUNT \
> +{\
> + if (fld_count == -1) \
> + { \
> + if (IsParallelCopy() && \
> + !IsLeader()) \
> + return true; \
> + else if (IsParallelCopy() && \
> + IsLeader()) \
> + { \
> + if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
> + ereport(ERROR, \
> + (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
> + errmsg("received copy data after EOF marker"))); \
> + return true; \
> + } \
>
> We only copy sizeof(fld_count). Shouldn't we check
> fld_count != cstate->max_fields? Am I missing something here?

The fld_count != cstate->max_fields check is done after the above checks.
> + if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
> + {
> + cstate->raw_buf_index = cstate->raw_buf_index + fld_size;
> + }
>
> We can keep the check like cstate->raw_buf_index + fld_size < ..., for
> better readability and consistency.

I think this is okay. It gives a good meaning: if the available bytes in
the current data block are greater than or equal to fld_size, then the
tuple lies in the current data block.

> +static pg_attribute_always_inline void
> +CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
> + Oid typioparam, int32 typmod, uint32 *new_block_pos,
> + int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
> + ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
>
> flinfo, typioparam & typmod are not used, we can remove the parameters.

Done.

> +static pg_attribute_always_inline void
> +CopyReadBinaryAttributeLeader(CopyState cstate, FmgrInfo *flinfo,
> + Oid typioparam, int32 typmod, uint32 *new_block_pos,
> + int m, ParallelCopyTupleInfo *tuple_start_info_ptr,
> + ParallelCopyTupleInfo *tuple_end_info_ptr, uint32 *line_size)
>
> I felt this function need not be an inline function.

Yes. Changed.
> + /* binary format */
> + /* for paralle copy leader, fill in the error
>
> There are some typos, run spell check.

Done.
> + /* raw_buf_index should never cross data block size,
> + * as the required number of data blocks would have
> + * been obtained in the above while loop.
> + */
>
> There are a few places where the commenting style should be changed to
> the postgres style.

Changed.
> + if (cstate->pcdata->curr_data_block == NULL)
> + {
> + block_pos = WaitGetFreeCopyBlock(pcshared_info);
> +
> + cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[block_pos];
> +
> + cstate->raw_buf_index = 0;
> +
> + readbytes = CopyGetData(cstate, &cstate->pcdata->curr_data_block->data, 1, DATA_BLOCK_SIZE);
> +
> + elog(DEBUG1, "LEADER - bytes read from file %d", readbytes);
> +
> + if (cstate->reached_eof)
> + return true;
> + }
>
> There are many empty lines, these are not required.

Removed.
> +
> + fld_count = (int16) pg_ntoh16(fld_count);
> +
> + CHECK_FIELD_COUNT;
> +
> + cstate->raw_buf_index = cstate->raw_buf_index + sizeof(fld_count);
> + new_block_pos = pcshared_info->cur_block_pos;
>
> You can run pg_indent once for the changes.

I ran pg_indent and observed that there are many places getting modified
by pg_indent. If we need to run pg_indent on copy.c for parallel copy
alone, then first we need to run it on plain copy.c and take those
changes, and then run it for all the parallel copy files. I think we had
better run pg_indent for all the parallel copy patches once and for all,
maybe just before we finish up all the code reviews.

> + if (mode == 1)
> + {
> + cstate->pcdata->curr_data_block = data_block;
> + cstate->raw_buf_index = 0;
> + }
> + else if(mode == 2)
> + {
>
> Could use macros for 1 & 2 for better readability.

Done.
> +
> + if (following_block_id == -1)
> + break;
> + }
> +
> + if (following_block_id != -1)
> + pg_atomic_add_fetch_u32(&pcshared_info->data_blocks[following_block_id].unprocessed_line_parts, 1);
> +
> + *line_size = *line_size + tuple_end_info_ptr->offset + 1;
> + }
>
> We could calculate the size as we parse and identify one record; if we do
> it that way, this can be removed.

Done.
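For reference when reading 0002 below: the ParallelCopyLineBoundary
comments there say that only one worker should pick a line, enforced with
pg_atomic_compare_exchange_u32. A minimal sketch of that claim step, under
the assumption that LINE_LEADER_POPULATED and LINE_WORKER_PROCESSING are
the relevant line states (this is not the patch code itself):

uint32 expected = LINE_LEADER_POPULATED;

/* Exactly one worker wins the CAS and owns this ring entry. */
if (pg_atomic_compare_exchange_u32(&line_info->line_state, &expected,
								   LINE_WORKER_PROCESSING))
{
	/*
	 * Per the comments, first_block, start_offset and cur_lineno may be
	 * read safely only after this succeeds.
	 */
}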
Hi Bharath,
I was looking forward to reviewing this patch-set but unfortunately it is
showing a reject in copy.c, and might need a rebase.
I was applying on master over the commit
cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb.
--
Regards,
Rafia Sabih
> Hi Bharath,
> I was looking forward to reviewing this patch-set but unfortunately it is
> showing a reject in copy.c, and might need a rebase.
> I was applying on master over the commit
> cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb.
Thanks for showing interest. Please find the patch set rebased to
latest commit b1e48bbe64a411666bb1928b9741e112e267836d.
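For anyone trying the rebased set: the parallel option added in 0002 is
parsed in ProcessCopyOptions like the other COPY options, so an invocation
would look like the following (table and file names are made up):

COPY mytable FROM '/tmp/mytable.csv' WITH (parallel 4);

Non-numeric, zero, or negative values are rejected by the option checks in
the patch.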
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0002-Framework-for-leader-worker-in-parallel-copy.patch (application/octet-stream)
From f852a2d53aa2107b90899df9b8c07824b131e498 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>, Bharath <bharath.rupireddyforpostgres@gmail.com>
Date: Wed, 24 Jun 2020 13:51:56 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy
This patch has the framework for parallel copy: the data structures, leader
initialization, worker initialization, shared memory updates, starting
workers, waiting for workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 865 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 878 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690..09e7a19 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 33ee891..0a4d997 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,138 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* RINGSIZE should be a multiple of this */
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block, following_block
+ * will have the position where the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag will be set when the leader finds out that this block can be
+ * read safely by a worker. This helps a worker start processing a line
+ * that is spread across many blocks early, without waiting for the
+ * complete line to be read.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * this is protected by the following sequence in the leader & worker.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32; the worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled completely,
+ * 0 means empty line, >0 means line filled with line size data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by worker (some records will be filtered based on
+ * where condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -225,8 +354,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure helps in storing the common data from CopyStateData that is
+ * required by the workers. This information will be allocated and stored in
+ * the DSM for the workers to retrieve and copy into their CopyStateData.
+ */
+typedef struct ParallelCopyCommonKeyData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}ParallelCopyCommonKeyData;
+
+/*
+ * This structure helps in converting a List into the format below: count
+ * holds the number of elements in the list, and info holds the List
+ * elements appended contiguously. The converted structure is allocated in
+ * shared memory and stored in the DSM for the workers to retrieve and later
+ * convert back into a List.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -256,6 +443,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -476,10 +680,596 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+
+/*
+ * CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RetrieveSharedString - Retrieve the string from shared memory.
+ */
+static void
+RetrieveSharedString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * RetrieveSharedList - Retrieve the list from shared memory.
+ */
+static void
+RetrieveSharedList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
/*
+ * InsertStringShm - Insert a string into shared memory.
+ */
+static void
+InsertStringShm(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * InsertListShm - Insert a list into shared memory.
+ */
+static void
+InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ *
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ ParallelCopyCommonKeyData *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ {
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(ParallelCopyCommonKeyData));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "DSM segments not available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ CopyCommonInfoForWorker(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ InsertListShm(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ InsertStringShm(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * Where clause handling, convert tuple to columns, add default null values for
+ * the missing columns that are not present in that record. Find the partition
+ * if it is partitioned table, invoke before row insert Triggers, handle
+ * constraints and insert the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ ParallelCopyCommonKeyData *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ RetrieveSharedList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RetrieveSharedString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it across the workers.
+ * It reads the table data from the file and copies the contents into data
+ * blocks. It then scans the input contents and identifies the data based on
+ * line breaks; each such piece of information is called a line. The line is
+ * populated in ParallelCopyLineBoundary; workers then pick up this
+ * information and insert it into the table. The leader does this till it
+ * completes processing the file. The leader executes the before statement
+ * trigger if one is present, reads the data from the input file, and loads
+ * it into data blocks block by block, as and when required. It traverses the
+ * data blocks to identify one line at a time and gets a free entry in the
+ * line array to copy the line information into; if there is no free entry,
+ * it waits till one becomes free. This process is repeated till the complete
+ * file is processed. The leader then waits till all the populated lines are
+ * processed by the workers, and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -1149,6 +1939,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1158,7 +1949,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1207,6 +2015,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1375,6 +2184,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1727,12 +2583,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for
+ * the copy from operation. This is a helper function for the BeginCopy &
+ * ParallelWorkerInitialization functions.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1..6a42ac4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1699,6 +1699,14 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyCommonKeyData
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
--
1.8.3.1
0001-Copy-code-readjustment-to-support-parallel-copy.patch (application/octet-stream)
From cb04472cb10fbb8c4da4d3073b29c4dff3c51bce Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sun, 12 Jul 2020 17:20:49 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is
moved from CopyReadLine to CopyReadLineText. This change was required
because, in the case of parallel copy, record identification and record
updates are done in CopyReadLineText, and the newline characters should be
removed before the record information is updated in shared memory.
---
src/backend/commands/copy.c | 386 +++++++++++++++++++++---------------
1 file changed, 228 insertions(+), 158 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 99d1457180..17e270c3d1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -222,7 +225,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -350,6 +352,88 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
+ */
+#define CONVERT_TO_SERVER_ENCODING(cstate) \
+{ \
+ /* Done reading the line. Convert it to server encoding. */ \
+ if (cstate->need_transcoding) \
+ { \
+ char *cvt; \
+ cvt = pg_any_to_server(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->file_encoding); \
+ if (cvt != cstate->line_buf.data) \
+ { \
+ /* transfer converted data back to line_buf */ \
+ resetStringInfo(&cstate->line_buf); \
+ appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt)); \
+ pfree(cvt); \
+ } \
+ } \
+ /* Now it's safe to use the buffer in error messages */ \
+ cstate->line_buf_converted = true; \
+}
+
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos, copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
+
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -395,6 +479,8 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -796,6 +882,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -807,8 +894,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1466,7 +1556,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1632,6 +1721,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1751,12 +1858,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2645,32 +2746,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2707,27 +2787,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2765,9 +2824,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3260,7 +3371,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3315,30 +3426,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3348,38 +3444,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf is
- * used in both text and binary modes, but we use line_buf and raw_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3457,6 +3523,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3866,60 +3987,8 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
-
- /* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
- {
- char *cvt;
-
- cvt = pg_any_to_server(cstate->line_buf.data,
- cstate->line_buf.len,
- cstate->file_encoding);
- if (cvt != cstate->line_buf.data)
- {
- /* transfer converted data back to line_buf */
- resetStringInfo(&cstate->line_buf);
- appendBinaryStringInfo(&cstate->line_buf, cvt, strlen(cvt));
- pfree(cvt);
- }
- }
-
- /* Now it's safe to use the buffer in error messages */
- cstate->line_buf_converted = true;
+ CONVERT_TO_SERVER_ENCODING(cstate)
return result;
}
@@ -4283,6 +4352,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE()
return result;
}
--
2.25.1
Attachment: 0003-Allow-copy-from-command-to-process-data-from-file.patch
From a3ea5cdbc62441af763800220d21b4a05a33738f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sun, 12 Jul 2020 17:33:14 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs when copying
data from a file or STDIN into a table. It adds a PARALLEL option to
the COPY FROM command with which the user can specify the number of
workers to use. Specifying zero as the number of workers disables
parallelism.
The backend to which the COPY FROM query is submitted acts as the
leader, with the responsibility of reading data from the file/stdin
and launching at most n workers, as specified with the PARALLEL 'n'
option of the COPY FROM query. The leader populates the common data
required for the workers' execution in the DSM and shares it with the
workers. The leader then executes any before statement triggers. It
reads as many 64KB blocks as required from the file into the DSM data
blocks, and populates the DSM line entries, each of which records a
line's start offset and size. To identify the lines it parses the data
using the existing logic from CopyReadLineText, with some changes.
Before copying a line's information (offset & line size) into the DSM
lines, the leader checks whether a free line entry is available; if
not, it waits till the required line is freed up by a worker. This
process is repeated till the complete file is processed.
Simultaneously, the workers cache the lines (50 at a time) into local
memory and release the line entries to the leader for further
populating. Each worker processes the lines that it cached and inserts
them into the table. The leader does not participate in the insertion
of data; its only responsibility is to identify the lines as fast as
possible for the workers to do the actual copy operation. The leader
waits till all the populated lines are processed by the workers and
then exits.
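To illustrate the hand-off described above, here is a minimal standalone
sketch of the line-boundary ring (not part of the patch). It uses C11
atomics and a plain array in place of the pg_atomic_* API and the DSM
segment; RINGSIZE, line_size and line_state mirror the patch, while the
state names, sizes and the busy-wait are simplified:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define RINGSIZE 8				/* the patch uses a much larger ring */

enum line_state
{
	LINE_INIT, LINE_POPULATED, LINE_PROCESSING, LINE_PROCESSED
};

typedef struct LineBoundary
{
	uint32_t	first_block;	/* data block holding the line start */
	uint32_t	start_offset;	/* offset of the line within the block */
	_Atomic uint32_t line_size; /* -1: unset, 0: empty line, >0: length */
	_Atomic uint32_t line_state;
} LineBoundary;

static LineBoundary ring[RINGSIZE];

/* Leader: publish one identified line; spin while the slot is still in use. */
static void
leader_publish(uint32_t pos, uint32_t block, uint32_t offset, uint32_t size)
{
	LineBoundary *slot = &ring[pos % RINGSIZE];

	while (atomic_load(&slot->line_size) != (uint32_t) -1)
		;						/* the patch waits on a latch instead */

	slot->first_block = block;
	slot->start_offset = offset;
	atomic_store(&slot->line_size, size);
	atomic_store(&slot->line_state, LINE_POPULATED);
}

/* Worker: claim one populated line with a CAS, as GetLinePosition does. */
static bool
worker_claim(uint32_t pos, LineBoundary **out)
{
	LineBoundary *slot = &ring[pos % RINGSIZE];
	uint32_t	expected = LINE_POPULATED;

	if (!atomic_compare_exchange_strong(&slot->line_state,
										&expected, LINE_PROCESSING))
		return false;			/* another worker got it first */
	*out = slot;
	return true;
}

/* Worker: release the slot so the leader can reuse it. */
static void
worker_release(LineBoundary *slot)
{
	atomic_store(&slot->line_state, LINE_PROCESSED);
	atomic_store(&slot->line_size, (uint32_t) -1);
}

int
main(void)
{
	LineBoundary *slot;

	for (int i = 0; i < RINGSIZE; i++)
		atomic_store(&ring[i].line_size, (uint32_t) -1);

	leader_publish(0, 0, 0, 42);
	if (worker_claim(0, &slot))
	{
		printf("claimed line of %u bytes\n",
			   (unsigned) atomic_load(&slot->line_size));
		worker_release(slot);
	}
	return 0;
}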
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 900 +++++++++++++++++++-
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 906 insertions(+), 52 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5ec6..586d53dd53 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In that case the command id
+ * has already been set by AssignCommandIdForWorker, so call
+ * GetCurrentCommandId with used as false; marking the command id as
+ * used has already been taken care of.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 7bd45703aa..862c4987f7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b3ee7fa7ea..94988b6739 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -501,6 +501,37 @@ GetCurrentFullTransactionIdIfAny(void)
return CurrentTransactionState->fullTransactionId;
}
+/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, the leader's transaction id will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, the leader's command id will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
+
/*
* MarkCurrentTransactionIdLoggedIfAny
*
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 973d20de6e..d17fbb77c6 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
@@ -541,9 +556,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -556,13 +575,40 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/* End parallel copy Macros */
+
/*
* CONVERT_TO_SERVER_ENCODING - convert contents to server encoding.
*/
#define CONVERT_TO_SERVER_ENCODING(cstate) \
{ \
/* Done reading the line. Convert it to server encoding. */ \
- if (cstate->need_transcoding) \
+ if (cstate->need_transcoding && \
+ (!IsParallelCopy() || \
+ (IsParallelCopy() && !IsLeader()))) \
{ \
char *cvt; \
cvt = pg_any_to_server(cstate->line_buf.data, \
@@ -621,22 +667,38 @@ if (1) \
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
-if (!result && !IsHeaderLine()) \
- CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
- cstate->line_buf.len, \
- cstate->line_buf.len) \
+{ \
+ if (!result && !IsHeaderLine()) \
+ { \
+ if (IsParallelCopy()) \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->raw_buf, \
+ raw_buf_ptr, line_size) \
+ else \
+ CLEAR_EOL_FROM_COPIED_DATA(cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ cstate->line_buf.len) \
+ } \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -684,8 +746,12 @@ static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
@@ -838,6 +904,137 @@ InsertListShm(ParallelContext *pcxt, int key, List *inputlist,
}
}
+/*
+ * IsTriggerFunctionParallelSafe - Check if the trigger functions of all the
+ * triggers are parallel safe. Return false if any one of the triggers has a
+ * parallel-unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine parallel safety of volatile expressions
+ * in the default clauses of column definitions or in the WHERE clause, and
+ * return true if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If yes,
+ * and it is not parallel safe, then parallelism is not allowed. For
+ * instance, serial/bigserial columns have an associated nextval() default
+ * expression, which is parallel unsafe, so parallelism should not be
+ * allowed for them. Non-parallel copy does not check the volatility of
+ * default expressions such as nextval().
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
/*
* BeginParallelCopy - start parallel copy tasks.
*
@@ -868,6 +1065,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -881,6 +1080,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1111,7 +1323,211 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * We may get here after the wait loop at the bottom of this loop has
+ * exited because data_blk_ptr->curr_blk_completed was set, in which
+ * case the dataSize read there might be stale. If curr_blk_completed
+ * is set and the line is complete, line_size will have been set, so
+ * read line_size again to be sure whether this is a complete or a
+ * partial block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If complete data is present in current block use
+ * dataSize - copiedSize, or copy the whole block from
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset. For the first copy, copy from the offset. For
+ * the subsequent copy the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data fits in the current block, lineInfo.line_size will be
+ * set. If the data is spread across blocks, either lineInfo.line_size
+ * or data_blk_ptr->curr_blk_completed can be set: line_size is set
+ * once the complete line has been read, while curr_blk_completed is
+ * set when the current block is finished but the line is not yet
+ * complete.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for the worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to the local line buffers and release the line
+ * positions so that the leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ CONVERT_TO_SERVER_ENCODING(cstate)
+ pcdata->worker_line_buf_pos++;
+ return false;
}
+
/*
* ParallelCopyMain - parallel copy worker's code.
*
@@ -1157,6 +1573,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (ParallelCopyCommonKeyData *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1209,6 +1627,34 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
MemoryContextSwitchTo(oldcontext);
return;
}
+
+/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->leader_pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->leader_pos = (lineBoundaryPtr->leader_pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
@@ -1234,9 +1680,161 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data. Skip the current block, since it
+ * will still have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
/*
* LookupParallelCopyFnPtr - Look up parallel copy function pointer.
*/
@@ -1272,6 +1870,146 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
/* We can only reach this by programming error. */
elog(ERROR, "internal function pointer not found");
}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory area into which the
+ * file data must be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed; a worker can start copying
+ * this data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If the new block contains only the newline character(s), i.e.
+ * raw_buf_ptr <= new_line_size, the unprocessed count should not be
+ * increased.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline char: an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -3701,7 +4439,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3711,7 +4450,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if this is not a parallel copy. For parallel
+ * copy this check is done by the leader, so that in any invalid case
+ * the COPY FROM command errors out in the leader itself, avoiding
+ * launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3900,13 +4646,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -4006,6 +4755,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * have BEFORE/INSTEAD OF triggers, we can't perform copy
+ * operations with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4426,7 +5185,7 @@ BeginCopyFrom(ParseState *pstate,
{
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
}
@@ -4571,26 +5330,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4838,9 +5606,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time; on subsequent
+ * iterations, reset the index and re-use the same
+ * block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4870,6 +5660,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ uint32 line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4924,6 +5719,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ ©_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5148,9 +5945,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5202,6 +6005,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line information here; this cannot
+ * be done at the beginning of the loop, as the file may contain empty
+ * lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5210,6 +6033,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE()
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db191879b9..20dafb75ba 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 6a42ac425a..86a76208b7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1701,6 +1701,7 @@ ParallelCompletionPtr
ParallelContext
ParallelCopyLineBoundaries
ParallelCopyLineBoundary
+ParallelCopyLineState
ParallelCopyCommonKeyData
ParallelCopyData
ParallelCopyDataBlock
--
2.25.1
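As an aside on the worker-side reassembly in CacheLineInfo above: a single
line can span several data blocks, linked through following_block, with
skip_bytes marking the unused tail of each block. A minimal standalone
sketch of that reassembly (not part of the patch; block size shrunk to 8
bytes for illustration):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DATA_BLOCK_SIZE 8		/* the patch uses 64KB blocks */

typedef struct DataBlock
{
	uint32_t	following_block;	/* next block of this line, if any */
	uint8_t		skip_bytes;			/* unused tail bytes in this block */
	char		data[DATA_BLOCK_SIZE];
} DataBlock;

/* Copy line_size bytes starting at blocks[first] + offset into out. */
static void
reassemble_line(DataBlock *blocks, uint32_t first, uint32_t offset,
				uint32_t line_size, char *out)
{
	DataBlock  *blk = &blocks[first];
	uint32_t	copied = 0;

	while (copied < line_size)
	{
		/* usable bytes left in this block, as in CacheLineInfo */
		uint32_t	avail = (DATA_BLOCK_SIZE - blk->skip_bytes) - offset;
		uint32_t	n = line_size - copied < avail ? line_size - copied : avail;

		memcpy(out + copied, blk->data + offset, n);
		copied += n;
		offset = 0;				/* subsequent blocks are read from the start */
		blk = &blocks[blk->following_block];
	}
	out[copied] = '\0';
}

int
main(void)
{
	DataBlock	blocks[2] = {
		{.following_block = 1, .skip_bytes = 0, .data = "abcdefgh"},
		{.following_block = 0, .skip_bytes = 0, .data = "ijkl"},
	};
	char		line[16];

	reassemble_line(blocks, 0, 2, 8, line);		/* expects "cdefghij" */
	printf("%s\n", line);
	return 0;
}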
Attachment: 0004-Documentation-for-parallel-copy.patch
From d04485ce9e4768cbab3f2722dadcf032586fda33 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:34:56 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note that
+ the number of parallel workers specified in <replaceable
+ class="parameter">integer</replaceable> is not guaranteed to be used
+ during execution; the copy may run with fewer workers than specified,
+ or even with no workers at all. This option is allowed only in
+ <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: 0005-Tests-for-parallel-copy.patch
From cd57bede56d817a95ae251ab4b3c543c73bcf9a8 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 17 Jun 2020 07:37:26 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..ef75148 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..032cea9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: 0006-Parallel-Copy-For-Binary-Format-Files.patch
From dbcf9dcfc87d8e5eed4d56b2a079a64258c591c6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 11 Jul 2020 11:39:50 +0530
Subject: [PATCH v10] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks of 64KB each.
It also identifies each tuple's data block id, start offset, end offset
and tuple size, and records this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks in parallel, and insert the
tuples into the table in parallel.
---
src/backend/commands/copy.c | 655 +++++++++++++++++++++++++++++++-----
1 file changed, 574 insertions(+), 81 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 048f2d2cb6..a4e4163c32 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -219,6 +219,17 @@ typedef struct ParallelCopyLineBuf
uint64 cur_lineno; /* line number for error messages */
}ParallelCopyLineBuf;
+/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
/*
* Parallel copy data information.
*/
@@ -240,6 +251,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -397,6 +411,7 @@ typedef struct ParallelCopyCommonKeyData
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}ParallelCopyCommonKeyData;
/*
@@ -502,7 +517,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -697,8 +711,110 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
-static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data is
+ * spread across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * field size can spread across multiple data blocks, \
+ * calculate the number of required data blocks and try to get \
+ * those many data blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * Check if we need one more data block for the remaining field \
+ * data bytes that are not a multiple of the data block size. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
+static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
@@ -751,6 +867,13 @@ static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static bool CopyReadBinaryTupleLeader(CopyState cstate);
+static void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+static bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
/*
* CopyCommonInfoForWorker - Copy shared_cstate using cstate information.
@@ -769,6 +892,7 @@ CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData *shared_csta
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -997,8 +1121,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1299,6 +1423,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1314,7 +1439,7 @@ ParallelWorkerInitialization(ParallelCopyCommonKeyData *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1680,32 +1805,66 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files.
+ * For the parallel copy leader, fill in the error
+ * context information here so that, in case of any
+ * failure while determining tuple offsets, the
+ * leader throws errors with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1713,7 +1872,357 @@ ParallelCopyLeader(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * CopyReadBinaryGetDataBlock - gets a new block, updates
+ * the current offset, calculates the skip bytes.
+ */
+static void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to %u block",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if(field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from a binary formatted file
+ * into data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data blocks' data.
+ */
+static bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One possibility
+ * is that the binary file just has a valid signature
+ * but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ start_block_pos,
+ start_offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ start_block_pos, start_offset, line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize - leader identifies the boundaries/
+ * offsets of each attribute/column and thus arrives at the
+ * tuple/row size. It moves on to the next data block if the attribute/
+ * column is spread across data blocks.
+ */
+static void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in he same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while(i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+#if 0
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where the field count is spread across data blocks should never
+ * occur, as the leader would have moved it to the next block. Enable this
+ * if-block for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+#endif
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads one attribute/column
+ * from the data blocks, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i>0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * The bytes yet to be copied into the attribute buffer exceed an
+ * entire data block; copy only one data block's worth for now.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1801,7 +2310,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -5476,60 +5987,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -6465,18 +6963,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -6484,9 +6979,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyGetData(cstate, cstate->attribute_buf.data,
fld_size, fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
--
2.25.1
On Sun, Jul 12, 2020 at 5:48 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Hi Bharath,
I was looking forward to reviewing this patch set, but unfortunately it
is showing a reject in copy.c and might need a rebase.
I was applying on master over the commit
cd22d3cdb9bd9963c694c01a8c0232bbae3ddcfb.

Thanks for showing interest. Please find the patch set rebased to
latest commit b1e48bbe64a411666bb1928b9741e112e267836d.
Few comments:
====================
0001-Copy-code-readjustment-to-support-parallel-copy
I am not sure converting the code to macros is a good idea, it makes
this code harder to read. Also, there are a few changes which I am
not sure are necessary.
1.
+/*
+ * CLEAR_EOL_FROM_COPIED_DATA - Clear EOL from the copied data.
+ */
+#define CLEAR_EOL_FROM_COPIED_DATA(copy_line_data, copy_line_pos,
copy_line_size) \
+{ \
+ /* \
+ * If we didn't hit EOF, then we must have transferred the EOL marker \
+ * to line_buf along with the data. Get rid of it. \
+ */ \
+ switch (cstate->eol_type) \
+ { \
+ case EOL_NL: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CR: \
+ Assert(copy_line_size >= 1); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\r'); \
+ copy_line_data[copy_line_pos - 1] = '\0'; \
+ copy_line_size--; \
+ break; \
+ case EOL_CRNL: \
+ Assert(copy_line_size >= 2); \
+ Assert(copy_line_data[copy_line_pos - 2] == '\r'); \
+ Assert(copy_line_data[copy_line_pos - 1] == '\n'); \
+ copy_line_data[copy_line_pos - 2] = '\0'; \
+ copy_line_size -= 2; \
+ break; \
+ case EOL_UNKNOWN: \
+ /* shouldn't get here */ \
+ Assert(false); \
+ break; \
+ } \
+}
In the original code, we are using only len and buffer, here we are
using position, length/size and buffer. Is it really required or can
we do with just len and buffer?
2.
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
I don't like converting above to macros. I don't think converting
such things to macros will buy us much.
0002-Framework-for-leader-worker-in-parallel-copy
3.
/*
+ * Copy data block information.
+ */
+typedef struct ParallelCopyDataBlock
It is better to add a few comments atop this data structure to explain
how it is used?
4.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * this is protected by the following sequence in the leader & worker.
+ * Leader should operate in the following order:
+ * 1) update first_block, start_offset & cur_lineno in any order.
+ * 2) update line_size.
+ * 3) update line_state.
+ * Worker should operate in the following order:
+ * 1) read line_size.
+ * 2) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32, worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 3) read first_block, start_offset & cur_lineno in any order.
+ */
+typedef struct ParallelCopyLineBoundary
Here, you have mentioned how workers and leader should operate to make
sure access to the data is sane. However, you have not explained what
is the problem if they don't do so and it is not apparent to me.
Also, it is not very clear what is the purpose of this data structure
from comments.
5.
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 leader_pos;
I don't think the variable needs to be named as leader_pos, it is okay
to name it as 'pos', as the comment above it explains its usage.
7.
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+#define RINGSIZE (10 * 1000)
+#define MAX_BLOCKS_COUNT 1000
+#define WORKER_CHUNK_COUNT 50 /* should be mod of RINGSIZE */
It would be good if you can write a few comments to explain why you
have chosen these default values.
8.
ParallelCopyCommonKeyData, shall we name this as
SerializedParallelCopyState or something like that? For example, see
SerializedSnapshotData which has been used to pass snapshot
information to workers.
9.
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData
*shared_cstate)
If you agree with point-8, then let's name this as
SerializeParallelCopyState. See, if there is more usage of similar
types in the patch then let's change those as well.
10.
+ * in the DSM. The specified number of workers will then be launched.
+ *
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
No need of an extra line with only '*' in the above multi-line comment.
11.
BeginParallelCopy(..)
{
..
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
..
}
Why do we need to do this separately for each variable of cstate?
Can't we serialize it along with other members of
SerializeParallelCopyState (a new name for ParallelCopyCommonKeyData)?
12.
BeginParallelCopy(..)
{
..
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ elog(WARNING,
+ "No workers available, copy will be run in non-parallel mode");
..
}
I don't see the need to issue a WARNING if we are not able to launch
workers. We don't do that for other cases where we fail to launch
workers.
13.
+}
+/*
+ * ParallelCopyMain -
..
+}
+/*
+ * ParallelCopyLeader
One line space is required before starting a new function.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Thanks for the comments Amit.
On Wed, Jul 15, 2020 at 10:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few comments:
====================
0001-Copy-code-readjustment-to-support-parallel-copy

I am not sure converting the code to macros is a good idea, it makes
this code harder to read. Also, there are a few changes which I am
not sure are necessary.

1. The CLEAR_EOL_FROM_COPIED_DATA macro:
In the original code, we are using only len and buffer, here we are
using position, length/size and buffer. Is it really required or can
we do with just len and buffer?
Position is required so that we can have common code for parallel and
non-parallel copy; in the parallel copy case, position and length will
differ because a line can spread across multiple data blocks. Retained
the variables as is.
Changed the macro to a function.
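To make the conversion concrete, the function form mirrors the quoted
macro body one-to-one; roughly (a sketch, the exact code is in the
attached 0001 patch):

static void
ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
					   int copy_line_pos, int *copy_line_size)
{
	/*
	 * If we didn't hit EOF, then we must have transferred the EOL marker
	 * to line_buf along with the data.  Get rid of it.
	 */
	switch (cstate->eol_type)
	{
		case EOL_NL:
			Assert(*copy_line_size >= 1);
			Assert(copy_line_data[copy_line_pos - 1] == '\n');
			copy_line_data[copy_line_pos - 1] = '\0';
			(*copy_line_size)--;
			break;
		case EOL_CR:
			Assert(*copy_line_size >= 1);
			Assert(copy_line_data[copy_line_pos - 1] == '\r');
			copy_line_data[copy_line_pos - 1] = '\0';
			(*copy_line_size)--;
			break;
		case EOL_CRNL:
			Assert(*copy_line_size >= 2);
			Assert(copy_line_data[copy_line_pos - 2] == '\r');
			Assert(copy_line_data[copy_line_pos - 1] == '\n');
			copy_line_data[copy_line_pos - 2] = '\0';
			*copy_line_size -= 2;
			break;
		case EOL_UNKNOWN:
			/* shouldn't get here */
			Assert(false);
			break;
	}
}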
2. The INCREMENTPROCESSED and GETPROCESSED macros:

I don't like converting above to macros. I don't think converting
such things to macros will buy us much.
This macro will be extended in
0003-Allow-copy-from-command-to-process-data-from-file.patch:
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+		pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
This needs to be a macro so that it can handle both parallel copy
and non-parallel copy.
Retaining this as a macro; if you insist, I can move the change to
0003-Allow-copy-from-command-to-process-data-from-file.patch.
0002-Framework-for-leader-worker-in-parallel-copy

3. typedef struct ParallelCopyDataBlock

It is better to add a few comments atop this data structure to explain
how it is used?
Fixed.
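The comments added describe roughly the following picture (a sketch put
together from the fields used elsewhere in this patch set; field order
and wording here are indicative, the real definition is in the updated
0002 patch):

/*
 * Copy data block information.
 *
 * Data blocks are allocated in DSM. The leader reads file data into
 * them, and workers read tuple data back out of them using the offsets
 * the leader publishes in the ring data structure.
 */
typedef struct ParallelCopyDataBlock
{
	/* Line parts in this block that workers have not yet consumed. */
	pg_atomic_uint32 unprocessed_line_parts;

	/* Block that continues a tuple spilling past this block. */
	uint32		following_block;

	/* Set once the leader has completely filled this block. */
	bool		curr_blk_completed;

	/*
	 * Bytes moved from the end of this block to the start of the next
	 * one, so that a field header is never split across blocks.
	 */
	uint8		skip_bytes;

	/* File data; DATA_BLOCK_SIZE is the existing 64KB read unit. */
	char		data[DATA_BLOCK_SIZE];
} ParallelCopyDataBlock;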
4. The ordering rules in the ParallelCopyLineBoundary comment:

Here, you have mentioned how workers and leader should operate to make
sure access to the data is sane. However, you have not explained what
is the problem if they don't do so and it is not apparent to me.
Also, it is not very clear what is the purpose of this data structure
from comments.
Fixed
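To spell out why the ordering matters: the leader's line_state write is
the publish point, and the worker's compare-and-exchange is the claim.
A minimal worker-side sketch (names taken from the comment quoted
above):

	uint32		expected = LINE_LEADER_POPULATED;

	if (pg_atomic_compare_exchange_u32(&line_info->line_state, &expected,
									   LINE_WORKER_PROCESSING))
	{
		/*
		 * This worker owns the line. Because the leader wrote
		 * first_block, start_offset, cur_lineno and line_size before
		 * line_state, those fields are now safe to read; the CAS also
		 * guarantees no other worker picks up the same line.
		 */
	}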
5. The leader_pos member of ParallelCopyLineBoundaries:

I don't think the variable needs to be named as leader_pos, it is okay
to name it as 'pos', as the comment above it explains its usage.
Fixed
7. The DATA_BLOCK_SIZE, RINGSIZE, MAX_BLOCKS_COUNT and WORKER_CHUNK_COUNT defaults:

It would be good if you can write a few comments to explain why you
have chosen these default values.
Fixed
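For example, comments along these lines (the rationale wording here is
indicative, see the updated patch for the exact text):

#define DATA_BLOCK_SIZE RAW_BUF_SIZE	/* reuse the existing 64KB read unit */
#define RINGSIZE (10 * 1000)			/* lines the leader can publish ahead of workers */
#define MAX_BLOCKS_COUNT 1000			/* caps the DSM data blocks at about 64MB */
#define WORKER_CHUNK_COUNT 50			/* lines a worker claims at a time; divides RINGSIZE evenly */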
8.
ParallelCopyCommonKeyData, shall we name this as
SerializedParallelCopyState or something like that? For example, see
SerializedSnapshotData which has been used to pass snapshot
information to workers.
Renamed as suggested
9.
+CopyCommonInfoForWorker(CopyState cstate, ParallelCopyCommonKeyData
*shared_cstate)

If you agree with point-8, then let's name this as
SerializeParallelCopyState. See, if there is more usage of similar
types in the patch then let's change those as well.
Fixed
10. The comment atop BeginParallelCopy:

No need of an extra line with only '*' in the above multi-line comment.
Fixed
11. The EstimateLineKeysStr calls in BeginParallelCopy:

Why do we need to do this separately for each variable of cstate?
Can't we serialize it along with other members of
SerializeParallelCopyState (a new name for ParallelCopyCommonKeyData)?
These are variable-length string variables; I felt we will not be able
to serialize them along with the other members, so they need to be
serialized separately.
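As an illustration, each per-string estimate just reserves one chunk
and one toc key for a variable-length, possibly-NULL string (a sketch;
the body is assumed, the exact code is in the attached patch):

static void
EstimateLineKeysStr(ParallelContext *pcxt, char *inString)
{
	/*
	 * Fixed-size members travel together in the serialized struct, but
	 * each string's length is only known at run time, hence one DSM
	 * chunk and one toc key per string.
	 */
	if (inString != NULL)
	{
		shm_toc_estimate_chunk(&pcxt->estimator, strlen(inString) + 1);
		shm_toc_estimate_keys(&pcxt->estimator, 1);
	}
}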
12. The WARNING issued in BeginParallelCopy when no workers are launched:

I don't see the need to issue a WARNING if we are not able to launch
workers. We don't do that for other cases where we fail to launch
workers.
Fixed
13. Before ParallelCopyMain and ParallelCopyLeader:

One line space is required before starting a new function.
Fixed
Please find the updated patch with the fixes included.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0001-Copy-code-readjustment-to-support-parallel-copy.patch
From e4d85e2d78c8496248e5fa08e144bd50a14c2173 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Thu, 16 Jul 2020 21:51:25 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch slightly readjusts the copy code so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText. This change was required because, in the
parallel copy case, record identification and record updates are done in
CopyReadLineText, and the newline characters must be removed before the record
information is updated in shared memory.
---
src/backend/commands/copy.c | 361 ++++++++++++++++++++++++++------------------
1 file changed, 218 insertions(+), 143 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 44da71c..09419ea 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -222,7 +225,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -350,6 +352,27 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -395,7 +418,11 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
-
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -796,6 +823,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1466,7 +1497,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1632,6 +1662,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1751,12 +1799,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2648,32 +2690,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2710,27 +2731,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2768,9 +2768,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3263,7 +3315,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3318,30 +3370,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - look up the per-attribute catalog information
+ * (input functions and default expressions) needed by COPY FROM.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3351,38 +3388,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf is
- * used in both text and binary modes, but we use line_buf and raw_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3460,6 +3467,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3869,45 +3931,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData - Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker
+ * to line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
+ {
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
+ }
+}
+
+/*
+ * ConvertToServerEncoding - convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
char *cvt;
-
cvt = pg_any_to_server(cstate->line_buf.data,
cstate->line_buf.len,
cstate->file_encoding);
@@ -3919,11 +3996,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4286,6 +4360,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE();
return result;
}
--
1.8.3.1
Attachment: 0002-Framework-for-leader-worker-in-parallel-copy.patch
From 68704caa97bf06ca4fbc87751118c9b0f7f324c9 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Thu, 16 Jul 2020 21:53:11 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy
This patch adds the framework for parallel copy: the shared data structures,
leader initialization, worker initialization, shared memory updates, starting
workers, waiting for workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 897 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 910 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 09419ea..50da871 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,174 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The macros DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT size the shared
+ * memory that holds the records read from the file before they are inserted
+ * into the relation. They let multiple records carrying a significant amount
+ * of data be handed over to each worker at a time, so that there is no
+ * context switch and the work is fairly distributed among the workers. These
+ * values showed the best results in the performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* It can hold information for up to 10,000 records for the workers to process. */
+#define RINGSIZE (10 * 1000)
+
+/* It can hold 1000 blocks of 64KB data in the DSM for the workers to process. */
+#define MAX_BLOCKS_COUNT 1000
+
+/*
+ * Each worker is allocated WORKER_CHUNK_COUNT records at a time from the DSM
+ * line ring to process, to avoid lock contention. This value must evenly
+ * divide RINGSIZE, as the wrap-around case is currently not handled when a
+ * worker selects its WORKER_CHUNK_COUNT of lines.
+ */
+#define WORKER_CHUNK_COUNT 50
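
As a quick sanity check on these constants (a standalone sketch, not part of
the patch), the divisibility requirement and the rough DSM footprint of the
data block area can be verified like this:

#include <stdio.h>

#define RAW_BUF_SIZE 65536
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
#define RINGSIZE (10 * 1000)
#define MAX_BLOCKS_COUNT 1000
#define WORKER_CHUNK_COUNT 50

/* Wrap-around is unhandled, so worker chunks must tile the ring exactly. */
_Static_assert(RINGSIZE % WORKER_CHUNK_COUNT == 0,
			   "RINGSIZE must be a multiple of WORKER_CHUNK_COUNT");

int
main(void)
{
	/* 1000 blocks of 64KB is roughly 62MB of DSM for raw file data. */
	printf("data block area: %zu bytes\n",
		   (size_t) MAX_BLOCKS_COUNT * DATA_BLOCK_SIZE);
	return 0;
}
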
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ * ParallelCopyDataBlocks are created in the DSM, and data read from the file
+ * is copied into these blocks. The leader process identifies the record
+ * boundaries and shares the record information with the workers, which insert
+ * the records into the table. Each data block can hold one or more records,
+ * depending on the record size.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* Number of line parts in the current block that are not yet processed. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line's data continues into another block, following_block
+ * holds the position from which the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader determines that this block can be read
+ * safely by a worker. It lets a worker start processing a line that spans
+ * many blocks early, instead of waiting for the complete line to be
+ * populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+} ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is a data structure shared between the leader and
+ * the workers. The leader populates the data block number, the offset within
+ * the block and the size of each record in the DSM, and the workers copy the
+ * data into the relation. Correctness depends on the leader and the workers
+ * following the sequence below; otherwise a worker might read a wrong
+ * line_size, or the leader might overwrite an entry that a worker has not yet
+ * processed or is still processing.
+ * The leader must operate in the following order:
+ * 1) check if line_size is -1; if not, wait, since a worker is still
+ * processing this entry.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) set line_state to LINE_LEADER_POPULATED.
+ * A worker must operate in the following order:
+ * 1) check that line_state is LINE_LEADER_POPULATED; if not, the leader is
+ * still populating the data.
+ * 2) read line_size.
+ * 3) only one worker may pick a given line for processing; this is enforced
+ * with pg_atomic_compare_exchange_u32, which changes the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process the line_size bytes of data.
+ * 6) set line_size back to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is not yet completely filled,
+ * 0 means an empty line, and >0 means the line holds that many bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBoundary;
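
To make the worker-side ordering concrete, here is a minimal standalone
sketch using C11 atomics (LineSlot and claim_line() are stand-ins invented
for illustration; the patch itself uses pg_atomic_compare_exchange_u32 on
line_state):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

enum { LINE_LEADER_POPULATED = 2, LINE_WORKER_PROCESSING = 3 };

typedef struct LineSlot
{
	_Atomic uint32_t line_state;
	_Atomic uint32_t line_size;	/* (uint32_t) -1 while being filled */
} LineSlot;

/*
 * Claim the slot only if the leader has finished populating it; the
 * compare-exchange guarantees that exactly one worker wins the slot.
 */
static bool
claim_line(LineSlot *slot, uint32_t *size_out)
{
	uint32_t	expected = LINE_LEADER_POPULATED;

	if (!atomic_compare_exchange_strong(&slot->line_state, &expected,
										LINE_WORKER_PROCESSING))
		return false;			/* still populating, or already claimed */

	*size_out = atomic_load(&slot->line_size);
	return true;
}

int
main(void)
{
	LineSlot	slot;
	uint32_t	size;

	atomic_store(&slot.line_size, 42);
	atomic_store(&slot.line_state, LINE_LEADER_POPULATED);

	if (claim_line(&slot, &size))
		printf("claimed a line of %u bytes\n", (unsigned) size);
	if (!claim_line(&slot, &size))
		printf("second claim correctly fails\n");
	return 0;
}
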
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+} ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual number of lines inserted by the workers (some records may be
+ * filtered out by the WHERE clause).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+} ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -228,8 +393,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure holds the common data from CopyStateData that the workers
+ * require. It is allocated and stored in the DSM, from where each worker
+ * retrieves it and copies it back into its own CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of null_print */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+} SerializedParallelCopyState;
+
+/*
+ * This structure flattens a List into the format below: count holds the
+ * number of elements in the list, and info holds the list elements appended
+ * contiguously. The flattened structure is allocated in shared memory and
+ * stored in the DSM for the workers to retrieve and convert back into a List.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
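
The packed layout is easiest to see in a standalone sketch (hypothetical
code, not the patch's): an int count followed by the NUL-terminated strings
back to back, which is exactly what CopyListSharedMemory writes and
DeserializeList walks below.

#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char *items[] = {"id", "name"};
	char		buf[64];
	int			count = 2;
	int			off = 0;

	/* Pack: count first, then each string including its terminator. */
	memcpy(buf, &count, sizeof(count));
	for (int i = 0; i < count; i++)
	{
		strcpy(buf + sizeof(count) + off, items[i]);
		off += (int) strlen(items[i]) + 1;
	}

	/* Unpack: walk the strings by their NUL terminators. */
	off = 0;
	for (int i = 0; i < count; i++)
	{
		const char *s = buf + sizeof(count) + off;

		printf("attr %d: %s\n", i, s);
		off += (int) strlen(s) + 1;
	}
	return 0;
}
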
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -259,6 +482,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -418,11 +658,593 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+
+/*
+ * SerializeParallelCopyState - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * DeserializeString - Retrieve the string from shared memory.
+ */
+static void
+DeserializeString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * DeserializeList - Retrieve the list from shared memory.
+ */
+static void
+DeserializeList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * SerializeString - Insert a string into shared memory.
+ */
+static void
+SerializeString(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * SerializeList - Insert a list into shared memory.
+ */
+static void
+SerializeList(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ SerializedParallelCopyState *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ /* Capture the leader's xid; the workers will reuse it. */
+ full_transaction_id = GetCurrentFullTransactionId();
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ SerializeParallelCopyState(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ SerializeList(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ SerializeList(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ SerializeList(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ SerializeList(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ SerializeList(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ SerializeString(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ SerializeString(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(SerializedParallelCopyState *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * Where clause handling, convert tuple to columns, add default null values for
+ * the missing columns that are not present in that record. Find the partition
+ * if it is partitioned table, invoke before row insert Triggers, handle
+ * constraints and insert the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ SerializedParallelCopyState *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ DeserializeString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ DeserializeString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ DeserializeString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ DeserializeList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ DeserializeList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ DeserializeList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ DeserializeList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ DeserializeList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ DeserializeString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ DeserializeString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it with the workers. It
+ * executes the before statement trigger, if one is present, then reads the
+ * data from the input file and loads it into the DSM data blocks, block by
+ * block, as required. The leader walks through the data blocks to identify
+ * line breaks; each piece of data so identified is called a line. It gets a
+ * free entry in ParallelCopyLineBoundary and copies the line information
+ * into it, waiting if no entry is free. The workers then pick up this
+ * information and insert the lines into the table. This process is repeated
+ * until the complete file is processed. Finally, the leader waits until all
+ * populated lines have been processed by the workers, and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
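
For symmetry with the worker-side sketch earlier, the leader's populate
sequence from the ParallelCopyLineBoundary comment could look like this in
isolation (a hedged sketch with stand-in types; the patch's real leader loop
arrives later in this series):

#include <stdatomic.h>
#include <stdint.h>

enum { LINE_LEADER_POPULATING = 1, LINE_LEADER_POPULATED = 2 };

typedef struct LineSlot
{
	uint32_t	first_block;
	uint32_t	start_offset;
	uint64_t	cur_lineno;
	_Atomic uint32_t line_size;	/* (uint32_t) -1 means the slot is free */
	_Atomic uint32_t line_state;
} LineSlot;

/*
 * Steps 1-5 of the leader protocol: wait until the slot is free, mark it
 * POPULATING, fill the fields, and only then publish line_size and flip
 * the state to POPULATED so a worker may claim it.
 */
static void
publish_line(LineSlot *slot, uint32_t block, uint32_t off,
			 uint64_t lineno, uint32_t size)
{
	while (atomic_load(&slot->line_size) != (uint32_t) -1)
		;						/* a worker is still using this slot */

	atomic_store(&slot->line_state, LINE_LEADER_POPULATING);
	slot->first_block = block;
	slot->start_offset = off;
	slot->cur_lineno = lineno;
	atomic_store(&slot->line_size, size);
	atomic_store(&slot->line_state, LINE_LEADER_POPULATED);
}

int
main(void)
{
	LineSlot	slot;

	atomic_store(&slot.line_size, (uint32_t) -1);
	publish_line(&slot, 0, 0, 1, 27);
	return 0;
}
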
+
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
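
The name-based round trip above exists because a function pointer is only
meaningful inside one process's address space; only the function's name can
cross the DSM boundary, and each process resolves it against its own table.
A standalone sketch of the idea (copy_read_data_stub is hypothetical):

#include <stdio.h>
#include <string.h>

typedef int (*data_source_cb) (void *outbuf, int minread, int maxread);

static int
copy_read_data_stub(void *outbuf, int minread, int maxread)
{
	return 0;					/* pretend there is no more data */
}

static const struct
{
	const char *fn_name;
	data_source_cb fn_addr;
}			fn_table[] =
{
	{"copy_read_data", copy_read_data_stub},
};

int
main(void)
{
	/* The leader would store only this string in shared memory. */
	const char *wire = fn_table[0].fn_name;

	for (size_t i = 0; i < sizeof(fn_table) / sizeof(fn_table[0]); i++)
	{
		if (strcmp(fn_table[i].fn_name, wire) == 0)
			printf("resolved \"%s\", callback returned %d\n", wire,
				   fn_table[i].fn_addr(NULL, 0, 0));
	}
	return 0;
}
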
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1093,6 +1915,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1102,7 +1925,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1151,6 +1991,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1319,6 +2160,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
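
For reference, with this option a command would presumably be invoked as,
for example, COPY mytable FROM '/tmp/data.csv' WITH (PARALLEL '4'); omitting
the option keeps the existing serial behavior.
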
@@ -1671,12 +2559,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for
+ * copy from operation. This is a helper function for BeginCopy and
+ * ParallelWorkerInitialization.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1..dd812b0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1699,6 +1699,13 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2215,6 +2222,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
Attachment: 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
From 6a1b095aeaa20b138d89161eaa08b4209038059f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Thu, 16 Jul 2020 22:07:11 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs to copy data from a
file or STDIN to a table. It adds a PARALLEL option to the COPY FROM command,
through which the user can specify the number of workers to use; omitting the
option keeps the copy serial.
The backend to which the COPY FROM query is submitted acts as the leader,
responsible for reading data from the file/stdin and launching at most n
workers, as specified with the PARALLEL 'n' option. The leader populates the
common data the workers need in the DSM and shares it with them, then
executes before statement triggers if any exist. The leader populates the DSM
lines, each holding a start offset and line size; while populating them it
reads as many blocks as required from the file into the DSM data blocks, each
block being 64KB in size. To identify lines it reuses the existing logic from
CopyReadLineText, with some changes. The leader checks whether a free line
entry is available to copy the information into; if none is free, it waits
until the required entry is freed by a worker, then copies the identified
line's information (offset & line size) into the DSM lines. This process is
repeated until the complete file is processed. Simultaneously, the workers
cache lines (50 at a time) into local memory and release the entries back to
the leader for further populating. Each worker processes the lines it cached
and inserts them into the table. The leader does not participate in the
insertion itself; its only responsibility is to identify lines as fast as
possible so the workers can do the actual copy. The leader waits until all
populated lines have been processed by the workers, and then exits.
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 907 ++++++++++++++++++++++++++--
src/include/access/xact.h | 2 +
5 files changed, 910 insertions(+), 54 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In the parallel copy case the
+ * command id has already been set by AssignCommandIdForWorker, so call
+ * GetCurrentCommandId with used = false to fetch currentCommandId; marking
+ * it used was already taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d881f4c..70ecd51 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b3ee7fa..94988b6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, transaction id of leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, command id of leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 50da871..b709b2f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
/*
@@ -577,9 +592,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -592,26 +611,76 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
/*
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
if (!result && !IsHeaderLine()) \
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
- cstate->line_buf.len, \
- &cstate->line_buf.len) \
+{ \
+ if (IsParallelCopy()) \
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, \
+ raw_buf_ptr, &line_size); \
+ else \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len); \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
+/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -666,7 +735,10 @@ static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
static void ConvertToServerEncoding(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
-
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -817,6 +889,137 @@ SerializeList(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check whether the trigger functions are
+ * parallel safe. Return false if any one of the triggers has a
+ * parallel-unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine parallel safety of volatile expressions
+ * in default clause of column definition or in where clause and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If yes,
+ * and it is not parallel safe, then parallelism is not allowed. For
+ * instance, if a serial/bigserial column is associated with the
+ * parallel-unsafe nextval() default expression, parallelism should not
+ * be allowed. (Non-parallel copy does not perform this check for
+ * nextval().)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine the insert method: single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check whether parallel copy is applicable, returning
+ * false in the cases where it is not.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -845,6 +1048,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -854,6 +1059,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1080,9 +1298,212 @@ ParallelWorkerInitialization(SerializedParallelCopyState *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
}
/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * The wait loop below may have exited because
+ * data_blk_ptr->curr_blk_completed was set, in which case the dataSize
+ * read there could be stale. If the block is completed and the line is
+ * also complete, line_size will have been set, so re-read line_size to
+ * be sure whether this is a complete line or only a partial block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If complete data is present in current block use
+ * dataSize - copiedSize, or copy the whole block from
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset. For the first copy, copy from the offset. For
+ * the subsequent copy the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data fits in the current block, lineInfo.line_size will be
+ * updated. If the data is spread across blocks, either
+ * lineInfo.line_size or data_blk_ptr->curr_blk_completed can be
+ * updated: line_size is set once the complete read has finished,
+ * while curr_blk_completed is set when the current block has been
+ * fully populated but the line itself is not yet finished.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for the worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue loading data into the freed slot.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+ }
+
+/*
* ParallelCopyMain - parallel copy worker's code.
*
* Where clause handling, convert tuple to columns, add default null values for
@@ -1127,6 +1548,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1181,6 +1604,33 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
}
/*
+ * UpdateBlockInLineInfo - Update the line information in the ring and return
+ * the position used.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
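The ring entry manipulated here is defined earlier in the series and is not
quoted in this mail, so here is a sketch of the shared structures and the
line_state lifecycle as reconstructed from their usage; the field types and
state names are inferred assumptions, not the patch's exact definitions:

#include <stdatomic.h>
#include <stdint.h>

/* Inferred lifecycle: the leader reserves a slot while still reading the
 * line, publishes it once complete, and a worker claims it with a CAS. */
enum line_state
{
    LINE_LEADER_POPULATING,     /* leader is still filling in this line */
    LINE_LEADER_POPULATED,      /* complete; free for a worker to claim */
    LINE_WORKER_PROCESSING,     /* claimed by exactly one worker */
    LINE_WORKER_PROCESSED       /* done; slot reusable by the leader */
};

struct line_boundary
{
    uint32_t    first_block;    /* data block holding the line's start */
    uint32_t    start_offset;   /* offset of the line within that block */
    uint64_t    cur_lineno;     /* line number, kept for error context */
    atomic_uint line_size;      /* (uint32_t) -1 = slot free, 0 = empty line */
    atomic_uint line_state;     /* one of the values above */
};

/* Worker-side claim, as in GetLinePosition: only one worker wins the CAS. */
static _Bool
claim_line(struct line_boundary *lb)
{
    unsigned expected = LINE_LEADER_POPULATED;

    return atomic_compare_exchange_strong(&lb->line_state, &expected,
                                          LINE_WORKER_PROCESSING);
}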
+/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
* Leader will populate the shared memory and share it across the workers.
@@ -1205,8 +1655,159 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
- pcshared_info->is_read_in_progress = false;
- cstate->cur_lineno = 0;
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+ }
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data; don't check the current block, as
+ * it will still have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
}
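In effect, GetFreeCopyBlock and WaitGetFreeCopyBlock implement a
reference-counted ring allocator: a block becomes reusable once every line
fragment stored in it has been consumed, i.e. unprocessed_line_parts drops
to zero. A compact standalone model of the scan, with stdatomic standing in
for pg_atomic and an assumed block count:

#include <stdatomic.h>
#include <stdint.h>

#define NBLOCKS 1024                     /* stand-in for MAX_BLOCKS_COUNT */

struct data_block
{
    atomic_uint unprocessed_line_parts;  /* line fragments not yet consumed */
    _Bool       curr_blk_completed;
    uint8_t     skip_bytes;
};

/* Scan forward from the last position; a block whose refcount has dropped
 * to zero is free. Returns (uint32_t) -1 when every other block is still
 * in use, in which case the caller waits on its latch and retries. */
static uint32_t
get_free_block(struct data_block *blocks, uint32_t *cur_pos)
{
    uint32_t pos = (*cur_pos + 1) % NBLOCKS;

    for (int tried = 0; tried < NBLOCKS - 1; tried++)
    {
        if (atomic_load(&blocks[pos].unprocessed_line_parts) == 0)
        {
            blocks[pos].curr_blk_completed = false;
            blocks[pos].skip_bytes = 0;
            *cur_pos = pos;
            return pos;
        }
        pos = (pos + 1) % NBLOCKS;
    }
    return (uint32_t) -1;
}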
/*
@@ -1246,6 +1847,145 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
}
/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed; workers can start copying
+ * this data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If new_line_size >= raw_buf_ptr, the new block contains only the
+ * newline char content. The unprocessed count should not be
+ * increased in this case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline char here; an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -3677,7 +4417,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3687,7 +4428,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check if it is not a parallel copy. In the parallel
+ * case, the check is done by the leader, so that in any invalid case
+ * the COPY FROM command errors out in the leader itself, avoiding
+ * launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3876,13 +4624,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3982,6 +4733,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * have BEFORE/INSTEAD OF triggers, we can't perform copy
+ * operations with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4402,7 +5163,7 @@ BeginCopyFrom(ParseState *pstate,
{
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
}
@@ -4547,26 +5308,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4814,9 +5584,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through; on subsequent
+ * passes, reset the index and re-use the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4871,7 +5663,8 @@ static void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding &&
+ (!IsParallelCopy() || !IsLeader()))
{
char *cvt;
cvt = pg_any_to_server(cstate->line_buf.data,
@@ -4910,6 +5703,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4964,6 +5762,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5188,9 +5988,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5242,6 +6048,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line information here; this cannot
+ * be done at the beginning, as the file may contain empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5250,6 +6076,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE();
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db19187..20dafb7 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
--
1.8.3.1
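To recap the data layout patch 0003 sets up: a line that straddles a block
boundary is chained through following_block, with skip_bytes recording the
unused tail bytes of each block, and CacheLineInfo walks that chain to
reassemble the line. Below is a rough standalone model of that reassembly;
the field names mirror the patch, but the function itself is illustrative
and assumes the chain is already fully populated (the real code also
decrements unprocessed_line_parts per block and may have to wait on the
leader mid-chain, which is what the COPY_WAIT_TO_PROCESS loop handles).

#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 65536                 /* DATA_BLOCK_SIZE: 64KB blocks */

struct block
{
    uint32_t following_block;            /* next block of a straddling line */
    uint8_t  skip_bytes;                 /* unused tail bytes in this block */
    char     data[BLOCK_SIZE];
};

/* Reassemble a line that starts at (first_block, offset) and is line_size
 * bytes long, following the chain the way CacheLineInfo does. */
static void
read_line(struct block *blocks, uint32_t first_block, uint32_t offset,
          uint32_t line_size, char *out)
{
    uint32_t blk = first_block;
    uint32_t copied = 0;

    while (copied < line_size)
    {
        uint32_t avail = (BLOCK_SIZE - blocks[blk].skip_bytes) - offset;
        uint32_t n = line_size - copied < avail ? line_size - copied : avail;

        memcpy(out + copied, blocks[blk].data + offset, n);
        copied += n;
        offset = 0;                      /* only the first block has an offset */
        blk = blocks[blk].following_block;
    }
}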
Attachment: 0004-Documentation-for-parallel-copy.patch
From 065975f77b0ddfe06911b68abee2fec27e695afa Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Thu, 16 Jul 2020 22:07:34 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all. This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: 0005-Tests-for-parallel-copy.patch
From 28c62ea99cf3ed6700028c077a3e41d2f3398a22 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Thu, 16 Jul 2020 22:10:11 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..ef75148 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..032cea9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: 0006-Parallel-Copy-For-Binary-Format-Files.patch
From 6cf067d7b4600322a3f980a37adaa924e41bf07a Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Thu, 16 Jul 2020 22:15:37 +0530
Subject: [PATCH 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks of 64K each. It
also identifies each tuple's data block id, start offset, end offset and
tuple size, and updates this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks in parallel, and insert the tuples
into the table in parallel.
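For reference, the tuple layout the leader walks here is the standard COPY
binary format: a 16-bit field count per tuple, then for each field a 32-bit
byte length (-1 meaning NULL) followed by that many data bytes, all in
network byte order. A standalone sketch of sizing one tuple, under the
simplifying assumption that the whole tuple is in memory (the patch itself
must additionally handle tuples crossing the 64K block boundaries):

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Walk one binary-format tuple starting at buf[*off] and return its total
 * size in bytes, so the leader can record (offset, size) for a worker.
 * Returns -1 on the end-of-data marker (field count of -1). */
static int32_t
binary_tuple_size(const char *buf, uint32_t *off)
{
    uint32_t pos = *off;
    int32_t size;
    int16_t fld_count;

    memcpy(&fld_count, buf + pos, sizeof(fld_count));
    fld_count = (int16_t) ntohs((uint16_t) fld_count);
    pos += sizeof(fld_count);

    if (fld_count == -1)
        return -1;                       /* EOF marker */

    for (int i = 0; i < fld_count; i++)
    {
        int32_t fld_size;

        memcpy(&fld_size, buf + pos, sizeof(fld_size));
        fld_size = (int32_t) ntohl((uint32_t) fld_size);
        pos += sizeof(fld_size);
        if (fld_size > 0)                /* -1 means NULL, no payload */
            pos += (uint32_t) fld_size;
    }

    size = (int32_t) (pos - *off);
    *off = pos;
    return size;
}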
---
src/backend/commands/copy.c | 656 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 575 insertions(+), 81 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index b709b2f..3b8fe7d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -256,6 +256,17 @@ typedef struct ParallelCopyLineBuf
}ParallelCopyLineBuf;
/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
+/*
* Parallel copy data information.
*/
typedef struct ParallelCopyData
@@ -276,6 +287,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -436,6 +450,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}SerializedParallelCopyState;
/*
@@ -541,7 +556,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -682,8 +696,110 @@ else \
/* End parallel copy Macros */
-static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data is spread
+ * across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * The field can spread across multiple data blocks; \
+ * calculate the number of data blocks required and try to \
+ * get that many data blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * Check if one more data block is needed for the remaining \
+ * field bytes that are not a multiple of the data block size. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
+
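As a quick sanity check of the arithmetic in these two macros — assuming
DATA_BLOCK_SIZE is 64KB (65536 bytes) per the commit message, and using
made-up example values:

#include <assert.h>
#include <stdint.h>

#define DATA_BLOCK_SIZE 65536            /* 64KB blocks, per the patch */

int
main(void)
{
    /* A 150000-byte field with 20000 bytes of room left in the current
     * block: (150000 - 20000) / 65536 = 1 full extra block, plus one more
     * for the 64464-byte remainder, i.e. 2 additional blocks in total. */
    uint32_t fld_size = 150000;
    uint32_t curr_blk_bytes = 20000;
    uint32_t required_blks;
    uint32_t raw_buf_index;

    required_blks = (fld_size - curr_blk_bytes) / DATA_BLOCK_SIZE;
    if ((fld_size - curr_blk_bytes) % DATA_BLOCK_SIZE != 0)
        required_blks++;
    assert(required_blks == 2);

    /* The field then ends 64464 bytes into its last block, which is exactly
     * where GET_RAW_BUF_INDEX leaves raw_buf_index. */
    raw_buf_index =
        fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes);
    assert(raw_buf_index == 64464);

    return 0;
}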
+static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
@@ -739,6 +855,14 @@ static void ExecBeforeStmtTrigger(CopyState cstate);
static void CheckTargetRelValidity(CopyState cstate);
static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static bool CopyReadBinaryTupleLeader(CopyState cstate);
+static void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+static bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
+
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -756,6 +880,7 @@ SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -984,8 +1109,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1277,6 +1402,7 @@ ParallelWorkerInitialization(SerializedParallelCopyState *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1292,7 +1418,7 @@ ParallelWorkerInitialization(SerializedParallelCopyState *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1658,32 +1784,66 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files.
+ * For the parallel copy leader, fill in the error
+ * context information here so that, if any failure
+ * occurs while determining tuple offsets, the leader
+ * throws the error with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1691,7 +1851,357 @@ ParallelCopyLeader(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * CopyReadBinaryGetDataBlock - gets a new block, updates
+ * the current offset, calculates the skip bytes.
+ */
+static void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to block %u",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if(field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from a binary formatted file
+ * into data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data in the data blocks.
+ */
+static bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One way to
+ * reach this is a binary file that has a valid
+ * signature but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ start_block_pos,
+ start_offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ start_block_pos, start_offset, line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize - leader identifies boundaries/
+ * offsets for each attribute/column and finally results in the
+ * tuple/row size. It moves on to next data block if the attribute/
+ * column is spread across data blocks.
+ */
+static void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in the same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while(i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+#if 0
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block.
+ * Enable this if block for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+#endif
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads and converts the data
+ * for each attribute/column, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i>0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * The bytes still to be copied into the attribute buffer can
+ * exceed an entire data block; copy at most one data block's
+ * worth in this iteration.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1779,7 +2289,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -5464,60 +5976,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -6518,18 +7017,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -6537,9 +7033,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyGetData(cstate, cstate->attribute_buf.data,
fld_size, fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
--
1.8.3.1
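As a quick sanity check of the multi-block arithmetic in the
GET_REQUIRED_BLOCKS and GET_RAW_BUF_INDEX macros above, here is a small
standalone sketch; the 64K block size matches the patch, while the sample
field size and leftover byte count are made-up values:

#include <stdio.h>
#include <stdint.h>

#define DATA_BLOCK_SIZE 65536	/* 64K, as in the patch */

int
main(void)
{
	/* hypothetical field: 150000 bytes, 1000 bytes free in the current block */
	int32_t		fld_size = 150000;
	int32_t		curr_blk_bytes = 1000;
	int32_t		required_blks;
	int32_t		raw_buf_index;

	/* GET_REQUIRED_BLOCKS: extra blocks needed beyond the current one */
	required_blks = (fld_size - curr_blk_bytes) / DATA_BLOCK_SIZE;
	if ((fld_size - curr_blk_bytes) % DATA_BLOCK_SIZE != 0)
		required_blks++;

	/* GET_RAW_BUF_INDEX: where the field ends inside the last block */
	raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) +
								curr_blk_bytes);

	/* prints "required_blks = 3, raw_buf_index = 17928" */
	printf("required_blks = %d, raw_buf_index = %d\n",
		   required_blks, raw_buf_index);
	return 0;
}

The 1000 leftover bytes in the current block plus 65536 + 65536 + 17928
bytes in the three new blocks add up to the full 150000-byte field, and the
last block is left positioned exactly at the byte where the next field's
length word will be read.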
Please find the updated patch with the fixes included.
Patch 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
had a few indentation issues; I have fixed them and attached the
updated patch as well.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
0006-Parallel-Copy-For-Binary-Format-Files.patch
From ab3b3ddeb080112a408ab6560cd2aa1e9acd1374 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 17 Jul 2020 13:18:54 +0530
Subject: [PATCH 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks of 64K each.
It also identifies each tuple's data block id, start offset, end offset,
and tuple size, and updates this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks in parallel, and insert the tuples
into the table in parallel.
---
src/backend/commands/copy.c | 656 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 575 insertions(+), 81 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index d4cc4d2..e147893 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -256,6 +256,17 @@ typedef struct ParallelCopyLineBuf
}ParallelCopyLineBuf;
/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
+/*
* Parallel copy data information.
*/
typedef struct ParallelCopyData
@@ -276,6 +287,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -436,6 +450,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}SerializedParallelCopyState;
/*
@@ -541,7 +556,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -671,8 +685,110 @@ else \
/* End parallel copy Macros */
-static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw_buf_index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data is spread
+ * across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * the field data can spread across multiple data blocks; \
+ * calculate the number of data blocks required and try to \
+ * get that many data blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * check if we need one more data block for the remaining \
+ * field data bytes that do not fill a whole data block. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
+
+static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
@@ -728,6 +844,14 @@ static void ExecBeforeStmtTrigger(CopyState cstate);
static void CheckTargetRelValidity(CopyState cstate);
static void PopulateCatalogInformation(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static bool CopyReadBinaryTupleLeader(CopyState cstate);
+static void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+static bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
+
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -745,6 +869,7 @@ SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -973,8 +1098,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1266,6 +1391,7 @@ ParallelWorkerInitialization(SerializedParallelCopyState *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
@@ -1281,7 +1407,7 @@ ParallelWorkerInitialization(SerializedParallelCopyState *shared_cstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
- for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
initStringInfo(&pcdata->worker_line_buf[count].line_buf);
cstate->line_buf_converted = false;
@@ -1647,32 +1773,66 @@ ParallelCopyLeader(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files.
+ * For the parallel copy leader, fill in the error
+ * context information here so that, if any failure
+ * occurs while determining tuple offsets, the leader
+ * throws the error with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1680,7 +1840,357 @@ ParallelCopyLeader(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * CopyReadBinaryGetDataBlock - gets a new block, updates
+ * the current offset, calculates the skip bytes.
+ */
+static void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to block %u",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if(field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from a binary formatted file
+ * into data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data in the data blocks.
+ */
+static bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One way to
+ * reach this is a binary file that has a valid
+ * signature but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ start_block_pos,
+ start_offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ start_block_pos, start_offset, line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize - leader identifies boundaries/
+ * offsets for each attribute/column and finally results in the
+ * tuple/row size. It moves on to next data block if the attribute/
+ * column is spread across data blocks.
+ */
+static void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in the same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while(i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+#if 0
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block.
+ * Enable this if block for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+#endif
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads and converts the data
+ * for each attribute/column, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i>0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * The bytes still to be copied into the attribute buffer can
+ * exceed an entire data block; copy at most one data block's
+ * worth in this iteration.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1768,7 +2278,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -5453,60 +5965,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -6507,18 +7006,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -6526,9 +7022,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyGetData(cstate, cstate->attribute_buf.data,
fld_size, fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
--
1.8.3.1
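The heart of this patch is the handshake over the ring of tuple offsets:
the leader publishes a (block, offset, size) triple and flips the entry's
state, and a worker flips it back after consuming the tuple. Below is a
minimal single-process sketch of that idea; plain fields stand in for the
pg_atomic_* operations, and the real ParallelCopyLineBoundary /
ParallelCopyShmInfo structures carry more state than shown here:

#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 4		/* illustrative; the patch uses a larger ring */

typedef enum
{
	LINE_FREE,
	LINE_LEADER_POPULATED,
	LINE_WORKER_PROCESSED
} LineState;

typedef struct LineBoundary
{
	uint32_t	first_block;	/* data block holding the start of the tuple */
	uint32_t	start_offset;	/* offset of the tuple within that block */
	uint32_t	line_size;		/* total serialized size of the tuple */
	LineState	line_state;		/* owned by leader/worker alternately */
} LineBoundary;

static LineBoundary ring[RING_SIZE];

/* leader side: publish one tuple's location for the workers */
static void
leader_publish(int pos, uint32_t blk, uint32_t off, uint32_t size)
{
	ring[pos].first_block = blk;
	ring[pos].start_offset = off;
	ring[pos].line_size = size;
	ring[pos].line_state = LINE_LEADER_POPULATED;	/* state flips last */
}

/* worker side: consume a populated entry, if there is one */
static int
worker_consume(int pos)
{
	if (ring[pos].line_state != LINE_LEADER_POPULATED)
		return 0;		/* nothing published here yet */

	/* ... read line_size bytes starting at (first_block, start_offset) ... */
	ring[pos].line_state = LINE_WORKER_PROCESSED;
	return 1;
}

int
main(void)
{
	leader_publish(0, 7, 128, 42);
	printf("consumed entry 0: %d\n", worker_consume(0));	/* prints 1 */
	return 0;
}

In the actual patch these state transitions are done with atomic reads and
writes so that the leader can keep publishing entries while workers poll
and consume them concurrently.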
0001-Copy-code-readjustment-to-support-parallel-copy.patch
From d80c94ebf2922b539ba9abc21d2afea493b3deef Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 17 Jul 2020 13:15:29 +0530
Subject: [PATCH 1/6] Copy code readjustment to support parallel copy.
This patch slightly readjusts the copy code so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText. This change is required because, in the
case of parallel copy, record identification and record updates are done in
CopyReadLineText, and the newline characters must be removed before the record
information is updated in shared memory.
---
src/backend/commands/copy.c | 361 ++++++++++++++++++++++++++------------------
1 file changed, 218 insertions(+), 143 deletions(-)
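Since a line must be trimmed before its boundaries are published, the key
transformation is the one ClearEOLFromCopiedData performs in the diff below.
Here is a tiny sketch of the same trimming, with a simplified stand-in for
the copy code's EolType; the sample line is made up:

#include <assert.h>
#include <stdio.h>
#include <string.h>

/* simplified stand-in for the copy code's EOL_NL/EOL_CR/EOL_CRNL */
typedef enum { EOL_NL, EOL_CR, EOL_CRNL } EolType;

/* drop the trailing EOL marker in place, as ClearEOLFromCopiedData does */
static void
clear_eol(EolType eol_type, char *data, int *len)
{
	int			trim = (eol_type == EOL_CRNL) ? 2 : 1;

	assert(*len >= trim);
	*len -= trim;
	data[*len] = '\0';
}

int
main(void)
{
	char		line[] = "1\tfoo\r\n";
	int			len = (int) strlen(line);

	clear_eol(EOL_CRNL, line, &len);
	printf("line = \"%s\", len = %d\n", line, len);	/* len = 5 */
	return 0;
}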
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 44da71c..09419ea 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -222,7 +225,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -350,6 +352,27 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -395,7 +418,11 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
-
+static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -796,6 +823,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1466,7 +1497,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1632,6 +1662,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateGlobalsForCopyFrom(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for the
+ * copy from operation. This is a helper function for BeginCopy.
+ */
+static void
+PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1751,12 +1799,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2648,32 +2690,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2710,27 +2731,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2768,9 +2768,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3263,7 +3315,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3318,30 +3370,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCatalogInformation - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCatalogInformation(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3351,38 +3388,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf is
- * used in both text and binary modes, but we use line_buf and raw_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3460,6 +3467,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCatalogInformation(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3869,45 +3931,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData - Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker
+ * to line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
+ {
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
+ }
+}
+
+/*
+ * ConvertToServerEncoding - convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
char *cvt;
-
cvt = pg_any_to_server(cstate->line_buf.data,
cstate->line_buf.len,
cstate->file_encoding);
@@ -3919,11 +3996,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4286,6 +4360,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE();
return result;
}
--
1.8.3.1
Attachment: 0002-Framework-for-leader-worker-in-parallel-copy.patch (text/x-patch)
From fa5efa81c002d077a63d387ed43db96b5b16aabe Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 17 Jul 2020 13:16:08 +0530
Subject: [PATCH 2/6] Framework for leader/worker in parallel copy
This patch adds the framework for parallel copy: the shared data structures,
leader initialization, worker initialization, shared memory updates, starting
the workers, waiting for the workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 897 +++++++++++++++++++++++++++-
src/backend/replication/logical/tablesync.c | 2 +-
src/include/commands/copy.h | 4 +
src/tools/pgindent/typedefs.list | 8 +
5 files changed, 910 insertions(+), 5 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 09419ea..50da871 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,174 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The macros DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT size the structures
+ * that hold the records read from the file before they are inserted into the
+ * relation. They are chosen so that each worker is handed multiple records
+ * with a significant amount of data at a time, avoiding context switches and
+ * distributing the work fairly among the workers. These values showed the
+ * best results in the performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* The ring can hold information for up to 10000 records for workers to process. */
+#define RINGSIZE (10 * 1000)
+
+/* The DSM can hold up to 1000 blocks of 64KB data to be processed by the workers. */
+#define MAX_BLOCKS_COUNT 1000
+
+/*
+ * Each worker is allocated WORKER_CHUNK_COUNT records at a time from the DSM
+ * data block to process, to reduce lock contention. This value must evenly
+ * divide RINGSIZE, as the wrap-around case is currently not handled when a
+ * worker selects its chunk.
+ */
+#define WORKER_CHUNK_COUNT 50
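
As a sketch (not part of the patch), the divisibility requirement above could
also be enforced at compile time with a C11 _Static_assert:

    _Static_assert(RINGSIZE % WORKER_CHUNK_COUNT == 0,
                   "WORKER_CHUNK_COUNT must evenly divide RINGSIZE");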
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ * ParallelCopyDataBlocks are created in the DSM. Data read from the file is
+ * copied into these DSM data blocks. The leader process identifies the
+ * records and shares the record information with the workers, which insert
+ * the records into the table. Each data block can hold one or more records,
+ * depending on the record size.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block, following_block
+ * will have the position where the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader determines that this block can be read
+ * safely by a worker. It helps a worker start processing early when a line
+ * spreads across many blocks, without waiting for the complete line to be
+ * populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+} ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is the data structure shared between the leader and
+ * the workers. The leader populates the data block number, the offset within
+ * the block, and the size of the record in the DSM; the workers then copy that
+ * data into the relation. It is protected by the following sequence in the
+ * leader and the workers; if they do not follow this order, a worker might
+ * process a wrong line_size, or the leader might overwrite information that a
+ * worker has not yet processed or is still processing.
+ * The leader must operate in the following order:
+ * 1) check if line_size is -1; if not, wait -- the worker is still
+ * processing.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * A worker must operate in the following order:
+ * 1) check that line_state is LINE_LEADER_POPULATED; if not, the leader is
+ * still populating the data.
+ * 2) read line_size.
+ * 3) only one worker may choose a given line for processing; this is handled
+ * with pg_atomic_compare_exchange_u32 -- a worker changes the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size bytes of data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled
+ * completely, 0 means an empty line, and >0 means the line is filled with
+ * line_size bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBoundary;
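
The claim step in worker rule 3 above can be made concrete with a minimal
sketch; TryClaimLine is a hypothetical helper, not a function from the patch,
and the LINE_* constants are the line states introduced later in this series
(see ParallelCopyLineState in the 0003 patch):

    static bool
    TryClaimLine(ParallelCopyLineBoundary *lineInfo)
    {
        uint32 expected = LINE_LEADER_POPULATED;

        /* Worker rule 1: only lines fully populated by the leader are eligible. */
        if (pg_atomic_read_u32(&lineInfo->line_state) != LINE_LEADER_POPULATED)
            return false;

        /*
         * Worker rule 3: exactly one worker wins the compare-and-exchange and
         * moves the line to LINE_WORKER_PROCESSING; a loser sees 'expected'
         * overwritten and simply tries another line.
         */
        return pg_atomic_compare_exchange_u32(&lineInfo->line_state, &expected,
                                              LINE_WORKER_PROCESSING);
    }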
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+} ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by the workers (some records are filtered out
+ * based on the WHERE condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array; a worker copies lines here and then releases them
+ * for the leader to continue populating.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+} ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -228,8 +393,66 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure holds the common data from CopyStateData that the workers
+ * require. It is allocated and stored in the DSM, from which each worker
+ * retrieves it and copies it into its own CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+} SerializedParallelCopyState;
+
+/*
+ * This structure is used to flatten a List into the format below: count holds
+ * the number of elements in the list, and info holds the list elements
+ * appended contiguously. The flattened structure is allocated in shared
+ * memory and stored in the DSM for the workers to retrieve and convert back
+ * into a List.
+ */
+typedef struct ParallelCopyKeyListInfo
+{
+ int count; /* count of attributes */
+
+ /* string info in the form info followed by info1, info2... infon */
+ char info[1];
+} ParallelCopyKeyListInfo;
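
As an illustration (not from the patch), flattening a two-element attribute
list ("id", "val") with the helpers below produces:

    count = 2
    info  = 'i' 'd' '\0' 'v' 'a' 'l' '\0'

DeserializeList() then walks info in strlen(attname) + 1 steps to rebuild the
List on the worker side.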
+
+/*
+ * List of internal parallel copy function pointers.
+ */
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -259,6 +482,23 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_NULL_PRINT_CLIENT 4
+#define PARALLEL_COPY_KEY_DELIM 5
+#define PARALLEL_COPY_KEY_QUOTE 6
+#define PARALLEL_COPY_KEY_ESCAPE 7
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 8
+#define PARALLEL_COPY_KEY_FORCE_QUOTE_LIST 9
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 10
+#define PARALLEL_COPY_KEY_NULL_LIST 11
+#define PARALLEL_COPY_KEY_CONVERT_LIST 12
+#define PARALLEL_COPY_KEY_DATASOURCE_CB 13
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 14
+#define PARALLEL_COPY_KEY_RANGE_TABLE 15
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -418,11 +658,593 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
+static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
+static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
+
+/*
+ * SerializeParallelCopyState - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * DeserializeString - Retrieve the string from shared memory.
+ */
+static void
+DeserializeString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * DeserializeList - Retrieve the list from shared memory.
+ */
+static void
+DeserializeList(shm_toc *toc, int sharedkey, List **copylist)
+{
+ ParallelCopyKeyListInfo *listinformation = (ParallelCopyKeyListInfo *)shm_toc_lookup(toc, sharedkey,
+ true);
+ if (listinformation)
+ {
+ int length = 0;
+ int count;
+
+ for (count = 0; count < listinformation->count; count++)
+ {
+ char *attname = (char *)(listinformation->info + length);
+ length += strlen(attname) + 1;
+ *copylist = lappend(*copylist, makeString(attname));
+ }
+ }
+}
+
+/*
+ * CopyListSharedMemory - Copy the list into shared memory.
+ */
+static void
+CopyListSharedMemory(List *inputlist, Size memsize, ParallelCopyKeyListInfo *sharedlistinfo)
+{
+ ListCell *l;
+ int length = 0;
+
+ MemSet(sharedlistinfo, 0, memsize);
+ foreach(l, inputlist)
+ {
+ char *name = strVal(lfirst(l));
+ memcpy((char *)(sharedlistinfo->info + length), name, strlen(name));
+ sharedlistinfo->count++;
+ length += strlen(name) + 1;
+ }
+}
+
+/*
+ * ComputeListSize - compute the list size.
+ */
+static int
+ComputeListSize(List *inputlist)
+{
+ int est_size = sizeof(int);
+ if (inputlist != NIL)
+ {
+ ListCell *l;
+ foreach(l, inputlist)
+ est_size += strlen(strVal(lfirst(l))) + 1;
+ }
+
+ return est_size;
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * EstimateLineKeysList - Estimate the size required in shared memory for the
+ * input list.
+ */
+static void
+EstimateLineKeysList(ParallelContext *pcxt, List *inputlist,
+ Size *est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ *est_list_size = ComputeListSize(inputlist);
+ shm_toc_estimate_chunk(&pcxt->estimator, *est_list_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * SerializeString - Insert a string into shared memory.
+ */
+static void
+SerializeString(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * SerializeList - Insert a list into shared memory.
+ */
+static void
+SerializeList(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ SerializedParallelCopyState *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_shared_info;
+ Size est_cstateshared;
+ Size est_att_list_size;
+ Size est_quote_list_size;
+ Size est_notnull_list_size;
+ Size est_null_list_size;
+ Size est_convert_list_size;
+ Size est_datasource_cb_size;
+ int count = 0;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ int parallel_workers = 0;
+ ParallelCopyData *pcdata;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ est_shared_info = sizeof(ParallelCopyShmInfo);
+ shm_toc_estimate_chunk(&pcxt->estimator, est_shared_info);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ EstimateLineKeysList(pcxt, attnamelist, &est_att_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_FORCE_QUOTE_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_quote, &est_quote_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_notnull,
+ &est_notnull_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->force_null, &est_null_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ EstimateLineKeysList(pcxt, cstate->convert_select,
+ &est_convert_list_size);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_DATASOURCE_CB.
+ */
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ est_datasource_cb_size = strlen(functionname) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, est_datasource_cb_size);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc,
+ est_shared_info);
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_allocate(pcxt->toc,
+ est_cstateshared);
+
+ /* copy cstate variables. */
+ SerializeParallelCopyState(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ cstate->null_print_client);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+
+ SerializeList(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST,
+ attnamelist, est_att_list_size);
+ SerializeList(pcxt, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST,
+ cstate->force_quote, est_quote_list_size);
+ SerializeList(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST,
+ cstate->force_notnull, est_notnull_list_size);
+ SerializeList(pcxt, PARALLEL_COPY_KEY_NULL_LIST, cstate->force_null,
+ est_null_list_size);
+ SerializeList(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST,
+ cstate->convert_select, est_convert_list_size);
+
+ if (cstate->data_source_cb)
+ {
+ char *functionname = LookupParallelCopyFnStr(cstate->data_source_cb);
+ char *data_source_cb = (char *) shm_toc_allocate(pcxt->toc,
+ est_datasource_cb_size);
+ strcpy(data_source_cb, functionname);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_DATASOURCE_CB,
+ data_source_cb);
+ }
+
+ if (cstate->whereClause)
+ SerializeString(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR,
+ whereClauseStr);
+
+ if(cstate->range_table)
+ SerializeString(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * ParallelWorkerInitialization - Initialize parallel worker.
+ */
+static void
+ParallelWorkerInitialization(SerializedParallelCopyState *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateGlobalsForCopyFrom(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT;count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * The worker handles the WHERE clause, converts each line into column values,
+ * adds default/null values for columns missing from the record, finds the
+ * partition if the table is partitioned, invokes before-row insert triggers,
+ * handles constraints, and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ SerializedParallelCopyState *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *data_source_cb;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO,
+ false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
+ false);
+
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT,
+ true);
+ cstate->null_print_client = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT_CLIENT,
+ true);
+
+ DeserializeString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ DeserializeString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ DeserializeString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+
+ DeserializeList(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attlist);
+ DeserializeList(toc, PARALLEL_COPY_KEY_FORCE_QUOTE_LIST, &cstate->force_quote);
+ DeserializeList(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &cstate->force_notnull);
+ DeserializeList(toc, PARALLEL_COPY_KEY_NULL_LIST, &cstate->force_null);
+ DeserializeList(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &cstate->convert_select);
+ data_source_cb = (char *)shm_toc_lookup(toc,
+ PARALLEL_COPY_KEY_DATASOURCE_CB,
+ true);
+ DeserializeString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ DeserializeString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (data_source_cb)
+ cstate->data_source_cb = LookupParallelCopyFnPtr(data_source_cb);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ ParallelWorkerInitialization(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ MemoryContextSwitchTo(oldcontext);
+ return;
+}
+
+/*
+ * ParallelCopyLeader - parallel copy leader's functionality.
+ *
+ * The leader populates the shared memory and shares it with the workers. It
+ * executes the before statement trigger, if one is present, and then reads
+ * the input data from the file, loading it into the DSM data blocks block by
+ * block as and when required. The leader traverses the data blocks and
+ * identifies records based on line breaks; this information is called a
+ * line. For each identified line the leader gets a free entry in
+ * ParallelCopyLineBoundary to copy the line information into; if there is no
+ * free entry it waits until one is freed. The workers then pick up this
+ * information and insert the lines into the table. This process is repeated
+ * until the complete file is processed. Finally the leader waits until all
+ * the populated lines have been processed by the workers, and exits.
+ */
+static void
+ParallelCopyLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
+/*
+ * LookupParallelCopyFnPtr - Look up parallel copy function pointer.
+ */
+static pg_attribute_always_inline copy_data_source_cb
+LookupParallelCopyFnPtr(const char *funcname)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (strcmp(InternalParallelCopyFuncPtrs[i].fn_name, funcname) == 0)
+ return InternalParallelCopyFuncPtrs[i].fn_addr;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function \"%s\" not found", funcname);
+}
+
+/*
+ * LookupParallelCopyFnStr - Lookup function string from a function pointer.
+ */
+static pg_attribute_always_inline char*
+LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
+{
+ int i;
+
+ for (i = 0; i < lengthof(InternalParallelCopyFuncPtrs); i++)
+ {
+ if (InternalParallelCopyFuncPtrs[i].fn_addr == fn_addr)
+ return InternalParallelCopyFuncPtrs[i].fn_name;
+ }
+
+ /* We can only reach this by programming error. */
+ elog(ERROR, "internal function pointer not found");
+}
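
Function pointers are not valid across process boundaries, so only the
registered name is shipped through the DSM. A hypothetical round trip (not
code from the patch) shows the intent:

    /* leader side: translate the callback pointer to its registered name */
    char *name = LookupParallelCopyFnStr(copy_read_data);

    /* worker side: resolve the name back to a pointer in its own address space */
    copy_data_source_cb cb = LookupParallelCopyFnPtr(name);
    Assert(cb == copy_read_data);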
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1093,6 +1915,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1102,7 +1925,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyLeader(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1151,6 +1991,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1319,6 +2160,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1671,12 +2559,13 @@ BeginCopy(ParseState *pstate,
}
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateGlobalsForCopyFrom - Populates the common variables required for the
+ * copy from operation. This is a helper function for BeginCopyFrom &
+ * ParallelWorkerInitialization.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index c27d970..b3787c1 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -557,7 +557,7 @@ make_copy_attnamelist(LogicalRepRelMapEntry *rel)
* Data source callback for the COPY FROM, which reads from the remote
* connection and passes the data back to our local COPY.
*/
-static int
+int
copy_read_data(void *outbuf, int minread, int maxread)
{
int bytesread = 0;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..5dc95ac 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,7 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+
+extern int copy_read_data(void *outbuf, int minread, int maxread);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1..dd812b0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1699,6 +1699,13 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyKeyListInfo
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2215,6 +2222,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
Attachment: 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch (text/x-patch)
From d2dfd7d398a571080da6ef29d17d983151278bde Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 17 Jul 2020 13:17:51 +0530
Subject: [PATCH 3/6] Allow copy from command to process data from file/STDIN
contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs to copy data from a
file or STDIN to a table. It adds a PARALLEL option to the COPY FROM command
with which the user can specify the number of workers to use for the command.
Specifying zero workers disables parallelism.

The backend to which the COPY FROM query is submitted acts as the leader. It
is responsible for reading data from the file or stdin and launching at most
the number of workers specified with the PARALLEL 'n' option in the COPY FROM
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
before statement triggers, if any exist. It populates the DSM lines, which
include the start offset and line size; while populating the lines it reads
as many blocks as required from the file into the DSM data blocks. Each block
is 64KB in size. The leader parses the data to identify the lines, reusing
the existing line-identification logic from CopyReadLineText with some
changes. The leader checks whether a free line is available to copy the
information into; if there is none, it waits until the required line is freed
up by a worker and then copies the identified line information (offset & line
size) into the DSM lines. This process is repeated until the complete file is
processed. Simultaneously, the workers cache the lines (50 at a time) into
local memory and release the lines for the leader to populate further. Each
worker processes the lines it has cached and inserts them into the table. The
leader does not participate in the insertion of data; its only responsibility
is to identify the lines as fast as possible for the workers to do the actual
copy operation. The leader waits until all the populated lines have been
processed by the workers, then exits.
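
For example, COPY mytable FROM '/tmp/data.csv' WITH (FORMAT csv, PARALLEL '2');
(table and file names illustrative only) would run the copy with up to two
parallel workers.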
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 892 ++++++++++++++++++++++++++--
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 898 insertions(+), 52 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In the parallel copy case the
+ * command id has already been set by AssignCommandIdForWorker, so call
+ * GetCurrentCommandId with used = false to fetch currentCommandId; marking
+ * it used was taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d881f4c..70ecd51 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index b3ee7fa..94988b6 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -502,6 +502,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, transaction id of leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, command id of leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
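
A sketch of the intended worker-side usage (hypothetical here; the actual
wiring is in the copy.c changes of this patch): before inserting any tuples,
a worker adopts the leader's ids from the shared ParallelCopyShmInfo:

    AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
    AssignCommandIdForWorker(pcshared_info->mycid, true);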
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 50da871..d4cc4d2 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
/*
@@ -577,9 +592,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -592,26 +611,65 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
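
Putting the line states and the wait macro together, here is a leader-side
sketch of the publish sequence documented on ParallelCopyLineBoundary in the
0002 patch; LeaderPublishLine is a hypothetical helper, not the patch's code:

    static void
    LeaderPublishLine(ParallelCopyLineBoundary *lineInfo, uint32 first_block,
                      uint32 start_offset, uint64 lineno, uint32 size)
    {
        /* 1) wait until a worker has released the entry (line_size == -1) */
        while (pg_atomic_read_u32(&lineInfo->line_size) != (uint32) -1)
            COPY_WAIT_TO_PROCESS();

        /* 2) mark the entry as being filled */
        pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATING);

        /* 3) the position fields may be written in any order here */
        lineInfo->first_block = first_block;
        lineInfo->start_offset = start_offset;
        lineInfo->cur_lineno = lineno;

        /* 4) publish the size, then 5) allow workers to claim the entry */
        pg_atomic_write_u32(&lineInfo->line_size, size);
        pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
    }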
+
/*
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
if (!result && !IsHeaderLine()) \
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
- cstate->line_buf.len, \
- &cstate->line_buf.len) \
+{ \
+ if (IsParallelCopy()) \
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, \
+ raw_buf_ptr, &line_size); \
+ else \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len); \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
+/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -666,7 +724,10 @@ static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
static void ConvertToServerEncoding(CopyState cstate);
static pg_attribute_always_inline copy_data_source_cb LookupParallelCopyFnPtr(const char *funcname);
static pg_attribute_always_inline char* LookupParallelCopyFnStr(copy_data_source_cb fn_addr);
-
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
+static void PopulateCatalogInformation(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -817,6 +878,137 @@ SerializeList(ParallelContext *pcxt, int key, List *inputlist,
}
/*
+ * IsTriggerFunctionParallelSafe - Check that every trigger function is
+ * parallel safe. Return false if any trigger has a parallel-unsafe
+ * function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine the parallel safety of volatile
+ * expressions in column default clauses or in the WHERE clause; return true
+ * if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any column has a volatile default expression; if so, and it is
+ * not parallel safe, parallelism is not allowed. For instance, if any
+ * serial/bigserial column has the parallel-unsafe nextval() associated as
+ * its default expression, parallelism should not be allowed. (Non-parallel
+ * copy does not perform this volatility check for nextval().)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -845,6 +1037,8 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
int parallel_workers = 0;
ParallelCopyData *pcdata;
+ CheckTargetRelValidity(cstate);
+
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -854,6 +1048,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "parallel copy is not supported for the specified table; copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -1080,9 +1287,212 @@ ParallelWorkerInitialization(SerializedParallelCopyState *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCatalogInformation(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
}
/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * On later iterations, the wait loop at the end of this block may have
+ * exited because data_blk_ptr->curr_blk_completed was set, in which case
+ * the dataSize read earlier might be stale. If curr_blk_completed is set
+ * and the line is complete, line_size will have been set, so read
+ * line_size again to be sure whether this is a complete or partial block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the remaining data fits in the current block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole
+ * usable portion of the current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: only the first copy starts from the offset;
+ * subsequent copies take the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is entirely in the current block,
+ * lineInfo->line_size will be updated. If the data is spread
+ * across blocks, either lineInfo->line_size or
+ * data_blk_ptr->curr_blk_completed can be updated:
+ * lineInfo->line_size is updated once the complete read is
+ * finished, while data_blk_ptr->curr_blk_completed is updated
+ * when the current block is finished but the line is not yet
+ * complete.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so
+ * that the leader can continue loading data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+ }
+
+/*
* ParallelCopyMain - parallel copy worker's code.
*
* Where clause handling, convert tuple to columns, add default null values for
@@ -1127,6 +1537,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE,
false);
@@ -1181,6 +1593,33 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
}
/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
+/*
* ParallelCopyLeader - parallel copy leader's functionality.
*
* Leader will populate the shared memory and share it across the workers.
@@ -1205,8 +1644,159 @@ ParallelCopyLeader(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
+ }
+
+/*
+ * GetLinePosition - return the line position that worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data; don't check the current block,
+ * as it will still have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
}
/*
@@ -1246,6 +1836,145 @@ LookupParallelCopyFnStr(copy_data_source_cb fn_addr)
}
/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If new_line_size > raw_buf_ptr, then the new block has only the
+ * newline character's content. The unprocessed count should not be
+ * increased in this case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline character: an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger, this will be
+ * executed for parallel copy by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
+
+/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
*/
@@ -3677,7 +4406,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3687,7 +4417,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In case of
+ * parallel copy, this check is done by the leader, so that if any
+ * invalid case exists the COPY FROM command will error out in the
+ * leader itself, avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3876,13 +4613,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3982,6 +4722,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * have BEFORE/INSTEAD OF triggers, we can't perform copy
+ * operations with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4402,7 +5152,7 @@ BeginCopyFrom(ParseState *pstate,
{
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
}
@@ -4547,26 +5297,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4814,9 +5573,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time. On subsequent
+ * iterations, reset the index and re-use the same
+ * block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4871,7 +5652,8 @@ static void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding &&
+ (!IsParallelCopy() || (IsParallelCopy() && !IsLeader())))
{
char *cvt;
cvt = pg_any_to_server(cstate->line_buf.data,
@@ -4910,6 +5692,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4964,6 +5751,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
&copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5188,9 +5977,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5242,6 +6037,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line here, this cannot be done at
+ * the beginning, as there is a possibility that file contains empty
+ * lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5250,6 +6065,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE();
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index db19187..20dafb7 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index dd812b0..d7879fd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1705,6 +1705,7 @@ ParallelCopyData
ParallelCopyDataBlock
ParallelCopyKeyListInfo
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
0004-Documentation-for-parallel-copy.patch
From 6653f44f1adca06af0db7bd6438a8b15140f1f8f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 17 Jul 2020 13:18:13 +0530
Subject: [PATCH 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all. This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
0005-Tests-for-parallel-copy.patch
From 4de7281c80911cd97e3937899f3c79af03c5f505 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Fri, 17 Jul 2020 13:18:36 +0530
Subject: [PATCH 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..ef75148 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..032cea9 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Some review comments, mostly on the leader-side code changes:
1) Do we need a DSM key for the FORCE_QUOTE option? I think FORCE_QUOTE
option is only used with COPY TO and not COPY FROM so not sure why you have
added it.
PARALLEL_COPY_KEY_FORCE_QUOTE_LIST
2) Should we be allocating the parallel copy data structure only when it is
confirmed that the parallel copy is allowed?
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;
Or, if you want it to be allocated before confirming if Parallel copy is
allowed or not, then I think it would be good to allocate it in
*cstate->copycontext* memory context so that when EndCopy is called towards
the end of the COPY FROM operation, the entire context itself gets deleted
thereby freeing the memory space allocated for pcdata. In fact it would be
good to ensure that all the local memory allocated inside the cstate
structure gets allocated in the *cstate->copycontext* memory context.
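For illustration, a minimal sketch of what that could look like, assuming
the allocation stays in BeginParallelCopy and cstate->copycontext is
already set up at that point:

    /*
     * Sketch only: allocate pcdata in the COPY memory context so that
     * the context deletion in EndCopy frees it automatically.
     */
    MemoryContext oldcontext = MemoryContextSwitchTo(cstate->copycontext);

    pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
    cstate->pcdata = pcdata;
    MemoryContextSwitchTo(oldcontext);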
3) Should we allow Parallel Copy when the insert method is
CIM_MULTI_CONDITIONAL?
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
I know we have added checks in CopyFrom() to ensure that if any trigger
(before row or instead of) is found on any of partition being loaded with
data, then COPY FROM operation would fail, but does it mean that we are
okay to perform parallel copy on partitioned table. Have we done some
performance testing with the partitioned table where the data in the input
file needs to be routed to the different partitions?
4) There are a lot of if-checks in the IsParallelCopyAllowed function that
are performed in the CopyFrom function as well, which means that in case of
Parallel Copy those checks get executed multiple times (first by the leader
and then once more by each worker process). Is that required?
5) Should the worker process be calling this function when the leader has
already called it once in ExecBeforeStmtTrigger()?
/* Verify the named relation is a valid target for INSERT */
CheckValidResultRel(resultRelInfo, CMD_INSERT);
6) I think it would be good to re-write the comments atop
ParallelCopyLeader(). From the present comments it appears as if you were
trying to put the information pointwise but somehow you ended up putting in
a paragraph. The comments also have some typos like *line beaks*, which
presumably means line breaks. This is applicable for other comments as
well.
7) Is the following check equivalent to IsWorker()? If so, it would be
good to replace it with an IsWorker-like macro to increase readability.
(IsParallelCopy() && !IsLeader())
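For example, something along these lines (just a sketch; the macro name and
its placement are a suggestion, mirroring the existing IsParallelCopy and
IsLeader macros):

    /* Sketch: a worker is a parallel-copy participant that is not the leader. */
    #define IsWorker()  (IsParallelCopy() && !IsLeader())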
--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 17, 2020 at 2:09 PM vignesh C <vignesh21@gmail.com> wrote:
Please find the updated patch with the fixes included.
Patch 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
had a few indentation issues; I have fixed them and attached the patch for
the same.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Fri, Jul 17, 2020 at 2:09 PM vignesh C <vignesh21@gmail.com> wrote:
Please find the updated patch with the fixes included.
Patch 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
had a few indentation issues; I have fixed them and attached the patch for
the same.
Ensure to use the version with each patch-series as that makes it
easier for the reviewer to verify the changes done in the latest
version of the patch. One way is to use commands like "git
format-patch -6 -v <version_of_patch_series>" or you can add the
version number manually.
Review comments:
===================
0001-Copy-code-readjustment-to-support-parallel-copy
1.
@@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
No comment to explain why this change is done?
0002-Framework-for-leader-worker-in-parallel-copy
2.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * Leader process will be populating data block, data block offset &
the size of
+ * the record in DSM for the workers to copy the data into the relation.
+ * This is protected by the following sequence in the leader & worker. If they
+ * don't follow this order the worker might process wrong line_size and leader
+ * might populate the information which worker has not yet processed or in the
+ * process of processing.
+ * Leader should operate in the following order:
+ * 1) check if line_size is -1, if not wait, it means worker is still
+ * processing.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * Worker should operate in the following order:
+ * 1) check line_state is LINE_LEADER_POPULATED, if not it means
leader is still
+ * populating the data.
+ * 2) read line_size.
+ * 3) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32, worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
Are we doing all this state management to avoid using locks while
processing lines? If so, I think we can use either spinlock or LWLock
to keep the main patch simple and then provide a later patch to make
it lock-less. This will allow us to first focus on the main design of
the patch rather than trying to make this datastructure processing
lock-less in the best possible way.
3.
+ /*
+ * Actual lines inserted by worker (some records will be filtered based on
+ * where condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records
by the workers */
The difference between processed and total_worker_processed is not
clear. Can we expand the comments a bit?
4.
+ * SerializeList - Insert a list into shared memory.
+ */
+static void
+SerializeList(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo
*)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}
Why do we need to write a special mechanism (CopyListSharedMemory) to
serialize a list. Why can't we use nodeToString? It should be able
to take care of List datatype, see outNode which is called from
nodeToString. Once you do that, I think you won't need even
EstimateLineKeysList, strlen should work instead.
Check, if you have any similar special handling for other types that
can be dealt with nodeToString?
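For example, a rough sketch of what that could look like (assuming the
existing key constants; nodeToString and stringToNode are the standard
outfuncs.c/readfuncs.c entry points):

    static void
    SerializeList(ParallelContext *pcxt, int key, List *inputlist)
    {
        if (inputlist != NIL)
        {
            char       *liststr = nodeToString(inputlist);
            char       *shmptr = (char *) shm_toc_allocate(pcxt->toc,
                                                           strlen(liststr) + 1);

            strcpy(shmptr, liststr);
            shm_toc_insert(pcxt->toc, key, shmptr);
        }
    }

    /* Worker side, sketch: */
    /* List *restoredlist = (List *) stringToNode(shm_toc_lookup(toc, key, false)); */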
5.
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo =
&shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+
You can move this initialization in a separate function.
6.
In function BeginParallelCopy(), you need to keep a provision to
collect wal_usage and buf_usage stats. See _bt_begin_parallel for
reference. Those will be required for pg_stat_statements.
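To sketch the pattern from _bt_begin_parallel (the PARALLEL_COPY_KEY_*
names here are made up for illustration):

    WalUsage    *walusage;
    BufferUsage *bufferusage;
    int          i;

    /* Leader, at setup time: allocate per-worker usage arrays in DSM. */
    walusage = shm_toc_allocate(pcxt->toc,
                                mul_size(sizeof(WalUsage), pcxt->nworkers));
    shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_WAL_USAGE, walusage);
    bufferusage = shm_toc_allocate(pcxt->toc,
                                   mul_size(sizeof(BufferUsage), pcxt->nworkers));
    shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_BUFFER_USAGE, bufferusage);

    /* Worker, around its main work. */
    InstrStartParallelQuery();
    /* ... perform the copy ... */
    InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
                          &walusage[ParallelWorkerNumber]);

    /* Leader, after the workers finish: accumulate into its own counters. */
    for (i = 0; i < pcxt->nworkers_launched; i++)
        InstrAccumParallelQuery(&bufferusage[i], &walusage[i]);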
7.
DeserializeString() -- it is better to name this function as RestoreString.
ParallelWorkerInitialization() -- it is better to name this function
as InitializeParallelCopyInfo or something like that, the current name
is quite confusing.
ParallelCopyLeader() -- how about ParallelCopyFrom? ParallelCopyLeader
doesn't sound good to me. You can suggest something else if you don't
like ParallelCopyFrom
8.
/*
- * PopulateGlobalsForCopyFrom - Populates the common variables
required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateCatalogInformation - Populates the common variables
required for copy
+ * from operation. This is a helper function for BeginCopy &
+ * ParallelWorkerInitialization function.
*/
static void
PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
The actual function name and the name in function header don't match.
I also don't like this function name, how about
PopulateCommonCstateInfo? Similarly how about changing
PopulateCatalogInformation to PopulateCstateCatalogInfo?
9.
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};
The function copy_read_data is present in
src/backend/replication/logical/tablesync.c and seems to be used
during logical replication. Why do we want to expose this function as
part of this patch?
0003-Allow-copy-from-command-to-process-data-from-file-ST
10.
In the commit message, you have written "The leader does not
participate in the insertion of data, leaders only responsibility will
be to identify the lines as fast as possible for the workers to do the
actual copy operation. The leader waits till all the lines populated
are processed by the workers and exits."
I think you should also mention that we have chosen this design based
on the reason "that everything stalls if the leader doesn't accept
further input data, as well as when there are no available split
chunks, so it doesn't seem like a good idea to have the leader do other
work. This is backed by the performance data where we have seen that
with 1 worker there is just a 5-10% (or whatever percentage difference
you have seen) performance difference".
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Thanks for your comments Amit, I have worked on them; my thoughts on the
same are mentioned below.
On Tue, Jul 21, 2020 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jul 17, 2020 at 2:09 PM vignesh C <vignesh21@gmail.com> wrote:
Please find the updated patch with the fixes included.
Patch 0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
had a few indentation issues; I have fixed them and attached the patch for
the same.
Ensure to use the version with each patch-series as that makes it
easier for the reviewer to verify the changes done in the latest
version of the patch. One way is to use commands like "git
format-patch -6 -v <version_of_patch_series>" or you can add the
version number manually.
Taken care.
Review comments:
===================
0001-Copy-code-readjustment-to-support-parallel-copy
1.
@@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);

No comment to explain why this change is done?
Currently CopyGetData copies less data into the buffer than it could,
even though space is available in the buffer, because minread was passed
as 1 to CopyGetData. Because of this there are frequent calls to
CopyGetData for fetching the data. It will load only some data due to
the below check:
while (maxread > 0 && bytesread < minread && !cstate->reached_eof)
After reading some data, bytesread will be greater than minread (which
was passed as 1) and the call returns with a lesser amount of data, even
though there is still space.
This change is required for parallel copy feature as each time we get a new
DSM data block which is of 64K size and copy the data. If we copy less data
into DSM data blocks we might end up consuming all the DSM data blocks. I
felt this issue can be fixed as part of HEAD. Have posted a separate thread
[1]: /messages/by-id/CALDaNm0v4CjmvSnftYnx_9pOS_dKRG=O3NnBgJsQmi0KipvLog@mail.gmail.com
Can that go as a separate
patch or should we include it here?
[1]: /messages/by-id/CALDaNm0v4CjmvSnftYnx_9pOS_dKRG=O3NnBgJsQmi0KipvLog@mail.gmail.com
/messages/by-id/CALDaNm0v4CjmvSnftYnx_9pOS_dKRG=O3NnBgJsQmi0KipvLog@mail.gmail.com
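To make the minread semantics concrete, here is a minimal sketch of the loop
being discussed (illustrative only, not the actual copy.c code; read_source()
and the file-scope reached_eof are hypothetical stand-ins for the real
CopyGetData internals):

/* Hypothetical input reader: returns bytes read, 0 at end of input. */
extern int read_source(char *buf, int len);

static bool reached_eof = false;

/* Sketch: keep reading until at least 'minread' bytes arrive (or EOF). */
static int
copy_get_data_sketch(char *databuf, int minread, int maxread)
{
	int			bytesread = 0;

	while (maxread > 0 && bytesread < minread && !reached_eof)
	{
		int			avail = read_source(databuf + bytesread, maxread);

		if (avail == 0)
			reached_eof = true;
		bytesread += avail;
		maxread -= avail;
	}

	/*
	 * With minread = 1 the loop exits after the first read, leaving the 64K
	 * buffer mostly empty; with minread = RAW_BUF_SIZE - nbytes it keeps
	 * reading until the buffer (and hence each DSM data block) is filled.
	 */
	return bytesread;
}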
0002-Framework-for-leader-worker-in-parallel-copy

2.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * Leader process will be populating data block, data block offset & the size of
+ * the record in DSM for the workers to copy the data into the relation.
+ * This is protected by the following sequence in the leader & worker. If they
+ * don't follow this order the worker might process wrong line_size and leader
+ * might populate the information which worker has not yet processed or in the
+ * process of processing.
+ * Leader should operate in the following order:
+ * 1) check if line_size is -1, if not wait, it means worker is still
+ * processing.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * Worker should operate in the following order:
+ * 1) check line_state is LINE_LEADER_POPULATED, if not it means leader is still
+ * populating the data.
+ * 2) read line_size.
+ * 3) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32, worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary

Are we doing all this state management to avoid using locks while
processing lines? If so, I think we can use either spinlock or LWLock
to keep the main patch simple and then provide a later patch to make
it lock-less. This will allow us to first focus on the main design of
the patch rather than trying to make this datastructure processing
lock-less in the best possible way.
The steps will be more or less the same if we use a spinlock too: steps 1,
3 and 4 will be common, and we would have to lock & unlock instead of steps
2 & 5. I feel we can retain the current implementation.
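As a reference for why the ordering matters, here is a minimal sketch of the
step-3 pickup (the type and state names are from the patch; the wrapper
function itself is only illustrative):

/*
 * Only one worker can move a line from LINE_LEADER_POPULATED to
 * LINE_WORKER_PROCESSING; the compare-and-exchange fails for all other
 * workers, and also while the leader is still populating the entry.
 */
static bool
TryClaimLine(ParallelCopyLineBoundary *line_info)
{
	uint32		expected = LINE_LEADER_POPULATED;

	return pg_atomic_compare_exchange_u32(&line_info->line_state,
										  &expected,
										  LINE_WORKER_PROCESSING);
}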
3.
+ /*
+ * Actual lines inserted by worker (some records will be filtered based on
+ * where condition).
+ */
+ pg_atomic_uint64 processed;
+ pg_atomic_uint64 total_worker_processed; /* total processed records by the workers */

The difference between processed and total_worker_processed is not
clear. Can we expand the comments a bit?
Fixed
4.
+ * SerializeList - Insert a list into shared memory.
+ */
+static void
+SerializeList(ParallelContext *pcxt, int key, List *inputlist,
+ Size est_list_size)
+{
+ if (inputlist != NIL)
+ {
+ ParallelCopyKeyListInfo *sharedlistinfo = (ParallelCopyKeyListInfo *)shm_toc_allocate(pcxt->toc,
+ est_list_size);
+ CopyListSharedMemory(inputlist, est_list_size, sharedlistinfo);
+ shm_toc_insert(pcxt->toc, key, sharedlistinfo);
+ }
+}

Why do we need to write a special mechanism (CopyListSharedMemory) to
serialize a list. Why can't we use nodeToString? It should be able
to take care of List datatype, see outNode which is called from
nodeToString. Once you do that, I think you won't need even
EstimateLineKeysList, strlen should work instead.

Check, if you have any similar special handling for other types that
can be dealt with nodeToString?
Fixed
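For illustration, a minimal sketch of the nodeToString-based approach
suggested above (pcxt, toc and key are assumed to be in scope; this is not
the patch code itself):

/* Leader: any Node tree, including a List, flattens to a C string. */
char	   *liststr = nodeToString(attnamelist);
Size		len = strlen(liststr) + 1;	/* no EstimateLineKeysList needed */
char	   *shmptr = (char *) shm_toc_allocate(pcxt->toc, len);

memcpy(shmptr, liststr, len);
shm_toc_insert(pcxt->toc, key, shmptr);

/* Worker: rebuild the List from the shared string. */
List	   *attlist = (List *) stringToNode((char *) shm_toc_lookup(toc, key, false));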
5.
+ MemSet(shared_info_ptr, 0, est_shared_info);
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+

You can move this initialization into a separate function.
Fixed
6.
In function BeginParallelCopy(), you need to keep a provision to
collect wal_usage and buf_usage stats. See _bt_begin_parallel for
reference. Those will be required for pg_stat_statements.
Fixed
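For completeness, a minimal sketch of the leader-side accumulation that pairs
with this provision (modeled on _bt_end_parallel; the bufferusage/walusage
arrays are assumed to have been looked up from the toc after the workers
finish, so this is a sketch of the intended shape, not the patch code):

int			i;

/*
 * Fold each worker's counters into the leader's totals so that
 * pg_stat_statements also accounts for the parallel workers' activity.
 */
for (i = 0; i < pcxt->nworkers_launched; i++)
	InstrAccumParallelQuery(&bufferusage[i], &walusage[i]);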
7.
DeserializeString() -- it is better to name this function as
RestoreString.
ParallelWorkerInitialization() -- it is better to name this function
as InitializeParallelCopyInfo or something like that, the current name
is quite confusing.
ParallelCopyLeader() -- how about ParallelCopyFrom? ParallelCopyLeader
doesn't sound good to me. You can suggest something else if you don't
like ParallelCopyFrom
Fixed
8.
 /*
- * PopulateGlobalsForCopyFrom - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * PopulateCatalogInformation - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy &
+ * ParallelWorkerInitialization function.
 */
 static void
 PopulateGlobalsForCopyFrom(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)

The actual function name and the name in function header don't match.
I also don't like this function name, how about
PopulateCommonCstateInfo? Similarly how about changing
PopulateCatalogInformation to PopulateCstateCatalogInfo?
Fixed
9.
+static const struct
+{
+ char *fn_name;
+ copy_data_source_cb fn_addr;
+} InternalParallelCopyFuncPtrs[] =
+
+{
+ {
+ "copy_read_data", copy_read_data
+ },
+};

The function copy_read_data is present in
src/backend/replication/logical/tablesync.c and seems to be used
during logical replication. Why do we want to expose this function as
part of this patch?
I was thinking we could include the framework to support parallelism for
logical replication too, which can be enhanced when it is needed. I have
now removed this as part of the new patch provided; it can be added back
whenever required.
0003-Allow-copy-from-command-to-process-data-from-file-ST
10.
In the commit message, you have written "The leader does not
participate in the insertion of data, leaders only responsibility will
be to identify the lines as fast as possible for the workers to do the
actual copy operation. The leader waits till all the lines populated
are processed by the workers and exits."I think you should also mention that we have chosen this design based
on the reason "that everything stalls if the leader doesn't accept
further input data, as well as when there are no available splitted
chunks so it doesn't seem like a good idea to have the leader do other
work. This is backed by the performance data where we have seen that
with 1 worker there is just a 5-10% (or whatever percentage difference
you have seen) performance difference)".
Fixed.
Please find the new patch attached with the fixes.
Thoughts?
Attachments:
v2-0001-Copy-code-readjustment-to-support-parallel-copy.patch (text/x-patch)
From 8207e4279b841705a00eb33cd9d19506e267f1c5 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 22 Jul 2020 10:38:11 +0530
Subject: [PATCH v2 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText. This change was required because, in the
case of parallel copy, record identification and record updates are done in
CopyReadLineText; the newline characters should be removed before the record
information is updated in shared memory.
---
src/backend/commands/copy.c | 361 ++++++++++++++++++++++++++------------------
1 file changed, 218 insertions(+), 143 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 44da71c..249e908 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -222,7 +225,6 @@ typedef struct CopyStateData
* converts it. Note: we guarantee that there is a \0 at
* raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -350,6 +352,27 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -395,7 +418,11 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
-
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -796,6 +823,7 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes;
int inbytes;
+ int minread = 1;
if (cstate->raw_buf_index < cstate->raw_buf_len)
{
@@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate)
else
nbytes = 0; /* no data need be saved */
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1466,7 +1497,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1632,6 +1662,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateCommonCstateInfo(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateCommonCstateInfo - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1751,12 +1799,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2648,32 +2690,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2710,27 +2731,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2768,9 +2768,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3263,7 +3315,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3318,30 +3370,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCstateCatalogInfo - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCstateCatalogInfo(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3351,38 +3388,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf is
- * used in both text and binary modes, but we use line_buf and raw_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3460,6 +3467,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCstateCatalogInfo(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3869,45 +3931,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData - Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker
+ * to line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
+ {
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
+ }
+}
+
+/*
+ * ConvertToServerEncoding - convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
char *cvt;
-
cvt = pg_any_to_server(cstate->line_buf.data,
cstate->line_buf.len,
cstate->file_encoding);
@@ -3919,11 +3996,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4286,6 +4360,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE();
return result;
}
--
1.8.3.1
v2-0002-Framework-for-leader-worker-in-parallel-copy.patch (text/x-patch)
From 8e013d8d1e780d9d9c3fd60595b70ec6463e7a54 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 22 Jul 2020 18:51:08 +0530
Subject: [PATCH v2 2/6] Framework for leader/worker in parallel copy
This patch has the framework for the data structures in parallel copy: leader
initialization, worker initialization, shared memory updates, starting the
workers, waiting for the workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 756 +++++++++++++++++++++++++++++++++-
src/include/commands/copy.h | 2 +
src/tools/pgindent/typedefs.list | 7 +
4 files changed, 765 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 249e908..a3cf750 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,180 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The macros DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT stores the records
+ * read from the file that need to be inserted into the relation. These values
+ * help in handover of multiple records with significant size of data to be
+ * processed by each of the workers to make sure there is no context switch & the
+ * work is fairly distributed among the workers. This number showed best
+ * results in the performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* It can hold information for up to 10000 records for the workers to process. */
+#define RINGSIZE (10 * 1000)
+
+/* It can hold 1000 blocks of 64K data in DSM to be processed by the worker. */
+#define MAX_BLOCKS_COUNT 1000
+
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process, to avoid lock contention. RINGSIZE should be a multiple of
+ * WORKER_CHUNK_COUNT, as wrap around cases are currently not handled while
+ * selecting the WORKER_CHUNK_COUNT by the worker.
+ */
+#define WORKER_CHUNK_COUNT 50
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ * ParallelCopyDataBlock's will be created in DSM. Data read from file will be
+ * copied in these DSM data blocks. The leader process identifies the records
+ * and the record information will be shared to the workers. The workers will
+ * insert the records into the table. There can be one or more records in each
+ * of the data blocks, depending on the record size.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block, following_block
+ * will have the position where the remaining data need to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag will be set, when the leader finds out this block can be read
+ * safely by the worker. This helps the worker to start processing the line
+ * early where the line will be spread across many blocks and the worker
+ * need not wait for the complete line to be processed.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * Leader process will be populating data block, data block offset & the size of
+ * the record in DSM for the workers to copy the data into the relation.
+ * This is protected by the following sequence in the leader & worker. If they
+ * don't follow this order the worker might process wrong line_size and leader
+ * might populate the information which worker has not yet processed or in the
+ * process of processing.
+ * Leader should operate in the following order:
+ * 1) check if line_size is -1, if not wait, it means worker is still
+ * processing.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * Worker should operate in the following order:
+ * 1) check line_state is LINE_LEADER_POPULATED, if not it means leader is still
+ * populating the data.
+ * 2) read line_size.
+ * 3) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32, worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line. -1 means the line is yet to be filled completely,
+ * 0 means empty line, >0 means line filled with line size data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by worker, will not be same as
+ * total_worker_processed if where condition is specified along with copy.
+ * This will be the actual records inserted into the relation.
+ */
+ pg_atomic_uint64 processed;
+
+ /*
+ * The number of records currently processed by the worker, this will also
+ * include the number of records that was filtered because of where clause.
+ */
+ pg_atomic_uint64 total_worker_processed;
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -228,8 +399,36 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
} CopyStateData;
+/*
+ * This structure helps in storing the common data from CopyStateData that are
+ * required by the workers. This information will then be allocated and stored
+ * into the DSM for the worker to retrieve and copy it to CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}SerializedParallelCopyState;
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -259,6 +458,22 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_DELIM 4
+#define PARALLEL_COPY_KEY_QUOTE 5
+#define PARALLEL_COPY_KEY_ESCAPE 6
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 7
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 8
+#define PARALLEL_COPY_KEY_NULL_LIST 9
+#define PARALLEL_COPY_KEY_CONVERT_LIST 10
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 11
+#define PARALLEL_COPY_KEY_RANGE_TABLE 12
+#define PARALLEL_COPY_WAL_USAGE 13
+#define PARALLEL_COPY_BUFFER_USAGE 14
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -418,11 +633,477 @@ static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
- List *attnamelist);
+ List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
+
+/*
+ * SerializeParallelCopyState - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RestoreString - Retrieve the string from shared memory.
+ */
+static void
+RestoreString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * SerializeString - Insert a string into shared memory.
+ */
+static void
+SerializeString(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * PopulateParallelCopyShmInfo - set ParallelCopyShmInfo.
+ */
+static void
+PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
+ FullTransactionId full_transaction_id)
+{
+ uint32 count;
+
+ MemSet(shared_info_ptr, 0, sizeof(ParallelCopyShmInfo));
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ SerializedParallelCopyState *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_cstateshared;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ int parallel_workers = 0;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+ ParallelCopyData *pcdata;
+ MemoryContext oldcontext;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ MemoryContextSwitchTo(oldcontext);
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ if (attnamelist != NIL)
+ {
+ attnameListStr = nodeToString(attnamelist);
+ EstimateLineKeysStr(pcxt, attnameListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ if (cstate->force_notnull != NIL)
+ {
+ notnullListStr = nodeToString(cstate->force_notnull);
+ EstimateLineKeysStr(pcxt, notnullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ if (cstate->force_null != NIL)
+ {
+ nullListStr = nodeToString(cstate->force_null);
+ EstimateLineKeysStr(pcxt, nullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ if (cstate->convert_select != NIL)
+ {
+ convertListStr = nodeToString(cstate->convert_select);
+ EstimateLineKeysStr(pcxt, convertListStr);
+ }
+
+ /*
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_COPY_WAL_USAGE
+ * and PARALLEL_COPY_BUFFER_USAGE.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr, full_transaction_id);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_allocate(pcxt->toc, est_cstateshared);
+
+ /* copy cstate variables. */
+ SerializeParallelCopyState(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST, attnameListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST, notnullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_LIST, nullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST, convertListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, whereClauseStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_WAL_USAGE, walusage);
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_BUFFER_USAGE, bufferusage);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * InitializeParallelCopyInfo - Initialize parallel worker.
+ */
+static void
+InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * Where clause handling, convert tuple to columns, add default null values for
+ * the missing columns that are not present in that record. Find the partition
+ * if it is partitioned table, invoke before row insert Triggers, handle
+ * constraints and insert the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ SerializedParallelCopyState *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO, false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RestoreString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RestoreString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+ RestoreString(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attnameListStr);
+ if (attnameListStr)
+ attlist = (List *)stringToNode(attnameListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, ¬nullListStr);
+ if (notnullListStr)
+ cstate->force_notnull = (List *)stringToNode(notnullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NULL_LIST, &nullListStr);
+ if (nullListStr)
+ cstate->force_null = (List *)stringToNode(nullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &convertListStr);
+ if (convertListStr)
+ cstate->convert_select = (List *)stringToNode(convertListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RestoreString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ InitializeParallelCopyInfo(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
+
+ MemoryContextSwitchTo(oldcontext);
+ pfree(cstate);
+ return;
+}
+
+/*
+ * ParallelCopyFrom - parallel copy leader's functionality.
+ *
+ * Leader executes the before statement for before statement trigger, if before
+ * statement trigger is present. It will read the table data from the file and
+ * copy the contents to DSM data blocks. It will then read the input contents
+ * from the DSM data block and identify the records based on line breaks. This
+ * information is called line or a record that need to be inserted into a
+ * relation. The line information will be stored in ParallelCopyLineBoundary DSM
+ * data structure. Workers will then process this information and insert the
+ * data into the table. It will repeat this process until all the data is read
+ * from the file and all the DSM data blocks are processed. While processing, if
+ * the leader identifies that the DSM data blocks or DSM ParallelCopyLineBoundary
+ * data structures are full, it will wait till the workers free up some entries
+ * and then repeat the process. It will wait till all the lines populated are
+ * processed by the workers and exits.
+ */
+static void
+ParallelCopyFrom(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1093,6 +1774,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1102,7 +1784,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyFrom(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1151,6 +1850,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1319,6 +2019,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
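
With this parsing in place, the option is exercised as, for example,
COPY mytable FROM '/tmp/mytable.csv' WITH (FORMAT csv, PARALLEL 2); the table
name and path here are illustrative. Because strtol is used, leading
whitespace and an optional sign are accepted, while the *endptr check rejects
any trailing garbage after the digits.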
@@ -1672,11 +2419,12 @@ BeginCopy(ParseState *pstate,
/*
* PopulateCommonCstateInfo - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * from operation. This is a helper function for BeginCopy &
+ * InitializeParallelCopyInfo functions.
*/
static void
PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..82843c6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,5 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1..3a83d0f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1699,6 +1699,12 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2215,6 +2221,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
Attachment: v2-0003-Allow-copy-from-command-to-process-data-from-file.patch
From ca4254873e6dc7bd7f23943d14f7c3fd5c4e3cd1 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 22 Jul 2020 19:18:29 +0530
Subject: [PATCH v2 3/6] Allow copy from command to process data from
file/STDIN contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file/STDIN to a table. It adds a PARALLEL option to the COPY
FROM command with which the user can specify the number of workers to use
to perform the COPY FROM command. Specifying a value of zero or less for
the number of workers is an error.
The backend to which the COPY FROM query is submitted acts as the leader,
with the responsibility of reading data from the file/STDIN and launching
at most n workers, as specified with the PARALLEL 'n' option in the query.
The leader populates the common data required for the workers' execution
in the DSM and shares it with the workers. The leader then executes any
BEFORE STATEMENT triggers. The leader populates the DSM line entries,
which include the start offset and line size; while populating the lines,
it reads as many blocks as required from the file into the DSM data
blocks. Each block is 64KB in size. The leader parses the data to identify
the lines, reusing the existing line-identification logic from
CopyReadLineText with some changes. The leader checks whether a free line
entry is available to copy the information into; if there is no free
entry, it waits until the required entry is freed up by a worker, and then
copies the identified line's information (offset and line size) into the
DSM line entries. This process is repeated until the complete file is
processed. Simultaneously, the workers cache the lines (50 at a time) into
local memory and release the line entries back to the leader for further
populating. Each worker processes the lines that it cached and inserts
them into the table. The leader does not participate in the insertion of
data; its only responsibility is to identify the lines as fast as possible
for the workers to do the actual copy operation. The leader waits until
all the populated lines are processed by the workers, and then exits.
We have chosen this design based on the reason "that everything stalls if the
leader doesn't accept further input data, as well as when there are no
available splitted chunks so it doesn't seem like a good idea to have the
leader do other work. This is backed by the performance data where we have
seen that with 1 worker there is just a 5-10% performance difference".
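
For reference while reading the hunks below, the shared-memory structures
this patch relies on (defined in an earlier patch of the series, not shown in
this excerpt) have roughly the following shape. The field names are taken
from the call sites below; the exact layout, field widths and constants are
assumptions:

    #define DATA_BLOCK_SIZE (64 * 1024)    /* assumed; "each block is 64KB in size" */

    typedef struct ParallelCopyDataBlock
    {
        pg_atomic_uint32 unprocessed_line_parts;   /* line parts yet to be consumed */
        uint32           following_block;          /* next block when a line spills over */
        bool             curr_blk_completed;       /* leader finished filling this block */
        uint8            skip_bytes;               /* trailing bytes not belonging to this block */
        char             data[DATA_BLOCK_SIZE];    /* raw input bytes */
    } ParallelCopyDataBlock;

    typedef struct ParallelCopyLineBoundary
    {
        uint32           first_block;    /* data block in which the line starts */
        uint32           start_offset;   /* offset of the line within first_block */
        uint64           cur_lineno;     /* line number, for error context */
        pg_atomic_uint32 line_size;      /* -1 while the leader is still populating */
        pg_atomic_uint32 line_state;     /* ParallelCopyLineState */
    } ParallelCopyLineBoundary;

    typedef struct ParallelCopyLineBoundaries
    {
        uint32                   pos;              /* next ring slot for the leader to fill */
        ParallelCopyLineBoundary ring[RINGSIZE];   /* RINGSIZE is an assumed constant */
    } ParallelCopyLineBoundaries;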
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 894 ++++++++++++++++++++++++++--
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
6 files changed, 899 insertions(+), 53 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In the parallel copy case the
+ * command id would have been set already by AssignCommandIdForWorker, so
+ * call GetCurrentCommandId with used = false to fetch the current command
+ * id; marking it as used was taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d881f4c..70ecd51 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2012,19 +2012,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index bd4c3cf..1aed74f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -504,6 +504,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, transaction id of leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, command id of leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a3cf750..9b657ab 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
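
Read together with the functions further below, the intended lifecycle of a
ring slot is: the leader moves it to LINE_LEADER_POPULATING when it starts a
new line (UpdateBlockInLineInfo) and to LINE_LEADER_POPULATED once the line
size is known (EndLineParallelCopy); a worker then claims the slot with a
compare-and-exchange to LINE_WORKER_PROCESSING (GetLinePosition) and marks it
LINE_WORKER_PROCESSED once the cached line is consumed (CacheLineInfo), after
which the slot can be reused.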
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
/*
@@ -123,6 +138,7 @@ typedef enum CopyInsertMethod
#define IsParallelCopy() (cstate->is_parallel)
#define IsLeader() (cstate->pcdata->is_leader)
+#define IsWorker() (IsParallelCopy() && !IsLeader())
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
@@ -552,9 +568,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -567,26 +587,65 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
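
WaitLatch's timeout argument is in milliseconds, so the 1L above makes a
blocked leader or worker re-check its condition roughly once per millisecond
rather than sleeping until the latch is set, and CHECK_FOR_INTERRUPTS() keeps
the wait cancellable. A condition variable could avoid the polling at the
cost of signalling every state change; the timed latch wait keeps the
synchronization code small.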
+
/*
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
if (!result && !IsHeaderLine()) \
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
- cstate->line_buf.len, \
- &cstate->line_buf.len) \
+{ \
+ if (IsParallelCopy()) \
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, \
+ raw_buf_ptr, &line_size); \
+ else \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len); \
+} \
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
+/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -639,7 +698,10 @@ static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
-
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
+static void PopulateCstateCatalogInfo(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -721,6 +783,137 @@ PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
}
/*
+ * IsTriggerFunctionParallelSafe - Check if the trigger functions of all the
+ * triggers are parallel safe. Return false if any one of the triggers has a
+ * parallel unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine parallel safety of volatile expressions
+ * in default clause of column definition or in where clause and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression; if so,
+ * and it is not parallel safe, then parallelism is not allowed. For
+ * instance, if a serial/bigserial column has the parallel-unsafe
+ * nextval() as its default expression, parallelism should not be
+ * allowed. (In non-parallel copy, volatile functions such as nextval()
+ * are not checked.)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there are any AFTER STATEMENT, INSTEAD OF row, or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -748,6 +941,7 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
ParallelCopyData *pcdata;
MemoryContext oldcontext;
+ CheckTargetRelValidity(cstate);
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -759,6 +953,19 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
MemoryContextSwitchTo(oldcontext);
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ {
+ elog(WARNING,
+ "Parallel copy not supported for specified table, copy will be run in non-parallel mode");
+ return NULL;
+ }
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -967,9 +1174,212 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCstateCatalogInfo(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
}
/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * The wait loop at the bottom of this loop body may have exited because
+ * data_blk_ptr->curr_blk_completed got set, in which case the dataSize
+ * read there can be stale; once the block is completed and the line is
+ * complete, line_size is set. Re-read line_size here to be sure whether
+ * the line is complete or only a partial block is available.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the rest of the data is present in the current block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole usable
+ * part of the block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: the first copy starts at the line's offset;
+ * subsequent copies take the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the line is contained in the current block, lineInfo.line_size
+ * will be set. If the line is spread across blocks, either
+ * lineInfo.line_size or data_blk_ptr->curr_blk_completed may be set:
+ * line_size is set once the complete line has been read, while
+ * curr_blk_completed is set when the current block has been filled
+ * but the line is not yet complete.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
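
A worked example of the spill path above, assuming DATA_BLOCK_SIZE is
64 * 1024 and skip_bytes is 0 (both illustrative values):

    /*
     * A 100000-byte line starting at offset 30000 cannot end in its first
     * block, so the else-branch copies the tail of that block and then
     * follows following_block:
     */
    uint32 in_first  = (65536 - 0) - 30000;       /* 35536 bytes from the first block */
    uint32 remaining = 100000 - 35536;            /* 64464 bytes still to copy */
    uint32 in_next   = Min(remaining, 65536 - 0); /* 64464: fits in the next block */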
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+ }
+
+/*
* ParallelCopyMain - parallel copy worker's code.
*
* Where clause handling, convert tuple to columns, add default null values for
@@ -1018,6 +1428,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
@@ -1078,6 +1490,33 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
}
/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
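
Note the backpressure here: the leader spins in COPY_WAIT_TO_PROCESS while
the slot's line_size is not -1, that is, until the slot's previous occupant
has been fully consumed (CacheLineInfo resets line_size to -1 when it
finishes a line). This is the wait described in the ParallelCopyFrom header
comment: once all RINGSIZE entries are in flight, the leader cannot publish
new lines until a worker frees one.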
+
+/*
* ParallelCopyFrom - parallel copy leader's functionality.
*
* Leader executes the before statement for before statement trigger, if before
@@ -1100,8 +1539,298 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
+ }
+
+/*
+ * GetLinePosition - return the line position that worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
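
The WORKER_CHUNK_COUNT test above hands lines out in contiguous chunks; a
small worked example, assuming WORKER_CHUNK_COUNT is 50 (the per-worker cache
size mentioned in the commit message):

    /*
     * Worker A claimed slot 0, so slots 0..49 form its chunk. Worker B also
     * starts probing at write_pos = 0, finds slot 0 in LINE_WORKER_PROCESSING,
     * and since 0 % 50 == 0 it skips past the whole chunk in one step:
     */
    write_pos = (0 + 50) % RINGSIZE;    /* worker B next probes slot 50 */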
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data. Don't check the current block; it
+ * will still have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
+/*
+ * SetRawBufForLoad - Point raw_buf at the shared memory block into which the
+ * file data will be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed; a worker can now start
+ * copying this data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
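
skip_bytes on the completed block is set to copy_buf_len - raw_buf_ptr, the
tail bytes past the point where line scanning stopped; CacheLineInfo later
subtracts it from DATA_BLOCK_SIZE when computing how much line data the block
actually holds, so those tail bytes are never attributed to this block.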
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If new_line_size >= raw_buf_ptr, the new block contains only
+ * newline characters. The unprocessed count should not be
+ * increased in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline character: an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger, this will be
+ * executed for parallel copy by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
}
/*
@@ -3536,7 +4265,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3546,7 +4276,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In the parallel
+ * case this check is done by the leader, so that in any invalid case
+ * the COPY FROM command errors out in the leader itself, avoiding
+ * launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3586,7 +4323,8 @@ CopyFrom(CopyState cstate)
target_resultRelInfo = resultRelInfo;
/* Verify the named relation is a valid target for INSERT */
- CheckValidResultRel(resultRelInfo, CMD_INSERT);
+ if (!IsParallelCopy())
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
ExecOpenIndices(resultRelInfo, false);
@@ -3735,13 +4473,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3841,6 +4582,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * have BEFORE/INSTEAD OF triggers, we can't perform copy
+ * operations with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4261,7 +5012,7 @@ BeginCopyFrom(ParseState *pstate,
{
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
}
@@ -4406,26 +5157,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4673,9 +5433,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through; on subsequent
+ * iterations, reset the index and re-use the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4730,7 +5512,7 @@ static void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding && (!IsParallelCopy() || IsWorker()))
{
char *cvt;
cvt = pg_any_to_server(cstate->line_buf.data,
@@ -4769,6 +5551,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4823,6 +5610,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5047,9 +5836,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5101,6 +5896,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line information here; this cannot
+ * be done at the beginning, as there is a possibility that the file
+ * contains empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5109,6 +5924,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE();
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index aef8555..27fb3e3 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3a83d0f..ba93f65 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1704,6 +1704,7 @@ ParallelCopyLineBoundary
ParallelCopyData
ParallelCopyDataBlock
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
Attachment: v2-0004-Documentation-for-parallel-copy.patch
From f8ea98e3b3c97272a04b8ff0967f25aa648d6c41 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 21 Jul 2020 11:13:10 +0530
Subject: [PATCH v2 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note
+ that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all. This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: v2-0005-Tests-for-parallel-copy.patch
From 12f284ae5b3ec9e8cb65fa958e5ff8de5e7458d5 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 22 Jul 2020 18:59:44 +0530
Subject: [PATCH v2 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 210 +++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 ++++++++++++++++++++++++++++++++++-
4 files changed, 434 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..361d572 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,130 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: not allowed, should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR:  argument to option "parallel" must be a positive integer
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+WARNING: Parallel copy not supported for specified table, copy will be run in non-parallel mode
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..7015698 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: v2-0006-Parallel-Copy-For-Binary-Format-Files.patch (text/x-patch)
From 7fbd84dc6a5b749f7eeb1d021887c8bd890e3a0f Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Wed, 22 Jul 2020 19:23:30 +0530
Subject: [PATCH v2 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks of 64KB each.
It also identifies, for each tuple, the data block id, start offset, end
offset, and tuple size, and records this information in the ring data
structure. Workers read the tuple information from the ring data structure
and the actual tuple data from the data blocks, and insert the tuples into
the table in parallel.
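A minimal sketch of the per-tuple ring entry the leader populates (the field
names and which fields are atomic are assumed from their usage later in this
patch):

typedef struct ParallelCopyLineBoundary
{
	uint32           first_block;   /* data block holding the tuple's start */
	uint32           start_offset;  /* tuple's offset within first_block */
	pg_atomic_uint32 line_size;     /* total tuple size; may span blocks */
	pg_atomic_uint32 line_state;    /* LINE_LEADER_POPULATED or
	                                 * LINE_WORKER_PROCESSED */
} ParallelCopyLineBoundary;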
---
src/backend/commands/copy.c | 654 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 574 insertions(+), 80 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9b657ab..f19d359 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -263,6 +263,17 @@ typedef struct ParallelCopyLineBuf
}ParallelCopyLineBuf;
/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
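+
+/*
+ * Usage note (see CopyReadBinaryGetDataBlock): FIELD_NONE marks the
+ * initial read; FIELD_COUNT and FIELD_SIZE request a new block when the
+ * 2-byte field count or 4-byte field size straddles a block boundary, in
+ * which case the straddling bytes are moved to the start of the new
+ * block; FIELD_DATA chains additional blocks when the field data itself
+ * spans blocks.
+ */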
+
+/*
* Parallel copy data information.
*/
typedef struct ParallelCopyData
@@ -283,6 +294,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -443,6 +457,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}SerializedParallelCopyState;
/* DestReceiver for COPY (query) TO */
@@ -517,7 +532,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -647,8 +661,110 @@ else \
/* End parallel copy Macros */
-static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyGetData(cstate, &dummy, 1, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data is spread
+ * across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * field size can spread across multiple data blocks, \
+ * calculate the number of required data blocks and try to get \
+ * those many data blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * check if we need one more data block for the field data \
+ * bytes that are not a multiple of the data block size. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
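+
+/*
+ * Worked example (illustrative, assuming DATA_BLOCK_SIZE = 65536): with
+ * fld_size = 150000 and curr_blk_bytes = 1000, the remaining 149000
+ * bytes give required_blks = 149000 / 65536 = 2, and since
+ * 149000 % 65536 != 0 it is bumped to 3. GET_RAW_BUF_INDEX then yields
+ * 150000 - ((3 - 1) * 65536 + 1000) = 17928, the offset just past the
+ * field data in the last of those blocks.
+ */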
+
+static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
@@ -702,6 +818,14 @@ static void ExecBeforeStmtTrigger(CopyState cstate);
static void CheckTargetRelValidity(CopyState cstate);
static void PopulateCstateCatalogInfo(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static bool CopyReadBinaryTupleLeader(CopyState cstate);
+static void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+static bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
+
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -719,6 +843,7 @@ SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -878,8 +1003,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1153,6 +1278,7 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
@@ -1542,32 +1668,66 @@ ParallelCopyFrom(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files.
+ * For the parallel copy leader, fill in the error
+ * context information here so that, in case of any
+ * failure while determining tuple offsets, the
+ * leader throws errors with proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1575,7 +1735,357 @@ ParallelCopyFrom(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * CopyReadBinaryGetDataBlock - gets a new block, updates
+ * the current offset, calculates the skip bytes.
+ */
+static void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to %u block",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if(field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
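+
+/*
+ * Note on block recycling, as implied by the atomics above: each data
+ * block carries an unprocessed_line_parts count; the leader increments
+ * it for every tuple part it places in (or chains through) the block,
+ * and workers decrement it as they consume those parts, so a block can
+ * only be handed out again once the count has dropped back to zero.
+ */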
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from binary formatted file
+ * to data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data blocks data.
+ */
+static bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One possibility
+ * for reaching this point is that the binary file just
+ * has a valid signature but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ start_block_pos,
+ start_offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ start_block_pos, start_offset, line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize - leader identifies boundaries/
+ * offsets for each attribute/column and finally results in the
+ * tuple/row size. It moves on to next data block if the attribute/
+ * column is spread across data blocks.
+ */
+static void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in he same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while(i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+#if 0
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where field count spread across datablocks should never occur,
+ * as the leader would have moved it to next block. enable this if block
+ * for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+#endif
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads the field size and data for
+ * each attribute/column, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i>0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * If the bytes yet to be copied into the attribute buffer exceed an
+ * entire data block, copy only one data block's worth in this
+ * iteration.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1663,7 +2173,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -5313,60 +5825,47 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
- cstate->cur_lineno++;
-
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyGetData(cstate, &dummy, 1, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
+ cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -6366,18 +6865,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -6385,9 +6881,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyGetData(cstate, cstate->attribute_buf.data,
fld_size, fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
--
1.8.3.1
Thanks for reviewing and providing the comments, Ashutosh.
Please find my thoughts below:
On Fri, Jul 17, 2020 at 7:18 PM Ashutosh Sharma <ashu.coek88@gmail.com>
wrote:
Some review comments (mostly) from the leader side code changes:
1) Do we need a DSM key for the FORCE_QUOTE option? I think the FORCE_QUOTE
option is only used with COPY TO and not COPY FROM, so I am not sure why you
have added it:
PARALLEL_COPY_KEY_FORCE_QUOTE_LIST
Fixed
2) Should we be allocating the parallel copy data structure only when it
is confirmed that the parallel copy is allowed?
pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
cstate->pcdata = pcdata;

Or, if you want it to be allocated before confirming whether parallel copy is
allowed or not, then I think it would be good to allocate it in the
*cstate->copycontext* memory context, so that when EndCopy is called towards
the end of the COPY FROM operation the entire context gets deleted, thereby
freeing the memory allocated for pcdata. In fact, it would be good to ensure
that all the local memory allocated inside the cstate structure is allocated
in the *cstate->copycontext* memory context.
Fixed
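For reference, a minimal sketch of the fixed allocation (assuming the
copycontext field that copy.c already uses for its other per-COPY state):

	/* Allocate pcdata in cstate->copycontext so that deleting the context
	 * at EndCopy time frees it automatically. */
	MemoryContext oldcontext = MemoryContextSwitchTo(cstate->copycontext);

	cstate->pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
	MemoryContextSwitchTo(oldcontext);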
3) Should we allow Parallel Copy when the insert method is
CIM_MULTI_CONDITIONAL?
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+     return false;

I know we have added checks in CopyFrom() to ensure that if any trigger
(before row or instead of) is found on any partition being loaded with data,
then the COPY FROM operation would fail, but does that mean we are okay to
perform parallel copy on a partitioned table? Have we done any performance
testing with a partitioned table where the data in the input file needs to be
routed to different partitions?
Partition data is handled like what Amit had suggested in one of the earlier
mails [1]. My colleague Bharath has run a performance test with a partitioned
table; he will be sharing the results.
4) There are a lot of if-checks in the IsParallelCopyAllowed function that are
checked in the CopyFrom function as well, which means in the case of parallel
copy those checks will get executed multiple times (first by the leader and,
from the second time onwards, by each worker process). Is that required?
It is called from BeginParallelCopy, so it will be called only once. This
change is OK.
5) Should the worker process be calling this function when the leader has
already called it once in ExecBeforeStmtTrigger()?
/* Verify the named relation is a valid target for INSERT */
CheckValidResultRel(resultRelInfo, CMD_INSERT);
Fixed.
6) I think it would be good to re-write the comments atop
ParallelCopyLeader(). From the present comments it appears as if you were
trying to present the information pointwise but somehow ended up putting it
in a paragraph. The comments also have some typos, like *line beaks*, which
presumably means line breaks. This applies to other comments as well.
Fixed.
7) Is the following check equivalent to IsWorker()? If so, it would be good
to replace it with an IsWorker-like macro to increase readability:
(IsParallelCopy() && !IsLeader())
Fixed.
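For reference, the macro reads along these lines (a sketch; the exact
definition is in the attached patches):

	/* True only in a parallel copy worker process. */
	#define IsWorker() (IsParallelCopy() && !IsLeader())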
These have been fixed and the new patch is attached as part of my previous
mail.
[1]: /messages/by-id/CAA4eK1LQPxULxw8JpucX0PwzQQRk=q4jG32cU1us2+-mtzZUQg@mail.gmail.com
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 22, 2020 at 7:56 PM vignesh C <vignesh21@gmail.com> wrote:
> Partition data is handled like what Amit had suggested in one of the
> earlier mails [1]. My colleague Bharath has run a performance test with a
> partitioned table; he will be sharing the results.
I ran tests for partitioned use cases - the results are similar to those of
the non-partitioned cases [1].
Test case 1 (exec time in sec): copy from csv file, 5.1GB, 10 million
tuples, 4 range partitions, 3 indexes on integer columns, unique data.
Test case 2 (exec time in sec): copy from csv file, 5.1GB, 10 million
tuples, 4 range partitions, unique data.

parallel workers | test case 1     | test case 2
0                | 205.403 (1X)    | 135 (1X)
2                | 114.724 (1.79X) | 59.388 (2.27X)
4                | 99.017 (2.07X)  | 56.742 (2.34X)
8                | 99.722 (2.06X)  | 66.323 (2.03X)
16               | 98.147 (2.09X)  | 66.054 (2.04X)
20               | 97.723 (2.1X)   | 66.389 (2.03X)
30               | 97.048 (2.11X)  | 70.568 (1.91X)
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Jul 23, 2020 at 8:51 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> I ran tests for partitioned use cases - the results are similar to those
> of the non-partitioned cases [1].
I could see the gain up to 10-11 times for non-partitioned cases [1]; can we
use a similar test case here as well (with one of the indexes on a text
column, or having a gist index) to see its impact?
[1]: /messages/by-id/CALj2ACVR4WE98Per1H7ajosW8vafN16548O2UV8bG3p4D3XnPg@mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
I think, when doing the performance testing for a partitioned table, it would
be good to also mention the distribution of data in the input file. One
possible data distribution could be that we have, let's say, 100 tuples in
the input file, and every consecutive tuple belongs to a different partition.
On Thu, Jul 23, 2020 at 9:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I could see the gain up to 10-11 times for non-partitioned cases [1]; can
> we use a similar test case here as well (with one of the indexes on a text
> column, or having a gist index) to see its impact?
Thanks Amit! Please find the results of detailed testing done for
partitioned use cases:
Range partitions: consecutive rows go into the same partition.

Test case 1 (exec time in sec): copy from csv file, 2 indexes on integer
columns and 1 index on text column, 4 range partitions.
Test case 2 (exec time in sec): copy from csv file, 1 gist index on text
column, 4 range partitions.
Test case 3 (exec time in sec): copy from csv file, 3 indexes on integer
columns, 4 range partitions.

parallel workers | test case 1      | test case 2      | test case 3
0                | 1051.924 (1X)    | 785.052 (1X)     | 205.403 (1X)
2                | 589.576 (1.78X)  | 421.974 (1.86X)  | 114.724 (1.79X)
4                | 321.960 (3.27X)  | 230.997 (3.4X)   | 99.017 (2.07X)
8                | 199.245 (5.23X)  | 156.132 (5.02X)* | 99.722 (2.06X)
16               | 127.343 (8.26X)  | 173.696 (4.52X)  | 98.147 (2.09X)
20               | 122.029 (8.62X)* | 186.418 (4.21X)  | 97.723 (2.1X)
30               | 142.876 (7.36X)  | 214.598 (3.66X)  | 97.048 (2.11X)*
(* best time for each test case)
On Thu, Jul 23, 2020 at 10:21 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> I think, when doing the performance testing for a partitioned table, it
> would be good to also mention the distribution of data in the input file.
To address Ashutosh's point, I used hash partitioning. Hope this helps to
clear the doubt.
Hash partitions: there is a high chance that consecutive rows go into
different partitions.

Test case 1 (exec time in sec): copy from csv file, 2 indexes on integer
columns and 1 index on text column, 4 hash partitions.
Test case 2 (exec time in sec): copy from csv file, 1 gist index on text
column, 4 hash partitions.
Test case 3 (exec time in sec): copy from csv file, 3 indexes on integer
columns, 4 hash partitions.

parallel workers | test case 1       | test case 2     | test case 3
0                | 1060.884 (1X)     | 812.283 (1X)    | 207.745 (1X)
2                | 572.542 (1.85X)   | 418.454 (1.94X) | 107.850 (1.93X)
4                | 298.132 (3.56X)   | 227.367 (3.57X) | 83.895 (2.48X)*
8                | 169.449 (6.26X)   | 137.993 (5.89X) | 85.411 (2.43X)
16               | 112.297 (9.45X)   | 95.167 (8.53X)  | 96.136 (2.16X)
20               | 101.546 (10.45X)* | 90.552 (8.97X)* | 97.066 (2.14X)
30               | 113.877 (9.32X)   | 127.17 (6.38X)  | 96.819 (2.14X)
(* best time for each test case)
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
The patches were not applying because of the recent commits, so I have
rebased them on HEAD and attached the result.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v2-0001-Copy-code-readjustment-to-support-parallel-copy.patch (text/x-patch)
From f2ba043af005c55961ed68c9a595cee58c46c79d Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:52:48 +0530
Subject: [PATCH v2 1/5] Copy code readjustment to support parallel copy.
This patch slightly readjusts the copy code so that the common code is
separated out into functions/macros; these functions/macros will be used by
the workers in the parallel copy code of the upcoming patches. EOL removal is
moved from CopyReadLine to CopyReadLineText. This change is required because,
in the case of parallel copy, record identification and record updates are
done in CopyReadLineText, and the newline characters must be removed before
the record information is published in shared memory.
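To make the ordering concrete, a sketch using the helper this patch
introduces (the shared-memory step itself belongs to the later patches in
this series):

	/* Strip the line terminator from the just-read line first ... */
	ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
						   cstate->line_buf.len, &cstate->line_buf.len);

	/* ... and only then publish the record's offset and size to shared
	 * memory, so a worker never sees trailing newline bytes. */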
---
src/backend/commands/copy.c | 361 ++++++++++++++++++++++++++------------------
1 file changed, 218 insertions(+), 143 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a..3efafca 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -224,7 +227,6 @@ typedef struct CopyStateData
* appropriate amounts of data from this buffer. In both modes, we
* guarantee that there is a \0 at raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -354,6 +356,27 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -401,7 +424,11 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -801,14 +828,18 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes = RAW_BUF_BYTES(cstate);
int inbytes;
+ int minread = 1;
/* Copy down the unprocessed data if any. */
if (nbytes > 0)
memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
nbytes);
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1514,7 +1545,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1680,6 +1710,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateCommonCstateInfo(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateCommonCstateInfo - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1799,12 +1847,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2696,32 +2738,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2758,27 +2779,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2816,9 +2816,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3311,7 +3363,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3366,30 +3418,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCstateCatalogInfo - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCstateCatalogInfo(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3399,38 +3436,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf and
- * raw_buf are used in both text and binary modes, but we use line_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3508,6 +3515,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCstateCatalogInfo(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3917,45 +3979,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData - Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker
+ * to line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
+ {
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
+ }
+}
+
+/*
+ * ConvertToServerEncoding - convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
char *cvt;
-
cvt = pg_any_to_server(cstate->line_buf.data,
cstate->line_buf.len,
cstate->file_encoding);
@@ -3967,11 +4044,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4334,6 +4408,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE();
return result;
}
--
1.8.3.1
Attachment: v2-0002-Framework-for-leader-worker-in-parallel-copy.patch (text/x-patch; charset=US-ASCII)
From 1bf7d74c7308e6eae807c0258322abbb3890aede Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:56:39 +0530
Subject: [PATCH v2 2/5] Framework for leader/worker in parallel copy
This patch adds the framework for parallel copy: the shared data structures,
leader initialization, worker initialization, shared memory updates, starting
the workers, waiting for the workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 756 +++++++++++++++++++++++++++++++++-
src/include/commands/copy.h | 2 +
src/tools/pgindent/typedefs.list | 7 +
4 files changed, 765 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3efafca..3dcdfc1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,180 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT size the shared structures
+ * that hold the records read from the file before they are inserted into the
+ * relation. They are chosen so that each worker is handed several records
+ * carrying a significant amount of data at a time, which avoids excessive
+ * context switching and distributes the work fairly among the workers. These
+ * values showed the best results in the performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* The ring can hold information for up to 10000 records for the workers to process. */
+#define RINGSIZE (10 * 1000)
+
+/* The DSM can hold up to 1000 blocks of 64K data to be processed by the workers. */
+#define MAX_BLOCKS_COUNT 1000
+
+/*
+ * Each worker is allocated WORKER_CHUNK_COUNT records at a time from the DSM
+ * data blocks, to reduce lock contention. This value must divide RINGSIZE
+ * evenly, because the wrap-around case is currently not handled when a
+ * worker selects its WORKER_CHUNK_COUNT records.
+ */
+#define WORKER_CHUNK_COUNT 50
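Since wrap-around is not handled, the divisibility requirement above can be enforced at build time. A sketch using plain C11 _Static_assert (PostgreSQL's StaticAssertStmt would work the same way; the macro values are repeated only to keep the sketch standalone):

#define RINGSIZE (10 * 1000)
#define WORKER_CHUNK_COUNT 50

/* Fails to compile if the chunk size stops dividing the ring evenly. */
_Static_assert(RINGSIZE % WORKER_CHUNK_COUNT == 0,
               "WORKER_CHUNK_COUNT must evenly divide RINGSIZE");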
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ * ParallelCopyDataBlocks are created in DSM, and the data read from the file
+ * is copied into them. The leader process identifies the records and shares
+ * the record information with the workers, which then insert the records
+ * into the table. Depending on the record size, each data block can hold one
+ * or more records.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line's data continues into another block, following_block
+ * holds the position from which the remaining data is to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader determines that this block can be read
+ * safely by a worker. It lets a worker start processing a line that is
+ * spread across many blocks early, without waiting for the complete line
+ * to be populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+} ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is a data structure shared between the leader and
+ * the workers. The leader populates the data block, the offset within it and
+ * the size of the record in DSM; the workers then copy the data into the
+ * relation. Access is protected by the following sequences in the leader and
+ * worker. If either side deviates from its order, a worker might process a
+ * wrong line_size, or the leader might overwrite information that a worker
+ * has not yet processed or is still processing.
+ * Leader should operate in the following order:
+ * 1) check if line_size is -1, if not wait, it means worker is still
+ * processing.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * Worker should operate in the following order:
+ * 1) check line_state is LINE_LEADER_POPULATED, if not it means leader is still
+ * populating the data.
+ * 2) read line_size.
+ * 3) only one worker may choose a given line for processing; this is enforced
+ * using pg_atomic_compare_exchange_u32, with which the worker changes the
+ * state to LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is not yet completely
+ * filled, 0 means an empty line, and >0 means the line is filled with
+ * line_size bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBoundary;
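As an illustration of step 3 of the worker sequence, the claim operation boils down to a single compare-and-swap. A standalone sketch using C11 atomics in place of pg_atomic_compare_exchange_u32 (the type and state values below are stand-ins, not the patch's definitions):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum line_state_sketch
{
    SKETCH_LEADER_POPULATED = 2,
    SKETCH_WORKER_PROCESSING = 3
};

typedef struct LineSlotSketch
{
    _Atomic uint32_t line_state;
} LineSlotSketch;

/*
 * Exactly one worker wins the compare-and-swap from LEADER_POPULATED to
 * WORKER_PROCESSING; every other worker sees the CAS fail and moves on to
 * the next slot in the ring.
 */
static bool
try_claim_line(LineSlotSketch *slot)
{
    uint32_t expected = SKETCH_LEADER_POPULATED;

    return atomic_compare_exchange_strong(&slot->line_state, &expected,
                                          SKETCH_WORKER_PROCESSING);
}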
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+} ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Number of lines the workers actually inserted into the relation. This
+ * will differ from total_worker_processed if a WHERE condition is
+ * specified along with the COPY.
+ */
+ pg_atomic_uint64 processed;
+
+ /*
+ * The number of records processed by the workers so far, including the
+ * records that were filtered out because of the WHERE clause.
+ */
+ pg_atomic_uint64 total_worker_processed;
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array; workers copy the lines here and then release the
+ * shared entries for the leader to continue populating.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+} ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -230,10 +401,38 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
/* Shorthand for number of unconsumed bytes available in raw_buf */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
} CopyStateData;
+/*
+ * This structure stores the common data from CopyStateData that the workers
+ * require. It is allocated and stored in the DSM, from which each worker
+ * retrieves it and copies it into its own CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+} SerializedParallelCopyState;
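Only fixed-size scalar fields are carried in this struct; variable-length values such as the delimiter, quote and escape strings, the null string, and the serialized node trees are transferred through the dedicated shm_toc keys defined below.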
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -263,6 +462,22 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_DELIM 4
+#define PARALLEL_COPY_KEY_QUOTE 5
+#define PARALLEL_COPY_KEY_ESCAPE 6
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 7
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 8
+#define PARALLEL_COPY_KEY_NULL_LIST 9
+#define PARALLEL_COPY_KEY_CONVERT_LIST 10
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 11
+#define PARALLEL_COPY_KEY_RANGE_TABLE 12
+#define PARALLEL_COPY_WAL_USAGE 13
+#define PARALLEL_COPY_BUFFER_USAGE 14
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -424,11 +639,477 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
- List *attnamelist);
+ List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
+
+/*
+ * SerializeParallelCopyState - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RestoreString - Retrieve the string from shared memory.
+ */
+static void
+RestoreString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * SerializeString - Insert a string into shared memory.
+ */
+static void
+SerializeString(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
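Taken together, these helpers follow the usual three-phase DSM pattern: estimate sizes before InitializeParallelDSM, allocate and insert in the leader afterwards, and look up in the worker. Sketched usage for one of the string fields (context abbreviated):

/* Leader, before InitializeParallelDSM(pcxt): reserve the space. */
EstimateLineKeysStr(pcxt, cstate->delim);

/* Leader, after InitializeParallelDSM(pcxt): copy the value in. */
SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);

/* Worker, in ParallelCopyMain(): fetch a private copy. */
RestoreString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);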
+
+/*
+ * PopulateParallelCopyShmInfo - set ParallelCopyShmInfo.
+ */
+static void
+PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
+ FullTransactionId full_transaction_id)
+{
+ uint32 count;
+
+ MemSet(shared_info_ptr, 0, sizeof(ParallelCopyShmInfo));
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+}
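Note that initializing line_size to -1 stores UINT32_MAX in the unsigned atomic; that value is the "yet to be filled" marker which step 1 of the leader sequence above tests for.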
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Determine the number of workers required to perform the parallel copy,
+ * initialize the data structures required by the parallel workers, calculate
+ * the size required in the DSM and load the necessary keys into the DSM.
+ * The specified number of workers will then be launched.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ SerializedParallelCopyState *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_cstateshared;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ int parallel_workers = 0;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+ ParallelCopyData *pcdata;
+ MemoryContext oldcontext;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ MemoryContextSwitchTo(oldcontext);
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ if (attnamelist != NIL)
+ {
+ attnameListStr = nodeToString(attnamelist);
+ EstimateLineKeysStr(pcxt, attnameListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ if (cstate->force_notnull != NIL)
+ {
+ notnullListStr = nodeToString(cstate->force_notnull);
+ EstimateLineKeysStr(pcxt, notnullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ if (cstate->force_null != NIL)
+ {
+ nullListStr = nodeToString(cstate->force_null);
+ EstimateLineKeysStr(pcxt, nullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ if (cstate->convert_select != NIL)
+ {
+ convertListStr = nodeToString(cstate->convert_select);
+ EstimateLineKeysStr(pcxt, convertListStr);
+ }
+
+ /*
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_COPY_WAL_USAGE
+ * and PARALLEL_COPY_BUFFER_USAGE.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr, full_transaction_id);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_allocate(pcxt->toc, est_cstateshared);
+
+ /* copy cstate variables. */
+ SerializeParallelCopyState(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST, attnameListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST, notnullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_LIST, nullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST, convertListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, whereClauseStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_WAL_USAGE, walusage);
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_BUFFER_USAGE, bufferusage);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * InitializeParallelCopyInfo - Initialize parallel worker.
+ */
+static void
+InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * The worker applies the WHERE clause, converts each line into column
+ * values, adds default/null values for the columns not present in the
+ * record, finds the partition if the target is a partitioned table, invokes
+ * before-row insert triggers, handles constraints and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ SerializedParallelCopyState *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO, false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RestoreString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RestoreString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+ RestoreString(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attnameListStr);
+ if (attnameListStr)
+ attlist = (List *)stringToNode(attnameListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, ¬nullListStr);
+ if (notnullListStr)
+ cstate->force_notnull = (List *)stringToNode(notnullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NULL_LIST, &nullListStr);
+ if (nullListStr)
+ cstate->force_null = (List *)stringToNode(nullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &convertListStr);
+ if (convertListStr)
+ cstate->convert_select = (List *)stringToNode(convertListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RestoreString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ InitializeParallelCopyInfo(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
+
+ MemoryContextSwitchTo(oldcontext);
+ pfree(cstate);
+ return;
+}
+
+/*
+ * ParallelCopyFrom - parallel copy leader's functionality.
+ *
+ * The leader executes the before-statement trigger, if one is present. It
+ * reads the table data from the file and copies the contents into DSM data
+ * blocks. It then scans those blocks and identifies the records based on
+ * line breaks; each such line, or record, is to be inserted into the
+ * relation. The line information is stored in the ParallelCopyLineBoundary
+ * DSM data structure, from which the workers pick it up and insert the data
+ * into the table. The leader repeats this process until all data is read
+ * from the file and all the DSM data blocks are processed. If, while
+ * processing, the leader finds that the DSM data blocks or the DSM
+ * ParallelCopyLineBoundary structures are full, it waits until a worker
+ * frees up some entries and then continues. Finally, it waits until all the
+ * populated lines have been processed by the workers, and exits.
+ */
+static void
+ParallelCopyFrom(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1141,6 +1822,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1150,7 +1832,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyFrom(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1199,6 +1898,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1367,6 +2067,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
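For illustration, with this option in place a command like COPY mytable FROM '/tmp/data.csv' WITH (PARALLEL '4'); (table and file names here are only examples) requests four workers; the count actually launched is still capped by max_worker_processes in BeginParallelCopy.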
@@ -1720,11 +2467,12 @@ BeginCopy(ParseState *pstate,
/*
 * PopulateCommonCstateInfo - Populate the common variables required for the
- * COPY FROM operation. This is a helper function for BeginCopy.
+ * COPY FROM operation. This is a helper function for BeginCopy and
+ * InitializeParallelCopyInfo.
*/
static void
PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..82843c6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,5 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1..3a83d0f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1699,6 +1699,12 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2215,6 +2221,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
Attachment: v2-0003-Allow-copy-from-command-to-process-data-from-file.patch (text/x-patch; charset=US-ASCII)
From f3ec188c3c1e11fdbb716ec12862a891a95b396f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:31:42 +0530
Subject: [PATCH v2 3/5] Allow copy from command to process data from
file/STDIN contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command with which the user can specify the number of workers to be used
to perform the COPY FROM command. Specifying zero as the number of workers
disables parallelism.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
before-statement triggers, if any exist. The leader populates the DSM lines,
which include the start offset and line size; while populating the lines it
reads as many blocks as required into the DSM data blocks from the file. Each
block is 64K in size. The leader parses the data to identify lines, reusing
the existing logic from CopyReadLineText with some changes. The leader checks
whether a free line entry is available to copy the information into; if there
is no free entry, it waits until the required entry is freed up by a worker
and then copies the identified line's information (offset & line size) into
the DSM lines. This process is repeated until the complete file is processed.
Simultaneously, the workers cache the lines (50 at a time) in local memory
and release the entries back to the leader for further populating. Each
worker processes the lines it cached and inserts them into the table. The
leader does not participate in the insertion of data; the leader's only
responsibility is to identify the lines as fast as possible for the workers
to do the actual copy operation. The leader waits until all the populated
lines are processed by the workers and then exits.
We have chosen this design because everything stalls if the leader doesn't
accept further input data, as well as when there are no available split
chunks, so it doesn't seem like a good idea to have the leader do other work.
This is backed by the performance data, where we have seen that with one
worker there is just a 5-10% performance difference.
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 890 ++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 2 +-
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 896 insertions(+), 54 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In that case the command id has
+ * already been set by calling AssignCommandIdForWorker, so here we call
+ * GetCurrentCommandId with used = false; marking the command id as used
+ * was taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5eef225..bd7a7fc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2023,19 +2023,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d4f7c29..9bff390 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -504,6 +504,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, transaction id of leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, command id of leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
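Together with AssignFullTransactionIdForWorker, this lets every worker insert under the leader's transaction id and command id, so the rows written by the worker pool are visible exactly as if a single serial COPY had inserted them in the leader's transaction.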
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3dcdfc1..c65fc98 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
/*
@@ -123,6 +138,7 @@ typedef enum CopyInsertMethod
#define IsParallelCopy() (cstate->is_parallel)
#define IsLeader() (cstate->pcdata->is_leader)
+#define IsWorker() (IsParallelCopy() && !IsLeader())
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
@@ -556,9 +572,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -571,26 +591,65 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
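Rather than busy-waiting, the waiter sleeps on its process latch for at most 1 ms per iteration, wakes earlier if the latch gets set, and services pending interrupts through CHECK_FOR_INTERRUPTS() on every loop.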
+
/*
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
if (!result && !IsHeaderLine()) \
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
- cstate->line_buf.len, \
- &cstate->line_buf.len) \
+{ \
+ if (IsParallelCopy()) \
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, \
+ raw_buf_ptr, &line_size); \
+ else \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len); \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
+/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -645,7 +704,10 @@ static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
-
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
+static void PopulateCstateCatalogInfo(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -727,6 +789,137 @@ PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
}
/*
+ * IsTriggerFunctionParallelSafe - Check whether the relation's trigger
+ * functions are parallel safe. Return false if any one of the triggers has a
+ * parallel unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine parallel safety of volatile expressions
+ * in default clause of column definition or in where clause and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression; if so,
+ * and it is not parallel safe, parallelism is not allowed. For instance,
+ * if any serial/bigserial column has a parallel-unsafe nextval() default
+ * expression associated with it, parallelism should not be allowed. In
+ * non-parallel copy, volatile functions such as nextval() are not
+ * checked.
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there are after-statement triggers, instead-of row triggers,
+ * or transition table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -754,6 +947,7 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
ParallelCopyData *pcdata;
MemoryContext oldcontext;
+ CheckTargetRelValidity(cstate);
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -765,6 +959,15 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
MemoryContextSwitchTo(oldcontext);
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ return NULL;
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -973,9 +1176,212 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCstateCatalogInfo(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
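+ /* A line size of zero indicates an empty line; just mark it processed. */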
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * The wait loop at the bottom of this loop may have exited because
+ * data_blk_ptr->curr_blk_completed was set, in which case the dataSize
+ * read earlier could be stale. If curr_blk_completed is set and the
+ * line is also complete, line_size will have been set; read line_size
+ * again to determine whether this is a complete line or a partial
+ * block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the remaining data fits entirely in the current block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: the first copy starts at the given offset,
+ * while subsequent copies take the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is present in the current block, lineInfo.line_size
+ * will be updated. If the data is spread across blocks, either
+ * lineInfo.line_size or data_blk_ptr->curr_blk_completed can be
+ * updated: lineInfo.line_size is updated once the complete read
+ * is finished, whereas data_blk_ptr->curr_blk_completed is updated
+ * when the current block is finished but the data processing is
+ * not.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
}
/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue loading data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
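+ /* Cache up to WORKER_CHUNK_COUNT lines locally before returning a line. */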
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+}
+
+/*
* ParallelCopyMain - parallel copy worker's code.
*
* Where clause handling, convert tuple to columns, add default null values for
@@ -1024,6 +1430,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
@@ -1084,6 +1492,33 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
}
/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
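+ /*
+ * Wait until this ring slot is free; a worker resets line_size to -1
+ * once it has finished with the entry.
+ */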
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
+/*
* ParallelCopyFrom - parallel copy leader's functionality.
*
* Leader executes the before statement for before statement trigger, if before
@@ -1106,8 +1541,298 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
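+ /*
+ * The leader only identifies line boundaries here; in parallel mode,
+ * CopyReadLineText loads the data into shared memory data blocks and
+ * publishes the line information for the workers to pick up.
+ */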
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
+ }
+
+/*
+ * GetLinePosition - return the line position that worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
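+ /* Start scanning from the entry after the last line this worker processed. */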
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
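+ /*
+ * Lines are handed out in chunks of WORKER_CHUNK_COUNT; if the first
+ * entry of a chunk is already claimed by another worker, skip the
+ * whole chunk.
+ */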
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data. Skip the current block, since it
+ * will still have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
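+ /* A block with no unprocessed line parts can safely be reused. */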
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
+/*
+ * SetRawBufForLoad - set raw_buf to the shared memory block into which the
+ * file data must be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If new_line_size >= raw_buf_ptr, the new block holds only the
+ * newline character content; the unprocessed count should not be
+ * increased in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* The line contains only a newline char; an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - execute the before statement trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
}
/*
@@ -3584,7 +4309,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3594,7 +4320,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In the case of
+ * parallel copy, this check is done by the leader, so that if any
+ * invalid case exists the COPY FROM command errors out in the leader
+ * itself, avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3634,7 +4367,8 @@ CopyFrom(CopyState cstate)
target_resultRelInfo = resultRelInfo;
/* Verify the named relation is a valid target for INSERT */
- CheckValidResultRel(resultRelInfo, CMD_INSERT);
+ if (!IsParallelCopy())
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
ExecOpenIndices(resultRelInfo, false);
@@ -3783,13 +4517,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3889,6 +4626,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or have
+ * BEFORE/INSTEAD OF triggers, we can't perform the copy operation
+ * with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if a partition has BEFORE/INSTEAD OF triggers, or if the partition is a foreign partition"),
+ errhint("Try COPY without the PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4305,7 +5052,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
if (!cstate->binary)
{
@@ -4454,26 +5201,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4721,9 +5477,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through. On subsequent
+ * iterations, reset the index and reuse the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4778,7 +5556,7 @@ static void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding && (!IsParallelCopy() || IsWorker()))
{
char *cvt;
cvt = pg_any_to_server(cstate->line_buf.data,
@@ -4817,6 +5595,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4871,6 +5654,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5095,9 +5880,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5149,6 +5940,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. The line information is updated here rather
+ * than at the beginning, as there is a possibility that the file
+ * contains empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5157,6 +5968,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE();
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index c4af40b..c95db78 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2353,7 +2353,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM|TO filename WITH ( */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
- "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
+ "HEADER", "PARALLEL", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 5348011..4ea02f7 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3a83d0f..ba93f65 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1704,6 +1704,7 @@ ParallelCopyLineBoundary
ParallelCopyData
ParallelCopyDataBlock
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
Attachment: v2-0004-Documentation-for-parallel-copy.patch (text/x-patch)
From e543e3b3def9367bd30191ea334c26fb7e372b96 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:00:36 +0530
Subject: [PATCH v2 4/5] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note that
+ the number of parallel workers specified in <replaceable
+ class="parameter">integer</replaceable> is not guaranteed to be used
+ during execution; the copy may run with fewer workers than specified,
+ or even with no workers at all. This option is allowed only in
+ <command>COPY FROM</command>.
+ </para>
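+ <para>
+ For example, the following command (using a hypothetical table
+ <literal>measurements</literal> and file path) requests up to four
+ parallel workers:
+<programlisting>
+COPY measurements FROM '/path/to/measurements.csv' WITH (FORMAT csv, PARALLEL 4);
+</programlisting>
+ </para>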
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: v2-0005-Tests-for-parallel-copy.patch (text/x-patch)
From ec2489bc2316da2aaa959e1e1c53df1f0a801d8a Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:19:39 +0530
Subject: [PATCH v2 5/5] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 206 ++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 +++++++++++++++++++++++++++++++++++-
4 files changed, 430 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..0e01fa0 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,126 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..7015698 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
On Sat, Aug 1, 2020 at 9:55 AM vignesh C <vignesh21@gmail.com> wrote:
The patches were not applying because of the recent commits.
I have rebased the patch over head & attached.
I rebased v2-0006-Parallel-Copy-For-Binary-Format-Files.patch.
I am putting together all the patches rebased onto the latest commit
b8fdee7d0ca8bd2165d46fb1468f75571b706a01. Patches 0001 to 0005, taken
from the previous mail, were rebased by Vignesh, and patch 0006 was
rebased by me.
Please consider this patch set for further review.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
Attachment: v2-0001-Copy-code-readjustment-to-support-parallel-copy.patch (application/x-patch)
From f2ba043af005c55961ed68c9a595cee58c46c79d Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:52:48 +0530
Subject: [PATCH v2 1/5] Copy code readjustment to support parallel copy.
This patch slightly readjusts the copy code so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is
moved from CopyReadLine to CopyReadLineText. This change was required
because, in the case of parallel copy, record identification and record
updates are done in CopyReadLineText, and the newline characters must be
removed before the record information is updated in shared memory.
---
src/backend/commands/copy.c | 361 ++++++++++++++++++++++++++------------------
1 file changed, 218 insertions(+), 143 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a..3efafca 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -224,7 +227,6 @@ typedef struct CopyStateData
* appropriate amounts of data from this buffer. In both modes, we
* guarantee that there is a \0 at raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -354,6 +356,27 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * GETPROCESSED - Get the lines processed.
+ */
+#define GETPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -401,7 +424,11 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -801,14 +828,18 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes = RAW_BUF_BYTES(cstate);
int inbytes;
+ int minread = 1;
/* Copy down the unprocessed data if any. */
if (nbytes > 0)
memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
nbytes);
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1514,7 +1545,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1680,6 +1710,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateCommonCstateInfo(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateCommonCstateInfo - populate the common variables required for the
+ * copy from operation. This is a helper function for BeginCopy.
+ */
+static void
+PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1799,12 +1847,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2696,32 +2738,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in COPY FROM is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2758,27 +2779,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2816,9 +2816,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3311,7 +3363,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3366,30 +3418,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ GETPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCstateCatalogInfo - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCstateCatalogInfo(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3399,38 +3436,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf and
- * raw_buf are used in both text and binary modes, but we use line_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3508,6 +3515,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCstateCatalogInfo(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3917,45 +3979,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData - Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker
+ * to line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
+ {
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
+ }
+}
+
+/*
+ * ConvertToServerEncoding - convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
char *cvt;
-
cvt = pg_any_to_server(cstate->line_buf.data,
cstate->line_buf.len,
cstate->file_encoding);
@@ -3967,11 +4044,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4334,6 +4408,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE();
return result;
}
--
1.8.3.1
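
A side note on the refactoring above: the extracted ClearEOLFromCopiedData now
operates on an arbitrary buffer/position/size triple rather than directly on
line_buf. Here is a minimal standalone sketch of that contract; the clear_eol
helper and the simplified EolType enum are illustrative stand-ins, not the
actual PostgreSQL definitions:

#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { EOL_NL, EOL_CR, EOL_CRNL } EolType;

/* Strip the terminating EOL marker in place and shrink the length,
 * mirroring what ClearEOLFromCopiedData does in the patch above. */
static void
clear_eol(EolType eol_type, char *line, int *len)
{
	switch (eol_type)
	{
		case EOL_NL:
		case EOL_CR:
			assert(*len >= 1);
			line[*len - 1] = '\0';
			(*len)--;
			break;
		case EOL_CRNL:
			assert(*len >= 2);
			line[*len - 2] = '\0';
			*len -= 2;
			break;
	}
}

int
main(void)
{
	char	buf[] = "1,foo\r\n";
	int		len = (int) strlen(buf);

	clear_eol(EOL_CRNL, buf, &len);
	printf("\"%s\" (len=%d)\n", buf, len);	/* prints "1,foo" (len=5) */
	return 0;
}
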
Attachment: v2-0002-Framework-for-leader-worker-in-parallel-copy.patch (application/x-patch)
From 1bf7d74c7308e6eae807c0258322abbb3890aede Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:56:39 +0530
Subject: [PATCH v2 2/5] Framework for leader/worker in parallel copy
This patch adds the framework for the parallel copy data structures: leader
initialization, worker initialization, shared memory updates, starting the
workers, waiting for the workers to finish, and worker exit.
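
One invariant worth stating explicitly: the WORKER_CHUNK_COUNT comment below
requires that value to divide RINGSIZE evenly, since wrap-around is not
handled when a worker picks up its chunk of line entries. A hypothetical
compile-time guard for that invariant (C11 static_assert; not part of the
patch, the macro values are copied from below):

#include <assert.h>

#define RINGSIZE			(10 * 1000)
#define WORKER_CHUNK_COUNT	50

/* Fails to compile if a worker's chunk could wrap around the ring. */
static_assert(RINGSIZE % WORKER_CHUNK_COUNT == 0,
			  "WORKER_CHUNK_COUNT must divide RINGSIZE evenly");

int
main(void)
{
	return 0;
}
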
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 756 +++++++++++++++++++++++++++++++++-
src/include/commands/copy.h | 2 +
src/tools/pgindent/typedefs.list | 7 +
4 files changed, 765 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3efafca..3dcdfc1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,180 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The macros DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT size the shared
+ * memory structures that hold the records read from the file before they are
+ * inserted into the relation. These values allow multiple records carrying a
+ * significant amount of data to be handed over to each worker at once, so
+ * that context switches are avoided & the work is fairly distributed among
+ * the workers. These numbers showed the best results in the performance
+ * tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* It can hold information for up to 10000 records for the workers to process. */
+#define RINGSIZE (10 * 1000)
+
+/* It can hold 1000 blocks of 64K data in DSM to be processed by the worker. */
+#define MAX_BLOCKS_COUNT 1000
+
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT records at a time from the
+ * DSM line ring to process, to avoid lock contention. This value must divide
+ * RINGSIZE evenly, as the wrap-around case is currently not handled when a
+ * worker selects its WORKER_CHUNK_COUNT records.
+ */
+#define WORKER_CHUNK_COUNT 50
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ * ParallelCopyDataBlocks are created in DSM. Data read from the file is
+ * copied into these DSM data blocks. The leader process identifies the
+ * records and shares the record information with the workers, and the
+ * workers insert the records into the table. Each data block can contain
+ * one or more records, depending on the record size.
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line's data continues into another block, following_block
+ * holds the position from which the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag will be set when the leader finds that this block can be read
+ * safely by the workers. It lets a worker start processing a line that is
+ * spread across many blocks early, without waiting for the complete line to
+ * be populated.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+} ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is a common data structure between the leader &
+ * the workers. The leader process populates the data block, the data block
+ * offset & the size of the record in DSM for the workers to copy the data
+ * into the relation. Access is protected by the sequence below in the leader
+ * & the workers; if they don't follow this order, a worker might process a
+ * wrong line_size, or the leader might overwrite information that a worker
+ * has not yet processed or is in the middle of processing.
+ * The leader should operate in the following order:
+ * 1) check if line_size is -1; if not, wait, it means a worker is still
+ * processing.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * A worker should operate in the following order:
+ * 1) check that line_state is LINE_LEADER_POPULATED; if not, the leader is
+ * still populating the data.
+ * 2) read line_size.
+ * 3) only one worker should choose one line for processing; this is handled
+ * using pg_atomic_compare_exchange_u32, the worker changes the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size bytes of data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled
+ * completely, 0 means an empty line, and >0 means the line is filled with
+ * line_size bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Line boundary information populated by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+} ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual number of lines inserted by the workers; this will not be the same
+ * as total_worker_processed if a WHERE condition is specified with COPY.
+ * This is the actual number of records inserted into the relation.
+ */
+ pg_atomic_uint64 processed;
+
+ /*
+ * The number of records processed by the workers so far; this also includes
+ * the records that were filtered out because of the WHERE clause.
+ */
+ pg_atomic_uint64 total_worker_processed;
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array; a worker copies lines here and releases the shared
+ * entries for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+} ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -230,10 +401,38 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
/* Shorthand for number of unconsumed bytes available in raw_buf */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
} CopyStateData;
+/*
+ * This structure stores the common data from CopyStateData that is required
+ * by the workers. This information is allocated and stored in the DSM for
+ * the workers to retrieve and copy into their own CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+} SerializedParallelCopyState;
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -263,6 +462,22 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_DELIM 4
+#define PARALLEL_COPY_KEY_QUOTE 5
+#define PARALLEL_COPY_KEY_ESCAPE 6
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 7
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 8
+#define PARALLEL_COPY_KEY_NULL_LIST 9
+#define PARALLEL_COPY_KEY_CONVERT_LIST 10
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 11
+#define PARALLEL_COPY_KEY_RANGE_TABLE 12
+#define PARALLEL_COPY_WAL_USAGE 13
+#define PARALLEL_COPY_BUFFER_USAGE 14
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -424,11 +639,477 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
- List *attnamelist);
+ List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
+
+/*
+ * SerializeParallelCopyState - Fill shared_cstate from the cstate information.
+ */
+static pg_attribute_always_inline void
+SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RestoreString - Retrieve the string from shared memory.
+ */
+static void
+RestoreString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * SerializeString - Insert a string into shared memory.
+ */
+static void
+SerializeString(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * PopulateParallelCopyShmInfo - initialize the ParallelCopyShmInfo structure.
+ */
+static void
+PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
+ FullTransactionId full_transaction_id)
+{
+ uint32 count;
+
+ MemSet(shared_info_ptr, 0, sizeof(ParallelCopyShmInfo));
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Determine the number of workers required to perform the parallel copy. The
+ * data structures required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated, and the necessary keys will be
+ * loaded in the DSM. The specified number of workers will then be launched.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ SerializedParallelCopyState *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_cstateshared;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ int parallel_workers = 0;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+ ParallelCopyData *pcdata;
+ MemoryContext oldcontext;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ MemoryContextSwitchTo(oldcontext);
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ if (attnamelist != NIL)
+ {
+ attnameListStr = nodeToString(attnamelist);
+ EstimateLineKeysStr(pcxt, attnameListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ if (cstate->force_notnull != NIL)
+ {
+ notnullListStr = nodeToString(cstate->force_notnull);
+ EstimateLineKeysStr(pcxt, notnullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ if (cstate->force_null != NIL)
+ {
+ nullListStr = nodeToString(cstate->force_null);
+ EstimateLineKeysStr(pcxt, nullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ if (cstate->convert_select != NIL)
+ {
+ convertListStr = nodeToString(cstate->convert_select);
+ EstimateLineKeysStr(pcxt, convertListStr);
+ }
+
+ /*
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_COPY_WAL_USAGE
+ * and PARALLEL_COPY_BUFFER_USAGE.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr, full_transaction_id);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_allocate(pcxt->toc, est_cstateshared);
+
+ /* copy cstate variables. */
+ SerializeParallelCopyState(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST, attnameListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST, notnullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_LIST, nullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST, convertListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, whereClauseStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_WAL_USAGE, walusage);
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_BUFFER_USAGE, bufferusage);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * InitializeParallelCopyInfo - Initialize parallel worker.
+ */
+static void
+InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * The worker handles the WHERE clause, converts the line to column values,
+ * adds default/null values for the columns missing in the record, finds the
+ * partition if the table is partitioned, invokes before-row insert triggers,
+ * handles constraints, and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ SerializedParallelCopyState *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO, false);
+
+ ereport(DEBUG1, (errmsg("starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RestoreString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RestoreString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+ RestoreString(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attnameListStr);
+ if (attnameListStr)
+ attlist = (List *)stringToNode(attnameListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, ¬nullListStr);
+ if (notnullListStr)
+ cstate->force_notnull = (List *)stringToNode(notnullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NULL_LIST, &nullListStr);
+ if (nullListStr)
+ cstate->force_null = (List *)stringToNode(nullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &convertListStr);
+ if (convertListStr)
+ cstate->convert_select = (List *)stringToNode(convertListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RestoreString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ InitializeParallelCopyInfo(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
+
+ MemoryContextSwitchTo(oldcontext);
+ pfree(cstate);
+ return;
+}
+
+/*
+ * ParallelCopyFrom - parallel copy leader's functionality.
+ *
+ * The leader executes the before-statement triggers, if any are present. It
+ * reads the table data from the file and copies the contents to DSM data
+ * blocks. It then reads the input contents from the DSM data blocks and
+ * identifies the records based on line breaks; each such line is a record
+ * that needs to be inserted into the relation. The line information is
+ * stored in the ParallelCopyLineBoundary DSM data structure. The workers
+ * then process this information and insert the data into the table. This is
+ * repeated until all the data is read from the file and all the DSM data
+ * blocks are processed. While processing, if the leader finds that the DSM
+ * data blocks or the DSM ParallelCopyLineBoundary entries are full, it waits
+ * until the workers free up some entries and then repeats the process. It
+ * waits until all the populated lines are processed by the workers, and then
+ * exits.
+ */
+static void
+ParallelCopyFrom(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1141,6 +1822,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1150,7 +1832,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyFrom(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1199,6 +1898,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1367,6 +2067,53 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ char *endptr, *str;
+ long val;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ str = defGetString(defel);
+
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1720,11 +2467,12 @@ BeginCopy(ParseState *pstate,
/*
* PopulateCommonCstateInfo - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * from operation. This is a helper function for BeginCopy &
+ * InitializeParallelCopyInfo functions.
*/
static void
PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..82843c6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,5 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 7eaaad1..3a83d0f 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1699,6 +1699,12 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2215,6 +2221,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
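
The core of the coordination in this patch is the line_state handshake on
ParallelCopyLineBoundary. Below is a standalone, single-threaded sketch of
that handshake using C11 atomics in place of PostgreSQL's pg_atomic_* API; it
omits the intermediate LINE_LEADER_POPULATING state, and all the names are
illustrative rather than the patch's actual definitions:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

enum
{
	LINE_INIT,
	LINE_LEADER_POPULATED,
	LINE_WORKER_PROCESSING
};

typedef struct
{
	uint32_t	first_block;	/* position of the first block */
	uint32_t	start_offset;	/* start offset of the line */
	_Atomic uint32_t line_size;	/* UINT32_MAX: slot free / not yet filled */
	_Atomic uint32_t line_state;
} LineBoundary;

/* Leader: wait until the slot is free, then publish the line. */
static void
leader_publish(LineBoundary *lb, uint32_t block, uint32_t off, uint32_t size)
{
	while (atomic_load(&lb->line_size) != UINT32_MAX)
		;						/* a worker is still processing this slot */
	lb->first_block = block;
	lb->start_offset = off;
	atomic_store(&lb->line_size, size);
	atomic_store(&lb->line_state, LINE_LEADER_POPULATED);
}

/* Worker: claim the line with compare-exchange so only one worker gets it. */
static int
worker_try_claim(LineBoundary *lb)
{
	uint32_t	expected = LINE_LEADER_POPULATED;

	if (!atomic_compare_exchange_strong(&lb->line_state, &expected,
										LINE_WORKER_PROCESSING))
		return 0;				/* not ready, or another worker took it */
	printf("claimed line: block=%u offset=%u size=%u\n",
		   (unsigned) lb->first_block, (unsigned) lb->start_offset,
		   (unsigned) atomic_load(&lb->line_size));
	atomic_store(&lb->line_size, UINT32_MAX);	/* release slot to leader */
	return 1;
}

int
main(void)
{
	LineBoundary lb = {0, 0, UINT32_MAX, LINE_INIT};

	leader_publish(&lb, 3, 128, 42);
	worker_try_claim(&lb);
	return 0;
}
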
Attachment: v2-0003-Allow-copy-from-command-to-process-data-from-file.patch (application/x-patch)
From f3ec188c3c1e11fdbb716ec12862a891a95b396f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:31:42 +0530
Subject: [PATCH v2 3/5] Allow copy from command to process data from
file/STDIN contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy data
from a file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command with which the user can specify the number of workers to use to
perform the COPY FROM command. Specifying zero as the number of workers
disables parallelism.

The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
the before-statement triggers, if any exist. The leader populates the DSM
line entries, which include the start offset and line size; while populating
the lines it reads as many blocks as required from the file into the DSM data
blocks. Each block is 64K in size. The leader parses the data to identify
each line; the existing line-identification logic from CopyReadLineText was
used for this, with some changes. The leader checks whether a free line entry
is available to copy the information; if there is no free entry it waits
until the required entry is freed up by a worker and then copies the
identified line's information (offset & line size) into the DSM line entries.
This process is repeated until the complete file is processed.
Simultaneously, the workers cache the lines (50 at a time) locally in local
memory and release the entries back to the leader for further population.
Each worker processes the lines that it cached and inserts them into the
table. The leader does not participate in the insertion of data; the leader's
only responsibility is to identify the lines as fast as possible for the
workers to do the actual copy operation. The leader waits until all the
populated lines are processed by the workers and then exits.

We have chosen this design based on the reason "that everything stalls if the
leader doesn't accept further input data, as well as when there are no
available split chunks, so it doesn't seem like a good idea to have the
leader do other work. This is backed by the performance data where we have
seen that with 1 worker there is just a 5-10% performance difference".
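
For reference, the PARALLEL 'n' value is validated in ProcessCopyOptions
below with the usual strtol pattern: reject trailing junk, out-of-range
values and non-positive counts. A standalone sketch of that validation, with
ereport() simplified to stderr and the helper name being illustrative:

#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a worker count; return the value, or -1 on invalid input. */
static int
parse_parallel_workers(const char *str)
{
	char	   *endptr;
	long		val;

	errno = 0;					/* to distinguish success/failure */
	val = strtol(str, &endptr, 10);

	/* Same error conditions the patch checks before accepting the value. */
	if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN)) ||
		(errno != 0 && val == 0) || *endptr != '\0' || endptr == str)
	{
		fprintf(stderr, "improper argument to option \"parallel\": %s\n", str);
		return -1;
	}
	if (val <= 0 || val > INT_MAX)
	{
		fprintf(stderr, "\"parallel\" must be a positive integer\n");
		return -1;
	}
	return (int) val;
}

int
main(void)
{
	printf("%d\n", parse_parallel_workers("4"));	/* prints 4 */
	printf("%d\n", parse_parallel_workers("4x"));	/* prints -1 */
	return 0;
}
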
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 890 ++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 2 +-
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 896 insertions(+), 54 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples; in the parallel copy case the
+ * command id will already have been set by AssignCommandIdForWorker. So,
+ * for parallel copy, call GetCurrentCommandId with used = false to fetch
+ * currentCommandId, as marking it used has been taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 5eef225..bd7a7fc 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2023,19 +2023,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index d4f7c29..9bff390 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -504,6 +504,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, the transaction id of the leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, the command id of the leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3dcdfc1..c65fc98 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
/*
@@ -123,6 +138,7 @@ typedef enum CopyInsertMethod
#define IsParallelCopy() (cstate->is_parallel)
#define IsLeader() (cstate->pcdata->is_leader)
+#define IsWorker() (IsParallelCopy() && !IsLeader())
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
@@ -556,9 +572,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -571,26 +591,65 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
/*
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
if (!result && !IsHeaderLine()) \
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
- cstate->line_buf.len, \
- &cstate->line_buf.len) \
+{ \
+ if (IsParallelCopy()) \
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, \
+ raw_buf_ptr, &line_size); \
+ else \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len); \
+}
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* GETPROCESSED - Get the lines processed.
*/
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
+/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -645,7 +704,10 @@ static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
-
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
+static void PopulateCstateCatalogInfo(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -727,6 +789,137 @@ PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
}
/*
+ * IsTriggerFunctionParallelSafe - Check if the trigger functions are
+ * parallel safe. Return false if any one of the triggers has a parallel
+ * unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine the parallel safety of the volatile
+ * expressions in the default clauses of column definitions or in the WHERE
+ * clause, and return true if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression; if so, and
+ * it is not parallel safe, then parallelism is not allowed. For instance,
+ * if there are any serial/bigserial columns with a parallel-unsafe
+ * nextval() default expression, parallelism should not be allowed. In
+ * non-parallel copy, volatile functions such as nextval() are not checked.
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine the insert method: single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there are any after-statement or instead-of triggers, or
+ * transition table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions, if any, are parallel safe. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -754,6 +947,7 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
ParallelCopyData *pcdata;
MemoryContext oldcontext;
+ CheckTargetRelValidity(cstate);
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -765,6 +959,15 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
MemoryContextSwitchTo(oldcontext);
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ return NULL;
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -973,9 +1176,212 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCstateCatalogInfo(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
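+
+ /* A zero-size line carries no data; just mark it processed below. */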
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+ /*
+ * The wait loop at the bottom of this loop may have exited because
+ * data_blk_ptr->curr_blk_completed was set, in which case the dataSize
+ * value read there can be stale.  If the block is completed and the
+ * line is complete as well, line_size will have been set, so read
+ * line_size again to be sure whether the line is complete or partial.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the rest of the line fits in this block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole
+ * usable part of the block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: the first copy starts at the recorded offset;
+ * subsequent copies take the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the line fits in the current block, lineInfo.line_size will be
+ * set.  If the line is spread across blocks, either
+ * lineInfo.line_size or data_blk_ptr->curr_blk_completed can be
+ * set: line_size once the whole line has been read,
+ * curr_blk_completed once the current block is finished while the
+ * line itself is not.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
}
/*
+ * GetWorkerLine - Returns a line for the worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data into local line buffers and release the ring
+ * positions so that the leader can continue loading data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+ }
+
+/*
* ParallelCopyMain - parallel copy worker's code.
*
* Where clause handling, convert tuple to columns, add default null values for
@@ -1024,6 +1430,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
@@ -1084,6 +1492,33 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
}
/*
+ * UpdateBlockInLineInfo - Add an entry for a new line into the ring and
+ * return its position.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
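+
+ /*
+ * Busy-wait until this ring slot is free; a worker marks a consumed
+ * slot by setting line_size back to -1.
+ */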
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
+/*
* ParallelCopyFrom - parallel copy leader's functionality.
*
* Leader executes the before statement for before statement trigger, if before
@@ -1106,8 +1541,298 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
+ }
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
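+ /* worker_processed_pos is -1 until this worker has processed a slot. */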
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
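+
+ /*
+ * Lines are claimed in batches of WORKER_CHUNK_COUNT; if the first
+ * slot of a batch is already taken by another worker, skip past the
+ * whole batch.
+ */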
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data, don't check current block, current
+ * block will have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
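+
+ /* A block can be reused only when no line parts in it remain unprocessed. */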
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory area into which the
+ * file data must be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed so that a worker can start
+ * copying its data; the line spilling into the next block keeps one
+ * unprocessed part accounted against it.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If new_line_size > raw_buf_ptr, the new block holds only the
+ * newline character; the unprocessed count must not be increased
+ * in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* The line holds only a newline character; insert an empty record. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger, this will be
+ * executed for parallel copy by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
}
/*
@@ -3584,7 +4309,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3594,7 +4320,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only for non-parallel copy. For parallel copy,
+ * the leader has already done this check, so that on any invalid case
+ * COPY FROM errors out in the leader itself, avoiding launching
+ * workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3634,7 +4367,8 @@ CopyFrom(CopyState cstate)
target_resultRelInfo = resultRelInfo;
/* Verify the named relation is a valid target for INSERT */
- CheckValidResultRel(resultRelInfo, CMD_INSERT);
+ if (!IsParallelCopy())
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
ExecOpenIndices(resultRelInfo, false);
@@ -3783,13 +4517,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3889,6 +4626,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or
+ * have BEFORE/INSTEAD OF triggers, we can't perform the copy
+ * with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4305,7 +5052,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
if (!cstate->binary)
{
@@ -4454,26 +5201,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4721,9 +5477,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time; from then on, reset the
+ * index and reuse the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4778,7 +5556,7 @@ static void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding && (!IsParallelCopy() || IsWorker()))
{
char *cvt;
cvt = pg_any_to_server(cstate->line_buf.data,
@@ -4817,6 +5595,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4871,6 +5654,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ ©_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5095,9 +5880,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5149,6 +5940,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line here; this cannot be done at
+ * the beginning, as the file may contain empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5157,6 +5968,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE();
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index c4af40b..c95db78 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2353,7 +2353,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM|TO filename WITH ( */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
- "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
+ "HEADER", "PARALLEL", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index 5348011..4ea02f7 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -381,6 +381,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3a83d0f..ba93f65 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1704,6 +1704,7 @@ ParallelCopyLineBoundary
ParallelCopyData
ParallelCopyDataBlock
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
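
To make the ring-slot handshake in the patch above easier to follow, here is a
minimal standalone model of the claim protocol, with C11 atomics standing in
for PostgreSQL's pg_atomic_* API. The state names mirror the patch; everything
else (RING_SIZE, the slot layout, main) is purely illustrative.

#include <stdatomic.h>
#include <stdio.h>

#define RING_SIZE 4

enum line_state
{
    LINE_LEADER_POPULATING,     /* leader is still filling the slot */
    LINE_LEADER_POPULATED,      /* complete line, ready for a worker */
    LINE_WORKER_PROCESSING,     /* a worker has claimed the slot */
    LINE_WORKER_PROCESSED       /* worker done; leader may reuse the slot */
};

typedef struct
{
    _Atomic int state;
    _Atomic int line_size;      /* -1 while the slot is unused/incomplete */
} line_slot;

static line_slot ring[RING_SIZE];

/* Leader side: publish a finished line, as EndLineParallelCopy() does. */
static void
leader_publish(int pos, int size)
{
    atomic_store(&ring[pos].line_size, size);
    atomic_store(&ring[pos].state, LINE_LEADER_POPULATED);
}

/*
 * Worker side: try to claim a slot. Only one worker can win the
 * compare-exchange, which is the property GetLinePosition() relies on.
 */
static int
worker_claim(int pos)
{
    int expected = LINE_LEADER_POPULATED;

    return atomic_compare_exchange_strong(&ring[pos].state, &expected,
                                          LINE_WORKER_PROCESSING);
}

int
main(void)
{
    leader_publish(0, 42);
    /* The second claim on the same slot must fail: prints "1 0". */
    printf("%d %d\n", worker_claim(0), worker_claim(0));
    return 0;
}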
Attachment: v2-0004-Documentation-for-parallel-copy.patch (application/x-patch)
From e543e3b3def9367bd30191ea334c26fb7e372b96 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:00:36 +0530
Subject: [PATCH v2 4/5] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..95d349d 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,21 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. The
+ number of workers specified is an upper limit, not a guarantee: the
+ copy may run with fewer workers than requested, or even with none at
+ all. This option is allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: v2-0005-Tests-for-parallel-copy.patch (application/x-patch)
From ec2489bc2316da2aaa959e1e1c53df1f0a801d8a Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:19:39 +0530
Subject: [PATCH v2 5/5] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 206 ++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 +++++++++++++++++++++++++++++++++++-
4 files changed, 430 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..0e01fa0 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,126 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: argument to option "parallel" must be a positive integer greater than zero
+LINE 1: COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0'...
+ ^
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..7015698 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: v2-0006-Parallel-Copy-For-Binary-Format-Files.patch (application/x-patch)
From 26a401f1ece2dfca2414805c5ae2c71f156e9ae6 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 3 Aug 2020 11:58:35 +0530
Subject: [PATCH v2] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks of 64KB each. It
also identifies, for each tuple, the data block id, start offset, end offset
and tuple size, and records this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks in parallel, and insert the tuples
into the table in parallel.
---
src/backend/commands/copy.c | 687 +++++++++++++++++++++++++++++++-----
1 file changed, 602 insertions(+), 85 deletions(-)
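
A note before the diff: the framing the leader below must walk in a binary
COPY file is, per tuple, a 16-bit field count followed by, for each field, a
32-bit length (-1 for NULL) and the field bytes. Here is a minimal sketch of
measuring one tuple under those assumptions; the helper binary_tuple_size and
the sample buffer are illustrative, not part of the patch.

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

/*
 * Return the size in bytes of the tuple starting at 'p', -1 for the
 * end-of-data marker (field count == -1), or -2 if more data is needed.
 */
static long
binary_tuple_size(const unsigned char *p, const unsigned char *end)
{
    int16_t fld_count;
    const unsigned char *start = p;

    if (p + sizeof(fld_count) > end)
        return -2;
    memcpy(&fld_count, p, sizeof(fld_count));
    fld_count = (int16_t) ntohs((uint16_t) fld_count);
    p += sizeof(fld_count);
    if (fld_count == -1)
        return -1;

    for (int i = 0; i < fld_count; i++)
    {
        int32_t fld_size;

        if (p + sizeof(fld_size) > end)
            return -2;
        memcpy(&fld_size, p, sizeof(fld_size));
        fld_size = (int32_t) ntohl((uint32_t) fld_size);
        p += sizeof(fld_size);
        if (fld_size > 0)           /* -1 means NULL: no data bytes */
        {
            if (p + fld_size > end)
                return -2;
            p += fld_size;
        }
    }
    return (long) (p - start);
}

int
main(void)
{
    /* One tuple: field count 1, one 4-byte field "abcd" -> 10 bytes. */
    unsigned char buf[] = {0x00, 0x01, 0x00, 0x00, 0x00, 0x04,
                           'a', 'b', 'c', 'd'};

    printf("%ld\n", binary_tuple_size(buf, buf + sizeof(buf)));
    return 0;
}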
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index c65fc9866f..af24c20a3f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -262,6 +262,17 @@ typedef struct ParallelCopyLineBuf
uint64 cur_lineno; /* line number for error messages */
}ParallelCopyLineBuf;
+/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
/*
* Parallel copy data information.
*/
@@ -283,6 +294,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -447,6 +461,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}SerializedParallelCopyState;
/* DestReceiver for COPY (query) TO */
@@ -521,7 +536,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -651,8 +665,110 @@ else \
/* End parallel copy Macros */
-static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyReadBinaryData(cstate, &dummy, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+/*
+ * EOF_ERROR - Report an unexpected EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data is spread
+ * across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * The field can be spread across multiple data blocks; \
+ * calculate how many data blocks are required and try to \
+ * get that many. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * Check if one more data block is needed for the remaining \
+ * field bytes that are not a multiple of the data block size. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
+
+static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
/* non-export function prototypes */
static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
@@ -708,6 +824,14 @@ static void ExecBeforeStmtTrigger(CopyState cstate);
static void CheckTargetRelValidity(CopyState cstate);
static void PopulateCstateCatalogInfo(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static bool CopyReadBinaryTupleLeader(CopyState cstate);
+static void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+static bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
+
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -725,6 +849,7 @@ SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -884,8 +1009,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1155,6 +1280,7 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
@@ -1541,35 +1667,72 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* raw_buf is not used in parallel copy; data blocks are used instead. */
+ pfree(cstate->raw_buf);
+
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files.
+ * For the parallel copy leader, fill in the error context
+ * information here, so that if any failure occurs while
+ * determining tuple offsets, the leader reports the error
+ * with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1577,7 +1740,355 @@ ParallelCopyFrom(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * CopyReadBinaryGetDataBlock - gets a new block, updates
+ * the current offset, calculates the skip bytes.
+ */
+static void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
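+ /*
+ * If a fixed-size header (field count or field size) would straddle the
+ * block boundary, carry the leftover bytes at the tail of the current
+ * block over to the head of the new block, and record in skip_bytes how
+ * much of the current block's tail the workers must skip.
+ */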
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to %u block",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from binary formatted file
+ * to data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data blocks data.
+ */
+static bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One way to
+ * reach this point is a binary file that just
+ * has a valid signature but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ start_block_pos,
+ start_offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ start_block_pos, start_offset, line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize - leader identifies boundaries/
+ * offsets for each attribute/column and finally results in the
+ * tuple/row size. It moves on to next data block if the attribute/
+ * column is spread across data blocks.
+ */
+static void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in he same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while (i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block.
+ * This code exists for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads the data for each
+ * attribute/column, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i > 0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * If the bytes that are yet to be copied into the attribute buffer
+ * exceed an entire data block, copy only one data block's worth in
+ * this iteration.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1665,7 +2176,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -2181,10 +2694,26 @@ CopyGetInt32(CopyState cstate, int32 *val)
{
uint32 buf;
- if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ /*
+ * For parallel copy, avoid reading the data into raw_buf; read directly
+ * from the file instead. The data will later be read into the parallel
+ * copy data blocks.
+ */
+ if (cstate->nworkers > 0)
{
- *val = 0; /* suppress compiler warning */
- return false;
+ if (CopyGetData(cstate, &buf, sizeof(buf), sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
+ }
+ else
+ {
+ if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
}
*val = (int32) pg_ntoh32(buf);
return true;
@@ -2573,7 +3102,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
EndParallelCopy(pcxt);
}
else
+ {
+ /*
+ * Reset nworkers to -1 here. This is useful in cases where the user
+ * specifies parallel workers but no worker could be picked up, so we go
+ * back to the non-parallel-mode value of nworkers.
+ */
+ cstate->nworkers = -1;
*processed = CopyFrom(cstate); /* copy from file to database */
+ }
EndCopyFrom(cstate);
}
@@ -5052,7 +5589,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
if (!cstate->binary)
{
@@ -5132,7 +5669,7 @@ BeginCopyFrom(ParseState *pstate,
int32 tmp;
/* Signature */
- if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+ if (CopyGetData(cstate, readSig, 11, 11) != 11 ||
memcmp(readSig, BinarySignature, 11) != 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -5160,7 +5697,7 @@ BeginCopyFrom(ParseState *pstate,
/* Skip extension header, if present */
while (tmp-- > 0)
{
- if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+ if (CopyGetData(cstate, readSig, 1, 1) != 1)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("invalid COPY file header (wrong length)")));
@@ -5357,60 +5894,45 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
-
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyReadBinaryData(cstate, &dummy, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -6410,18 +6932,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -6429,9 +6948,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
--
2.25.1
On Mon, Aug 03, 2020 at 12:33:48PM +0530, Bharath Rupireddy wrote:
On Sat, Aug 1, 2020 at 9:55 AM vignesh C <vignesh21@gmail.com> wrote:
The patches were not applying because of the recent commits.
I have rebased the patch over head & attached. I rebased v2-0006-Parallel-Copy-For-Binary-Format-Files.patch.
Putting together all the patches rebased on to the latest commit
b8fdee7d0ca8bd2165d46fb1468f75571b706a01. Patches from 0001 to 0005
are rebased by Vignesh, that are from the previous mail and the patch
0006 is rebased by me. Please consider this patch set for further review.
I'd suggest incrementing the version every time an updated version is
submitted, even if it's just a rebased version. It makes it clearer
which version of the code is being discussed, etc.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Aug 4, 2020 at 9:51 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
I'd suggest incrementing the version every time an updated version is
submitted, even if it's just a rebased version. It makes it clearer
which version of the code is being discussed, etc.
Sure, we will take care of this when we are sending the next set of patches.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, failed
Hi,
I don't claim to yet understand all of the Postgres internals that this patch is updating and interacting with, so I'm still testing and debugging portions of this patch, but would like to give feedback on what I've noticed so far.
I have done some ad-hoc testing of the patch using parallel copies from text/csv/binary files and have not yet struck any execution problems other than some option validation and associated error messages on boundary cases.
One general question that I have: is there a user benefit (over the normal non-parallel COPY) to allowing "COPY ... FROM ... WITH (PARALLEL 1)"?
My following comments are broken down by patch:
(1) v2-0001-Copy-code-readjustment-to-support-parallel-copy.patch
(i) Whilst I can't entirely blame these patches for it (as they are following what is already there), I can't help noticing the use of numerous macros in src/backend/commands/copy.c which paste in multiple lines of code in various places.
It's getting a little out of hand. Surely the majority of these would be better as inline functions?
Perhaps this hasn't been done because too many parameters would need to be passed - thoughts?
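As a trivial illustration (hypothetical code, not from the patch), a wrapper like INCREMENTPROCESSED(processed) could become a static inline function, which any modern compiler will happily inline in the hot loop:

/* Inline-function form of the INCREMENTPROCESSED() macro (sketch only). */
static inline void
IncrementProcessed(uint64 *processed)
{
	(*processed)++;
}

with the call sites changed from INCREMENTPROCESSED(processed) to IncrementProcessed(&processed).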
(2) v2-0002-Framework-for-leader-worker-in-parallel-copy.patch
(i) minor point: there are some tabbing/spacing issues in this patch (and the other patches), affecting alignment.
e.g. mixed tabs/spaces and misalignment in PARALLEL_COPY_KEY_xxx definitions
(ii)
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process to avoid lock contention. This value should be mode of
+ * RINGSIZE, as wrap around cases is currently not handled while selecting the
+ * WORKER_CHUNK_COUNT by the worker.
+ */
+#define WORKER_CHUNK_COUNT 50
"This value should be mode of RINGSIZE ..."
-> typo: mode (mod? should evenly divide into RINGSIZE?)
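(For the current constants this does hold: RINGSIZE is 10 * 1000 = 10000 and WORKER_CHUNK_COUNT is 50, so each pass around the ring is exactly 10000 / 50 = 200 whole chunks, and the wrap-around never lands in the middle of a chunk.)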
(iii)
+ * using pg_atomic_compare_exchange_u32, worker will change the sate to
->typo: sate (should be "state")
(iv)
+ errmsg("parallel option supported only for copy from"),
-> suggest change to: errmsg("parallel option is supported only for COPY FROM"),
(v)
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
I think this validation code needs to be improved, including the error messages (e.g. when can a "positive integer" NOT be greater than zero?)
There is some overlap in the "no digits were found" case between the two conditions above, depending, for example, if the argument is quoted.
Also, "improper use of argument to option" sounds a bit odd and vague to me.
Finally, not range checking before casting long to int can lead to allowing out-of-range int values like in the following case:
test=# copy mytable from '/myspace/test_pcopy/tmp.dat' (parallel '-2147483648');
ERROR: argument to option "parallel" must be a positive integer greater than zero
LINE 1: copy mytable from '/myspace/test_pcopy/tmp.dat' (parallel '-2...
^
BUT the following is allowed...
test=# copy mytable from '/myspace/test_pcopy/tmp.dat' (parallel '-2147483649');
COPY 1000000
I'd suggest to change the above validation code to do similar validation to that for the CREATE TABLE parallel_workers storage parameter (case RELOPT_TYPE_INT in reloptions.c). Like that code, wouldn't it be best to range-check the integer option value to be within a reasonable range, say 1 to 1024, with a corresponding errdetail message if possible?
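For instance, here is a rough sketch of the kind of validation I mean; MAX_PARALLEL_COPY_WORKERS is a hypothetical upper limit (say 1024), not something the patch currently defines:

	errno = 0;
	val = strtol(str, &endptr, 10);

	/* Reject non-numeric input, trailing junk and out-of-range values. */
	if (errno != 0 || endptr == str || *endptr != '\0' ||
		val < 1 || val > MAX_PARALLEL_COPY_WORKERS)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("value %s out of bounds for option \"%s\"",
						str, defel->defname),
				 errdetail("Valid values are between \"%d\" and \"%d\".",
						   1, MAX_PARALLEL_COPY_WORKERS),
				 parser_errposition(pstate, defel->location)));

	cstate->nworkers = (int) val;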
(3) v2-0003-Allow-copy-from-command-to-process-data-from-file.patch
(i)
Patch comment says:
"This feature allows the copy from to leverage multiple CPUs in order to copy
data from file/STDIN to a table. This adds a PARALLEL option to COPY FROM
command where the user can specify the number of workers that can be used
to perform the COPY FROM command. Specifying zero as number of workers will
disable parallelism."
BUT - the changes to ProcessCopyOptions() specified in "v2-0002-Framework-for-leader-worker-in-parallel-copy.patch" do not allow zero workers to be specified - you get an error in that case. Patch comment should be updated accordingly.
(ii)
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
I think GETPROCESSED would be better named "RETURNPROCESSED".
(iii)
The below comment seems out-of-date with the current code - is it referring to the loop embedded at the bottom of the current loop that the comment is within?
+ /*
+ * There is a possibility that the above loop has come out because
+ * data_blk_ptr->curr_blk_completed is set, but dataSize read might
+ * be an old value, if data_blk_ptr->curr_blk_completed and the line is
+ * completed, line_size will be set. Read the line_size again to be
+ * sure if it is complete or partial block.
+ */
(iv)
I may be wrong here, but in the following block of code, isn't there a window of opportunity (however small) in which the line_state might be updated (LINE_WORKER_PROCESSED) by another worker just AFTER pg_atomic_read_u32() returns the current line_state which is put into curr_line_state, such that a write_pos update might be missed? And then a race-condition exists for reading/setting line_size (since line_size gets atomically set after line_state is set)?
If I am wrong in thinking this synchronization might not be correct, maybe the comments could be improved here to explain how this code is safe in that respect.
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
(4) v2-0004-Documentation-for-parallel-copy.patch
(i) I think that it is necessary to mention the "max_worker_processes" option in the description of the COPY statement PARALLEL option.
For example, something like:
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all (for example,
+ due to the setting of max_worker_processes). This option is allowed
+ only in <command>COPY FROM</command>.
(5) v2-0005-Tests-for-parallel-copy.patch
(i) None of the provided tests seem to test beyond "PARALLEL 2"
(6) v2-0006-Parallel-Copy-For-Binary-Format-Files.patch
(i) In the ParallelCopyFrom() function, "cstate->raw_buf" is pfree()d:
+ /* raw_buf is not used in parallel copy, instead data blocks are used.*/
+ pfree(cstate->raw_buf);
This comment doesn't seem to be entirely true.
At least for text/csv file COPY FROM, cstate->raw_buf is subsequently referenced in the SetRawBufForLoad() function, which is called by CopyReadLineText():
cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
So I think cstate->raw_buf should be set to NULL after being pfree()d, and the comment fixed/adjusted.
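That is, something like:

	/* raw_buf is not used in parallel copy; data blocks are used instead. */
	pfree(cstate->raw_buf);
	cstate->raw_buf = NULL;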
(ii) This patch adds some macros (involving parallel copy checks) AFTER the comment:
/* End parallel copy Macros */
Regards,
Greg Nancarrow
Fujitsu Australia
Thanks Greg for reviewing the patch. Please find my thoughts for your comments.
On Wed, Aug 12, 2020 at 9:10 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
I have done some ad-hoc testing of the patch using parallel copies from text/csv/binary files and have not yet struck any execution problems other than some option validation and associated error messages on boundary cases.
One general question that I have: is there a user benefit (over the normal non-parallel COPY) to allowing "COPY ... FROM ... WITH (PARALLEL 1)"?
There will be a marginal improvement, as the worker only needs to
process the data and need not do the file reading; the file reading
would have been done by the main process. The real improvement can be
seen from 2 workers onwards.
My following comments are broken down by patch:
(1) v2-0001-Copy-code-readjustment-to-support-parallel-copy.patch
(i) Whilst I can't entirely blame these patches for it (as they are following what is already there), I can't help noticing the use of numerous macros in src/backend/commands/copy.c which paste in multiple lines of code in various places.
It's getting a little out-of-hand. Surely the majority of these would be best inline functions instead?
Perhaps hasn't been done because too many parameters need to be passed - thoughts?
I felt macros have been used mainly because this is a tight loop and
macros give better performance. I have added the macros
CLEAR_EOL_LINE, INCREMENTPROCESSED & GETPROCESSED as there will be a
slight difference between parallel copy & non-parallel copy for these.
In the remaining patches the macros will be extended to include the
parallel copy logic. Instead of having the checks in the core logic, I
thought of keeping them as macros so that the readability is good.
(2) v2-0002-Framework-for-leader-worker-in-parallel-copy.patch
(i) minor point: there are some tabbing/spacing issues in this patch (and the other patches), affecting alignment.
e.g. mixed tabs/spaces and misalignment in PARALLEL_COPY_KEY_xxx definitions
Fixed
(ii)
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process to avoid lock contention. This value should be mode of
+ * RINGSIZE, as wrap around cases is currently not handled while selecting the
+ * WORKER_CHUNK_COUNT by the worker.
+ */
+#define WORKER_CHUNK_COUNT 50
"This value should be mode of RINGSIZE ..."
-> typo: mode (mod? should evenly divide into RINGSIZE?)
Fixed, changed it to divisible by.
(iii)
+ * using pg_atomic_compare_exchange_u32, worker will change the sate to
->typo: sate (should be "state")
Fixed
(iv)
+ errmsg("parallel option supported only for copy from"),
-> suggest change to: errmsg("parallel option is supported only for COPY FROM"),
Fixed
(v)
+ errno = 0; /* To distinguish success/failure after call */
+ val = strtol(str, &endptr, 10);
+
+ /* Check for various possible errors */
+ if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN))
+ || (errno != 0 && val == 0) ||
+ *endptr)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("improper use of argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ if (endptr == str)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("no digits were found in argument to option \"%s\"",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
+
+ cstate->nworkers = (int) val;
+
+ if (cstate->nworkers <= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("argument to option \"%s\" must be a positive integer greater than zero",
+ defel->defname),
+ parser_errposition(pstate, defel->location)));
I think this validation code needs to be improved, including the error messages (e.g. when can a "positive integer" NOT be greater than zero?)
There is some overlap in the "no digits were found" case between the two conditions above, depending, for example, if the argument is quoted.
Also, "improper use of argument to option" sounds a bit odd and vague to me.
Finally, not range checking before casting long to int can lead to allowing out-of-range int values like in the following case:
test=# copy mytable from '/myspace/test_pcopy/tmp.dat' (parallel '-2147483648');
ERROR: argument to option "parallel" must be a positive integer greater than zero
LINE 1: copy mytable from '/myspace/test_pcopy/tmp.dat' (parallel '-2...
^
BUT the following is allowed...
test=# copy mytable from '/myspace/test_pcopy/tmp.dat' (parallel '-2147483649');
COPY 1000000
I'd suggest to change the above validation code to do similar validation to that for the CREATE TABLE parallel_workers storage parameter (case RELOPT_TYPE_INT in reloptions.c). Like that code, wouldn't it be best to range-check the integer option value to be within a reasonable range, say 1 to 1024, with a corresponding errdetail message if possible?
Fixed, changed as suggested.
(3) v2-0003-Allow-copy-from-command-to-process-data-from-file.patch
(i)
Patch comment says:
"This feature allows the copy from to leverage multiple CPUs in order to copy
data from file/STDIN to a table. This adds a PARALLEL option to COPY FROM
command where the user can specify the number of workers that can be used
to perform the COPY FROM command. Specifying zero as number of workers will
disable parallelism."
BUT - the changes to ProcessCopyOptions() specified in "v2-0002-Framework-for-leader-worker-in-parallel-copy.patch" do not allow zero workers to be specified - you get an error in that case. Patch comment should be updated accordingly.
Removed "Specifying zero as number of workers will disable
parallelism". As the new value is range from 1 to 1024.
(ii)
#define GETPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
I think GETPROCESSED would be better named "RETURNPROCESSED".
Fixed.
(iii)
The below comment seems out-of-date with the current code - is it referring to the loop embedded at the bottom of the current loop that the comment is within?
+ /*
+ * There is a possibility that the above loop has come out because
+ * data_blk_ptr->curr_blk_completed is set, but dataSize read might
+ * be an old value, if data_blk_ptr->curr_blk_completed and the line is
+ * completed, line_size will be set. Read the line_size again to be
+ * sure if it is complete or partial block.
+ */
Updated, it is referring to the embedded loop at the bottom of the current loop.
(iv)
I may be wrong here, but in the following block of code, isn't there a window of opportunity (however small) in which the line_state might be updated (LINE_WORKER_PROCESSED) by another worker just AFTER pg_atomic_read_u32() returns the current line_state which is put into curr_line_state, such that a write_pos update might be missed? And then a race-condition exists for reading/setting line_size (since line_size gets atomically set after line_state is set)?
If I am wrong in thinking this synchronization might not be correct, maybe the comments could be improved here to explain how this code is safe in that respect.
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
This is not possible because of pg_atomic_compare_exchange_u32: it
will succeed only for the one worker that sees line_state as
LINE_LEADER_POPULATED; for the other workers it will fail. This is
explained in detail in the comment above ParallelCopyLineBoundary.
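To illustrate, the claim step reduces to the following sketch (simplified from the code quoted above):

	uint32		line_state = LINE_LEADER_POPULATED;

	/*
	 * Atomically move the line from LINE_LEADER_POPULATED to
	 * LINE_WORKER_PROCESSING. Exactly one worker can win this exchange;
	 * for every other worker it fails, and line_state is overwritten
	 * with the value that was actually observed.
	 */
	if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
									   &line_state, LINE_WORKER_PROCESSING))
		break;	/* this worker now owns the line */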
(4) v2-0004-Documentation-for-parallel-copy.patch
(i) I think that it is necessary to mention the "max_worker_processes" option in the description of the COPY statement PARALLEL option.
For example, something like:
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all (for example,
+ due to the setting of max_worker_processes). This option is allowed
+ only in <command>COPY FROM</command>.
Fixed.
(5) v2-0005-Tests-for-parallel-copy.patch
(i) None of the provided tests seem to test beyond "PARALLEL 2"
I intentionally ran with 1 parallel worker, because when you specify
more than 1 parallel worker the order of record insertion can vary &
there may be random failures.
(6) v2-0006-Parallel-Copy-For-Binary-Format-Files.patch
(i) In the ParallelCopyFrom() function, "cstate->raw_buf" is pfree()d:
+ /* raw_buf is not used in parallel copy, instead data blocks are used.*/
+ pfree(cstate->raw_buf);
raw_buf is not used in parallel copy; instead, raw_buf will point to
the shared memory data blocks. This memory was allocated as part of
BeginCopyFrom. Up to this point we cannot be 100% sure, as the copy
can still be performed sequentially (for example, when no
max_worker_processes slot is available); if it switches to sequential
mode, raw_buf will be used while performing the copy operation. At
this place we can safely free the memory that was allocated.
This comment doesn't seem to be entirely true.
At least for text/csv file COPY FROM, cstate->raw_buf is subsequently referenced in the SetRawBufForLoad() function, which is called by CopyReadLineText():
cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
So I think cstate->raw_buf should be set to NULL after being pfree()d, and the comment fixed/adjusted.
(ii) This patch adds some macros (involving parallel copy checks) AFTER the comment:
/* End parallel copy Macros */
Fixed, moved the macros above the comment.
I have attached new set of patches with the fixes.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v3-0001-Copy-code-readjustment-to-support-parallel-copy.patch
From ddba635791eb415ffae8136b6e77875d2fac353e Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:52:48 +0530
Subject: [PATCH v3 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText. This change was required because, in the
case of parallel copy, record identification and record updates are done in
CopyReadLineText; the newline characters should be removed before the record
information is updated in shared memory.
---
src/backend/commands/copy.c | 361 ++++++++++++++++++++++++++------------------
1 file changed, 218 insertions(+), 143 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a..436e458 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -224,7 +227,6 @@ typedef struct CopyStateData
* appropriate amounts of data from this buffer. In both modes, we
* guarantee that there is a \0 at raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -354,6 +356,27 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * RETURNPROCESSED - Get the lines processed.
+ */
+#define RETURNPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -401,7 +424,11 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -801,14 +828,18 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes = RAW_BUF_BYTES(cstate);
int inbytes;
+ int minread = 1;
/* Copy down the unprocessed data if any. */
if (nbytes > 0)
memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
nbytes);
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1514,7 +1545,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1680,6 +1710,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateCommonCstateInfo(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateCommonCstateInfo - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1799,12 +1847,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2696,32 +2738,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2758,27 +2779,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2816,9 +2816,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3311,7 +3363,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3366,30 +3418,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ RETURNPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCstateCatalogInfo - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCstateCatalogInfo(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3399,38 +3436,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf and
- * raw_buf are used in both text and binary modes, but we use line_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3508,6 +3515,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCstateCatalogInfo(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3917,45 +3979,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData - Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker
+ * to line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
+ {
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
+ }
+}
+
+/*
+ * ConvertToServerEncoding - convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
char *cvt;
-
cvt = pg_any_to_server(cstate->line_buf.data,
cstate->line_buf.len,
cstate->file_encoding);
@@ -3967,11 +4044,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4334,6 +4408,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE();
return result;
}
--
1.8.3.1
v3-0002-Framework-for-leader-worker-in-parallel-copy.patch
From 7faaa1c10a793b5f0f94d35327af23b9ffe889b2 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:56:39 +0530
Subject: [PATCH v3 2/6] Framework for leader/worker in parallel copy
This patch has the framework for the data structures in parallel copy: leader
initialization, worker initialization, shared memory updates, starting workers,
waiting for workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 742 +++++++++++++++++++++++++++++++++-
src/include/commands/copy.h | 2 +
src/tools/pgindent/typedefs.list | 7 +
4 files changed, 751 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 436e458..25ef664 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,180 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The structures sized by DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT store
+ * the records read from the file that need to be inserted into the relation.
+ * These values help hand over multiple records, with a significant amount of
+ * data, to each of the workers, to make sure there are fewer context switches
+ * & that the work is fairly distributed among the workers. These numbers
+ * showed the best results in the performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* It can hold information of up to 10000 records for the workers to process. */
+#define RINGSIZE (10 * 1000)
+
+/* It can hold 1000 blocks of 64K data in DSM to be processed by the workers. */
+#define MAX_BLOCKS_COUNT 1000
+
+/*
+ * Each worker is allocated WORKER_CHUNK_COUNT records at a time from the DSM
+ * line ring to process, to avoid lock contention. RINGSIZE should be a
+ * multiple of WORKER_CHUNK_COUNT, as the wrap-around case is currently not
+ * handled when a worker selects its WORKER_CHUNK_COUNT chunk.
+ */
+#define WORKER_CHUNK_COUNT 50
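(For scale, with these defaults the data_blocks array alone occupies
MAX_BLOCKS_COUNT * DATA_BLOCK_SIZE = 1000 * 64KB, i.e. roughly 62.5MB of the
DSM segment, and the line ring yields RINGSIZE / WORKER_CHUNK_COUNT =
10000 / 50 = 200 worker-sized chunks of lines.)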
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ * ParallelCopyDataBlocks are created in the DSM. Data read from the file is
+ * copied into these DSM data blocks. The leader process identifies the
+ * records, the record information is shared with the workers, and the workers
+ * insert the records into the table. Each data block can contain one or more
+ * records, depending on the record size.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line's data continues into another block, following_block
+ * holds the position from which the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader finds that this block can be read safely
+ * by a worker. It lets a worker start processing a line that spreads across
+ * many blocks early, without waiting for the complete line to be read.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+} ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is the data structure shared between the leader
+ * and the workers. The leader populates the data block, the offset within it
+ * and the size of each record in the DSM, and the workers copy that data into
+ * the relation. Access is protected by the ordering below; if it is not
+ * followed, a worker might process a wrong line_size, or the leader might
+ * overwrite an entry that a worker has not yet processed or is still
+ * processing.
+ * The leader must operate in the following order:
+ * 1) check if line_size is -1; if not, wait -- a worker is still processing.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * A worker must operate in the following order:
+ * 1) check that line_state is LINE_LEADER_POPULATED; if not, the leader is
+ * still populating the data.
+ * 2) read line_size.
+ * 3) only one worker may choose a given line for processing; this is enforced
+ * with pg_atomic_compare_exchange_u32, which changes line_state to
+ * LINE_WORKER_PROCESSING only if it is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size bytes of data.
+ * 6) set line_size back to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be completely
+ * filled, 0 means an empty line, and >0 means a line filled with that many
+ * bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBoundary;
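To make the ordering above concrete, here is a minimal sketch of both sides
of the handshake, written against the pg_atomic primitives and the LINE_*
states used in this patch series (abbreviated, not the exact patch code; the
busy-waits stand in for the latch waits the patch actually uses):

/* Leader: publish one line into a ring slot. */
static void
publish_line(ParallelCopyLineBoundary *line, uint32 blk, uint32 off,
             uint64 lineno, uint32 size)
{
    while (pg_atomic_read_u32(&line->line_size) != -1)
        ;   /* slot still owned by a worker; the patch waits on a latch */
    pg_atomic_write_u32(&line->line_state, LINE_LEADER_POPULATING);
    line->first_block = blk;      /* block, offset, lineno: any order */
    line->start_offset = off;
    line->cur_lineno = lineno;
    pg_atomic_write_u32(&line->line_size, size);
    pg_atomic_write_u32(&line->line_state, LINE_LEADER_POPULATED);
}

/* Worker: claim a line so that no other worker picks it. */
static bool
claim_line(ParallelCopyLineBoundary *line)
{
    uint32 expected = LINE_LEADER_POPULATED;

    if (!pg_atomic_compare_exchange_u32(&line->line_state, &expected,
                                        LINE_WORKER_PROCESSING))
        return false;             /* another worker got here first */
    /* ... copy line_size bytes from (first_block, start_offset) ... */
    pg_atomic_write_u32(&line->line_size, -1);  /* release the slot */
    return true;
}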
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Ring of line boundaries populated by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+} ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Number of lines actually inserted by the workers; this will differ from
+ * total_worker_processed when a WHERE condition is specified with COPY.
+ * It is the actual number of records inserted into the relation.
+ */
+ pg_atomic_uint64 processed;
+
+ /*
+ * The number of records processed so far by the workers, including the
+ * records that were filtered out by the WHERE clause.
+ */
+ pg_atomic_uint64 total_worker_processed;
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local array of line_bufs: a worker copies lines here and then releases
+ * the ring entries so that the leader can continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+} ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -230,10 +401,38 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
/* Shorthand for number of unconsumed bytes available in raw_buf */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
} CopyStateData;
+/*
+ * This structure stores the common data from CopyStateData that is required
+ * by the workers. It is allocated and stored in the DSM, from where each
+ * worker retrieves it and copies it into its own CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of null_print */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+} SerializedParallelCopyState;
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -263,6 +462,22 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_DELIM 4
+#define PARALLEL_COPY_KEY_QUOTE 5
+#define PARALLEL_COPY_KEY_ESCAPE 6
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 7
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 8
+#define PARALLEL_COPY_KEY_NULL_LIST 9
+#define PARALLEL_COPY_KEY_CONVERT_LIST 10
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 11
+#define PARALLEL_COPY_KEY_RANGE_TABLE 12
+#define PARALLEL_COPY_WAL_USAGE 13
+#define PARALLEL_COPY_BUFFER_USAGE 14
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -424,11 +639,477 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
- List *attnamelist);
+ List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
+
+/*
+ * SerializeParallelCopyState - Populate shared_cstate from the cstate information.
+ */
+static pg_attribute_always_inline void
+SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RestoreString - Retrieve the string from shared memory.
+ */
+static void
+RestoreString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * SerializeString - Insert a string into shared memory.
+ */
+static void
+SerializeString(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * PopulateParallelCopyShmInfo - set ParallelCopyShmInfo.
+ */
+static void
+PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
+ FullTransactionId full_transaction_id)
+{
+ uint32 count;
+
+ MemSet(shared_info_ptr, 0, sizeof(ParallelCopyShmInfo));
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ SerializedParallelCopyState *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_cstateshared;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ int parallel_workers = 0;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+ ParallelCopyData *pcdata;
+ MemoryContext oldcontext;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ MemoryContextSwitchTo(oldcontext);
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for the full transaction id shared with the workers. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ if (attnamelist != NIL)
+ {
+ attnameListStr = nodeToString(attnamelist);
+ EstimateLineKeysStr(pcxt, attnameListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ if (cstate->force_notnull != NIL)
+ {
+ notnullListStr = nodeToString(cstate->force_notnull);
+ EstimateLineKeysStr(pcxt, notnullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ if (cstate->force_null != NIL)
+ {
+ nullListStr = nodeToString(cstate->force_null);
+ EstimateLineKeysStr(pcxt, nullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ if (cstate->convert_select != NIL)
+ {
+ convertListStr = nodeToString(cstate->convert_select);
+ EstimateLineKeysStr(pcxt, convertListStr);
+ }
+
+ /*
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_COPY_WAL_USAGE
+ * and PARALLEL_COPY_BUFFER_USAGE.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr, full_transaction_id);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_allocate(pcxt->toc, est_cstateshared);
+
+ /* copy cstate variables. */
+ SerializeParallelCopyState(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST, attnameListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST, notnullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_LIST, nullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST, convertListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, whereClauseStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_WAL_USAGE, walusage);
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_BUFFER_USAGE, bufferusage);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * InitializeParallelCopyInfo - Initialize parallel worker.
+ */
+static void
+InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * The worker handles the WHERE clause, converts each record into columns,
+ * adds default/null values for the columns not present in the record, finds
+ * the partition if it is a partitioned table, invokes BEFORE ROW INSERT
+ * triggers, handles constraints, and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ SerializedParallelCopyState *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO, false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RestoreString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RestoreString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+ RestoreString(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attnameListStr);
+ if (attnameListStr)
+ attlist = (List *)stringToNode(attnameListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, ¬nullListStr);
+ if (notnullListStr)
+ cstate->force_notnull = (List *)stringToNode(notnullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NULL_LIST, &nullListStr);
+ if (nullListStr)
+ cstate->force_null = (List *)stringToNode(nullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &convertListStr);
+ if (convertListStr)
+ cstate->convert_select = (List *)stringToNode(convertListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RestoreString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ InitializeParallelCopyInfo(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
+
+ MemoryContextSwitchTo(oldcontext);
+ pfree(cstate);
+ return;
+}
+
+/*
+ * ParallelCopyFrom - parallel copy leader's functionality.
+ *
+ * The leader executes the BEFORE STATEMENT trigger, if one is present. It
+ * reads the table data from the file and copies the contents into DSM data
+ * blocks. It then scans the DSM data blocks and identifies the records based
+ * on line breaks; each such line is a record that needs to be inserted into
+ * the relation. The line information is stored in the ParallelCopyLineBoundary
+ * DSM data structure, from which the workers pick it up and insert the data
+ * into the table. The leader repeats this process until all data has been
+ * read from the file and all DSM data blocks have been processed. While
+ * processing, if the leader finds the DSM data blocks or the DSM
+ * ParallelCopyLineBoundary entries full, it waits until a worker frees up
+ * some entries and then continues. Finally it waits until all the populated
+ * lines have been processed by the workers, and exits.
+ */
+static void
+ParallelCopyFrom(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1141,6 +1822,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1150,7 +1832,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyFrom(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1199,6 +1898,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1367,6 +2067,39 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ int val;
+ bool parsed;
+ char *strval;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option is supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ strval = defGetString(defel);
+ parsed = parse_int(strval, &val, 0, NULL);
+ if (!parsed)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for integer option \"%s\": %s",
+ defel->defname, strval)));
+ if (val < 1 || val > 1024)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("value %s out of bounds for option \"%s\"",
+ strval, defel->defname),
+ errdetail("Valid values are between \"%d\" and \"%d\".",
+ 1, 1024)));
+ cstate->nworkers = val;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
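With this option parsing in place, a parallel copy can be requested as, for
example, COPY mytable FROM '/tmp/mytable.csv' WITH (FORMAT csv, PARALLEL 4);
(table name and path here are illustrative); values outside 1..1024 are
rejected by the checks above.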
@@ -1720,11 +2453,12 @@ BeginCopy(ParseState *pstate,
/*
* PopulateCommonCstateInfo - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * from operation. This is a helper function for BeginCopy &
+ * InitializeParallelCopyInfo functions.
*/
static void
PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..82843c6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,5 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b4948ac..bb49b65 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1701,6 +1701,12 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2217,6 +2223,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
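Before reading the next patch, it may help to see how a worker is expected to
reassemble a line that spans several of the DSM data blocks defined in this
patch. The sketch below mirrors what CacheLineInfo() in the next patch does,
under the simplifying assumption that all spanned blocks are already complete
(waiting for in-flight blocks and the empty-line case are omitted):

/* Walk the following_block chain, appending each block's usable bytes
 * (skip_bytes at the end of a block belong to the next block's copy of
 * a partial line) until line_size bytes have been gathered. */
static void
copy_spanning_line(ParallelCopyShmInfo *shm, ParallelCopyLineBoundary *line,
                   uint32 line_size, StringInfo buf)
{
    uint32 copied = 0;
    uint32 off = line->start_offset;
    ParallelCopyDataBlock *blk = &shm->data_blocks[line->first_block];

    while (copied < line_size)
    {
        uint32 avail = (DATA_BLOCK_SIZE - blk->skip_bytes) - off;
        uint32 n = Min(line_size - copied, avail);

        appendBinaryStringInfo(buf, &blk->data[off], n);
        pg_atomic_sub_fetch_u32(&blk->unprocessed_line_parts, 1);
        copied += n;
        if (copied < line_size)
        {
            off = 0;             /* later blocks are read from the start */
            blk = &shm->data_blocks[blk->following_block];
        }
    }
}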
Attachment: v3-0003-Allow-copy-from-command-to-process-data-from-file.patch (text/x-patch)
From c944b276b8f8b92a98d63d4a66ff931baa8fab1b Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:31:42 +0530
Subject: [PATCH v3 3/6] Allow copy from command to process data from
file/STDIN contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command, with which the user can specify the number of workers to be used
to perform the COPY FROM command.

The backend to which the COPY FROM query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the COPY FROM
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
BEFORE STATEMENT triggers, if any exist. The leader populates the DSM lines,
each of which includes the start offset and line size; while populating the
lines it reads as many blocks as required from the file into the DSM data
blocks. Each block is 64K in size. The leader parses the data to identify
lines; the existing logic from CopyReadLineText, which identifies the lines,
is reused for this with some changes. The leader checks whether a free line
entry is available to copy the information into; if there is no free line, it
waits until the required line is freed up by a worker and then copies the
identified line information (offset & line size) into the DSM lines. This
process is repeated until the complete file is processed. Simultaneously, the
workers cache the lines (50 at a time) into local memory and release the line
entries to the leader for further populating. Each worker processes the lines
it has cached and inserts them into the table. The leader does not
participate in the insertion of data; the leader's only responsibility is to
identify the lines as fast as possible for the workers to do the actual copy
operation. The leader waits until all the populated lines are processed by
the workers, and exits.

We have chosen this design based on the reasoning "that everything stalls if
the leader doesn't accept further input data, as well as when there are no
available split chunks, so it doesn't seem like a good idea to have the
leader do other work. This is backed by the performance data where we have
seen that with 1 worker there is just a 5-10% performance difference".
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 892 ++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 2 +-
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 898 insertions(+), 54 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In the parallel copy case the
+ * command id will already have been set by AssignCommandIdForWorker, so
+ * call GetCurrentCommandId with used = false to fetch currentCommandId;
+ * marking it as used has been taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f75e1cf..fa005df 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,19 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 7ccb7d6..e2bcac2 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -517,6 +517,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, transaction id of leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, command id of leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 25ef664..4c69a5d 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
/*
@@ -123,6 +138,7 @@ typedef enum CopyInsertMethod
#define IsParallelCopy() (cstate->is_parallel)
#define IsLeader() (cstate->pcdata->is_leader)
+#define IsWorker() (IsParallelCopy() && !IsLeader())
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
@@ -556,9 +572,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -571,26 +591,65 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
/*
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
if (!result && !IsHeaderLine()) \
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
- cstate->line_buf.len, \
- &cstate->line_buf.len) \
+{ \
+ if (IsParallelCopy()) \
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, \
+ raw_buf_ptr, &line_size); \
+ else \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len); \
+} \
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* RETURNPROCESSED - Get the lines processed.
*/
#define RETURNPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
+/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -645,7 +704,10 @@ static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
-
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
+static void PopulateCstateCatalogInfo(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -727,6 +789,137 @@ PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
}
/*
+ * IsTriggerFunctionParallelSafe - Check whether the relation's trigger
+ * functions are parallel safe. Return false if any trigger has a parallel
+ * unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine parallel safety of volatile expressions
+ * in default clause of column definition or in where clause and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any column has a volatile default expression. If so, and it is
+ * not parallel safe, parallelism is not allowed. For instance, if any
+ * serial/bigserial column has an associated nextval() default expression,
+ * which is parallel unsafe, parallelism should not be allowed. (Non-parallel
+ * copy does not check volatile functions such as nextval().)
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -754,6 +947,7 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
ParallelCopyData *pcdata;
MemoryContext oldcontext;
+ CheckTargetRelValidity(cstate);
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -765,6 +959,15 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
MemoryContextSwitchTo(oldcontext);
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether it is actually
+ * allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ return NULL;
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -973,9 +1176,214 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCstateCatalogInfo(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
}
/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+
+ /*
+ * The loop embedded at the bottom of the current loop may have exited
+ * because data_blk_ptr->curr_blk_completed was set, in which case the
+ * dataSize previously read might be stale. If curr_blk_completed is set
+ * and the line is complete, line_size will also have been set, so read
+ * line_size again to determine whether the line is complete or this is
+ * a partial block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the remaining data fits in the current block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole
+ * usable part of the current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: the first copy starts at the line's offset;
+ * subsequent copies take the complete block from its start.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data fits in the current block, lineInfo.line_size will be
+ * updated. If the data is spread across blocks, either
+ * lineInfo.line_size or data_blk_ptr->curr_blk_completed can be
+ * updated: line_size is updated once the complete line has been
+ * read, and curr_blk_completed is updated when the current block is
+ * finished but the line's data is not.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Return a line for the worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue loading data.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+ }
+
+/*
* ParallelCopyMain - parallel copy worker's code.
*
 * The worker handles the WHERE clause, converts each record into columns,
@@ -1024,6 +1432,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
@@ -1084,6 +1494,33 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
}
/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
+/*
* ParallelCopyFrom - parallel copy leader's functionality.
*
 * The leader executes the BEFORE STATEMENT trigger, if one is present. It
@@ -1106,8 +1543,298 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
+ }
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data, don't check current block, current
+ * block will have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed; a worker can start processing
+ * this data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
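+ /* SET_NEWLINE_SIZE sets new_line_size to the byte length of the line terminator. */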
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If the new_line_size > raw_buf_ptr, then the new block has only
+ * new line char content. The unprocessed count should not be
+ * increased in this case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline char: an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the BEFORE STATEMENT trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
}
/*
@@ -3570,7 +4297,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3580,7 +4308,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In the parallel
+ * case, the leader has already done this check, so that if anything is
+ * invalid the COPY FROM command errors out in the leader itself,
+ * avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3620,7 +4355,8 @@ CopyFrom(CopyState cstate)
target_resultRelInfo = resultRelInfo;
/* Verify the named relation is a valid target for INSERT */
- CheckValidResultRel(resultRelInfo, CMD_INSERT);
+ if (!IsParallelCopy())
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
ExecOpenIndices(resultRelInfo, false);
@@ -3769,13 +4505,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3875,6 +4614,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If any partition of the table is either foreign or has
+ * BEFORE/INSTEAD OF triggers, we can't perform the copy
+ * with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4291,7 +5040,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
if (!cstate->binary)
{
@@ -4440,26 +5189,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
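+ /*
+ * Workers pick up lines that the leader has populated in shared
+ * memory instead of reading the file themselves.
+ */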
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4707,9 +5465,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through; on subsequent
+ * passes, reset the index and reuse the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4764,7 +5544,7 @@ static void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding && (!IsParallelCopy() || IsWorker()))
{
char *cvt;
cvt = pg_any_to_server(cstate->line_buf.data,
@@ -4803,6 +5583,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4857,6 +5642,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5081,9 +5868,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5135,6 +5928,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line information here; this cannot
+ * be done at the beginning of the loop, as the file may contain empty
+ * lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5143,6 +5956,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE();
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index f41785f..81cc1f4 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2353,7 +2353,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM|TO filename WITH ( */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
- "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
+ "HEADER", "PARALLEL", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index c18554b..94219e8 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -385,6 +385,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index bb49b65..120c7a6 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1706,6 +1706,7 @@ ParallelCopyLineBoundary
ParallelCopyData
ParallelCopyDataBlock
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
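The worker-side hand-off in GetLinePosition() above comes down to a compare-and-swap on the per-line state: the first worker to flip a LINE_LEADER_POPULATED entry to LINE_WORKER_PROCESSING owns that line. Below is a minimal standalone sketch of that idiom using C11 atomics — types, constants, and the chunked skip-ahead are simplified here; the patch itself uses pg_atomic_compare_exchange_u32 on ParallelCopyLineBoundary entries in shared memory.

#include <stdatomic.h>
#include <stdio.h>

#define RINGSIZE 8

typedef enum
{
    LINE_LEADER_POPULATING,
    LINE_LEADER_POPULATED,
    LINE_WORKER_PROCESSING,
    LINE_WORKER_PROCESSED
} LineState;

typedef struct
{
    atomic_int line_state;
} LineBoundary;

static LineBoundary ring[RINGSIZE];

/*
 * Claim one populated ring entry for this worker; return its position,
 * or -1 if nothing is currently claimable.  Exactly one worker can win
 * the compare-and-swap for a given entry; losers simply move on.
 */
static int
claim_line(int start)
{
    for (int i = 0; i < RINGSIZE; i++)
    {
        int pos = (start + i) % RINGSIZE;
        int expected = LINE_LEADER_POPULATED;

        if (atomic_compare_exchange_strong(&ring[pos].line_state,
                                           &expected,
                                           LINE_WORKER_PROCESSING))
            return pos;
    }
    return -1;
}

int
main(void)
{
    /* Mark every other slot as populated by the leader. */
    for (int i = 0; i < RINGSIZE; i++)
        atomic_init(&ring[i].line_state,
                    (i % 2) ? LINE_LEADER_POPULATED : LINE_LEADER_POPULATING);

    printf("claimed slot %d\n", claim_line(0));
    return 0;
}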
Attachment: v3-0004-Documentation-for-parallel-copy.patch (text/x-patch; charset=US-ASCII)
From 8969d183e86199c9035d47459217d5e46f51c49f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:00:36 +0530
Subject: [PATCH v3 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..2e023ed 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,22 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all (for example,
+ due to the setting of max_worker_processes). This option is allowed
+ only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
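As a usage sketch of the documented option (table and file names here are illustrative, not from the patch): COPY lineitem FROM '/tmp/lineitem.csv' WITH (FORMAT csv, PARALLEL 4); requests four background workers, and, as the documentation above notes, the copy may still run with fewer workers or none at all, depending on settings such as max_worker_processes.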
Attachment: v3-0005-Tests-for-parallel-copy.patch (text/x-patch; charset=US-ASCII)
From d72e976bf5310679f2cbac8d91205fea169abbef Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:19:39 +0530
Subject: [PATCH v3 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 205 ++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 +++++++++++++++++++++++++++++++++++-
4 files changed, 429 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..7ae5d44 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,125 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: value 0 out of bounds for option "parallel"
+DETAIL: Valid values are between "1" and "1024".
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..7015698 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: v3-0006-Parallel-Copy-For-Binary-Format-Files.patch (text/x-patch; charset=US-ASCII)
From b351a756e59b6b3c21a058394b969877788f43b5 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Mon, 3 Aug 2020 11:58:35 +0530
Subject: [PATCH v3 6/6] Parallel Copy For Binary Format Files
Leader reads data from the file into the DSM data blocks each of 64K size.
It also identifies each tuple data block id, start offset, end offset,
tuple size and updates this information in the ring data structure.
Workers parallelly read the tuple information from the ring data structure,
the actual tuple data from the data blocks and parallelly insert the tuples
into the table.
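
For reference, the per-tuple layout the leader walks here is the existing COPY BINARY format: an int16 field count in network byte order, then for each field an int32 byte length (-1 meaning NULL) followed by that many data bytes; a field count of -1 marks end of data. The sketch below measures one tuple under the simplifying assumption that it sits in a single contiguous buffer — exactly the assumption this patch cannot make, which is why CopyReadBinaryFindTupleSize() below hops across 64K data blocks instead.

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>          /* ntohs/ntohl */

/*
 * Return the size in bytes of one COPY BINARY tuple starting at buf,
 * 0 on the end-of-data marker (field count -1), or -1 if the buffer
 * is truncated.
 */
static ptrdiff_t
binary_tuple_size(const uint8_t *buf, size_t len)
{
    size_t  off = 0;
    int16_t fld_count;

    if (len < sizeof(fld_count))
        return -1;
    memcpy(&fld_count, buf, sizeof(fld_count));
    fld_count = (int16_t) ntohs((uint16_t) fld_count);
    if (fld_count == -1)
        return 0;               /* end-of-data marker */
    off += sizeof(fld_count);

    for (int i = 0; i < fld_count; i++)
    {
        int32_t fld_size;

        if (len - off < sizeof(fld_size))
            return -1;
        memcpy(&fld_size, buf + off, sizeof(fld_size));
        fld_size = (int32_t) ntohl((uint32_t) fld_size);
        off += sizeof(fld_size);
        if (fld_size == -1)
            continue;           /* NULL field: no data bytes follow */
        if (fld_size < 0 || len - off < (size_t) fld_size)
            return -1;
        off += (size_t) fld_size;
    }
    return (ptrdiff_t) off;
}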
---
src/backend/commands/copy.c | 687 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 602 insertions(+), 85 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 4c69a5d..0948296 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -263,6 +263,17 @@ typedef struct ParallelCopyLineBuf
}ParallelCopyLineBuf;
/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
+/*
* Parallel copy data information.
*/
typedef struct ParallelCopyData
@@ -283,6 +294,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -447,6 +461,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}SerializedParallelCopyState;
/* DestReceiver for COPY (query) TO */
@@ -521,7 +536,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -649,11 +663,113 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyReadBinaryData(cstate, &dummy, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required when the data is spread across
+ * multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * The field size can spread across multiple data blocks; \
+ * calculate the number of data blocks required and fetch \
+ * that many data blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * Check whether one more data block is needed for the \
+ * remaining field bytes that do not fill a whole block. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
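+/*
+ * Worked example (illustrative numbers, taking DATA_BLOCK_SIZE as 64K =
+ * 65536): with fld_size = 150000 and curr_blk_bytes = 10000 left in the
+ * current block, (150000 - 10000) / 65536 = 2 full extra blocks, and the
+ * remaining 140000 % 65536 = 8928 bytes need one more, so required_blks
+ * becomes 3.
+ */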
+
/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
-
/* non-export function prototypes */
static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
RawStmt *raw_query, Oid queryRelId, List *attnamelist,
@@ -708,6 +824,14 @@ static void ExecBeforeStmtTrigger(CopyState cstate);
static void CheckTargetRelValidity(CopyState cstate);
static void PopulateCstateCatalogInfo(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static bool CopyReadBinaryTupleLeader(CopyState cstate);
+static void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+static bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
+
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -725,6 +849,7 @@ SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -884,8 +1009,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1155,6 +1280,7 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
@@ -1543,35 +1669,72 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* raw_buf is not used in parallel copy; data blocks are used instead. */
+ pfree(cstate->raw_buf);
+
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files.
+ * For the parallel copy leader, fill in the error
+ * context information here so that, if anything fails
+ * while determining tuple offsets, the leader throws
+ * the error with proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1579,7 +1742,355 @@ ParallelCopyFrom(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * CopyReadBinaryGetDataBlock - gets a new block, updates
+ * the current offset, calculates the skip bytes.
+ */
+static void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
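+ /*
+ * If a field count or field size straddles the block boundary, carry
+ * the tail bytes of the current block into the new block so that the
+ * value can be read contiguously.
+ */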
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to %u block",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from binary formatted file
+ * to data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data in the data blocks.
+ */
+static bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file. One way to reach this
+ * point is a binary file that contains a valid signature
+ * but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ start_block_pos,
+ start_offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ start_block_pos, start_offset, line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize - leader identifies boundaries/
+ * offsets for each attribute/column and finally results in the
+ * tuple/row size. It moves on to next data block if the attribute/
+ * column is spread across data blocks.
+ */
+static void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in he same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while(i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The field count should never be spread across data blocks,
+ * as the leader would have moved it to the next block; this
+ * branch exists for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across data blocks should never occur");
+ }
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - worker reads one attribute/column from
+ * the data blocks, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i>0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * If the bytes still to be copied into the attribute buffer
+ * exceed a whole data block, copy only one block's worth in
+ * this iteration.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that the worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1667,7 +2178,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -2183,10 +2696,26 @@ CopyGetInt32(CopyState cstate, int32 *val)
{
uint32 buf;
- if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ /*
+ * For parallel copy, avoid reading data into raw_buf; read directly
+ * from the file. Later, the data will be read into the parallel copy
+ * data buffers.
+ */
+ if (cstate->nworkers > 0)
{
- *val = 0; /* suppress compiler warning */
- return false;
+ if (CopyGetData(cstate, &buf, sizeof(buf), sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
+ }
+ else
+ {
+ if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
}
*val = (int32) pg_ntoh32(buf);
return true;
@@ -2575,7 +3104,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
EndParallelCopy(pcxt);
}
else
+ {
+ /*
+ * Reset nworkers to -1 here. This is useful in cases where the user
+ * specifies parallel workers but no worker is picked up; go back to
+ * the non-parallel-mode value of nworkers.
+ */
+ cstate->nworkers = -1;
*processed = CopyFrom(cstate); /* copy from file to database */
+ }
EndCopyFrom(cstate);
}
@@ -5040,7 +5577,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
if (!cstate->binary)
{
@@ -5120,7 +5657,7 @@ BeginCopyFrom(ParseState *pstate,
int32 tmp;
/* Signature */
- if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+ if (CopyGetData(cstate, readSig, 11, 11) != 11 ||
memcmp(readSig, BinarySignature, 11) != 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -5148,7 +5685,7 @@ BeginCopyFrom(ParseState *pstate,
/* Skip extension header, if present */
while (tmp-- > 0)
{
- if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+ if (CopyGetData(cstate, readSig, 1, 1) != 1)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("invalid COPY file header (wrong length)")));
@@ -5345,60 +5882,45 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
-
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyReadBinaryData(cstate, &dummy, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -6398,18 +6920,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -6417,9 +6936,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
--
1.8.3.1
Hi Vignesh,
Some further comments:
(1) v3-0002-Framework-for-leader-worker-in-parallel-copy.patch
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process to avoid lock contention. This value should be divisible by
+ * RINGSIZE, as wrap around cases is currently not handled while selecting the
+ * WORKER_CHUNK_COUNT by the worker.
+ */
+#define WORKER_CHUNK_COUNT 50
"This value should be divisible by RINGSIZE" is not a correct
statement (since obviously 50 is not divisible by 10000).
It should say something like "This value should evenly divide into
RINGSIZE", or "RINGSIZE should be a multiple of WORKER_CHUNK_COUNT".
(2) v3-0003-Allow-copy-from-command-to-process-data-from-file.patch
(i)
+ /*
+ * If the data is present in current block lineInfo. line_size
+ * will be updated. If the data is spread across the blocks either
Somehow a space has been put between "lineinfo." and "line_size".
It should be: "If the data is present in current block
lineInfo.line_size will be updated"
(ii)
This is not possible because of pg_atomic_compare_exchange_u32, this
will succeed only for one of the workers whose line_state is
LINE_LEADER_POPULATED, for other workers it will fail. This is
explained in detail above ParallelCopyLineBoundary.
Yes, but prior to that call to pg_atomic_compare_exchange_u32(),
aren't you separately reading line_state and line_size, so that
between those reads, it may have transitioned from leader to another
worker, such that the read line state ("cur_line_state", being checked
in the if block) may not actually match what is now in the line_state
and/or the read line_size ("dataSize") doesn't actually correspond to
the read line state?
(sorry, still not 100% convinced that the synchronization and checks
are safe in all cases)
(3) v3-0006-Parallel-Copy-For-Binary-Format-Files.patch
raw_buf is not used in parallel copy; instead, raw_buf will be pointing
to shared memory data blocks. This memory was allocated as part of
BeginCopyFrom; until this point we cannot be 100% sure, as the copy can
still be performed sequentially (for example, when max_worker_processes
is not available); if it switches to sequential mode, raw_buf will be
used while performing the copy operation. At this place we can safely
free this memory that was allocated.
So the following code (which checks raw_buf, which still points to
memory that has been pfreed) is still valid?
In the SetRawBufForLoad() function, which is called by CopyReadLineText():
cur_data_blk_ptr = (cstate->raw_buf) ?
&pcshared_info->data_blocks[cur_block_pos] : NULL;
The above code looks a bit dicey to me. I stepped over that line in
the debugger when I debugged an instance of Parallel Copy, so it
definitely gets executed.
It makes me wonder what other code could possibly be checking raw_buf
and using it in some way, when in fact what it points to has been
pfreed.
Are you able to add the following line of code, or will it (somehow)
break logic that you are relying on?
pfree(cstate->raw_buf);
cstate->raw_buf = NULL; <=== I suggest that this line is added
Regards,
Greg Nancarrow
Fujitsu Australia
Thanks Greg for reviewing the patch. Please find my thoughts on your comments.
On Mon, Aug 17, 2020 at 9:44 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
Some further comments:
(1) v3-0002-Framework-for-leader-worker-in-parallel-copy.patch
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process to avoid lock contention. This value should be divisible by
+ * RINGSIZE, as wrap around cases is currently not handled while selecting the
+ * WORKER_CHUNK_COUNT by the worker.
+ */
+#define WORKER_CHUNK_COUNT 50
"This value should be divisible by RINGSIZE" is not a correct
statement (since obviously 50 is not divisible by 10000).
It should say something like "This value should evenly divide into
RINGSIZE", or "RINGSIZE should be a multiple of WORKER_CHUNK_COUNT".
Fixed. Changed it to "RINGSIZE should be a multiple of WORKER_CHUNK_COUNT".
(2) v3-0003-Allow-copy-from-command-to-process-data-from-file.patch
(i)
+ /*
+ * If the data is present in current block lineInfo. line_size
+ * will be updated. If the data is spread across the blocks either
Somehow a space has been put between "lineinfo." and "line_size".
It should be: "If the data is present in current block
lineInfo.line_size will be updated"
Fixed, changed it to lineinfo->line_size.
(ii)
This is not possible because of pg_atomic_compare_exchange_u32, this
will succeed only for one of the workers whose line_state is
LINE_LEADER_POPULATED, for other workers it will fail. This is
explained in detail above ParallelCopyLineBoundary.
Yes, but prior to that call to pg_atomic_compare_exchange_u32(),
aren't you separately reading line_state and line_size, so that
between those reads, it may have transitioned from leader to another
worker, such that the read line state ("cur_line_state", being checked
in the if block) may not actually match what is now in the line_state
and/or the read line_size ("dataSize") doesn't actually correspond to
the read line state?
(sorry, still not 100% convinced that the synchronization and checks
are safe in all cases)
I think you are describing the problem that could happen in the
following case: when we read curr_line_state, the value was
LINE_WORKER_PROCESSED or LINE_WORKER_PROCESSING. Then, if the leader
is very fast compared to the workers, the leader quickly populates one
line and sets the state to LINE_LEADER_POPULATED, i.e. the state
changes to LINE_LEADER_POPULATED while we are checking
curr_line_state.
I feel this will not be a problem because the leader populates a line
and then waits until some ring element is available before populating
the next one. In the meantime the worker has seen that the state is
LINE_WORKER_PROCESSED or LINE_WORKER_PROCESSING (the previous state it
read), so it concludes that this chunk was processed by some other
worker; it will move on and try to get the next available chunk &
insert those records. It keeps going until it gets the next chunk to
process. Eventually one of the workers will get this chunk and process
it.
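To illustrate, here is a minimal sketch of the claim step (not the exact
patch code; the names follow the ParallelCopyLineBoundary structure, and
the atomics are the standard port/atomics.h primitives):

uint32 line_state = LINE_LEADER_POPULATED;
ParallelCopyLineBoundary *lineInfo = &pcshared_info->line_boundaries.ring[pos];

/* Only one worker's compare-exchange can succeed, because it requires
 * the entry to still be in LINE_LEADER_POPULATED state. */
if (pg_atomic_compare_exchange_u32(&lineInfo->line_state, &line_state,
                                   LINE_WORKER_PROCESSING))
{
    /* This worker owns the line; a stale earlier read is harmless since
     * line_size is read again only after the claim succeeds. */
    uint32 dataSize = pg_atomic_read_u32(&lineInfo->line_size);
    /* ... process dataSize bytes starting at first_block/start_offset ... */
}
else
{
    /* Lost the race (or the leader is repopulating); move to the next
     * entry. */
}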
(3) v3-0006-Parallel-Copy-For-Binary-Format-Files.patch
raw_buf is not used in parallel copy; instead, raw_buf will be pointing
to shared memory data blocks. This memory was allocated as part of
BeginCopyFrom; until this point we cannot be 100% sure, as the copy can
still be performed sequentially (for example, when max_worker_processes
is not available); if it switches to sequential mode, raw_buf will be
used while performing the copy operation. At this place we can safely
free this memory that was allocated.
So the following code (which checks raw_buf, which still points to
memory that has been pfreed) is still valid?
In the SetRawBufForLoad() function, which is called by CopyReadLineText():
cur_data_blk_ptr = (cstate->raw_buf) ?
&pcshared_info->data_blocks[cur_block_pos] : NULL;
The above code looks a bit dicey to me. I stepped over that line in
the debugger when I debugged an instance of Parallel Copy, so it
definitely gets executed.
It makes me wonder what other code could possibly be checking raw_buf
and using it in some way, when in fact what it points to has been
pfreed.
Are you able to add the following line of code, or will it (somehow)
break logic that you are relying on?
pfree(cstate->raw_buf);
cstate->raw_buf = NULL; <=== I suggest that this line is added
You are right; I have debugged & verified that it sets it to an invalid
block, which is not expected. There are chances this would have caused
some corruption on some machines. The suggested fix is required, and I
have made it. I have moved this change to
0003-Allow-copy-from-command-to-process-data-from-file.patch, as
0006-Parallel-Copy-For-Binary-Format-Files is only for binary format
parallel copy & that change is common to parallel copy.
I have attached new set of patches with the fixes.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v4-0001-Copy-code-readjustment-to-support-parallel-copy.patch
From 896cb1f96f0a2215a097fffce94b50b2872b2c81 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:52:48 +0530
Subject: [PATCH v4 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated into functions/macros; these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText; this change was required because, in the
case of parallel copy, record identification and the record update are done in
CopyReadLineText, and the newline characters should be removed before the
record information is updated in shared memory.
---
src/backend/commands/copy.c | 361 ++++++++++++++++++++++++++------------------
1 file changed, 218 insertions(+), 143 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a..436e458 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -224,7 +227,6 @@ typedef struct CopyStateData
* appropriate amounts of data from this buffer. In both modes, we
* guarantee that there is a \0 at raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -354,6 +356,27 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * RETURNPROCESSED - Get the lines processed.
+ */
+#define RETURNPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -401,7 +424,11 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -801,14 +828,18 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes = RAW_BUF_BYTES(cstate);
int inbytes;
+ int minread = 1;
/* Copy down the unprocessed data if any. */
if (nbytes > 0)
memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
nbytes);
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1514,7 +1545,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1680,6 +1710,24 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateCommonCstateInfo(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateCommonCstateInfo - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
+ List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1799,12 +1847,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2696,32 +2738,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2758,27 +2779,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2816,9 +2816,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3311,7 +3363,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3366,30 +3418,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ RETURNPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCstateCatalogInfo - populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCstateCatalogInfo(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3399,38 +3436,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf and
- * raw_buf are used in both text and binary modes, but we use line_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3508,6 +3515,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCstateCatalogInfo(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3917,45 +3979,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData - Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker
+ * to line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
+ {
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
+ }
+}
+
+/*
+ * ConvertToServerEncoding - convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
char *cvt;
-
cvt = pg_any_to_server(cstate->line_buf.data,
cstate->line_buf.len,
cstate->file_encoding);
@@ -3967,11 +4044,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4334,6 +4408,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE();
return result;
}
--
1.8.3.1
v4-0002-Framework-for-leader-worker-in-parallel-copy.patch
From eb97b28ecaacfbd248d5662f15bd5e331821a03a Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:56:39 +0530
Subject: [PATCH v4 2/6] Framework for leader/worker in parallel copy
This patch has the framework for the data structures in parallel copy: leader
initialization, worker initialization, shared memory updates, starting workers,
waiting for workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 743 +++++++++++++++++++++++++++++++++-
src/include/commands/copy.h | 2 +
src/tools/pgindent/typedefs.list | 7 +
4 files changed, 752 insertions(+), 4 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 436e458..704bbd3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,181 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The structures sized by DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT store
+ * the records read from the file that need to be inserted into the relation.
+ * These values allow handing over multiple records with a significant amount
+ * of data to each worker, to minimize context switches & to distribute the
+ * work fairly among the workers. These values showed the best results in the
+ * performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* It can hold 1000 blocks of 64K data in DSM to be processed by the worker. */
+#define MAX_BLOCKS_COUNT 1000
+
+/* It can hold information for up to 10000 records for the workers to process.
+ * RINGSIZE should be a multiple of WORKER_CHUNK_COUNT, as wrap-around cases
+ * are currently not handled when the worker selects its WORKER_CHUNK_COUNT
+ * records. */
+#define RINGSIZE (10 * 1000)
+
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT records from the DSM data
+ * blocks to process, to avoid lock contention. Read the RINGSIZE comments
+ * before changing this value.
+ */
+#define WORKER_CHUNK_COUNT 50
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ * ParallelCopyDataBlocks will be created in DSM. Data read from the file will
+ * be copied into these DSM data blocks. The leader process identifies the
+ * records, and the record information is shared with the workers. The workers
+ * then insert the records into the table. There can be one or more records in
+ * each data block, depending on the record size.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block, following_block
+ * will have the position from which the remaining data needs to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag will be set, when the leader finds out this block can be read
+ * safely by the worker. This helps the worker to start processing the line
+ * early where the line will be spread across many blocks and the worker
+ * need not wait for the complete line to be processed.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+}ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is a common data structure between the leader &
+ * workers. The leader process populates the data block, the data block offset
+ * & the size of the record in DSM for the workers to copy the data into the
+ * relation.
+ * This is protected by the following sequence in the leader & worker. If they
+ * don't follow this order, the worker might process a wrong line_size, and
+ * the leader might overwrite information which a worker has not yet processed
+ * or is in the process of processing.
+ * Leader should operate in the following order:
+ * 1) check if line_size is -1, if not wait, it means worker is still
+ * processing.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * Worker should operate in the following order:
+ * 1) check line_state is LINE_LEADER_POPULATED, if not it means leader is still
+ * populating the data.
+ * 2) read line_size.
+ * 3) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32, worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled
+ * completely, 0 means an empty line, >0 means the line is filled with
+ * line_size bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+}ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by the workers; this will not be the same as
+ * total_worker_processed if a where condition is specified along with
+ * copy. This is the actual number of records inserted into the relation.
+ */
+ pg_atomic_uint64 processed;
+
+ /*
+ * The number of records currently processed by the workers; this also
+ * includes the number of records that were filtered out by the where
+ * clause.
+ */
+ pg_atomic_uint64 total_worker_processed;
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+}ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -230,10 +402,38 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
/* Shorthand for number of unconsumed bytes available in raw_buf */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
} CopyStateData;
+/*
+ * This structure stores the common data from CopyStateData that is required
+ * by the workers. This information is allocated and stored in the DSM, for
+ * the workers to retrieve and copy into their CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+}SerializedParallelCopyState;
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -263,6 +463,22 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_DELIM 4
+#define PARALLEL_COPY_KEY_QUOTE 5
+#define PARALLEL_COPY_KEY_ESCAPE 6
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 7
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 8
+#define PARALLEL_COPY_KEY_NULL_LIST 9
+#define PARALLEL_COPY_KEY_CONVERT_LIST 10
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 11
+#define PARALLEL_COPY_KEY_RANGE_TABLE 12
+#define PARALLEL_COPY_WAL_USAGE 13
+#define PARALLEL_COPY_BUFFER_USAGE 14
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -424,11 +640,477 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
- List *attnamelist);
+ List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
+
+/*
+ * SerializeParallelCopyState - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RestoreString - Retrieve the string from shared memory.
+ */
+static void
+RestoreString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *)shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * SerializeString - Insert a string into shared memory.
+ */
+static void
+SerializeString(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *)shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * PopulateParallelCopyShmInfo - set ParallelCopyShmInfo.
+ */
+static void
+PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
+ FullTransactionId full_transaction_id)
+{
+ uint32 count;
+
+ MemSet(shared_info_ptr, 0, sizeof(ParallelCopyShmInfo));
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+}
+
+/*
+ * BeginParallelCopy - start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ SerializedParallelCopyState *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_cstateshared;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ int parallel_workers = 0;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+ ParallelCopyData *pcdata;
+ MemoryContext oldcontext;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ MemoryContextSwitchTo(oldcontext);
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ if (attnamelist != NIL)
+ {
+ attnameListStr = nodeToString(attnamelist);
+ EstimateLineKeysStr(pcxt, attnameListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ if (cstate->force_notnull != NIL)
+ {
+ notnullListStr = nodeToString(cstate->force_notnull);
+ EstimateLineKeysStr(pcxt, notnullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ if (cstate->force_null != NIL)
+ {
+ nullListStr = nodeToString(cstate->force_null);
+ EstimateLineKeysStr(pcxt, nullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ if (cstate->convert_select != NIL)
+ {
+ convertListStr = nodeToString(cstate->convert_select);
+ EstimateLineKeysStr(pcxt, convertListStr);
+ }
+
+ /*
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_COPY_WAL_USAGE
+ * and PARALLEL_COPY_BUFFER_USAGE.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr, full_transaction_id);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_allocate(pcxt->toc, est_cstateshared);
+
+ /* copy cstate variables. */
+ SerializeParallelCopyState(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST, attnameListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST, notnullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_LIST, nullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST, convertListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, whereClauseStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_WAL_USAGE, walusage);
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_BUFFER_USAGE, bufferusage);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - end the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * InitializeParallelCopyInfo - Initialize parallel worker.
+ */
+static void
+InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - parallel copy worker's code.
+ *
+ * Handles the where clause, converts the tuple to columns, adds default/null
+ * values for the columns that are not present in that record, finds the
+ * partition if it is a partitioned table, invokes before row insert triggers,
+ * handles constraints and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ SerializedParallelCopyState *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+ pcshared_info = (ParallelCopyShmInfo *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO, false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
+ cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RestoreString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RestoreString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+ RestoreString(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attnameListStr);
+ if (attnameListStr)
+ attlist = (List *)stringToNode(attnameListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &notnullListStr);
+ if (notnullListStr)
+ cstate->force_notnull = (List *)stringToNode(notnullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NULL_LIST, &nullListStr);
+ if (nullListStr)
+ cstate->force_null = (List *)stringToNode(nullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &convertListStr);
+ if (convertListStr)
+ cstate->convert_select = (List *)stringToNode(convertListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RestoreString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ InitializeParallelCopyInfo(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
+
+ MemoryContextSwitchTo(oldcontext);
+ pfree(cstate);
+ return;
+}
+
+/*
+ * ParallelCopyFrom - parallel copy leader's functionality.
+ *
+ * The leader executes the before-statement triggers, if any are present. It
+ * will read the table data from the file and copy the contents to DSM data
+ * blocks. It will then read the input contents from the DSM data blocks and
+ * identify the records based on line breaks. This information, called a line
+ * or a record, needs to be inserted into a relation. The line information
+ * will be stored in the ParallelCopyLineBoundary DSM data structure. Workers
+ * will then process this information and insert the data into the table. The
+ * leader repeats this process until all data is read from the file and all
+ * the DSM data blocks are processed. While processing, if the leader finds
+ * that the DSM data blocks or the DSM ParallelCopyLineBoundary data
+ * structures are full, it waits till the workers free up some entries and
+ * then repeats the process. It waits till all the populated lines are
+ * processed by the workers and then exits.
+ */
+static void
+ParallelCopyFrom(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1141,6 +1823,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1150,7 +1833,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyFrom(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1199,6 +1899,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1367,6 +2068,39 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ int val;
+ bool parsed;
+ char *strval;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option is supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ strval = defGetString(defel);
+ parsed = parse_int(strval, &val, 0, NULL);
+ if (!parsed)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for integer option \"%s\": %s",
+ defel->defname, strval)));
+ if (val < 1 || val > 1024)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("value %s out of bounds for option \"%s\"",
+ strval, defel->defname),
+ errdetail("Valid values are between \"%d\" and \"%d\".",
+ 1, 1024)));
+ cstate->nworkers = val;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1720,11 +2454,12 @@ BeginCopy(ParseState *pstate,
/*
* PopulateCommonCstateInfo - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * from operation. This is a helper function for the BeginCopy and
+ * InitializeParallelCopyInfo functions.
*/
static void
PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc,
- List *attnamelist)
+ List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..82843c6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,5 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3d99046..a0e4ac7 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1700,6 +1700,12 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2216,6 +2222,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
Attachment: v4-0003-Allow-copy-from-command-to-process-data-from-file.patch (text/x-patch)
From ba0bcdcf758d1fb779921b2ab615170a53ecdc4b Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:31:42 +0530
Subject: [PATCH v4 3/6] Allow copy from command to process data from
file/STDIN contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file/STDIN to a table. It adds a PARALLEL option to the COPY
FROM command, with which the user can specify the number of workers to be
used to perform the COPY FROM command.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching
at most n workers, as specified with the PARALLEL 'n' option in the "COPY
FROM" query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then
executes before statement triggers, if any exist. The leader populates the
DSM lines, which include the start offset and line size; while populating
the lines it reads as many blocks as required from the file into the DSM
data blocks. Each block is 64K in size. The leader parses the data to
identify a line; the existing line-identification logic from
CopyReadLineText, with some changes, was used for this. The leader checks
whether a free line is available to copy the information; if there is no
free line, it waits till the required line is freed up by a worker and then
copies the identified line's information (offset & line size) into the DSM
lines. This process is repeated till the complete file is processed.
Simultaneously, the workers cache the lines (50 at a time) in local memory
and release the lines to the leader for further populating. Each worker
processes the lines it has cached and inserts them into the table. The
leader does not participate in the insertion of data; the leader's only
responsibility is to identify the lines as fast as possible for the workers
to do the actual copy operation. The leader waits till all the populated
lines are processed by the workers and exits.
We have chosen this design based on the reasoning "that everything stalls
if the leader doesn't accept further input data, as well as when there are
no available split chunks, so it doesn't seem like a good idea to have the
leader do other work. This is backed by the performance data where we have
seen that with 1 worker there is just a 5-10% performance difference".
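
[Editorial aside, not part of the patch: to make the leader/worker handoff above concrete, here is a condensed, illustrative sketch of how a worker claims the next line from the shared ring. The atomics API and the names ParallelCopyShmInfo, ParallelCopyLineBoundary, RINGSIZE, and the LINE_* states follow the patch; the helper name ClaimNextLine is invented, and the WORKER_CHUNK_COUNT stride, empty-line handling, and termination checks are omitted.]

/*
 * Illustrative sketch only: a worker claiming the next line from the shared
 * ring of ParallelCopyLineBoundary entries, condensed from the patch's
 * GetLinePosition().
 */
static uint32
ClaimNextLine(ParallelCopyShmInfo *pcshared_info, uint32 prev_pos)
{
	uint32 pos = (prev_pos + 1) % RINGSIZE;

	for (;;)
	{
		ParallelCopyLineBoundary *line = &pcshared_info->line_boundaries.ring[pos];
		uint32 state = pg_atomic_read_u32(&line->line_state);
		uint32 expected = LINE_LEADER_POPULATED;

		CHECK_FOR_INTERRUPTS();

		if (state == LINE_WORKER_PROCESSING || state == LINE_WORKER_PROCESSED)
		{
			/* Another worker owns this slot; try the next one. */
			pos = (pos + 1) % RINGSIZE;
			continue;
		}

		/* Atomically flip POPULATED -> PROCESSING so no other worker takes it. */
		if (pg_atomic_compare_exchange_u32(&line->line_state, &expected,
										   LINE_WORKER_PROCESSING))
			return pos;			/* this worker now owns the line */

		pg_usleep(1000L);		/* leader still populating; wait and retry */
	}
}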
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 13 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 896 ++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 2 +-
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 902 insertions(+), 54 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In the parallel copy case the
+ * command id has already been set by calling AssignCommandIdForWorker,
+ * so call GetCurrentCommandId with used = false to fetch the
+ * currentCommandId; marking it used was taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 8eb276e..a5fa9f5 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,19 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * Parallel operations are required to be strictly read-only in a parallel
- * worker. Parallel inserts are not safe even in the leader in the
- * general case, because group locking means that heavyweight locks for
- * relation extension or GIN page locks will not conflict between members
- * of a lock group, but we don't prohibit that case here because there are
- * useful special cases that we can safely allow, such as CREATE TABLE AS.
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afce..e983f78 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -517,6 +517,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, transaction id of leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, command id of leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 704bbd3..868ba4a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
/*
@@ -124,6 +139,7 @@ typedef enum CopyInsertMethod
#define IsParallelCopy() (cstate->is_parallel)
#define IsLeader() (cstate->pcdata->is_leader)
+#define IsWorker() (IsParallelCopy() && !IsLeader())
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
@@ -557,9 +573,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -572,26 +592,65 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
/*
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
if (!result && !IsHeaderLine()) \
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
- cstate->line_buf.len, \
- &cstate->line_buf.len) \
+{ \
+ if (IsParallelCopy()) \
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, \
+ raw_buf_ptr, &line_size); \
+ else \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len); \
+} \
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* RETURNPROCESSED - Get the lines processed.
*/
#define RETURNPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
+/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -646,7 +705,10 @@ static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
-
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
+static void PopulateCstateCatalogInfo(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -728,6 +790,137 @@ PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
}
/*
+ * IsTriggerFunctionParallelSafe - Check if the trigger functions are parallel
+ * safe for all the triggers. Return false if any one of the triggers has a
+ * parallel-unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - determine the parallel safety of the volatile
+ * expressions in the default clauses of column definitions or in the WHERE
+ * clause, and return true if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *)cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If so,
+ * and it is not parallel safe, then parallelism is not allowed. For
+ * instance, if there are any serial/bigserial columns whose associated
+ * nextval() default expression is parallel unsafe, parallelism should
+ * not be allowed. In non-parallel copy, volatile functions are not
+ * checked for nextval().
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - determine insert mode single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -755,6 +948,7 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
ParallelCopyData *pcdata;
MemoryContext oldcontext;
+ CheckTargetRelValidity(cstate);
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -766,6 +960,15 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
MemoryContextSwitchTo(oldcontext);
cstate->pcdata = pcdata;
+ /*
+ * The user has chosen parallel copy. Determine whether parallel copy is
+ * actually allowed; if not, go with non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ return NULL;
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -974,9 +1177,214 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCstateCatalogInfo(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **)palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+
+ /*
+ * The loop embedded at the bottom of the current loop may have exited
+ * because data_blk_ptr->curr_blk_completed was set, in which case the
+ * dataSize read there might be stale. If curr_blk_completed is set and
+ * the line is complete, line_size will have been set as well, so read
+ * line_size again to be sure whether the line is complete or still a
+ * partial block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If complete data is present in current block use
+ * dataSize - copiedSize, or copy the whole block from
+ * current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset. For the first copy, copy from the offset. For
+ * the subsequent copy the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is present in current block lineInfo->line_size
+ * will be updated. If the data is spread across the blocks either
+ * of lineInfo->line_size or data_blk_ptr->curr_blk_completed can
+ * be updated. lineInfo->line_size will be updated if the complete
+ * read is finished. data_blk_ptr->curr_blk_completed will be
+ * updated if processing of current block is finished and data
+ * processing is not finished.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
}
/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+ }
+
+/*
* ParallelCopyMain - parallel copy worker's code.
*
* Where clause handling, convert tuple to columns, add default null values for
@@ -1025,6 +1433,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (SerializedParallelCopyState *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
cstate->null_print = (char *)shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
@@ -1085,6 +1495,33 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
}
/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
+/*
* ParallelCopyFrom - parallel copy leader's functionality.
*
 * The leader executes the BEFORE STATEMENT trigger, if one is present. It
@@ -1107,8 +1544,302 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* raw_buf is not used in parallel copy; data blocks are used instead. */
+ pfree(cstate->raw_buf);
+ cstate->raw_buf = NULL;
+
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
+ }
+
+/*
+ * GetLinePosition - return the line position that worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data, don't check current block, current
+ * block will have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If the new_line_size > raw_buf_ptr, then the new block has only
+ * new line char content. The unprocessed count should not be
+ * increased in this case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* Only a newline char is present; an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
}
/*
@@ -3571,7 +4302,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3581,7 +4313,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In the parallel
+ * copy case, this check is done by the leader, so that if any invalid
+ * case exists the COPY FROM command errors out in the leader itself,
+ * avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3621,7 +4360,8 @@ CopyFrom(CopyState cstate)
target_resultRelInfo = resultRelInfo;
/* Verify the named relation is a valid target for INSERT */
- CheckValidResultRel(resultRelInfo, CMD_INSERT);
+ if (!IsParallelCopy())
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
ExecOpenIndices(resultRelInfo, false);
@@ -3770,13 +4510,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3876,6 +4619,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or have
+ * BEFORE/INSTEAD OF triggers, we can't perform copy operations with
+ * parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4292,7 +5045,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
if (!cstate->binary)
{
@@ -4441,26 +5194,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4708,9 +5470,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through. On subsequent
+ * iterations, reset the index and reuse the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4765,7 +5549,7 @@ static void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding && (!IsParallelCopy() || IsWorker()))
{
char *cvt;
cvt = pg_any_to_server(cstate->line_buf.data,
@@ -4804,6 +5588,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4858,6 +5647,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5082,9 +5873,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5136,6 +5933,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line information here; this cannot
+ * be done at the beginning, as there is a possibility that the file
+ * contains empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5144,6 +5961,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE();
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index f41785f..81cc1f4 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2353,7 +2353,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM|TO filename WITH ( */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
- "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
+ "HEADER", "PARALLEL", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index df1b43a..71a6c9b 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -385,6 +385,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a0e4ac7..5e5c534 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1705,6 +1705,7 @@ ParallelCopyLineBoundary
ParallelCopyData
ParallelCopyDataBlock
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
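
[Editorial aside, not part of the patch: one subtlety in the patch above is how CacheLineInfo reassembles a line that spans DSM data blocks — the first chunk runs from the line's offset to the usable end of its first block, and each following chunk is a whole block or whatever remains of the line. The self-contained sketch below reproduces that arithmetic; DATA_BLOCK_SIZE = 64K matches the patch, while the concrete line layout (line_size, offset, skip_bytes) is invented for illustration.]

#include <stdio.h>

#define DATA_BLOCK_SIZE 65536	/* 64K, matching the patch */

int
main(void)
{
	/* Hypothetical layout, invented for this example. */
	unsigned int line_size = 100000;	/* total length of one line */
	unsigned int offset = 60000;		/* line starts here in its first block */
	unsigned int skip_bytes = 0;		/* unused block tail the leader left behind */
	unsigned int copied = 0;
	unsigned int chunk;
	int blkno = 0;

	/* First chunk: from the offset to the usable end of the first block. */
	chunk = (DATA_BLOCK_SIZE - skip_bytes) - offset;
	copied += chunk;
	printf("block %d: %u bytes\n", blkno, chunk);

	/* Following chunks: a whole block, or whatever remains of the line. */
	while (copied < line_size)
	{
		blkno++;
		chunk = line_size - copied;
		if (chunk > DATA_BLOCK_SIZE - skip_bytes)
			chunk = DATA_BLOCK_SIZE - skip_bytes;
		copied += chunk;
		printf("block %d: %u bytes\n", blkno, chunk);
	}

	return 0;	/* prints 5536, 65536, and 28928 bytes for blocks 0..2 */
}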
Attachment: v4-0004-Documentation-for-parallel-copy.patch (text/x-patch)
From 65bb658d94e36d3d0f44a19ffaaf60315988c548 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:00:36 +0530
Subject: [PATCH v4 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..2e023ed 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,22 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note
+ that the number of parallel workers specified in <replaceable
+ class="parameter">integer</replaceable> is not guaranteed to be used
+ during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all (for example,
+ due to the setting of <varname>max_worker_processes</varname>). This
+ option is allowed only with <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: v4-0005-Tests-for-parallel-copy.patch (text/x-patch)
From ff4f2e8ba3ebf9a6ee25b40450c41c8175ec3e7e Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:19:39 +0530
Subject: [PATCH v4 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 205 ++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 +++++++++++++++++++++++++++++++++++-
4 files changed, 429 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..7ae5d44 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,125 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: value 0 out of bounds for option "parallel"
+DETAIL: Valid values are between "1" and "1024".
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..7015698 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: v4-0006-Parallel-Copy-For-Binary-Format-Files.patch (text/x-patch)
From ccbd0cc315e2733a104dedb984ad5003f74c97e5 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 18 Aug 2020 16:17:14 +0530
Subject: [PATCH v4 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks, each 64K in
size. It also identifies each tuple's data block id, start offset, end
offset and tuple size, and updates this information in the ring data
structure. Workers read the tuple information from the ring data
structure and the actual tuple data from the data blocks in parallel,
and insert the tuples into the table in parallel.
---
src/backend/commands/copy.c | 684 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 599 insertions(+), 85 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 868ba4a..f63dc49 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -264,6 +264,17 @@ typedef struct ParallelCopyLineBuf
}ParallelCopyLineBuf;
/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
+/*
* Parallel copy data information.
*/
typedef struct ParallelCopyData
@@ -284,6 +295,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
}ParallelCopyData;
/*
@@ -448,6 +462,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
}SerializedParallelCopyState;
/* DestReceiver for COPY (query) TO */
@@ -522,7 +537,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -650,11 +664,113 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyReadBinaryData(cstate, &dummy, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data spread is across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data spread
+ * is across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * field size can spread across multiple data blocks, \
+ * calculate the number of required data blocks and try to get \
+ * those many data blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * check if we need the data block for the field data \
+ * bytes that are not modulus of data block size. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
+
/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
-
/* non-export function prototypes */
static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
RawStmt *raw_query, Oid queryRelId, List *attnamelist,
@@ -709,6 +825,14 @@ static void ExecBeforeStmtTrigger(CopyState cstate);
static void CheckTargetRelValidity(CopyState cstate);
static void PopulateCstateCatalogInfo(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static bool CopyReadBinaryTupleLeader(CopyState cstate);
+static void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+static bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
+
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
*/
@@ -726,6 +850,7 @@ SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -885,8 +1010,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1156,6 +1281,7 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
@@ -1551,32 +1677,66 @@ ParallelCopyFrom(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files.
+ * For the parallel copy leader, fill in the error
+ * context information here so that, if any failure
+ * occurs while determining tuple offsets, the leader
+ * throws errors with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1584,7 +1744,355 @@ ParallelCopyFrom(CopyState cstate)
}
/*
- * GetLinePosition - return the line position that worker should process.
+ * CopyReadBinaryGetDataBlock - gets a new block, updates
+ * the current offset, calculates the skip bytes.
+ */
+static void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to %u block",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if(field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader - leader reads data from binary formatted file
+ * to data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data blocks data.
+ */
+static bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One possibility
+ * is that the binary file has just a valid signature
+ * but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ start_block_pos,
+ start_offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ start_block_pos, start_offset, line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize - leader identifies boundaries/
+ * offsets for each attribute/column and finally results in the
+ * tuple/row size. It moves on to next data block if the attribute/
+ * column is spread across data blocks.
+ */
+static void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in he same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while(i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block.
+ * This code exists for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - leader identifies boundaries/offsets
+ * for each attribute/column, it moves on to next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i>0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * The bytes yet to be copied into the attribute buffer may exceed
+ * an entire data block; in that case, copy only one data block's
+ * worth in this iteration.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
+ * GetLinePosition - return the line position that worker should process.
*/
static uint32
GetLinePosition(CopyState cstate)
@@ -1672,7 +2180,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -2188,10 +2698,26 @@ CopyGetInt32(CopyState cstate, int32 *val)
{
uint32 buf;
- if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ /*
+ * For parallel copy, avoid reading data into raw_buf; read directly
+ * from the file. The data will later be read into the parallel copy
+ * data buffers.
+ */
+ if (cstate->nworkers > 0)
{
- *val = 0; /* suppress compiler warning */
- return false;
+ if (CopyGetData(cstate, &buf, sizeof(buf), sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
+ }
+ else
+ {
+ if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
}
*val = (int32) pg_ntoh32(buf);
return true;
@@ -2580,7 +3106,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
EndParallelCopy(pcxt);
}
else
+ {
+ /*
+ * Reset nworkers to -1 here. This is needed when the user specifies
+ * parallel workers but no worker could be picked up, so we go back
+ * to the non-parallel mode value of nworkers.
+ */
+ cstate->nworkers = -1;
*processed = CopyFrom(cstate); /* copy from file to database */
+ }
EndCopyFrom(cstate);
}
@@ -5045,7 +5579,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
if (!cstate->binary)
{
@@ -5125,7 +5659,7 @@ BeginCopyFrom(ParseState *pstate,
int32 tmp;
/* Signature */
- if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+ if (CopyGetData(cstate, readSig, 11, 11) != 11 ||
memcmp(readSig, BinarySignature, 11) != 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -5153,7 +5687,7 @@ BeginCopyFrom(ParseState *pstate,
/* Skip extension header, if present */
while (tmp-- > 0)
{
- if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+ if (CopyGetData(cstate, readSig, 1, 1) != 1)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("invalid COPY file header (wrong length)")));
@@ -5350,60 +5884,45 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
-
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyReadBinaryData(cstate, &dummy, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -6403,18 +6922,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -6422,9 +6938,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
--
1.8.3.1
I have attached a new set of patches with the fixes.
Thoughts?
Hi Vignesh,
I don't really have any further comments on the code, but would like
to share the results of some Parallel Copy performance tests I ran
(attached).
The tests loaded a 5GB CSV data file into a 100-column table (of
different data types). The following were varied as part of the test:
- Number of workers (1 – 10)
- No indexes / 4 indexes
- Default settings / increased resources (shared_buffers, work_mem, etc.)
(I did not do any partition-related tests as I believe those types of
tests were previously performed.)
I built Postgres (latest OSS code) with the latest Parallel Copy patches (v4).
The test system was a 32-core Intel Xeon E5-4650 server with 378GB of RAM.
I observed the following trends:
- For the data file size used, Parallel Copy achieved best performance
using about 9 – 10 workers. Larger data files may benefit from using
more workers. However, I couldn’t really see any better performance,
for example, from using 16 workers on a 10GB CSV data file compared to
using 8 workers. Results may also vary depending on machine
characteristics.
- Parallel Copy with 1 worker ran slower than normal Copy in a couple
of cases (I did question if allowing 1 worker was useful in my patch
review).
- Typical load time improvement (load factor) for Parallel Copy was
between 2x and 3x. Better load factors can be obtained by using larger
data files and/or more indexes.
- Increasing Postgres resources made little or no difference to
Parallel Copy performance when the target table had no indexes.
Increasing Postgres resources improved Parallel Copy performance when
the target table had indexes.
Regards,
Greg Nancarrow
Fujitsu Australia
On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
- Parallel Copy with 1 worker ran slower than normal Copy in a couple
of cases (I did question if allowing 1 worker was useful in my patch
review).
I think the reason is that in the 1 worker case there is not much
parallelization, as the leader doesn't perform the actual load work.
Vignesh, can you please see if the results are reproducible at your
end? If so, we can compare the perf profiles to see why some cases
show an improvement and others don't. Based on that we can decide
whether to allow the 1 worker case or not.
- Typical load time improvement (load factor) for Parallel Copy was
between 2x and 3x. Better load factors can be obtained by using larger
data files and/or more indexes.
Nice improvement, and I think you are right that with larger data
loads we will get an even better improvement.
--
With Regards,
Amit Kapila.
On Thu, Aug 27, 2020 at 8:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
Vignesh, can you please see if the results are reproducible at your
end? If so, we can compare the perf profiles to see why some cases
show an improvement and others don't. Based on that we can decide
whether to allow the 1 worker case or not.
I will spend some time on this and update.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 27, 2020 at 4:56 PM vignesh C <vignesh21@gmail.com> wrote:
I will spend some time on this and update.
Thanks.
--
With Regards,
Amit Kapila.
On Thu, Aug 27, 2020 at 8:04 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
- Parallel Copy with 1 worker ran slower than normal Copy in a couple
of cases (I did question if allowing 1 worker was useful in my patch
review).
Thanks Greg for your review & testing.
I executed various tests with 1GB, 2GB & 5GB files and 100 columns,
without parallel mode & with 1 parallel worker. The test results are
given below:
Test                               Without parallel mode   With 1 parallel worker
=================================================================================
1GB csv file, 100 columns
(100 bytes data in each column)    62 seconds              47 seconds (1.32X)
1GB csv file, 100 columns
(1000 bytes data in each column)   89 seconds              78 seconds (1.14X)
2GB csv file, 100 columns
(1 byte data in each column)       277 seconds             256 seconds (1.08X)
5GB csv file, 100 columns
(100 byte data in each column)     515 seconds             445 seconds (1.16X)
I have run the tests multiple times and noticed similar execution
times in all the runs. In the above results there is a slight
improvement with 1 worker. In my tests I did not observe any
degradation for copy with 1 worker compared to non-parallel copy. Can
you share the script you used to generate the data & the DDL of the
table, so that I can check the scenario where you faced the problem?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Hi Vignesh,
Can you share the script you used to generate the data & the DDL of
the table, so that I can check the scenario where you faced the
problem?
Unfortunately I can't directly share it (considered company IP),
though having said that it's only doing something that is relatively
simple and unremarkable, so I'd expect it to be much like what you are
currently doing. I can describe it in general.
The table being used contains 100 columns (as I pointed out earlier),
with the first column of "bigserial" type, and the others of different
types like "character varying(255)", "numeric", "date" and "time
without timezone". There's about 60 of the "character varying(255)"
overall, with the other types interspersed.
When testing with indexes, 4 b-tree indexes were used that each
included the first column and then distinctly 9 other columns.
A CSV record (row) template file was created with test data
(corresponding to the table), and that was simply copied and appended
over and over with a record prefix in order to create the test data
file.
The following shell script basically does it (but very slowly); I was
using a small C program that does the same, a lot faster.
With N set to the desired record count (in my case, N=2550000 produced
about a 5GB CSV file):

file_out=data.csv; for i in $(seq 1 $N); do echo -n "$i," >> $file_out;
cat sample_record.csv >> $file_out; done
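
A minimal sketch of such a generator in C might look like the
following (the file names, argument handling and "read the whole
template into memory" approach are illustrative assumptions, not the
actual tool I used):

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	/* Usage: gencsv <nrecords>; file names below are illustrative. */
	long		nrecords = (argc > 1) ? atol(argv[1]) : 2550000;
	FILE	   *tf = fopen("sample_record.csv", "rb");
	FILE	   *of = fopen("data.csv", "wb");
	char	   *buf;
	long		len;
	long		i;

	if (tf == NULL || of == NULL)
	{
		perror("fopen");
		return 1;
	}

	/* Read the template record once, then replay it nrecords times. */
	fseek(tf, 0, SEEK_END);
	len = ftell(tf);
	rewind(tf);
	buf = malloc(len);
	if (buf == NULL || fread(buf, 1, len, tf) != (size_t) len)
	{
		perror("read template");
		return 1;
	}
	fclose(tf);

	for (i = 1; i <= nrecords; i++)
	{
		fprintf(of, "%ld,", i);		/* record prefix (serial column) */
		fwrite(buf, 1, len, of);	/* rest of the CSV record */
	}

	fclose(of);
	free(buf);
	return 0;
}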
One other thing I should mention is that between each test run, I
cleared the OS page cache, as described here:
https://linuxhint.com/clear_cache_linux/
That way, each COPY FROM is not taking advantage of any OS-cached data
from a previous COPY FROM.
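
On Linux, what that procedure boils down to, if scripted in C rather
than from the shell (assuming root privileges; the value 3 drops the
page cache plus dentries and inodes), is roughly:

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	FILE	   *f;

	sync();						/* write dirty pages back first */
	f = fopen("/proc/sys/vm/drop_caches", "w");
	if (f == NULL)
	{
		perror("drop_caches");
		return 1;
	}
	fputs("3\n", f);			/* 3 = page cache + dentries/inodes */
	fclose(f);
	return 0;
}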
If your data is somehow significantly different and you want to (and
can) share your script, then I can try it in my environment.
Regards,
Greg
On Tue, Sep 1, 2020 at 3:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
The table being used contains 100 columns (as I pointed out earlier),
with the first column of "bigserial" type, and the others of different
types like "character varying(255)", "numeric", "date" and "time
without timezone".
[...]
One other thing I should mention is that between each test run, I
cleared the OS page cache. That way, each COPY FROM is not taking
advantage of any OS-cached data from a previous COPY FROM.
I will try with a similar test and check if I can reproduce.
If your data is somehow significantly different and you want to (and
can) share your script, then I can try it in my environment.
I have attached the scripts that I used for the test results I
mentioned in my previous mail. The create.sql file has the table
definition I used, and insert_data_gen.txt has the insert data
generation script. I varied the count in insert_data_gen to generate
CSV files of 1GB, 2GB & 5GB, and varied the data to generate 1, 10 &
100 characters per column for the various tests. You can rename
insert_data_gen.txt to insert_data_gen.sh & generate the CSV file.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 2, 2020 at 3:40 PM vignesh C <vignesh21@gmail.com> wrote:
I have attached the scripts that I used for the test results I
mentioned in my previous mail.
Hi Vignesh,
I used your script and table definition, multiplying the number of
records to produce 5GB and 9.5GB CSV files.
I got the following results:
(1) Postgres default settings, 5GB CSV (530000 rows):
Copy Type       Duration (s)   Load factor
===============================================
Normal Copy     132.197        -
Parallel Copy
(#workers)
1               98.428         1.34
2               52.753         2.51
3               37.630         3.51
4               33.554         3.94
5               33.636         3.93
6               33.821         3.91
7               34.270         3.86
8               34.465         3.84
9               34.315         3.85
10              33.543         3.94
(2) Postgres increased resources, 5GB CSV (530000 rows):
shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size=2GB
Copy Type       Duration (s)   Load factor
===============================================
Normal Copy     131.835        -
Parallel Copy
(#workers)
1               98.301         1.34
2               53.261         2.48
3               37.868         3.48
4               34.224         3.85
5               33.831         3.90
6               34.229         3.85
7               34.512         3.82
8               34.303         3.84
9               34.690         3.80
10              34.479         3.82
(3) Postgres default settings, 9.5GB CSV (1000000 rows):
Copy Type       Duration (s)   Load factor
===============================================
Normal Copy     248.503        -
Parallel Copy
(#workers)
1               185.724        1.34
2               99.832         2.49
3               70.560         3.52
4               63.328         3.92
5               63.182         3.93
6               64.108         3.88
7               64.131         3.87
8               64.350         3.86
9               64.293         3.87
10              63.818         3.89
(4) Postgres increased resources, 9.5GB CSV (1000000 rows):
shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size=2GB
Copy Type       Duration (s)   Load factor
===============================================
Normal Copy     248.647        -
Parallel Copy
(#workers)
1               182.236        1.36
2               92.814         2.68
3               67.347         3.69
4               63.839         3.89
5               62.672         3.97
6               63.873         3.89
7               64.930         3.83
8               63.885         3.89
9               62.397         3.98
10              64.477         3.86
So as you found, with this particular table definition and data, 1
parallel worker always performs better than normal copy.
The different result obtained in this particular case (compared with
my earlier tests) seems to be caused by the following factors:
- different table definition (I used a variety of column types)
- amount of data per row (I used less data per row, so more rows per
same size data file)
As I previously observed, if the target table has no indexes,
increasing resources beyond the default settings makes little
difference to the performance.
Regards,
Greg Nancarrow
Fujitsu Australia
On Tue, Sep 1, 2020 at 3:39 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
The table being used contains 100 columns (as I pointed out earlier),
with the first column of "bigserial" type, and the others of different
types like "character varying(255)", "numeric", "date" and "time
without timezone".
Thanks Greg for executing & sharing the results.
I tried a test case similar to the one you suggested, but I was not
able to reproduce the degradation scenario.
If possible, can you run perf for the 1 worker & non-parallel
scenarios & share the perf results? By comparing the perf reports we
will be able to find out which functions are consuming more time.
Steps for running perf:
1) get the postgres pid
2) perf record -a -g -p <above pid>
3) Run copy command
4) Execute "perf report -g" once copy finishes.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Fri, Sep 11, 2020 at 3:49 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
I couldn't use the original machine from which I obtained the previous
results, but ended up using a 4-core CentOS7 VM, which showed a
similar pattern in the performance results for this test case.
I obtained the following results from loading a 2GB CSV file (1000000
rows, 4 indexes):

Copy Type       Duration (s)   Load factor
===============================================
Normal Copy     190.891        -
Parallel Copy
(#workers)
1               210.947        0.90
Hi Greg,
I tried to recreate the test case (attached) and I didn't find much
difference with the custom postgresql.conf file.
Test case: 250000 tuples, 4 indexes (composite indexes with 10
columns), 3.7GB, 100 columns (as suggested by you; all the
varchar(255) columns contain 255 characters), exec time in sec.
With custom postgresql.conf[1]Postgres configuration used for above testing: shared_buffers = 40GB max_worker_processes = 32 max_parallel_maintenance_workers = 24 max_parallel_workers = 32 synchronous_commit = off checkpoint_timeout = 1d max_wal_size = 24GB min_wal_size = 15GB autovacuum = off, removed and recreated the data
directory after every run(I couldn't perform the OS page cache flush
due to some reasons. So, chose this recreation of data dir way, for
testing purpose):
HEAD: 129.547, 128.624, 128.890
Patch: 0 workers - 130.213, 131.298, 130.555
Patch: 1 worker - 127.757, 125.560, 128.275
With default postgresql.conf, removed and recreated the data directory
after every run:
HEAD: 138.276, 150.472, 153.304
Patch: 0 workers - 162.468, 149.423, 159.137
Patch: 1 worker - 136.055, 144.250, 137.916
A few questions:
1. Was the run performed with default postgresql.conf file? If not,
what are the changed configurations?
2. Are the readings for normal copy (190.891 sec, mentioned by you
above) taken on HEAD or with patch, 0 workers? How much is the runtime
with your test case on HEAD (without patch) and 0 workers (with patch)?
3. Was the run performed on release build?
4. Were the readings taken on multiple runs(say 3 or 4 times)?
[1]: Postgres configuration used for the above testing:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
Hi Bharath,
On Tue, Sep 15, 2020 at 11:49 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
A few questions:
1. Was the run performed with the default postgresql.conf file? If not,
what are the changed configurations?
Yes, just default settings.
2. Are the readings for normal copy (190.891 sec, mentioned by you
above) taken on HEAD or with the patch, 0 workers?
With patch
How much is the runtime
with your test case on HEAD (without patch) and with 0 workers (with patch)?
TBH, I didn't test that. Looking at the changes, I wouldn't expect a
degradation of performance for normal copy (you have tested, right?).
3. Was the run performed on a release build?
For generating the perf data I sent (normal copy vs parallel copy with
1 worker), I used a debug build (-g -O0), as that is needed for
generating all the relevant perf data for Postgres code. Previously I
ran with a release build (-O2).
4. Were the readings taken over multiple runs (say 3 or 4 times)?
The readings I sent were from just one run (not averaged), but I did
run the tests several times to verify the readings were representative
of the pattern I was seeing.
Fortunately I have been given permission to share the exact table
definition and data I used, so you can check the behaviour and timings
on your own test machine.
Please see the attachment.
You can create the table using the table.sql and index_4.sql
definitions in the "sql" directory.
The data.csv file (to be loaded by COPY) can be created with the
included "dupdata" tool in the "input" directory, which you need to
build, then run, specifying a suitable number of records and path of
the template record (see README). Obviously the larger the number of
records, the larger the file ...
The table can then be loaded using COPY with "format csv" (and
"parallel N" if testing parallel copy).
Regards,
Greg Nancarrow
Fujitsu Australia
Attachments:
Hi Vignesh,
I've spent some time today looking at your new set of patches and I've
some thoughts and queries which I would like to put here:
Why are these not part of the shared cstate structure?
SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
I think in the refactoring patch we could replace all the cstate
variables that would be shared between the leader and workers with a
common structure which would be used even for a serial copy. Thoughts?
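Just to illustrate the idea (this is not code from the patch set; the
struct and field names below are invented for the example), such a
common structure might look like:

/*
 * Illustrative sketch only: a single structure holding the copy options
 * that both serial and parallel COPY would consume.  Field names are
 * assumptions for the example, not the patch's definitions.
 */
typedef struct CopyCommonState
{
    bool        csv_mode;       /* Comma Separated Value format? */
    bool        header_line;    /* CSV header line? */
    int         file_encoding;  /* file's character encoding */
    char       *delim;          /* column delimiter */
    char       *quote;          /* CSV quote character */
    char       *escape;         /* CSV escape character */
    char       *null_print;     /* string representing NULL */
} CopyCommonState;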
--
Have you tested your patch when encoding conversion is needed? If so,
could you please point out the email that has the test results.
--
Apart from above, I've noticed some cosmetic errors which I am sharing here:
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
This doesn't look to be properly aligned.
--
+ shared_info_ptr = (ParallelCopyShmInfo *)
shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr, full_transaction_id);
..
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState
*)shm_toc_allocate(pcxt->toc, est_cstateshared);
In the first case, while typecasting you've added a space between the
typename and the function but that is missing in the second case. I
think it would be good if you could make it consistent.
Same comment applies here as well:
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
...
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
There is no space between the closing brace and the structure name in
the first case but it is in the second one. So, again this doesn't
look consistent.
I could also find this type of inconsistency in comments. See below:
+/* It can hold upto 10000 record information for worker to process. RINGSIZE
+ * should be a multiple of WORKER_CHUNK_COUNT, as wrap around cases is currently
+ * not handled while selecting the WORKER_CHUNK_COUNT by the worker. */
+#define RINGSIZE (10 * 1000)
...
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process to avoid lock contention. Read RINGSIZE comments before
+ * changing this value.
+ */
+#define WORKER_CHUNK_COUNT 50
You may see these kinds of errors at other places as well if you scan
through your patch.
--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
On Wed, Aug 19, 2020 at 11:51 AM vignesh C <vignesh21@gmail.com> wrote:
Thanks Greg for reviewing the patch. Please find my thoughts for your comments.
On Mon, Aug 17, 2020 at 9:44 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
Some further comments:
(1) v3-0002-Framework-for-leader-worker-in-parallel-copy.patch
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process to avoid lock contention. This value should be divisible by
+ * RINGSIZE, as wrap around cases is currently not handled while selecting the
+ * WORKER_CHUNK_COUNT by the worker.
+ */
+#define WORKER_CHUNK_COUNT 50

"This value should be divisible by RINGSIZE" is not a correct
statement (since obviously 50 is not divisible by 10000).
It should say something like "This value should evenly divide into
RINGSIZE", or "RINGSIZE should be a multiple of WORKER_CHUNK_COUNT".

Fixed. Changed it to "RINGSIZE should be a multiple of WORKER_CHUNK_COUNT".
(2) v3-0003-Allow-copy-from-command-to-process-data-from-file.patch
(i)
+ /*
+ * If the data is present in current block lineInfo. line_size
+ * will be updated. If the data is spread across the blocks either

Somehow a space has been put between "lineInfo." and "line_size".
It should be: "If the data is present in current block
lineInfo.line_size will be updated"

Fixed, changed it to lineinfo->line_size.
(ii)
This is not possible because of pg_atomic_compare_exchange_u32; this
will succeed only for one of the workers whose line_state is
LINE_LEADER_POPULATED, for other workers it will fail. This is
explained in detail above ParallelCopyLineBoundary.

Yes, but prior to that call to pg_atomic_compare_exchange_u32(),
aren't you separately reading line_state and line_size, so that
between those reads, it may have transitioned from leader to another
worker, such that the read line state ("cur_line_state", being checked
in the if block) may not actually match what is now in the line_state
and/or the read line_size ("dataSize") doesn't actually correspond to
the read line state?
(sorry, still not 100% convinced that the synchronization and checks
are safe in all cases)

I think you are describing a problem that could happen in the
following case:
when we read cur_line_state, the value was LINE_WORKER_PROCESSED or
LINE_WORKER_PROCESSING. Then, in some cases, if the leader is very fast
compared to the workers, the leader quickly populates one line and
sets the state to LINE_LEADER_POPULATED, i.e. the state is changed to
LINE_LEADER_POPULATED while we are checking cur_line_state.
I feel this will not be a problem because the leader will populate a
line and then wait till some ring element is available to populate. In
the meantime the worker has seen that the state is
LINE_WORKER_PROCESSED or LINE_WORKER_PROCESSING (the previous state
that it read), so the worker has identified that this chunk was
processed by some other worker; the worker will move on and try to get
the next available chunk & insert those records. It will keep going
till it gets the next chunk to process. Eventually one of the workers
will get this chunk and process it.
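To make the intended invariant concrete, here is a minimal sketch of a
worker claiming one line entry (this is not the patch's actual code;
"entry" and ProcessLine() are hypothetical stand-ins), following the
ordering rules documented above ParallelCopyLineBoundary:

/*
 * Minimal sketch of a worker claiming one line entry; not the patch's
 * actual code.  "entry" points at one ParallelCopyLineBoundary in
 * shared memory, and ProcessLine() is a hypothetical stand-in for the
 * real work.
 */
static void
TryClaimLine(ParallelCopyLineBoundary *entry)
{
    uint32 expected = LINE_LEADER_POPULATED;

    /*
     * Only one worker can win this transition; a loser finds "expected"
     * overwritten with the current state and simply moves on to the
     * next ring position, which matches the behaviour described above.
     */
    if (pg_atomic_compare_exchange_u32(&entry->line_state, &expected,
                                       LINE_WORKER_PROCESSING))
    {
        /* Safe to read first_block, start_offset and cur_lineno now. */
        ProcessLine(entry);

        /* Set line_size to -1 so the leader can reuse this slot. */
        pg_atomic_write_u32(&entry->line_size, -1);
    }
}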
(3) v3-0006-Parallel-Copy-For-Binary-Format-Files.patch
raw_buf is not used in parallel copy; instead, raw_buf will be pointing
to shared memory data blocks. This memory was allocated as part of
BeginCopyFrom; up to that point we cannot be 100% sure, as the copy can
still be performed sequentially, e.g. in case max_worker_processes is
not available. If it switches to sequential mode, raw_buf will be used
while performing the copy operation. At this place we can safely free
this memory that was allocated.

So the following code (which checks raw_buf, which still points to
memory that has been pfreed) is still valid?

In the SetRawBufForLoad() function, which is called by CopyReadLineText():
cur_data_blk_ptr = (cstate->raw_buf) ?
&pcshared_info->data_blocks[cur_block_pos] : NULL;

The above code looks a bit dicey to me. I stepped over that line in
the debugger when I debugged an instance of Parallel Copy, so it
definitely gets executed.
It makes me wonder what other code could possibly be checking raw_buf
and using it in some way, when in fact what it points to has been
pfreed.

Are you able to add the following line of code, or will it (somehow)
break logic that you are relying on?

pfree(cstate->raw_buf);
cstate->raw_buf = NULL; <=== I suggest that this line is added

You are right, I have debugged & verified that it sets raw_buf to an
invalid block, which is not expected. There is a chance this would have
caused some corruption on some machines. The suggested fix is required;
I have fixed it. I have moved this change to
0003-Allow-copy-from-command-to-process-data-from-file.patch, as
0006-Parallel-Copy-For-Binary-Format-Files is only for binary format
parallel copy and this change is common to parallel copy.

I have attached a new set of patches with the fixes.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Sep 16, 2020 at 1:20 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
Fortunately I have been given permission to share the exact table
definition and data I used, so you can check the behaviour and timings
on your own test machine.
Thanks Greg for the script. I ran your test case and I didn't observe
any increase in exec time with 1 worker; indeed, we benefited by a few
seconds going from 0 to 1 worker, as expected.
Execution time is in seconds. Each test case is executed 3 times on
release build. Each time the data directory is recreated.
Case 1: 1000000 rows, 2GB
With Patch, default configuration, 0 worker: 88.933, 92.261, 88.423
With Patch, default configuration, 1 worker: 73.825, 74.583, 72.678
With Patch, custom configuration, 0 worker: 76.191, 78.160, 78.822
With Patch, custom configuration, 1 worker: 61.289, 61.288, 60.573
Case 2: 2550000 rows, 5GB
With Patch, default configuration, 0 worker: 246.031, 188.323, 216.683
With Patch, default configuration, 1 worker: 156.299, 153.293, 170.307
With Patch, custom configuration, 0 worker: 197.234, 195.866, 196.049
With Patch, custom configuration, 1 worker: 157.173, 158.287, 157.090
[1]: Custom configuration is set up to ensure that no other processes
influence the results. The postgresql.conf used:
shared_buffers = 40GB
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Thanks Ashutosh for your comments.
On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Vignesh,
I've spent some time today looking at your new set of patches and I've
some thoughts and queries which I would like to put here:

Why are these not part of the shared cstate structure?
SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
I have used shared_cstate mainly to share the integer & bool members
from the leader to the worker process. The above members are of char *
type; I will not be able to handle them the way I could for the integer
types. So I preferred to send these as separate keys to the worker.
Thoughts?
I think in the refactoring patch we could replace all the cstate
variables that would be shared between the leader and workers with a
common structure which would be used even for a serial copy. Thoughts?
Currently we are using shared_cstate only to share the integer & bool
members from leader to worker. Once the worker retrieves the shared
integer & bool values, it copies them to cstate. I preferred this way
because only the integer & bool members are retrieved via shared_cstate
and then copied to cstate; for the rest of the members we are in any
case copying directly back to cstate. Thoughts?
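A short illustration of the constraint being described (the struct and
members below are invented for the example, not from the patch):

/*
 * Illustration only: why char * members can't ride along in a struct
 * that is memcpy'd into the DSM.  Scalar members are copied by value,
 * but a pointer member would point into the leader's private memory,
 * which is meaningless in a worker process.  Hence the scalars travel
 * in SerializedParallelCopyState while each string gets its own TOC
 * entry via SerializeString()/RestoreString().
 */
typedef struct SerializedExample
{
    bool    csv_mode;        /* OK: value is copied into the DSM */
    int     file_encoding;   /* OK: value is copied into the DSM */
#if 0
    char   *delim;           /* NOT OK: would dangle in the worker */
#endif
} SerializedExample;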
Have you tested your patch when encoding conversion is needed? If so,
could you please point out the email that has the test results.
We have not yet done the encoding testing; we will do it and post the
results separately in the coming days.
Apart from above, I've noticed some cosmetic errors which I am sharing here:
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)

This doesn't look to be properly aligned.
Fixed.
+ shared_info_ptr = (ParallelCopyShmInfo *)
shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr, full_transaction_id);
..
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState
*)shm_toc_allocate(pcxt->toc, est_cstateshared);

In the first case, while typecasting you've added a space between the
typename and the function but that is missing in the second case. I
think it would be good if you could make it consistent.
Fixed.
Same comment applies here as well:
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+}ParallelCopyLineBoundary;
...
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;

There is no space between the closing brace and the structure name in
the first case but it is in the second one. So, again this doesn't
look consistent.
Fixed.
I could also find this type of inconsistency in comments. See below:
+/* It can hold upto 10000 record information for worker to process. RINGSIZE
+ * should be a multiple of WORKER_CHUNK_COUNT, as wrap around cases is currently
+ * not handled while selecting the WORKER_CHUNK_COUNT by the worker. */
+#define RINGSIZE (10 * 1000)
...
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process to avoid lock contention. Read RINGSIZE comments before
+ * changing this value.
+ */
+#define WORKER_CHUNK_COUNT 50

You may see these kinds of errors at other places as well if you scan
through your patch.
Fixed.
Please find the attached v5 patch which has the fixes for the same.
Thoughts?
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v5-0001-Copy-code-readjustment-to-support-parallel-copy.patchtext/x-patch; charset=US-ASCII; name=v5-0001-Copy-code-readjustment-to-support-parallel-copy.patchDownload
From 443297e9b1f842a78cdd37e6f273dbfa7a706897 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:52:48 +0530
Subject: [PATCH v5 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated to functions/macros, these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText, this change was required because in case
of parallel copy the record identification and record updation is done in
CopyReadLineText, before record information is updated in shared memory the new
line characters should be removed.
---
src/backend/commands/copy.c | 360 ++++++++++++++++++++++++++------------------
1 file changed, 217 insertions(+), 143 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2047557..cf7277a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -224,7 +227,6 @@ typedef struct CopyStateData
* appropriate amounts of data from this buffer. In both modes, we
* guarantee that there is a \0 at raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -354,6 +356,27 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
+
+/*
+ * INCREMENTPROCESSED - Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * RETURNPROCESSED - Get the lines processed.
+ */
+#define RETURNPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -401,7 +424,11 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -801,14 +828,18 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes = RAW_BUF_BYTES(cstate);
int inbytes;
+ int minread = 1;
/* Copy down the unprocessed data if any. */
if (nbytes > 0)
memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
nbytes);
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1514,7 +1545,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1680,6 +1710,23 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateCommonCstateInfo(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateCommonCstateInfo - Populates the common variables required for copy
+ * from operation. This is a helper function for BeginCopy function.
+ */
+static void
+PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1799,12 +1846,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2696,32 +2737,11 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2758,27 +2778,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2816,9 +2815,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3311,7 +3362,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3366,30 +3417,15 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ RETURNPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
- *
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * PopulateCstateCatalogInfo - Populate the catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCstateCatalogInfo(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3399,38 +3435,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf and
- * raw_buf are used in both text and binary modes, but we use line_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3508,6 +3514,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCstateCatalogInfo(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3917,45 +3978,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
- {
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
- }
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData - Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker
+ * to line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
+ {
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
+ }
+}
+
+/*
+ * ConvertToServerEncoding - Convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
char *cvt;
-
cvt = pg_any_to_server(cstate->line_buf.data,
cstate->line_buf.len,
cstate->file_encoding);
@@ -3967,11 +4043,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4334,6 +4407,7 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ CLEAR_EOL_LINE();
return result;
}
--
1.8.3.1
v5-0002-Framework-for-leader-worker-in-parallel-copy.patchtext/x-patch; charset=US-ASCII; name=v5-0002-Framework-for-leader-worker-in-parallel-copy.patchDownload
From 84a641370debe73a8e4f32b2628d0cb2d21ab75b Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 22 Sep 2020 13:54:45 +0530
Subject: [PATCH v5 2/6] Framework for leader/worker in parallel copy
This patch has the framework for data structures in parallel copy, leader
initialization, worker initialization, shared memory updation, starting workers,
wait for workers and workers exiting.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 742 +++++++++++++++++++++++++++++++++-
src/include/commands/copy.h | 2 +
src/tools/pgindent/typedefs.list | 7 +
4 files changed, 753 insertions(+), 2 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cf7277a..cf16109 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -96,9 +96,183 @@ typedef enum CopyInsertMethod
} CopyInsertMethod;
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The macros DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT stores the records
+ * read from the file that need to be inserted into the relation. These values
+ * help in handover of multiple records with significant size of data to be
+ * processed by each of the workers to make sure there is no context switch & the
+ * work is fairly distributed among the workers. This number showed best
+ * results in the performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* It can hold 1000 blocks of 64K data in DSM to be processed by the worker. */
+#define MAX_BLOCKS_COUNT 1000
+
+/*
+ * It can hold upto 10000 record information for worker to process. RINGSIZE
+ * should be a multiple of WORKER_CHUNK_COUNT, as wrap around cases is currently
+ * not handled while selecting the WORKER_CHUNK_COUNT by the worker.
+ */
+#define RINGSIZE (10 * 1000)
+
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process to avoid lock contention. Read RINGSIZE comments before
+ * changing this value.
+ */
+#define WORKER_CHUNK_COUNT 50
+
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
+ * Copy data block information.
+ * ParallelCopyDataBlock's will be created in DSM. Data read from file will be
+ * copied in these DSM data blocks. The leader process identifies the records
+ * and the record information will be shared to the workers. The workers will
+ * insert the records into the table. There can be one or more number of records
+ * in each of the data block based on the record size.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block, following_block
+ * will have the position where the remaining data need to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag will be set, when the leader finds out this block can be read
+ * safely by the worker. This helps the worker to start processing the line
+ * early where the line will be spread across many blocks and the worker
+ * need not wait for the complete line to be processed.
+ */
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+} ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * Leader process will be populating data block, data block offset & the size of
+ * the record in DSM for the workers to copy the data into the relation.
+ * This is protected by the following sequence in the leader & worker. If they
+ * don't follow this order the worker might process wrong line_size and leader
+ * might populate the information which worker has not yet processed or in the
+ * process of processing.
+ * Leader should operate in the following order:
+ * 1) check if line_size is -1, if not wait, it means worker is still
+ * processing.
+ * 2) set line_state to LINE_LEADER_POPULATING.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * Worker should operate in the following order:
+ * 1) check line_state is LINE_LEADER_POPULATED, if not it means leader is still
+ * populating the data.
+ * 2) read line_size.
+ * 3) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32, worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line -1 means line is yet to be filled completely,
+ * 0 means empty line, >0 means line filled with line size data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+} ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by worker, will not be same as
+ * total_worker_processed if where condition is specified along with copy.
+ * This will be the actual records inserted into the relation.
+ */
+ pg_atomic_uint64 processed;
+
+ /*
+ * The number of records currently processed by the worker, this will also
+ * include the number of records that was filtered because of where clause.
+ */
+ pg_atomic_uint64 total_worker_processed;
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBuf;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+} ParallelCopyData;
+
+/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
* even though some fields are used in only some cases.
@@ -230,10 +404,38 @@ typedef struct CopyStateData
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
/* Shorthand for number of unconsumed bytes available in raw_buf */
#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
} CopyStateData;
+/*
+ * This structure helps in storing the common data from CopyStateData that are
+ * required by the workers. This information will then be allocated and stored
+ * into the DSM for the worker to retrieve and copy it to CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+} SerializedParallelCopyState;
+
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -263,6 +465,22 @@ typedef struct
/* Trim the list of buffers back down to this number after flushing */
#define MAX_PARTITION_BUFFERS 32
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_KEY_NULL_PRINT 3
+#define PARALLEL_COPY_KEY_DELIM 4
+#define PARALLEL_COPY_KEY_QUOTE 5
+#define PARALLEL_COPY_KEY_ESCAPE 6
+#define PARALLEL_COPY_KEY_ATTNAME_LIST 7
+#define PARALLEL_COPY_KEY_NOT_NULL_LIST 8
+#define PARALLEL_COPY_KEY_NULL_LIST 9
+#define PARALLEL_COPY_KEY_CONVERT_LIST 10
+#define PARALLEL_COPY_KEY_WHERE_CLAUSE_STR 11
+#define PARALLEL_COPY_KEY_RANGE_TABLE 12
+#define PARALLEL_COPY_WAL_USAGE 13
+#define PARALLEL_COPY_BUFFER_USAGE 14
+
/* Stores multi-insert data related to a single relation in CopyFrom. */
typedef struct CopyMultiInsertBuffer
{
@@ -424,11 +642,478 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+static pg_attribute_always_inline void EndParallelCopy(ParallelContext *pcxt);
static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
+
+
+/*
+ * SerializeParallelCopyState - Copy shared_cstate using cstate information.
+ */
+static pg_attribute_always_inline void
+SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared_cstate)
+{
+ shared_cstate->copy_dest = cstate->copy_dest;
+ shared_cstate->file_encoding = cstate->file_encoding;
+ shared_cstate->need_transcoding = cstate->need_transcoding;
+ shared_cstate->encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate->csv_mode = cstate->csv_mode;
+ shared_cstate->header_line = cstate->header_line;
+ shared_cstate->null_print_len = cstate->null_print_len;
+ shared_cstate->force_quote_all = cstate->force_quote_all;
+ shared_cstate->convert_selectively = cstate->convert_selectively;
+ shared_cstate->num_defaults = cstate->num_defaults;
+ shared_cstate->relid = cstate->pcdata->relid;
+}
+
+/*
+ * RestoreString - Retrieve the string from shared memory.
+ */
+static void
+RestoreString(shm_toc *toc, int sharedkey, char **copystr)
+{
+ char *shared_str_val = (char *) shm_toc_lookup(toc, sharedkey, true);
+ if (shared_str_val)
+ *copystr = pstrdup(shared_str_val);
+}
+
+/*
+ * EstimateLineKeysStr - Estimate the size required in shared memory for the
+ * input string.
+ */
+static void
+EstimateLineKeysStr(ParallelContext *pcxt, char *inputstr)
+{
+ if (inputstr)
+ {
+ shm_toc_estimate_chunk(&pcxt->estimator, strlen(inputstr) + 1);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ }
+}
+
+/*
+ * SerializeString - Insert a string into shared memory.
+ */
+static void
+SerializeString(ParallelContext *pcxt, int key, char *inputstr)
+{
+ if (inputstr)
+ {
+ char *shmptr = (char *) shm_toc_allocate(pcxt->toc,
+ strlen(inputstr) + 1);
+ strcpy(shmptr, inputstr);
+ shm_toc_insert(pcxt->toc, key, shmptr);
+ }
+}
+
+/*
+ * PopulateParallelCopyShmInfo - Set ParallelCopyShmInfo.
+ */
+static void
+PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
+ FullTransactionId full_transaction_id)
+{
+ uint32 count;
+
+ MemSet(shared_info_ptr, 0, sizeof(ParallelCopyShmInfo));
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ shared_info_ptr->full_transaction_id = full_transaction_id;
+ shared_info_ptr->mycid = GetCurrentCommandId(true);
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+}
+
+/*
+ * BeginParallelCopy - Start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+static ParallelContext*
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ SerializedParallelCopyState *shared_cstate;
+ FullTransactionId full_transaction_id;
+ Size est_cstateshared;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ int parallel_workers = 0;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+ ParallelCopyData *pcdata;
+ MemoryContext oldcontext;
+
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ MemoryContextSwitchTo(oldcontext);
+ cstate->pcdata = pcdata;
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ EstimateLineKeysStr(pcxt, cstate->null_print);
+ EstimateLineKeysStr(pcxt, cstate->null_print_client);
+ EstimateLineKeysStr(pcxt, cstate->delim);
+ EstimateLineKeysStr(pcxt, cstate->quote);
+ EstimateLineKeysStr(pcxt, cstate->escape);
+
+ if (cstate->whereClause != NULL)
+ {
+ whereClauseStr = nodeToString(cstate->whereClause);
+ EstimateLineKeysStr(pcxt, whereClauseStr);
+ }
+
+ if (cstate->range_table != NULL)
+ {
+ rangeTableStr = nodeToString(cstate->range_table);
+ EstimateLineKeysStr(pcxt, rangeTableStr);
+ }
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_XID. */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(FullTransactionId));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_ATTNAME_LIST.
+ */
+ if (attnamelist != NIL)
+ {
+ attnameListStr = nodeToString(attnamelist);
+ EstimateLineKeysStr(pcxt, attnameListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NOT_NULL_LIST.
+ */
+ if (cstate->force_notnull != NIL)
+ {
+ notnullListStr = nodeToString(cstate->force_notnull);
+ EstimateLineKeysStr(pcxt, notnullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_NULL_LIST.
+ */
+ if (cstate->force_null != NIL)
+ {
+ nullListStr = nodeToString(cstate->force_null);
+ EstimateLineKeysStr(pcxt, nullListStr);
+ }
+
+ /*
+ * Estimate the size for shared information for
+ * PARALLEL_COPY_KEY_CONVERT_LIST.
+ */
+ if (cstate->convert_select != NIL)
+ {
+ convertListStr = nodeToString(cstate->convert_select);
+ EstimateLineKeysStr(pcxt, convertListStr);
+ }
+
+ /*
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_COPY_WAL_USAGE
+ * and PARALLEL_COPY_BUFFER_USAGE.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr, full_transaction_id);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ /* Store shared build state, for which we reserved space. */
+ shared_cstate = (SerializedParallelCopyState *) shm_toc_allocate(pcxt->toc, est_cstateshared);
+
+ /* copy cstate variables. */
+ SerializeParallelCopyState(cstate, shared_cstate);
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shared_cstate);
+
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_ATTNAME_LIST, attnameListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NOT_NULL_LIST, notnullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_LIST, nullListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_CONVERT_LIST, convertListStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, whereClauseStr);
+ SerializeString(pcxt, PARALLEL_COPY_KEY_RANGE_TABLE, rangeTableStr);
+
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_WAL_USAGE, walusage);
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_BUFFER_USAGE, bufferusage);
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make sure
+ * that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - End the parallel copy tasks.
+ */
+static pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * InitializeParallelCopyInfo - Initialize parallel worker.
+ */
+static void
+InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
+ CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ cstate->copy_dest = shared_cstate->copy_dest;
+ cstate->file_encoding = shared_cstate->file_encoding;
+ cstate->need_transcoding = shared_cstate->need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate->encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate->csv_mode;
+ cstate->header_line = shared_cstate->header_line;
+ cstate->null_print_len = shared_cstate->null_print_len;
+ cstate->force_quote_all = shared_cstate->force_quote_all;
+ cstate->convert_selectively = shared_cstate->convert_selectively;
+ cstate->num_defaults = shared_cstate->num_defaults;
+ pcdata->relid = shared_cstate->relid;
+
+ PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+}
+
+/*
+ * ParallelCopyMain - Parallel copy worker's code.
+ *
+ * Where clause handling, convert tuple to columns, add default null values for
+ * the missing columns that are not present in that record. Find the partition
+ * if it is partitioned table, invoke before row insert Triggers, handle
+ * constraints and insert the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ SerializedParallelCopyState *shared_cstate;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+ pcshared_info = (ParallelCopyShmInfo *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO, false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+
+ shared_cstate = (SerializedParallelCopyState *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
+ cstate->null_print = (char *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_DELIM, &cstate->delim);
+ RestoreString(toc, PARALLEL_COPY_KEY_QUOTE, &cstate->quote);
+ RestoreString(toc, PARALLEL_COPY_KEY_ESCAPE, &cstate->escape);
+ RestoreString(toc, PARALLEL_COPY_KEY_ATTNAME_LIST, &attnameListStr);
+ if (attnameListStr)
+ attlist = (List *) stringToNode(attnameListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NOT_NULL_LIST, &notnullListStr);
+ if (notnullListStr)
+ cstate->force_notnull = (List *) stringToNode(notnullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_NULL_LIST, &nullListStr);
+ if (nullListStr)
+ cstate->force_null = (List *) stringToNode(nullListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_CONVERT_LIST, &convertListStr);
+ if (convertListStr)
+ cstate->convert_select = (List *) stringToNode(convertListStr);
+
+ RestoreString(toc, PARALLEL_COPY_KEY_WHERE_CLAUSE_STR, &whereClauseStr);
+ RestoreString(toc, PARALLEL_COPY_KEY_RANGE_TABLE, &rangeTableStr);
+
+ if (whereClauseStr)
+ {
+ Node *whereClauseCnvrtdFrmStr = (Node *) stringToNode(whereClauseStr);
+ cstate->whereClause = whereClauseCnvrtdFrmStr;
+ }
+
+ if (rangeTableStr)
+ {
+ List *rangeTableCnvrtdFrmStr = (List *) stringToNode(rangeTableStr);
+ cstate->range_table = rangeTableCnvrtdFrmStr;
+ }
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(shared_cstate->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ InitializeParallelCopyInfo(shared_cstate, cstate, attlist);
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
+
+ MemoryContextSwitchTo(oldcontext);
+ pfree(cstate);
+ return;
+}
+
+/*
+ * ParallelCopyFrom - parallel copy leader's functionality.
+ *
+ * The leader executes the before statement triggers, if any are present. It
+ * reads the table data from the file and copies the contents to DSM data
+ * blocks. It then reads the input contents back from the DSM data blocks and
+ * identifies the records based on line breaks; each such line is a record
+ * that needs to be inserted into the relation. The line information is
+ * stored in the ParallelCopyLineBoundary DSM data structure. Workers then
+ * pick up this information and insert the data into the table. The leader
+ * repeats this process until all data is read from the file and all the DSM
+ * data blocks are processed. While processing, if the leader finds that the
+ * DSM data blocks or the DSM ParallelCopyLineBoundary structures are full,
+ * it waits until a worker frees up some entries, and then continues. It
+ * waits until all the populated lines are processed by the workers, and then
+ * exits.
+ */
+static void
+ParallelCopyFrom(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -1141,6 +1826,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1150,7 +1836,24 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ ParallelCopyFrom(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1199,6 +1902,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1367,6 +2071,39 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ int val;
+ bool parsed;
+ char *strval;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option is supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ strval = defGetString(defel);
+ parsed = parse_int(strval, &val, 0, NULL);
+ if (!parsed)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for integer option \"%s\": %s",
+ defel->defname, strval)));
+ if (val < 1 || val > 1024)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("value %s out of bounds for option \"%s\"",
+ strval, defel->defname),
+ errdetail("Valid values are between \"%d\" and \"%d\".",
+ 1, 1024)));
+ cstate->nworkers = val;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1720,7 +2457,8 @@ BeginCopy(ParseState *pstate,
/*
* PopulateCommonCstateInfo - Populates the common variables required for copy
- * from operation. This is a helper function for BeginCopy function.
+ * from operation. This is a helper function for the BeginCopy and
+ * InitializeParallelCopyInfo functions.
*/
static void
PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..82843c6 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,6 +14,7 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
@@ -41,4 +42,5 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b1afb34..509c695 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1702,6 +1702,12 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2219,6 +2225,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
Attachment: v5-0003-Allow-copy-from-command-to-process-data-from-file.patch (text/x-patch)
From b7d2f516f9ae9088663d7e68247541ce6b019c40 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 22 Sep 2020 13:00:32 +0530
Subject: [PATCH v5 3/6] Allow copy from command to process data from
file/STDIN contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command, with which the user can specify the number of workers to be used
to perform the COPY FROM command.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
the before statement triggers, if any exist. The leader populates the DSM
line entries, which include the start offset and line size; while populating
the lines it reads as many blocks as required from the file into the DSM data
blocks. Each block is 64K in size. The leader parses the data to identify
lines; the existing logic from CopyReadLineText, which identifies the lines,
was used for this with some changes. The leader checks whether a free line
entry is available to copy the information; if there is none, it waits until
the required entry is freed up by a worker, and then copies the identified
line's information (offset & line size) into the DSM line entries. This
process is repeated until the complete file is processed. Simultaneously, the
workers cache lines (50 at a time) in local memory and release the line
entries back to the leader for further population. Each worker processes the
lines it cached and inserts them into the table. The leader does not
participate in the insertion of data; the leader's only responsibility is to
identify the lines as fast as possible for the workers to do the actual copy
operation. The leader waits until all the populated lines are processed by
the workers, and then exits.
We have chosen this design because "everything stalls if the leader doesn't
accept further input data, as well as when there are no available split
chunks, so it doesn't seem like a good idea to have the leader do other work.
This is backed by the performance data, where we have seen that with 1 worker
there is just a 5-10% performance difference".
---
src/backend/access/common/toast_internals.c | 11 +-
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 31 +
src/backend/commands/copy.c | 896 ++++++++++++++++++++++++++--
src/bin/psql/tab-complete.c | 2 +-
src/include/access/xact.h | 2 +
src/tools/pgindent/typedefs.list | 1 +
7 files changed, 902 insertions(+), 52 deletions(-)
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..586d53d 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,15 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. For parallel copy, the command
+ * id will already have been set by AssignCommandIdForWorker, so call
+ * GetCurrentCommandId with used = false; marking the command id as used
+ * was taken care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861..1602525 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afce..e983f78 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -517,6 +517,37 @@ GetCurrentFullTransactionIdIfAny(void)
}
/*
+ * AssignFullTransactionIdForWorker
+ *
+ * For parallel copy, transaction id of leader will be used by the workers.
+ */
+void
+AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId)
+{
+ TransactionState s = CurrentTransactionState;
+
+ Assert((IsInParallelMode() || IsParallelWorker()));
+ s->fullTransactionId = fullTransactionId;
+}
+
+/*
+ * AssignCommandIdForWorker
+ *
+ * For parallel copy, command id of leader will be used by the workers.
+ */
+void
+AssignCommandIdForWorker(CommandId commandId, bool used)
+{
+ Assert((IsInParallelMode() || IsParallelWorker()));
+
+ /* this is global to a transaction, not subtransaction-local */
+ if (used)
+ currentCommandIdUsed = true;
+
+ currentCommandId = commandId;
+}
+
+/*
* MarkCurrentTransactionIdLoggedIfAny
*
* Remember that the current xid - if it is assigned - now has been wal logged.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cf16109..ba188d7 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -26,6 +26,7 @@
#include "access/xlog.h"
#include "catalog/dependency.h"
#include "catalog/pg_authid.h"
+#include "catalog/pg_proc_d.h"
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
@@ -40,11 +41,13 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "optimizer/clauses.h"
#include "optimizer/optimizer.h"
#include "parser/parse_coerce.h"
#include "parser/parse_collate.h"
#include "parser/parse_expr.h"
#include "parser/parse_relation.h"
+#include "pgstat.h"
#include "port/pg_bswap.h"
#include "rewrite/rewriteHandler.h"
#include "storage/fd.h"
@@ -95,6 +98,18 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
/*
@@ -126,6 +141,7 @@ typedef enum CopyInsertMethod
#define IsParallelCopy() (cstate->is_parallel)
#define IsLeader() (cstate->pcdata->is_leader)
+#define IsWorker() (IsParallelCopy() && !IsLeader())
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
/*
@@ -559,9 +575,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -574,26 +594,65 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
/*
* CLEAR_EOL_LINE - Wrapper for clearing EOL.
*/
#define CLEAR_EOL_LINE() \
if (!result && !IsHeaderLine()) \
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
- cstate->line_buf.len, \
- &cstate->line_buf.len) \
+{ \
+ if (IsParallelCopy()) \
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, \
+ raw_buf_ptr, &line_size); \
+ else \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len); \
+} \
/*
* INCREMENTPROCESSED - Increment the lines processed.
*/
-#define INCREMENTPROCESSED(processed) \
-processed++;
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
/*
* RETURNPROCESSED - Get the lines processed.
*/
#define RETURNPROCESSED(processed) \
-return processed;
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
+/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -648,7 +707,10 @@ static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
-
+static void ExecBeforeStmtTrigger(CopyState cstate);
+static void CheckTargetRelValidity(CopyState cstate);
+static void PopulateCstateCatalogInfo(CopyState cstate);
+static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
@@ -731,6 +793,137 @@ PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr,
}
/*
+ * IsTriggerFunctionParallelSafe - Check whether the relation's trigger
+ * functions are parallel safe. Return false if any trigger has a parallel
+ * unsafe function.
+ */
+static pg_attribute_always_inline bool
+IsTriggerFunctionParallelSafe(TriggerDesc *trigdesc)
+{
+ int i;
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+ if (trigtype == RI_TRIGGER_PK || trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - Determine parallel safety of volatile expressions
+ * in the default clauses of column definitions or in the WHERE clause;
+ * return true if they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *) cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If so,
+ * and it is not parallel safe, then parallelism is not allowed. For
+ * instance, if any serial/bigserial column has the parallel-unsafe
+ * nextval() as its default expression, parallelism should not be allowed.
+ * In non-parallel copy, volatile functions such as nextval() are not
+ * checked.
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *)cstate->defexprs[i]->expr);
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *)cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * FindInsertMethod - Determine the insert method: single, multi, or multi conditional.
+ */
+static pg_attribute_always_inline CopyInsertMethod
+FindInsertMethod(CopyState cstate)
+{
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE &&
+ cstate->rel->trigdesc != NULL &&
+ cstate->rel->trigdesc->trig_insert_new_table)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE)
+ return CIM_SINGLE;
+
+ if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return CIM_MULTI_CONDITIONAL;
+
+ return CIM_MULTI;
+}
+
+/*
+ * IsParallelCopyAllowed - Check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
+
+/*
* BeginParallelCopy - Start parallel copy tasks.
*
* Get the number of workers required to perform the parallel copy. The data
@@ -758,6 +951,7 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
ParallelCopyData *pcdata;
MemoryContext oldcontext;
+ CheckTargetRelValidity(cstate);
parallel_workers = Min(nworkers, max_worker_processes);
/* Can't perform copy in parallel */
@@ -769,6 +963,15 @@ BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
MemoryContextSwitchTo(oldcontext);
cstate->pcdata = pcdata;
+ /*
+ * User chosen parallel copy. Determine if the parallel copy is actually
+ * allowed. If not, go with the non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ return NULL;
+
+ full_transaction_id = GetCurrentFullTransactionId();
+
EnterParallelMode();
pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
parallel_workers);
@@ -977,9 +1180,214 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->line_buf_converted = false;
cstate->raw_buf = NULL;
cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCstateCatalogInfo(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+
+ /*
+ * The inner loop at the bottom of this loop may have exited because
+ * data_blk_ptr->curr_blk_completed was set, in which case the dataSize
+ * read earlier might be stale; if curr_blk_completed is set and the
+ * line is complete, line_size will have been set as well. Read
+ * line_size again to be sure whether the line is complete or spans
+ * into another block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the remaining data fits in the current block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole
+ * usable part of the current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: the first copy starts from the given offset;
+ * subsequent copies take the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data is entirely in the current block,
+ * lineInfo->line_size will be updated. If the data is spread
+ * across blocks, either lineInfo->line_size or
+ * data_blk_ptr->curr_blk_completed can be updated:
+ * lineInfo->line_size is updated once the complete read has
+ * finished, while data_blk_ptr->curr_blk_completed is updated
+ * when the current block is finished but the line is not.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
}
/*
+ * GetWorkerLine - Returns a line for the worker to process.
+ */
+static bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+ }
+
+/*
* ParallelCopyMain - Parallel copy worker's code.
*
* Where clause handling, convert tuple to columns, add default null values for
@@ -1028,6 +1436,8 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
pcdata->pcshared_info = pcshared_info;
+ AssignFullTransactionIdForWorker(pcshared_info->full_transaction_id);
+ AssignCommandIdForWorker(pcshared_info->mycid, true);
shared_cstate = (SerializedParallelCopyState *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, false);
cstate->null_print = (char *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_NULL_PRINT, true);
@@ -1088,6 +1498,33 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
}
/*
+ * UpdateBlockInLineInfo - Update the line information.
+ */
+static pg_attribute_always_inline int
+UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
+ uint32 offset, uint32 line_size, uint32 line_state)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ int line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+
+ return line_pos;
+}
+
+/*
* ParallelCopyFrom - parallel copy leader's functionality.
*
+ * The leader executes the before statement triggers, if any are present. It
@@ -1110,8 +1547,302 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* raw_buf is not used in parallel copy; data blocks are used instead. */
+ pfree(cstate->raw_buf);
+ cstate->raw_buf = NULL;
+
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
+ }
+
+/*
+ * GetLinePosition - Return the line position that the worker should process.
+ */
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineState line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT): 0;
+
+ /*
+ * Get a new block for copying data. Don't check the current block, as
+ * it will still have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ pcshared_info->cur_block_pos = block_pos;
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+static uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+static void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+static void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ ParallelCopyLineBoundary *lineInfo = &lineBoundaryPtr->ring[line_pos];
+ /*
+ * If raw_buf_ptr <= new_line_size, the new block contains only the
+ * line terminator; the unprocessed count should not be increased in
+ * that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
+ pcshared_info->populated++;
+ }
+ else if (new_line_size)
+ {
+ /* The line contains only a newline; an empty record should be inserted. */
+ ParallelCopyLineBoundary *lineInfo;
+ line_pos = UpdateBlockInLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED);
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ elog(DEBUG1, "[Leader] Added empty line with offset:%d, line position:%d, line size:%d",
+ lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the before statement trigger; for parallel
+ * copy this is executed by the leader process.
+ */
+static void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
}
/*
@@ -3573,7 +4304,8 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid = IsParallelCopy() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -3583,7 +4315,14 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not a parallel copy. In the case of
+ * parallel copy, this check is done by the leader, so that if any
+ * invalid case exists, the COPY FROM command errors out from the leader
+ * itself, avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -3623,7 +4362,8 @@ CopyFrom(CopyState cstate)
target_resultRelInfo = resultRelInfo;
/* Verify the named relation is a valid target for INSERT */
- CheckValidResultRel(resultRelInfo, CMD_INSERT);
+ if (!IsParallelCopy())
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
ExecOpenIndices(resultRelInfo, false);
@@ -3772,13 +4512,16 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3878,6 +4621,16 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * If the table has any partitions that are either foreign or have
+ * BEFORE/INSTEAD OF triggers, we can't perform the copy operation
+ * with parallel workers.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -4294,7 +5047,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
if (!cstate->binary)
{
@@ -4443,26 +5196,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -4710,9 +5472,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through; on subsequent
+ * iterations, reset the index and re-use the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -4767,7 +5551,7 @@ static void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding && (!IsParallelCopy() || IsWorker()))
{
char *cvt;
cvt = pg_any_to_server(cstate->line_buf.data,
@@ -4806,6 +5590,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ int line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4860,6 +5649,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -5084,9 +5875,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
cstate->raw_buf + cstate->raw_buf_index,
prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -5138,6 +5935,26 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line information here; this cannot
+ * be done at the beginning, as there is a possibility that the file
+ * contains empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -5146,6 +5963,7 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
CLEAR_EOL_LINE();
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 9c6f5ec..43fc823 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2353,7 +2353,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM|TO filename WITH ( */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
- "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
+ "HEADER", "PARALLEL", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index df1b43a..71a6c9b 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -385,6 +385,8 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void AssignFullTransactionIdForWorker(FullTransactionId fullTransactionId);
+extern void AssignCommandIdForWorker(CommandId commandId, bool used);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 509c695..e8d8ffd 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1707,6 +1707,7 @@ ParallelCopyLineBoundary
ParallelCopyData
ParallelCopyDataBlock
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
Attachment: v5-0004-Documentation-for-parallel-copy.patch (text/x-patch)
From 6a04ea9b0fbd62966d37212c7590f62c9d71b309 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:00:36 +0530
Subject: [PATCH v5 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..2e023ed 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,22 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers. Please
+ note that it is not guaranteed that the number of parallel workers
+ specified in <replaceable class="parameter">integer</replaceable> will
+ be used during execution. It is possible for a copy to run with fewer
+ workers than specified, or even with no workers at all (for example,
+ due to the setting of max_worker_processes). This option is allowed
+ only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
--
1.8.3.1
Attachment: v5-0005-Tests-for-parallel-copy.patch (text/x-patch)
From f2b0635303b986ea944f38e9325b1f06c53a5060 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:19:39 +0530
Subject: [PATCH v5 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 205 ++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 +++++++++++++++++++++++++++++++++++-
4 files changed, 429 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..7ae5d44 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,125 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: not allowed, should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: value 0 out of bounds for option "parallel"
+DETAIL: Valid values are between "1" and "1024".
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..7015698 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: v5-0006-Parallel-Copy-For-Binary-Format-Files.patch
From 8dc40e5d290edd954b7914d3f8abe3de22b1667d Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 22 Sep 2020 13:43:10 +0530
Subject: [PATCH v5 6/6] Parallel Copy For Binary Format Files
Leader reads data from the file into DSM data blocks of 64KB each. It
also identifies, for each tuple, its data block id, start offset, end
offset, and tuple size, and records this information in the ring data
structure. Workers read the tuple information from the ring data
structure and the corresponding tuple data from the data blocks, and
insert the tuples into the table in parallel.
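
As a rough illustration of this leader/worker scheme, here is a minimal
sketch of the shared structures and the leader's publish step (names and
the plain-int synchronization are illustrative only; the patch uses its
own structures with atomics and wait loops):

#include <stdint.h>

#define DATA_BLOCK_SIZE (64 * 1024)
#define RING_SIZE 1024

typedef enum { ENTRY_FREE, ENTRY_POPULATED, ENTRY_PROCESSED } EntryState;

typedef struct TupleBoundary
{
	uint32_t	first_block;	/* data block id where the tuple starts */
	uint32_t	start_offset;	/* tuple's start offset within that block */
	uint32_t	tuple_size;	/* total tuple size; may span blocks */
	EntryState	state;		/* free -> populated -> processed */
} TupleBoundary;

typedef struct SharedRing
{
	char		blocks[RING_SIZE][DATA_BLOCK_SIZE];	/* DSM data blocks */
	TupleBoundary	ring[RING_SIZE];			/* tuple offsets */
	uint32_t	next_to_fill;				/* leader's cursor */
} SharedRing;

/*
 * Leader side: after reading a 64KB chunk into blocks[blk] and finding a
 * tuple starting at offset off with length len, publish it so a worker
 * can copy the bytes out, form the tuple, and insert it into the table.
 */
static void
publish_tuple(SharedRing *sr, uint32_t blk, uint32_t off, uint32_t len)
{
	TupleBoundary *e = &sr->ring[sr->next_to_fill % RING_SIZE];

	e->first_block = blk;
	e->start_offset = off;
	e->tuple_size = len;
	e->state = ENTRY_POPULATED;	/* patch: atomic state transition */
	sr->next_to_fill++;
}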
---
src/backend/commands/copy.c | 681 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 597 insertions(+), 84 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index ba188d7..5b1884a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -266,6 +266,17 @@ typedef struct ParallelCopyLineBuf
} ParallelCopyLineBuf;
/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
+/*
* Parallel copy data information.
*/
typedef struct ParallelCopyData
@@ -286,6 +297,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
} ParallelCopyData;
/*
@@ -450,6 +464,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
} SerializedParallelCopyState;
/* DestReceiver for COPY (query) TO */
@@ -524,7 +539,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -652,11 +666,113 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyReadBinaryData(cstate, &dummy, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data is spread
+ * across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * The field data can spread across multiple data blocks; \
+ * calculate the number of data blocks required and try to \
+ * acquire that many blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * check if we need the data block for the field data \
+ * check if we need an additional data block for the remaining \
+ * field bytes that are not a multiple of the data block size. \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
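+
+/*
+ * Illustrative example (not part of the original patch): with
+ * DATA_BLOCK_SIZE = 65536, a field of fld_size = 150000 bytes of which
+ * curr_blk_bytes = 20000 fit in the current block needs
+ * (150000 - 20000) / 65536 = 1 additional full block, plus one more for
+ * the remaining 64464 bytes, giving required_blks = 2.
+ */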
+
/* End parallel copy Macros */
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
-
/* non-export function prototypes */
static CopyState BeginCopy(ParseState *pstate, bool is_from, Relation rel,
RawStmt *raw_query, Oid queryRelId, List *attnamelist,
@@ -711,6 +827,13 @@ static void ExecBeforeStmtTrigger(CopyState cstate);
static void CheckTargetRelValidity(CopyState cstate);
static void PopulateCstateCatalogInfo(CopyState cstate);
static pg_attribute_always_inline uint32 GetLinePosition(CopyState cstate);
+static uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+static bool CopyReadBinaryTupleLeader(CopyState cstate);
+static void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+static bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+static Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+static void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
/*
* SerializeParallelCopyState - Copy shared_cstate using cstate information.
@@ -729,6 +852,7 @@ SerializeParallelCopyState(CopyState cstate, SerializedParallelCopyState *shared
shared_cstate->convert_selectively = cstate->convert_selectively;
shared_cstate->num_defaults = cstate->num_defaults;
shared_cstate->relid = cstate->pcdata->relid;
+ shared_cstate->binary = cstate->binary;
}
/*
@@ -888,8 +1012,8 @@ FindInsertMethod(CopyState cstate)
static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
- /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ /* Parallel copy not allowed for frontend (2.0 protocol). */
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/* Check if copy is into foreign table or temporary table. */
@@ -1159,6 +1283,7 @@ InitializeParallelCopyInfo(SerializedParallelCopyState *shared_cstate,
cstate->convert_selectively = shared_cstate->convert_selectively;
cstate->num_defaults = shared_cstate->num_defaults;
pcdata->relid = shared_cstate->relid;
+ cstate->binary = shared_cstate->binary;
PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
@@ -1554,32 +1679,66 @@ ParallelCopyFrom(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
- cstate->cur_lineno++;
+ for (;;)
+ {
+ bool done;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files.
+ * For the parallel copy leader, fill in the error
+ * context information here so that, if any failure
+ * occurs while determining tuple offsets, the leader
+ * throws errors with the proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1587,6 +1746,354 @@ ParallelCopyFrom(CopyState cstate)
}
/*
+ * CopyReadBinaryGetDataBlock - Gets a new block, updates
+ * the current offset, calculates the skip bytes.
+ */
+static void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to %u block",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if(field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader - Leader reads data from binary formatted file
+ * to data blocks and identifies tuple boundaries/offsets so that workers
+ * can work on the data blocks data.
+ */
+static bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One possibility is that
+ * the binary file just has a valid signature but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ {
+ int line_pos = UpdateBlockInLineInfo(cstate,
+ start_block_pos,
+ start_offset,
+ line_size,
+ LINE_LEADER_POPULATED);
+
+ pcshared_info->populated++;
+ elog(DEBUG1, "LEADER - adding - block:%u, offset:%u, line size:%u line position:%d",
+ start_block_pos, start_offset, line_size, line_pos);
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize - Leader identifies boundaries/
+ * offsets for each attribute/column and finally results in the
+ * tuple/row size. It moves on to next data block if the attribute/
+ * column is spread across data blocks.
+ */
+static void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in the same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while(i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size,
+ * as the required number of data blocks would have
+ * been obtained in the above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker - Each worker reads data from data blocks after
+ * getting leader-identified tuple offsets from ring data structure.
+ */
+static bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ line_pos = GetLinePosition(cstate);
+
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block.
+ * This code exists for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker - Worker reads one attribute/column
+ * from the data blocks, moving on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+static Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0],&cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i>0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * The bytes yet to be copied into the attribute buffer exceed an
+ * entire data block's size; copy only a data block's worth in
+ * this iteration.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
* GetLinePosition - Return the line position that worker should process.
*/
static uint32
@@ -1675,7 +2182,9 @@ GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
dataBlkPtr->curr_blk_completed = false;
dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
return block_pos;
}
@@ -2191,10 +2700,26 @@ CopyGetInt32(CopyState cstate, int32 *val)
{
uint32 buf;
- if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ /*
+ * For parallel copy, avoid reading data into raw_buf; read directly
+ * from the file. The data will later be read into the parallel
+ * copy data buffers.
+ */
+ if (cstate->nworkers > 0)
{
- *val = 0; /* suppress compiler warning */
- return false;
+ if (CopyGetData(cstate, &buf, sizeof(buf), sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
+ }
+ else
+ {
+ if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
}
*val = (int32) pg_ntoh32(buf);
return true;
@@ -2583,7 +3108,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
EndParallelCopy(pcxt);
}
else
+ {
+ /*
+ * Reset nworkers to -1 here. This is useful in cases where the
+ * user specified parallel workers but no worker could be picked
+ * up; fall back to the non-parallel-mode value of nworkers.
+ */
+ cstate->nworkers = -1;
*processed = CopyFrom(cstate); /* copy from file to database */
+ }
EndCopyFrom(cstate);
}
@@ -5047,7 +5580,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
cstate->raw_buf_index = cstate->raw_buf_len = 0;
if (!cstate->binary)
{
@@ -5127,7 +5660,7 @@ BeginCopyFrom(ParseState *pstate,
int32 tmp;
/* Signature */
- if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+ if (CopyGetData(cstate, readSig, 11, 11) != 11 ||
memcmp(readSig, BinarySignature, 11) != 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -5155,7 +5688,7 @@ BeginCopyFrom(ParseState *pstate,
/* Skip extension header, if present */
while (tmp-- > 0)
{
- if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+ if (CopyGetData(cstate, readSig, 1, 1) != 1)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("invalid COPY file header (wrong length)")));
@@ -5352,60 +5885,45 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
-
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyReadBinaryData(cstate, &dummy, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -6405,18 +6923,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -6424,9 +6939,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
--
1.8.3.1
On Thu, Sep 17, 2020 at 11:06 AM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:

On Wed, Sep 16, 2020 at 1:20 PM Greg Nancarrow <gregn4422@gmail.com>
wrote:

Fortunately I have been given permission to share the exact table
definition and data I used, so you can check the behaviour and timings
on your own test machine.

Thanks Greg for the script. I ran your test case and I didn't observe
any increase in exec time with 1 worker; indeed, we have benefitted a
few seconds from 0 to 1 worker as expected.

Execution time is in seconds. Each test case is executed 3 times on a
release build. Each time the data directory is recreated.

Case 1: 1000000 rows, 2GB
With Patch, default configuration, 0 worker: 88.933, 92.261, 88.423
With Patch, default configuration, 1 worker: 73.825, 74.583, 72.678
With Patch, custom configuration, 0 worker: 76.191, 78.160, 78.822
With Patch, custom configuration, 1 worker: 61.289, 61.288, 60.573

Case 2: 2550000 rows, 5GB
With Patch, default configuration, 0 worker: 246.031, 188.323, 216.683
With Patch, default configuration, 1 worker: 156.299, 153.293, 170.307
With Patch, custom configuration, 0 worker: 197.234, 195.866, 196.049
With Patch, custom configuration, 1 worker: 157.173, 158.287, 157.090
Hi Greg,
If you still observe the issue in your testing environment, I'm attaching a
testing patch (applying on top of the latest parallel copy patch set,
i.e. v5 patches 1 to 6) that captures various timings: total copy time in
leader and worker, index and table insertion time, and leader and worker
waiting time. These timings are written to the server log file.
A few things to follow before testing:
1. Is the table being dropped/truncated after the test with 0 workers and
before running with 1 worker? If not, then the index insertion time would
increase[1] (for me it is increasing by 10 sec). This is expected because
the first time the index is built bottom-up (from leaves to root), but the
second time it has to search for and insert at the proper leaf and inner
B+Tree nodes. (A minimal SQL sketch of this reset-between-runs procedure
appears after this list.)
2. If possible, can you also run with the custom postgresql.conf settings[2]
along with the default? Just to ensure that other bg processes such as
checkpointer, autovacuum, bgwriter etc. don't affect our test case. For
instance, with the default postgresql.conf file, it looks like checkpointing[3]
is happening frequently; could you please let us know if that happens at
your end?
3. Could you please run the test case 3 times at least? Just to ensure the
consistency of the issue.
4. I ran the tests in a performance test system where no other user
processes (except system processes) are running. Is it possible for you to
do the same?
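
A minimal SQL sketch of the reset-between-runs procedure from point 1
(table and file names here are hypothetical):

TRUNCATE test_table;                                    -- reset heap and indexes
COPY test_table FROM '/tmp/data.csv' WITH (FORMAT csv);              -- 0 workers
TRUNCATE test_table;
COPY test_table FROM '/tmp/data.csv' WITH (FORMAT csv, PARALLEL 1);  -- 1 worker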
Please capture and share the timing logs with us.
Here's a snapshot of how the added timings show up in the logs (I
captured this with your test case 1: 1000000 rows, 2GB, custom
postgresql.conf settings[2]).
with 0 workers:
2020-09-22 10:49:27.508 BST [163910] LOG: totaltableinsertiontime =
24072.034 ms
2020-09-22 10:49:27.508 BST [163910] LOG: totalindexinsertiontime = 60.682
ms
2020-09-22 10:49:27.508 BST [163910] LOG: totalcopytime = 59664.594 ms
with 1 worker:
2020-09-22 10:53:58.409 BST [163947] LOG: totalcopyworkerwaitingtime =
59.815 ms
2020-09-22 10:53:58.409 BST [163947] LOG: totaltableinsertiontime =
23585.881 ms
2020-09-22 10:53:58.409 BST [163947] LOG: totalindexinsertiontime = 30.946
ms
2020-09-22 10:53:58.409 BST [163947] LOG: totalcopytimeworker = 47047.956
ms
2020-09-22 10:53:58.429 BST [163946] LOG: totalcopyleaderwaitingtime =
26746.744 ms
2020-09-22 10:53:58.429 BST [163946] LOG: totalcopytime = 47150.002 ms
[1]:
0 worker:
LOG: totaltableinsertiontime = 25491.881 ms
LOG: totalindexinsertiontime = 14136.104 ms
LOG: totalcopytime = 75606.858 ms
table is not dropped and so are indexes
1 worker:
LOG: totalcopyworkerwaitingtime = 64.582 ms
LOG: totaltableinsertiontime = 21360.875 ms
LOG: totalindexinsertiontime = 24843.570 ms
LOG: totalcopytimeworker = 69837.162 ms
LOG: totalcopyleaderwaitingtime = 49548.441 ms
LOG: totalcopytime = 69997.778 ms
[2]:
custom postgresql.conf configuration:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
[3]:
LOG: checkpoints are occurring too frequently (14 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size".
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachment: v1-0001-Parallel-Copy-Exec-Time-Capture.patch
From 28c5b37c2271b623f6bc4653d17f92dedb8722be Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Tue, 22 Sep 2020 15:12:27 +0530
Subject: [PATCH v2] Parallel Copy Exec Time Capture
A testing patch for capturing various timings such as total copy
time in leader and worker, index insertion time, leader and worker
waiting time.
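
The instrumentation follows the same idiom at every measured phase; a
minimal sketch of that pattern (the function and accumulator names here
are illustrative; the INSTR_TIME_* macros are the real ones from
src/include/portability/instr_time.h):

#include "postgres.h"
#include "portability/instr_time.h"

static double totaltableinsertiontime;	/* accumulated milliseconds */

static void
measured_phase(void)
{
	instr_time	before, after;

	INSTR_TIME_SET_CURRENT(before);
	/* ... the phase being timed, e.g. table_multi_insert(...) ... */
	INSTR_TIME_SET_CURRENT(after);
	INSTR_TIME_SUBTRACT(after, before);	/* after -= before */
	totaltableinsertiontime += INSTR_TIME_GET_MILLISEC(after);
}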
---
src/backend/commands/copy.c | 74 ++++++++++++++++++++++++++++++++++++-
1 file changed, 73 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5b1884acd8..cb72949e0e 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -65,6 +65,14 @@
#define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
#define OCTVALUE(c) ((c) - '0')
+/* Global variables for capturing parallel copy execution times. */
+double totalcopytime;
+double totalcopytimeworker;
+double totalcopyleaderwaitingtime;
+double totalcopyworkerwaitingtime;
+double totaltableinsertiontime;
+double totalindexinsertiontime;
+
/*
* Represents the different source/dest cases we need to worry about at
* the bottom level
@@ -1332,9 +1340,16 @@ CacheLineInfo(CopyState cstate, uint32 buff_count)
uint32 offset;
int dataSize;
int copiedSize = 0;
+ struct timespec before, after;
+ struct timespec before1, after1;
resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ INSTR_TIME_SET_CURRENT(before);
write_pos = GetLinePosition(cstate);
+ INSTR_TIME_SET_CURRENT(after);
+ INSTR_TIME_SUBTRACT(after, before);
+ totalcopyworkerwaitingtime += INSTR_TIME_GET_MILLISEC(after);
+
if (-1 == write_pos)
return true;
@@ -1436,6 +1451,7 @@ CacheLineInfo(CopyState cstate, uint32 buff_count)
data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
}
+ INSTR_TIME_SET_CURRENT(before1);
for (;;)
{
/* Get the size of this line */
@@ -1455,6 +1471,9 @@ CacheLineInfo(CopyState cstate, uint32 buff_count)
COPY_WAIT_TO_PROCESS()
}
+ INSTR_TIME_SET_CURRENT(after1);
+ INSTR_TIME_SUBTRACT(after1, before1);
+ totalcopyworkerwaitingtime += INSTR_TIME_GET_MILLISEC(after1);
}
empty_data_line_update:
@@ -1538,6 +1557,11 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
char *convertListStr = NULL;
WalUsage *walusage;
BufferUsage *bufferusage;
+ struct timespec before, after;
+ totalcopytimeworker = 0;
+ totalcopyworkerwaitingtime = 0;
+ totaltableinsertiontime = 0;
+ totalindexinsertiontime = 0;
/* Allocate workspace and zero all fields. */
cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
@@ -1606,7 +1630,15 @@ ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
cstate->rel = rel;
InitializeParallelCopyInfo(shared_cstate, cstate, attlist);
+ INSTR_TIME_SET_CURRENT(before);
CopyFrom(cstate);
+ INSTR_TIME_SET_CURRENT(after);
+ INSTR_TIME_SUBTRACT(after, before);
+ totalcopytimeworker += INSTR_TIME_GET_MILLISEC(after);
+ ereport(LOG, (errmsg("totalcopyworkerwaitingtime = %.3f ms", totalcopyworkerwaitingtime), errhidestmt(true)));
+ ereport(LOG, (errmsg("totaltableinsertiontime = %.3f ms", totaltableinsertiontime), errhidestmt(true)));
+ ereport(LOG, (errmsg("totalindexinsertiontime = %.3f ms", totalindexinsertiontime), errhidestmt(true)));
+ ereport(LOG, (errmsg("totalcopytimeworker = %.3f ms", totalcopytimeworker), errhidestmt(true)));
if (rel != NULL)
table_close(rel, RowExclusiveLock);
@@ -1633,11 +1665,16 @@ UpdateBlockInLineInfo(CopyState cstate, uint32 blk_pos,
ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
ParallelCopyLineBoundary *lineInfo;
int line_pos = lineBoundaryPtr->pos;
+ struct timespec before, after;
/* Update the line information for the worker to pick and process. */
lineInfo = &lineBoundaryPtr->ring[line_pos];
+ INSTR_TIME_SET_CURRENT(before);
while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
COPY_WAIT_TO_PROCESS()
+ INSTR_TIME_SET_CURRENT(after);
+ INSTR_TIME_SUBTRACT(after, before);
+ totalcopyleaderwaitingtime += INSTR_TIME_GET_MILLISEC(after);
lineInfo->first_block = blk_pos;
lineInfo->start_offset = offset;
@@ -2203,6 +2240,8 @@ static uint32
WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
{
uint32 new_free_pos = -1;
+ struct timespec before, after;
+ INSTR_TIME_SET_CURRENT(before);
for (;;)
{
new_free_pos = GetFreeCopyBlock(pcshared_info);
@@ -2211,7 +2250,9 @@ WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
COPY_WAIT_TO_PROCESS()
}
-
+ INSTR_TIME_SET_CURRENT(after);
+ INSTR_TIME_SUBTRACT(after, before);
+ totalcopyleaderwaitingtime += INSTR_TIME_GET_MILLISEC(after);
return new_free_pos;
}
@@ -3083,12 +3124,21 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
ParallelContext *pcxt = NULL;
+ struct timespec before, after;
Assert(rel);
+ totalcopytime = 0;
+ totalcopytimeworker = 0;
+ totalcopyleaderwaitingtime = 0;
+ totalcopyworkerwaitingtime = 0;
+ totaltableinsertiontime = 0;
+ totalindexinsertiontime = 0;
/* check read-only transaction and parallel mode */
if (XactReadOnly && !rel->rd_islocaltemp)
PreventCommandIfReadOnly("COPY FROM");
+ INSTR_TIME_SET_CURRENT(before);
+
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
@@ -3119,6 +3169,18 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
}
EndCopyFrom(cstate);
+
+ INSTR_TIME_SET_CURRENT(after);
+ INSTR_TIME_SUBTRACT(after, before);
+ totalcopytime += INSTR_TIME_GET_MILLISEC(after);
+ if (pcxt != NULL)
+ ereport(LOG, (errmsg("totalcopyleaderwaitingtime = %.3f ms", totalcopyleaderwaitingtime), errhidestmt(true)));
+ if (pcxt == NULL)
+ {
+ ereport(LOG, (errmsg("totaltableinsertiontime = %.3f ms", totaltableinsertiontime), errhidestmt(true)));
+ ereport(LOG, (errmsg("totalindexinsertiontime = %.3f ms", totalindexinsertiontime), errhidestmt(true)));
+ }
+ ereport(LOG, (errmsg("totalcopytime = %.3f ms", totalcopytime), errhidestmt(true)));
}
else
{
@@ -4527,6 +4589,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
int nused = buffer->nused;
ResultRelInfo *resultRelInfo = buffer->resultRelInfo;
TupleTableSlot **slots = buffer->slots;
+ struct timespec before, after;
+ struct timespec before1, after1;
/* Set es_result_relation_info to the ResultRelInfo we're flushing. */
estate->es_result_relation_info = resultRelInfo;
@@ -4543,14 +4607,19 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
* context before calling it.
*/
oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+ INSTR_TIME_SET_CURRENT(before);
table_multi_insert(resultRelInfo->ri_RelationDesc,
slots,
nused,
mycid,
ti_options,
buffer->bistate);
+ INSTR_TIME_SET_CURRENT(after);
+ INSTR_TIME_SUBTRACT(after, before);
+ totaltableinsertiontime += INSTR_TIME_GET_MILLISEC(after);
MemoryContextSwitchTo(oldcontext);
+ INSTR_TIME_SET_CURRENT(before1);
for (i = 0; i < nused; i++)
{
/*
@@ -4586,6 +4655,9 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
ExecClearTuple(slots[i]);
}
+ INSTR_TIME_SET_CURRENT(after1);
+ INSTR_TIME_SUBTRACT(after1, before1);
+ totalindexinsertiontime += INSTR_TIME_GET_MILLISEC(after1);
/* Mark that all slots are free */
buffer->nused = 0;
--
2.25.1
Hi Bharath,
Few things to follow before testing:
1. Is the table being dropped/truncated after the test with 0 workers and before running with 1 worker? If not, then the index insertion time would increase[1] (for me it is increasing by 10 sec). This is expected because the first time the index is built bottom-up (from leaves to root), but the second time it has to search for and insert at the proper leaf and inner B+Tree nodes.
Yes, it's being truncated before running each and every COPY.
2. If possible, can you also run with the custom postgresql.conf settings[2] along with the default? Just to ensure that other bg processes such as checkpointer, autovacuum, bgwriter etc. don't affect our test case. For instance, with the default postgresql.conf file, it looks like checkpointing[3] is happening frequently; could you please let us know if that happens at your end?
Yes, have run with default and your custom settings. With default
settings, I can confirm that checkpointing is happening frequently
with the tests I've run here.
3. Could you please run the test case 3 times at least? Just to ensure the consistency of the issue.
Yes, have run 4 times. Seems to be a performance hit (whether normal
copy or parallel-1 copy) on the first COPY run on a freshly created
database. After that, results are consistent.
4. I ran the tests in a performance test system where no other user processes(except system processes) are running. Is it possible for you to do the same?
Please capture and share the timing logs with us.
Yes, I have ensured the system is as idle as possible prior to testing.
I have attached the test results obtained after building with your
Parallel Copy patch and testing patch applied (HEAD at
733fa9aa51c526582f100aa0d375e0eb9a6bce8b).
Test results show that Parallel COPY with 1 worker is performing
better than normal COPY in the test scenarios run. There is a
performance hit (regardless of COPY type) on the very first COPY run
on a freshly-created database.
I ran the test case 4 times, and also in reverse order, with truncate
run before each COPY (output and logs named xxxx_0_1 run normal COPY
then parallel COPY, and named xxxx_1_0 run parallel COPY and then
normal COPY).
Please refer to attached results.
Regards,
Greg
Thanks Greg for the testing.
On Thu, Sep 24, 2020 at 8:27 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
3. Could you please run the test case 3 times at least? Just to ensure
the consistency of the issue.
Yes, have run 4 times. Seems to be a performance hit (whether normal
copy or parallel-1 copy) on the first COPY run on a freshly created
database. After that, results are consistent.
From the logs, I see that it is happening only with the default
postgresql.conf, and there's inconsistency in table insertion times,
especially from the 1st run to the 2nd. The variation in table insertion
time is also larger. This is expected with the default postgresql.conf
because of interference from background processes. That's the reason we
usually run with a custom configuration to correctly measure the
performance gain.
br_default_0_1.log:
2020-09-23 22:32:36.944 JST [112616] LOG: totaltableinsertiontime =
155068.244 ms
2020-09-23 22:33:57.615 JST [11426] LOG: totaltableinsertiontime =
42096.275 ms
2020-09-23 22:37:39.192 JST [43097] LOG: totaltableinsertiontime =
29135.262 ms
2020-09-23 22:38:56.389 JST [54205] LOG: totaltableinsertiontime =
38953.912 ms
2020-09-23 22:40:27.573 JST [66485] LOG: totaltableinsertiontime =
27895.326 ms
2020-09-23 22:41:34.948 JST [77523] LOG: totaltableinsertiontime =
28929.642 ms
2020-09-23 22:43:18.938 JST [89857] LOG: totaltableinsertiontime =
30625.015 ms
2020-09-23 22:44:21.938 JST [101372] LOG: totaltableinsertiontime =
24624.045 ms
br_default_1_0.log:
2020-09-24 11:12:14.989 JST [56146] LOG: totaltableinsertiontime =
192068.350 ms
2020-09-24 11:13:38.228 JST [88455] LOG: totaltableinsertiontime =
30999.942 ms
2020-09-24 11:15:50.381 JST [108935] LOG: totaltableinsertiontime =
31673.204 ms
2020-09-24 11:17:14.260 JST [118541] LOG: totaltableinsertiontime =
31367.027 ms
2020-09-24 11:20:18.975 JST [17270] LOG: totaltableinsertiontime =
26858.924 ms
2020-09-24 11:22:17.822 JST [26852] LOG: totaltableinsertiontime =
66531.442 ms
2020-09-24 11:24:09.221 JST [47971] LOG: totaltableinsertiontime =
38943.384 ms
2020-09-24 11:25:30.955 JST [58849] LOG: totaltableinsertiontime =
28286.634 ms
br_custom_0_1.log:
2020-09-24 10:29:44.956 JST [110477] LOG: totaltableinsertiontime =
20207.928 ms
2020-09-24 10:30:49.570 JST [120568] LOG: totaltableinsertiontime =
23360.006 ms
2020-09-24 10:32:31.659 JST [2753] LOG: totaltableinsertiontime =
19837.588 ms
2020-09-24 10:35:49.245 JST [31118] LOG: totaltableinsertiontime =
21759.253 ms
2020-09-24 10:36:54.834 JST [41763] LOG: totaltableinsertiontime =
23547.323 ms
2020-09-24 10:38:53.507 JST [56779] LOG: totaltableinsertiontime =
21543.984 ms
2020-09-24 10:39:58.713 JST [67489] LOG: totaltableinsertiontime =
25254.563 ms
br_custom_1_0.log:
2020-09-24 10:49:03.242 JST [15308] LOG: totaltableinsertiontime =
16541.201 ms
2020-09-24 10:50:11.848 JST [23324] LOG: totaltableinsertiontime =
15076.577 ms
2020-09-24 10:51:24.497 JST [35394] LOG: totaltableinsertiontime =
16400.777 ms
2020-09-24 10:52:32.354 JST [42953] LOG: totaltableinsertiontime =
15591.051 ms
2020-09-24 10:54:30.327 JST [61136] LOG: totaltableinsertiontime =
16700.954 ms
2020-09-24 10:55:38.377 JST [68719] LOG: totaltableinsertiontime =
15435.150 ms
2020-09-24 10:57:08.927 JST [83335] LOG: totaltableinsertiontime =
17133.251 ms
2020-09-24 10:58:17.420 JST [90905] LOG: totaltableinsertiontime =
15352.753 ms
Test results show that Parallel COPY with 1 worker is performing
better than normal COPY in the test scenarios run.
Good to know :)
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Have you tested your patch when encoding conversion is needed? If so,
could you please point out the email that has the test results.
We have not yet done encoding testing, we will do and post the results
separately in the coming days.
Hi Ashutosh,
I ran the tests ensuring pg_server_to_any() gets called from copy.c. I
specified the encoding option of COPY command, with client and server
encodings being UTF-8.
Tests are performed with custom postgresql.conf [1], 10 million rows, 5.2GB
data. The results are of the triplet form (exec time in sec, number of
workers, gain)
Use case 1: 2 indexes on integer columns, 1 index on text column
(1174.395, 0, 1X), (1127.792, 1, 1.04X), (644.260, 2, 1.82X), (341.284, 4,
3.43X), (204.423, 8, 5.74X), (140.692, 16, 8.34X), (129.843, 20, 9.04X),
(134.511, 30, 8.72X)
Use case 2: 1 gist index on text column
(811.412, 0, 1X), (772.203, 1, 1.05X), (437.364, 2, 1.85X), (263.575, 4,
3.08X), (175.135, 8, 4.63X), (155.355, 16, 5.22X), (178.704, 20, 4.54X),
(199.402, 30, 4.06X)
Use case 3: 3 indexes on integer columns
(220.680, 0, 1X), (185.096, 1, 1.19X), (134.811, 2, 1.64X), (114.585, 4,
1.92X), (107.707, 8, 2.05X), (101.253, 16, 2.18X), (100.749, 20, 2.19X),
(100.656, 30, 2.19X)
The results are similar to our earlier runs [2].
[1]:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
[2]: /messages/by-id/CALDaNm13zK=JXfZWqZJsm3+2yagYDJc=eJBgE4i77-4PPNj7vw@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Sep 24, 2020 at 3:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Have you tested your patch when encoding conversion is needed? If so,
could you please point out the email that has the test results.
We have not yet done encoding testing, we will do and post the results
separately in the coming days.
Hi Ashutosh,
I ran the tests ensuring pg_server_to_any() gets called from copy.c. I specified the encoding option of COPY command, with client and server encodings being UTF-8.
Thanks Bharath for the testing. The results look impressive.
--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
On Wed, Jul 22, 2020 at 7:48 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Jul 21, 2020 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Review comments:
===================
0001-Copy-code-readjustment-to-support-parallel-copy
1.
@@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate)
 	else
 		nbytes = 0;				/* no data need be saved */
+	if (cstate->copy_dest == COPY_NEW_FE)
+		minread = RAW_BUF_SIZE - nbytes;
+
 	inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
-						  1, RAW_BUF_SIZE - nbytes);
+						  minread, RAW_BUF_SIZE - nbytes);
No comment to explain why this change is done?
0002-Framework-for-leader-worker-in-parallel-copy
Currently CopyGetData copies a lesser amount of data to the buffer even though space is available in the buffer, because minread was passed as 1 to CopyGetData. Because of this there are frequent calls to CopyGetData for fetching the data. In this case it will load only some data due to the below check:
while (maxread > 0 && bytesread < minread && !cstate->reached_eof)
After reading some data, bytesread will be greater than minread (which is passed as 1) and the function returns with a lesser amount of data, even though there is still space.
This change is required for parallel copy feature as each time we get a new DSM data block which is of 64K size and copy the data. If we copy less data into DSM data blocks we might end up consuming all the DSM data blocks.
Why can't we reuse the DSM block which has unfilled space?
I felt this issue can be fixed as part of HEAD. Have posted a separate thread [1] for this. I'm planning to remove that change once it gets committed. Can that go as a separate
patch or should we include it here?
[1] - /messages/by-id/CALDaNm0v4CjmvSnftYnx_9pOS_dKRG=O3NnBgJsQmi0KipvLog@mail.gmail.com
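To make the minread behaviour concrete, here is a minimal sketch of the
read loop being described (not the actual copy.c code; SourceState and
source_read() are illustrative stand-ins):

static int
copy_get_data_sketch(SourceState *src, char *databuf, int minread, int maxread)
{
	int			bytesread = 0;

	/* Keep reading until at least minread bytes arrive, or EOF. */
	while (maxread > 0 && bytesread < minread && !src->reached_eof)
	{
		int			avail = source_read(src, databuf + bytesread, maxread);

		bytesread += avail;
		maxread -= avail;
	}

	/*
	 * With minread = 1, any short read satisfies the loop immediately, so
	 * callers see frequent small returns; with minread equal to the buffer
	 * size, the buffer is filled completely before returning (barring EOF).
	 */
	return bytesread;
}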
I am convinced by the reason given by Kyotaro-San in that another
thread [1] and performance data shown by Peter that this can't be an
independent improvement and rather in some cases it can do harm. Now,
if you need it for a parallel-copy path then we can change it
specifically to the parallel-copy code path but I don't understand
your reason completely.
2.
..
+ */
+typedef struct ParallelCopyLineBoundary
Are we doing all this state management to avoid using locks while
processing lines? If so, I think we can use either spinlock or LWLock
to keep the main patch simple and then provide a later patch to make
it lock-less. This will allow us to first focus on the main design of
the patch rather than trying to make this datastructure processing
lock-less in the best possible way.
The steps will be more or less the same if we use a spinlock too. Step 1, step 3 & step 4 will be common; we have to use lock & unlock instead of step 2 & step 5. I feel we can retain the current implementation.
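For reference, a minimal sketch of the spinlock-based variant under
discussion (illustrative only; the posted patch instead orders pg_atomic
reads/writes to stay lock-free):

typedef struct ParallelCopyLineBoundarySketch
{
	slock_t		mutex;			/* protects the fields below */
	uint32		first_block;	/* block containing the line start */
	uint32		start_offset;	/* offset of the line within the block */
	uint32		line_size;		/* -1 (cast to uint32) until known */
	uint32		line_state;		/* LINE_LEADER_POPULATING, ... */
} ParallelCopyLineBoundarySketch;

static void
LeaderPublishLine(ParallelCopyLineBoundarySketch *line, uint32 block,
				  uint32 offset, uint32 size)
{
	SpinLockAcquire(&line->mutex);
	line->first_block = block;
	line->start_offset = offset;
	line->line_size = size;
	line->line_state = LINE_LEADER_POPULATED;
	SpinLockRelease(&line->mutex);
}

The worker side would take the same spinlock to read line_state and
line_size together, which is what removes the ordering constraints.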
I'll study this in detail and let you know my opinion on the same but
in the meantime, I don't follow one part of this comment: "If they
don't follow this order the worker might process wrong line_size and
leader might populate the information which worker has not yet
processed or in the process of processing."
Do you want to say that leader might overwrite some information which
worker hasn't read yet? If so, it is not clear from the comment.
Another minor point about this comment:
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * Leader process will be populating data block, data block offset &
the size of
I think there should be a full-stop after worker instead of a comma.
6.
In function BeginParallelCopy(), you need to keep a provision to
collect wal_usage and buf_usage stats. See _bt_begin_parallel for
reference. Those will be required for pg_stat_statements.
Fixed
How did you ensure that this is fixed? Have you tested it, if so
please share the test? I see a basic problem with your fix.
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
You need to call InstrStartParallelQuery() before the actual operation
starts; without that, the stats won't be accurate. Also, after calling
WaitForParallelWorkersToFinish(), you need to accumulate the stats
collected from workers which neither you have done nor is possible
with the current code in your patch because you haven't made any
provision to capture them in BeginParallelCopy.
I suggest you look into lazy_parallel_vacuum_indexes() and
begin_parallel_vacuum() to understand how the buffer/wal usage stats
are accumulated. Also, please test this functionality using
pg_stat_statements.
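For reference, the pattern being suggested looks roughly like the sketch
below, modeled on the parallel vacuum code; pcxt, bufferusage and walusage
are assumed to be set up in BeginParallelCopy:

/* In each worker, bracket the actual copy work: */
InstrStartParallelQuery();
/* ... the worker performs its share of the COPY ... */
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
					  &walusage[ParallelWorkerNumber]);

/* In the leader, after WaitForParallelWorkersToFinish(pcxt): */
for (int i = 0; i < pcxt->nworkers_launched; i++)
	InstrAccumParallelQuery(&bufferusage[i], &walusage[i]);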
0003-Allow-copy-from-command-to-process-data-from-file-ST
10.
In the commit message, you have written "The leader does not
participate in the insertion of data, leaders only responsibility will
be to identify the lines as fast as possible for the workers to do the
actual copy operation. The leader waits till all the lines populated
are processed by the workers and exits."
I think you should also mention that we have chosen this design based
on the reason "that everything stalls if the leader doesn't accept
further input data, as well as when there are no available splitted
chunks so it doesn't seem like a good idea to have the leader do other
work. This is backed by the performance data where we have seen that
with 1 worker there is just a 5-10% (or whatever percentage difference
you have seen) performance difference)".Fixed.
Make it a one-paragraph starting from "The leader does not participate
in the insertion of data .... just a 5-10% performance difference".
Right now both the parts look a bit disconnected.
Few additional comments:
======================
v5-0001-Copy-code-readjustment-to-support-parallel-copy
---------------------------------------------------------------------------------
1.
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
I don't like this macro. I think it is sufficient to move the common
code to be called from the parallel and non-parallel path in
ClearEOLFromCopiedData but I think the other checks can be done
in-place. I think having macros for such a thing makes code less
readable.
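Written in-place, the call site would look like this (this matches what
the v6 patch attached below ends up doing):

if (!result && !IsHeaderLine())
	ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
						   cstate->line_buf.len, &cstate->line_buf.len);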
2.
-
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
Spurious line removal.
v5-0002-Framework-for-leader-worker-in-parallel-copy
---------------------------------------------------------------------------
3.
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
We already serialize FullTransactionId and CommandId via
InitializeParallelDSM->SerializeTransactionState. Can't we reuse it? I
think recently Parallel Insert patch has also done something for this
[2] so you can refer to that if you want.
v5-0004-Documentation-for-parallel-copy
-----------------------------------------------------------
1. Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers.
No need for space before integer.
[1]: /messages/by-id/20200911.155804.359271394064499501.horikyota.ntt@gmail.com
[2]: /messages/by-id/CAJcOf-fn1nhEtaU91NvRuA3EbvbJGACMd4_c+Uu3XU5VMv37Aw@mail.gmail.com
--
With Regards,
Amit Kapila.
On Tue, Sep 22, 2020 at 2:44 PM vignesh C <vignesh21@gmail.com> wrote:
Thanks Ashutosh for your comments.
On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Vignesh,
I've spent some time today looking at your new set of patches and I've
some thoughts and queries which I would like to put here:
Why are these not part of the shared cstate structure?
SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
I have used shared_cstate mainly to share the integer & bool data
types from the leader to worker process. The above data types are of
char* data type, I will not be able to use it like how I could do it
for integer type. So I preferred to send these as separate keys to the
worker. Thoughts?
I think the way you have written will work but if we go with
Ashutosh's proposal it will look elegant and in the future, if we need
to share more strings as part of cstate structure then that would be
easier. You can probably refer to EstimateParamListSpace,
SerializeParamList, and RestoreParamList to see how we can share
different types of data in one key.
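A sketch of what that proposal could look like, packing all four strings
under a single DSM key in the spirit of SerializeParamList
(PARALLEL_COPY_KEY_CSTATE_STRINGS is a hypothetical key, not from the
posted patch):

static void
SerializeCopyStrings(ParallelContext *pcxt, CopyState cstate)
{
	const char *strs[] = {cstate->null_print, cstate->delim,
						  cstate->quote, cstate->escape};
	Size		len = 0;
	char	   *start;
	char	   *ptr;

	for (int i = 0; i < lengthof(strs); i++)
		len += strlen(strs[i]) + 1;

	start = ptr = shm_toc_allocate(pcxt->toc, len);
	for (int i = 0; i < lengthof(strs); i++)
	{
		Size		n = strlen(strs[i]) + 1;

		/* Lay the strings out back to back; workers walk the same order. */
		memcpy(ptr, strs[i], n);
		ptr += n;
	}
	shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE_STRINGS, start);
}

The corresponding shm_toc_estimate_chunk()/shm_toc_estimate_keys() calls
would have to account for this before InitializeParallelDSM(), the same
way EstimateParamListSpace does for parameter lists.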
--
With Regards,
Amit Kapila.
On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Sep 22, 2020 at 2:44 PM vignesh C <vignesh21@gmail.com> wrote:
Thanks Ashutosh for your comments.
On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Vignesh,
I've spent some time today looking at your new set of patches and I've
some thoughts and queries which I would like to put here:
Why are these not part of the shared cstate structure?
SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);
I have used shared_cstate mainly to share the integer & bool data
types from the leader to worker process. The above data types are of
char* data type, I will not be able to use it like how I could do it
for integer type. So I preferred to send these as separate keys to the
worker. Thoughts?
I think the way you have written will work but if we go with
Ashutosh's proposal it will look elegant and in the future, if we need
to share more strings as part of cstate structure then that would be
easier. You can probably refer to EstimateParamListSpace,
SerializeParamList, and RestoreParamList to see how we can share
different types of data in one key.
Yeah. And in addition to that it will also reduce the number of DSM
keys that we need to maintain.
--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
Hi Vignesh and Bharath,
Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as
parallel-unsafe.
Can you explain why this is?
Regards,
Greg Nancarrow
Fujitsu Australia
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few additional comments:
======================
Some more comments:
v5-0002-Framework-for-leader-worker-in-parallel-copy
===========================================
1.
These values
+ * help in handover of multiple records with significant size of data to be
+ * processed by each of the workers to make sure there is no context
switch & the
+ * work is fairly distributed among the workers.
How about writing it as: "These values help in the handover of
multiple records with the significant size of data to be processed by
each of the workers. This also ensures there is no context switch and
the work is fairly distributed among the workers."
2. Can we keep WORKER_CHUNK_COUNT, MAX_BLOCKS_COUNT, and RINGSIZE as
power-of-two? Say WORKER_CHUNK_COUNT as 64, MAX_BLOCK_COUNT as 1024,
and accordingly choose RINGSIZE. At many places, we do that way. I
think it can sometimes help in faster processing due to cache size
requirements and in this case, I don't see a reason why we can't
choose these values to be power-of-two. If you agree with this change
then also do some performance testing after this change?
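Concretely, the suggestion amounts to something like this (values are
illustrative, not from the patch):

#define WORKER_CHUNK_COUNT	64		/* power of two */
#define MAX_BLOCKS_COUNT	1024	/* power of two */
#define RINGSIZE			(MAX_BLOCKS_COUNT * 16)	/* still a power of two */

/* With a power-of-two ring size, modulo becomes a cheap bit-mask. */
#define NEXT_RING_POS(pos)	(((pos) + 1) & (RINGSIZE - 1))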
3.
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+} ParallelCopyDataBlock;
Is there a reason to keep skip_bytes after data? Normally the variable
size data is at the end of the structure. Also, there is no comment
explaining the purpose of skip_bytes.
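Reordered as suggested, with a comment added (my reading of skip_bytes;
the patch should confirm it), the struct would look like:

typedef struct ParallelCopyDataBlock
{
	bool		curr_blk_completed;
	uint8		skip_bytes;		/* assumed: trailing bytes of this block to
								 * skip, e.g. for a multi-byte character
								 * split across two blocks */
	char		data[DATA_BLOCK_SIZE];	/* data read from file */
} ParallelCopyDataBlock;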
4.
+ * Copy data block information.
+ * ParallelCopyDataBlock's will be created in DSM. Data read from file will be
+ * copied in these DSM data blocks. The leader process identifies the records
+ * and the record information will be shared to the workers. The workers will
+ * insert the records into the table. There can be one or more number
of records
+ * in each of the data block based on the record size.
+ */
+typedef struct ParallelCopyDataBlock
Keep one empty line after the description line like below. I also
suggested to do a minor tweak in the above sentence which is as
follows:
* Copy data block information.
*
* These data blocks are created in DSM. Data read ...
Try to follow a similar format in other comments as well.
5. I think it is better to move parallelism related code to a new file
(we can name it as copyParallel.c or something like that).
6. copy.c(1648,25): warning C4133: 'function': incompatible types -
from 'ParallelCopyLineState *' to 'uint32 *'
Getting above compilation warning on Windows.
v5-0003-Allow-copy-from-command-to-process-data-from-file
==================================================
1.
@@ -4294,7 +5047,7 @@ BeginCopyFrom(ParseState *pstate,
* only in text mode.
*/
initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *)
palloc(RAW_BUF_SIZE + 1);
Is there anyway IsParallelCopy can be true by this time? AFAICS, we do
anything about parallelism after this. If you want to save this
allocation then we need to move this after we determine that
parallelism can be used or not and accordingly the below code in the
patch needs to be changed.
* ParallelCopyFrom - parallel copy leader's functionality.
*
* Leader executes the before statement for before statement trigger, if before
@@ -1110,8 +1547,302 @@ ParallelCopyFrom(CopyState cstate)
ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* raw_buf is not used in parallel copy, instead data blocks are used.*/
+ pfree(cstate->raw_buf);
+ cstate->raw_buf = NULL;
Is there anything else also the allocation of which depends on parallelism?
2.
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /* Check if copy is into foreign table or temporary table. */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /* Check if trigger function is parallel safe. */
+ if (cstate->rel->trigdesc != NULL &&
+ !IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * Check if there is after statement or instead of trigger or transition
+ * table triggers.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_instead_row ||
+ cstate->rel->trigdesc->trig_insert_new_table))
+ return false;
+
+ /* Check if the volatile expressions are parallel safe, if present any. */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check if the insertion mode is single. */
+ if (FindInsertMethod(cstate) == CIM_SINGLE)
+ return false;
+
+ return true;
+}
In the comments, we should write why parallelism is not allowed for a
particular case. The cases where parallel-unsafe clause is involved
are okay but it is not clear from comments why it is not allowed in
other cases.
3.
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_first_block = pcshared_info->cur_block_pos;
+ line_pos = UpdateBlockInLineInfo(cstate,
+ line_first_block,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING);
+ lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ line_first_block, lineInfo->start_offset, line_pos);
Can we take all the code here inside function UpdateBlockInLineInfo? I
see that it is called from one other place but I guess most of the
surrounding code there can also be moved inside the function. Can we
change the name of the function to UpdateSharedLineInfo or something
like that and remove inline marking from this? I am not sure we want
to inline such big functions. If it make difference in performance
then we can probably consider it.
4.
EndLineParallelCopy()
{
..
+ /* Update line size. */
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+ elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+ line_pos, line_size);
..
}
Can we instead call UpdateSharedLineInfo (new function name for
UpdateBlockInLineInfo) to do this and maybe see it only updates the
required info? The idea is to centralize the code for updating
SharedLineInfo.
5.
+static uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
It seems to me that each worker has to hop through all the processed
chunks before getting the chunk which it can process. This will work
but I think it is better if we have some shared counter which can tell
us the next chunk to be processed and avoid all the unnecessary work
of hopping to find the exact position.
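The shared-counter idea could be as simple as the sketch below, where
next_chunk is a hypothetical new pg_atomic_uint32 field in
ParallelCopyShmInfo:

/* Each worker atomically claims the next ring position instead of hopping. */
static uint32
GetNextChunkToProcess(ParallelCopyShmInfo *pcshared_info)
{
	return pg_atomic_fetch_add_u32(&pcshared_info->next_chunk, 1) % RINGSIZE;
}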
v5-0004-Documentation-for-parallel-copy
-----------------------------------------
1. Can you add one or two examples towards the end of the page where
we have examples for other Copy options?
Please run pgindent on all patches as that will make the code look better.
From the testing perspective,
1. Test by having something force_parallel_mode = regress which means
that all existing Copy tests in the regression will be executed via
new worker code. You can have this as a test-only patch for now and
make sure all existing tests passed with this.
2. Do we have tests for toast tables? I think if you implement the
previous point some existing tests might cover it but I feel we should
have at least one or two tests for the same.
3. Have we checked the code coverage of the newly added code with
existing tests?
--
With Regards,
Amit Kapila.
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few additional comments:
======================
Some more comments:
Thanks Amit for the comments, I will work on the comments and provide
a patch in the next few days.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Sep 29, 2020 at 3:16 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
Hi Vignesh and Bharath,
Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as
parallel-unsafe.
Can you explain why this is?
I don't think we need to restrict this case and even if there is some
reason to do so then probably the same should be mentioned in the
comments.
--
With Regards,
Amit Kapila.
Hello Vignesh,
I've done some basic benchmarking on the v4 version of the patches (but
AFAIKC the v5 should perform about the same), and some initial review.
For the benchmarking, I used the lineitem table from TPC-H - for 75GB
data set, this largest table is about 64GB once loaded, with another
54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and
NVME storage.
The COPY duration with varying number of workers (specified using the
parallel COPY option) looks like this:
workers duration
---------------------
0 1366
1 1255
2 704
3 526
4 434
5 385
6 347
7 322
8 327
So this seems to work pretty well - initially we get almost linear
speedup, then it slows down (likely due to contention for locks, I/O
etc.). Not bad.
I've only done a quick review, but overall the patch looks in fairly
good shape.
1) I don't quite understand why we need INCREMENTPROCESSED and
RETURNPROCESSED, considering it just does ++ or return. It just
obfuscates the code, I think.
2) I find it somewhat strange that BeginParallelCopy can just decide not
to do parallel copy after all. Why not to do this decisions in the
caller? Or maybe it's fine this way, not sure.
3) AFAIK we don't modify typedefs.list in patches, so these changes
should be removed.
4) IsTriggerFunctionParallelSafe actually checks all triggers, not just
one, so the comment needs minor rewording.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Oct 3, 2020 at 6:20 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hello Vignesh,
I've done some basic benchmarking on the v4 version of the patches (but
AFAIKC the v5 should perform about the same), and some initial review.
For the benchmarking, I used the lineitem table from TPC-H - for 75GB
data set, this largest table is about 64GB once loaded, with another
54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and
NVME storage.
The COPY duration with varying number of workers (specified using the
parallel COPY option) looks like this:
workers duration
---------------------
0 1366
1 1255
2 704
3 526
4 434
5 385
6 347
7 322
8 327
So this seems to work pretty well - initially we get almost linear
speedup, then it slows down (likely due to contention for locks, I/O
etc.). Not bad.
+1. These numbers (> 4x speed up) look good to me.
--
With Regards,
Amit Kapila.
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Jul 22, 2020 at 7:48 PM vignesh C <vignesh21@gmail.com> wrote:
On Tue, Jul 21, 2020 at 3:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Review comments:
===================
0001-Copy-code-readjustment-to-support-parallel-copy
1.
@@ -807,8 +835,11 @@ CopyLoadRawBuf(CopyState cstate)
 	else
 		nbytes = 0;				/* no data need be saved */
+	if (cstate->copy_dest == COPY_NEW_FE)
+		minread = RAW_BUF_SIZE - nbytes;
+
 	inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
-						  1, RAW_BUF_SIZE - nbytes);
+						  minread, RAW_BUF_SIZE - nbytes);
No comment to explain why this change is done?
0002-Framework-for-leader-worker-in-parallel-copy
Currently CopyGetData copies a lesser amount of data to the buffer even though space is available in the buffer, because minread was passed as 1 to CopyGetData. Because of this there are frequent calls to CopyGetData for fetching the data. In this case it will load only some data due to the below check:
while (maxread > 0 && bytesread < minread && !cstate->reached_eof)
After reading some data, bytesread will be greater than minread (which is passed as 1) and the function returns with a lesser amount of data, even though there is still space.
This change is required for parallel copy feature as each time we get a new DSM data block which is of 64K size and copy the data. If we copy less data into DSM data blocks we might end up consuming all the DSM data blocks.
Why can't we reuse the DSM block which has unfilled space?
I felt this issue can be fixed as part of HEAD. Have posted a separate thread [1] for this. I'm planning to remove that change once it gets committed. Can that go as a separate
patch or should we include it here?
[1] - /messages/by-id/CALDaNm0v4CjmvSnftYnx_9pOS_dKRG=O3NnBgJsQmi0KipvLog@mail.gmail.com
I am convinced by the reason given by Kyotaro-San in that another
thread [1] and performance data shown by Peter that this can't be an
independent improvement and rather in some cases it can do harm. Now,
if you need it for a parallel-copy path then we can change it
specifically to the parallel-copy code path but I don't understand
your reason completely.
Whenever we need data to be populated, we will get a new data block &
pass it to CopyGetData to populate the data. In case of file copy, the
server will completely fill the data block whenever that much data is
available; there is no scenario where a partial data block is returned
while data is still present, except at EOF or when no data is
available. But in case of STDIN data copy, even though there is 8K of
space available in the data block & 8K of data available in STDIN,
CopyGetData will return as soon as the libpq buffer holds more data
than minread. We pass a new data block every time to load data, so
each time we pass an 8K data block, CopyGetData loads only a few bytes
into it & returns. I wanted to keep the same data population logic for
both file copy & STDIN copy, i.e. fill the full 8K data block & then
let the populated data be processed. There is an alternative solution:
I can have some special handling in case of STDIN wherein the existing
data block can be passed with the index from where the data should be
copied. Thoughts?
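In other words, the leader-side fill step looks roughly like the sketch
below (GetFreeDataBlock() and the bytes field are illustrative names, not
necessarily the patch's):

static void
FillNextDataBlock(CopyState cstate)
{
	ParallelCopyDataBlock *block = GetFreeDataBlock(cstate);
	int			minread = 1;

	/*
	 * For COPY_NEW_FE, ask for a full block; otherwise CopyGetData would
	 * return after the first short libpq read. File copy fills the block
	 * whenever that much data exists anyway.
	 */
	if (cstate->copy_dest == COPY_NEW_FE)
		minread = DATA_BLOCK_SIZE;

	block->bytes = CopyGetData(cstate, block->data, minread, DATA_BLOCK_SIZE);
}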
2.
..
+ */
+typedef struct ParallelCopyLineBoundary
Are we doing all this state management to avoid using locks while
processing lines? If so, I think we can use either spinlock or LWLock
to keep the main patch simple and then provide a later patch to make
it lock-less. This will allow us to first focus on the main design of
the patch rather than trying to make this datastructure processing
lock-less in the best possible way.
The steps will be more or less the same if we use a spinlock too. Step 1, step 3 & step 4 will be common; we have to use lock & unlock instead of step 2 & step 5. I feel we can retain the current implementation.
I'll study this in detail and let you know my opinion on the same but
in the meantime, I don't follow one part of this comment: "If they
don't follow this order the worker might process wrong line_size and
leader might populate the information which worker has not yet
processed or in the process of processing."
Do you want to say that leader might overwrite some information which
worker hasn't read yet? If so, it is not clear from the comment.
Another minor point about this comment:
Here leader and worker must follow these steps to avoid any corruption
or hang issue. Changed it to:
* The leader & worker process access the shared line information by following
* the below steps to avoid any data corruption or hang:
+ * ParallelCopyLineBoundary is common data structure between leader & worker,
+ * Leader process will be populating data block, data block offset & the size of
I think there should be a full-stop after worker instead of a comma.
Changed it.
6.
In function BeginParallelCopy(), you need to keep a provision to
collect wal_usage and buf_usage stats. See _bt_begin_parallel for
reference. Those will be required for pg_stat_statements.
Fixed
How did you ensure that this is fixed? Have you tested it, if so
please share the test? I see a basic problem with your fix.
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
You need to call InstrStartParallelQuery() before the actual operation
starts; without that, the stats won't be accurate. Also, after calling
WaitForParallelWorkersToFinish(), you need to accumulate the stats
collected from workers which neither you have done nor is possible
with the current code in your patch because you haven't made any
provision to capture them in BeginParallelCopy.
I suggest you look into lazy_parallel_vacuum_indexes() and
begin_parallel_vacuum() to understand how the buffer/wal usage stats
are accumulated. Also, please test this functionality using
pg_stat_statements.
Made changes accordingly.
I have verified it using:
postgres=# select * from pg_stat_statements where query like '%copy%';
-[ RECORD 1 ]-------+----------------------------------------------------------
userid              | 10
dbid                | 13743
queryid             | -6947756673093447609
query               | copy hw from '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format csv, delimiter ',')
calls               | 1
total_exec_time     | 265.195105
rows                | 175000
shared_blks_hit     | 1916
shared_blks_read    | 0
shared_blks_dirtied | 946
shared_blks_written | 946
wal_records         | 1116
wal_fpi             | 0
wal_bytes           | 3587203
-[ RECORD 2 ]-------+----------------------------------------------------------
userid              | 10
dbid                | 13743
queryid             | 8570215596364326047
query               | copy hw from '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format csv, delimiter ',', parallel '2')
calls               | 1
total_exec_time     | 35668.402482
rows                | 175000
shared_blks_hit     | 3101
shared_blks_read    | 36
shared_blks_dirtied | 952
shared_blks_written | 919
wal_records         | 1119
wal_fpi             | 6
wal_bytes           | 3624405
(2 rows)
(The remaining columns - plans, the plan-time stats, min/max/mean exec time
(equal to total_exec_time for a single call), stddev_exec_time, the local
and temp block counters, blk_read_time and blk_write_time - were zero in
both records and are omitted here for readability.)
0003-Allow-copy-from-command-to-process-data-from-file-ST
10.
In the commit message, you have written "The leader does not
participate in the insertion of data, leaders only responsibility will
be to identify the lines as fast as possible for the workers to do the
actual copy operation. The leader waits till all the lines populated
are processed by the workers and exits."
I think you should also mention that we have chosen this design based
on the reason "that everything stalls if the leader doesn't accept
further input data, as well as when there are no available splitted
chunks so it doesn't seem like a good idea to have the leader do other
work. This is backed by the performance data where we have seen that
with 1 worker there is just a 5-10% (or whatever percentage difference
you have seen) performance difference)".Fixed.
Make it a one-paragraph starting from "The leader does not participate
in the insertion of data .... just a 5-10% performance difference".
Right now both the parts look a bit disconnected.
Made the contents starting from "The leader does not" in a paragraph.
Few additional comments:
======================
v5-0001-Copy-code-readjustment-to-support-parallel-copy
---------------------------------------------------------------------------------
1.
+/*
+ * CLEAR_EOL_LINE - Wrapper for clearing EOL.
+ */
+#define CLEAR_EOL_LINE() \
+if (!result && !IsHeaderLine()) \
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data, \
+ cstate->line_buf.len, \
+ &cstate->line_buf.len) \
I don't like this macro. I think it is sufficient to move the common
code to be called from the parallel and non-parallel path in
ClearEOLFromCopiedData but I think the other checks can be done
in-place. I think having macros for such a thing makes code less
readable.
I have removed the macro & called ClearEOLFromCopiedData directly
wherever required.
2.
-
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
Spurious line removal.
I have modified it to keep it as it is.
v5-0002-Framework-for-leader-worker-in-parallel-copy
---------------------------------------------------------------------------
3.
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
We already serialize FullTransactionId and CommandId via
InitializeParallelDSM->SerializeTransactionState. Can't we reuse it? I
think recently Parallel Insert patch has also done something for this
[2] so you can refer that if you want.
Changed it to remove setting of command id & full transaction id.
Added a function SetCurrentCommandIdUsedForWorker to set
currentCommandIdUsed to true & called GetCurrentCommandId by passing
!IsParallelCopy().
v5-0004-Documentation-for-parallel-copy
-----------------------------------------------------------
1. Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter"> integer</replaceable> background workers.
No need for space before integer.
I have removed it.
Attached v6 patch with the fixes.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v6-0001-Copy-code-readjustment-to-support-parallel-copy.patch (application/x-patch)
From 2f6fda276f191a3b7a15c07c51199a154530ed09 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:52:48 +0530
Subject: [PATCH v6 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated to functions/macros, these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText, this change was required because in case
of parallel copy the record identification and record updation is done in
CopyReadLineText, before record information is updated in shared memory the new
line characters should be removed.
---
src/backend/commands/copy.c | 356 +++++++++++++++++++++++++++-----------------
1 file changed, 218 insertions(+), 138 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2047557..f2848a1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -224,7 +227,6 @@ typedef struct CopyStateData
* appropriate amounts of data from this buffer. In both modes, we
* guarantee that there is a \0 at raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -354,6 +356,18 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * Get the lines processed.
+ */
+#define RETURNPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -401,6 +415,12 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
+
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -801,14 +821,18 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes = RAW_BUF_BYTES(cstate);
int inbytes;
+ int minread = 1;
/* Copy down the unprocessed data if any. */
if (nbytes > 0)
memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
nbytes);
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1514,7 +1538,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1680,6 +1703,25 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateCommonCstateInfo(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateCommonCstateInfo
+ *
+ * Populates the common variables required for copy from operation. This is a
+ * helper function for BeginCopy function.
+ */
+static void
+PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1799,12 +1841,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2696,32 +2732,13 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * CheckTargetRelValidity
+ *
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2758,27 +2775,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2816,9 +2812,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+ CheckTargetRelValidity(cstate);
+
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3311,7 +3359,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3366,30 +3414,17 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ RETURNPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
+ * PopulateCstateCatalogInfo
*
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * Populate the cstate catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCstateCatalogInfo(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3399,38 +3434,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf and
- * raw_buf are used in both text and binary modes, but we use line_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3508,6 +3513,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCstateCatalogInfo(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3917,40 +3977,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
+
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData
+ *
+ * Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker to
+ * line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
{
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
}
+}
+/*
+ * ConvertToServerEncoding
+ *
+ * Convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
@@ -3967,11 +4047,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4334,6 +4411,9 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ if (!result && !IsHeaderLine())
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
+ cstate->line_buf.len, &cstate->line_buf.len);
return result;
}
--
1.8.3.1
v6-0002-Framework-for-leader-worker-in-parallel-copy.patch (application/x-patch)
From 67e5240af5ebe803473acebaf0e8796fd2a05cdd Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 7 Oct 2020 17:18:17 +0530
Subject: [PATCH v6 2/6] Framework for leader/worker in parallel copy
This patch has the framework for data structures in parallel copy, leader
initialization, worker initialization, shared memory updation, starting workers,
wait for workers and workers exiting.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/Makefile | 1 +
src/backend/commands/copy.c | 235 ++++++--------------
src/include/commands/copy.h | 389 +++++++++++++++++++++++++++++++++-
src/tools/pgindent/typedefs.list | 7 +
5 files changed, 469 insertions(+), 167 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3..a224aac 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -24,6 +24,7 @@ OBJS = \
constraint.o \
conversioncmds.o \
copy.o \
+ copyparallel.o \
createas.o \
dbcommands.o \
define.o \
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f2848a1..1e55a30 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -29,7 +29,6 @@
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
-#include "commands/trigger.h"
#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
@@ -63,29 +62,6 @@
#define OCTVALUE(c) ((c) - '0')
/*
- * Represents the different source/dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
- COPY_FILE, /* to/from file (or a piped program) */
- COPY_OLD_FE, /* to/from frontend (2.0 protocol) */
- COPY_NEW_FE, /* to/from frontend (3.0 protocol) */
- COPY_CALLBACK /* to/from callback function */
-} CopyDest;
-
-/*
- * Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
- EOL_UNKNOWN,
- EOL_NL,
- EOL_CR,
- EOL_CRNL
-} EolType;
-
-/*
* Represents the heap insert method to be used during COPY FROM.
*/
typedef enum CopyInsertMethod
@@ -95,145 +71,10 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
-/*
- * This struct contains all the state variables used throughout a COPY
- * operation. For simplicity, we use the same struct for all variants of COPY,
- * even though some fields are used in only some cases.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyStateData
-{
- /* low-level state data */
- CopyDest copy_dest; /* type of copy source/destination */
- FILE *copy_file; /* used if copy_dest == COPY_FILE */
- StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
- * dest == COPY_NEW_FE in COPY FROM */
- bool is_copy_from; /* COPY TO, or COPY FROM? */
- bool reached_eof; /* true if we read to end of copy data (not
- * all copy_dest types maintain this) */
- EolType eol_type; /* EOL type of input */
- int file_encoding; /* file or remote side's character encoding */
- bool need_transcoding; /* file encoding diff from server? */
- bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
-
- /* parameters from the COPY command */
- Relation rel; /* relation to copy to or from */
- QueryDesc *queryDesc; /* executable query to copy from */
- List *attnumlist; /* integer list of attnums to copy */
- char *filename; /* filename, or NULL for STDIN/STDOUT */
- bool is_program; /* is 'filename' a program to popen? */
- copy_data_source_cb data_source_cb; /* function for reading data */
- bool binary; /* binary format? */
- bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
- bool header_line; /* CSV header line? */
- char *null_print; /* NULL marker string (server encoding!) */
- int null_print_len; /* length of same */
- char *null_print_client; /* same converted to file encoding */
- char *delim; /* column delimiter (must be 1 byte) */
- char *quote; /* CSV quote char (must be 1 byte) */
- char *escape; /* CSV escape char (must be 1 byte) */
- List *force_quote; /* list of column names */
- bool force_quote_all; /* FORCE_QUOTE *? */
- bool *force_quote_flags; /* per-column CSV FQ flags */
- List *force_notnull; /* list of column names */
- bool *force_notnull_flags; /* per-column CSV FNN flags */
- List *force_null; /* list of column names */
- bool *force_null_flags; /* per-column CSV FN flags */
- bool convert_selectively; /* do selective binary conversion? */
- List *convert_select; /* list of column names (can be NIL) */
- bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
- Node *whereClause; /* WHERE condition (or NULL) */
-
- /* these are just for error messages, see CopyFromErrorCallback */
- const char *cur_relname; /* table name for error messages */
- uint64 cur_lineno; /* line number for error messages */
- const char *cur_attname; /* current att for error messages */
- const char *cur_attval; /* current att value for error messages */
-
- /*
- * Working state for COPY TO/FROM
- */
- MemoryContext copycontext; /* per-copy execution context */
-
- /*
- * Working state for COPY TO
- */
- FmgrInfo *out_functions; /* lookup info for output functions */
- MemoryContext rowcontext; /* per-row evaluation context */
-
- /*
- * Working state for COPY FROM
- */
- AttrNumber num_defaults;
- FmgrInfo *in_functions; /* array of input functions for each attrs */
- Oid *typioparams; /* array of element types for in_functions */
- int *defmap; /* array of default att numbers */
- ExprState **defexprs; /* array of default att expressions */
- bool volatile_defexprs; /* is any of defexprs volatile? */
- List *range_table;
- ExprState *qualexpr;
-
- TransitionCaptureState *transition_capture;
-
- /*
- * These variables are used to reduce overhead in COPY FROM.
- *
- * attribute_buf holds the separated, de-escaped text for each field of
- * the current line. The CopyReadAttributes functions return arrays of
- * pointers into this buffer. We avoid palloc/pfree overhead by re-using
- * the buffer on each cycle.
- *
- * In binary COPY FROM, attribute_buf holds the binary data for the
- * current field, but the usage is otherwise similar.
- */
- StringInfoData attribute_buf;
-
- /* field raw data pointers found by COPY FROM */
-
- int max_fields;
- char **raw_fields;
-
- /*
- * Similarly, line_buf holds the whole input line being processed. The
- * input cycle is first to read the whole line into line_buf, convert it
- * to server encoding there, and then extract the individual attribute
- * fields into attribute_buf. line_buf is preserved unmodified so that we
- * can display it in error messages if appropriate. (In binary mode,
- * line_buf is not used.)
- */
- StringInfoData line_buf;
- bool line_buf_converted; /* converted to server encoding? */
- bool line_buf_valid; /* contains the row being processed? */
-
- /*
- * Finally, raw_buf holds raw data read from the data source (file or
- * client connection). In text mode, CopyReadLine parses this data
- * sufficiently to locate line boundaries, then transfers the data to
- * line_buf and converts it. In binary mode, CopyReadBinaryData fetches
- * appropriate amounts of data from this buffer. In both modes, we
- * guarantee that there is a \0 at raw_buf[raw_buf_len].
- */
- char *raw_buf;
- int raw_buf_index; /* next byte to process */
- int raw_buf_len; /* total # of bytes stored */
- /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-} CopyStateData;
-
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -415,8 +256,6 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
- List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
@@ -1134,6 +973,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
+
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1143,7 +984,35 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ int i;
+
+ ParallelCopyFrom(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+
+ /*
+ * Next, accumulate WAL usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&cstate->pcdata->bufferusage[i],
+ &cstate->pcdata->walusage[i]);
+
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1192,6 +1061,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1360,6 +1230,39 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ int val;
+ bool parsed;
+ char *strval;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option is supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ strval = defGetString(defel);
+ parsed = parse_int(strval, &val, 0, NULL);
+ if (!parsed)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for integer option \"%s\": %s",
+ defel->defname, strval)));
+ if (val < 1 || val > 1024)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("value %s out of bounds for option \"%s\"",
+ strval, defel->defname),
+ errdetail("Valid values are between \"%d\" and \"%d\".",
+ 1, 1024)));
+ cstate->nworkers = val;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1715,9 +1618,9 @@ BeginCopy(ParseState *pstate,
* PopulateCommonCstateInfo
*
* Populates the common variables required for copy from operation. This is a
- * helper function for BeginCopy function.
+ * helper function for the BeginCopy & InitializeParallelCopyInfo functions.
*/
-static void
+void
PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..cd2d56e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,14 +14,394 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
+#include "commands/trigger.h"
+#include "executor/executor.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The macros DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT size the shared
+ * structures that hold the records read from the file before they are
+ * inserted into the relation. These values allow a significant amount of
+ * data to be handed over to each worker at a time, which avoids context
+ * switches and distributes the work fairly among the workers. These numbers
+ * showed the best results in the performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* The DSM can hold up to 1023 blocks of 64K data to be processed by the workers. */
+#define MAX_BLOCKS_COUNT 1024
+
+/*
+ * The ring can hold information for up to 10240 records for the workers to
+ * process. RINGSIZE should be a multiple of WORKER_CHUNK_COUNT, as the
+ * wrap-around case is currently not handled when a worker selects its
+ * WORKER_CHUNK_COUNT of lines.
+ */
+#define RINGSIZE (10 * 1024)
+
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT records from the DSM data
+ * blocks at a time, to reduce lock contention. Read the RINGSIZE comments
+ * before changing this value.
+ */
+#define WORKER_CHUNK_COUNT 64
+
+/*
+ * Represents the different source/dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+ COPY_FILE, /* to/from file (or a piped program) */
+ COPY_OLD_FE, /* to/from frontend (2.0 protocol) */
+ COPY_NEW_FE, /* to/from frontend (3.0 protocol) */
+ COPY_CALLBACK /* to/from callback function */
+} CopyDest;
+
+/*
+ * Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+ EOL_UNKNOWN,
+ EOL_NL,
+ EOL_CR,
+ EOL_CRNL
+} EolType;
+
+/*
+ * Copy data block information.
+ *
+ * These data blocks are created in DSM. Data read from the file will be
+ * copied into these DSM data blocks. The leader process identifies the
+ * records, and the record information is shared with the workers, which then
+ * insert the records into the table. Each data block can contain one or more
+ * records, depending on the record size.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line's data continues into another block,
+ * following_block holds the position from which the remaining data needs
+ * to be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag is set when the leader determines that this block can be
+ * safely read by a worker. It lets a worker start processing early on a
+ * line that is spread across many blocks, without waiting for the
+ * complete line to be populated.
+ */
+ bool curr_blk_completed;
+
+ /*
+ * A few bytes may need to be skipped from this block. This is set when a
+ * sequence of characters like \r\n is expected but the end of our block
+ * contained only \r. In this case we copy the data from \r onward into
+ * the new block, as those bytes have to be processed together to identify
+ * the end of line. The worker uses skip_bytes to know that this data must
+ * be skipped in this data block.
+ */
+ uint8 skip_bytes;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+} ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ *
+ * ParallelCopyLineBoundary is a data structure shared between the leader &
+ * the workers. The leader process populates the data block, the data block
+ * offset & the size of the record in DSM, for the workers to copy the data
+ * into the relation.
+ * The leader & worker processes access the shared line information by
+ * following the below steps to avoid any data corruption or hang:
+ * Leader should operate in the following order:
+ * 1) check if line_size is -1; if it is not -1, wait until the worker sets
+ * it to -1, since any other value means the worker is still processing
+ * this line.
+ * 2) set line_state to LINE_LEADER_POPULATING, so that the worker knows that
+ * the leader is populating this line.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * Worker should operate in the following order:
+ * 1) check that line_state is LINE_LEADER_POPULATED; if not, the leader is
+ * still populating the data.
+ * 2) read line_size to know the size of the data.
+ * 3) only one worker should choose one line for processing; this is handled
+ * by using pg_atomic_compare_exchange_u32, where the worker changes the
+ * state to LINE_WORKER_PROCESSING only if line_state is
+ * LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size bytes of data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled
+ * completely, 0 means an empty line, and >0 means the line is filled
+ * with line_size bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+} ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Number of lines actually inserted by the workers; this will not be the
+ * same as total_worker_processed if a WHERE condition is specified with
+ * the copy. This counts only the records actually inserted into the
+ * relation.
+ */
+ pg_atomic_uint64 processed;
+
+ /*
+ * The number of records processed by the workers so far; this also
+ * includes the records that were filtered out because of the WHERE
+ * clause.
+ */
+ pg_atomic_uint64 total_worker_processed;
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBuf;
+
+/*
+ * This structure stores the common data from CopyStateData that is required
+ * by the workers. This information is serialized into the DSM for the
+ * workers to retrieve and copy into their own CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+} SerializedParallelCopyState;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array; workers copy lines here and release them so
+ * that the leader can continue populating.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+} ParallelCopyData;
+
+typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+
+/*
+ * This struct contains all the state variables used throughout a COPY
+ * operation. For simplicity, we use the same struct for all variants of COPY,
+ * even though some fields are used in only some cases.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyStateData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ FILE *copy_file; /* used if copy_dest == COPY_FILE */
+ StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
+ * dest == COPY_NEW_FE in COPY FROM */
+ bool is_copy_from; /* COPY TO, or COPY FROM? */
+ bool reached_eof; /* true if we read to end of copy data (not
+ * all copy_dest types maintain this) */
+ EolType eol_type; /* EOL type of input */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ Relation rel; /* relation to copy to or from */
+ QueryDesc *queryDesc; /* executable query to copy from */
+ List *attnumlist; /* integer list of attnums to copy */
+ char *filename; /* filename, or NULL for STDIN/STDOUT */
+ bool is_program; /* is 'filename' a program to popen? */
+ copy_data_source_cb data_source_cb; /* function for reading data */
+ bool binary; /* binary format? */
+ bool freeze; /* freeze rows on loading? */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ char *null_print; /* NULL marker string (server encoding!) */
+ int null_print_len; /* length of same */
+ char *null_print_client; /* same converted to file encoding */
+ char *delim; /* column delimiter (must be 1 byte) */
+ char *quote; /* CSV quote char (must be 1 byte) */
+ char *escape; /* CSV escape char (must be 1 byte) */
+ List *force_quote; /* list of column names */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool *force_quote_flags; /* per-column CSV FQ flags */
+ List *force_notnull; /* list of column names */
+ bool *force_notnull_flags; /* per-column CSV FNN flags */
+ List *force_null; /* list of column names */
+ bool *force_null_flags; /* per-column CSV FN flags */
+ bool convert_selectively; /* do selective binary conversion? */
+ List *convert_select; /* list of column names (can be NIL) */
+ bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
+ Node *whereClause; /* WHERE condition (or NULL) */
+
+ /* these are just for error messages, see CopyFromErrorCallback */
+ const char *cur_relname; /* table name for error messages */
+ uint64 cur_lineno; /* line number for error messages */
+ const char *cur_attname; /* current att for error messages */
+ const char *cur_attval; /* current att value for error messages */
+
+ /*
+ * Working state for COPY TO/FROM
+ */
+ MemoryContext copycontext; /* per-copy execution context */
+
+ /*
+ * Working state for COPY TO
+ */
+ FmgrInfo *out_functions; /* lookup info for output functions */
+ MemoryContext rowcontext; /* per-row evaluation context */
+
+ /*
+ * Working state for COPY FROM
+ */
+ AttrNumber num_defaults;
+ FmgrInfo *in_functions; /* array of input functions for each attrs */
+ Oid *typioparams; /* array of element types for in_functions */
+ int *defmap; /* array of default att numbers */
+ ExprState **defexprs; /* array of default att expressions */
+ bool volatile_defexprs; /* is any of defexprs volatile? */
+ List *range_table;
+ ExprState *qualexpr;
+
+ TransitionCaptureState *transition_capture;
+
+ /*
+ * These variables are used to reduce overhead in COPY FROM.
+ *
+ * attribute_buf holds the separated, de-escaped text for each field of
+ * the current line. The CopyReadAttributes functions return arrays of
+ * pointers into this buffer. We avoid palloc/pfree overhead by re-using
+ * the buffer on each cycle.
+ *
+ * In binary COPY FROM, attribute_buf holds the binary data for the
+ * current field, but the usage is otherwise similar.
+ */
+ StringInfoData attribute_buf;
+
+ /* field raw data pointers found by COPY FROM */
+
+ int max_fields;
+ char **raw_fields;
+
+ /*
+ * Similarly, line_buf holds the whole input line being processed. The
+ * input cycle is first to read the whole line into line_buf, convert it
+ * to server encoding there, and then extract the individual attribute
+ * fields into attribute_buf. line_buf is preserved unmodified so that we
+ * can display it in error messages if appropriate. (In binary mode,
+ * line_buf is not used.)
+ */
+ StringInfoData line_buf;
+ bool line_buf_converted; /* converted to server encoding? */
+ bool line_buf_valid; /* contains the row being processed? */
+
+ /*
+ * Finally, raw_buf holds raw data read from the data source (file or
+ * client connection). In text mode, CopyReadLine parses this data
+ * sufficiently to locate line boundaries, then transfers the data to
+ * line_buf and converts it. In binary mode, CopyReadBinaryData fetches
+ * appropriate amounts of data from this buffer. In both modes, we
+ * guarantee that there is a \0 at raw_buf[raw_buf_len].
+ */
+ char *raw_buf;
+ int raw_buf_index; /* next byte to process */
+ int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
+ /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+} CopyStateData;
+
/* CopyStateData is private in commands/copy.c */
typedef struct CopyStateData *CopyState;
-typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
extern void DoCopy(ParseState *state, const CopyStmt *stmt,
int stmt_location, int stmt_len,
@@ -41,4 +421,11 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+extern ParallelContext *BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid);
+extern void ParallelCopyFrom(CopyState cstate);
+extern void EndParallelCopy(ParallelContext *pcxt);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9cd1179..f5b818b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1702,6 +1702,12 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2219,6 +2225,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
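Before reading 0003, it may help to see the line-claiming handshake described
in copy.h's ParallelCopyLineBoundary comment above in runnable form. The
following minimal single-process sketch uses C11 atomics in place of
pg_atomic_compare_exchange_u32; the state names mirror the ones used in that
comment, and everything else is simplified for demonstration:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Line states, mirroring the names used in the copy.h comments. */
enum { LINE_LEADER_POPULATING, LINE_LEADER_POPULATED, LINE_WORKER_PROCESSING };

typedef struct
{
	uint32_t	first_block;	/* position of the first data block */
	uint32_t	start_offset;	/* start offset of the line */
	_Atomic int32_t line_size;	/* -1 = free, >= 0 = populated */
	_Atomic uint32_t line_state;
} LineBoundary;

/*
 * Worker side: claim a populated line with a single compare-exchange so
 * that only one worker ever processes a given line.
 */
static int
try_claim_line(LineBoundary *lb)
{
	uint32_t	expected = LINE_LEADER_POPULATED;

	return atomic_compare_exchange_strong(&lb->line_state, &expected,
										  LINE_WORKER_PROCESSING);
}

int
main(void)
{
	LineBoundary lb;

	atomic_init(&lb.line_size, -1);		/* slot starts out free */
	atomic_init(&lb.line_state, LINE_LEADER_POPULATING);

	/* Leader: populate the line in the documented order. */
	lb.first_block = 0;
	lb.start_offset = 0;
	atomic_store(&lb.line_size, 42);
	atomic_store(&lb.line_state, LINE_LEADER_POPULATED);

	/* Worker: claim, process, then release the slot back to the leader. */
	if (try_claim_line(&lb))
	{
		printf("claimed line of %d bytes\n", (int) atomic_load(&lb.line_size));
		atomic_store(&lb.line_size, -1);	/* mark slot reusable */
	}
	return 0;
}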
Attachment: v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
From eb1a33276d1f907d14e7e1962b1cd254b81e1587 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 7 Oct 2020 17:24:44 +0530
Subject: [PATCH v6 3/6] Allow copy from command to process data from
file/STDIN contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file/STDIN to a table. It adds a PARALLEL option to the COPY FROM
command, with which the user can specify the number of workers to be used
to perform the COPY FROM operation.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
any before statement triggers, if they exist. The leader populates the DSM
lines, which include the start offset and line size; while populating the
lines, it reads as many blocks as required from the file into the DSM data
blocks. Each block is 64K in size. The leader parses the data to identify a
line; the existing logic from CopyReadLineText, which identifies the lines,
was used for this with some changes. The leader checks if a free line is
available to copy the information into; if there is no free line, it waits
till the required line is freed up by a worker, and then copies the
identified line's information (offset & line size) into the DSM lines. This
process is repeated till the complete file is processed. Simultaneously, the
workers cache the lines (50 at a time) locally and release the lines to the
leader for further populating. Each worker processes the lines that it cached
and inserts them into the table.
The leader does not participate in the insertion of data; the leader's only
responsibility is to identify the lines as fast as possible for the workers
to do the actual copy operation. The leader waits till all the populated
lines are processed by the workers and then exits. We have chosen this design
based on the reason "that everything stalls if the leader doesn't accept
further input data, as well as when there are no available split chunks, so
it doesn't seem like a good idea to have the leader do other work. This is
backed by the performance data where we have seen that with 1 worker there is
just a 5-10% performance difference".
---
src/backend/access/common/toast_internals.c | 12 +-
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 15 +
src/backend/commands/copy.c | 220 +++--
src/backend/commands/copyparallel.c | 1269 +++++++++++++++++++++++++++
src/bin/psql/tab-complete.c | 2 +-
src/include/access/xact.h | 1 +
src/include/commands/copy.h | 69 +-
src/tools/pgindent/typedefs.list | 1 +
9 files changed, 1514 insertions(+), 86 deletions(-)
create mode 100644 src/backend/commands/copyparallel.c
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..70c070e 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,16 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In the parallel copy case the
+ * command id would have been set already by calling
+ * AssignCommandIdForWorker, so call GetCurrentCommandId with used as
+ * false to get the currentCommandId, as marking it used has been taken
+ * care of earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861..1602525 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afce..0b3337c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -776,6 +776,21 @@ GetCurrentCommandId(bool used)
}
/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used.
+ * This must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed &&
+ (currentCommandId != InvalidCommandId));
+
+ currentCommandIdUsed = true;
+}
+
+/*
* SetParallelStartTimestamps
*
* In a parallel worker, we should inherit the parent transaction's
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1e55a30..dc006a5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -61,20 +61,6 @@
#define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
#define OCTVALUE(c) ((c) - '0')
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
- CIM_SINGLE, /* use table_tuple_insert or fdw routine */
- CIM_MULTI, /* always use table_multi_insert */
- CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-#define IsParallelCopy() (cstate->is_parallel)
-#define IsLeader() (cstate->pcdata->is_leader)
-#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
-
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -131,7 +117,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -182,9 +167,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -197,18 +186,6 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
-/*
- * Increment the lines processed.
- */
-#define INCREMENTPROCESSED(processed) \
-processed++;
-
-/*
- * Get the lines processed.
- */
-#define RETURNPROCESSED(processed) \
-return processed;
-
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -225,7 +202,6 @@ static void EndCopyTo(CopyState cstate);
static uint64 DoCopyTo(CopyState cstate);
static uint64 CopyTo(CopyState cstate);
static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
-static bool CopyReadLine(CopyState cstate);
static bool CopyReadLineText(CopyState cstate);
static int CopyReadAttributesText(CopyState cstate);
static int CopyReadAttributesCSV(CopyState cstate);
@@ -258,7 +234,6 @@ static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
-static void ConvertToServerEncoding(CopyState cstate);
/*
@@ -2639,7 +2614,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
*
* Check if the relation specified in copy from is valid.
*/
-static void
+void
CheckTargetRelValidity(CopyState cstate)
{
Assert(cstate->rel);
@@ -2735,7 +2710,7 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid;
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -2745,7 +2720,18 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check if it is not a parallel copy. In case of parallel
+ * copy, this check is done by the leader, so that if any invalid case
+ * exists the copy from command errors out from the leader itself; this
+ * avoids launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
+ else
+ SetCurrentCommandIdUsedForWorker();
+
+ mycid = GetCurrentCommandId(!IsParallelCopy());
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -2785,7 +2771,8 @@ CopyFrom(CopyState cstate)
target_resultRelInfo = resultRelInfo;
/* Verify the named relation is a valid target for INSERT */
- CheckValidResultRel(resultRelInfo, CMD_INSERT);
+ if (!IsParallelCopy())
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
ExecOpenIndices(resultRelInfo, false);
@@ -2934,13 +2921,17 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether
+ * we should do this for COPY, since it's not really an "INSERT"
+ * statement as such. However, executing these triggers maintains
+ * consistency with the EACH ROW triggers that we already fire on
+ * COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3040,6 +3031,29 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * We may still be able to perform parallel inserts for
+ * partitioned tables. However, this depends on which types
+ * of triggers exist on the partition. We must not do
+ * parallel inserts if the partition is a foreign table or it
+ * has any BEFORE/INSTEAD OF row triggers. Since a
+ * partition's resultRelInfo is initialized only when we
+ * actually insert the first tuple into it, the leader cannot
+ * easily know this info while deciding on parallelism, so it
+ * will have gone ahead and allowed it. Now is the time to
+ * throw an error and also hint the user not to use
+ * parallelism. Throwing an error seemed a simpler approach
+ * than looking at all the partitions in the leader while
+ * deciding on parallelism. Note that this error is thrown
+ * early, exactly on the first tuple being inserted into the
+ * partition, so not much of the work done so far is wasted.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -3325,7 +3339,7 @@ CopyFrom(CopyState cstate)
*
* Populate the cstate catalog information.
*/
-static void
+void
PopulateCstateCatalogInfo(CopyState cstate)
{
TupleDesc tupDesc;
@@ -3607,26 +3621,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -3851,7 +3874,7 @@ EndCopyFrom(CopyState cstate)
* by newline. The terminating newline or EOF marker is not included
* in the final value of line_buf.
*/
-static bool
+bool
CopyReadLine(CopyState cstate)
{
bool result;
@@ -3874,9 +3897,34 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
+
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through; on
+ * subsequent iterations, reset the index and
+ * re-use the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -3931,11 +3979,11 @@ ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
*
* Convert contents to server encoding.
*/
-static void
+void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding && (!IsParallelCopy() || IsWorker()))
{
char *cvt;
@@ -3975,6 +4023,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ uint32 line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4029,6 +4082,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -4253,9 +4308,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
- cstate->raw_buf + cstate->raw_buf_index,
- prev_raw_ptr - cstate->raw_buf_index);
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
+ cstate->raw_buf + cstate->raw_buf_index,
+ prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -4307,6 +4368,22 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line info here; this cannot be done
+ * at the beginning, as there is a possibility that the file contains
+ * empty lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ line_pos = UpdateSharedLineInfo(cstate,
+ pcshared_info->cur_block_pos,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING, -1);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -4315,9 +4392,16 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
if (!result && !IsHeaderLine())
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
- cstate->line_buf.len, &cstate->line_buf.len);
+ {
+ if (IsParallelCopy())
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, raw_buf_ptr,
+ &line_size);
+ else
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
+ cstate->line_buf.len, &cstate->line_buf.len);
+ }
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/backend/commands/copyparallel.c b/src/backend/commands/copyparallel.c
new file mode 100644
index 0000000..6a44a01
--- /dev/null
+++ b/src/backend/commands/copyparallel.c
@@ -0,0 +1,1269 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyparallel.c
+ * Implements the Parallel COPY utility command
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/copyparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "catalog/pg_proc_d.h"
+#include "commands/copy.h"
+#include "optimizer/clauses.h"
+#include "optimizer/optimizer.h"
+#include "pgstat.h"
+#include "utils/lsyscache.h"
+
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_WAL_USAGE 3
+#define PARALLEL_COPY_BUFFER_USAGE 4
+
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/*
+ * CopyStringToSharedMemory - Copy the string to shared memory.
+ */
+static void
+CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
+ uint32 *copiedsize)
+{
+ uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
+
+ memcpy(destptr, (uint16 *) &len, sizeof(uint16));
+ *copiedsize += sizeof(uint16);
+ if (len)
+ {
+ memcpy(destptr + sizeof(uint16), srcPtr, len);
+ *copiedsize += len;
+ }
+}
+
+/*
+ * SerializeParallelCopyState - Serialize the data into shared memory.
+ */
+static void
+SerializeParallelCopyState(ParallelContext *pcxt, CopyState cstate,
+ uint32 estimatedSize, char *whereClauseStr,
+ char *rangeTableStr, char *attnameListStr,
+ char *notnullListStr, char *nullListStr,
+ char *convertListStr)
+{
+ SerializedParallelCopyState shared_cstate;
+ char *shmptr = (char *) shm_toc_allocate(pcxt->toc, estimatedSize + 1);
+ uint32 copiedsize = 0;
+
+ shared_cstate.copy_dest = cstate->copy_dest;
+ shared_cstate.file_encoding = cstate->file_encoding;
+ shared_cstate.need_transcoding = cstate->need_transcoding;
+ shared_cstate.encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate.csv_mode = cstate->csv_mode;
+ shared_cstate.header_line = cstate->header_line;
+ shared_cstate.null_print_len = cstate->null_print_len;
+ shared_cstate.force_quote_all = cstate->force_quote_all;
+ shared_cstate.convert_selectively = cstate->convert_selectively;
+ shared_cstate.num_defaults = cstate->num_defaults;
+ shared_cstate.relid = cstate->pcdata->relid;
+
+ memcpy(shmptr, (char *) &shared_cstate, sizeof(SerializedParallelCopyState));
+ copiedsize = sizeof(SerializedParallelCopyState);
+
+ CopyStringToSharedMemory(cstate, cstate->null_print, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, cstate->delim, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, cstate->quote, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, cstate->escape, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, attnameListStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, notnullListStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, nullListStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, convertListStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, whereClauseStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, rangeTableStr, shmptr + copiedsize,
+ &copiedsize);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shmptr);
+}
+
+/*
+ * CopyStringFromSharedMemory - Copy the string from shared memory & return the ptr.
+ */
+static char *
+CopyStringFromSharedMemory(char *srcPtr, uint32 *copiedsize)
+{
+ char *destptr = NULL;
+ uint16 len = 0;
+
+ memcpy((uint16 *) (&len), srcPtr, sizeof(uint16));
+ *copiedsize += sizeof(uint16);
+ if (len)
+ {
+ destptr = (char *) palloc0(len);
+ memcpy(destptr, srcPtr + sizeof(uint16), len);
+ *copiedsize += len;
+ }
+
+ return destptr;
+}
+
+/*
+ * CopyNodeFromSharedMemory - Copy the shared memory & convert it into node
+ * type.
+ */
+static void *
+CopyNodeFromSharedMemory(char *srcPtr, uint32 *copiedsize)
+{
+ char *destptr = NULL;
+ List *destList = NIL;
+ uint16 len = 0;
+
+ memcpy((uint16 *) (&len), srcPtr, sizeof(uint16));
+ *copiedsize += sizeof(uint16);
+ if (len)
+ {
+ destptr = (char *) palloc0(len);
+ memcpy(destptr, srcPtr + sizeof(uint16), len);
+ *copiedsize += len;
+ destList = (List *) stringToNode(destptr);
+ pfree(destptr);
+ }
+
+ return destList;
+}
+
+/*
+ * RestoreParallelCopyState - Retrieve the cstate from shared memory.
+ */
+static void
+RestoreParallelCopyState(shm_toc *toc, CopyState cstate, List **attlist)
+{
+ char *shared_str_val = (char *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, true);
+ SerializedParallelCopyState shared_cstate = {0};
+ uint32 copiedsize = 0;
+
+ memcpy(&shared_cstate, (char *) shared_str_val, sizeof(SerializedParallelCopyState));
+ copiedsize = sizeof(SerializedParallelCopyState);
+
+ cstate->file_encoding = shared_cstate.file_encoding;
+ cstate->need_transcoding = shared_cstate.need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate.encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate.csv_mode;
+ cstate->header_line = shared_cstate.header_line;
+ cstate->null_print_len = shared_cstate.null_print_len;
+ cstate->force_quote_all = shared_cstate.force_quote_all;
+ cstate->convert_selectively = shared_cstate.convert_selectively;
+ cstate->num_defaults = shared_cstate.num_defaults;
+ cstate->pcdata->relid = shared_cstate.relid;
+
+ cstate->null_print = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->delim = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->quote = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->escape = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+
+ *attlist = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->force_notnull = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->force_null = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->convert_select = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->whereClause = (Node *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->range_table = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+}
+
+/*
+ * EstimateStringSize - Estimate the size required for the string in shared
+ * memory.
+ */
+static uint32
+EstimateStringSize(char *str)
+{
+ uint32 strsize = sizeof(uint16);
+
+ if (str)
+ strsize += strlen(str) + 1;
+
+ return strsize;
+}
+
+/*
+ * EstimateNodeSize - Convert the list to string & estimate the size required
+ * in shared memory.
+ */
+static uint32
+EstimateNodeSize(void *list, char **listStr)
+{
+ uint32 strsize = sizeof(uint16);
+
+ if (list != NIL)
+ {
+ *listStr = nodeToString(list);
+ strsize += strlen(*listStr) + 1;
+ }
+
+ return strsize;
+}
+
+/*
+ * EstimateCstateSize - Estimate the size required in shared memory for cstate
+ * variables.
+ */
+static uint32
+EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist,
+ char **whereClauseStr, char **rangeTableStr,
+ char **attnameListStr, char **notnullListStr,
+ char **nullListStr, char **convertListStr)
+{
+ uint32 strsize = MAXALIGN(sizeof(SerializedParallelCopyState));
+
+ strsize += EstimateStringSize(cstate->null_print);
+ strsize += EstimateStringSize(cstate->delim);
+ strsize += EstimateStringSize(cstate->quote);
+ strsize += EstimateStringSize(cstate->escape);
+ strsize += EstimateNodeSize(attnamelist, attnameListStr);
+ strsize += EstimateNodeSize(cstate->force_notnull, notnullListStr);
+ strsize += EstimateNodeSize(cstate->force_null, nullListStr);
+ strsize += EstimateNodeSize(cstate->convert_select, convertListStr);
+ strsize += EstimateNodeSize(cstate->whereClause, whereClauseStr);
+ strsize += EstimateNodeSize(cstate->range_table, rangeTableStr);
+
+ strsize++;
+ shm_toc_estimate_chunk(&pcxt->estimator, strsize);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ return strsize;
+}
+
+/*
+ * PopulateParallelCopyShmInfo - Set ParallelCopyShmInfo.
+ */
+static void
+PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr)
+{
+ uint32 count;
+
+ MemSet(shared_info_ptr, 0, sizeof(ParallelCopyShmInfo));
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+}
+
+/*
+ * CheckTrigFunParallelSafety - For all triggers, check if the associated
+ * trigger functions are parallel safe. If at least one trigger function is
+ * parallel unsafe, we do not allow parallelism.
+ */
+static pg_attribute_always_inline bool
+CheckTrigFunParallelSafety(TriggerDesc *trigdesc)
+{
+ int i;
+
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, additionally check whether it is an RI trigger. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+
+ /*
+ * No parallelism if a foreign key check trigger is present. This is
+ * because, while performing foreign key checks, we take a KEY SHARE
+ * lock on the primary key table rows, which in turn increments the
+ * command counter and updates the snapshot. Since we share the
+ * snapshot at the beginning of the command, we can't allow it to be
+ * changed later. So, unless we do something special for it, we can't
+ * allow parallelism in such cases.
+ */
+ if (trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - Determine parallel safety of volatile expressions
+ * in default clause of column definition or in where clause and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *) cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If
+ * yes, and it is not parallel safe, then parallelism is not allowed.
+ * For instance, if there are any serial/bigserial columns associated
+ * with a parallel-unsafe nextval() default expression, parallelism
+ * should not be allowed.
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *) cstate->defexprs[i]->expr);
+
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *) cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * IsParallelCopyAllowed - Check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy is not allowed for the frontend (2.0 protocol) or the binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /*
+ * Check if the copy is into a foreign table. We cannot allow parallelism
+ * in this case because each worker would need to establish an FDW
+ * connection and operate in a separate transaction. Unless we have the
+ * capability of a two-phase commit protocol, we cannot allow parallelism.
+ *
+ * Also check if the copy is into a temporary table. Since parallel
+ * workers cannot access a temporary table, parallelism is not allowed.
+ */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+	/*
+	 * If the default expressions or the WHERE clause contain volatile
+	 * expressions, allow parallelism only if they are parallel safe.
+	 */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check parallel safety of the trigger functions. */
+ if (cstate->rel->trigdesc != NULL &&
+ !CheckTrigFunParallelSafety(cstate->rel->trigdesc))
+ return false;
+
+	/*
+	 * When transition tables are involved (i.e. AFTER STATEMENT triggers are
+	 * present), we collect minimal tuples in a tuple store after processing
+	 * them so that the AFTER STATEMENT triggers can access them later. To
+	 * enable parallelism for such cases, we would instead need to store and
+	 * access tuples from a shared tuple store. However, the shared tuple
+	 * store has no facility to keep tuples in memory, so we would always
+	 * have to store and access them from a file, which could be costly
+	 * unless we also add a way to keep minimal tuples in shared memory up to
+	 * work_mem and only then spill to the shared tuple store. It is possible
+	 * to do all this to enable parallel copy for such cases, but for now we
+	 * disallow parallelism here and can allow it later if required.
+	 *
+	 * We also do not allow parallelism when there are BEFORE/AFTER/INSTEAD
+	 * OF row triggers on the table, because such triggers might query the
+	 * table we are inserting into and act differently if the tuples that
+	 * have already been processed and prepared for insertion are not there.
+	 * If we allowed parallelism with such triggers, the behaviour would
+	 * depend on whether a parallel worker had already inserted those
+	 * particular tuples.
+	 */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_new_table ||
+ cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_after_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return false;
+
+ return true;
+}
+
+/*
+ * BeginParallelCopy - Start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+ParallelContext *
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ Size est_cstateshared;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ int parallel_workers = 0;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+ ParallelCopyData *pcdata;
+ MemoryContext oldcontext;
+ uint32 strsize;
+
+ CheckTargetRelValidity(cstate);
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ MemoryContextSwitchTo(oldcontext);
+ cstate->pcdata = pcdata;
+
+	/*
+	 * The user has chosen parallel copy. Determine whether parallel copy is
+	 * actually allowed; if not, fall back to non-parallel mode.
+	 */
+ if (!IsParallelCopyAllowed(cstate))
+ return NULL;
+
+ (void) GetCurrentFullTransactionId();
+ (void) GetCurrentCommandId(true);
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ strsize = EstimateCstateSize(pcxt, cstate, attnamelist, &whereClauseStr,
+ &rangeTableStr, &attnameListStr,
+ ¬nullListStr, &nullListStr,
+ &convertListStr);
+
+ /*
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_COPY_WAL_USAGE
+ * and PARALLEL_COPY_BUFFER_USAGE.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ SerializeParallelCopyState(pcxt, cstate, strsize, whereClauseStr,
+ rangeTableStr, attnameListStr, notnullListStr,
+ nullListStr, convertListStr);
+
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_WAL_USAGE, walusage);
+ pcdata->walusage = walusage;
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_BUFFER_USAGE, bufferusage);
+ pcdata->bufferusage = bufferusage;
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make
+ * sure that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - End the parallel copy tasks.
+ */
+pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * InitializeParallelCopyInfo - Initialize parallel worker.
+ */
+static void
+InitializeParallelCopyInfo(CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCstateCatalogInfo(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
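+	/*
+	 * Copy the line into the worker-local buffer. The outer loop walks the
+	 * chain of data blocks that the line spans; the inner wait loop at the
+	 * bottom waits for the leader to complete either the current block or
+	 * the whole line.
+	 */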
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+
+		/*
+		 * The wait loop at the bottom of this loop may have exited because
+		 * data_blk_ptr->curr_blk_completed was set, in which case the
+		 * dataSize read there might be stale: if the block is completed and
+		 * the line is also complete, line_size will have been set by then.
+		 * Re-read line_size here to be sure whether we have a complete line
+		 * or only a completed partial block.
+		 */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+
+ skip_bytes = currBlkPtr->skip_bytes;
+
+					/*
+					 * If the rest of the line fits within this block, copy
+					 * dataSize - copiedSize bytes; otherwise copy the whole
+					 * usable part of the block.
+					 */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+			/*
+			 * Reset the offset: only the first copy starts at a non-zero
+			 * offset; subsequent copies take the complete block.
+			 */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+			/*
+			 * If the data is contained in the current block,
+			 * lineInfo->line_size will be updated. If the data is spread
+			 * across blocks, either lineInfo->line_size or
+			 * data_blk_ptr->curr_blk_completed can be updated:
+			 * lineInfo->line_size is set once the complete line has been
+			 * read, while data_blk_ptr->curr_blk_completed is set when the
+			 * current block is finished but the line is not yet complete.
+			 */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
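+	/*
+	 * Release the ring entry: mark the line as processed and reset line_size
+	 * to -1 so that the leader can reuse this slot for a new line.
+	 */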
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Return a line for the worker to process.
+ */
+bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+	/*
+	 * Copy the line data to line_buf and release the line position so that
+	 * the leader can continue populating more lines.
+	 */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+}
+
+/*
+ * ParallelCopyMain - Parallel copy worker's code.
+ *
+ * The worker handles the WHERE clause, converts each line into column
+ * values, adds default or null values for columns missing from the record,
+ * finds the partition if the table is partitioned, invokes before row
+ * insert triggers, handles constraints and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+ pcshared_info = (ParallelCopyShmInfo *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO, false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+ RestoreParallelCopyState(toc, cstate, &attlist);
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(cstate->pcdata->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ InitializeParallelCopyInfo(cstate, attlist);
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
+
+ MemoryContextSwitchTo(oldcontext);
+ pfree(cstate);
+ return;
+}
+
+/*
+ * UpdateSharedLineInfo - Update the line information.
+ */
+uint32
+UpdateSharedLineInfo(CopyState cstate, uint32 blk_pos, uint32 offset,
+ uint32 line_size, uint32 line_state, uint32 blk_line_pos)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_pos;
+
+	/* blk_line_pos will be valid if a line position was already allocated earlier. */
+ if (blk_line_pos == -1)
+ {
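+		/*
+		 * No ring slot has been allocated for this line yet: claim the next
+		 * slot, waiting until the worker that last used it has released it
+		 * (line_size is reset to -1 on release).
+		 */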
+ line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+ }
+ else
+ {
+ line_pos = blk_line_pos;
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ }
+
+ if (line_state == LINE_LEADER_POPULATED)
+ {
+ elog(DEBUG1, "[Leader] Added line with block:%d, offset:%d, line position:%d, line size:%d",
+ lineInfo->first_block, lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ else
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ lineInfo->first_block, lineInfo->start_offset, line_pos);
+
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+
+ return line_pos;
+}
+
+/*
+ * ParallelCopyFrom - parallel copy leader's functionality.
+ *
+ * The leader executes the BEFORE STATEMENT trigger, if one is present. It
+ * reads the table data from the file and copies the contents into DSM data
+ * blocks. It then scans those blocks and identifies the records based on
+ * line breaks; each such line (record) is what gets inserted into the
+ * relation. The line information is stored in the ParallelCopyLineBoundary
+ * DSM data structure, from which the workers pick it up and insert the data
+ * into the table. The leader repeats this until all data has been read from
+ * the file and all DSM data blocks have been processed. If the leader finds
+ * that the DSM data blocks or the ParallelCopyLineBoundary entries are
+ * full, it waits until a worker frees up some entries and then continues.
+ * Finally, it waits until all populated lines have been processed by the
+ * workers before exiting.
+ */
+void
+ParallelCopyFrom(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+	/* raw_buf is not used in parallel copy; data blocks are used instead. */
+ pfree(cstate->raw_buf);
+ cstate->raw_buf = NULL;
+
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
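+	/* Signal the workers that the leader will not populate any more lines. */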
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
+/*
+ * GetLinePosition - Return the line position once the leader has populated the
+ * data.
+ */
+uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ uint32 line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
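+
+		/*
+		 * Lines are handed out to workers in batches of WORKER_CHUNK_COUNT.
+		 * If the first line of a batch has already been claimed by another
+		 * worker, skip the whole batch and try the next one.
+		 */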
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT) : 0;
+
+	/*
+	 * Get a new block for copying data. Skip the current block, since it may
+	 * still contain unprocessed data.
+	 */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+
+ if (unprocessed_line_parts == 0)
+ {
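+			/* Every line part in this block has been consumed; safe to reuse it. */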
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
+ pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory data block into which
+ * the file data must be read.
+ */
+void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+		/*
+		 * Mark the previous block as completed so that a worker can start
+		 * copying its data.
+		 */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
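+
+		/*
+		 * Bytes after raw_buf_ptr were not consumed from this block; they
+		 * are carried over to the next block, so workers must skip them when
+		 * reading line data from this block.
+		 */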
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+			/*
+			 * If new_line_size >= raw_buf_ptr, the new block contains only
+			 * newline-character content; the unprocessed count must not be
+			 * increased in that case.
+			 */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /*
+ * Update line size & line state, other members are already
+ * updated.
+ */
+ (void) UpdateSharedLineInfo(cstate, -1, -1, line_size,
+ LINE_LEADER_POPULATED, line_pos);
+ }
+ else if (new_line_size)
+			/* The line contains only a newline; an empty record should be inserted. */
+ (void) UpdateSharedLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED, -1);
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the BEFORE STATEMENT trigger. For parallel
+ * copy this is executed by the leader process.
+ */
+void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 24c7b41..cf00256 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2353,7 +2353,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM|TO filename WITH ( */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
- "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
+ "HEADER", "PARALLEL", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index df1b43a..96295bc 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -385,6 +385,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index cd2d56e..a9fe950 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,31 @@
*/
#define WORKER_CHUNK_COUNT 64
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
+#define IsWorker() (IsParallelCopy() && !IsLeader())
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
+/*
+ * Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
+
+/*
+ * Get the lines processed.
+ */
+#define RETURNPROCESSED(processed) \
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
/*
* Represents the different source/dest cases we need to worry about at
* the bottom level
@@ -75,6 +100,28 @@ typedef enum EolType
} EolType;
/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
+/*
+ * Represents the heap insert method to be used during COPY FROM.
+ */
+typedef enum CopyInsertMethod
+{
+ CIM_SINGLE, /* use table_tuple_insert or fdw routine */
+ CIM_MULTI, /* always use table_multi_insert */
+ CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
+} CopyInsertMethod;
+
+/*
* Copy data block information.
*
* These data blocks are created in DSM. Data read from file will be copied in
@@ -194,8 +241,6 @@ typedef struct ParallelCopyShmInfo
uint64 populated; /* lines populated by leader */
uint32 cur_block_pos; /* current data block */
ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
- FullTransactionId full_transaction_id; /* xid for copy from statement */
- CommandId mycid; /* command id */
ParallelCopyLineBoundaries line_boundaries; /* line array */
} ParallelCopyShmInfo;
@@ -242,12 +287,12 @@ typedef struct ParallelCopyData
ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
bool is_leader;
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
WalUsage *walusage;
BufferUsage *bufferusage;
- /* line position which worker is processing */
- uint32 worker_processed_pos;
-
/*
* Local line_buf array, workers will copy it here and release the lines
* for the leader to continue.
@@ -423,9 +468,23 @@ extern DestReceiver *CreateCopyDestReceiver(void);
extern void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+extern void ConvertToServerEncoding(CopyState cstate);
extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
extern ParallelContext *BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid);
extern void ParallelCopyFrom(CopyState cstate);
extern void EndParallelCopy(ParallelContext *pcxt);
+extern void ExecBeforeStmtTrigger(CopyState cstate);
+extern void CheckTargetRelValidity(CopyState cstate);
+extern void PopulateCstateCatalogInfo(CopyState cstate);
+extern uint32 GetLinePosition(CopyState cstate);
+extern bool GetWorkerLine(CopyState cstate);
+extern bool CopyReadLine(CopyState cstate);
+extern uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+extern void SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf);
+extern uint32 UpdateSharedLineInfo(CopyState cstate, uint32 blk_pos, uint32 offset,
+ uint32 line_size, uint32 line_state, uint32 blk_line_pos);
+extern void EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f5b818b..a198bf0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1707,6 +1707,7 @@ ParallelCopyLineBoundary
ParallelCopyData
ParallelCopyDataBlock
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
Attachment: v6-0004-Documentation-for-parallel-copy.patch (application/x-patch)
From 0dd69daa6794de67ca6398d4f72869ae3c17dcdb Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:00:36 +0530
Subject: [PATCH v6 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..19b1979 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,22 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+      Perform <command>COPY FROM</command> in parallel using <replaceable
+      class="parameter">integer</replaceable> background workers. Note that
+      the number of workers specified in <replaceable
+      class="parameter">integer</replaceable> is not guaranteed to be used
+      during execution: a copy may run with fewer workers than specified, or
+      even with none at all (for example, due to the setting of
+      <varname>max_worker_processes</varname>). This option is allowed only
+      in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
@@ -951,6 +968,20 @@ COPY country FROM '/usr1/proj/bray/sql/country_data';
</para>
<para>
+   To copy data in parallel from a file into the <literal>country</literal> table:
+<programlisting>
+COPY country FROM '/usr1/proj/bray/sql/country_data' WITH (PARALLEL 1);
+</programlisting>
+ </para>
+
+ <para>
+   To copy data in parallel from STDIN into the <literal>country</literal> table:
+<programlisting>
+COPY country FROM STDIN WITH (PARALLEL 1);
+</programlisting>
+ </para>
+
+ <para>
To copy into a file just the countries whose names start with 'A':
<programlisting>
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy';
--
1.8.3.1
Attachment: v6-0005-Tests-for-parallel-copy.patch (application/x-patch)
From 1f07a8f5128a2e295e6dda1ab667f8ca8fcffaff Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:19:39 +0530
Subject: [PATCH v6 5/6] Tests for parallel copy.
This patch has the tests for parallel copy.
---
src/test/regress/expected/copy2.out | 205 ++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 +++++++++++++++++++++++++++++++++++-
4 files changed, 429 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..7ae5d44 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,125 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: value 0 out of bounds for option "parallel"
+DETAIL: Valid values are between "1" and "1024".
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..7015698 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should perform non-parallel copy
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: v6-0006-Parallel-Copy-For-Binary-Format-Files.patch (application/x-patch)
From 3e5886767f10c3fae2d11c8014764da8127f4f77 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 7 Oct 2020 13:24:38 +0530
Subject: [PATCH v6 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks, each 64KB in size.
It also identifies each tuple's data block id, start offset, end offset and
tuple size, and records this information in the ring data structure.
Workers read the tuple information from the ring data structure and the
actual tuple data from the data blocks in parallel, and insert the tuples
into the table in parallel.
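
As a rough sketch of the idea (the struct and field names below are
illustrative only, not the actual definitions from this patch), the per-tuple
entry that the leader publishes in the ring for the binary format carries
roughly this information:

    /* Hypothetical sketch -- names invented for exposition only. */
    typedef struct BinaryTupleInfoSketch
    {
        uint32 block_id;     /* DSM data block holding the tuple's first byte */
        uint32 start_offset; /* offset of the tuple within that block */
        uint32 end_offset;   /* offset just past the tuple's last byte */
        uint32 tuple_size;   /* total tuple size in bytes */
    } BinaryTupleInfoSketch;

A worker reads one such entry, fetches tuple_size bytes starting at
(block_id, start_offset), following the block chain if the tuple spans
blocks, and then inserts the resulting tuple.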
---
src/backend/commands/copy.c | 134 ++++++------
src/backend/commands/copyparallel.c | 422 ++++++++++++++++++++++++++++++++++--
src/include/commands/copy.h | 126 +++++++++++
3 files changed, 595 insertions(+), 87 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index dc006a5..2ea3a90 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -223,19 +223,14 @@ static void CopySendData(CopyState cstate, const void *databuf, int datasize);
static void CopySendString(CopyState cstate, const char *str);
static void CopySendChar(CopyState cstate, char c);
static void CopySendEndOfRow(CopyState cstate);
-static int CopyGetData(CopyState cstate, void *databuf,
- int minread, int maxread);
static void CopySendInt32(CopyState cstate, int32 val);
static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
-static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
-
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -449,7 +444,7 @@ CopySendEndOfRow(CopyState cstate)
*
* NB: no data conversion is applied here.
*/
-static int
+int
CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
int bytesread = 0;
@@ -582,10 +577,25 @@ CopyGetInt32(CopyState cstate, int32 *val)
{
uint32 buf;
- if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ /*
+ * For parallel copy, avoid reading data to raw buf, read directly from
+ * file, later the data will be read to parallel copy data buffers.
+ */
+ if (cstate->nworkers > 0)
{
- *val = 0; /* suppress compiler warning */
- return false;
+ if (CopyGetData(cstate, &buf, sizeof(buf), sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
+ }
+ else
+ {
+ if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
}
*val = (int32) pg_ntoh32(buf);
return true;
@@ -661,7 +671,7 @@ CopyLoadRawBuf(CopyState cstate)
* and writes them to 'dest'. Returns the number of bytes read (which
* would be less than 'nbytes' only if we reach EOF).
*/
-static int
+int
CopyReadBinaryData(CopyState cstate, char *dest, int nbytes)
{
int copied_bytes = 0;
@@ -986,7 +996,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
EndParallelCopy(pcxt);
}
else
+ {
+ /*
+ * Reset nworkers to -1 here. This handles the case where the user
+ * specified parallel workers but none could be picked up, so we fall
+ * back to the non-parallel value of nworkers.
+ */
+ cstate->nworkers = -1;
*processed = CopyFrom(cstate); /* copy from file to database */
+ }
EndCopyFrom(cstate);
}
@@ -3552,7 +3570,7 @@ BeginCopyFrom(ParseState *pstate,
int32 tmp;
/* Signature */
- if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+ if (CopyGetData(cstate, readSig, 11, 11) != 11 ||
memcmp(readSig, BinarySignature, 11) != 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -3580,7 +3598,7 @@ BeginCopyFrom(ParseState *pstate,
/* Skip extension header, if present */
while (tmp-- > 0)
{
- if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+ if (CopyGetData(cstate, readSig, 1, 1) != 1)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("invalid COPY file header (wrong length)")));
@@ -3777,60 +3795,45 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
-
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyReadBinaryData(cstate, &dummy, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -4842,18 +4845,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -4861,9 +4861,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
diff --git a/src/backend/commands/copyparallel.c b/src/backend/commands/copyparallel.c
index 6a44a01..e8d1a99 100644
--- a/src/backend/commands/copyparallel.c
+++ b/src/backend/commands/copyparallel.c
@@ -94,6 +94,7 @@ SerializeParallelCopyState(ParallelContext *pcxt, CopyState cstate,
shared_cstate.convert_selectively = cstate->convert_selectively;
shared_cstate.num_defaults = cstate->num_defaults;
shared_cstate.relid = cstate->pcdata->relid;
+ shared_cstate.binary = cstate->binary;
memcpy(shmptr, (char *) &shared_cstate, sizeof(SerializedParallelCopyState));
copiedsize = sizeof(SerializedParallelCopyState);
@@ -191,6 +192,7 @@ RestoreParallelCopyState(shm_toc *toc, CopyState cstate, List **attlist)
cstate->convert_selectively = shared_cstate.convert_selectively;
cstate->num_defaults = shared_cstate.num_defaults;
cstate->pcdata->relid = shared_cstate.relid;
+ cstate->binary = shared_cstate.binary;
cstate->null_print = CopyStringFromSharedMemory(shared_str_val + copiedsize,
&copiedsize);
@@ -380,7 +382,7 @@ static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
/* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/*
@@ -976,33 +978,67 @@ ParallelCopyFrom(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
+ for (;;)
+ {
+ bool done;
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after
+ * some characters, we act as though it was newline followed by
+ * EOF, ie, process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files. For the parallel copy leader, fill in the error
+ * context information here so that, if any failure occurs while
+ * determining tuple offsets, the leader throws errors with proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
}
pcshared_info->is_read_in_progress = false;
@@ -1010,6 +1046,354 @@ ParallelCopyFrom(CopyState cstate)
}
/*
+ * CopyReadBinaryGetDataBlock
+ *
+ * Gets a new block, updates the current offset, calculates the skip bytes.
+ */
+void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to block %u",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader
+ *
+ * Leader reads data from the binary-format file into data blocks and
+ * identifies tuple boundaries/offsets so that workers can consume the data
+ * in those blocks.
+ */
+bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One way to get here is if the
+ * binary file has just a valid signature but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ (void) UpdateSharedLineInfo(cstate, start_block_pos, start_offset,
+ line_size, LINE_LEADER_POPULATED, -1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize
+ *
+ * Leader identifies boundaries/offsets for each attribute/column, arriving
+ * at the total tuple/row size. It moves on to the next data block if the
+ * attribute/column is spread across data blocks.
+ */
+void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in the same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while (i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size, as the
+ * required number of data blocks would have been obtained in the
+ * above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker
+ *
+ * Each worker reads data from data blocks after getting leader-identified tuple
+ * offsets from ring data structure.
+ */
+bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+
+ line_pos = GetLinePosition(cstate);
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where the field count is spread across data blocks should
+ * never occur, as the leader would have moved it to the next block.
+ * This code exists for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker
+ *
+ * Worker reads each attribute's data from the data blocks, moving on to the
+ * next data block if the attribute/column is spread across data blocks.
+ */
+Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i > 0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * The bytes still to be copied into attribute_buf may exceed an
+ * entire data block; copy at most one data block's worth here.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
* GetLinePosition - Return the line position once the leader has populated the
* data.
*/
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index a9fe950..746c139 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -77,6 +77,109 @@ else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyReadBinaryData(cstate, &dummy, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data is spread
+ * across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * The field data can spread across multiple data blocks; \
+ * calculate the number of data blocks required and obtain \
+ * that many blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * Check if one more data block is needed for the remaining field \
+ * data bytes that don't fill a whole data block. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
+
+/*
* Represents the different source/dest cases we need to worry about at
* the bottom level
*/
@@ -254,6 +357,17 @@ typedef struct ParallelCopyLineBuf
} ParallelCopyLineBuf;
/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
+/*
* This structure helps in storing the common data from CopyStateData that are
* required by the workers. This information will then be allocated and stored
* into the DSM for the worker to retrieve and copy it to CopyStateData.
@@ -276,6 +390,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
} SerializedParallelCopyState;
/*
@@ -302,6 +417,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
} ParallelCopyData;
typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
@@ -487,4 +605,12 @@ extern uint32 UpdateSharedLineInfo(CopyState cstate, uint32 blk_pos, uint32 offs
uint32 line_size, uint32 line_state, uint32 blk_line_pos);
extern void EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
uint32 raw_buf_ptr);
+extern int CopyGetData(CopyState cstate, void *databuf, int minread, int maxread);
+extern int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+extern bool CopyReadBinaryTupleLeader(CopyState cstate);
+extern bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+extern void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+extern Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+extern void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
#endif /* COPY_H */
--
1.8.3.1
On Tue, Sep 29, 2020 at 3:16 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
Hi Vignesh and Bharath,
Seems like the Parallel Copy patch is regarding RI_TRIGGER_PK as
parallel-unsafe.
Can you explain why this is?
Yes, we don't need to restrict parallelism for RI_TRIGGER_PK cases, as
we don't do any command counter increments while performing PK checks
as opposed to RI_TRIGGER_FK/foreign key checks. We have modified this
in the v6 patch set.
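For illustration, a minimal sketch of how such a check could look (the
helper name here is hypothetical; RI_FKey_trigger_type() is the existing
function from ri_triggers.c):

static bool
CopyRITriggersAreParallelSafe(TriggerDesc *trigdesc)
{
	int			i;

	for (i = 0; i < trigdesc->numtriggers; i++)
	{
		Trigger    *trigger = &trigdesc->triggers[i];

		/*
		 * FK-side (referencing table) checks do command counter
		 * increments and so stay parallel-unsafe; PK-side checks do
		 * not, and can be allowed.
		 */
		if (RI_FKey_trigger_type(trigger->tgfoid) == RI_TRIGGER_FK)
			return false;
	}
	return true;
}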
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Sep 22, 2020 at 2:44 PM vignesh C <vignesh21@gmail.com> wrote:
Thanks Ashutosh for your comments.
On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Vignesh,
I've spent some time today looking at your new set of patches and I've
some thoughts and queries which I would like to put here:

Why are these not part of the shared cstate structure?

SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);

I have used shared_cstate mainly to share the integer & bool data
types from the leader to worker process. The above data types are of
char* data type; I will not be able to use them the way I could for
the integer types. So I preferred to send these as separate keys to the
worker. Thoughts?

I think the way you have written will work, but if we go with
Ashutosh's proposal it will look elegant, and in the future, if we need
to share more strings as part of the cstate structure, that would be
easier. You can probably refer to EstimateParamListSpace,
SerializeParamList, and RestoreParamList to see how we can share
different types of data in one key.
Thanks for the solution Amit, I have fixed this and handled it in the
v6 patch shared in my previous mail.
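For concreteness, a minimal sketch (the helper name is illustrative, not
the patch's) of packing several cstate strings under a single DSM key in
the spirit of SerializeParamList: a uint16 length followed by the bytes,
repeated per string.

static void
SerializeCopyString(char *dest, uint32 *copiedsize, const char *src)
{
	uint16		len = src ? (uint16) strlen(src) + 1 : 0;

	/* store the length first, then the bytes (if any) */
	memcpy(dest + *copiedsize, &len, sizeof(uint16));
	*copiedsize += sizeof(uint16);
	if (len)
	{
		memcpy(dest + *copiedsize, src, len);
		*copiedsize += len;
	}
}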
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 8, 2020 at 5:44 AM vignesh C <vignesh21@gmail.com> wrote:
Attached v6 patch with the fixes.
Hi Vignesh,
I noticed a couple of issues when scanning the code in the following patch:
v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
In the following code, it will put a junk uint16 value into *destptr
(and thus may well cause a crash) on a Big Endian architecture
(Solaris Sparc, s390x, etc.):
You're storing a (uint16) string length in a uint32 and then pulling
out the lower two bytes of the uint32 and copying them into the
location pointed to by destptr.
static void
+CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
+ uint32 *copiedsize)
+{
+ uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
+
+ memcpy(destptr, (uint16 *) &len, sizeof(uint16));
+ *copiedsize += sizeof(uint16);
+ if (len)
+ {
+ memcpy(destptr + sizeof(uint16), srcPtr, len);
+ *copiedsize += len;
+ }
+}
I suggest you change the code to:
uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
memcpy(destptr, &len, sizeof(uint16));
[I assume string length here can't ever exceed (65535 - 1), right?]
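A tiny standalone illustration of the hazard being described here: on a
big-endian machine the first sizeof(uint16) bytes of a uint32 are its
high-order bytes, so a small length copies out as 0.

#include <stdio.h>
#include <string.h>
#include <stdint.h>

int
main(void)
{
	uint32_t	len32 = 42;		/* length stored in a uint32, as in the bug */
	uint16_t	len16 = 0;

	memcpy(&len16, &len32, sizeof(uint16_t));
	/* prints 42 on little-endian, 0 on big-endian */
	printf("%u\n", (unsigned) len16);
	return 0;
}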
Looking a bit deeper into this, I'm wondering if in fact your
EstimateStringSize() and EstimateNodeSize() functions should be using
BUFFERALIGN() for EACH stored string/node (rather than just calling
shm_toc_estimate_chunk() once at the end, after the length of packed
strings and nodes has been estimated), to ensure alignment of start of
each string/node. Other Postgres code appears to be aligning each
stored chunk using shm_toc_estimate_chunk(). See the definition of
that macro and its current usages.
Then you could safely use:
uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
*(uint16 *)destptr = len;
*copiedsize += sizeof(uint16);
if (len)
{
memcpy(destptr + sizeof(uint16), srcPtr, len);
*copiedsize += len;
}
and in the CopyStringFromSharedMemory() function, you could then safely use:
len = *(uint16 *)srcPtr;
The compiler may be smart enough to optimize-away the memcpy() in this
case anyway, but there are issues in doing this for architectures that
take a performance hit for unaligned access, or don't support
unaligned access.
Also, in CopyXXXXFromSharedMemory() functions, you should use palloc()
instead of palloc0(), as you're filling the entire palloc'd buffer
anyway, so no need to ask for additional MemSet() of all buffer bytes
to 0 prior to memcpy().
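In code terms, the point is simply (sketch):

char	   *buf = palloc(len);	/* palloc0(len) would zero bytes we memcpy over anyway */

memcpy(buf, src, len);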
Regards,
Greg Nancarrow
Fujitsu Australia
On Mon, Sep 28, 2020 at 6:37 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
On Mon, Sep 28, 2020 at 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Sep 22, 2020 at 2:44 PM vignesh C <vignesh21@gmail.com> wrote:
Thanks Ashutosh for your comments.
On Wed, Sep 16, 2020 at 6:36 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
Hi Vignesh,
I've spent some time today looking at your new set of patches and I've
some thoughts and queries which I would like to put here:

Why are these not part of the shared cstate structure?

SerializeString(pcxt, PARALLEL_COPY_KEY_NULL_PRINT, cstate->null_print);
SerializeString(pcxt, PARALLEL_COPY_KEY_DELIM, cstate->delim);
SerializeString(pcxt, PARALLEL_COPY_KEY_QUOTE, cstate->quote);
SerializeString(pcxt, PARALLEL_COPY_KEY_ESCAPE, cstate->escape);

I have used shared_cstate mainly to share the integer & bool data
types from the leader to worker process. The above data types are of
char* data type; I will not be able to use them the way I could for
the integer types. So I preferred to send these as separate keys to the
worker. Thoughts?

I think the way you have written will work, but if we go with
Ashutosh's proposal it will look elegant, and in the future, if we need
to share more strings as part of the cstate structure, that would be
easier. You can probably refer to EstimateParamListSpace,
SerializeParamList, and RestoreParamList to see how we can share
different types of data in one key.

Yeah. And in addition to that it will also reduce the number of DSM
keys that we need to maintain.
Thanks Ashutosh, This is handled as part of the v6 patch set.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Few additional comments:
======================

Some more comments:

v5-0002-Framework-for-leader-worker-in-parallel-copy
===========================================
1. These values
+ * help in handover of multiple records with significant size of data to be
+ * processed by each of the workers to make sure there is no context switch & the
+ * work is fairly distributed among the workers.

How about writing it as: "These values help in the handover of
multiple records with the significant size of data to be processed by
each of the workers. This also ensures there is no context switch and
the work is fairly distributed among the workers."
Changed as suggested.
2. Can we keep WORKER_CHUNK_COUNT, MAX_BLOCKS_COUNT, and RINGSIZE as
power-of-two? Say WORKER_CHUNK_COUNT as 64, MAX_BLOCK_COUNT as 1024,
and accordingly choose RINGSIZE. At many places, we do that way. I
think it can sometimes help in faster processing due to cache size
requirements and in this case, I don't see a reason why we can't
choose these values to be power-of-two. If you agree with this change
then also do some performance testing after this change?
Modified as suggested. I have run a few performance tests & verified
there is no degradation. We will post a performance run of this
separately in the coming days.
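A sketch of what the power-of-two sizing could look like (the exact
values below are illustrative, not necessarily the patch's):

#define WORKER_CHUNK_COUNT	64			/* lines handed to a worker at a time */
#define MAX_BLOCKS_COUNT	1024		/* 64KB DSM data blocks */
#define RINGSIZE			(10 * 1024)	/* entries in the line-boundary ring */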
3.
+ bool curr_blk_completed;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+ uint8 skip_bytes;
+} ParallelCopyDataBlock;

Is there a reason to keep skip_bytes after data? Normally the variable
size data is at the end of the structure. Also, there is no comment
explaining the purpose of skip_bytes.
Modified as suggested and added comments.
4.
+ * Copy data block information.
+ * ParallelCopyDataBlock's will be created in DSM. Data read from file will be
+ * copied in these DSM data blocks. The leader process identifies the records
+ * and the record information will be shared to the workers. The workers will
+ * insert the records into the table. There can be one or more number of records
+ * in each of the data block based on the record size.
+ */
+typedef struct ParallelCopyDataBlock

Keep one empty line after the description line like below. I also
suggested to do a minor tweak in the above sentence, which is as
follows:

* Copy data block information.
*
* These data blocks are created in DSM. Data read ...

Try to follow a similar format in other comments as well.
Modified as suggested.
5. I think it is better to move parallelism related code to a new file
(we can name it as copyParallel.c or something like that).
Modified, added copyparallel.c file to include copy parallelism
functionality & copyparallel.c file & some of the function prototype &
data structure were moved to copy.h header file so that it can be
shared between copy.c & copyparallel.c
6. copy.c(1648,25): warning C4133: 'function': incompatible types -
from 'ParallelCopyLineState *' to 'uint32 *'
Getting above compilation warning on Windows.
Modified the data type.
v5-0003-Allow-copy-from-command-to-process-data-from-file
==================================================
1.
@@ -4294,7 +5047,7 @@ BeginCopyFrom(ParseState *pstate,
 * only in text mode.
 */
 initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf = (IsParallelCopy()) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);

Is there any way IsParallelCopy can be true by this time? AFAICS, we
only do anything about parallelism after this. If you want to save this
allocation then we need to move this after we determine whether
parallelism can be used or not, and accordingly the below code in the
patch needs to be changed.

* ParallelCopyFrom - parallel copy leader's functionality.
*
* Leader executes the before statement for before statement trigger, if before
@@ -1110,8 +1547,302 @@ ParallelCopyFrom(CopyState cstate)
 ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
 ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+ /* raw_buf is not used in parallel copy, instead data blocks are used. */
+ pfree(cstate->raw_buf);
+ cstate->raw_buf = NULL;
Removed the palloc change, raw_buf will be allocated both for parallel
and non parallel copy. One other solution that I thought was to move
the memory allocation to CopyFrom, but this solution might affect fdw
where they use BeginCopyFrom, NextCopyFrom & EndCopyFrom. So I have
kept the allocation as in BeginCopyFrom & freeing for parallel copy in
ParallelCopyFrom.
Is there anything else also the allocation of which depends on parallelism?
I felt this is the only allocated memory that sequential copy requires
and which is not required in parallel copy.
2.
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+	/* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+	if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+		return false;
+
+	/* Check if copy is into foreign table or temporary table. */
+	if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+		RelationUsesLocalBuffers(cstate->rel))
+		return false;
+
+	/* Check if trigger function is parallel safe. */
+	if (cstate->rel->trigdesc != NULL &&
+		!IsTriggerFunctionParallelSafe(cstate->rel->trigdesc))
+		return false;
+
+	/*
+	 * Check if there is after statement or instead of trigger or transition
+	 * table triggers.
+	 */
+	if (cstate->rel->trigdesc != NULL &&
+		(cstate->rel->trigdesc->trig_insert_after_statement ||
+		 cstate->rel->trigdesc->trig_insert_instead_row ||
+		 cstate->rel->trigdesc->trig_insert_new_table))
+		return false;
+
+	/* Check if the volatile expressions are parallel safe, if present any. */
+	if (!CheckExprParallelSafety(cstate))
+		return false;
+
+	/* Check if the insertion mode is single. */
+	if (FindInsertMethod(cstate) == CIM_SINGLE)
+		return false;
+
+	return true;
+}

In the comments, we should write why parallelism is not allowed for a
particular case. The cases where a parallel-unsafe clause is involved
are okay, but it is not clear from the comments why it is not allowed in
other cases.
Added comments.
3.
+	ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+	ParallelCopyLineBoundary *lineInfo;
+	uint32 line_first_block = pcshared_info->cur_block_pos;
+	line_pos = UpdateBlockInLineInfo(cstate,
+	                                 line_first_block,
+	                                 cstate->raw_buf_index, -1,
+	                                 LINE_LEADER_POPULATING);
+	lineInfo = &pcshared_info->line_boundaries.ring[line_pos];
+	elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+	     line_first_block, lineInfo->start_offset, line_pos);

Can we take all the code here inside the function UpdateBlockInLineInfo? I
see that it is called from one other place, but I guess most of the
surrounding code there can also be moved inside the function. Can we
change the name of the function to UpdateSharedLineInfo or something
like that and remove the inline marking from it? I am not sure we want
to inline such big functions. If it makes a difference in performance
then we can probably consider it.
Changed as suggested.
4.
EndLineParallelCopy()
{
..
+	/* Update line size. */
+	pg_atomic_write_u32(&lineInfo->line_size, line_size);
+	pg_atomic_write_u32(&lineInfo->line_state, LINE_LEADER_POPULATED);
+	elog(DEBUG1, "[Leader] After adding - line position:%d, line_size:%d",
+	     line_pos, line_size);
..
}

Can we instead call UpdateSharedLineInfo (the new function name for
UpdateBlockInLineInfo) to do this, and maybe see that it only updates the
required info? The idea is to centralize the code for updating
SharedLineInfo.
Updated as suggested.
5.
+static uint32
+GetLinePosition(CopyState cstate)
+{
+	ParallelCopyData *pcdata = cstate->pcdata;
+	ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+	uint32 previous_pos = pcdata->worker_processed_pos;
+	uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;

It seems to me that each worker has to hop through all the processed
chunks before getting the chunk which it can process. This will work,
but I think it is better if we have some shared counter which can tell
us the next chunk to be processed and avoid all the unnecessary work
of hopping to find the exact position.
I tried using a spinlock to track this position instead of hopping
through the processed chunks, but I could not reproduce the earlier
performance results; there was a slight degradation:
Use case 2: 3 indexes on integer columns
Run on earlier patches without spinlock:
(220.680, 0, 1X), (185.096, 1, 1.19X), (134.811, 2, 1.64X), (114.585,
4, 1.92X), (107.707, 8, 2.05X), (101.253, 16, 2.18X), (100.749, 20,
2.19X), (100.656, 30, 2.19X)
Run on latest v6 patches with spinlock:
(216.059, 0, 1X), (177.639, 1, 1.22X), (145.213, 2, 1.49X), (126.370,
4, 1.71X), (121.013, 8, 1.78X), (102.933, 16, 2.1X), (103.000, 20,
2.1X), (100.308, 30, 2.15X)
I have not included these changes as there was some performance
degradation. I will try to come up with a different solution for this
and discuss it in the coming days. This point is not yet handled.
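For reference, a minimal sketch of the shared-counter idea under
discussion (the next_line_pos field is hypothetical, not from the patch):

static uint32
ClaimNextLinePosition(ParallelCopyShmInfo *pcshared_info)
{
	/* each worker atomically claims the next ring slot to process */
	uint32		pos = pg_atomic_fetch_add_u32(&pcshared_info->next_line_pos, 1);

	return pos % RINGSIZE;
}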
v5-0004-Documentation-for-parallel-copy
-----------------------------------------
1. Can you add one or two examples towards the end of the page where
we have examples for other Copy options?

Please run pgindent on all patches as that will make the code look better.
Have run pgindent on the latest patches.
From the testing perspective,
1. Test by having something force_parallel_mode = regress which means
that all existing Copy tests in the regression will be executed via
new worker code. You can have this as a test-only patch for now and
make sure all existing tests passed with this.
2. Do we have tests for toast tables? I think if you implement the
previous point some existing tests might cover it but I feel we should
have at least one or two tests for the same.
3. Have we checked the code coverage of the newly added code with
existing tests?
These will be handled in the next few days.
These changes are present as part of the v6 patch set.
I'm summarizing the pending open points so that I don't miss anything:
1) Performance test on latest patch set.
2) Testing points suggested.
3) Support of parallel copy for COPY_OLD_FE.
4) Worker has to hop through all the processed chunks before getting
the chunk which it can process.
5) Handling of Tomas's comments.
6) Handling of Greg's comments.
We plan to work on these & complete them in the next few days.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 8, 2020 at 12:14 AM vignesh C <vignesh21@gmail.com> wrote:
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
I am convinced by the reason given by Kyotaro-San in that another
thread [1] and performance data shown by Peter that this can't be an
independent improvement and rather in some cases it can do harm. Now,
if you need it for a parallel-copy path then we can change it
specifically to the parallel-copy code path but I don't understand
your reason completely.

Whenever we need data to be populated, we will get a new data block &
pass it to CopyGetData to populate the data. In case of file copy, the
server will completely fill the data block. We expect the data to be
filled completely. If data is available it will completely load the
complete data block in case of file copy. There is no scenario where
even if data is present a partial data block will be returned except
for EOF or no data available. But in case of STDIN data copy, even
though there is 8K data available in data block & 8K data available in
STDIN, CopyGetData will return as soon as libpq buffer data is more
than the minread. We will pass new data block every time to load data.
Every time we pass an 8K data block but CopyGetData loads a few bytes
in the new data block & returns. I wanted to keep the same data
population logic for both file copy & STDIN copy, i.e., fill full 8K data
blocks & only then have the populated data consumed. There is an
alternative solution I can have some special handling in case of STDIN
wherein the existing data block can be passed with the index from
where the data should be copied. Thoughts?
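A minimal sketch of that alternative (the fill_offset field is
hypothetical): instead of handing CopyGetData a fresh block every time,
keep filling the current one from where the last read stopped.

while (block->fill_offset < DATA_BLOCK_SIZE && !cstate->reached_eof)
{
	/* minread = 1: return as soon as any data is available on STDIN */
	int			nread = CopyGetData(cstate,
									&block->data[block->fill_offset],
									1,
									DATA_BLOCK_SIZE - block->fill_offset);

	block->fill_offset += nread;
}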
What you are proposing as an alternative solution, isn't that what we
are doing without the patch? IIUC, you require this because of your
corresponding changes to handle COPY_NEW_FE in CopyReadLine(), is that
right? If so, what is the difficulty in making it behave similar to
the non-parallel case?
--
With Regards,
Amit Kapila.
On Thu, Oct 8, 2020 at 12:14 AM vignesh C <vignesh21@gmail.com> wrote:
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
+ */
+typedef struct ParallelCopyLineBoundary

Are we doing all this state management to avoid using locks while
processing lines? If so, I think we can use either spinlock or LWLock
to keep the main patch simple and then provide a later patch to make
it lock-less. This will allow us to first focus on the main design of
the patch rather than trying to make this data structure processing
lock-less in the best possible way.

The steps will be more or less the same if we use a spinlock too: steps
1, 3 & 4 will be common; we just have to use lock & unlock instead of
steps 2 & 5. I feel we can retain the current implementation.
I'll study this in detail and let you know my opinion on the same but
in the meantime, I don't follow one part of this comment: "If they
don't follow this order the worker might process wrong line_size and
leader might populate the information which worker has not yet
processed or in the process of processing."

Do you want to say that the leader might overwrite some information which
the worker hasn't read yet? If so, it is not clear from the comment.
Another minor point about this comment:

Here leader and worker must follow these steps to avoid any corruption
or hang issue.

Changed it to:
* The leader & worker process access the shared line information by following
* the below steps to avoid any data corruption or hang:
Actually, I wanted more along the lines of why such corruption or hang
can happen. It might help reviewers to understand why you have followed
such a sequence.
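To make the required ordering concrete, a sketch (not the patch's exact
code; LINE_WORKER_PROCESSING is an assumed state name) of a worker
claiming a line only via an atomic state transition:

uint32		expected = LINE_LEADER_POPULATED;

/* claim the line atomically; only then is line_size safe to read */
if (pg_atomic_compare_exchange_u32(&line_info->line_state,
								   &expected,
								   LINE_WORKER_PROCESSING))
{
	uint32		size = pg_atomic_read_u32(&line_info->line_size);

	/* ... process 'size' bytes starting at the recorded block/offset ... */
}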
How did you ensure that this is fixed? Have you tested it, if so
please share the test? I see a basic problem with your fix.

+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+                       &walusage[ParallelWorkerNumber]);

You need to call InstrStartParallelQuery() before the actual operation
starts; without that, the stats won't be accurate. Also, after calling
WaitForParallelWorkersToFinish(), you need to accumulate the stats
collected from workers, which you have neither done nor is it possible
with the current code in your patch because you haven't made any
provision to capture them in BeginParallelCopy.

I suggest you look into lazy_parallel_vacuum_indexes() and
begin_parallel_vacuum() to understand how the buffer/wal usage stats
are accumulated. Also, please test this functionality using
pg_stat_statements.

Made changes accordingly.
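For reference, a sketch of the worker-side pattern being suggested (cf.
parallel vacuum), using the names from the quoted snippet above:

/* at worker startup, before doing any actual COPY work */
InstrStartParallelQuery();

/* ... the worker performs its share of the copy ... */

/* at worker exit, report usage back through the DSM arrays */
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
					  &walusage[ParallelWorkerNumber]);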
I have verified it using:
postgres=# select * from pg_stat_statements where query like '%copy%';
(relevant columns shown)

 query: copy hw from '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format csv, delimiter ',')
   calls: 1 | total_exec_time: 265.195105 | rows: 175000
   shared_blks_hit: 1916 | shared_blks_read: 0 | shared_blks_dirtied: 946 | shared_blks_written: 946
   wal_records: 1116 | wal_fpi: 0 | wal_bytes: 3587203

 query: copy hw from '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format csv, delimiter ',', parallel '2')
   calls: 1 | total_exec_time: 35668.402482 | rows: 175000
   shared_blks_hit: 3101 | shared_blks_read: 36 | shared_blks_dirtied: 952 | shared_blks_written: 919
   wal_records: 1119 | wal_fpi: 6 | wal_bytes: 3624405

(2 rows)
I am not able to properly parse the data, but if I understand correctly,
the WAL data for the non-parallel (1116 | 0 | 3587203) and parallel
(1119 | 6 | 3624405) cases doesn't seem to be the same. Is that
right? If so, why? Please ensure that no checkpoint happens in either
case.
--
With Regards,
Amit Kapila.
On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Thu, Oct 8, 2020 at 5:44 AM vignesh C <vignesh21@gmail.com> wrote:
Attached v6 patch with the fixes.
Hi Vignesh,
I noticed a couple of issues when scanning the code in the following patch:
v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
In the following code, it will put a junk uint16 value into *destptr
(and thus may well cause a crash) on a Big Endian architecture
(Solaris Sparc, s390x, etc.):
You're storing a (uint16) string length in a uint32 and then pulling
out the lower two bytes of the uint32 and copying them into the
location pointed to by destptr.

static void
+CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
+                         uint32 *copiedsize)
+{
+    uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
+
+    memcpy(destptr, (uint16 *) &len, sizeof(uint16));
+    *copiedsize += sizeof(uint16);
+    if (len)
+    {
+        memcpy(destptr + sizeof(uint16), srcPtr, len);
+        *copiedsize += len;
+    }
+}

I suggest you change the code to:

uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
memcpy(destptr, &len, sizeof(uint16));

[I assume string length here can't ever exceed (65535 - 1), right?]
Your suggestion makes sense to me if the assumption related to string
length is correct. If we can't ensure that then we need to probably
use four bytes uint32 to store the length.
Looking a bit deeper into this, I'm wondering if in fact your
EstimateStringSize() and EstimateNodeSize() functions should be using
BUFFERALIGN() for EACH stored string/node (rather than just calling
shm_toc_estimate_chunk() once at the end, after the length of packed
strings and nodes has been estimated), to ensure alignment of start of
each string/node. Other Postgres code appears to be aligning each
stored chunk using shm_toc_estimate_chunk(). See the definition of
that macro and its current usages.
I am not sure if this is required for the purpose of correctness. AFAIU,
we do store/estimate multiple parameters in same way at other places,
see EstimateParamListSpace and SerializeParamList. Do you have
something else in mind?
While looking at the latest code, I observed below issue in patch
v6-0003-Allow-copy-from-command-to-process-data-from-file:
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ strsize = EstimateCstateSize(pcxt, cstate, attnamelist, &whereClauseStr,
+ &rangeTableStr, &attnameListStr,
+ ¬nullListStr, &nullListStr,
+ &convertListStr);
Here, do we need to separately estimate the size of
SerializedParallelCopyState when it is also done in
EstimateCstateSize?
--
With Regards,
Amit Kapila.
On Fri, Oct 9, 2020 at 5:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Looking a bit deeper into this, I'm wondering if in fact your
EstimateStringSize() and EstimateNodeSize() functions should be using
BUFFERALIGN() for EACH stored string/node (rather than just calling
shm_toc_estimate_chunk() once at the end, after the length of packed
strings and nodes has been estimated), to ensure alignment of start of
each string/node. Other Postgres code appears to be aligning each
stored chunk using shm_toc_estimate_chunk(). See the definition of
that macro and its current usages.

I am not sure if this is required for the purpose of correctness. AFAIU,
we do store/estimate multiple parameters in same way at other places,
see EstimateParamListSpace and SerializeParamList. Do you have
something else in mind?
The point I was trying to make is that potentially more efficient code
can be used if the individual strings/nodes are aligned, rather than
packed (as they are now), but as you point out, there are already
cases (e.g. SerializeParamList) where within the separately-aligned
chunks the data is not aligned, so maybe not a big deal. Oh well,
without alignment, that means use of memcpy() cannot really be avoided
here for serializing/de-serializing ints etc., let's hope the compiler
optimizes it as best it can.
Regards,
Greg Nancarrow
Fujitsu Australia
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
From the testing perspective,
1. Test by having something force_parallel_mode = regress which means
that all existing Copy tests in the regression will be executed via
new worker code. You can have this as a test-only patch for now and
make sure all existing tests passed with this.
I don't think all the existing copy test cases (except the new test cases
added in the parallel copy patch set) would run inside the parallel worker
if force_parallel_mode is on. This is because parallelism will be
picked up for parallel copy only if the parallel option is specified,
unlike parallelism for select queries.
Anyway, I ran with force_parallel_mode on and regress. All copy-related
tests and make check/make check-world ran fine.
2. Do we have tests for toast tables? I think if you implement the
previous point some existing tests might cover it but I feel we should
have at least one or two tests for the same.
Toast table use case 1: 10000 tuples, 9.6GB data, 3 indexes (2 on integer
columns, 1 on a text column, not the toast column), csv file, each row >
1320KB:
(222.767, 0, 1X), (134.171, 1, 1.66X), (93.749, 2, 2.38X), (93.672, 4,
2.38X), (94.827, 8, 2.35X), (93.766, 16, 2.37X), (98.153, 20, 2.27X),
(122.721, 30, 1.81X)
Toast table use case 2: 100000 tuples, 96GB data, 3 indexes (2 on integer
columns, 1 on a text column, not the toast column), csv file, each row >
1320KB:
(2255.032, 0, 1X), (1358.628, 1, 1.66X), (901.170, 2, 2.5X), (912.743, 4,
2.47X), (988.718, 8, 2.28X), (938.000, 16, 2.4X), (997.556, 20, 2.26X),
(1000.586, 30, 2.25X)
Toast table use case 3: 10000 tuples, 9.6GB, no indexes, binary file, each
row > 1320KB:
(136.983, 0, 1X), (136.418, 1, 1X), (81.896, 2, 1.66X), (62.929, 4, 2.16X),
(52.311, 8, 2.6X), (40.032, 16, 3.49X), (44.097, 20, 3.09X), (62.310, 30,
2.18X)
In the case of a Toast table, we could achieve up to 2.5X for csv files, and
3.5X for binary files. We are analyzing this point and will post an update
on our findings soon.
While testing the Toast table case with a binary file, I discovered an
issue with the earlier v6-0006-Parallel-Copy-For-Binary-Format-Files.patch
from [1]. I fixed it and added the updated v6-0006 patch here. Please note
that I'm also attaching patches 1 to 5 from version 6 just for
completeness; they have no change from what Vignesh sent earlier in [1].
3. Have we checked the code coverage of the newly added code with
existing tests?
So far, we manually ensured that most of the code parts are covered (see
below list of test cases). But we are also planning to do the code coverage
using some tool in the coming days.
Apart from the above tests, I also captured performance measurement on the
latest v6 patch set.
Use case 1: 10 million rows, 5.2GB data, 2 indexes on integer columns, 1
index on text column, csv file
(1168.484, 0, 1X), (1116.442, 1, 1.05X), (641.272, 2, 1.82X), (338.963, 4,
3.45X), (202.914, 8, 5.76X), (139.884, 16, 8.35X), (128.955, 20, 9.06X),
(131.898, 30, 8.86X)
Use case 2: 10 million rows, 5.2GB data, 2 indexes on integer columns, 1
index on text column, binary file
(1097.83, 0, 1X), (1095.735, 1, 1.002X), (625.610, 2, 1.75X), (319.833, 4,
3.43X), (186.908, 8, 5.87X), (132.115, 16, 8.31X), (128.854, 20, 8.52X),
(134.965, 30, 8.13X)
Use case 3: 10 million rows, 5.2GB data, 3 indexes on integer columns, csv
file
(218.227, 0, 1X), (182.815, 1, 1.19X), (135.500, 2, 1.61), (113.954, 4,
1.91X), (106.243, 8, 2.05X), (101.222, 16, 2.15X), (100.378, 20, 2.17X),
(100.351, 30, 2.17X)
All the above tests were performed on the latest v6 patch set (attached here
in this thread) with a custom postgresql.conf[2]. The results are of the
triplet form (exec time in sec, number of workers, gain).
Overall, we have below test cases to cover the code and for performance
measurements. We plan to run these tests whenever a new set of patches is
posted.
1. csv
2. binary
3. force parallel mode = regress
4. toast data csv and binary
5. foreign key check; before row, after row, before statement, after
statement, and instead of triggers
6. partition case
7. foreign partitions and partitions having trigger cases
8. where clause with parallel-safe and parallel-unsafe expressions, and
default expressions that are parallel safe and parallel unsafe
9. temp, global, local, unlogged, inherited table cases, foreign tables
[1]: /messages/by-id/CALDaNm29DJKy0-vozs8eeBRf2u3rbvPdZHCocrd0VjoWHS7h5A@mail.gmail.com
[2]:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v6-0001-Copy-code-readjustment-to-support-parallel-copy.patch
From 2f6fda276f191a3b7a15c07c51199a154530ed09 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 08:52:48 +0530
Subject: [PATCH v6 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated out into functions/macros; these functions/macros will be used by
the workers in the parallel copy code of the upcoming patches. EOL removal
is moved from CopyReadLine to CopyReadLineText. This change was required
because, in the parallel copy case, record identification and record updates
are done in CopyReadLineText, and the newline characters should be removed
before the record information is updated in shared memory.
---
src/backend/commands/copy.c | 356 +++++++++++++++++++++++++++-----------------
1 file changed, 218 insertions(+), 138 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 2047557..f2848a1 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -224,7 +227,6 @@ typedef struct CopyStateData
* appropriate amounts of data from this buffer. In both modes, we
* guarantee that there is a \0 at raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -354,6 +356,18 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
+/*
+ * Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+processed++;
+
+/*
+ * Get the lines processed.
+ */
+#define RETURNPROCESSED(processed) \
+return processed;
+
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -401,6 +415,12 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
+
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -801,14 +821,18 @@ CopyLoadRawBuf(CopyState cstate)
{
int nbytes = RAW_BUF_BYTES(cstate);
int inbytes;
+ int minread = 1;
/* Copy down the unprocessed data if any. */
if (nbytes > 0)
memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
nbytes);
+ if (cstate->copy_dest == COPY_NEW_FE)
+ minread = RAW_BUF_SIZE - nbytes;
+
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
- 1, RAW_BUF_SIZE - nbytes);
+ minread, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
cstate->raw_buf_index = 0;
@@ -1514,7 +1538,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1680,6 +1703,25 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateCommonCstateInfo(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateCommonCstateInfo
+ *
+ * Populates the common variables required for copy from operation. This is a
+ * helper function for BeginCopy function.
+ */
+static void
+PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1799,12 +1841,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2696,32 +2732,13 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * CheckTargetRelValidity
+ *
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
/*
@@ -2758,27 +2775,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2816,9 +2812,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+ CheckTargetRelValidity(cstate);
+
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3311,7 +3359,7 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ INCREMENTPROCESSED(processed)
}
}
@@ -3366,30 +3414,17 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ RETURNPROCESSED(processed)
}
/*
- * Setup to read tuples from a file for COPY FROM.
+ * PopulateCstateCatalogInfo
*
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * Populate the cstate catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCstateCatalogInfo(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3399,38 +3434,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf and
- * raw_buf are used in both text and binary modes, but we use line_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3508,6 +3513,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCstateCatalogInfo(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3917,40 +3977,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
+
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData
+ *
+ * Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker to
+ * line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
{
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
}
+}
+/*
+ * ConvertToServerEncoding
+ *
+ * Convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
@@ -3967,11 +4047,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4334,6 +4411,9 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ if (!result && !IsHeaderLine())
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
+ cstate->line_buf.len, &cstate->line_buf.len);
return result;
}
--
1.8.3.1
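To make the 0001 refactoring easier to follow, here is a minimal standalone
sketch of the EOL-stripping logic that ClearEOLFromCopiedData factors out of
CopyReadLine. The clear_eol name and the simplified types are ours, for
illustration only; the real function also asserts the exact terminator bytes
and normally operates on cstate->line_buf:

#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum EolType { EOL_UNKNOWN, EOL_NL, EOL_CR, EOL_CRNL } EolType;

/*
 * Strip the EOL marker that was transferred into the line buffer along
 * with the data, adjusting the buffer size accordingly.
 */
static void
clear_eol(EolType eol_type, char *data, int pos, int *size)
{
	switch (eol_type)
	{
		case EOL_NL:
		case EOL_CR:
			assert(*size >= 1);
			data[pos - 1] = '\0';
			(*size)--;
			break;
		case EOL_CRNL:
			assert(*size >= 2);
			data[pos - 2] = '\0';
			*size -= 2;
			break;
		case EOL_UNKNOWN:
			assert(0);			/* shouldn't get here */
			break;
	}
}

int
main(void)
{
	char	line[] = "1,foo,bar\r\n";
	int		len = (int) strlen(line);

	clear_eol(EOL_CRNL, line, len, &len);
	printf("\"%s\" (%d bytes)\n", line, len);	/* "1,foo,bar" (9 bytes) */
	return 0;
}

The reason for calling this from CopyReadLineText rather than CopyReadLine is
spelled out in the commit message: in parallel copy, the leader publishes a
record's offset and size in shared memory from CopyReadLineText, so the line
terminator has to be gone before that information is shared.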
v6-0002-Framework-for-leader-worker-in-parallel-copy.patch
From 67e5240af5ebe803473acebaf0e8796fd2a05cdd Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 7 Oct 2020 17:18:17 +0530
Subject: [PATCH v6 2/6] Framework for leader/worker in parallel copy
This patch has the framework for the data structures in parallel copy:
leader initialization, worker initialization, shared memory updates,
starting workers, waiting for workers, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/Makefile | 1 +
src/backend/commands/copy.c | 235 ++++++--------------
src/include/commands/copy.h | 389 +++++++++++++++++++++++++++++++++-
src/tools/pgindent/typedefs.list | 7 +
5 files changed, 469 insertions(+), 167 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3..a224aac 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -24,6 +24,7 @@ OBJS = \
constraint.o \
conversioncmds.o \
copy.o \
+ copyparallel.o \
createas.o \
dbcommands.o \
define.o \
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f2848a1..1e55a30 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -29,7 +29,6 @@
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
-#include "commands/trigger.h"
#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
@@ -63,29 +62,6 @@
#define OCTVALUE(c) ((c) - '0')
/*
- * Represents the different source/dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
- COPY_FILE, /* to/from file (or a piped program) */
- COPY_OLD_FE, /* to/from frontend (2.0 protocol) */
- COPY_NEW_FE, /* to/from frontend (3.0 protocol) */
- COPY_CALLBACK /* to/from callback function */
-} CopyDest;
-
-/*
- * Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
- EOL_UNKNOWN,
- EOL_NL,
- EOL_CR,
- EOL_CRNL
-} EolType;
-
-/*
* Represents the heap insert method to be used during COPY FROM.
*/
typedef enum CopyInsertMethod
@@ -95,145 +71,10 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
-/*
- * This struct contains all the state variables used throughout a COPY
- * operation. For simplicity, we use the same struct for all variants of COPY,
- * even though some fields are used in only some cases.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyStateData
-{
- /* low-level state data */
- CopyDest copy_dest; /* type of copy source/destination */
- FILE *copy_file; /* used if copy_dest == COPY_FILE */
- StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
- * dest == COPY_NEW_FE in COPY FROM */
- bool is_copy_from; /* COPY TO, or COPY FROM? */
- bool reached_eof; /* true if we read to end of copy data (not
- * all copy_dest types maintain this) */
- EolType eol_type; /* EOL type of input */
- int file_encoding; /* file or remote side's character encoding */
- bool need_transcoding; /* file encoding diff from server? */
- bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
-
- /* parameters from the COPY command */
- Relation rel; /* relation to copy to or from */
- QueryDesc *queryDesc; /* executable query to copy from */
- List *attnumlist; /* integer list of attnums to copy */
- char *filename; /* filename, or NULL for STDIN/STDOUT */
- bool is_program; /* is 'filename' a program to popen? */
- copy_data_source_cb data_source_cb; /* function for reading data */
- bool binary; /* binary format? */
- bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
- bool header_line; /* CSV header line? */
- char *null_print; /* NULL marker string (server encoding!) */
- int null_print_len; /* length of same */
- char *null_print_client; /* same converted to file encoding */
- char *delim; /* column delimiter (must be 1 byte) */
- char *quote; /* CSV quote char (must be 1 byte) */
- char *escape; /* CSV escape char (must be 1 byte) */
- List *force_quote; /* list of column names */
- bool force_quote_all; /* FORCE_QUOTE *? */
- bool *force_quote_flags; /* per-column CSV FQ flags */
- List *force_notnull; /* list of column names */
- bool *force_notnull_flags; /* per-column CSV FNN flags */
- List *force_null; /* list of column names */
- bool *force_null_flags; /* per-column CSV FN flags */
- bool convert_selectively; /* do selective binary conversion? */
- List *convert_select; /* list of column names (can be NIL) */
- bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
- Node *whereClause; /* WHERE condition (or NULL) */
-
- /* these are just for error messages, see CopyFromErrorCallback */
- const char *cur_relname; /* table name for error messages */
- uint64 cur_lineno; /* line number for error messages */
- const char *cur_attname; /* current att for error messages */
- const char *cur_attval; /* current att value for error messages */
-
- /*
- * Working state for COPY TO/FROM
- */
- MemoryContext copycontext; /* per-copy execution context */
-
- /*
- * Working state for COPY TO
- */
- FmgrInfo *out_functions; /* lookup info for output functions */
- MemoryContext rowcontext; /* per-row evaluation context */
-
- /*
- * Working state for COPY FROM
- */
- AttrNumber num_defaults;
- FmgrInfo *in_functions; /* array of input functions for each attrs */
- Oid *typioparams; /* array of element types for in_functions */
- int *defmap; /* array of default att numbers */
- ExprState **defexprs; /* array of default att expressions */
- bool volatile_defexprs; /* is any of defexprs volatile? */
- List *range_table;
- ExprState *qualexpr;
-
- TransitionCaptureState *transition_capture;
-
- /*
- * These variables are used to reduce overhead in COPY FROM.
- *
- * attribute_buf holds the separated, de-escaped text for each field of
- * the current line. The CopyReadAttributes functions return arrays of
- * pointers into this buffer. We avoid palloc/pfree overhead by re-using
- * the buffer on each cycle.
- *
- * In binary COPY FROM, attribute_buf holds the binary data for the
- * current field, but the usage is otherwise similar.
- */
- StringInfoData attribute_buf;
-
- /* field raw data pointers found by COPY FROM */
-
- int max_fields;
- char **raw_fields;
-
- /*
- * Similarly, line_buf holds the whole input line being processed. The
- * input cycle is first to read the whole line into line_buf, convert it
- * to server encoding there, and then extract the individual attribute
- * fields into attribute_buf. line_buf is preserved unmodified so that we
- * can display it in error messages if appropriate. (In binary mode,
- * line_buf is not used.)
- */
- StringInfoData line_buf;
- bool line_buf_converted; /* converted to server encoding? */
- bool line_buf_valid; /* contains the row being processed? */
-
- /*
- * Finally, raw_buf holds raw data read from the data source (file or
- * client connection). In text mode, CopyReadLine parses this data
- * sufficiently to locate line boundaries, then transfers the data to
- * line_buf and converts it. In binary mode, CopyReadBinaryData fetches
- * appropriate amounts of data from this buffer. In both modes, we
- * guarantee that there is a \0 at raw_buf[raw_buf_len].
- */
- char *raw_buf;
- int raw_buf_index; /* next byte to process */
- int raw_buf_len; /* total # of bytes stored */
- /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-} CopyStateData;
-
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -415,8 +256,6 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
- List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
@@ -1134,6 +973,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
+
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1143,7 +984,35 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ int i;
+
+ ParallelCopyFrom(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+
+ /*
+ * Next, accumulate WAL usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&cstate->pcdata->bufferusage[i],
+ &cstate->pcdata->walusage[i]);
+
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1192,6 +1061,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1360,6 +1230,39 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ int val;
+ bool parsed;
+ char *strval;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option is supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ strval = defGetString(defel);
+ parsed = parse_int(strval, &val, 0, NULL);
+ if (!parsed)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for integer option \"%s\": %s",
+ defel->defname, strval)));
+ if (val < 1 || val > 1024)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("value %s out of bounds for option \"%s\"",
+ strval, defel->defname),
+ errdetail("Valid values are between \"%d\" and \"%d\".",
+ 1, 1024)));
+ cstate->nworkers = val;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1715,9 +1618,9 @@ BeginCopy(ParseState *pstate,
* PopulateCommonCstateInfo
*
* Populates the common variables required for copy from operation. This is a
- * helper function for BeginCopy function.
+ * helper function for BeginCopy & InitializeParallelCopyInfo function.
*/
-static void
+void
PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..cd2d56e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,14 +14,394 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
+#include "commands/trigger.h"
+#include "executor/executor.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The macros DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT stores the records
+ * read from the file that need to be inserted into the relation. These values
+ * help in the handover of multiple records with the significant size of data to
+ * be processed by each of the workers. This also ensures there is no context
+ * switch and the work is fairly distributed among the workers. This number
+ * showed best results in the performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* It can hold 1023 blocks of 64K data in DSM to be processed by the worker. */
+#define MAX_BLOCKS_COUNT 1024
+
+/*
+ * It can hold upto 10240 record information for worker to process. RINGSIZE
+ * should be a multiple of WORKER_CHUNK_COUNT, as wrap around cases is currently
+ * not handled while selecting the WORKER_CHUNK_COUNT by the worker.
+ */
+#define RINGSIZE (10 * 1024)
+
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT of records from DSM data
+ * block to process to avoid lock contention. Read RINGSIZE comments before
+ * changing this value.
+ */
+#define WORKER_CHUNK_COUNT 64
+
+/*
+ * Represents the different source/dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+ COPY_FILE, /* to/from file (or a piped program) */
+ COPY_OLD_FE, /* to/from frontend (2.0 protocol) */
+ COPY_NEW_FE, /* to/from frontend (3.0 protocol) */
+ COPY_CALLBACK /* to/from callback function */
+} CopyDest;
+
+/*
+ * Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+ EOL_UNKNOWN,
+ EOL_NL,
+ EOL_CR,
+ EOL_CRNL
+} EolType;
+
+/*
+ * Copy data block information.
+ *
+ * These data blocks are created in DSM. Data read from file will be copied in
+ * these DSM data blocks. The leader process identifies the records and the
+ * record information will be shared to the workers. The workers will insert the
+ * records into the table. There can be one or more number of records in each of
+ * the data block based on the record size.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block,
+ * following_block will have the position where the remaining data need to
+ * be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag will be set, when the leader finds out this block can be read
+ * safely by the worker. This helps the worker to start processing the
+ * line early where the line will be spread across many blocks and the
+ * worker need not wait for the complete line to be processed.
+ */
+ bool curr_blk_completed;
+
+ /*
+ * Few bytes need to be skipped from this block, this will be set when a
+ * sequence of characters like \r\n is expected, but end of our block
+ * contained only \r. In this case we copy the data from \r into the new
+ * block as they have to be processed together to identify end of line.
+ * Worker will use skip_bytes to know that this data must be skipped from
+ * this data block.
+ */
+ uint8 skip_bytes;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+} ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ *
+ * ParallelCopyLineBoundary is common data structure between leader & worker.
+ * Leader process will be populating data block, data block offset & the size of
+ * the record in DSM for the workers to copy the data into the relation.
+ * The leader & worker process access the shared line information by following
+ * the below steps to avoid any data corruption or hang:
+ * Leader should operate in the following order:
+ * 1) check if line_size is -1, if line_size is not -1 wait until line_size is
+ * set to -1 by the worker. If line_size is -1 it means worker is still
+ * processing.
+ * 2) set line_state to LINE_LEADER_POPULATING, so that the worker knows that
+ * leader is populating this line.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * Worker should operate in the following order:
+ * 1) check line_state is LINE_LEADER_POPULATED, if not it means leader is still
+ * populating the data.
+ * 2) read line_size to know the size of the data.
+ * 3) only one worker should choose one line for processing, this is handled by
+ * using pg_atomic_compare_exchange_u32, worker will change the state to
+ * LINE_WORKER_PROCESSING only if line_state is LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line -1 means line is yet to be filled completely,
+ * 0 means empty line, >0 means line filled with line size data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+} ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by worker, will not be same as
+ * total_worker_processed if where condition is specified along with copy.
+ * This will be the actual records inserted into the relation.
+ */
+ pg_atomic_uint64 processed;
+
+ /*
+ * The number of records currently processed by the worker, this will also
+ * include the number of records that was filtered because of where
+ * clause.
+ */
+ pg_atomic_uint64 total_worker_processed;
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBuf;
+
+/*
+ * This structure helps in storing the common data from CopyStateData that are
+ * required by the workers. This information will then be allocated and stored
+ * into the DSM for the worker to retrieve and copy it to CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+} SerializedParallelCopyState;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array, workers will copy it here and release the lines
+ * for the leader to continue.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+} ParallelCopyData;
+
+typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+
+/*
+ * This struct contains all the state variables used throughout a COPY
+ * operation. For simplicity, we use the same struct for all variants of COPY,
+ * even though some fields are used in only some cases.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyStateData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ FILE *copy_file; /* used if copy_dest == COPY_FILE */
+ StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
+ * dest == COPY_NEW_FE in COPY FROM */
+ bool is_copy_from; /* COPY TO, or COPY FROM? */
+ bool reached_eof; /* true if we read to end of copy data (not
+ * all copy_dest types maintain this) */
+ EolType eol_type; /* EOL type of input */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ Relation rel; /* relation to copy to or from */
+ QueryDesc *queryDesc; /* executable query to copy from */
+ List *attnumlist; /* integer list of attnums to copy */
+ char *filename; /* filename, or NULL for STDIN/STDOUT */
+ bool is_program; /* is 'filename' a program to popen? */
+ copy_data_source_cb data_source_cb; /* function for reading data */
+ bool binary; /* binary format? */
+ bool freeze; /* freeze rows on loading? */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ char *null_print; /* NULL marker string (server encoding!) */
+ int null_print_len; /* length of same */
+ char *null_print_client; /* same converted to file encoding */
+ char *delim; /* column delimiter (must be 1 byte) */
+ char *quote; /* CSV quote char (must be 1 byte) */
+ char *escape; /* CSV escape char (must be 1 byte) */
+ List *force_quote; /* list of column names */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool *force_quote_flags; /* per-column CSV FQ flags */
+ List *force_notnull; /* list of column names */
+ bool *force_notnull_flags; /* per-column CSV FNN flags */
+ List *force_null; /* list of column names */
+ bool *force_null_flags; /* per-column CSV FN flags */
+ bool convert_selectively; /* do selective binary conversion? */
+ List *convert_select; /* list of column names (can be NIL) */
+ bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
+ Node *whereClause; /* WHERE condition (or NULL) */
+
+ /* these are just for error messages, see CopyFromErrorCallback */
+ const char *cur_relname; /* table name for error messages */
+ uint64 cur_lineno; /* line number for error messages */
+ const char *cur_attname; /* current att for error messages */
+ const char *cur_attval; /* current att value for error messages */
+
+ /*
+ * Working state for COPY TO/FROM
+ */
+ MemoryContext copycontext; /* per-copy execution context */
+
+ /*
+ * Working state for COPY TO
+ */
+ FmgrInfo *out_functions; /* lookup info for output functions */
+ MemoryContext rowcontext; /* per-row evaluation context */
+
+ /*
+ * Working state for COPY FROM
+ */
+ AttrNumber num_defaults;
+ FmgrInfo *in_functions; /* array of input functions for each attrs */
+ Oid *typioparams; /* array of element types for in_functions */
+ int *defmap; /* array of default att numbers */
+ ExprState **defexprs; /* array of default att expressions */
+ bool volatile_defexprs; /* is any of defexprs volatile? */
+ List *range_table;
+ ExprState *qualexpr;
+
+ TransitionCaptureState *transition_capture;
+
+ /*
+ * These variables are used to reduce overhead in COPY FROM.
+ *
+ * attribute_buf holds the separated, de-escaped text for each field of
+ * the current line. The CopyReadAttributes functions return arrays of
+ * pointers into this buffer. We avoid palloc/pfree overhead by re-using
+ * the buffer on each cycle.
+ *
+ * In binary COPY FROM, attribute_buf holds the binary data for the
+ * current field, but the usage is otherwise similar.
+ */
+ StringInfoData attribute_buf;
+
+ /* field raw data pointers found by COPY FROM */
+
+ int max_fields;
+ char **raw_fields;
+
+ /*
+ * Similarly, line_buf holds the whole input line being processed. The
+ * input cycle is first to read the whole line into line_buf, convert it
+ * to server encoding there, and then extract the individual attribute
+ * fields into attribute_buf. line_buf is preserved unmodified so that we
+ * can display it in error messages if appropriate. (In binary mode,
+ * line_buf is not used.)
+ */
+ StringInfoData line_buf;
+ bool line_buf_converted; /* converted to server encoding? */
+ bool line_buf_valid; /* contains the row being processed? */
+
+ /*
+ * Finally, raw_buf holds raw data read from the data source (file or
+ * client connection). In text mode, CopyReadLine parses this data
+ * sufficiently to locate line boundaries, then transfers the data to
+ * line_buf and converts it. In binary mode, CopyReadBinaryData fetches
+ * appropriate amounts of data from this buffer. In both modes, we
+ * guarantee that there is a \0 at raw_buf[raw_buf_len].
+ */
+ char *raw_buf;
+ int raw_buf_index; /* next byte to process */
+ int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
+ /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+} CopyStateData;
+
/* CopyStateData is private in commands/copy.c */
typedef struct CopyStateData *CopyState;
-typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
extern void DoCopy(ParseState *state, const CopyStmt *stmt,
int stmt_location, int stmt_len,
@@ -41,4 +421,11 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+extern ParallelContext *BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid);
+extern void ParallelCopyFrom(CopyState cstate);
+extern void EndParallelCopy(ParallelContext *pcxt);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9cd1179..f5b818b 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1702,6 +1702,12 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2219,6 +2225,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
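The ParallelCopyLineBoundary comments above describe a lock-free handoff:
the leader fills a slot and sets line_state to LINE_LEADER_POPULATED,
exactly one worker claims the slot via compare-and-exchange to
LINE_WORKER_PROCESSING, and the worker hands the slot back by resetting
line_size to -1. Below is a minimal standalone sketch of the worker side,
using C11 atomics in place of the pg_atomic_* API purely for readability;
try_claim_line/release_line and the state values are ours, not from the
patch:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LINE_LEADER_POPULATING	0	/* illustrative values; the patch */
#define LINE_LEADER_POPULATED	1	/* defines its own line states */
#define LINE_WORKER_PROCESSING	2

typedef struct LineBoundary
{
	uint32_t	first_block;	/* position of first block */
	uint32_t	start_offset;	/* start offset of the line */
	atomic_uint	line_size;		/* (uint32_t) -1 means "being filled" */
	atomic_uint	line_state;
	uint64_t	cur_lineno;		/* line number for error messages */
} LineBoundary;

/* Worker side: only one worker may flip POPULATED -> PROCESSING. */
static bool
try_claim_line(LineBoundary *lb, uint32_t *size_out)
{
	unsigned int expected = LINE_LEADER_POPULATED;

	if (!atomic_compare_exchange_strong(&lb->line_state, &expected,
										LINE_WORKER_PROCESSING))
		return false;	/* still being populated, or another worker won */

	/* Now safe to read what the leader published before POPULATED. */
	*size_out = atomic_load(&lb->line_size);
	return true;
}

/* After copying the line data out, give the slot back to the leader. */
static void
release_line(LineBoundary *lb)
{
	atomic_store(&lb->line_size, (uint32_t) -1);
}

int
main(void)
{
	LineBoundary lb = {0};
	uint32_t	sz;

	atomic_store(&lb.line_size, 42);
	atomic_store(&lb.line_state, LINE_LEADER_POPULATED);
	if (try_claim_line(&lb, &sz))	/* succeeds; sz == 42 */
		release_line(&lb);
	return 0;
}

The leader-side rule in the header comment is the mirror image: the leader
must not reuse a slot until line_size is back to -1, which is what keeps the
single-producer ring in ParallelCopyLineBoundaries safe without a lock. With
0002 and 0003 applied, the machinery is driven by something like
COPY mytable FROM '/path/data.csv' WITH (PARALLEL 4);
per the option parsing this patch adds to ProcessCopyOptions.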
v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
From eb1a33276d1f907d14e7e1962b1cd254b81e1587 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 7 Oct 2020 17:24:44 +0530
Subject: [PATCH v6 3/6] Allow copy from command to process data from
file/STDIN contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs in order to copy
data from a file/STDIN to a table. It adds a PARALLEL option to the COPY
FROM command, through which the user can specify the number of workers to be
used to perform the COPY FROM command.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then
executes BEFORE STATEMENT triggers, if any exist. The leader populates the
DSM lines, each of which holds a start offset and line size; while
populating the lines, it reads as many blocks as required from the file into
the DSM data blocks. Each block is 64K in size. The leader parses the data
to identify a line; the existing logic from CopyReadLineText, which
identifies the lines, was used for this with some changes. The leader checks
if a free line is available to copy the information; if there is no free
line, it waits until the required line is freed up by the worker and then
copies the identified line's information (offset & line size) into the DSM
lines. This process is repeated until the complete file is processed.
Simultaneously, the workers cache the lines (50) locally in local memory and
release the lines to the leader for further populating. Each worker
processes the lines it cached and inserts them into the table.
The leader does not participate in the insertion of data; the leader's only
responsibility is to identify the lines as fast as possible for the workers
to do the actual copy operation. The leader waits until all the populated
lines are processed by the workers and then exits. We have chosen this
design for the reason "that everything stalls if the leader doesn't accept
further input data, as well as when there are no available split chunks, so
it doesn't seem like a good idea to have the leader do other work. This is
backed by the performance data where we have seen that with 1 worker there
is just a 5-10% performance difference".
---
src/backend/access/common/toast_internals.c | 12 +-
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 15 +
src/backend/commands/copy.c | 220 +++--
src/backend/commands/copyparallel.c | 1269 +++++++++++++++++++++++++++
src/bin/psql/tab-complete.c | 2 +-
src/include/access/xact.h | 1 +
src/include/commands/copy.h | 69 +-
src/tools/pgindent/typedefs.list | 1 +
9 files changed, 1514 insertions(+), 86 deletions(-)
create mode 100644 src/backend/commands/copyparallel.c
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..70c070e 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,16 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples, in case of parallel copy the
+ * command would have been set already by calling
+ * AssignCommandIdForWorker. For parallel copy call GetCurrentCommandId to
+ * get currentCommandId by passing used as false, as this is taken care
+ * earlier.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861..1602525 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afce..0b3337c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -776,6 +776,21 @@ GetCurrentCommandId(bool used)
}
/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used.
+ * This must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed &&
+ (currentCommandId != InvalidCommandId));
+
+ currentCommandIdUsed = true;
+}
+
+/*
* SetParallelStartTimestamps
*
* In a parallel worker, we should inherit the parent transaction's
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 1e55a30..dc006a5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -61,20 +61,6 @@
#define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
#define OCTVALUE(c) ((c) - '0')
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
- CIM_SINGLE, /* use table_tuple_insert or fdw routine */
- CIM_MULTI, /* always use table_multi_insert */
- CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-#define IsParallelCopy() (cstate->is_parallel)
-#define IsLeader() (cstate->pcdata->is_leader)
-#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
-
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -131,7 +117,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -182,9 +167,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -197,18 +186,6 @@ if (1) \
goto not_end_of_copy; \
} else ((void) 0)
-/*
- * Increment the lines processed.
- */
-#define INCREMENTPROCESSED(processed) \
-processed++;
-
-/*
- * Get the lines processed.
- */
-#define RETURNPROCESSED(processed) \
-return processed;
-
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
@@ -225,7 +202,6 @@ static void EndCopyTo(CopyState cstate);
static uint64 DoCopyTo(CopyState cstate);
static uint64 CopyTo(CopyState cstate);
static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
-static bool CopyReadLine(CopyState cstate);
static bool CopyReadLineText(CopyState cstate);
static int CopyReadAttributesText(CopyState cstate);
static int CopyReadAttributesCSV(CopyState cstate);
@@ -258,7 +234,6 @@ static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
-static void ConvertToServerEncoding(CopyState cstate);
/*
@@ -2639,7 +2614,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
*
* Check if the relation specified in copy from is valid.
*/
-static void
+void
CheckTargetRelValidity(CopyState cstate)
{
Assert(cstate->rel);
@@ -2735,7 +2710,7 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid;
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -2745,7 +2720,18 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check if it is not parallel copy. In case of parallel
+ * copy, this check is done by the leader, so that if any invalid case
+ * exist the copy from command will error out from the leader itself,
+ * avoiding launching workers, just to throw error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
+ else
+ SetCurrentCommandIdUsedForWorker();
+
+ mycid = GetCurrentCommandId(!IsParallelCopy());
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -2785,7 +2771,8 @@ CopyFrom(CopyState cstate)
target_resultRelInfo = resultRelInfo;
/* Verify the named relation is a valid target for INSERT */
- CheckValidResultRel(resultRelInfo, CMD_INSERT);
+ if (!IsParallelCopy())
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
ExecOpenIndices(resultRelInfo, false);
@@ -2934,13 +2921,17 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether
+ * we should do this for COPY, since it's not really an "INSERT"
+ * statement as such. However, executing these triggers maintains
+ * consistency with the EACH ROW triggers that we already fire on
+ * COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3040,6 +3031,29 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * We may still be able to perform parallel inserts for
+ * partitioned tables. However, this depends on which types of
+ * triggers exist on the partition. We must not do parallel
+ * inserts if the partition is a foreign table or has any
+ * BEFORE/INSTEAD OF row triggers. Since a partition's
+ * resultRelInfo is initialized only when we actually insert the
+ * first tuple into it, the leader cannot easily know this while
+ * deciding on parallelism, and may have gone ahead and allowed
+ * it. So throw an error here, with a hint to retry without
+ * parallelism. Throwing an error seemed simpler than examining
+ * all the partitions in the leader while deciding on
+ * parallelism. Note that the error is thrown early, exactly when
+ * the first tuple is inserted into the partition, so little of
+ * the work done so far is wasted.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is a foreign table"),
+ errhint("Try COPY without the PARALLEL option.")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -3325,7 +3339,7 @@ CopyFrom(CopyState cstate)
*
* Populate the cstate catalog information.
*/
-static void
+void
PopulateCstateCatalogInfo(CopyState cstate)
{
TupleDesc tupDesc;
@@ -3607,26 +3621,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -3851,7 +3874,7 @@ EndCopyFrom(CopyState cstate)
* by newline. The terminating newline or EOF marker is not included
* in the final value of line_buf.
*/
-static bool
+bool
CopyReadLine(CopyState cstate)
{
bool result;
@@ -3874,9 +3897,34 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
+
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ if (cstate->raw_buf_index == RAW_BUF_SIZE)
+ {
+ /*
+ * Get a new block the first time through; on subsequent
+ * iterations, reset the index and re-use the same
+ * block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -3931,11 +3979,11 @@ ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
*
* Convert contents to server encoding.
*/
-static void
+void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding && (!IsParallelCopy() || IsWorker()))
{
char *cvt;
@@ -3975,6 +4023,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ uint32 line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4029,6 +4082,8 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -4253,9 +4308,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
- cstate->raw_buf + cstate->raw_buf_index,
- prev_raw_ptr - cstate->raw_buf_index);
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
+ cstate->raw_buf + cstate->raw_buf_index,
+ prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -4307,6 +4368,22 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line information here; this
+ * cannot be done at the beginning, as the file may contain empty
+ * lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ line_pos = UpdateSharedLineInfo(cstate,
+ pcshared_info->cur_block_pos,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING, -1);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -4315,9 +4392,16 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
if (!result && !IsHeaderLine())
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
- cstate->line_buf.len, &cstate->line_buf.len);
+ {
+ if (IsParallelCopy())
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, raw_buf_ptr,
+ &line_size);
+ else
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
+ cstate->line_buf.len, &cstate->line_buf.len);
+ }
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
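
A note on the copy.c changes above: in parallel mode CopyReadLineText() no
longer appends line bytes into line_buf; the leader only accumulates
line_size and records each line's block, offset, size and state in shared
memory. A condensed, illustrative sketch of that leader-side flow (not
part of the patch, but using the patch's names) looks like this:

    /* Leader-side handling of one line in CopyReadLineText() under
     * parallel copy; buffer refills and multi-byte handling omitted. */
    if (first_char_in_line && !IsHeaderLine())
    {
        /* Reserve a ring slot for this line; size is unknown yet (-1). */
        line_pos = UpdateSharedLineInfo(cstate, pcshared_info->cur_block_pos,
                                        cstate->raw_buf_index, -1,
                                        LINE_LEADER_POPULATING, -1);
    }

    /* Instead of copying bytes into line_buf, just track the length. */
    line_size += raw_buf_ptr - cstate->raw_buf_index;

    /* At end of line, publish the size and mark the slot populated so
     * that a worker can pick the line up. */
    EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
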
diff --git a/src/backend/commands/copyparallel.c b/src/backend/commands/copyparallel.c
new file mode 100644
index 0000000..6a44a01
--- /dev/null
+++ b/src/backend/commands/copyparallel.c
@@ -0,0 +1,1269 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyparallel.c
+ * Implements the Parallel COPY utility command
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/copyparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "catalog/pg_proc_d.h"
+#include "commands/copy.h"
+#include "optimizer/clauses.h"
+#include "optimizer/optimizer.h"
+#include "pgstat.h"
+#include "utils/lsyscache.h"
+
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_WAL_USAGE 3
+#define PARALLEL_COPY_BUFFER_USAGE 4
+
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/*
+ * CopyStringToSharedMemory - Copy the string to shared memory.
+ */
+static void
+CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
+ uint32 *copiedsize)
+{
+ uint16 len = srcPtr ? (uint16) (strlen(srcPtr) + 1) : 0;
+
+ memcpy(destptr, &len, sizeof(uint16));
+ *copiedsize += sizeof(uint16);
+ if (len)
+ {
+ memcpy(destptr + sizeof(uint16), srcPtr, len);
+ *copiedsize += len;
+ }
+}
+
+/*
+ * SerializeParallelCopyState - Serialize the data into shared memory.
+ */
+static void
+SerializeParallelCopyState(ParallelContext *pcxt, CopyState cstate,
+ uint32 estimatedSize, char *whereClauseStr,
+ char *rangeTableStr, char *attnameListStr,
+ char *notnullListStr, char *nullListStr,
+ char *convertListStr)
+{
+ SerializedParallelCopyState shared_cstate;
+ char *shmptr = (char *) shm_toc_allocate(pcxt->toc, estimatedSize + 1);
+ uint32 copiedsize = 0;
+
+ shared_cstate.copy_dest = cstate->copy_dest;
+ shared_cstate.file_encoding = cstate->file_encoding;
+ shared_cstate.need_transcoding = cstate->need_transcoding;
+ shared_cstate.encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate.csv_mode = cstate->csv_mode;
+ shared_cstate.header_line = cstate->header_line;
+ shared_cstate.null_print_len = cstate->null_print_len;
+ shared_cstate.force_quote_all = cstate->force_quote_all;
+ shared_cstate.convert_selectively = cstate->convert_selectively;
+ shared_cstate.num_defaults = cstate->num_defaults;
+ shared_cstate.relid = cstate->pcdata->relid;
+
+ memcpy(shmptr, (char *) &shared_cstate, sizeof(SerializedParallelCopyState));
+ copiedsize = sizeof(SerializedParallelCopyState);
+
+ CopyStringToSharedMemory(cstate, cstate->null_print, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, cstate->delim, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, cstate->quote, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, cstate->escape, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, attnameListStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, notnullListStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, nullListStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, convertListStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, whereClauseStr, shmptr + copiedsize,
+ &copiedsize);
+ CopyStringToSharedMemory(cstate, rangeTableStr, shmptr + copiedsize,
+ &copiedsize);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shmptr);
+}
+
+/*
+ * CopyStringFromSharedMemory - Copy the string from shared memory & return
+ * the ptr.
+ */
+static char *
+CopyStringFromSharedMemory(char *srcPtr, uint32 *copiedsize)
+{
+ char *destptr = NULL;
+ uint16 len = 0;
+
+ memcpy((uint16 *) (&len), srcPtr, sizeof(uint16));
+ *copiedsize += sizeof(uint16);
+ if (len)
+ {
+ destptr = (char *) palloc0(len);
+ memcpy(destptr, srcPtr + sizeof(uint16), len);
+ *copiedsize += len;
+ }
+
+ return destptr;
+}
+
+/*
+ * CopyNodeFromSharedMemory - Copy the shared memory & convert it into node
+ * type.
+ */
+static void *
+CopyNodeFromSharedMemory(char *srcPtr, uint32 *copiedsize)
+{
+ char *destptr = NULL;
+ List *destList = NIL;
+ uint16 len = 0;
+
+ memcpy((uint16 *) (&len), srcPtr, sizeof(uint16));
+ *copiedsize += sizeof(uint16);
+ if (len)
+ {
+ destptr = (char *) palloc0(len);
+ memcpy(destptr, srcPtr + sizeof(uint16), len);
+ *copiedsize += len;
+ destList = (List *) stringToNode(destptr);
+ pfree(destptr);
+ }
+
+ return destList;
+}
+
+/*
+ * RestoreParallelCopyState - Retrieve the cstate from shared memory.
+ */
+static void
+RestoreParallelCopyState(shm_toc *toc, CopyState cstate, List **attlist)
+{
+ char *shared_str_val = (char *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, true);
+ SerializedParallelCopyState shared_cstate = {0};
+ uint32 copiedsize = 0;
+
+ memcpy(&shared_cstate, (char *) shared_str_val, sizeof(SerializedParallelCopyState));
+ copiedsize = sizeof(SerializedParallelCopyState);
+
+ cstate->file_encoding = shared_cstate.file_encoding;
+ cstate->need_transcoding = shared_cstate.need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate.encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate.csv_mode;
+ cstate->header_line = shared_cstate.header_line;
+ cstate->null_print_len = shared_cstate.null_print_len;
+ cstate->force_quote_all = shared_cstate.force_quote_all;
+ cstate->convert_selectively = shared_cstate.convert_selectively;
+ cstate->num_defaults = shared_cstate.num_defaults;
+ cstate->pcdata->relid = shared_cstate.relid;
+
+ cstate->null_print = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->delim = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->quote = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->escape = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+
+ *attlist = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->force_notnull = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->force_null = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->convert_select = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->whereClause = (Node *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->range_table = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+}
+
+/*
+ * EstimateStringSize - Estimate the size required for the string in shared
+ * memory.
+ */
+static uint32
+EstimateStringSize(char *str)
+{
+ uint32 strsize = sizeof(uint16);
+
+ if (str)
+ strsize += strlen(str) + 1;
+
+ return strsize;
+}
+
+/*
+ * EstimateNodeSize - Convert the list to string & estimate the size required
+ * in shared memory.
+ */
+static uint32
+EstimateNodeSize(void *list, char **listStr)
+{
+ uint32 strsize = sizeof(uint16);
+
+ if (list != NIL)
+ {
+ *listStr = nodeToString(list);
+ strsize += strlen(*listStr) + 1;
+ }
+
+ return strsize;
+}
+
+/*
+ * EstimateCstateSize - Estimate the size required in shared memory for cstate
+ * variables.
+ */
+static uint32
+EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist,
+ char **whereClauseStr, char **rangeTableStr,
+ char **attnameListStr, char **notnullListStr,
+ char **nullListStr, char **convertListStr)
+{
+ uint32 strsize = MAXALIGN(sizeof(SerializedParallelCopyState));
+
+ strsize += EstimateStringSize(cstate->null_print);
+ strsize += EstimateStringSize(cstate->delim);
+ strsize += EstimateStringSize(cstate->quote);
+ strsize += EstimateStringSize(cstate->escape);
+ strsize += EstimateNodeSize(attnamelist, attnameListStr);
+ strsize += EstimateNodeSize(cstate->force_notnull, notnullListStr);
+ strsize += EstimateNodeSize(cstate->force_null, nullListStr);
+ strsize += EstimateNodeSize(cstate->convert_select, convertListStr);
+ strsize += EstimateNodeSize(cstate->whereClause, whereClauseStr);
+ strsize += EstimateNodeSize(cstate->range_table, rangeTableStr);
+
+ strsize++;
+ shm_toc_estimate_chunk(&pcxt->estimator, strsize);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ return strsize;
+}
+
+/*
+ * PopulateParallelCopyShmInfo - Set ParallelCopyShmInfo.
+ */
+static void
+PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr)
+{
+ uint32 count;
+
+ MemSet(shared_info_ptr, 0, sizeof(ParallelCopyShmInfo));
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+}
+
+/*
+ * CheckTrigFunParallelSafety - For all triggers, check if the associated
+ * trigger functions are parallel safe. If at least one trigger function is
+ * parallel unsafe, we do not allow parallelism.
+ */
+static pg_attribute_always_inline bool
+CheckTrigFunParallelSafety(TriggerDesc *trigdesc)
+{
+ int i;
+
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+
+ /*
+ * No parallelism if a foreign key check trigger is present. This is
+ * because, while performing foreign key checks, we take a KEY SHARE
+ * lock on the primary key table rows, which in turn increments the
+ * command counter and updates the snapshot. Since we share the
+ * snapshot at the beginning of the command, we can't allow it to be
+ * changed later. So, unless we do something special for it, we can't
+ * allow parallelism in such cases.
+ */
+ if (trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * CheckExprParallelSafety - Determine parallel safety of volatile expressions
+ * in default clause of column definition or in where clause and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *) cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If
+ * so, and it is not parallel safe, parallelism is not allowed. For
+ * instance, parallelism should not be allowed for serial/bigserial
+ * columns, whose nextval() default expression is parallel unsafe.
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *) cstate->defexprs[i]->expr);
+
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *) cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * IsParallelCopyAllowed - Check for the cases where parallel copy is not
+ * applicable.
+ */
+static pg_attribute_always_inline bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy is not allowed with the old frontend (2.0) protocol or the binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /*
+ * Check if the copy is into a foreign table. We cannot allow
+ * parallelism in this case because each worker would need to establish
+ * an FDW connection and operate in a separate transaction. Unless we
+ * have a two-phase commit protocol, we cannot allow parallelism.
+ *
+ * Also check if the copy is into a temporary table. Since parallel
+ * workers cannot access temporary tables, parallelism is not allowed.
+ */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /*
+ * If there are volatile default expressions or where clause contain
+ * volatile expressions, allow parallelism if they are parallel safe,
+ * otherwise not.
+ */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check parallel safety of the trigger functions. */
+ if (cstate->rel->trigdesc != NULL &&
+ !CheckTrigFunParallelSafety(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * When transition tables are involved (i.e. AFTER STATEMENT triggers
+ * are present), we collect minimal tuples in a tuplestore after
+ * processing them, so that the AFTER STATEMENT triggers can access
+ * them later. To enable parallelism for such cases, we would instead
+ * need to store and access tuples from a shared tuplestore. However,
+ * that has no facility to keep tuples in memory, so we would always
+ * have to store and access them from a file, which could be costly
+ * unless we also add a way to keep minimal tuples in shared memory up
+ * to work_mem and only then spill to the shared tuplestore. All of
+ * this is possible, but for now we simply disallow parallelism for
+ * such cases and can allow it later if required.
+ *
+ * We also do not allow parallelism when there are BEFORE/AFTER/INSTEAD
+ * OF row triggers on the table, because such triggers might query the
+ * table we are inserting into and act differently depending on whether
+ * tuples that have already been processed and prepared for insertion
+ * are visible. With parallelism, the behaviour would depend on whether
+ * a parallel worker had already inserted those particular tuples.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_new_table ||
+ cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_after_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return false;
+
+ return true;
+}
+
+/*
+ * BeginParallelCopy - Start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+ParallelContext *
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ Size est_cstateshared;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ int parallel_workers = 0;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+ ParallelCopyData *pcdata;
+ MemoryContext oldcontext;
+ uint32 strsize;
+
+ CheckTargetRelValidity(cstate);
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ MemoryContextSwitchTo(oldcontext);
+ cstate->pcdata = pcdata;
+
+ /*
+ * The user has chosen parallel copy. Determine whether it is actually
+ * allowed; if not, fall back to non-parallel mode.
+ */
+ if (!IsParallelCopyAllowed(cstate))
+ return NULL;
+
+ (void) GetCurrentFullTransactionId();
+ (void) GetCurrentCommandId(true);
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ strsize = EstimateCstateSize(pcxt, cstate, attnamelist, &whereClauseStr,
+ &rangeTableStr, &attnameListStr,
+ &notnullListStr, &nullListStr,
+ &convertListStr);
+
+ /*
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_COPY_WAL_USAGE
+ * and PARALLEL_COPY_BUFFER_USAGE.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ SerializeParallelCopyState(pcxt, cstate, strsize, whereClauseStr,
+ rangeTableStr, attnameListStr, notnullListStr,
+ nullListStr, convertListStr);
+
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_WAL_USAGE, walusage);
+ pcdata->walusage = walusage;
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_BUFFER_USAGE, bufferusage);
+ pcdata->bufferusage = bufferusage;
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make
+ * sure that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy - End the parallel copy tasks.
+ */
+pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * InitializeParallelCopyInfo - Initialize parallel worker.
+ */
+static void
+InitializeParallelCopyInfo(CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCstateCatalogInfo(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo - Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+
+ /*
+ * The wait loop at the bottom of this loop may have exited because
+ * data_blk_ptr->curr_blk_completed was set, in which case the
+ * dataSize read earlier might be stale. If curr_blk_completed is set
+ * and the line is complete, line_size will have been updated, so
+ * read line_size again to tell a completed line from a partial
+ * block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the remaining data fits in the current block, copy
+ * dataSize - copiedSize bytes; otherwise copy the whole
+ * usable part of the current block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: the first copy starts at the recorded offset;
+ * subsequent copies take the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the data fits in the current block, lineInfo->line_size will
+ * be updated. If the data is spread across blocks, either
+ * lineInfo->line_size or data_blk_ptr->curr_blk_completed can be
+ * updated: line_size is updated once the complete line has been
+ * read, while curr_blk_completed is updated when the current block
+ * is finished but the line itself is not yet complete.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine - Returns a line for worker to process.
+ */
+bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+}
+
+/*
+ * ParallelCopyMain - Parallel copy worker's code.
+ *
+ * Each worker applies the WHERE clause, converts the line into column
+ * values, adds default/null values for columns missing from the record,
+ * finds the partition if the target is a partitioned table, invokes
+ * BEFORE ROW INSERT triggers, handles constraints, and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+ pcshared_info = (ParallelCopyShmInfo *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO, false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+ RestoreParallelCopyState(toc, cstate, &attlist);
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(cstate->pcdata->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ InitializeParallelCopyInfo(cstate, attlist);
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
+
+ MemoryContextSwitchTo(oldcontext);
+ pfree(cstate);
+ return;
+}
+
+/*
+ * UpdateSharedLineInfo - Update the line information.
+ */
+uint32
+UpdateSharedLineInfo(CopyState cstate, uint32 blk_pos, uint32 offset,
+ uint32 line_size, uint32 line_state, uint32 blk_line_pos)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_pos;
+
+ /* blk_line_pos will be valid if the line position was reserved earlier. */
+ if (blk_line_pos == -1)
+ {
+ line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+ }
+ else
+ {
+ line_pos = blk_line_pos;
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ }
+
+ if (line_state == LINE_LEADER_POPULATED)
+ {
+ elog(DEBUG1, "[Leader] Added line with block:%d, offset:%d, line position:%d, line size:%d",
+ lineInfo->first_block, lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ else
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ lineInfo->first_block, lineInfo->start_offset, line_pos);
+
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+
+ return line_pos;
+}
+
+/*
+ * ParallelCopyFrom - parallel copy leader's functionality.
+ *
+ * The leader executes the BEFORE STATEMENT trigger, if one is present. It
+ * reads the input data from the file and copies the contents into DSM data
+ * blocks, then scans those blocks and identifies records based on line
+ * breaks; each such line is a record to be inserted into the relation. The
+ * line information is stored in the ParallelCopyLineBoundary DSM structure,
+ * from which workers pick it up and insert the data into the table. This
+ * repeats until all data has been read from the file and all DSM data
+ * blocks have been processed. While processing, if the leader finds that
+ * the DSM data blocks or the ParallelCopyLineBoundary entries are full, it
+ * waits until workers free up some entries, then continues. Finally it
+ * waits until all populated lines have been processed by the workers, and
+ * exits.
+ */
+void
+ParallelCopyFrom(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ /* raw_buf is not used in parallel copy; data blocks are used instead. */
+ pfree(cstate->raw_buf);
+ cstate->raw_buf = NULL;
+
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
+/*
+ * GetLinePosition - Return the line position once the leader has populated the
+ * data.
+ */
+uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ uint32 line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock - Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT) : 0;
+
+ /*
+ * Get a new block for copying data. Skip the current block, as it
+ * still holds some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
+ pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock - If there are no blocks available, wait and get a block
+ * for copying data.
+ */
+uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
+/*
+ * SetRawBufForLoad - Set raw_buf to the shared memory where the file data must
+ * be read.
+ */
+void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ if (!IsParallelCopy())
+ return;
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed; workers can now start
+ * copying its data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+}
+
+/*
+ * EndLineParallelCopy - Update the line information in shared memory.
+ */
+void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ /*
+ * If raw_buf_ptr <= new_line_size, the new block contains only
+ * newline character content, so the unprocessed count should not
+ * be increased in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /*
+ * Update line size & line state, other members are already
+ * updated.
+ */
+ (void) UpdateSharedLineInfo(cstate, -1, -1, line_size,
+ LINE_LEADER_POPULATED, line_pos);
+ }
+ else if (new_line_size)
+ /* Only a newline character: an empty record should be inserted. */
+ (void) UpdateSharedLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED, -1);
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger - Execute the BEFORE STATEMENT trigger; for
+ * parallel copy this is executed by the leader process.
+ */
+void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ resultRelInfo = makeNode(ResultRelInfo);
+ InitResultRelInfo(resultRelInfo,
+ cstate->rel,
+ 1, /* must match rel's position in range_table */
+ NULL,
+ 0);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relations = resultRelInfo;
+ estate->es_num_result_relations = 1;
+ estate->es_result_relation_info = resultRelInfo;
+
+ ExecInitRangeTable(estate, cstate->range_table);
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Close any trigger target relations */
+ ExecCleanUpTriggerState(estate);
+
+ FreeExecutorState(estate);
+}
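
To summarize the handoff protocol implemented above: the leader publishes
each line as LINE_LEADER_POPULATED in the ParallelCopyLineBoundary ring,
and a worker claims it by atomically flipping the state to
LINE_WORKER_PROCESSING, copies the data out of the shared data blocks, and
finally marks the slot LINE_WORKER_PROCESSED with line_size reset to -1 so
the leader can reuse it. A condensed sketch of the worker's claim step,
abridged from GetLinePosition() and CacheLineInfo() above (illustrative
only, not part of the patch):

    /* Lock-free claim of a populated line by a worker. */
    uint32 expected = LINE_LEADER_POPULATED;

    if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
                                       &expected, LINE_WORKER_PROCESSING))
    {
        /* We own this line: copy it out of the shared data blocks into
         * the local worker_line_buf (see CacheLineInfo() above). */

        /* Then release the slot for the leader to reuse. */
        pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
        pg_atomic_write_u32(&lineInfo->line_size, -1);
        pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
    }
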
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 24c7b41..cf00256 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2353,7 +2353,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM|TO filename WITH ( */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
- "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
+ "HEADER", "PARALLEL", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index df1b43a..96295bc 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -385,6 +385,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index cd2d56e..a9fe950 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -51,6 +51,31 @@
*/
#define WORKER_CHUNK_COUNT 64
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
+#define IsWorker() (IsParallelCopy() && !IsLeader())
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
+/*
+ * Increment the lines processed.
+ */
+#define INCREMENTPROCESSED(processed) \
+{ \
+ if (!IsParallelCopy()) \
+ processed++; \
+ else \
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1); \
+}
+
+/*
+ * Get the lines processed.
+ */
+#define RETURNPROCESSED(processed) \
+if (!IsParallelCopy()) \
+ return processed; \
+else \
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+
/*
* Represents the different source/dest cases we need to worry about at
* the bottom level
@@ -75,6 +100,28 @@ typedef enum EolType
} EolType;
/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
+/*
+ * Represents the heap insert method to be used during COPY FROM.
+ */
+typedef enum CopyInsertMethod
+{
+ CIM_SINGLE, /* use table_tuple_insert or fdw routine */
+ CIM_MULTI, /* always use table_multi_insert */
+ CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
+} CopyInsertMethod;
+
+/*
* Copy data block information.
*
* These data blocks are created in DSM. Data read from file will be copied in
@@ -194,8 +241,6 @@ typedef struct ParallelCopyShmInfo
uint64 populated; /* lines populated by leader */
uint32 cur_block_pos; /* current data block */
ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
- FullTransactionId full_transaction_id; /* xid for copy from statement */
- CommandId mycid; /* command id */
ParallelCopyLineBoundaries line_boundaries; /* line array */
} ParallelCopyShmInfo;
@@ -242,12 +287,12 @@ typedef struct ParallelCopyData
ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
bool is_leader;
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
WalUsage *walusage;
BufferUsage *bufferusage;
- /* line position which worker is processing */
- uint32 worker_processed_pos;
-
/*
* Local line_buf array, workers will copy it here and release the lines
* for the leader to continue.
@@ -423,9 +468,23 @@ extern DestReceiver *CreateCopyDestReceiver(void);
extern void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+extern void ConvertToServerEncoding(CopyState cstate);
extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
extern ParallelContext *BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid);
extern void ParallelCopyFrom(CopyState cstate);
extern void EndParallelCopy(ParallelContext *pcxt);
+extern void ExecBeforeStmtTrigger(CopyState cstate);
+extern void CheckTargetRelValidity(CopyState cstate);
+extern void PopulateCstateCatalogInfo(CopyState cstate);
+extern uint32 GetLinePosition(CopyState cstate);
+extern bool GetWorkerLine(CopyState cstate);
+extern bool CopyReadLine(CopyState cstate);
+extern uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+extern void SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf);
+extern uint32 UpdateSharedLineInfo(CopyState cstate, uint32 blk_pos, uint32 offset,
+ uint32 line_size, uint32 line_state, uint32 blk_line_pos);
+extern void EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr);
#endif /* COPY_H */
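
For reference, my reading of a ring entry's life cycle, per the
ParallelCopyLineState enum and the functions declared above:

    /*
     * State transitions of a ParallelCopyLineBoundary ring entry:
     *
     *   LINE_INIT
     *     -> LINE_LEADER_POPULATING  (leader reserves the slot, size = -1)
     *     -> LINE_LEADER_POPULATED   (leader publishes the final line size)
     *     -> LINE_WORKER_PROCESSING  (a worker wins the compare-exchange)
     *     -> LINE_WORKER_PROCESSED   (worker done; size reset to -1, reusable)
     */
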
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index f5b818b..a198bf0 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1707,6 +1707,7 @@ ParallelCopyLineBoundary
ParallelCopyData
ParallelCopyDataBlock
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
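
The xact.h hunk earlier in this patch only declares
SetCurrentCommandIdUsedForWorker(); the corresponding xact.c change is not
shown in this excerpt. Presumably (this sketch is an assumption, not the
patch's actual code) it simply marks the current command id as used in the
worker, so that the GetCurrentCommandId(false) call in CopyFrom() returns
the command id shared with the leader without trying to advance it:

    /* Hypothetical body; the real xact.c hunk is not shown above. */
    void
    SetCurrentCommandIdUsedForWorker(void)
    {
        Assert(IsParallelWorker());

        /* Mark the cid used, as GetCurrentCommandId(true) would, without
         * the assertion that forbids that call in a parallel worker. */
        currentCommandIdUsed = true;
    }
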
v6-0004-Documentation-for-parallel-copy.patch
From 0dd69daa6794de67ca6398d4f72869ae3c17dcdb Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:00:36 +0530
Subject: [PATCH v6 4/6] Documentation for parallel copy.
This patch adds the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 18189ab..19b1979 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -275,6 +276,22 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note
+ that the number of workers actually used during execution may be
+ smaller than requested, or even zero (for example, due to the
+ setting of <varname>max_worker_processes</varname>). This option is
+ allowed only in <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
@@ -951,6 +968,20 @@ COPY country FROM '/usr1/proj/bray/sql/country_data';
</para>
<para>
+ To copy data in parallel from a file into the <literal>country</literal> table:
+<programlisting>
+COPY country FROM '/usr1/proj/bray/sql/country_data' WITH (PARALLEL 1);
+</programlisting>
+ </para>
+
+ <para>
+ To copy data in parallel from STDIN into the <literal>country</literal> table:
+<programlisting>
+COPY country FROM STDIN WITH (PARALLEL 1);
+</programlisting>
+ </para>
+
+ <para>
To copy into a file just the countries whose names start with 'A':
<programlisting>
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy';
--
1.8.3.1
v6-0005-Tests-for-parallel-copy.patch
From 1f07a8f5128a2e295e6dda1ab667f8ca8fcffaff Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:19:39 +0530
Subject: [PATCH v6 5/6] Tests for parallel copy.
This patch adds tests for parallel copy.
---
src/test/regress/expected/copy2.out | 205 ++++++++++++++++++++++++++++++++++-
src/test/regress/input/copy.source | 12 +++
src/test/regress/output/copy.source | 12 +++
src/test/regress/sql/copy2.sql | 208 +++++++++++++++++++++++++++++++++++-
4 files changed, 429 insertions(+), 8 deletions(-)
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index e40287d..7ae5d44 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -254,18 +254,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -280,6 +294,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -349,6 +372,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -409,7 +460,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -420,6 +471,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -430,6 +483,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -486,6 +541,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -600,8 +680,125 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+ERROR: value 0 out of bounds for option "parallel"
+DETAIL: Valid values are between "1" and "1024".
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) from stdin with (PARALLEL 1);
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | 45 | 80 | 90
+ | | x | \x | \x
+ | | , | \, | \
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+(21 rows)
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..159c058 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,11 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..c3003fe 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,9 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 902f4fa..7015698 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -157,7 +157,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -165,8 +165,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -175,11 +183,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -191,6 +207,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -235,6 +259,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -284,7 +325,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -297,6 +338,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -304,6 +349,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -339,6 +388,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -440,8 +499,149 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) from stdin with (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail
+COPY test_parallel_copy (b, d) from stdin with (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri from stdin with (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) from stdin with (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) from stdin with (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) from stdin with (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy from stdin with (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) from stdin with (delimiter ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy from stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy from stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy from stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
DROP TABLE x, y;
--
1.8.3.1
Attachment: v6-0006-Parallel-Copy-For-Binary-Format-Files.patch (application/octet-stream)
From fd470b5454555af0f633371f3e7ab99104e36f2c Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 9 Oct 2020 12:58:20 +0530
Subject: [PATCH v6 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks, each 64KB in
size. It also identifies each tuple's data block id, start offset, end
offset, and tuple size, and updates this information in the ring data
structure. Workers read the tuple information from the ring data
structure and the actual tuple data from the data blocks in parallel,
and insert the tuples into the table in parallel.
---
src/backend/commands/copy.c | 134 +++++----
src/backend/commands/copyparallel.c | 426 ++++++++++++++++++++++++++--
src/include/commands/copy.h | 126 ++++++++
3 files changed, 599 insertions(+), 87 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5bed12896f..69119d8513 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -223,19 +223,14 @@ static void CopySendData(CopyState cstate, const void *databuf, int datasize);
static void CopySendString(CopyState cstate, const char *str);
static void CopySendChar(CopyState cstate, char c);
static void CopySendEndOfRow(CopyState cstate);
-static int CopyGetData(CopyState cstate, void *databuf,
- int minread, int maxread);
static void CopySendInt32(CopyState cstate, int32 val);
static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
-static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
-
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -449,7 +444,7 @@ CopySendEndOfRow(CopyState cstate)
*
* NB: no data conversion is applied here.
*/
-static int
+int
CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
int bytesread = 0;
@@ -582,10 +577,25 @@ CopyGetInt32(CopyState cstate, int32 *val)
{
uint32 buf;
- if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ /*
+ * For parallel copy, avoid reading data into the raw buffer; read
+ * directly from the file. The data will later be read into the
+ * parallel copy data buffers.
+ */
+ if (cstate->nworkers > 0)
{
- *val = 0; /* suppress compiler warning */
- return false;
+ if (CopyGetData(cstate, &buf, sizeof(buf), sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
+ }
+ else
+ {
+ if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
}
*val = (int32) pg_ntoh32(buf);
return true;
@@ -661,7 +671,7 @@ CopyLoadRawBuf(CopyState cstate)
* and writes them to 'dest'. Returns the number of bytes read (which
* would be less than 'nbytes' only if we reach EOF).
*/
-static int
+int
CopyReadBinaryData(CopyState cstate, char *dest, int nbytes)
{
int copied_bytes = 0;
@@ -986,7 +996,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
EndParallelCopy(pcxt);
}
else
+ {
+ /*
+ * Reset nworkers to -1 here. This is useful in cases where the user
+ * specifies parallel workers but no worker is picked up, so we fall
+ * back to the non-parallel mode value of nworkers.
+ */
+ cstate->nworkers = -1;
*processed = CopyFrom(cstate); /* copy from file to database */
+ }
EndCopyFrom(cstate);
}
@@ -3556,7 +3574,7 @@ BeginCopyFrom(ParseState *pstate,
int32 tmp;
/* Signature */
- if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+ if (CopyGetData(cstate, readSig, 11, 11) != 11 ||
memcmp(readSig, BinarySignature, 11) != 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -3584,7 +3602,7 @@ BeginCopyFrom(ParseState *pstate,
/* Skip extension header, if present */
while (tmp-- > 0)
{
- if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+ if (CopyGetData(cstate, readSig, 1, 1) != 1)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("invalid COPY file header (wrong length)")));
@@ -3781,60 +3799,45 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
+ int16 fld_count;
+ ListCell *cur;
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
-
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyReadBinaryData(cstate, &dummy, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -4846,18 +4849,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -4865,9 +4865,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
diff --git a/src/backend/commands/copyparallel.c b/src/backend/commands/copyparallel.c
index 6a44a01e47..ccfe38363c 100644
--- a/src/backend/commands/copyparallel.c
+++ b/src/backend/commands/copyparallel.c
@@ -94,6 +94,7 @@ SerializeParallelCopyState(ParallelContext *pcxt, CopyState cstate,
shared_cstate.convert_selectively = cstate->convert_selectively;
shared_cstate.num_defaults = cstate->num_defaults;
shared_cstate.relid = cstate->pcdata->relid;
+ shared_cstate.binary = cstate->binary;
memcpy(shmptr, (char *) &shared_cstate, sizeof(SerializedParallelCopyState));
copiedsize = sizeof(SerializedParallelCopyState);
@@ -191,6 +192,7 @@ RestoreParallelCopyState(shm_toc *toc, CopyState cstate, List **attlist)
cstate->convert_selectively = shared_cstate.convert_selectively;
cstate->num_defaults = shared_cstate.num_defaults;
cstate->pcdata->relid = shared_cstate.relid;
+ cstate->binary = shared_cstate.binary;
cstate->null_print = CopyStringFromSharedMemory(shared_str_val + copiedsize,
&copiedsize);
@@ -380,7 +382,7 @@ static pg_attribute_always_inline bool
IsParallelCopyAllowed(CopyState cstate)
{
/* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/*
@@ -976,39 +978,425 @@ ParallelCopyFrom(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
+ for (;;)
+ {
+ bool done;
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after
+ * some characters, we act as though it was newline followed by
+ * EOF, ie, process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files. For the parallel copy leader, fill in the
+ * error context information here so that, if any failure occurs
+ * while determining tuple offsets, the leader throws errors with
+ * proper context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
+
+ /* Done, clean up */
+ error_context_stack = errcallback.previous;
}
pcshared_info->is_read_in_progress = false;
cstate->cur_lineno = 0;
}
+/*
+ * CopyReadBinaryGetDataBlock
+ *
+ * Gets a new block, updates the current offset, calculates the skip bytes.
+ */
+void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to %u block",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader
+ *
+ * Leader reads data from binary formatted file to data blocks and identifies
+ * tuple boundaries/offsets so that workers can work on the data blocks data.
+ */
+bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One way to reach this point
+ * is a binary file that has a valid signature but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+ CHECK_FIELD_COUNT;
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ (void) UpdateSharedLineInfo(cstate, start_block_pos, start_offset,
+ line_size, LINE_LEADER_POPULATED, -1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize
+ *
+ * Leader identifies boundaries/offsets for each attribute/column and finally
+ * results in the tuple/row size. It moves on to next data block if the
+ * attribute/column is spread across data blocks.
+ */
+void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in he same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while (i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size, as the
+ * required number of data blocks would have been obtained in the
+ * above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker
+ *
+ * Each worker reads data from the data blocks after getting the
+ * leader-identified tuple offsets from the ring data structure.
+ */
+bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ uint32 line_pos;
+ ParallelCopyLineBoundary *line_info;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+
+ line_pos = GetLinePosition(cstate);
+ if (line_pos == -1)
+ return true;
+
+ line_info = &pcshared_info->line_boundaries.ring[line_pos];
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[line_info->first_block];
+ cstate->raw_buf_index = line_info->start_offset;
+
+ if (cstate->raw_buf_index + sizeof(fld_count) >= DATA_BLOCK_SIZE)
+ {
+ /*
+ * The case where the field count is spread across data blocks
+ * should never occur, as the leader would have moved it to the
+ * next block. This code exists for debugging purposes only.
+ */
+ elog(DEBUG1, "WORKER - field count spread across datablocks should never occur");
+ }
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ pg_atomic_sub_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+ line_info->start_offset = -1;
+ pg_atomic_write_u32(&line_info->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&line_info->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker
+ *
+ * Worker reads each attribute/column from the data blocks, moving on to
+ * the next data block if the attribute/column is spread across data blocks.
+ */
+Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ {
+ ParallelCopyDataBlock *prev_data_block = cstate->pcdata->curr_data_block;
+
+ elog(DEBUG1, "WORKER - field size is spread across data blocks");
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ }
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ cstate->raw_buf_index += sizeof(fld_size);
+
+ /* reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ elog(DEBUG1, "WORKER - tuple lies in single data block");
+ memcpy(&cstate->attribute_buf.data[0], &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+ }
+ else
+ {
+ uint32 att_buf_idx = 0;
+ uint32 copy_bytes = 0;
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ ParallelCopyDataBlock *prev_data_block = NULL;
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ elog(DEBUG1, "WORKER - tuple is spread across data blocks");
+ memcpy(&cstate->attribute_buf.data[0], &prev_data_block->data[cstate->raw_buf_index],
+ curr_blk_bytes);
+ copy_bytes = curr_blk_bytes;
+ att_buf_idx = curr_blk_bytes;
+
+ while (i > 0)
+ {
+ cstate->pcdata->curr_data_block = &pcshared_info->data_blocks[prev_data_block->following_block];
+ pg_atomic_sub_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ cstate->raw_buf_index = 0;
+ copy_bytes = fld_size - att_buf_idx;
+
+ /*
+ * The bytes yet to be copied into the attribute buffer exceed an
+ * entire data block, so copy only one data block's worth in this
+ * iteration.
+ */
+ if (copy_bytes >= DATA_BLOCK_SIZE)
+ copy_bytes = DATA_BLOCK_SIZE;
+
+ memcpy(&cstate->attribute_buf.data[att_buf_idx],
+ &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], copy_bytes);
+ att_buf_idx += copy_bytes;
+ prev_data_block = cstate->pcdata->curr_data_block;
+ i--;
+ }
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+ }
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
/*
* GetLinePosition - Return the line position once the leader has populated the
* data.
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index a9fe950e75..746c139e94 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -76,6 +76,109 @@ if (!IsParallelCopy()) \
else \
return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyReadBinaryData(cstate, &dummy, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data is spread across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data is spread
+ * across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * field size can spread across multiple data blocks, \
+ * calculate the number of required data blocks and try to get \
+ * those many data blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * check if we need the data block for the field data \
+ * bytes that are not modulus of data block size. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
+
/*
* Represents the different source/dest cases we need to worry about at
* the bottom level
@@ -253,6 +356,17 @@ typedef struct ParallelCopyLineBuf
uint64 cur_lineno; /* line number for error messages */
} ParallelCopyLineBuf;
+/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
/*
* This structure helps in storing the common data from CopyStateData that are
* required by the workers. This information will then be allocated and stored
@@ -276,6 +390,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
} SerializedParallelCopyState;
/*
@@ -302,6 +417,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
} ParallelCopyData;
typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
@@ -487,4 +605,12 @@ extern uint32 UpdateSharedLineInfo(CopyState cstate, uint32 blk_pos, uint32 offs
uint32 line_size, uint32 line_state, uint32 blk_line_pos);
extern void EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
uint32 raw_buf_ptr);
+extern int CopyGetData(CopyState cstate, void *databuf, int minread, int maxread);
+extern int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+extern bool CopyReadBinaryTupleLeader(CopyState cstate);
+extern bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+extern void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+extern Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+extern void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
#endif /* COPY_H */
--
2.25.1
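To make the block-spanning arithmetic of GET_REQUIRED_BLOCKS and
GET_RAW_BUF_INDEX above concrete, here is the same logic re-expressed as
standalone functions with one worked example (the function names are
ours; the patch uses the macros shown in copy.h):

#include <assert.h>
#include <stdio.h>

#define DATA_BLOCK_SIZE 65536

/* How many further 64KB blocks must the leader fetch for a field of
 * fld_size bytes when curr_blk_bytes still fit in the current block? */
static int
required_blocks(int fld_size, int curr_blk_bytes)
{
	int		blks = (fld_size - curr_blk_bytes) / DATA_BLOCK_SIZE;

	/* A partial trailing block still needs its own fetch. */
	if ((fld_size - curr_blk_bytes) % DATA_BLOCK_SIZE != 0)
		blks++;
	return blks;
}

/* Where does the read cursor land in the last fetched block? */
static int
final_buf_index(int fld_size, int required_blks, int curr_blk_bytes)
{
	return fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes);
}

int
main(void)
{
	/* A 200,000-byte field with 1,000 bytes left in the current block:
	 * the 199,000 remaining bytes need ceil(199000/65536) = 4 more
	 * blocks, and the cursor ends at 199000 - 3 * 65536 = 2392. */
	int		blks = required_blocks(200000, 1000);

	assert(blks == 4);
	assert(final_buf_index(200000, blks, 1000) == 2392);
	printf("blocks=%d, final index=%d\n",
		   blks, final_buf_index(200000, blks, 1000));
	return 0;
}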
On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
From the testing perspective,
1. Test by having something force_parallel_mode = regress which means
that all existing Copy tests in the regression will be executed via
new worker code. You can have this as a test-only patch for now and
make sure all existing tests passed with this.

I don't think all the existing copy test cases (except the new test cases
added in the parallel copy patch set) would run inside the parallel worker
if force_parallel_mode is on. This is because parallelism will be picked up
for parallel copy only if the parallel option is specified, unlike
parallelism for select queries.
Sure, you need to change the code such that when force_parallel_mode =
'regress' is specified, it always uses one worker. This is primarily
for testing purposes and will help during the development of this
patch, as it will make all existing Copy tests use quite a good portion
of the parallel infrastructure.
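A minimal sketch of what that hook could look like, assuming a
cstate->nworkers field as in the posted patches (the placement and the
exact condition are assumptions, not code from the patch set):

/* In the COPY FROM path that decides parallelism; FORCE_PARALLEL_REGRESS
 * is the existing planner GUC value from optimizer/optimizer.h. */
if (cstate->nworkers <= 0 && force_parallel_mode == FORCE_PARALLEL_REGRESS)
	cstate->nworkers = 1;	/* route every copy through the worker machinery */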
All the above tests are performed on the latest v6 patch set (attached here in this thread) with custom postgresql.conf[1]. The results are of the triplet form (exec time in sec, number of workers, gain)
Okay, so I am assuming the performance is the same as we have seen
with the earlier versions of patches.
Overall, we have below test cases to cover the code and for performance measurements. We plan to run these tests whenever a new set of patches is posted.
1. csv
2. binary
Don't we need the tests for plain text files as well?
3. force parallel mode = regress
4. toast data csv and binary
5. foreign key check, before row, after row, before statement, after statement, instead of triggers
6. partition case
7. foreign partitions and partitions having trigger cases
8. where clause having parallel unsafe and safe expression, default parallel unsafe and safe expression
9. temp, global, local, unlogged, inherited tables cases, foreign tables
Sounds like good coverage. So, are you doing all this testing
manually? How are you maintaining these tests?
--
With Regards,
Amit Kapila.
On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
From the testing perspective,
1. Test by having something force_parallel_mode = regress which means
that all existing Copy tests in the regression will be executed via
new worker code. You can have this as a test-only patch for now and
make sure all existing tests passed with this.

I don't think all the existing copy test cases (except the new test cases
added in the parallel copy patch set) would run inside the parallel worker
if force_parallel_mode is on. This is because parallelism will be picked up
for parallel copy only if the parallel option is specified, unlike
parallelism for select queries.
Sure, you need to change the code such that when force_parallel_mode =
'regress' is specified, it always uses one worker. This is primarily
for testing purposes and will help during the development of this
patch, as it will make all existing Copy tests use quite a good portion
of the parallel infrastructure.

IIUC, firstly, I will set force_parallel_mode = FORCE_PARALLEL_REGRESS
as default value in guc.c, and then adjust the parallelism related
code in copy.c such that it always picks 1 worker and spawns it. This
way, all the existing copy test cases would be run in parallel worker.
Please let me know if this is okay. If yes, I will do this and update
here.
All the above tests are performed on the latest v6 patch set (attached here in this thread) with custom postgresql.conf[1]. The results are of the triplet form (exec time in sec, number of workers, gain)
Okay, so I am assuming the performance is the same as we have seen
with the earlier versions of patches.
Yes. Most recent run on v5 patch set [1].
Overall, we have below test cases to cover the code and for performance measurements. We plan to run these tests whenever a new set of patches is posted.
1. csv
2. binary

Don't we need the tests for plain text files as well?
Will add one.
3. force parallel mode = regress
4. toast data csv and binary
5. foreign key check, before row, after row, before statement, after statement, instead of triggers
6. partition case
7. foreign partitions and partitions having trigger cases
8. where clause having parallel unsafe and safe expression, default parallel unsafe and safe expression
9. temp, global, local, unlogged, inherited tables cases, foreign tables

Sounds like good coverage. So, are you doing all this testing
manually? How are you maintaining these tests?
Yes, we run them manually. A few of the tests (1, 2, 4) require huge
datasets for performance measurements, and the other test cases are to
ensure we don't choose parallelism. We will try to add the test cases
that are not meant for performance to the patch's test suite.
[1]: /messages/by-id/CALj2ACW=jm5ri+7rXiQaFT_c5h2rVS=cJOQVFR5R+bowt3QDkw@mail.gmail.com
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 9, 2020 at 3:50 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
From the testing perspective,
1. Test by having something force_parallel_mode = regress which means
that all existing Copy tests in the regression will be executed via
new worker code. You can have this as a test-only patch for now and
make sure all existing tests passed with this.

I don't think all the existing copy test cases (except the new test cases
added in the parallel copy patch set) would run inside the parallel worker
if force_parallel_mode is on. This is because parallelism will be picked up
for parallel copy only if the parallel option is specified, unlike
parallelism for select queries.
Sure, you need to change the code such that when force_parallel_mode =
'regress' is specified, it always uses one worker. This is primarily
for testing purposes and will help during the development of this
patch, as it will make all existing Copy tests use quite a good portion
of the parallel infrastructure.

IIUC, firstly, I will set force_parallel_mode = FORCE_PARALLEL_REGRESS
as default value in guc.c,
No need to set this as the default value. You can change it in
postgresql.conf before running tests.
and then adjust the parallelism related
code in copy.c such that it always picks 1 worker and spawns it. This
way, all the existing copy test cases would be run in parallel worker.
Please let me know if this is okay.
Yeah, this sounds fine.
If yes, I will do this and update
here.
Okay, thanks, but do check the difference in test execution before and
after your change. After your change, all the 'copy' tests should
invoke the worker to perform the copy.
All the above tests are performed on the latest v6 patch set (attached here in this thread) with custom postgresql.conf[1]. The results are of the triplet form (exec time in sec, number of workers, gain)
Okay, so I am assuming the performance is the same as we have seen
with the earlier versions of patches.

Yes. Most recent run on v5 patch set [1].
Okay, good to know that.
--
With Regards,
Amit Kapila.
On Fri, Oct 9, 2020 at 12:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
While looking at the latest code, I observed the below issue in patch
v6-0003-Allow-copy-from-command-to-process-data-from-file:

+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ est_cstateshared = MAXALIGN(sizeof(SerializedParallelCopyState));
+ shm_toc_estimate_chunk(&pcxt->estimator, est_cstateshared);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ strsize = EstimateCstateSize(pcxt, cstate, attnamelist, &whereClauseStr,
+                              &rangeTableStr, &attnameListStr,
+                              &notnullListStr, &nullListStr,
+                              &convertListStr);

Here, do we need to separately estimate the size of
SerializedParallelCopyState when it is also done in
EstimateCstateSize?
This is not required; it has been removed in the attached patches.
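For readers following along, the underlying rule is that each chunk
destined for the DSM table of contents should be estimated exactly once.
A rough sketch, assuming EstimateCstateSize() now accounts for the
serialized cstate header itself (the helper below is illustrative, not
the patch's code):

#include "storage/shm_toc.h"	/* shm_toc_estimate_chunk/keys */

static void
estimate_cstate_once(shm_toc_estimator *estimator, Size total_cstate_size)
{
	/* One chunk covering the fixed-size header plus the string payload;
	 * estimating the header separately as well would reserve it twice. */
	shm_toc_estimate_chunk(estimator, total_cstate_size);
	shm_toc_estimate_keys(estimator, 1);
}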
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v7-0001-Copy-code-readjustment-to-support-parallel-copy.patch (text/x-patch)
From ff510b2589e251523b12e3914cc7fe89ba0ac1d0 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 13 Oct 2020 18:29:58 +0530
Subject: [PATCH v7 1/6] Copy code readjustment to support parallel copy.
This patch has the copy code slightly readjusted so that the common code is
separated to functions/macros, these functions/macros will be used by the
workers in the parallel copy code of the upcoming patches. EOL removal is moved
from CopyReadLine to CopyReadLineText, this change was required because in case
of parallel copy the record identification and record updation is done in
CopyReadLineText, before record information is updated in shared memory the new
line characters should be removed.
---
src/backend/commands/copy.c | 335 ++++++++++++++++++++++++++------------------
1 file changed, 199 insertions(+), 136 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 71d48d4..a01e438 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,9 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* This struct contains all the state variables used throughout a COPY
* operation. For simplicity, we use the same struct for all variants of COPY,
@@ -224,7 +227,6 @@ typedef struct CopyStateData
* appropriate amounts of data from this buffer. In both modes, we
* guarantee that there is a \0 at raw_buf[raw_buf_len].
*/
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
char *raw_buf;
int raw_buf_index; /* next byte to process */
int raw_buf_len; /* total # of bytes stored */
@@ -288,7 +290,6 @@ typedef struct CopyMultiInsertInfo
int ti_options; /* table insert options */
} CopyMultiInsertInfo;
-
/*
* These macros centralize code used to process line_buf and raw_buf buffers.
* They are macros because they often do continue/break control and to avoid
@@ -401,6 +402,12 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size);
+static void ConvertToServerEncoding(CopyState cstate);
+
/*
* Send copy start/stop messages for frontend copies. These have changed
@@ -1518,7 +1525,6 @@ BeginCopy(ParseState *pstate,
{
CopyState cstate;
TupleDesc tupDesc;
- int num_phys_attrs;
MemoryContext oldcontext;
/* Allocate workspace and zero all fields */
@@ -1684,6 +1690,25 @@ BeginCopy(ParseState *pstate,
tupDesc = cstate->queryDesc->tupDesc;
}
+ PopulateCommonCstateInfo(cstate, tupDesc, attnamelist);
+ cstate->copy_dest = COPY_FILE; /* default */
+
+ MemoryContextSwitchTo(oldcontext);
+
+ return cstate;
+}
+
+/*
+ * PopulateCommonCstateInfo
+ *
+ * Populates the common variables required for the copy from operation.
+ * This is a helper function for BeginCopy.
+ */
+static void
+PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
+{
+ int num_phys_attrs;
+
/* Generate or convert list of attributes to process */
cstate->attnumlist = CopyGetAttnums(tupDesc, cstate->rel, attnamelist);
@@ -1803,12 +1828,6 @@ BeginCopy(ParseState *pstate,
pg_database_encoding_max_length() > 1);
/* See Multibyte encoding comment above */
cstate->encoding_embeds_ascii = PG_ENCODING_IS_CLIENT_ONLY(cstate->file_encoding);
-
- cstate->copy_dest = COPY_FILE; /* default */
-
- MemoryContextSwitchTo(oldcontext);
-
- return cstate;
}
/*
@@ -2700,32 +2719,13 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
}
/*
- * Copy FROM file to relation.
+ * CheckTargetRelValidity
+ *
+ * Check if the relation specified in copy from is valid.
*/
-uint64
-CopyFrom(CopyState cstate)
+static void
+CheckTargetRelValidity(CopyState cstate)
{
- ResultRelInfo *resultRelInfo;
- ResultRelInfo *target_resultRelInfo;
- ResultRelInfo *prevResultRelInfo = NULL;
- EState *estate = CreateExecutorState(); /* for ExecConstraints() */
- ModifyTableState *mtstate;
- ExprContext *econtext;
- TupleTableSlot *singleslot = NULL;
- MemoryContext oldcontext = CurrentMemoryContext;
-
- PartitionTupleRouting *proute = NULL;
- ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
- int ti_options = 0; /* start with default options for insert */
- BulkInsertState bistate = NULL;
- CopyInsertMethod insertMethod;
- CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
- uint64 processed = 0;
- bool has_before_insert_row_trig;
- bool has_instead_insert_row_trig;
- bool leafpart_use_multi_insert = false;
-
Assert(cstate->rel);
Assert(list_length(cstate->range_table) == 1);
@@ -2763,27 +2763,6 @@ CopyFrom(CopyState cstate)
RelationGetRelationName(cstate->rel))));
}
- /*
- * If the target file is new-in-transaction, we assume that checking FSM
- * for free space is a waste of time. This could possibly be wrong, but
- * it's unlikely.
- */
- if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
- (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
- cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
- ti_options |= TABLE_INSERT_SKIP_FSM;
-
- /*
- * Optimize if new relfilenode was created in this subxact or one of its
- * committed children and we won't see those rows later as part of an
- * earlier scan or command. The subxact test ensures that if this subxact
- * aborts then the frozen rows won't be visible after xact cleanup. Note
- * that the stronger test of exactly which subtransaction created it is
- * crucial for correctness of this optimization. The test for an earlier
- * scan or command tolerates false negatives. FREEZE causes other sessions
- * to see rows they would not see under MVCC, and a false negative merely
- * spreads that anomaly to the current session.
- */
if (cstate->freeze)
{
/*
@@ -2821,9 +2800,61 @@ CopyFrom(CopyState cstate)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("cannot perform COPY FREEZE because the table was not created or truncated in the current subtransaction")));
+ }
+}
+
+/*
+ * Copy FROM file to relation.
+ */
+uint64
+CopyFrom(CopyState cstate)
+{
+ ResultRelInfo *resultRelInfo;
+ ResultRelInfo *target_resultRelInfo;
+ ResultRelInfo *prevResultRelInfo = NULL;
+ EState *estate = CreateExecutorState(); /* for ExecConstraints() */
+ ModifyTableState *mtstate;
+ ExprContext *econtext;
+ TupleTableSlot *singleslot = NULL;
+ MemoryContext oldcontext = CurrentMemoryContext;
+
+ PartitionTupleRouting *proute = NULL;
+ ErrorContextCallback errcallback;
+ CommandId mycid = GetCurrentCommandId(true);
+ int ti_options = 0; /* start with default options for insert */
+ BulkInsertState bistate = NULL;
+ CopyInsertMethod insertMethod;
+ CopyMultiInsertInfo multiInsertInfo = {0}; /* pacify compiler */
+ uint64 processed = 0;
+ bool has_before_insert_row_trig;
+ bool has_instead_insert_row_trig;
+ bool leafpart_use_multi_insert = false;
+
+ CheckTargetRelValidity(cstate);
+
+ /*
+ * If the target file is new-in-transaction, we assume that checking FSM
+ * for free space is a waste of time. This could possibly be wrong, but
+ * it's unlikely.
+ */
+ if (RELKIND_HAS_STORAGE(cstate->rel->rd_rel->relkind) &&
+ (cstate->rel->rd_createSubid != InvalidSubTransactionId ||
+ cstate->rel->rd_firstRelfilenodeSubid != InvalidSubTransactionId))
+ ti_options |= TABLE_INSERT_SKIP_FSM;
+ /*
+ * Optimize if new relfilenode was created in this subxact or one of its
+ * committed children and we won't see those rows later as part of an
+ * earlier scan or command. The subxact test ensures that if this subxact
+ * aborts then the frozen rows won't be visible after xact cleanup. Note
+ * that the stronger test of exactly which subtransaction created it is
+ * crucial for correctness of this optimization. The test for an earlier
+ * scan or command tolerates false negatives. FREEZE causes other sessions
+ * to see rows they would not see under MVCC, and a false negative merely
+ * spreads that anomaly to the current session.
+ */
+ if (cstate->freeze)
ti_options |= TABLE_INSERT_FROZEN;
- }
/*
* We need a ResultRelInfo so we can use the regular executor's
@@ -3366,26 +3397,13 @@ CopyFrom(CopyState cstate)
}
/*
- * Setup to read tuples from a file for COPY FROM.
+ * PopulateCstateCatalogInfo
*
- * 'rel': Used as a template for the tuples
- * 'filename': Name of server-local file to read
- * 'attnamelist': List of char *, columns to include. NIL selects all cols.
- * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
- *
- * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ * Populate the cstate catalog information.
*/
-CopyState
-BeginCopyFrom(ParseState *pstate,
- Relation rel,
- const char *filename,
- bool is_program,
- copy_data_source_cb data_source_cb,
- List *attnamelist,
- List *options)
+static void
+PopulateCstateCatalogInfo(CopyState cstate)
{
- CopyState cstate;
- bool pipe = (filename == NULL);
TupleDesc tupDesc;
AttrNumber num_phys_attrs,
num_defaults;
@@ -3395,38 +3413,8 @@ BeginCopyFrom(ParseState *pstate,
Oid in_func_oid;
int *defmap;
ExprState **defexprs;
- MemoryContext oldcontext;
bool volatile_defexprs;
- cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
- oldcontext = MemoryContextSwitchTo(cstate->copycontext);
-
- /* Initialize state variables */
- cstate->reached_eof = false;
- cstate->eol_type = EOL_UNKNOWN;
- cstate->cur_relname = RelationGetRelationName(cstate->rel);
- cstate->cur_lineno = 0;
- cstate->cur_attname = NULL;
- cstate->cur_attval = NULL;
-
- /*
- * Set up variables to avoid per-attribute overhead. attribute_buf and
- * raw_buf are used in both text and binary modes, but we use line_buf
- * only in text mode.
- */
- initStringInfo(&cstate->attribute_buf);
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
- if (!cstate->binary)
- {
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- }
-
- /* Assign range table, we'll need it in CopyFrom. */
- if (pstate)
- cstate->range_table = pstate->p_rtable;
-
tupDesc = RelationGetDescr(cstate->rel);
num_phys_attrs = tupDesc->natts;
num_defaults = 0;
@@ -3504,6 +3492,61 @@ BeginCopyFrom(ParseState *pstate,
cstate->defexprs = defexprs;
cstate->volatile_defexprs = volatile_defexprs;
cstate->num_defaults = num_defaults;
+}
+
+/*
+ * Setup to read tuples from a file for COPY FROM.
+ *
+ * 'rel': Used as a template for the tuples
+ * 'filename': Name of server-local file to read
+ * 'attnamelist': List of char *, columns to include. NIL selects all cols.
+ * 'options': List of DefElem. See copy_opt_item in gram.y for selections.
+ *
+ * Returns a CopyState, to be passed to NextCopyFrom and related functions.
+ */
+CopyState
+BeginCopyFrom(ParseState *pstate,
+ Relation rel,
+ const char *filename,
+ bool is_program,
+ copy_data_source_cb data_source_cb,
+ List *attnamelist,
+ List *options)
+{
+ CopyState cstate;
+ bool pipe = (filename == NULL);
+ MemoryContext oldcontext;
+
+ cstate = BeginCopy(pstate, true, rel, NULL, InvalidOid, attnamelist, options);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ /* Initialize state variables */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
+ initStringInfo(&cstate->attribute_buf);
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ }
+
+ /* Assign range table, we'll need it in CopyFrom. */
+ if (pstate)
+ cstate->range_table = pstate->p_rtable;
+
+ PopulateCstateCatalogInfo(cstate);
cstate->is_program = is_program;
if (data_source_cb)
@@ -3913,40 +3956,60 @@ CopyReadLine(CopyState cstate)
} while (CopyLoadRawBuf(cstate));
}
}
- else
+
+ ConvertToServerEncoding(cstate);
+ return result;
+}
+
+/*
+ * ClearEOLFromCopiedData
+ *
+ * Clear EOL from the copied data.
+ */
+static void
+ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
+ int copy_line_pos, int *copy_line_size)
+{
+ /*
+ * If we didn't hit EOF, then we must have transferred the EOL marker to
+ * line_buf along with the data. Get rid of it.
+ */
+ switch (cstate->eol_type)
{
- /*
- * If we didn't hit EOF, then we must have transferred the EOL marker
- * to line_buf along with the data. Get rid of it.
- */
- switch (cstate->eol_type)
- {
- case EOL_NL:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CR:
- Assert(cstate->line_buf.len >= 1);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\r');
- cstate->line_buf.len--;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_CRNL:
- Assert(cstate->line_buf.len >= 2);
- Assert(cstate->line_buf.data[cstate->line_buf.len - 2] == '\r');
- Assert(cstate->line_buf.data[cstate->line_buf.len - 1] == '\n');
- cstate->line_buf.len -= 2;
- cstate->line_buf.data[cstate->line_buf.len] = '\0';
- break;
- case EOL_UNKNOWN:
- /* shouldn't get here */
- Assert(false);
- break;
- }
+ case EOL_NL:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CR:
+ Assert(*copy_line_size >= 1);
+ Assert(copy_line_data[copy_line_pos - 1] == '\r');
+ copy_line_data[copy_line_pos - 1] = '\0';
+ (*copy_line_size)--;
+ break;
+ case EOL_CRNL:
+ Assert(*copy_line_size >= 2);
+ Assert(copy_line_data[copy_line_pos - 2] == '\r');
+ Assert(copy_line_data[copy_line_pos - 1] == '\n');
+ copy_line_data[copy_line_pos - 2] = '\0';
+ *copy_line_size -= 2;
+ break;
+ case EOL_UNKNOWN:
+ /* shouldn't get here */
+ Assert(false);
+ break;
}
+}
+/*
+ * ConvertToServerEncoding
+ *
+ * Convert contents to server encoding.
+ */
+static void
+ConvertToServerEncoding(CopyState cstate)
+{
/* Done reading the line. Convert it to server encoding. */
if (cstate->need_transcoding)
{
@@ -3963,11 +4026,8 @@ CopyReadLine(CopyState cstate)
pfree(cvt);
}
}
-
/* Now it's safe to use the buffer in error messages */
cstate->line_buf_converted = true;
-
- return result;
}
/*
@@ -4330,6 +4390,9 @@ not_end_of_copy:
* Transfer any still-uncopied data to line_buf.
*/
REFILL_LINEBUF;
+ if (!result && !IsHeaderLine())
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
+ cstate->line_buf.len, &cstate->line_buf.len);
return result;
}
--
1.8.3.1
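For readers skimming the patch, the net effect of the new
ClearEOLFromCopiedData helper is easiest to see in isolation. The following
is a minimal standalone sketch (not patch code; trim_eol and the sample
buffer are illustrative only) of the same EOL-trimming logic, now driven by
an explicit buffer/position/size triple instead of line_buf:

#include <assert.h>
#include <stdio.h>

/* Mirrors the EolType enum in copy.c. */
typedef enum { EOL_UNKNOWN, EOL_NL, EOL_CR, EOL_CRNL } EolType;

/*
 * Trim the EOL bytes ending at offset 'pos' in 'data' and shrink '*size',
 * the same way ClearEOLFromCopiedData does in the patch above.
 */
static void
trim_eol(EolType eol_type, char *data, int pos, int *size)
{
	switch (eol_type)
	{
		case EOL_NL:
		case EOL_CR:
			assert(*size >= 1);
			data[pos - 1] = '\0';	/* overwrite '\n' or '\r' */
			(*size)--;
			break;
		case EOL_CRNL:
			assert(*size >= 2);
			data[pos - 2] = '\0';	/* overwrite the '\r' of "\r\n" */
			*size -= 2;
			break;
		case EOL_UNKNOWN:
			assert(0);				/* shouldn't get here */
			break;
	}
}

int
main(void)
{
	char	line[] = "a,b,c\r\n";
	int		size = 7;

	trim_eol(EOL_CRNL, line, size, &size);
	printf("\"%s\" (len %d)\n", line, size);	/* "a,b,c" (len 5) */
	return 0;
}

Passing the triple explicitly is what lets the same routine serve both the
serial path (line_buf) and, in the later patches, a record sitting in a
shared-memory data block.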
v7-0002-Framework-for-leader-worker-in-parallel-copy.patch (text/x-patch)
From 8f95ef5928693a38f444a94143adeb7d74e85ddd Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 7 Oct 2020 17:18:17 +0530
Subject: [PATCH v7 2/6] Framework for leader/worker in parallel copy
This patch adds the framework for parallel copy: the data structures, leader
initialization, worker initialization, shared memory updates, starting the
workers, waiting for the workers to finish, and worker exit.
---
src/backend/access/transam/parallel.c | 4 +
src/backend/commands/copy.c | 235 ++++++--------------
src/include/commands/copy.h | 389 +++++++++++++++++++++++++++++++++-
src/tools/pgindent/typedefs.list | 7 +
4 files changed, 468 insertions(+), 167 deletions(-)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index b042696..a3cff4b 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -25,6 +25,7 @@
#include "catalog/pg_enum.h"
#include "catalog/storage.h"
#include "commands/async.h"
+#include "commands/copy.h"
#include "executor/execParallel.h"
#include "libpq/libpq.h"
#include "libpq/pqformat.h"
@@ -145,6 +146,9 @@ static const struct
},
{
"parallel_vacuum_main", parallel_vacuum_main
+ },
+ {
+ "ParallelCopyMain", ParallelCopyMain
}
};
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index a01e438..6c5dc2a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -29,7 +29,6 @@
#include "catalog/pg_type.h"
#include "commands/copy.h"
#include "commands/defrem.h"
-#include "commands/trigger.h"
#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeModifyTable.h"
@@ -63,29 +62,6 @@
#define OCTVALUE(c) ((c) - '0')
/*
- * Represents the different source/dest cases we need to worry about at
- * the bottom level
- */
-typedef enum CopyDest
-{
- COPY_FILE, /* to/from file (or a piped program) */
- COPY_OLD_FE, /* to/from frontend (2.0 protocol) */
- COPY_NEW_FE, /* to/from frontend (3.0 protocol) */
- COPY_CALLBACK /* to/from callback function */
-} CopyDest;
-
-/*
- * Represents the end-of-line terminator type of the input
- */
-typedef enum EolType
-{
- EOL_UNKNOWN,
- EOL_NL,
- EOL_CR,
- EOL_CRNL
-} EolType;
-
-/*
* Represents the heap insert method to be used during COPY FROM.
*/
typedef enum CopyInsertMethod
@@ -95,145 +71,10 @@ typedef enum CopyInsertMethod
CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
} CopyInsertMethod;
-#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
-/*
- * This struct contains all the state variables used throughout a COPY
- * operation. For simplicity, we use the same struct for all variants of COPY,
- * even though some fields are used in only some cases.
- *
- * Multi-byte encodings: all supported client-side encodings encode multi-byte
- * characters by having the first byte's high bit set. Subsequent bytes of the
- * character can have the high bit not set. When scanning data in such an
- * encoding to look for a match to a single-byte (ie ASCII) character, we must
- * use the full pg_encoding_mblen() machinery to skip over multibyte
- * characters, else we might find a false match to a trailing byte. In
- * supported server encodings, there is no possibility of a false match, and
- * it's faster to make useless comparisons to trailing bytes than it is to
- * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
- * when we have to do it the hard way.
- */
-typedef struct CopyStateData
-{
- /* low-level state data */
- CopyDest copy_dest; /* type of copy source/destination */
- FILE *copy_file; /* used if copy_dest == COPY_FILE */
- StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
- * dest == COPY_NEW_FE in COPY FROM */
- bool is_copy_from; /* COPY TO, or COPY FROM? */
- bool reached_eof; /* true if we read to end of copy data (not
- * all copy_dest types maintain this) */
- EolType eol_type; /* EOL type of input */
- int file_encoding; /* file or remote side's character encoding */
- bool need_transcoding; /* file encoding diff from server? */
- bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
-
- /* parameters from the COPY command */
- Relation rel; /* relation to copy to or from */
- QueryDesc *queryDesc; /* executable query to copy from */
- List *attnumlist; /* integer list of attnums to copy */
- char *filename; /* filename, or NULL for STDIN/STDOUT */
- bool is_program; /* is 'filename' a program to popen? */
- copy_data_source_cb data_source_cb; /* function for reading data */
- bool binary; /* binary format? */
- bool freeze; /* freeze rows on loading? */
- bool csv_mode; /* Comma Separated Value format? */
- bool header_line; /* CSV header line? */
- char *null_print; /* NULL marker string (server encoding!) */
- int null_print_len; /* length of same */
- char *null_print_client; /* same converted to file encoding */
- char *delim; /* column delimiter (must be 1 byte) */
- char *quote; /* CSV quote char (must be 1 byte) */
- char *escape; /* CSV escape char (must be 1 byte) */
- List *force_quote; /* list of column names */
- bool force_quote_all; /* FORCE_QUOTE *? */
- bool *force_quote_flags; /* per-column CSV FQ flags */
- List *force_notnull; /* list of column names */
- bool *force_notnull_flags; /* per-column CSV FNN flags */
- List *force_null; /* list of column names */
- bool *force_null_flags; /* per-column CSV FN flags */
- bool convert_selectively; /* do selective binary conversion? */
- List *convert_select; /* list of column names (can be NIL) */
- bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
- Node *whereClause; /* WHERE condition (or NULL) */
-
- /* these are just for error messages, see CopyFromErrorCallback */
- const char *cur_relname; /* table name for error messages */
- uint64 cur_lineno; /* line number for error messages */
- const char *cur_attname; /* current att for error messages */
- const char *cur_attval; /* current att value for error messages */
-
- /*
- * Working state for COPY TO/FROM
- */
- MemoryContext copycontext; /* per-copy execution context */
-
- /*
- * Working state for COPY TO
- */
- FmgrInfo *out_functions; /* lookup info for output functions */
- MemoryContext rowcontext; /* per-row evaluation context */
-
- /*
- * Working state for COPY FROM
- */
- AttrNumber num_defaults;
- FmgrInfo *in_functions; /* array of input functions for each attrs */
- Oid *typioparams; /* array of element types for in_functions */
- int *defmap; /* array of default att numbers */
- ExprState **defexprs; /* array of default att expressions */
- bool volatile_defexprs; /* is any of defexprs volatile? */
- List *range_table;
- ExprState *qualexpr;
-
- TransitionCaptureState *transition_capture;
-
- /*
- * These variables are used to reduce overhead in COPY FROM.
- *
- * attribute_buf holds the separated, de-escaped text for each field of
- * the current line. The CopyReadAttributes functions return arrays of
- * pointers into this buffer. We avoid palloc/pfree overhead by re-using
- * the buffer on each cycle.
- *
- * In binary COPY FROM, attribute_buf holds the binary data for the
- * current field, but the usage is otherwise similar.
- */
- StringInfoData attribute_buf;
-
- /* field raw data pointers found by COPY FROM */
-
- int max_fields;
- char **raw_fields;
-
- /*
- * Similarly, line_buf holds the whole input line being processed. The
- * input cycle is first to read the whole line into line_buf, convert it
- * to server encoding there, and then extract the individual attribute
- * fields into attribute_buf. line_buf is preserved unmodified so that we
- * can display it in error messages if appropriate. (In binary mode,
- * line_buf is not used.)
- */
- StringInfoData line_buf;
- bool line_buf_converted; /* converted to server encoding? */
- bool line_buf_valid; /* contains the row being processed? */
-
- /*
- * Finally, raw_buf holds raw data read from the data source (file or
- * client connection). In text mode, CopyReadLine parses this data
- * sufficiently to locate line boundaries, then transfers the data to
- * line_buf and converts it. In binary mode, CopyReadBinaryData fetches
- * appropriate amounts of data from this buffer. In both modes, we
- * guarantee that there is a \0 at raw_buf[raw_buf_len].
- */
- char *raw_buf;
- int raw_buf_index; /* next byte to process */
- int raw_buf_len; /* total # of bytes stored */
- /* Shorthand for number of unconsumed bytes available in raw_buf */
-#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
-} CopyStateData;
-
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -402,8 +243,6 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-static void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
- List *attnamelist);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
static void ConvertToServerEncoding(CopyState cstate);
@@ -1117,6 +956,8 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
if (is_from)
{
+ ParallelContext *pcxt = NULL;
+
Assert(rel);
/* check read-only transaction and parallel mode */
@@ -1126,7 +967,35 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
NULL, stmt->attlist, stmt->options);
cstate->whereClause = whereClause;
- *processed = CopyFrom(cstate); /* copy from file to database */
+ cstate->is_parallel = false;
+
+ if (cstate->nworkers > 0)
+ pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
+ relid);
+
+ if (pcxt)
+ {
+ int i;
+
+ ParallelCopyFrom(cstate);
+
+ /* Wait for all copy workers to finish */
+ WaitForParallelWorkersToFinish(pcxt);
+
+ /*
+ * Next, accumulate WAL usage. (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < pcxt->nworkers_launched; i++)
+ InstrAccumParallelQuery(&cstate->pcdata->bufferusage[i],
+ &cstate->pcdata->walusage[i]);
+
+ *processed = pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
+ EndParallelCopy(pcxt);
+ }
+ else
+ *processed = CopyFrom(cstate); /* copy from file to database */
+
EndCopyFrom(cstate);
}
else
@@ -1177,6 +1046,7 @@ ProcessCopyOptions(ParseState *pstate,
cstate->is_copy_from = is_from;
cstate->file_encoding = -1;
+ cstate->nworkers = -1;
/* Extract options from the statement node tree */
foreach(option, options)
@@ -1347,6 +1217,39 @@ ProcessCopyOptions(ParseState *pstate,
defel->defname),
parser_errposition(pstate, defel->location)));
}
+ else if (strcmp(defel->defname, "parallel") == 0)
+ {
+ int val;
+ bool parsed;
+ char *strval;
+
+ if (!cstate->is_copy_from)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel option is supported only for copy from"),
+ parser_errposition(pstate, defel->location)));
+ if (cstate->nworkers >= 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options"),
+ parser_errposition(pstate, defel->location)));
+
+ strval = defGetString(defel);
+ parsed = parse_int(strval, &val, 0, NULL);
+ if (!parsed)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for integer option \"%s\": %s",
+ defel->defname, strval)));
+ if (val < 1 || val > 1024)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("value %s out of bounds for option \"%s\"",
+ strval, defel->defname),
+ errdetail("Valid values are between \"%d\" and \"%d\".",
+ 1, 1024)));
+ cstate->nworkers = val;
+ }
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1702,9 +1605,9 @@ BeginCopy(ParseState *pstate,
* PopulateCommonCstateInfo
*
* Populates the common variables required for copy from operation. This is a
- * helper function for BeginCopy function.
+ * helper function for the BeginCopy & InitializeParallelCopyInfo functions.
*/
-static void
+void
PopulateCommonCstateInfo(CopyState cstate, TupleDesc tupDesc, List *attnamelist)
{
int num_phys_attrs;
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index c639833..cd2d56e 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -14,14 +14,394 @@
#ifndef COPY_H
#define COPY_H
+#include "access/parallel.h"
+#include "commands/trigger.h"
+#include "executor/executor.h"
#include "nodes/execnodes.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"
#include "tcop/dest.h"
+#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
+
+/*
+ * The structures sized by DATA_BLOCK_SIZE, RINGSIZE & MAX_BLOCKS_COUNT hold
+ * the records read from the file that need to be inserted into the relation.
+ * These values allow the handover of multiple records, with a significant
+ * amount of data, to each of the workers. This also avoids context switches
+ * and distributes the work fairly among the workers. These numbers showed
+ * the best results in the performance tests.
+ */
+#define DATA_BLOCK_SIZE RAW_BUF_SIZE
+
+/* It can hold 1023 blocks of 64K data in DSM to be processed by the worker. */
+#define MAX_BLOCKS_COUNT 1024
+
+/*
+ * It can hold information for up to 10240 records for the workers to
+ * process. RINGSIZE should be a multiple of WORKER_CHUNK_COUNT, as
+ * wrap-around cases are currently not handled when the worker selects
+ * WORKER_CHUNK_COUNT lines.
+ */
+#define RINGSIZE (10 * 1024)
+
+/*
+ * Each worker will be allocated WORKER_CHUNK_COUNT records from the DSM data
+ * blocks to process, to avoid lock contention. Read the RINGSIZE comments
+ * before changing this value.
+ */
+#define WORKER_CHUNK_COUNT 64
+
+/*
+ * Represents the different source/dest cases we need to worry about at
+ * the bottom level
+ */
+typedef enum CopyDest
+{
+ COPY_FILE, /* to/from file (or a piped program) */
+ COPY_OLD_FE, /* to/from frontend (2.0 protocol) */
+ COPY_NEW_FE, /* to/from frontend (3.0 protocol) */
+ COPY_CALLBACK /* to/from callback function */
+} CopyDest;
+
+/*
+ * Represents the end-of-line terminator type of the input
+ */
+typedef enum EolType
+{
+ EOL_UNKNOWN,
+ EOL_NL,
+ EOL_CR,
+ EOL_CRNL
+} EolType;
+
+/*
+ * Copy data block information.
+ *
+ * These data blocks are created in DSM. Data read from the file will be
+ * copied into these DSM data blocks. The leader process identifies the
+ * records, and the record information is shared with the workers. The
+ * workers insert the records into the table. There can be one or more
+ * records in each data block, depending on the record size.
+ */
+typedef struct ParallelCopyDataBlock
+{
+ /* The number of unprocessed lines in the current block. */
+ pg_atomic_uint32 unprocessed_line_parts;
+
+ /*
+ * If the current line data is continued into another block,
+ * following_block will have the position where the remaining data needs to
+ * be read.
+ */
+ uint32 following_block;
+
+ /*
+ * This flag will be set when the leader finds that this block can be read
+ * safely by the worker. This helps a worker start processing early when a
+ * line is spread across many blocks, without having to wait for the
+ * complete line to be populated.
+ */
+ bool curr_blk_completed;
+
+ /*
+ * A few bytes may need to be skipped in this block. This will be set when a
+ * sequence of characters like \r\n is expected, but the end of our block
+ * contains only \r. In this case we copy the data from \r onwards into the
+ * new block, as these bytes have to be processed together to identify the
+ * end of line. The worker uses skip_bytes to know that this data must be
+ * skipped in this data block.
+ */
+ uint8 skip_bytes;
+ char data[DATA_BLOCK_SIZE]; /* data read from file */
+} ParallelCopyDataBlock;
+
+/*
+ * Individual line information.
+ *
+ * ParallelCopyLineBoundary is common data structure between leader & worker.
+ * Leader process will be populating data block, data block offset & the size of
+ * the record in DSM for the workers to copy the data into the relation.
+ * The leader & worker process access the shared line information by following
+ * the below steps to avoid any data corruption or hang:
+ * Leader should operate in the following order:
+ * 1) check if line_size is -1; if it is not, wait until line_size is set to
+ * -1 by the worker. If line_size is not -1, it means the worker is still
+ * processing the previous line in this slot.
+ * 2) set line_state to LINE_LEADER_POPULATING, so that the worker knows that
+ * leader is populating this line.
+ * 3) update first_block, start_offset & cur_lineno in any order.
+ * 4) update line_size.
+ * 5) update line_state to LINE_LEADER_POPULATED.
+ * Worker should operate in the following order:
+ * 1) check line_state is LINE_LEADER_POPULATED, if not it means leader is still
+ * populating the data.
+ * 2) read line_size to know the size of the data.
+ * 3) only one worker should choose a given line for processing; this is
+ * handled using pg_atomic_compare_exchange_u32, where the worker changes
+ * the state to LINE_WORKER_PROCESSING only if line_state is
+ * LINE_LEADER_POPULATED.
+ * 4) read first_block, start_offset & cur_lineno in any order.
+ * 5) process line_size data.
+ * 6) update line_size to -1.
+ */
+typedef struct ParallelCopyLineBoundary
+{
+ /* Position of the first block in data_blocks array. */
+ uint32 first_block;
+ uint32 start_offset; /* start offset of the line */
+
+ /*
+ * Size of the current line: -1 means the line is yet to be filled
+ * completely, 0 means an empty line, and >0 means a line filled with
+ * line_size bytes of data.
+ */
+ pg_atomic_uint32 line_size;
+ pg_atomic_uint32 line_state; /* line state */
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBoundary;
+
+/*
+ * Circular queue used to store the line information.
+ */
+typedef struct ParallelCopyLineBoundaries
+{
+ /* Position for the leader to populate a line. */
+ uint32 pos;
+
+ /* Data read from the file/stdin by the leader process. */
+ ParallelCopyLineBoundary ring[RINGSIZE];
+} ParallelCopyLineBoundaries;
+
+/*
+ * Shared information among parallel copy workers. This will be allocated in the
+ * DSM segment.
+ */
+typedef struct ParallelCopyShmInfo
+{
+ bool is_read_in_progress; /* file read status */
+
+ /*
+ * Actual lines inserted by the workers; will not be the same as
+ * total_worker_processed if a WHERE condition is specified along with
+ * COPY. This is the actual number of records inserted into the relation.
+ */
+ pg_atomic_uint64 processed;
+
+ /*
+ * The number of records currently processed by the workers; this also
+ * includes the number of records that were filtered out because of the
+ * WHERE clause.
+ */
+ pg_atomic_uint64 total_worker_processed;
+ uint64 populated; /* lines populated by leader */
+ uint32 cur_block_pos; /* current data block */
+ ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
+ FullTransactionId full_transaction_id; /* xid for copy from statement */
+ CommandId mycid; /* command id */
+ ParallelCopyLineBoundaries line_boundaries; /* line array */
+} ParallelCopyShmInfo;
+
+/*
+ * Parallel copy line buffer information.
+ */
+typedef struct ParallelCopyLineBuf
+{
+ StringInfoData line_buf;
+ uint64 cur_lineno; /* line number for error messages */
+} ParallelCopyLineBuf;
+
+/*
+ * This structure stores the common data from CopyStateData that is required
+ * by the workers. This information is allocated and stored in the DSM for
+ * the workers to retrieve and copy into their CopyStateData.
+ */
+typedef struct SerializedParallelCopyState
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ int null_print_len; /* length of same */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool convert_selectively; /* do selective binary conversion? */
+
+ /* Working state for COPY FROM */
+ AttrNumber num_defaults;
+ Oid relid;
+} SerializedParallelCopyState;
+
+/*
+ * Parallel copy data information.
+ */
+typedef struct ParallelCopyData
+{
+ Oid relid; /* relation id of the table */
+ ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
+ bool is_leader;
+
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
+ /*
+ * Local line_buf array; workers copy lines here and then release the
+ * shared lines for the leader to continue populating.
+ */
+ ParallelCopyLineBuf worker_line_buf[WORKER_CHUNK_COUNT];
+ uint32 worker_line_buf_count; /* Number of lines */
+
+ /* Current position in worker_line_buf */
+ uint32 worker_line_buf_pos;
+} ParallelCopyData;
+
+typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
+
+/*
+ * This struct contains all the state variables used throughout a COPY
+ * operation. For simplicity, we use the same struct for all variants of COPY,
+ * even though some fields are used in only some cases.
+ *
+ * Multi-byte encodings: all supported client-side encodings encode multi-byte
+ * characters by having the first byte's high bit set. Subsequent bytes of the
+ * character can have the high bit not set. When scanning data in such an
+ * encoding to look for a match to a single-byte (ie ASCII) character, we must
+ * use the full pg_encoding_mblen() machinery to skip over multibyte
+ * characters, else we might find a false match to a trailing byte. In
+ * supported server encodings, there is no possibility of a false match, and
+ * it's faster to make useless comparisons to trailing bytes than it is to
+ * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
+ * when we have to do it the hard way.
+ */
+typedef struct CopyStateData
+{
+ /* low-level state data */
+ CopyDest copy_dest; /* type of copy source/destination */
+ FILE *copy_file; /* used if copy_dest == COPY_FILE */
+ StringInfo fe_msgbuf; /* used for all dests during COPY TO, only for
+ * dest == COPY_NEW_FE in COPY FROM */
+ bool is_copy_from; /* COPY TO, or COPY FROM? */
+ bool reached_eof; /* true if we read to end of copy data (not
+ * all copy_dest types maintain this) */
+ EolType eol_type; /* EOL type of input */
+ int file_encoding; /* file or remote side's character encoding */
+ bool need_transcoding; /* file encoding diff from server? */
+ bool encoding_embeds_ascii; /* ASCII can be non-first byte? */
+
+ /* parameters from the COPY command */
+ Relation rel; /* relation to copy to or from */
+ QueryDesc *queryDesc; /* executable query to copy from */
+ List *attnumlist; /* integer list of attnums to copy */
+ char *filename; /* filename, or NULL for STDIN/STDOUT */
+ bool is_program; /* is 'filename' a program to popen? */
+ copy_data_source_cb data_source_cb; /* function for reading data */
+ bool binary; /* binary format? */
+ bool freeze; /* freeze rows on loading? */
+ bool csv_mode; /* Comma Separated Value format? */
+ bool header_line; /* CSV header line? */
+ char *null_print; /* NULL marker string (server encoding!) */
+ int null_print_len; /* length of same */
+ char *null_print_client; /* same converted to file encoding */
+ char *delim; /* column delimiter (must be 1 byte) */
+ char *quote; /* CSV quote char (must be 1 byte) */
+ char *escape; /* CSV escape char (must be 1 byte) */
+ List *force_quote; /* list of column names */
+ bool force_quote_all; /* FORCE_QUOTE *? */
+ bool *force_quote_flags; /* per-column CSV FQ flags */
+ List *force_notnull; /* list of column names */
+ bool *force_notnull_flags; /* per-column CSV FNN flags */
+ List *force_null; /* list of column names */
+ bool *force_null_flags; /* per-column CSV FN flags */
+ bool convert_selectively; /* do selective binary conversion? */
+ List *convert_select; /* list of column names (can be NIL) */
+ bool *convert_select_flags; /* per-column CSV/TEXT CS flags */
+ Node *whereClause; /* WHERE condition (or NULL) */
+
+ /* these are just for error messages, see CopyFromErrorCallback */
+ const char *cur_relname; /* table name for error messages */
+ uint64 cur_lineno; /* line number for error messages */
+ const char *cur_attname; /* current att for error messages */
+ const char *cur_attval; /* current att value for error messages */
+
+ /*
+ * Working state for COPY TO/FROM
+ */
+ MemoryContext copycontext; /* per-copy execution context */
+
+ /*
+ * Working state for COPY TO
+ */
+ FmgrInfo *out_functions; /* lookup info for output functions */
+ MemoryContext rowcontext; /* per-row evaluation context */
+
+ /*
+ * Working state for COPY FROM
+ */
+ AttrNumber num_defaults;
+ FmgrInfo *in_functions; /* array of input functions for each attrs */
+ Oid *typioparams; /* array of element types for in_functions */
+ int *defmap; /* array of default att numbers */
+ ExprState **defexprs; /* array of default att expressions */
+ bool volatile_defexprs; /* is any of defexprs volatile? */
+ List *range_table;
+ ExprState *qualexpr;
+
+ TransitionCaptureState *transition_capture;
+
+ /*
+ * These variables are used to reduce overhead in COPY FROM.
+ *
+ * attribute_buf holds the separated, de-escaped text for each field of
+ * the current line. The CopyReadAttributes functions return arrays of
+ * pointers into this buffer. We avoid palloc/pfree overhead by re-using
+ * the buffer on each cycle.
+ *
+ * In binary COPY FROM, attribute_buf holds the binary data for the
+ * current field, but the usage is otherwise similar.
+ */
+ StringInfoData attribute_buf;
+
+ /* field raw data pointers found by COPY FROM */
+
+ int max_fields;
+ char **raw_fields;
+
+ /*
+ * Similarly, line_buf holds the whole input line being processed. The
+ * input cycle is first to read the whole line into line_buf, convert it
+ * to server encoding there, and then extract the individual attribute
+ * fields into attribute_buf. line_buf is preserved unmodified so that we
+ * can display it in error messages if appropriate. (In binary mode,
+ * line_buf is not used.)
+ */
+ StringInfoData line_buf;
+ bool line_buf_converted; /* converted to server encoding? */
+ bool line_buf_valid; /* contains the row being processed? */
+
+ /*
+ * Finally, raw_buf holds raw data read from the data source (file or
+ * client connection). In text mode, CopyReadLine parses this data
+ * sufficiently to locate line boundaries, then transfers the data to
+ * line_buf and converts it. In binary mode, CopyReadBinaryData fetches
+ * appropriate amounts of data from this buffer. In both modes, we
+ * guarantee that there is a \0 at raw_buf[raw_buf_len].
+ */
+ char *raw_buf;
+ int raw_buf_index; /* next byte to process */
+ int raw_buf_len; /* total # of bytes stored */
+ int nworkers;
+ bool is_parallel;
+ ParallelCopyData *pcdata;
+ /* Shorthand for number of unconsumed bytes available in raw_buf */
+#define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+} CopyStateData;
+
/* CopyStateData is private in commands/copy.c */
typedef struct CopyStateData *CopyState;
-typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
extern void DoCopy(ParseState *state, const CopyStmt *stmt,
int stmt_location, int stmt_len,
@@ -41,4 +421,11 @@ extern uint64 CopyFrom(CopyState cstate);
extern DestReceiver *CreateCopyDestReceiver(void);
+extern void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
+ List *attnamelist);
+
+extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
+extern ParallelContext *BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid);
+extern void ParallelCopyFrom(CopyState cstate);
+extern void EndParallelCopy(ParallelContext *pcxt);
#endif /* COPY_H */
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index c52f20d..5ce8296 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1702,6 +1702,12 @@ ParallelBitmapHeapState
ParallelBlockTableScanDesc
ParallelCompletionPtr
ParallelContext
+ParallelCopyLineBoundaries
+ParallelCopyLineBoundary
+ParallelCopyData
+ParallelCopyDataBlock
+ParallelCopyLineBuf
+ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
ParallelHashJoinBatch
@@ -2224,6 +2230,7 @@ SerCommitSeqNo
SerialControl
SerializableXactHandle
SerializedActiveRelMaps
+SerializedParallelCopyState
SerializedReindexState
SerializedSnapshotData
SerializedTransactionState
--
1.8.3.1
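As a reading aid for the ParallelCopyLineBoundary protocol documented in
copy.h above: with the default sizing, the data-block array is
MAX_BLOCKS_COUNT (1024) entries of 64KB each, roughly 64MB of DSM, and the
ring tracks 10240 line slots that workers consume 64 at a time. The sketch
below shows how a worker could claim one line per the worker-side steps in
that comment; TryClaimLine is a hypothetical helper name, while the line
states and the atomic calls are the ones the patch set uses:

/*
 * Hypothetical helper: try to claim the line at ring position 'pos' for
 * this worker, following the worker-side ordering in the
 * ParallelCopyLineBoundary comment (check state, CAS to
 * LINE_WORKER_PROCESSING, then read the remaining fields).
 */
static bool
TryClaimLine(ParallelCopyShmInfo *pcshared_info, uint32 pos,
			 uint32 *line_size, uint64 *cur_lineno)
{
	ParallelCopyLineBoundary *lineInfo;
	uint32		expected = LINE_LEADER_POPULATED;

	lineInfo = &pcshared_info->line_boundaries.ring[pos];

	/*
	 * Exactly one worker can flip the state from LINE_LEADER_POPULATED to
	 * LINE_WORKER_PROCESSING.  If the CAS fails, the leader is still
	 * populating this slot or another worker already claimed it.
	 */
	if (!pg_atomic_compare_exchange_u32(&lineInfo->line_state, &expected,
										LINE_WORKER_PROCESSING))
		return false;

	/* first_block, start_offset, cur_lineno and line_size are now stable. */
	*line_size = pg_atomic_read_u32(&lineInfo->line_size);
	*cur_lineno = lineInfo->cur_lineno;

	return true;
}

After copying the line data out, the worker sets line_size back to -1 so the
leader can reuse the slot, as step 6 of the comment describes.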
v7-0003-Allow-copy-from-command-to-process-data-from-file.patch (text/x-patch)
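With this third patch applied, a parallel load can be requested with the new
option; for example (table and file names here are illustrative only):

	COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv, PARALLEL 4);

The option accepts a worker count between 1 and 1024; if parallelism turns
out not to be allowed for the target relation, the command falls back to the
regular single-process COPY path, as the DoCopy changes below show.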
From c0a0e8e583a67b1a6090c3cd0b521cbff31a2d2f Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 14 Oct 2020 12:48:22 +0530
Subject: [PATCH v7 3/6] Allow copy from command to process data from
file/STDIN contents to a table in parallel.
This feature allows COPY FROM to leverage multiple CPUs to copy data from a
file/STDIN to a table. It adds a PARALLEL option to the COPY FROM command,
with which the user can specify the number of workers to be used to perform
the COPY FROM command.
The backend to which the "COPY FROM" query is submitted acts as the leader,
with the responsibility of reading data from the file/stdin and launching at
most n workers, as specified with the PARALLEL 'n' option in the "COPY FROM"
query. The leader populates the common data required for the workers'
execution in the DSM and shares it with the workers. The leader then executes
before statement triggers, if any exist. The leader populates the DSM lines,
which include the start offset and line size; while populating the lines it
reads as many blocks as required from the file into the DSM data blocks. Each
block is 64K in size. The leader parses the data to identify a line; the
existing logic from CopyReadLineText, which identifies the lines, was used
for this with some changes. The leader checks if a free line is available to
copy the information; if there is no free line it waits till the required
line is freed up by a worker, and then copies the identified line's
information (offset & line size) into the DSM lines. This process is repeated
till the complete file is processed. Simultaneously, the workers cache the
lines (50) locally in local memory and release the lines to the leader for
further populating. Each worker processes the lines that it cached and
inserts them into the table.
The leader does not participate in the insertion of data; the leader's only
responsibility is to identify the lines as fast as possible for the workers
to do the actual copy operation. The leader waits till all the populated
lines are processed by the workers and then exits. We have chosen this design
based on the reasoning that "everything stalls if the leader doesn't accept
further input data, as well as when there are no available split chunks, so
it doesn't seem like a good idea to have the leader do other work. This is
backed by the performance data where we have seen that with 1 worker there is
just a 5-10% performance difference".
---
src/backend/access/common/toast_internals.c | 12 +-
src/backend/access/heap/heapam.c | 11 -
src/backend/access/transam/xact.c | 15 +
src/backend/commands/Makefile | 1 +
src/backend/commands/copy.c | 243 +++--
src/backend/commands/copyparallel.c | 1301 +++++++++++++++++++++++++++
src/bin/psql/tab-complete.c | 2 +-
src/include/access/xact.h | 1 +
src/include/commands/copy.h | 54 +-
src/tools/pgindent/typedefs.list | 1 +
10 files changed, 1558 insertions(+), 83 deletions(-)
create mode 100644 src/backend/commands/copyparallel.c
diff --git a/src/backend/access/common/toast_internals.c b/src/backend/access/common/toast_internals.c
index 25a81e5..70c070e 100644
--- a/src/backend/access/common/toast_internals.c
+++ b/src/backend/access/common/toast_internals.c
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "access/heaptoast.h"
+#include "access/parallel.h"
#include "access/table.h"
#include "access/toast_internals.h"
#include "access/xact.h"
@@ -116,7 +117,16 @@ toast_save_datum(Relation rel, Datum value,
TupleDesc toasttupDesc;
Datum t_values[3];
bool t_isnull[3];
- CommandId mycid = GetCurrentCommandId(true);
+
+ /*
+ * Parallel copy can insert toast tuples. In the parallel copy case the
+ * command id would have been set already by calling
+ * AssignCommandIdForWorker, so call GetCurrentCommandId with used passed
+ * as false to get the currentCommandId, as marking it used has already
+ * been taken care of.
+ */
+ CommandId mycid = IsParallelWorker() ? GetCurrentCommandId(false) :
+ GetCurrentCommandId(true);
struct varlena *result;
struct varatt_external toast_pointer;
union
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1585861..1602525 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2043,17 +2043,6 @@ static HeapTuple
heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
CommandId cid, int options)
{
- /*
- * To allow parallel inserts, we need to ensure that they are safe to be
- * performed in workers. We have the infrastructure to allow parallel
- * inserts in general except for the cases where inserts generate a new
- * CommandId (eg. inserts into a table having a foreign key column).
- */
- if (IsParallelWorker())
- ereport(ERROR,
- (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
- errmsg("cannot insert tuples in a parallel worker")));
-
tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index af6afce..0b3337c 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -776,6 +776,21 @@ GetCurrentCommandId(bool used)
}
/*
+ * SetCurrentCommandIdUsedForWorker
+ *
+ * For a parallel worker, record that the currentCommandId has been used.
+ * This must only be called at the start of a parallel operation.
+ */
+void
+SetCurrentCommandIdUsedForWorker(void)
+{
+ Assert(IsParallelWorker() && !currentCommandIdUsed &&
+ (currentCommandId != InvalidCommandId));
+
+ currentCommandIdUsed = true;
+}
+
+/*
* SetParallelStartTimestamps
*
* In a parallel worker, we should inherit the parent transaction's
diff --git a/src/backend/commands/Makefile b/src/backend/commands/Makefile
index d4815d3..a224aac 100644
--- a/src/backend/commands/Makefile
+++ b/src/backend/commands/Makefile
@@ -24,6 +24,7 @@ OBJS = \
constraint.o \
conversioncmds.o \
copy.o \
+ copyparallel.o \
createas.o \
dbcommands.o \
define.o \
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6c5dc2a..9a026be 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -61,20 +61,6 @@
#define ISOCTAL(c) (((c) >= '0') && ((c) <= '7'))
#define OCTVALUE(c) ((c) - '0')
-/*
- * Represents the heap insert method to be used during COPY FROM.
- */
-typedef enum CopyInsertMethod
-{
- CIM_SINGLE, /* use table_tuple_insert or fdw routine */
- CIM_MULTI, /* always use table_multi_insert */
- CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
-} CopyInsertMethod;
-
-#define IsParallelCopy() (cstate->is_parallel)
-#define IsLeader() (cstate->pcdata->is_leader)
-#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
-
/* DestReceiver for COPY (query) TO */
typedef struct
{
@@ -83,7 +69,6 @@ typedef struct
uint64 processed; /* # of tuples processed */
} DR_copy;
-
/*
* No more than this many tuples per CopyMultiInsertBuffer
*
@@ -181,9 +166,13 @@ if (1) \
{ \
if (raw_buf_ptr > cstate->raw_buf_index) \
{ \
- appendBinaryStringInfo(&cstate->line_buf, \
- cstate->raw_buf + cstate->raw_buf_index, \
- raw_buf_ptr - cstate->raw_buf_index); \
+ if (!IsParallelCopy()) \
+ appendBinaryStringInfo(&cstate->line_buf, \
+ cstate->raw_buf + cstate->raw_buf_index, \
+ raw_buf_ptr - cstate->raw_buf_index); \
+ else \
+ line_size += raw_buf_ptr - cstate->raw_buf_index; \
+ \
cstate->raw_buf_index = raw_buf_ptr; \
} \
} else ((void) 0)
@@ -212,7 +201,6 @@ static void EndCopyTo(CopyState cstate);
static uint64 DoCopyTo(CopyState cstate);
static uint64 CopyTo(CopyState cstate);
static void CopyOneRowTo(CopyState cstate, TupleTableSlot *slot);
-static bool CopyReadLine(CopyState cstate);
static bool CopyReadLineText(CopyState cstate);
static int CopyReadAttributesText(CopyState cstate);
static int CopyReadAttributesCSV(CopyState cstate);
@@ -245,7 +233,6 @@ static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
-static void ConvertToServerEncoding(CopyState cstate);
/*
@@ -645,11 +632,11 @@ CopyGetInt16(CopyState cstate, int16 *val)
static bool
CopyLoadRawBuf(CopyState cstate)
{
- int nbytes = RAW_BUF_BYTES(cstate);
+ int nbytes = (!IsParallelCopy()) ? RAW_BUF_BYTES(cstate) : cstate->raw_buf_index;
int inbytes;
/* Copy down the unprocessed data if any. */
- if (nbytes > 0)
+ if (nbytes > 0 && !IsParallelCopy())
memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
nbytes);
@@ -657,7 +644,9 @@ CopyLoadRawBuf(CopyState cstate)
1, RAW_BUF_SIZE - nbytes);
nbytes += inbytes;
cstate->raw_buf[nbytes] = '\0';
- cstate->raw_buf_index = 0;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = 0;
+
cstate->raw_buf_len = nbytes;
return (inbytes > 0);
}
@@ -969,7 +958,11 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
cstate->whereClause = whereClause;
cstate->is_parallel = false;
- if (cstate->nworkers > 0)
+ /*
+ * The user has chosen parallel copy. Determine if parallel copy is
+ * actually allowed; if not, go with the non-parallel mode.
+ */
+ if (cstate->nworkers > 0 && IsParallelCopyAllowed(cstate))
pcxt = BeginParallelCopy(cstate->nworkers, cstate, stmt->attlist,
relid);
@@ -994,7 +987,15 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
EndParallelCopy(pcxt);
}
else
+ {
+ /*
+ * Reset nworkers to -1 here. This is useful in cases where the user
+ * specifies parallel workers but no workers are picked up, so we go
+ * back to the non-parallel mode value of nworkers.
+ */
+ cstate->nworkers = -1;
*processed = CopyFrom(cstate); /* copy from file to database */
+ }
EndCopyFrom(cstate);
}
@@ -2626,7 +2627,7 @@ CopyMultiInsertInfoStore(CopyMultiInsertInfo *miinfo, ResultRelInfo *rri,
*
* Check if the relation specified in copy from is valid.
*/
-static void
+void
CheckTargetRelValidity(CopyState cstate)
{
Assert(cstate->rel);
@@ -2723,7 +2724,7 @@ CopyFrom(CopyState cstate)
PartitionTupleRouting *proute = NULL;
ErrorContextCallback errcallback;
- CommandId mycid = GetCurrentCommandId(true);
+ CommandId mycid;
int ti_options = 0; /* start with default options for insert */
BulkInsertState bistate = NULL;
CopyInsertMethod insertMethod;
@@ -2733,7 +2734,18 @@ CopyFrom(CopyState cstate)
bool has_instead_insert_row_trig;
bool leafpart_use_multi_insert = false;
- CheckTargetRelValidity(cstate);
+ /*
+ * Perform this check only if it is not parallel copy. In the parallel
+ * copy case, this check is done by the leader, so that if any invalid
+ * case exists the COPY FROM command errors out from the leader itself,
+ * avoiding launching workers just to throw an error.
+ */
+ if (!IsParallelCopy())
+ CheckTargetRelValidity(cstate);
+ else
+ SetCurrentCommandIdUsedForWorker();
+
+ mycid = GetCurrentCommandId(!IsParallelCopy());
/*
* If the target file is new-in-transaction, we assume that checking FSM
@@ -2769,7 +2781,8 @@ CopyFrom(CopyState cstate)
ExecInitResultRelation(estate, resultRelInfo, 1);
/* Verify the named relation is a valid target for INSERT */
- CheckValidResultRel(resultRelInfo, CMD_INSERT);
+ if (!IsParallelCopy())
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
ExecOpenIndices(resultRelInfo, false);
@@ -2914,13 +2927,17 @@ CopyFrom(CopyState cstate)
has_instead_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
resultRelInfo->ri_TrigDesc->trig_insert_instead_row);
- /*
- * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
- * should do this for COPY, since it's not really an "INSERT" statement as
- * such. However, executing these triggers maintains consistency with the
- * EACH ROW triggers that we already fire on COPY.
- */
- ExecBSInsertTriggers(estate, resultRelInfo);
+ if (!IsParallelCopy())
+ {
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether
+ * we should do this for COPY, since it's not really an "INSERT"
+ * statement as such. However, executing these triggers maintains
+ * consistency with the EACH ROW triggers that we already fire on
+ * COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+ }
econtext = GetPerTupleExprContext(estate);
@@ -3020,6 +3037,29 @@ CopyFrom(CopyState cstate)
!has_instead_insert_row_trig &&
resultRelInfo->ri_FdwRoutine == NULL;
+ /*
+ * We may still be able to perform parallel inserts for
+ * partitioned tables. However, the possibility of this
+ * depends on which types of triggers exist on the partition.
+ * We must not do parallel inserts if the partition is a
+ * foreign table or if it has any BEFORE/INSTEAD OF row
+ * triggers. Since the partition's resultRelInfo is
+ * initialized only when we actually insert the first tuple
+ * into it, we cannot easily know this in the leader while
+ * deciding on parallelism, so we go ahead and allow
+ * parallelism and throw an error here if needed, with a hint
+ * to the user to not use parallelism. Throwing an error
+ * seemed a simpler approach than looking up all the
+ * partitions in the leader while deciding on parallelism.
+ * Note that this error is thrown early, exactly on the first
+ * tuple being inserted into the partition, so not much of the
+ * work done so far is wasted.
+ */
+ if (!leafpart_use_multi_insert && IsParallelWorker())
+ ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition"),
+ errhint("Try COPY without PARALLEL option")));
+
/* Set the multi-insert buffer to use for this partition. */
if (leafpart_use_multi_insert)
{
@@ -3242,7 +3282,10 @@ CopyFrom(CopyState cstate)
* or FDW; this is the same definition used by nodeModifyTable.c
* for counting tuples inserted by an INSERT command.
*/
- processed++;
+ if (!IsParallelCopy())
+ processed++;
+ else
+ pg_atomic_add_fetch_u64(&cstate->pcdata->pcshared_info->processed, 1);
}
}
@@ -3296,7 +3339,10 @@ CopyFrom(CopyState cstate)
FreeExecutorState(estate);
- return processed;
+ if (!IsParallelCopy())
+ return processed;
+ else
+ return pg_atomic_read_u64(&cstate->pcdata->pcshared_info->processed);
}
/*
@@ -3304,7 +3350,7 @@ CopyFrom(CopyState cstate)
*
* Populate the cstate catalog information.
*/
-static void
+void
PopulateCstateCatalogInfo(CopyState cstate)
{
TupleDesc tupDesc;
@@ -3586,26 +3632,35 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
/* only available for text or csv input */
Assert(!cstate->binary);
- /* on input just throw the header line away */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (IsParallelCopy())
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
- return false; /* done */
+ done = GetWorkerLine(cstate);
+ if (done && cstate->line_buf.len == 0)
+ return false;
}
+ else
+ {
+ /* on input just throw the header line away */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ return false; /* done */
+ }
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here */
+ done = CopyReadLine(cstate);
- /*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
- */
- if (done && cstate->line_buf.len == 0)
- return false;
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ return false;
+ }
/* Parse the line into de-escaped field values */
if (cstate->csv_mode)
@@ -3830,7 +3885,7 @@ EndCopyFrom(CopyState cstate)
* by newline. The terminating newline or EOF marker is not included
* in the final value of line_buf.
*/
-static bool
+bool
CopyReadLine(CopyState cstate)
{
bool result;
@@ -3853,9 +3908,31 @@ CopyReadLine(CopyState cstate)
*/
if (cstate->copy_dest == COPY_NEW_FE)
{
+ bool bIsFirst = true;
+
do
{
- cstate->raw_buf_index = cstate->raw_buf_len;
+ if (!IsParallelCopy())
+ cstate->raw_buf_index = cstate->raw_buf_len;
+ else
+ {
+ /*
+ * Get a new block the first time through; on subsequent
+ * iterations, reset the index and re-use the same block.
+ */
+ if (bIsFirst)
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint32 block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ cstate->raw_buf = pcshared_info->data_blocks[block_pos].data;
+ bIsFirst = false;
+ }
+
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
+
} while (CopyLoadRawBuf(cstate));
}
}
@@ -3910,11 +3987,11 @@ ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
*
* Convert contents to server encoding.
*/
-static void
+void
ConvertToServerEncoding(CopyState cstate)
{
/* Done reading the line. Convert it to server encoding. */
- if (cstate->need_transcoding)
+ if (cstate->need_transcoding && (!IsParallelCopy() || IsWorker()))
{
char *cvt;
@@ -3954,6 +4031,11 @@ CopyReadLineText(CopyState cstate)
char quotec = '\0';
char escapec = '\0';
+ /* For parallel copy */
+ int line_size = 0;
+ uint32 line_pos = 0;
+
+ cstate->eol_type = EOL_UNKNOWN;
if (cstate->csv_mode)
{
quotec = cstate->quote[0];
@@ -4008,6 +4090,10 @@ CopyReadLineText(CopyState cstate)
if (raw_buf_ptr >= copy_buf_len || need_data)
{
REFILL_LINEBUF;
+ if ((copy_buf_len == DATA_BLOCK_SIZE || copy_buf_len == 0) &&
+ IsParallelCopy())
+ SetRawBufForLoad(cstate, line_size, copy_buf_len, raw_buf_ptr,
+ &copy_raw_buf);
/*
* Try to read some more data. This will certainly reset
@@ -4015,14 +4101,14 @@ CopyReadLineText(CopyState cstate)
*/
if (!CopyLoadRawBuf(cstate))
hit_eof = true;
- raw_buf_ptr = 0;
+ raw_buf_ptr = (IsParallelCopy()) ? cstate->raw_buf_index : 0;
copy_buf_len = cstate->raw_buf_len;
/*
* If we are completely out of data, break out of the loop,
* reporting EOF.
*/
- if (copy_buf_len <= 0)
+ if (RAW_BUF_BYTES(cstate) <= 0)
{
result = true;
break;
@@ -4232,9 +4318,15 @@ CopyReadLineText(CopyState cstate)
* discard the data and the \. sequence.
*/
if (prev_raw_ptr > cstate->raw_buf_index)
- appendBinaryStringInfo(&cstate->line_buf,
- cstate->raw_buf + cstate->raw_buf_index,
- prev_raw_ptr - cstate->raw_buf_index);
+ {
+ if (!IsParallelCopy())
+ appendBinaryStringInfo(&cstate->line_buf,
+ cstate->raw_buf + cstate->raw_buf_index,
+ prev_raw_ptr - cstate->raw_buf_index);
+ else
+ line_size += prev_raw_ptr - cstate->raw_buf_index;
+ }
+
cstate->raw_buf_index = raw_buf_ptr;
result = true; /* report EOF */
break;
@@ -4286,6 +4378,22 @@ not_end_of_copy:
IF_NEED_REFILL_AND_EOF_BREAK(mblen - 1);
raw_buf_ptr += mblen - 1;
}
+
+ /*
+ * Skip the header line. Update the line here, this cannot be done at
+ * the beginning, as there is a possibility that file contains empty
+ * lines.
+ */
+ if (IsParallelCopy() && first_char_in_line && !IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ line_pos = UpdateSharedLineInfo(cstate,
+ pcshared_info->cur_block_pos,
+ cstate->raw_buf_index, -1,
+ LINE_LEADER_POPULATING, -1);
+ }
+
first_char_in_line = false;
} /* end of outer loop */
@@ -4294,9 +4402,16 @@ not_end_of_copy:
*/
REFILL_LINEBUF;
if (!result && !IsHeaderLine())
- ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
- cstate->line_buf.len, &cstate->line_buf.len);
+ {
+ if (IsParallelCopy())
+ ClearEOLFromCopiedData(cstate, cstate->raw_buf, raw_buf_ptr,
+ &line_size);
+ else
+ ClearEOLFromCopiedData(cstate, cstate->line_buf.data,
+ cstate->line_buf.len, &cstate->line_buf.len);
+ }
+ EndLineParallelCopy(cstate, line_pos, line_size, raw_buf_ptr);
return result;
}
diff --git a/src/backend/commands/copyparallel.c b/src/backend/commands/copyparallel.c
new file mode 100644
index 0000000..9cae112
--- /dev/null
+++ b/src/backend/commands/copyparallel.c
@@ -0,0 +1,1301 @@
+/*-------------------------------------------------------------------------
+ *
+ * copyparallel.c
+ * Implements the Parallel COPY utility command
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/commands/copyparallel.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "catalog/pg_proc_d.h"
+#include "commands/copy.h"
+#include "optimizer/clauses.h"
+#include "optimizer/optimizer.h"
+#include "pgstat.h"
+#include "utils/lsyscache.h"
+
+/* DSM keys for parallel copy. */
+#define PARALLEL_COPY_KEY_SHARED_INFO 1
+#define PARALLEL_COPY_KEY_CSTATE 2
+#define PARALLEL_COPY_WAL_USAGE 3
+#define PARALLEL_COPY_BUFFER_USAGE 4
+
+/* Begin parallel copy Macros */
+#define SET_NEWLINE_SIZE() \
+{ \
+ if (cstate->eol_type == EOL_NL || cstate->eol_type == EOL_CR) \
+ new_line_size = 1; \
+ else if (cstate->eol_type == EOL_CRNL) \
+ new_line_size = 2; \
+ else \
+ new_line_size = 0; \
+}
+
+/*
+ * COPY_WAIT_TO_PROCESS - Wait before continuing to process.
+ */
+#define COPY_WAIT_TO_PROCESS() \
+{ \
+ CHECK_FOR_INTERRUPTS(); \
+ (void) WaitLatch(MyLatch, \
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH, \
+ 1L, WAIT_EVENT_PG_SLEEP); \
+ ResetLatch(MyLatch); \
+}
+
+/*
+ * CopyStringToSharedMemory
+ *
+ * Copy the string to shared memory.
+ */
+static uint32
+CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr)
+{
+ uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
+ uint32 copiedsize;
+
+ memcpy(destptr, (uint32 *) &len, sizeof(uint32));
+ copiedsize = sizeof(uint32);
+ if (len)
+ {
+ memcpy(destptr + sizeof(uint32), srcPtr, len);
+ copiedsize += len;
+ }
+
+ return copiedsize;
+}
+
+/*
+ * SerializeParallelCopyState
+ *
+ * Serialize the cstate members required by the workers into shared memory.
+ */
+static void
+SerializeParallelCopyState(ParallelContext *pcxt, CopyState cstate,
+ uint32 estimatedSize, char *whereClauseStr,
+ char *rangeTableStr, char *attnameListStr,
+ char *notnullListStr, char *nullListStr,
+ char *convertListStr)
+{
+ SerializedParallelCopyState shared_cstate;
+ char *shmptr = (char *) shm_toc_allocate(pcxt->toc, estimatedSize + 1);
+ uint32 copiedsize = 0;
+
+ shared_cstate.copy_dest = cstate->copy_dest;
+ shared_cstate.file_encoding = cstate->file_encoding;
+ shared_cstate.need_transcoding = cstate->need_transcoding;
+ shared_cstate.encoding_embeds_ascii = cstate->encoding_embeds_ascii;
+ shared_cstate.csv_mode = cstate->csv_mode;
+ shared_cstate.header_line = cstate->header_line;
+ shared_cstate.null_print_len = cstate->null_print_len;
+ shared_cstate.force_quote_all = cstate->force_quote_all;
+ shared_cstate.convert_selectively = cstate->convert_selectively;
+ shared_cstate.num_defaults = cstate->num_defaults;
+ shared_cstate.relid = cstate->pcdata->relid;
+
+ memcpy(shmptr, (char *) &shared_cstate, sizeof(SerializedParallelCopyState));
+ copiedsize = sizeof(SerializedParallelCopyState);
+
+ copiedsize += CopyStringToSharedMemory(cstate, cstate->null_print,
+ shmptr + copiedsize);
+ copiedsize += CopyStringToSharedMemory(cstate, cstate->delim,
+ shmptr + copiedsize);
+ copiedsize += CopyStringToSharedMemory(cstate, cstate->quote,
+ shmptr + copiedsize);
+ copiedsize += CopyStringToSharedMemory(cstate, cstate->escape,
+ shmptr + copiedsize);
+ copiedsize += CopyStringToSharedMemory(cstate, attnameListStr,
+ shmptr + copiedsize);
+ copiedsize += CopyStringToSharedMemory(cstate, notnullListStr,
+ shmptr + copiedsize);
+ copiedsize += CopyStringToSharedMemory(cstate, nullListStr,
+ shmptr + copiedsize);
+ copiedsize += CopyStringToSharedMemory(cstate, convertListStr,
+ shmptr + copiedsize);
+ copiedsize += CopyStringToSharedMemory(cstate, whereClauseStr,
+ shmptr + copiedsize);
+ copiedsize += CopyStringToSharedMemory(cstate, rangeTableStr,
+ shmptr + copiedsize);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_CSTATE, shmptr);
+}
+
+/*
+ * CopyStringFromSharedMemory
+ *
+ * Copy the string contents from shared memory & return the ptr.
+ */
+static char *
+CopyStringFromSharedMemory(char *srcPtr, uint32 *copiedsize)
+{
+ char *destptr = NULL;
+ uint32 len;
+
+ memcpy((uint32 *) (&len), srcPtr, sizeof(uint32));
+ *copiedsize += sizeof(uint32);
+ if (len)
+ {
+ destptr = (char *) palloc(len);
+ memcpy(destptr, srcPtr + sizeof(uint32), len);
+ *copiedsize += len;
+ }
+
+ return destptr;
+}
+
+/*
+ * CopyNodeFromSharedMemory
+ *
+ * Copy the node contents which was stored as string format in shared memory &
+ * convert it into node type.
+ */
+static void *
+CopyNodeFromSharedMemory(char *srcPtr, uint32 *copiedsize)
+{
+ char *destptr = NULL;
+ List *destList = NIL;
+ uint32 len;
+
+ memcpy((uint32 *) (&len), srcPtr, sizeof(uint32));
+ *copiedsize += sizeof(uint32);
+ if (len)
+ {
+ destptr = (char *) palloc(len);
+ memcpy(destptr, srcPtr + sizeof(uint32), len);
+ *copiedsize += len;
+ destList = (List *) stringToNode(destptr);
+ pfree(destptr);
+ }
+
+ return destList;
+}
+
+/*
+ * RestoreParallelCopyState
+ *
+ * Retrieve the cstate members which was populated by the leader in the shared
+ * memory.
+ */
+static void
+RestoreParallelCopyState(shm_toc *toc, CopyState cstate, List **attlist)
+{
+ char *shared_str_val = (char *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_CSTATE, true);
+ SerializedParallelCopyState shared_cstate = {0};
+ uint32 copiedsize = 0;
+
+ memcpy(&shared_cstate, (char *) shared_str_val, sizeof(SerializedParallelCopyState));
+ copiedsize = sizeof(SerializedParallelCopyState);
+
+ cstate->file_encoding = shared_cstate.file_encoding;
+ cstate->need_transcoding = shared_cstate.need_transcoding;
+ cstate->encoding_embeds_ascii = shared_cstate.encoding_embeds_ascii;
+ cstate->csv_mode = shared_cstate.csv_mode;
+ cstate->header_line = shared_cstate.header_line;
+ cstate->null_print_len = shared_cstate.null_print_len;
+ cstate->force_quote_all = shared_cstate.force_quote_all;
+ cstate->convert_selectively = shared_cstate.convert_selectively;
+ cstate->num_defaults = shared_cstate.num_defaults;
+ cstate->pcdata->relid = shared_cstate.relid;
+
+ cstate->null_print = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->delim = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->quote = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->escape = CopyStringFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+
+ *attlist = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->force_notnull = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->force_null = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->convert_select = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->whereClause = (Node *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+ cstate->range_table = (List *) CopyNodeFromSharedMemory(shared_str_val + copiedsize,
+ &copiedsize);
+}
+
+/*
+ * EstimateStringSize
+ *
+ * Estimate the size required for the string in shared memory.
+ */
+static uint32
+EstimateStringSize(char *str)
+{
+ uint32 strsize = sizeof(uint32);
+
+ if (str)
+ strsize += strlen(str) + 1;
+
+ return strsize;
+}
+
+/*
+ * EstimateNodeSize
+ *
+ * Convert the input list/node to string & estimate the size required in shared
+ * memory.
+ */
+static uint32
+EstimateNodeSize(void *list, char **listStr)
+{
+ uint32 strsize = sizeof(uint32);
+
+ if (list != NIL)
+ {
+ *listStr = nodeToString(list);
+ strsize += strlen(*listStr) + 1;
+ }
+
+ return strsize;
+}
+
+/*
+ * EstimateCstateSize
+ *
+ * Estimate the size of the required cstate variables in the shared memory.
+ */
+static uint32
+EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist,
+ char **whereClauseStr, char **rangeTableStr,
+ char **attnameListStr, char **notnullListStr,
+ char **nullListStr, char **convertListStr)
+{
+ uint32 strsize = MAXALIGN(sizeof(SerializedParallelCopyState));
+
+ strsize += EstimateStringSize(cstate->null_print);
+ strsize += EstimateStringSize(cstate->delim);
+ strsize += EstimateStringSize(cstate->quote);
+ strsize += EstimateStringSize(cstate->escape);
+ strsize += EstimateNodeSize(attnamelist, attnameListStr);
+ strsize += EstimateNodeSize(cstate->force_notnull, notnullListStr);
+ strsize += EstimateNodeSize(cstate->force_null, nullListStr);
+ strsize += EstimateNodeSize(cstate->convert_select, convertListStr);
+ strsize += EstimateNodeSize(cstate->whereClause, whereClauseStr);
+ strsize += EstimateNodeSize(cstate->range_table, rangeTableStr);
+
+ strsize++;
+ shm_toc_estimate_chunk(&pcxt->estimator, strsize);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ return strsize;
+}
+
+/*
+ * PopulateParallelCopyShmInfo
+ *
+ * Sets ParallelCopyShmInfo structure members.
+ */
+static void
+PopulateParallelCopyShmInfo(ParallelCopyShmInfo *shared_info_ptr)
+{
+ uint32 count;
+
+ MemSet(shared_info_ptr, 0, sizeof(ParallelCopyShmInfo));
+ shared_info_ptr->is_read_in_progress = true;
+ shared_info_ptr->cur_block_pos = -1;
+ for (count = 0; count < RINGSIZE; count++)
+ {
+ ParallelCopyLineBoundary *lineInfo = &shared_info_ptr->line_boundaries.ring[count];
+
+ pg_atomic_init_u32(&(lineInfo->line_size), -1);
+ }
+}
+
+/*
+ * CheckRelTrigFunParallelSafety
+ *
+ * Check if the relation's associated trigger functions are parallel safe. If
+ * any of the trigger functions is parallel unsafe or if any trigger is a
+ * foreign key trigger, we do not allow parallel copy.
+ */
+static pg_attribute_always_inline bool
+CheckRelTrigFunParallelSafety(TriggerDesc *trigdesc)
+{
+ int i;
+
+ for (i = 0; i < trigdesc->numtriggers; i++)
+ {
+ Trigger *trigger = &trigdesc->triggers[i];
+ int trigtype = RI_TRIGGER_NONE;
+
+ if (func_parallel(trigger->tgfoid) != PROPARALLEL_SAFE)
+ return false;
+
+ /* If the trigger is parallel safe, also look for RI_TRIGGER. */
+ trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+
+ /*
+ * No parallelism if a foreign key check trigger is present. This is
+ * because, while performing foreign key checks, we take a KEY SHARE
+ * lock on primary key table rows, which in turn increments the
+ * command counter and updates the snapshot. Since we share the
+ * snapshot at the beginning of the command, we can't allow it to be
+ * changed later. So, unless we do something special for it, we can't
+ * allow parallelism in such cases.
+ */
+ if (trigtype == RI_TRIGGER_FK)
+ return false;
+ }
+
+ return true;
+}
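
To illustrate the foreign key restriction above, here is a hypothetical session (table and file names invented; it assumes, per the rest of the patch, that a disallowed case silently falls back to a serial copy rather than erroring):

    -- The REFERENCES constraint installs an FK check trigger on cities,
    -- so the leader is expected to fall back to a serial copy.
    CREATE TABLE countries (id int PRIMARY KEY);
    CREATE TABLE cities (id int, country_id int REFERENCES countries (id));
    COPY cities FROM '/tmp/cities.csv' WITH (FORMAT csv, PARALLEL 2);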
+
+/*
+ * CheckExprParallelSafety
+ *
+ * Determine the parallel safety of volatile expressions in the default
+ * clauses of column definitions or in the WHERE clause, and return true if
+ * they are parallel safe.
+ */
+static pg_attribute_always_inline bool
+CheckExprParallelSafety(CopyState cstate)
+{
+ if (contain_volatile_functions(cstate->whereClause))
+ {
+ if (max_parallel_hazard((Query *) cstate->whereClause) != PROPARALLEL_SAFE)
+ return false;
+ }
+
+ /*
+ * Check if any of the columns has a volatile default expression. If so,
+ * and it is not parallel safe, parallelism is not allowed. For
+ * instance, serial/bigserial columns carry a nextval() default
+ * expression, which is parallel unsafe, so parallelism should not be
+ * allowed for them.
+ */
+ if (cstate->defexprs != NULL && cstate->num_defaults != 0)
+ {
+ int i;
+
+ for (i = 0; i < cstate->num_defaults; i++)
+ {
+ bool volatile_expr = contain_volatile_functions((Node *) cstate->defexprs[i]->expr);
+
+ if (volatile_expr &&
+ (max_parallel_hazard((Query *) cstate->defexprs[i]->expr)) !=
+ PROPARALLEL_SAFE)
+ return false;
+ }
+ }
+
+ return true;
+}
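
As a concrete case of the volatile-default rule, a serial column is enough to disable parallelism, since its implicit nextval() default is parallel unsafe. A hypothetical example (table and file names invented):

    -- The id column's nextval() default is volatile and parallel
    -- unsafe, so this copy is expected to run serially.
    CREATE TABLE events (id serial, payload text);
    COPY events (payload) FROM '/tmp/events.txt' WITH (PARALLEL 2);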
+
+/*
+ * IsParallelCopyAllowed
+ *
+ * Check if parallel copy can be allowed.
+ */
+bool
+IsParallelCopyAllowed(CopyState cstate)
+{
+ /* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
+ if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ return false;
+
+ /*
+ * Check if the copy is into a foreign table. We cannot allow parallelism
+ * in this case because each worker would need to establish an FDW
+ * connection and operate in a separate transaction. Unless we have the
+ * capability to provide a two-phase commit protocol, we cannot allow
+ * parallelism.
+ *
+ * Also check if the copy is into a temporary table. Since parallel
+ * workers cannot access temporary tables, parallelism is not allowed.
+ */
+ if (cstate->rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
+ RelationUsesLocalBuffers(cstate->rel))
+ return false;
+
+ /*
+ * If the default expressions or the WHERE clause contain volatile
+ * expressions, allow parallelism only if they are parallel safe.
+ */
+ if (!CheckExprParallelSafety(cstate))
+ return false;
+
+ /* Check parallel safety of the trigger functions. */
+ if (cstate->rel->trigdesc != NULL &&
+ !CheckRelTrigFunParallelSafety(cstate->rel->trigdesc))
+ return false;
+
+ /*
+ * When transition tables are involved (i.e. AFTER STATEMENT triggers
+ * are present), we collect minimal tuples in a tuplestore after
+ * processing them so that the AFTER STATEMENT triggers can access
+ * them later. To enable parallelism for such cases, we would instead
+ * need to store and access the tuples from a shared tuplestore.
+ * However, that has no facility to keep tuples in memory, so we would
+ * always have to store and access them from a file, which could be
+ * costly unless we also add a way to keep minimal tuples in shared
+ * memory up to work_mem and only then spill to the shared tuplestore.
+ * All of this is possible, but for now we disallow parallelism for
+ * such cases and can allow it later if required.
+ *
+ * We also do not allow parallelism when there are BEFORE/AFTER/INSTEAD
+ * OF row triggers on the table, because such triggers might query the
+ * table we are inserting into and act differently if the tuples that
+ * have already been processed and prepared for insertion are not there
+ * yet. With parallelism, the behaviour would depend on whether a
+ * parallel worker has already inserted those particular tuples or not.
+ */
+ if (cstate->rel->trigdesc != NULL &&
+ (cstate->rel->trigdesc->trig_insert_after_statement ||
+ cstate->rel->trigdesc->trig_insert_new_table ||
+ cstate->rel->trigdesc->trig_insert_before_row ||
+ cstate->rel->trigdesc->trig_insert_after_row ||
+ cstate->rel->trigdesc->trig_insert_instead_row))
+ return false;
+
+ return true;
+}
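
A minimal sketch of the temporary-table restriction (hypothetical session; the PARALLEL option is accepted but no workers are used):

    -- Parallel workers cannot access temporary tables, so this copy
    -- is expected to run with the leader alone.
    CREATE TEMP TABLE staging (a int, b text);
    COPY staging FROM '/tmp/staging.txt' WITH (PARALLEL 2);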
+
+/*
+ * BeginParallelCopy - Start parallel copy tasks.
+ *
+ * Get the number of workers required to perform the parallel copy. The data
+ * structures that are required by the parallel workers will be initialized, the
+ * size required in DSM will be calculated and the necessary keys will be loaded
+ * in the DSM. The specified number of workers will then be launched.
+ */
+ParallelContext *
+BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid)
+{
+ ParallelContext *pcxt;
+ ParallelCopyShmInfo *shared_info_ptr;
+ char *whereClauseStr = NULL;
+ char *rangeTableStr = NULL;
+ char *attnameListStr = NULL;
+ char *notnullListStr = NULL;
+ char *nullListStr = NULL;
+ char *convertListStr = NULL;
+ int parallel_workers = 0;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+ ParallelCopyData *pcdata;
+ MemoryContext oldcontext;
+ uint32 strsize;
+
+ CheckTargetRelValidity(cstate);
+ parallel_workers = Min(nworkers, max_worker_processes);
+
+ /* Can't perform copy in parallel */
+ if (parallel_workers <= 0)
+ return NULL;
+
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ MemoryContextSwitchTo(oldcontext);
+ cstate->pcdata = pcdata;
+
+ (void) GetCurrentFullTransactionId();
+ (void) GetCurrentCommandId(true);
+
+ EnterParallelMode();
+ pcxt = CreateParallelContext("postgres", "ParallelCopyMain",
+ parallel_workers);
+ Assert(pcxt->nworkers > 0);
+
+ /*
+ * Estimate size for shared information for PARALLEL_COPY_KEY_SHARED_INFO
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelCopyShmInfo));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate the size for shared information for PARALLEL_COPY_KEY_CSTATE */
+ strsize = EstimateCstateSize(pcxt, cstate, attnamelist, &whereClauseStr,
+ &rangeTableStr, &attnameListStr,
+ ¬nullListStr, &nullListStr,
+ &convertListStr);
+
+ /*
+ * Estimate space for WalUsage and BufferUsage -- PARALLEL_COPY_WAL_USAGE
+ * and PARALLEL_COPY_BUFFER_USAGE.
+ */
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+ shm_toc_estimate_chunk(&pcxt->estimator,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ InitializeParallelDSM(pcxt);
+
+ /* If no DSM segment was available, back out (do serial copy) */
+ if (pcxt->seg == NULL)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /* Allocate shared memory for PARALLEL_COPY_KEY_SHARED_INFO */
+ shared_info_ptr = (ParallelCopyShmInfo *) shm_toc_allocate(pcxt->toc, sizeof(ParallelCopyShmInfo));
+ PopulateParallelCopyShmInfo(shared_info_ptr);
+
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_KEY_SHARED_INFO, shared_info_ptr);
+ pcdata->pcshared_info = shared_info_ptr;
+ pcdata->relid = relid;
+
+ SerializeParallelCopyState(pcxt, cstate, strsize, whereClauseStr,
+ rangeTableStr, attnameListStr, notnullListStr,
+ nullListStr, convertListStr);
+
+ /*
+ * Allocate space for each worker's WalUsage and BufferUsage; no need to
+ * initialize.
+ */
+ walusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(WalUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_WAL_USAGE, walusage);
+ pcdata->walusage = walusage;
+ bufferusage = shm_toc_allocate(pcxt->toc,
+ mul_size(sizeof(BufferUsage), pcxt->nworkers));
+ shm_toc_insert(pcxt->toc, PARALLEL_COPY_BUFFER_USAGE, bufferusage);
+ pcdata->bufferusage = bufferusage;
+
+ LaunchParallelWorkers(pcxt);
+ if (pcxt->nworkers_launched == 0)
+ {
+ EndParallelCopy(pcxt);
+ return NULL;
+ }
+
+ /*
+ * Caller needs to wait for all launched workers when we return. Make
+ * sure that the failure-to-start case will not hang forever.
+ */
+ WaitForParallelWorkersToAttach(pcxt);
+
+ pcdata->is_leader = true;
+ cstate->is_parallel = true;
+ return pcxt;
+}
+
+/*
+ * EndParallelCopy
+ *
+ * End the parallel copy tasks.
+ */
+pg_attribute_always_inline void
+EndParallelCopy(ParallelContext *pcxt)
+{
+ Assert(!IsParallelWorker());
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+}
+
+/*
+ * InitializeParallelCopyInfo
+ *
+ * Initialize parallel worker.
+ */
+static void
+InitializeParallelCopyInfo(CopyState cstate, List *attnamelist)
+{
+ uint32 count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ PopulateCommonCstateInfo(cstate, tup_desc, attnamelist);
+
+ /* Initialize state variables. */
+ cstate->reached_eof = false;
+ cstate->eol_type = EOL_UNKNOWN;
+ cstate->cur_relname = RelationGetRelationName(cstate->rel);
+ cstate->cur_lineno = 0;
+ cstate->cur_attname = NULL;
+ cstate->cur_attval = NULL;
+
+ /* Set up variables to avoid per-attribute overhead. */
+ initStringInfo(&cstate->attribute_buf);
+
+ initStringInfo(&cstate->line_buf);
+ for (count = 0; count < WORKER_CHUNK_COUNT; count++)
+ initStringInfo(&pcdata->worker_line_buf[count].line_buf);
+
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = NULL;
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+
+ PopulateCstateCatalogInfo(cstate);
+
+ /* Create workspace for CopyReadAttributes results. */
+ if (!cstate->binary)
+ {
+ AttrNumber attr_count = list_length(cstate->attnumlist);
+
+ cstate->max_fields = attr_count;
+ cstate->raw_fields = (char **) palloc(attr_count * sizeof(char *));
+ }
+}
+
+/*
+ * CacheLineInfo
+ *
+ * Cache the line information to local memory.
+ */
+static bool
+CacheLineInfo(CopyState cstate, uint32 buff_count)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyData *pcdata = cstate->pcdata;
+ uint32 write_pos;
+ ParallelCopyDataBlock *data_blk_ptr;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 offset;
+ int dataSize;
+ int copiedSize = 0;
+
+ resetStringInfo(&pcdata->worker_line_buf[buff_count].line_buf);
+ write_pos = GetLinePosition(cstate);
+ if (-1 == write_pos)
+ return true;
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ if (pg_atomic_read_u32(&lineInfo->line_size) == 0)
+ goto empty_data_line_update;
+
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ /* Get the offset information from where the data must be copied. */
+ offset = lineInfo->start_offset;
+ pcdata->worker_line_buf[buff_count].cur_lineno = lineInfo->cur_lineno;
+
+ elog(DEBUG1, "[Worker] Processing - line position:%d, block:%d, unprocessed lines:%d, offset:%d, line size:%d",
+ write_pos, lineInfo->first_block,
+ pg_atomic_read_u32(&data_blk_ptr->unprocessed_line_parts),
+ offset, pg_atomic_read_u32(&lineInfo->line_size));
+
+ for (;;)
+ {
+ uint8 skip_bytes = data_blk_ptr->skip_bytes;
+
+ /*
+ * The inner loop at the bottom of this loop may have exited because
+ * data_blk_ptr->curr_blk_completed was set, in which case the
+ * dataSize read there might be a stale value: once the block is
+ * completed and the line is complete, line_size will have been set.
+ * Re-read line_size here to be sure whether the line is complete or
+ * spans into another block.
+ */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+ if (dataSize)
+ {
+ int remainingSize = dataSize - copiedSize;
+
+ if (!remainingSize)
+ break;
+
+ /* Whole line is in current block. */
+ if (remainingSize + offset + skip_bytes < DATA_BLOCK_SIZE)
+ {
+ appendBinaryStringInfo(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ remainingSize);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts,
+ 1);
+ break;
+ }
+ else
+ {
+ /* Line is spread across the blocks. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+ while (copiedSize < dataSize)
+ {
+ uint32 currentBlockCopySize;
+ ParallelCopyDataBlock *currBlkPtr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+
+ skip_bytes = currBlkPtr->skip_bytes;
+
+ /*
+ * If the remaining data fits in the current block, copy
+ * dataSize - copiedSize bytes from it; otherwise copy the
+ * whole block.
+ */
+ currentBlockCopySize = Min(dataSize - copiedSize, DATA_BLOCK_SIZE - skip_bytes);
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &currBlkPtr->data[0],
+ currentBlockCopySize);
+ pg_atomic_sub_fetch_u32(&currBlkPtr->unprocessed_line_parts, 1);
+ copiedSize += currentBlockCopySize;
+ data_blk_ptr = currBlkPtr;
+ }
+
+ break;
+ }
+ }
+ else
+ {
+ /* Copy this complete block from the current offset. */
+ uint32 lineInCurrentBlock = (DATA_BLOCK_SIZE - skip_bytes) - offset;
+
+ appendBinaryStringInfoNT(&pcdata->worker_line_buf[buff_count].line_buf,
+ &data_blk_ptr->data[offset],
+ lineInCurrentBlock);
+ pg_atomic_sub_fetch_u32(&data_blk_ptr->unprocessed_line_parts, 1);
+ copiedSize += lineInCurrentBlock;
+
+ /*
+ * Reset the offset: the first copy starts from the offset,
+ * while subsequent copies take the complete block.
+ */
+ offset = 0;
+
+ /* Set data_blk_ptr to the following block. */
+ data_blk_ptr = &pcshared_info->data_blocks[data_blk_ptr->following_block];
+ }
+
+ for (;;)
+ {
+ /* Get the size of this line */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ /*
+ * If the line is entirely within the current block,
+ * lineInfo->line_size will be updated. If the line is spread
+ * across blocks, either lineInfo->line_size or
+ * data_blk_ptr->curr_blk_completed can be updated:
+ * lineInfo->line_size is set once the whole line has been
+ * read, while data_blk_ptr->curr_blk_completed is set when the
+ * current block is finished but the line is not yet complete.
+ */
+ if (data_blk_ptr->curr_blk_completed || (dataSize != -1))
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+ }
+
+empty_data_line_update:
+ elog(DEBUG1, "[Worker] Completed processing line:%d", write_pos);
+ pg_atomic_write_u32(&lineInfo->line_state, LINE_WORKER_PROCESSED);
+ pg_atomic_write_u32(&lineInfo->line_size, -1);
+ pg_atomic_add_fetch_u64(&pcshared_info->total_worker_processed, 1);
+ return false;
+}
+
+/*
+ * GetWorkerLine
+ *
+ * Returns a line for worker to process.
+ */
+bool
+GetWorkerLine(CopyState cstate)
+{
+ uint32 buff_count;
+ ParallelCopyData *pcdata = cstate->pcdata;
+
+ /*
+ * Copy the line data to line_buf and release the line position so that
+ * the leader can continue populating lines.
+ */
+ if (pcdata->worker_line_buf_pos < pcdata->worker_line_buf_count)
+ goto return_line;
+
+ pcdata->worker_line_buf_pos = 0;
+ pcdata->worker_line_buf_count = 0;
+
+ for (buff_count = 0; buff_count < WORKER_CHUNK_COUNT; buff_count++)
+ {
+ bool result = CacheLineInfo(cstate, buff_count);
+
+ if (result)
+ break;
+
+ pcdata->worker_line_buf_count++;
+ }
+
+ if (pcdata->worker_line_buf_count)
+ goto return_line;
+ else
+ resetStringInfo(&cstate->line_buf);
+
+ return true;
+
+return_line:
+ cstate->line_buf = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].line_buf;
+ cstate->cur_lineno = pcdata->worker_line_buf[pcdata->worker_line_buf_pos].cur_lineno;
+ cstate->line_buf_valid = true;
+
+ /* Mark that encoding conversion hasn't occurred yet. */
+ cstate->line_buf_converted = false;
+ ConvertToServerEncoding(cstate);
+ pcdata->worker_line_buf_pos++;
+ return false;
+}
+
+/*
+ * ParallelCopyMain - Parallel copy worker's code.
+ *
+ * The worker converts each line into column values, applies the WHERE
+ * clause, fills in defaults/nulls for columns missing from the record,
+ * finds the partition if the table is partitioned, invokes BEFORE ROW
+ * INSERT triggers, checks constraints, and inserts the tuples.
+ */
+void
+ParallelCopyMain(dsm_segment *seg, shm_toc *toc)
+{
+ CopyState cstate;
+ ParallelCopyData *pcdata;
+ ParallelCopyShmInfo *pcshared_info;
+ Relation rel = NULL;
+ MemoryContext oldcontext;
+ List *attlist = NIL;
+ WalUsage *walusage;
+ BufferUsage *bufferusage;
+
+ /* Allocate workspace and zero all fields. */
+ cstate = (CopyStateData *) palloc0(sizeof(CopyStateData));
+
+ /*
+ * We allocate everything used by a cstate in a new memory context. This
+ * avoids memory leaks during repeated use of COPY in a query.
+ */
+ cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
+ "COPY",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(cstate->copycontext);
+
+ pcdata = (ParallelCopyData *) palloc0(sizeof(ParallelCopyData));
+ cstate->pcdata = pcdata;
+ pcdata->is_leader = false;
+ pcdata->worker_processed_pos = -1;
+ cstate->is_parallel = true;
+ pcshared_info = (ParallelCopyShmInfo *) shm_toc_lookup(toc, PARALLEL_COPY_KEY_SHARED_INFO, false);
+
+ ereport(DEBUG1, (errmsg("Starting parallel copy worker")));
+
+ pcdata->pcshared_info = pcshared_info;
+ RestoreParallelCopyState(toc, cstate, &attlist);
+
+ /* Open and lock the relation, using the appropriate lock type. */
+ rel = table_open(cstate->pcdata->relid, RowExclusiveLock);
+ cstate->rel = rel;
+ InitializeParallelCopyInfo(cstate, attlist);
+
+ /* Prepare to track buffer usage during parallel execution */
+ InstrStartParallelQuery();
+
+ CopyFrom(cstate);
+
+ if (rel != NULL)
+ table_close(rel, RowExclusiveLock);
+
+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+ &walusage[ParallelWorkerNumber]);
+
+ MemoryContextSwitchTo(oldcontext);
+ pfree(cstate);
+ return;
+}
+
+/*
+ * UpdateSharedLineInfo
+ *
+ * Update the line information.
+ */
+uint32
+UpdateSharedLineInfo(CopyState cstate, uint32 blk_pos, uint32 offset,
+ uint32 line_size, uint32 line_state, uint32 blk_line_pos)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ ParallelCopyLineBoundaries *lineBoundaryPtr = &pcshared_info->line_boundaries;
+ ParallelCopyLineBoundary *lineInfo;
+ uint32 line_pos;
+
+ /* blk_line_pos will be valid if a line position was already assigned earlier. */
+ if (blk_line_pos == -1)
+ {
+ line_pos = lineBoundaryPtr->pos;
+
+ /* Update the line information for the worker to pick and process. */
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ while (pg_atomic_read_u32(&lineInfo->line_size) != -1)
+ COPY_WAIT_TO_PROCESS()
+
+ lineInfo->first_block = blk_pos;
+ lineInfo->start_offset = offset;
+ lineInfo->cur_lineno = cstate->cur_lineno;
+ lineBoundaryPtr->pos = (lineBoundaryPtr->pos + 1) % RINGSIZE;
+ }
+ else
+ {
+ line_pos = blk_line_pos;
+ lineInfo = &lineBoundaryPtr->ring[line_pos];
+ }
+
+ if (line_state == LINE_LEADER_POPULATED)
+ {
+ elog(DEBUG1, "[Leader] Added line with block:%d, offset:%d, line position:%d, line size:%d",
+ lineInfo->first_block, lineInfo->start_offset, line_pos,
+ pg_atomic_read_u32(&lineInfo->line_size));
+ pcshared_info->populated++;
+ }
+ else
+ elog(DEBUG1, "[Leader] Adding - block:%d, offset:%d, line position:%d",
+ lineInfo->first_block, lineInfo->start_offset, line_pos);
+
+ pg_atomic_write_u32(&lineInfo->line_size, line_size);
+ pg_atomic_write_u32(&lineInfo->line_state, line_state);
+
+ return line_pos;
+}
+
+/*
+ * ParallelCopyFrom - parallel copy leader's functionality.
+ *
+ * The leader executes the BEFORE STATEMENT trigger, if one is present. It
+ * reads the input data from the file, copies the contents into DSM data
+ * blocks, and identifies record boundaries based on line breaks; each such
+ * line is a record to be inserted into the relation. The line information
+ * is stored in the ParallelCopyLineBoundary DSM structure, which the
+ * workers then consume to insert the data into the table. This process is
+ * repeated until all data has been read from the file and all DSM data
+ * blocks have been processed. If the leader finds that the DSM data blocks
+ * or the ParallelCopyLineBoundary entries are full, it waits until a
+ * worker frees up some entries and then continues. The leader waits until
+ * all populated lines have been processed by the workers before exiting.
+ */
+void
+ParallelCopyFrom(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ ereport(DEBUG1, (errmsg("Running parallel copy leader")));
+
+ /* raw_buf is not used in parallel copy, instead data blocks are used. */
+ pfree(cstate->raw_buf);
+ cstate->raw_buf = NULL;
+
+ /* Execute the before statement triggers from the leader */
+ ExecBeforeStmtTrigger(cstate);
+
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
+ {
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
+ }
+
+ for (;;)
+ {
+ bool done;
+
+ cstate->cur_lineno++;
+
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+
+ /*
+ * EOF at start of line means we're done. If we see EOF after some
+ * characters, we act as though it was newline followed by EOF, ie,
+ * process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+
+ pcshared_info->is_read_in_progress = false;
+ cstate->cur_lineno = 0;
+}
+
+/*
+ * GetLinePosition
+ *
+ * Return the line position once the leader has populated the data.
+ */
+uint32
+GetLinePosition(CopyState cstate)
+{
+ ParallelCopyData *pcdata = cstate->pcdata;
+ ParallelCopyShmInfo *pcshared_info = pcdata->pcshared_info;
+ uint32 previous_pos = pcdata->worker_processed_pos;
+ uint32 write_pos = (previous_pos == -1) ? 0 : (previous_pos + 1) % RINGSIZE;
+
+ for (;;)
+ {
+ int dataSize;
+ bool is_read_in_progress = pcshared_info->is_read_in_progress;
+ ParallelCopyLineBoundary *lineInfo;
+ ParallelCopyDataBlock *data_blk_ptr;
+ uint32 line_state = LINE_LEADER_POPULATED;
+ ParallelCopyLineState curr_line_state;
+
+ CHECK_FOR_INTERRUPTS();
+
+ /* File read completed & no elements to process. */
+ if (!is_read_in_progress &&
+ (pcshared_info->populated ==
+ pg_atomic_read_u64(&pcshared_info->total_worker_processed)))
+ {
+ write_pos = -1;
+ break;
+ }
+
+ /* Get the current line information. */
+ lineInfo = &pcshared_info->line_boundaries.ring[write_pos];
+ curr_line_state = pg_atomic_read_u32(&lineInfo->line_state);
+ if ((write_pos % WORKER_CHUNK_COUNT == 0) &&
+ (curr_line_state == LINE_WORKER_PROCESSED ||
+ curr_line_state == LINE_WORKER_PROCESSING))
+ {
+ pcdata->worker_processed_pos = write_pos;
+ write_pos = (write_pos + WORKER_CHUNK_COUNT) % RINGSIZE;
+ continue;
+ }
+
+ /* Get the size of this line. */
+ dataSize = pg_atomic_read_u32(&lineInfo->line_size);
+
+ if (dataSize != 0) /* If not an empty line. */
+ {
+ /* Get the block information. */
+ data_blk_ptr = &pcshared_info->data_blocks[lineInfo->first_block];
+
+ if (!data_blk_ptr->curr_blk_completed && (dataSize == -1))
+ {
+ /* Wait till the current line or block is added. */
+ COPY_WAIT_TO_PROCESS()
+ continue;
+ }
+ }
+
+ /* Make sure that no worker has consumed this element. */
+ if (pg_atomic_compare_exchange_u32(&lineInfo->line_state,
+ &line_state, LINE_WORKER_PROCESSING))
+ break;
+ }
+
+ pcdata->worker_processed_pos = write_pos;
+ return write_pos;
+}
+
+/*
+ * GetFreeCopyBlock
+ *
+ * Get a free block for data to be copied.
+ */
+static pg_attribute_always_inline uint32
+GetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ int count = 0;
+ uint32 last_free_block = pcshared_info->cur_block_pos;
+ uint32 block_pos = (last_free_block != -1) ? ((last_free_block + 1) % MAX_BLOCKS_COUNT) : 0;
+
+ /*
+ * Get a new block for copying data. Don't check the current block;
+ * it may still have some unprocessed data.
+ */
+ while (count < (MAX_BLOCKS_COUNT - 1))
+ {
+ ParallelCopyDataBlock *dataBlkPtr = &pcshared_info->data_blocks[block_pos];
+ uint32 unprocessed_line_parts = pg_atomic_read_u32(&dataBlkPtr->unprocessed_line_parts);
+
+ if (unprocessed_line_parts == 0)
+ {
+ dataBlkPtr->curr_blk_completed = false;
+ dataBlkPtr->skip_bytes = 0;
+ dataBlkPtr->following_block = -1;
+ pcshared_info->cur_block_pos = block_pos;
+ MemSet(&dataBlkPtr->data[0], 0, DATA_BLOCK_SIZE);
+ return block_pos;
+ }
+
+ block_pos = (block_pos + 1) % MAX_BLOCKS_COUNT;
+ count++;
+ }
+
+ return -1;
+}
+
+/*
+ * WaitGetFreeCopyBlock
+ *
+ * If there are no blocks available, wait and get a block for copying data.
+ */
+uint32
+WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info)
+{
+ uint32 new_free_pos = -1;
+
+ for (;;)
+ {
+ new_free_pos = GetFreeCopyBlock(pcshared_info);
+ if (new_free_pos != -1) /* We have got one block, break now. */
+ break;
+
+ COPY_WAIT_TO_PROCESS()
+ }
+
+ return new_free_pos;
+}
+
+/*
+ * SetRawBufForLoad
+ *
+ * Set raw_buf to the shared memory where the file data must be read.
+ */
+void
+SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf)
+{
+ ParallelCopyShmInfo *pcshared_info;
+ uint32 cur_block_pos;
+ uint32 next_block_pos;
+ ParallelCopyDataBlock *cur_data_blk_ptr = NULL;
+ ParallelCopyDataBlock *next_data_blk_ptr = NULL;
+
+ Assert(IsParallelCopy());
+
+ pcshared_info = cstate->pcdata->pcshared_info;
+ cur_block_pos = pcshared_info->cur_block_pos;
+ cur_data_blk_ptr = (cstate->raw_buf) ? &pcshared_info->data_blocks[cur_block_pos] : NULL;
+ next_block_pos = WaitGetFreeCopyBlock(pcshared_info);
+ next_data_blk_ptr = &pcshared_info->data_blocks[next_block_pos];
+
+ /* set raw_buf to the data block in shared memory */
+ cstate->raw_buf = next_data_blk_ptr->data;
+ *copy_raw_buf = cstate->raw_buf;
+ if (cur_data_blk_ptr && line_size)
+ {
+ /*
+ * Mark the previous block as completed, worker can start copying this
+ * data.
+ */
+ cur_data_blk_ptr->following_block = next_block_pos;
+ pg_atomic_add_fetch_u32(&cur_data_blk_ptr->unprocessed_line_parts, 1);
+ cur_data_blk_ptr->skip_bytes = copy_buf_len - raw_buf_ptr;
+ cur_data_blk_ptr->curr_blk_completed = true;
+ }
+
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * EndLineParallelCopy
+ *
+ * Update the line information in shared memory.
+ */
+void
+EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr)
+{
+ uint8 new_line_size;
+
+ if (!IsParallelCopy())
+ return;
+
+ if (!IsHeaderLine())
+ {
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+
+ SET_NEWLINE_SIZE()
+ if (line_size)
+ {
+ /*
+ * If raw_buf_ptr <= new_line_size, the new block holds only the
+ * newline characters; the unprocessed count should not be
+ * increased in that case.
+ */
+ if (raw_buf_ptr > new_line_size)
+ {
+ uint32 cur_block_pos = pcshared_info->cur_block_pos;
+ ParallelCopyDataBlock *curr_data_blk_ptr = &pcshared_info->data_blocks[cur_block_pos];
+
+ pg_atomic_add_fetch_u32(&curr_data_blk_ptr->unprocessed_line_parts, 1);
+ }
+
+ /*
+ * Update line size & line state, other members are already
+ * updated.
+ */
+ (void) UpdateSharedLineInfo(cstate, -1, -1, line_size,
+ LINE_LEADER_POPULATED, line_pos);
+ }
+ else if (new_line_size)
+ /* Only a newline was seen; an empty record should be inserted. */
+ (void) UpdateSharedLineInfo(cstate, -1, -1, 0,
+ LINE_LEADER_POPULATED, -1);
+ }
+}
+
+/*
+ * ExecBeforeStmtTrigger
+ *
+ * Execute the before statement trigger, this will be executed for parallel copy
+ * by the leader process.
+ */
+void
+ExecBeforeStmtTrigger(CopyState cstate)
+{
+ EState *estate = CreateExecutorState();
+ ResultRelInfo *resultRelInfo;
+
+ Assert(IsLeader());
+
+ /*
+ * We need a ResultRelInfo so we can use the regular executor's
+ * index-entry-making machinery. (There used to be a huge amount of code
+ * here that basically duplicated execUtils.c ...)
+ */
+ ExecInitRangeTable(estate, cstate->range_table);
+ resultRelInfo = makeNode(ResultRelInfo);
+ ExecInitResultRelation(estate, resultRelInfo, 1);
+
+ /* Verify the named relation is a valid target for INSERT */
+ CheckValidResultRel(resultRelInfo, CMD_INSERT);
+
+ estate->es_result_relation_info = resultRelInfo;
+
+ /* Prepare to catch AFTER triggers. */
+ AfterTriggerBeginQuery();
+
+ /*
+ * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
+ * should do this for COPY, since it's not really an "INSERT" statement as
+ * such. However, executing these triggers maintains consistency with the
+ * EACH ROW triggers that we already fire on COPY.
+ */
+ ExecBSInsertTriggers(estate, resultRelInfo);
+
+ /* Handle queued AFTER triggers */
+ AfterTriggerEndQuery(estate);
+
+ /* Close the result relations, including any trigger target relations */
+ ExecCloseResultRelations(estate);
+ ExecCloseRangeTableRelations(estate);
+
+ FreeExecutorState(estate);
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 24c7b41..cf00256 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2353,7 +2353,7 @@ psql_completion(const char *text, int start, int end)
/* Complete COPY <sth> FROM|TO filename WITH ( */
else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "("))
COMPLETE_WITH("FORMAT", "FREEZE", "DELIMITER", "NULL",
- "HEADER", "QUOTE", "ESCAPE", "FORCE_QUOTE",
+ "HEADER", "PARALLEL", "QUOTE", "ESCAPE", "FORCE_QUOTE",
"FORCE_NOT_NULL", "FORCE_NULL", "ENCODING");
/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
diff --git a/src/include/access/xact.h b/src/include/access/xact.h
index df1b43a..96295bc 100644
--- a/src/include/access/xact.h
+++ b/src/include/access/xact.h
@@ -385,6 +385,7 @@ extern FullTransactionId GetTopFullTransactionId(void);
extern FullTransactionId GetTopFullTransactionIdIfAny(void);
extern FullTransactionId GetCurrentFullTransactionId(void);
extern FullTransactionId GetCurrentFullTransactionIdIfAny(void);
+extern void SetCurrentCommandIdUsedForWorker(void);
extern void MarkCurrentTransactionIdLoggedIfAny(void);
extern bool SubTransactionIsActive(SubTransactionId subxid);
extern CommandId GetCurrentCommandId(bool used);
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index cd2d56e..9b19dcb 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -34,7 +34,7 @@
*/
#define DATA_BLOCK_SIZE RAW_BUF_SIZE
-/* It can hold 1023 blocks of 64K data in DSM to be processed by the worker. */
+/* It can hold 1024 blocks of 64K data in DSM to be processed by the worker. */
#define MAX_BLOCKS_COUNT 1024
/*
@@ -51,6 +51,13 @@
*/
#define WORKER_CHUNK_COUNT 64
+#define IsParallelCopy() (cstate->is_parallel)
+#define IsLeader() (cstate->pcdata->is_leader)
+#define IsWorker() (IsParallelCopy() && !IsLeader())
+#define IsHeaderLine() (cstate->header_line && cstate->cur_lineno == 1)
+
/*
* Represents the different source/dest cases we need to worry about at
* the bottom level
@@ -75,6 +82,28 @@ typedef enum EolType
} EolType;
/*
+ * State of the line.
+ */
+typedef enum ParallelCopyLineState
+{
+ LINE_INIT, /* initial state of line */
+ LINE_LEADER_POPULATING, /* leader processing line */
+ LINE_LEADER_POPULATED, /* leader completed populating line */
+ LINE_WORKER_PROCESSING, /* worker processing line */
+ LINE_WORKER_PROCESSED /* worker completed processing line */
+} ParallelCopyLineState;
+
+/*
+ * Represents the heap insert method to be used during COPY FROM.
+ */
+typedef enum CopyInsertMethod
+{
+ CIM_SINGLE, /* use table_tuple_insert or fdw routine */
+ CIM_MULTI, /* always use table_multi_insert */
+ CIM_MULTI_CONDITIONAL /* use table_multi_insert only if valid */
+} CopyInsertMethod;
+
+/*
* Copy data block information.
*
* These data blocks are created in DSM. Data read from file will be copied in
@@ -194,8 +223,6 @@ typedef struct ParallelCopyShmInfo
uint64 populated; /* lines populated by leader */
uint32 cur_block_pos; /* current data block */
ParallelCopyDataBlock data_blocks[MAX_BLOCKS_COUNT]; /* data block array */
- FullTransactionId full_transaction_id; /* xid for copy from statement */
- CommandId mycid; /* command id */
ParallelCopyLineBoundaries line_boundaries; /* line array */
} ParallelCopyShmInfo;
@@ -242,12 +269,12 @@ typedef struct ParallelCopyData
ParallelCopyShmInfo *pcshared_info; /* common info in shared memory */
bool is_leader;
+ /* line position which worker is processing */
+ uint32 worker_processed_pos;
+
WalUsage *walusage;
BufferUsage *bufferusage;
- /* line position which worker is processing */
- uint32 worker_processed_pos;
-
/*
* Local line_buf array, workers will copy it here and release the lines
* for the leader to continue.
@@ -423,9 +450,24 @@ extern DestReceiver *CreateCopyDestReceiver(void);
extern void PopulateCommonCstateInfo(CopyState cstate, TupleDesc tup_desc,
List *attnamelist);
+extern void ConvertToServerEncoding(CopyState cstate);
extern void ParallelCopyMain(dsm_segment *seg, shm_toc *toc);
extern ParallelContext *BeginParallelCopy(int nworkers, CopyState cstate, List *attnamelist, Oid relid);
extern void ParallelCopyFrom(CopyState cstate);
extern void EndParallelCopy(ParallelContext *pcxt);
+extern bool IsParallelCopyAllowed(CopyState cstate);
+extern void ExecBeforeStmtTrigger(CopyState cstate);
+extern void CheckTargetRelValidity(CopyState cstate);
+extern void PopulateCstateCatalogInfo(CopyState cstate);
+extern uint32 GetLinePosition(CopyState cstate);
+extern bool GetWorkerLine(CopyState cstate);
+extern bool CopyReadLine(CopyState cstate);
+extern uint32 WaitGetFreeCopyBlock(ParallelCopyShmInfo *pcshared_info);
+extern void SetRawBufForLoad(CopyState cstate, uint32 line_size, uint32 copy_buf_len,
+ uint32 raw_buf_ptr, char **copy_raw_buf);
+extern uint32 UpdateSharedLineInfo(CopyState cstate, uint32 blk_pos, uint32 offset,
+ uint32 line_size, uint32 line_state, uint32 blk_line_pos);
+extern void EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
+ uint32 raw_buf_ptr);
#endif /* COPY_H */
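
For a rough sense of the DSM footprint these constants imply: with DATA_BLOCK_SIZE equal to RAW_BUF_SIZE (64KB) and MAX_BLOCKS_COUNT of 1024, the data-block array alone occupies 64MB of shared memory, before counting the line-boundary ring and the serialized cstate. The arithmetic, as a quick SQL check:

    -- 1024 blocks x 65536 bytes per block, expressed in MB.
    SELECT (1024 * 65536) / (1024 * 1024) AS data_block_dsm_mb;
    -- Result: 64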
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 5ce8296..8dfb944 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1707,6 +1707,7 @@ ParallelCopyLineBoundary
ParallelCopyData
ParallelCopyDataBlock
ParallelCopyLineBuf
+ParallelCopyLineState
ParallelCopyShmInfo
ParallelExecutorInfo
ParallelHashGrowth
--
1.8.3.1
Attachment: v7-0004-Documentation-for-parallel-copy.patch (text/x-patch)
From 778be67172b1f9eb3070ea64f64215e36e11c062 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sat, 1 Aug 2020 09:00:36 +0530
Subject: [PATCH v7 4/6] Documentation for parallel copy.
This patch has the documentation changes for parallel copy.
---
doc/src/sgml/ref/copy.sgml | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 369342b..328a5f1 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -37,6 +37,7 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
DELIMITER '<replaceable class="parameter">delimiter_character</replaceable>'
NULL '<replaceable class="parameter">null_string</replaceable>'
HEADER [ <replaceable class="parameter">boolean</replaceable> ]
+ PARALLEL <replaceable class="parameter">integer</replaceable>
QUOTE '<replaceable class="parameter">quote_character</replaceable>'
ESCAPE '<replaceable class="parameter">escape_character</replaceable>'
FORCE_QUOTE { ( <replaceable class="parameter">column_name</replaceable> [, ...] ) | * }
@@ -277,6 +278,22 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL</literal></term>
+ <listitem>
+ <para>
+ Perform <command>COPY FROM</command> in parallel using <replaceable
+ class="parameter">integer</replaceable> background workers. Note
+ that the number of parallel workers specified in <replaceable
+ class="parameter">integer</replaceable> is not guaranteed to be
+ used during execution. A copy may run with fewer workers than
+ specified, or even with no workers at all (for example, due to the
+ setting of <varname>max_worker_processes</varname>). This option
+ is allowed only with <command>COPY FROM</command>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>QUOTE</literal></term>
<listitem>
<para>
@@ -953,6 +970,20 @@ COPY country FROM '/usr1/proj/bray/sql/country_data';
</para>
<para>
+ To copy data in parallel from a file into the <literal>country</literal> table:
+<programlisting>
+COPY country FROM '/usr1/proj/bray/sql/country_data' WITH (PARALLEL 1);
+</programlisting>
+ </para>
+
+ <para>
+ To copy data in parallel from STDIN into the <literal>country</literal> table:
+<programlisting>
+COPY country FROM STDIN WITH (PARALLEL 1);
+</programlisting>
+ </para>
+
+ <para>
To copy into a file just the countries whose names start with 'A':
<programlisting>
COPY (SELECT * FROM country WHERE country_name LIKE 'A%') TO '/usr1/proj/bray/sql/a_list_countries.copy';
--
1.8.3.1
Attachment: v7-0005-Tests-for-parallel-copy.patch (text/x-patch)
From 66846373b9cd3a508ba1b0dd622f834783855e99 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Tue, 13 Oct 2020 14:23:10 +0530
Subject: [PATCH v7 5/6] Tests for parallel copy.
This patch has the tests for parallel copy:
---
contrib/postgres_fdw/expected/postgres_fdw.out | 49 ++++
contrib/postgres_fdw/sql/postgres_fdw.sql | 52 ++++
src/test/regress/expected/copy2.out | 326 +++++++++++++++++++++-
src/test/regress/input/copy.source | 31 +++
src/test/regress/output/copy.source | 27 ++
src/test/regress/sql/copy2.sql | 368 ++++++++++++++++++++++++-
6 files changed, 845 insertions(+), 8 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2d88d06..474c5e7 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9033,5 +9033,54 @@ SELECT 1 FROM ft1 LIMIT 1; -- should fail
ERROR: 08006
\set VERBOSITY default
COMMIT;
+-- parallel copy related tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+CREATE FOREIGN TABLE test_parallel_copy_ft (
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+) SERVER loopback OPTIONS (table_name 'test_parallel_copy') ;
+-- parallel copy into foreign table, parallelism must not be picked up.
+COPY test_parallel_copy_ft FROM stdin WITH (PARALLEL 1);
+SELECT count(*) FROM test_parallel_copy_ft;
+ count
+-------
+ 2
+(1 row)
+
+-- parallel copy into a table with foreign partition.
+CREATE TABLE part_test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+) PARTITION BY LIST (b);
+CREATE FOREIGN TABLE part_test_parallel_copy_a1 (c TEXT, b INT, a INT, e TEXT, d TEXT) SERVER loopback;
+CREATE TABLE part_test_parallel_copy_a2 (a INT, c TEXT, b INT, d TEXT, e TEXT);
+ALTER TABLE part_test_parallel_copy ATTACH PARTITION part_test_parallel_copy_a1 FOR VALUES IN(1);
+ALTER TABLE part_test_parallel_copy ATTACH PARTITION part_test_parallel_copy_a2 FOR VALUES IN(2);
+COPY part_test_parallel_copy FROM stdin WITH (PARALLEL 1);
+ERROR: cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is a foreign partition
+HINT: Try COPY without PARALLEL option
+CONTEXT: COPY part_test_parallel_copy, line 1: "1 1 test_c1 test_d1 test_e1"
+parallel worker
+SELECT count(*) FROM part_test_parallel_copy WHERE b = 2;
+ count
+-------
+ 0
+(1 row)
+
-- Clean up
DROP PROCEDURE terminate_backend_and_wait(text);
+DROP FOREIGN TABLE test_parallel_copy_ft;
+DROP TABLE test_parallel_copy;
+DROP TABLE part_test_parallel_copy;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 7581c54..635fcc2 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2695,5 +2695,57 @@ SELECT 1 FROM ft1 LIMIT 1; -- should fail
\set VERBOSITY default
COMMIT;
+-- parallel copy related tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+CREATE FOREIGN TABLE test_parallel_copy_ft (
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+) SERVER loopback OPTIONS (table_name 'test_parallel_copy') ;
+
+-- parallel copy into foreign table, parallelism must not be picked up.
+COPY test_parallel_copy_ft FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+SELECT count(*) FROM test_parallel_copy_ft;
+
+-- parallel copy into a table with foreign partition.
+CREATE TABLE part_test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+) PARTITION BY LIST (b);
+
+CREATE FOREIGN TABLE part_test_parallel_copy_a1 (c TEXT, b INT, a INT, e TEXT, d TEXT) SERVER loopback;
+
+CREATE TABLE part_test_parallel_copy_a2 (a INT, c TEXT, b INT, d TEXT, e TEXT);
+
+ALTER TABLE part_test_parallel_copy ATTACH PARTITION part_test_parallel_copy_a1 FOR VALUES IN(1);
+
+ALTER TABLE part_test_parallel_copy ATTACH PARTITION part_test_parallel_copy_a2 FOR VALUES IN(2);
+
+COPY part_test_parallel_copy FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+SELECT count(*) FROM part_test_parallel_copy WHERE b = 2;
+
-- Clean up
DROP PROCEDURE terminate_backend_and_wait(text);
+DROP FOREIGN TABLE test_parallel_copy_ft;
+DROP TABLE test_parallel_copy;
+DROP TABLE part_test_parallel_copy;
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index c64f071..08ce743 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -301,18 +301,32 @@ It is "perfect".|
"It is ""perfect""."," "
"",
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+SELECT * FROM testnl;
+ a | b | c
+---+----------------------+---
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+ 1 | a field with two LFs+| 2
+ | +|
+ | inside |
+(2 rows)
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
COPY testeoc TO stdout CSV;
a\.
\.b
c\.d
"\."
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
1 \\0
@@ -327,6 +341,15 @@ SELECT * FROM testnull;
|
(4 rows)
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+SELECT * FROM testnull;
+ a | b
+----+----
+ 42 | \0
+ |
+(2 rows)
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -396,6 +419,34 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ a2
+ b
+(2 rows)
+
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+COMMIT;
+SELECT * FROM vistest;
+ a
+----
+ d2
+ e
+(2 rows)
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
SELECT * FROM vistest;
a
@@ -456,7 +507,7 @@ SELECT * FROM vistest;
(2 rows)
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -467,6 +518,8 @@ CREATE TEMP TABLE forcetest (
-- should succeed with no effect ("b" remains an empty string, "c" remains NULL)
BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
b | c
@@ -477,6 +530,8 @@ SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
c | d
@@ -533,6 +588,31 @@ select * from check_con_tbl;
(2 rows)
+\d+ check_con_tbl
+ Table "public.check_con_tbl"
+ Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
+--------+---------+-----------+----------+---------+---------+--------------+-------------
+ f1 | integer | | | | plain | |
+Check constraints:
+ "check_con_tbl_check" CHECK (check_con_function(check_con_tbl.*))
+
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":1}
+NOTICE: input = {"f1":null}
+copy check_con_tbl from stdin with (parallel 1);
+NOTICE: input = {"f1":0}
+ERROR: new row for relation "check_con_tbl" violates check constraint "check_con_tbl_check"
+DETAIL: Failing row contains (0).
+CONTEXT: COPY check_con_tbl, line 1: "0"
+parallel worker
+select * from check_con_tbl;
+ f1
+----
+ 1
+
+(2 rows)
+
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
CREATE ROLE regress_rls_copy_user_colperms;
@@ -647,10 +727,248 @@ SELECT * FROM instead_of_insert_tbl;
(2 rows)
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+COPY test_parallel_copy (a, b, c, d, e) FROM stdin WITH (PARALLEL 1);
+COPY test_parallel_copy (b, d) FROM stdin WITH (PARALLEL 1);
+COPY test_parallel_copy (b, d) FROM stdin WITH (PARALLEL 1);
+COPY test_parallel_copy (b, d) FROM stdin WITH (PARALLEL 1, FORMAT 'csv', HEADER);
+-- zero workers: should fail, value out of bounds
+COPY test_parallel_copy (b, d) FROM stdin WITH (PARALLEL '0');
+ERROR: value 0 out of bounds for option "parallel"
+DETAIL: Valid values are between "1" and "1024".
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri FROM stdin WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+COPY temp_test (a) FROM stdin WITH (PARALLEL 1);
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) FROM stdin WITH (PARALLEL 1);
+ERROR: column "xyz" of relation "test_parallel_copy" does not exist
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) FROM stdin WITH (PARALLEL 1);
+ERROR: column "d" specified more than once
+-- missing data: should fail
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+ERROR: invalid input syntax for type integer: ""
+CONTEXT: COPY test_parallel_copy, line 0, column a: ""
+parallel worker
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2000 230 23 23"
+parallel worker
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+ERROR: missing data for column "e"
+CONTEXT: COPY test_parallel_copy, line 1: "2001 231 \N \N"
+parallel worker
+-- extra data: should fail
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+ERROR: extra data after last expected column
+CONTEXT: COPY test_parallel_copy, line 1: "2002 232 40 50 60 70 80"
+parallel worker
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) FROM stdin WITH (DELIMITER ',', null 'x', PARALLEL 1) ;
+COPY test_parallel_copy FROM stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+COPY test_parallel_copy FROM stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a = 50004;
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a > 60003;
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE f > 60003;
+ERROR: column "f" does not exist
+LINE 1: ..._parallel_copy FROM stdin WITH (PARALLEL 1) WHERE f > 60003;
+ ^
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ERROR: missing FROM-clause entry for table "x"
+LINE 1: ...rallel_copy FROM stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+ ^
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+ERROR: cannot use subquery in COPY FROM WHERE condition
+LINE 1: ...arallel_copy FROM stdin WITH (PARALLEL 1) WHERE a IN (SELECT...
+ ^
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+ERROR: set-returning functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...lel_copy FROM stdin WITH (PARALLEL 1) WHERE a IN (generate_s...
+ ^
+COPY test_parallel_copy FROM stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+ERROR: window functions are not allowed in COPY FROM WHERE conditions
+LINE 1: ...rallel_copy FROM stdin WITH(PARALLEL 1) WHERE a = row_number...
+ ^
+-- check results of copy in
+SELECT * FROM test_parallel_copy ORDER BY 1;
+ a | b | c | d | e
+-------+----+------------+---------+---------
+ 1 | 11 | test_c1 | test_d1 | test_e1
+ 2 | 12 | test_c2 | test_d2 | test_e2
+ 3000 | | c | |
+ 4000 | | C | |
+ 4001 | 1 | empty | |
+ 4002 | 2 | null | |
+ 4003 | 3 | Backslash | \ | \
+ 4004 | 4 | BackslashX | \X | \X
+ 4005 | 5 | N | N | N
+ 4006 | 6 | BackslashN | \N | \N
+ 4007 | 7 | XX | XX | XX
+ 4008 | 8 | Delimiter | : | :
+ 50004 | 25 | 35 | 45 | 55
+ 60004 | 25 | 35 | 45 | 55
+ 60005 | 26 | 36 | 46 | 56
+ | 3 | stuff | test_d3 |
+ | 4 | stuff | test_d4 |
+ | 5 | stuff | test_d5 |
+ | | , | \, | \
+ | | x | \x | \x
+ | | 45 | 80 | 90
+(21 rows)
+
+-- parallel copy test for unlogged tables. should execute in parallel worker
+CREATE UNLOGGED TABLE test_parallel_copy_unlogged(
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+);
+COPY test_parallel_copy_unlogged FROM stdin WITH (PARALLEL 1);
+SELECT count(*) FROM test_parallel_copy_unlogged;
+ count
+-------
+ 2
+(1 row)
+
+-- parallel copy test for various trigger types
+TRUNCATE test_parallel_copy;
+-- parallel safe trigger function
+CREATE OR REPLACE FUNCTION parallel_copy_trig_func() RETURNS TRIGGER
+LANGUAGE plpgsql PARALLEL SAFE AS $$
+BEGIN
+ RETURN NEW;
+END;
+$$;
+-- before insert row trigger
+CREATE TRIGGER parallel_copy_trig
+BEFORE INSERT ON test_parallel_copy
+FOR EACH ROW
+EXECUTE PROCEDURE parallel_copy_trig_func();
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+-- after insert row trigger
+CREATE TRIGGER parallel_copy_trig
+AFTER INSERT ON test_parallel_copy
+FOR EACH ROW
+EXECUTE PROCEDURE parallel_copy_trig_func();
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+-- before insert statement trigger
+CREATE TRIGGER parallel_copy_trig
+BEFORE INSERT ON test_parallel_copy
+FOR EACH STATEMENT
+EXECUTE PROCEDURE parallel_copy_trig_func();
+-- parallelism should be picked, since the trigger function is parallel safe
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+-- after insert statement trigger
+CREATE TRIGGER parallel_copy_trig
+AFTER INSERT ON test_parallel_copy
+FOR EACH STATEMENT
+EXECUTE PROCEDURE parallel_copy_trig_func();
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+-- transition table is involved
+CREATE TRIGGER parallel_copy_trig
+AFTER INSERT ON test_parallel_copy REFERENCING NEW TABLE AS new_table
+FOR EACH STATEMENT
+EXECUTE PROCEDURE parallel_copy_trig_func();
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+-- make trigger function parallel unsafe
+ALTER FUNCTION parallel_copy_trig_func PARALLEL UNSAFE;
+-- before statement trigger has a parallel unsafe function
+CREATE TRIGGER parallel_copy_trig
+BEFORE INSERT ON test_parallel_copy
+FOR EACH STATEMENT
+EXECUTE PROCEDURE parallel_copy_trig_func();
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+SELECT count(*) FROM test_parallel_copy;
+ count
+-------
+ 12
+(1 row)
+
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+-- instead of insert trigger, no parallelism should be picked
+COPY instead_of_insert_tbl_view FROM stdin WITH (PARALLEL 1);
+SELECT count(*) FROM instead_of_insert_tbl_view;
+ count
+-------
+ 1
+(1 row)
+
+-- parallel copy test for a partitioned table with a before insert trigger on
+-- one of the partitions
+ALTER FUNCTION parallel_copy_trig_func PARALLEL SAFE;
+CREATE TABLE test_parallel_copy_part (
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+) PARTITION BY LIST (b);
+CREATE TABLE test_parallel_copy_part_a1 (c TEXT, b INT, a INT, e TEXT, d TEXT);
+CREATE TABLE test_parallel_copy_part_a2 (a INT, c TEXT, b INT, d TEXT, e TEXT);
+ALTER TABLE test_parallel_copy_part ATTACH PARTITION test_parallel_copy_part_a1 FOR VALUES IN(1);
+ALTER TABLE test_parallel_copy_part ATTACH PARTITION test_parallel_copy_part_a2 FOR VALUES IN(2);
+CREATE TRIGGER parallel_copy_trig
+BEFORE INSERT ON test_parallel_copy_part_a2
+FOR EACH ROW
+EXECUTE PROCEDURE parallel_copy_trig_func();
+COPY test_parallel_copy_part FROM stdin WITH (PARALLEL 1);
+ERROR: cannot perform PARALLEL COPY if partition has BEFORE/INSTEAD OF triggers, or if the partition is foreign partition
+HINT: Try COPY without PARALLEL option
+CONTEXT: COPY test_parallel_copy_part, line 2: "2 2 test_c2 test_d2 test_e2"
+parallel worker
+SELECT count(*) FROM test_parallel_copy_part;
+ count
+-------
+ 0
+(1 row)
+
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy_part_a2;
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE test_parallel_copy_unlogged;
+DROP TABLE test_parallel_copy_part;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
+DROP FUNCTION parallel_copy_trig_func();
DROP TABLE x, y;
DROP TABLE rls_t1 CASCADE;
DROP ROLE regress_rls_copy_user;
diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index a1d529a..cb39c66 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -15,6 +15,13 @@ DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+
+SELECT COUNT(*) FROM tenk1;
+
+TRUNCATE tenk1;
+
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
@@ -159,6 +166,30 @@ truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+
+-- Test parallel copy from for toast table with csv and binary format file
+create table test_parallel_copy_toast(a1 int, b1 text, c1 text default null);
+
+insert into test_parallel_copy_toast select i, (repeat(md5(i::text), 100000)) FROM generate_series(1,2) AS i;
+
+copy test_parallel_copy_toast to '@abs_builddir@/results/parallelcopytoast.csv' with(format csv);
+
+copy test_parallel_copy_toast to '@abs_builddir@/results/parallelcopytoast.dat' with(format binary);
+
+truncate test_parallel_copy_toast;
+
+copy test_parallel_copy_toast from '@abs_builddir@/results/parallelcopytoast.csv' with(format csv, parallel 2);
+
+copy test_parallel_copy_toast from '@abs_builddir@/results/parallelcopytoast.dat' with(format binary, parallel 2);
+
+select count(*) from test_parallel_copy_toast;
+
+drop table test_parallel_copy_toast;
+
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 938d355..a5dca79 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -9,6 +9,15 @@ COPY onek FROM '@abs_srcdir@/data/onek.data';
COPY onek TO '@abs_builddir@/results/onek.data';
DELETE FROM onek;
COPY onek FROM '@abs_builddir@/results/onek.data';
+-- test parallel copy
+COPY tenk1 FROM '@abs_srcdir@/data/tenk.data' WITH (parallel 2);
+SELECT COUNT(*) FROM tenk1;
+ count
+-------
+ 10000
+(1 row)
+
+TRUNCATE tenk1;
COPY tenk1 FROM '@abs_srcdir@/data/tenk.data';
COPY slow_emp4000 FROM '@abs_srcdir@/data/rect.data';
COPY person FROM '@abs_srcdir@/data/person.data';
@@ -113,6 +122,24 @@ insert into parted_copytest select x,1,'One' from generate_series(1011,1020) x;
copy (select * from parted_copytest order by a) to '@abs_builddir@/results/parted_copytest.csv';
truncate parted_copytest;
copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv';
+-- Test parallel copy from with a partitioned table.
+truncate parted_copytest;
+copy parted_copytest from '@abs_builddir@/results/parted_copytest.csv' with (parallel 2);
+-- Test parallel copy from for toast table with csv and binary format file
+create table test_parallel_copy_toast(a1 int, b1 text, c1 text default null);
+insert into test_parallel_copy_toast select i, (repeat(md5(i::text), 100000)) FROM generate_series(1,2) AS i;
+copy test_parallel_copy_toast to '@abs_builddir@/results/parallelcopytoast.csv' with(format csv);
+copy test_parallel_copy_toast to '@abs_builddir@/results/parallelcopytoast.dat' with(format binary);
+truncate test_parallel_copy_toast;
+copy test_parallel_copy_toast from '@abs_builddir@/results/parallelcopytoast.csv' with(format csv, parallel 2);
+copy test_parallel_copy_toast from '@abs_builddir@/results/parallelcopytoast.dat' with(format binary, parallel 2);
+select count(*) from test_parallel_copy_toast;
+ count
+-------
+ 4
+(1 row)
+
+drop table test_parallel_copy_toast;
-- Ensure COPY FREEZE errors for partitioned tables.
begin;
truncate parted_copytest;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index b3c16af..b3c9af3 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -171,7 +171,7 @@ COPY y TO stdout (FORMAT CSV, FORCE_QUOTE *);
--test that we read consecutive LFs properly
-CREATE TEMP TABLE testnl (a int, b text, c int);
+CREATE TABLE testnl (a int, b text, c int);
COPY testnl FROM stdin CSV;
1,"a field with two LFs
@@ -179,8 +179,16 @@ COPY testnl FROM stdin CSV;
inside",2
\.
+COPY testnl FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+1,"a field with two LFs
+
+inside",2
+\.
+
+SELECT * FROM testnl;
+
-- test end of copy marker
-CREATE TEMP TABLE testeoc (a text);
+CREATE TABLE testeoc (a text);
COPY testeoc FROM stdin CSV;
a\.
@@ -189,11 +197,19 @@ c\.d
"\."
\.
+TRUNCATE testeoc;
+COPY testeoc FROM stdin WITH (FORMAT 'csv', PARALLEL 1);
+a\.
+\.b
+c\.d
+"\."
+\.
+
COPY testeoc TO stdout CSV;
-- test handling of nonstandard null marker that violates escaping rules
-CREATE TEMP TABLE testnull(a int, b text);
+CREATE TABLE testnull(a int, b text);
INSERT INTO testnull VALUES (1, E'\\0'), (NULL, NULL);
COPY testnull TO stdout WITH NULL AS E'\\0';
@@ -205,6 +221,14 @@ COPY testnull FROM stdin WITH NULL AS E'\\0';
SELECT * FROM testnull;
+TRUNCATE testnull;
+COPY testnull FROM stdin WITH (NULL E'\\0', PARALLEL 1);
+42 \\0
+\0 \0
+\.
+
+SELECT * FROM testnull;
+
BEGIN;
CREATE TABLE vistest (LIKE testeoc);
COPY vistest FROM stdin CSV;
@@ -249,6 +273,23 @@ SELECT * FROM vistest;
BEGIN;
TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+a2
+b
+\.
+SELECT * FROM vistest;
+SAVEPOINT s1;
+TRUNCATE vistest;
+COPY vistest FROM stdin WITH (FORMAT 'csv', FREEZE, PARALLEL 1);
+d2
+e
+\.
+SELECT * FROM vistest;
+COMMIT;
+SELECT * FROM vistest;
+
+BEGIN;
+TRUNCATE vistest;
COPY vistest FROM stdin CSV FREEZE;
x
y
@@ -298,7 +339,7 @@ SELECT * FROM vistest;
COMMIT;
SELECT * FROM vistest;
-- Test FORCE_NOT_NULL and FORCE_NULL options
-CREATE TEMP TABLE forcetest (
+CREATE TABLE forcetest (
a INT NOT NULL,
b TEXT NOT NULL,
c TEXT,
@@ -311,6 +352,10 @@ BEGIN;
COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c));
1,,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(b), FORCE_NULL(c), PARALLEL 1);
+1,,""
+\.
COMMIT;
SELECT b, c FROM forcetest WHERE a = 1;
-- should succeed, FORCE_NULL and FORCE_NOT_NULL can be both specified
@@ -318,6 +363,10 @@ BEGIN;
COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d));
2,'a',,""
\.
+TRUNCATE forcetest;
+COPY forcetest (a, b, c, d) FROM STDIN WITH (FORMAT csv, FORCE_NOT_NULL(c,d), FORCE_NULL(c,d), PARALLEL 1);
+2,'a',,""
+\.
COMMIT;
SELECT c, d FROM forcetest WHERE a = 2;
-- should fail with not-null constraint violation
@@ -353,6 +402,16 @@ copy check_con_tbl from stdin;
0
\.
select * from check_con_tbl;
+\d+ check_con_tbl
+truncate check_con_tbl;
+copy check_con_tbl from stdin with (parallel 1);
+1
+\N
+\.
+copy check_con_tbl from stdin with (parallel 1);
+0
+\.
+select * from check_con_tbl;
-- test with RLS enabled.
CREATE ROLE regress_rls_copy_user;
@@ -454,10 +513,311 @@ test1
SELECT * FROM instead_of_insert_tbl;
COMMIT;
+-- Parallel copy tests.
+CREATE TABLE test_parallel_copy (
+ a INT,
+ b INT,
+ c TEXT not null default 'stuff',
+ d TEXT,
+ e TEXT
+) ;
+
+COPY test_parallel_copy (a, b, c, d, e) FROM stdin WITH (PARALLEL 1);
+1 11 test_c1 test_d1 test_e1
+2 12 test_c2 test_d2 test_e2
+\.
+
+COPY test_parallel_copy (b, d) FROM stdin WITH (PARALLEL 1);
+3 test_d3
+\.
+
+COPY test_parallel_copy (b, d) FROM stdin WITH (PARALLEL 1);
+4 test_d4
+5 test_d5
+\.
+
+COPY test_parallel_copy (b, d) FROM stdin WITH (PARALLEL 1, FORMAT 'csv', HEADER);
+b d
+\.
+
+-- zero workers: should fail, value out of bounds
+COPY test_parallel_copy (b, d) FROM stdin WITH (PARALLEL '0');
+
+-- referencing table: should perform non-parallel copy
+CREATE TABLE test_copy_pk(c1 INT PRIMARY KEY);
+INSERT INTO test_copy_pk VALUES(10);
+CREATE TABLE test_copy_ri(c1 INT REFERENCES test_copy_pk(c1));
+COPY test_copy_ri FROM stdin WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+10
+\.
+
+-- expressions: should perform non-parallel copy
+CREATE TABLE test_copy_expr (index INT, height REAL, weight REAL);
+
+COPY test_copy_expr FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1) WHERE height > random() * 65;
+60,60,60
+\.
+
+-- serial data: should perform non-parallel copy
+CREATE TABLE testserial (index SERIAL, height REAL);
+
+COPY testserial(height) FROM STDIN WITH (FORMAT csv, DELIMITER ',', PARALLEL 1);
+60
+\.
+
+-- temporary table copy: should perform non-parallel copy
+CREATE TEMPORARY TABLE temp_test(
+ a int
+) ;
+
+COPY temp_test (a) FROM stdin WITH (PARALLEL 1);
+10
+\.
+
+-- non-existent column in column list: should fail
+COPY test_parallel_copy (xyz) FROM stdin WITH (PARALLEL 1);
+
+-- too many columns in column list: should fail
+COPY test_parallel_copy (a, b, c, d, e, d, c) FROM stdin WITH (PARALLEL 1);
+
+-- missing data: should fail
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+
+\.
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+2000 230 23 23
+\.
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+2001 231 \N \N
+\.
+
+-- extra data: should fail
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+2002 232 40 50 60 70 80
+\.
+
+-- various COPY options: delimiters, oids, NULL string, encoding
+COPY test_parallel_copy (b, c, d, e) FROM stdin WITH (DELIMITER ',', null 'x', PARALLEL 1) ;
+x,45,80,90
+x,\x,\\x,\\\x
+x,\,,\\\,,\\
+\.
+
+COPY test_parallel_copy FROM stdin WITH (DELIMITER ';', NULL '', PARALLEL 1);
+3000;;c;;
+\.
+
+COPY test_parallel_copy FROM stdin WITH (DELIMITER ':', NULL E'\\X', ENCODING 'sql_ascii', PARALLEL 1);
+4000:\X:C:\X:\X
+4001:1:empty::
+4002:2:null:\X:\X
+4003:3:Backslash:\\:\\
+4004:4:BackslashX:\\X:\\X
+4005:5:N:\N:\N
+4006:6:BackslashN:\\N:\\N
+4007:7:XX:\XX:\XX
+4008:8:Delimiter:\::\:
+\.
+
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a = 50004;
+50003 24 34 44 54
+50004 25 35 45 55
+50005 26 36 46 56
+\.
+
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a > 60003;
+60001 22 32 42 52
+60002 23 33 43 53
+60003 24 34 44 54
+60004 25 35 45 55
+60005 26 36 46 56
+\.
+
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE f > 60003;
+
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a = max(x.b);
+
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a IN (SELECT 1 FROM x);
+
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1) WHERE a IN (generate_series(1,5));
+
+COPY test_parallel_copy FROM stdin WITH(PARALLEL 1) WHERE a = row_number() over(b);
+
+-- check results of copy in
+SELECT * FROM test_parallel_copy ORDER BY 1;
+
+-- parallel copy test for unlogged tables. should execute in parallel worker
+CREATE UNLOGGED TABLE test_parallel_copy_unlogged(
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+);
+
+COPY test_parallel_copy_unlogged FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+SELECT count(*) FROM test_parallel_copy_unlogged;
+
+-- parallel copy test for various trigger types
+TRUNCATE test_parallel_copy;
+
+-- parallel safe trigger function
+CREATE OR REPLACE FUNCTION parallel_copy_trig_func() RETURNS TRIGGER
+LANGUAGE plpgsql PARALLEL SAFE AS $$
+BEGIN
+ RETURN NEW;
+END;
+$$;
+
+-- before insert row trigger
+CREATE TRIGGER parallel_copy_trig
+BEFORE INSERT ON test_parallel_copy
+FOR EACH ROW
+EXECUTE PROCEDURE parallel_copy_trig_func();
+
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+
+-- after insert row trigger
+CREATE TRIGGER parallel_copy_trig
+AFTER INSERT ON test_parallel_copy
+FOR EACH ROW
+EXECUTE PROCEDURE parallel_copy_trig_func();
+
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+
+-- before insert statement trigger
+CREATE TRIGGER parallel_copy_trig
+BEFORE INSERT ON test_parallel_copy
+FOR EACH STATEMENT
+EXECUTE PROCEDURE parallel_copy_trig_func();
+
+-- parallelism should be picked, since the trigger function is parallel safe
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+
+-- after insert statement trigger
+CREATE TRIGGER parallel_copy_trig
+AFTER INSERT ON test_parallel_copy
+FOR EACH STATEMENT
+EXECUTE PROCEDURE parallel_copy_trig_func();
+
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+
+-- transition table is involved
+CREATE TRIGGER parallel_copy_trig
+AFTER INSERT ON test_parallel_copy REFERENCING NEW TABLE AS new_table
+FOR EACH STATEMENT
+EXECUTE PROCEDURE parallel_copy_trig_func();
+
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+
+-- make trigger function parallel unsafe
+ALTER FUNCTION parallel_copy_trig_func PARALLEL UNSAFE;
+
+-- before statement trigger has a parallel unsafe function
+CREATE TRIGGER parallel_copy_trig
+BEFORE INSERT ON test_parallel_copy
+FOR EACH STATEMENT
+EXECUTE PROCEDURE parallel_copy_trig_func();
+
+-- no parallelism should be picked
+COPY test_parallel_copy FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+SELECT count(*) FROM test_parallel_copy;
+
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy;
+
+-- instead of insert trigger, no parallelism should be picked
+COPY instead_of_insert_tbl_view FROM stdin WITH (PARALLEL 1);
+test1
+\.
+
+SELECT count(*) FROM instead_of_insert_tbl_view;
+
+-- parallel copy test for a partitioned table with a before insert trigger on
+-- one of the partitions
+ALTER FUNCTION parallel_copy_trig_func PARALLEL SAFE;
+
+CREATE TABLE test_parallel_copy_part (
+ a INT,
+ b INT,
+ c TEXT default 'stuff',
+ d TEXT,
+ e TEXT
+) PARTITION BY LIST (b);
+
+CREATE TABLE test_parallel_copy_part_a1 (c TEXT, b INT, a INT, e TEXT, d TEXT);
+
+CREATE TABLE test_parallel_copy_part_a2 (a INT, c TEXT, b INT, d TEXT, e TEXT);
+
+ALTER TABLE test_parallel_copy_part ATTACH PARTITION test_parallel_copy_part_a1 FOR VALUES IN(1);
+
+ALTER TABLE test_parallel_copy_part ATTACH PARTITION test_parallel_copy_part_a2 FOR VALUES IN(2);
+
+CREATE TRIGGER parallel_copy_trig
+BEFORE INSERT ON test_parallel_copy_part_a2
+FOR EACH ROW
+EXECUTE PROCEDURE parallel_copy_trig_func();
+
+COPY test_parallel_copy_part FROM stdin WITH (PARALLEL 1);
+1 1 test_c1 test_d1 test_e1
+2 2 test_c2 test_d2 test_e2
+\.
+
+SELECT count(*) FROM test_parallel_copy_part;
+
+DROP TRIGGER parallel_copy_trig ON test_parallel_copy_part_a2;
+
-- clean up
+DROP TABLE test_copy_ri;
+DROP TABLE test_copy_pk;
+DROP TABLE test_copy_expr;
+DROP TABLE testeoc;
+DROP TABLE testnl;
DROP TABLE forcetest;
+DROP TABLE test_parallel_copy;
+DROP TABLE test_parallel_copy_unlogged;
+DROP TABLE test_parallel_copy_part;
+DROP TABLE testserial;
+DROP TABLE testnull;
DROP TABLE vistest;
DROP FUNCTION truncate_in_subxact();
+DROP FUNCTION parallel_copy_trig_func();
DROP TABLE x, y;
DROP TABLE rls_t1 CASCADE;
DROP ROLE regress_rls_copy_user;
--
1.8.3.1
Attachment: v7-0006-Parallel-Copy-For-Binary-Format-Files.patch (text/x-patch)
From f2f82a6018937a594eea9b3f3d9b9c23be9b4b66 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 14 Oct 2020 12:49:48 +0530
Subject: [PATCH v7 6/6] Parallel Copy For Binary Format Files
The leader reads data from the file into DSM data blocks, each 64K in
size. It also identifies each tuple's data block id, start offset, end
offset, and tuple size, and records this information in the ring data
structure. Workers read the tuple information from the ring data
structure and the actual tuple data from the data blocks in parallel,
and insert the tuples into the table in parallel.
---
src/backend/commands/copy.c | 126 ++++++-------
src/backend/commands/copyparallel.c | 367 ++++++++++++++++++++++++++++++++++--
src/include/commands/copy.h | 126 +++++++++++++
3 files changed, 531 insertions(+), 88 deletions(-)
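
To illustrate the scheme in the commit message above, here is a rough
sketch of the worker side (illustration only, not code from this patch;
ClaimNextPopulatedEntry, ReadTupleData and ProcessBinaryTuple are
hypothetical helpers standing in for the patch's actual routines):

static void
BinaryWorkerLoop(CopyState cstate, ParallelCopyShmInfo *shared)
{
    StringInfoData tuple_buf;

    initStringInfo(&tuple_buf);

    for (;;)
    {
        /* Claim the next ring entry the leader marked as populated. */
        ParallelCopyLineBoundary *entry = ClaimNextPopulatedEntry(shared);

        if (entry == NULL)
            break;              /* leader finished and ring is drained */

        /* Gather the tuple bytes, which may span consecutive 64K blocks. */
        ReadTupleData(shared, entry, &tuple_buf);

        /* Parse attributes and insert, as in the serial binary path. */
        ProcessBinaryTuple(cstate, &tuple_buf);
    }
}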
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9a026be..44e0aa4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -222,19 +222,14 @@ static void CopySendData(CopyState cstate, const void *databuf, int datasize);
static void CopySendString(CopyState cstate, const char *str);
static void CopySendChar(CopyState cstate, char c);
static void CopySendEndOfRow(CopyState cstate);
-static int CopyGetData(CopyState cstate, void *databuf,
- int minread, int maxread);
static void CopySendInt32(CopyState cstate, int32 val);
static bool CopyGetInt32(CopyState cstate, int32 *val);
static void CopySendInt16(CopyState cstate, int16 val);
static bool CopyGetInt16(CopyState cstate, int16 *val);
static bool CopyLoadRawBuf(CopyState cstate);
-static int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
-
static void ClearEOLFromCopiedData(CopyState cstate, char *copy_line_data,
int copy_line_pos, int *copy_line_size);
-
/*
* Send copy start/stop messages for frontend copies. These have changed
* in past protocol redesigns.
@@ -448,7 +443,7 @@ CopySendEndOfRow(CopyState cstate)
*
* NB: no data conversion is applied here.
*/
-static int
+int
CopyGetData(CopyState cstate, void *databuf, int minread, int maxread)
{
int bytesread = 0;
@@ -581,10 +576,25 @@ CopyGetInt32(CopyState cstate, int32 *val)
{
uint32 buf;
- if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ /*
+ * For parallel copy, avoid reading data to raw buf, read directly from
+ * file, later the data will be read to parallel copy data buffers.
+ */
+ if (cstate->nworkers > 0)
{
- *val = 0; /* suppress compiler warning */
- return false;
+ if (CopyGetData(cstate, &buf, sizeof(buf), sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
+ }
+ else
+ {
+ if (CopyReadBinaryData(cstate, (char *) &buf, sizeof(buf)) != sizeof(buf))
+ {
+ *val = 0; /* suppress compiler warning */
+ return false;
+ }
}
*val = (int32) pg_ntoh32(buf);
return true;
@@ -658,7 +668,7 @@ CopyLoadRawBuf(CopyState cstate)
* and writes them to 'dest'. Returns the number of bytes read (which
* would be less than 'nbytes' only if we reach EOF).
*/
-static int
+int
CopyReadBinaryData(CopyState cstate, char *dest, int nbytes)
{
int copied_bytes = 0;
@@ -3563,7 +3573,7 @@ BeginCopyFrom(ParseState *pstate,
int32 tmp;
/* Signature */
- if (CopyReadBinaryData(cstate, readSig, 11) != 11 ||
+ if (CopyGetData(cstate, readSig, 11, 11) != 11 ||
memcmp(readSig, BinarySignature, 11) != 0)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
@@ -3591,7 +3601,7 @@ BeginCopyFrom(ParseState *pstate,
/* Skip extension header, if present */
while (tmp-- > 0)
{
- if (CopyReadBinaryData(cstate, readSig, 1) != 1)
+ if (CopyGetData(cstate, readSig, 1, 1) != 1)
ereport(ERROR,
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
errmsg("invalid COPY file header (wrong length)")));
@@ -3788,60 +3798,45 @@ NextCopyFrom(CopyState cstate, ExprContext *econtext,
else
{
/* binary */
- int16 fld_count;
- ListCell *cur;
-
cstate->cur_lineno++;
+ cstate->max_fields = list_length(cstate->attnumlist);
- if (!CopyGetInt16(cstate, &fld_count))
+ if (!IsParallelCopy())
{
- /* EOF detected (end of file, or protocol-level EOF) */
- return false;
- }
-
- if (fld_count == -1)
- {
- /*
- * Received EOF marker. In a V3-protocol copy, wait for the
- * protocol-level EOF, and complain if it doesn't come
- * immediately. This ensures that we correctly handle CopyFail,
- * if client chooses to send that now.
- *
- * Note that we MUST NOT try to read more data in an old-protocol
- * copy, since there is no protocol-level EOF marker then. We
- * could go either way for copy from file, but choose to throw
- * error if there's data after the EOF marker, for consistency
- * with the new-protocol case.
- */
- char dummy;
+ int16 fld_count;
+ ListCell *cur;
- if (cstate->copy_dest != COPY_OLD_FE &&
- CopyReadBinaryData(cstate, &dummy, 1) > 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("received copy data after EOF marker")));
- return false;
- }
+ if (!CopyGetInt16(cstate, &fld_count))
+ {
+ /* EOF detected (end of file, or protocol-level EOF) */
+ return false;
+ }
- if (fld_count != attr_count)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("row field count is %d, expected %d",
- (int) fld_count, attr_count)));
+ CHECK_FIELD_COUNT;
- foreach(cur, cstate->attnumlist)
+ foreach(cur, cstate->attnumlist)
+ {
+ int attnum = lfirst_int(cur);
+ int m = attnum - 1;
+ Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+ values[m] = CopyReadBinaryAttribute(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+ }
+ else
{
- int attnum = lfirst_int(cur);
- int m = attnum - 1;
- Form_pg_attribute att = TupleDescAttr(tupDesc, m);
+ bool eof = false;
- cstate->cur_attname = NameStr(att->attname);
- values[m] = CopyReadBinaryAttribute(cstate,
- &in_functions[m],
- typioparams[m],
- att->atttypmod,
- &nulls[m]);
- cstate->cur_attname = NULL;
+ eof = CopyReadBinaryTupleWorker(cstate, values, nulls);
+
+ if (eof)
+ return false;
}
}
@@ -4852,18 +4847,15 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
Datum result;
if (!CopyGetInt32(cstate, &fld_size))
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
+
if (fld_size == -1)
{
*isnull = true;
return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
}
- if (fld_size < 0)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("invalid field size")));
+
+ CHECK_FIELD_SIZE(fld_size);
/* reset attribute_buf to empty, and load raw data in it */
resetStringInfo(&cstate->attribute_buf);
@@ -4871,9 +4863,7 @@ CopyReadBinaryAttribute(CopyState cstate, FmgrInfo *flinfo,
enlargeStringInfo(&cstate->attribute_buf, fld_size);
if (CopyReadBinaryData(cstate, cstate->attribute_buf.data,
fld_size) != fld_size)
- ereport(ERROR,
- (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
- errmsg("unexpected EOF in COPY data")));
+ EOF_ERROR;
cstate->attribute_buf.len = fld_size;
cstate->attribute_buf.data[fld_size] = '\0';
diff --git a/src/backend/commands/copyparallel.c b/src/backend/commands/copyparallel.c
index 9cae112..3e17e82 100644
--- a/src/backend/commands/copyparallel.c
+++ b/src/backend/commands/copyparallel.c
@@ -100,6 +100,7 @@ SerializeParallelCopyState(ParallelContext *pcxt, CopyState cstate,
shared_cstate.convert_selectively = cstate->convert_selectively;
shared_cstate.num_defaults = cstate->num_defaults;
shared_cstate.relid = cstate->pcdata->relid;
+ shared_cstate.binary = cstate->binary;
memcpy(shmptr, (char *) &shared_cstate, sizeof(SerializedParallelCopyState));
copiedsize = sizeof(SerializedParallelCopyState);
@@ -204,6 +205,7 @@ RestoreParallelCopyState(shm_toc *toc, CopyState cstate, List **attlist)
cstate->convert_selectively = shared_cstate.convert_selectively;
cstate->num_defaults = shared_cstate.num_defaults;
cstate->pcdata->relid = shared_cstate.relid;
+ cstate->binary = shared_cstate.binary;
cstate->null_print = CopyStringFromSharedMemory(shared_str_val + copiedsize,
&copiedsize);
@@ -403,7 +405,7 @@ bool
IsParallelCopyAllowed(CopyState cstate)
{
/* Parallel copy not allowed for frontend (2.0 protocol) & binary option. */
- if ((cstate->copy_dest == COPY_OLD_FE) || cstate->binary)
+ if (cstate->copy_dest == COPY_OLD_FE)
return false;
/*
@@ -620,6 +622,7 @@ InitializeParallelCopyInfo(CopyState cstate, List *attnamelist)
cstate->cur_lineno = 0;
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
+ cstate->pcdata->curr_data_block = NULL;
/* Set up variables to avoid per-attribute overhead. */
initStringInfo(&cstate->attribute_buf);
@@ -842,7 +845,11 @@ return_line:
/* Mark that encoding conversion hasn't occurred yet. */
cstate->line_buf_converted = false;
- ConvertToServerEncoding(cstate);
+
+ /* For binary format data, we don't need conversion. */
+ if (!cstate->binary)
+ ConvertToServerEncoding(cstate);
+
pcdata->worker_line_buf_pos++;
return false;
}
@@ -998,33 +1005,70 @@ ParallelCopyFrom(CopyState cstate)
/* Execute the before statement triggers from the leader */
ExecBeforeStmtTrigger(cstate);
- /* On input just throw the header line away. */
- if (cstate->cur_lineno == 0 && cstate->header_line)
+ if (!cstate->binary)
{
- cstate->cur_lineno++;
- if (CopyReadLine(cstate))
+ /* On input just throw the header line away. */
+ if (cstate->cur_lineno == 0 && cstate->header_line)
{
- pcshared_info->is_read_in_progress = false;
- return; /* done */
+ cstate->cur_lineno++;
+ if (CopyReadLine(cstate))
+ {
+ pcshared_info->is_read_in_progress = false;
+ return; /* done */
+ }
}
- }
- for (;;)
- {
- bool done;
+ for (;;)
+ {
+ bool done;
- cstate->cur_lineno++;
+ cstate->cur_lineno++;
- /* Actually read the line into memory here. */
- done = CopyReadLine(cstate);
+ /* Actually read the line into memory here. */
+ done = CopyReadLine(cstate);
+ /*
+ * EOF at start of line means we're done. If we see EOF after
+ * some characters, we act as though it was newline followed by
+ * EOF, ie, process the line and then exit loop on next iteration.
+ */
+ if (done && cstate->line_buf.len == 0)
+ break;
+ }
+ }
+ else
+ {
/*
- * EOF at start of line means we're done. If we see EOF after some
- * characters, we act as though it was newline followed by EOF, ie,
- * process the line and then exit loop on next iteration.
+ * Binary format files. In the parallel copy leader, fill in the error
+ * context information here so that, if any failure occurs while
+ * determining tuple offsets, the leader throws the error with proper
+ * context.
*/
- if (done && cstate->line_buf.len == 0)
- break;
+ ErrorContextCallback errcallback;
+
+ errcallback.callback = CopyFromErrorCallback;
+ errcallback.arg = (void *) cstate;
+ errcallback.previous = error_context_stack;
+ error_context_stack = &errcallback;
+ cstate->pcdata->curr_data_block = NULL;
+ cstate->raw_buf_index = 0;
+ pcshared_info->populated = 0;
+ cstate->cur_lineno = 0;
+ cstate->max_fields = list_length(cstate->attnumlist);
+
+ for (;;)
+ {
+ bool eof = false;
+
+ cstate->cur_lineno++;
+
+ eof = CopyReadBinaryTupleLeader(cstate);
+
+ if (eof)
+ break;
+ }
+
+ /* Reset the error context. */
+ error_context_stack = errcallback.previous;
}
pcshared_info->is_read_in_progress = false;
@@ -1032,6 +1076,289 @@ ParallelCopyFrom(CopyState cstate)
}
/*
+ * CopyReadBinaryGetDataBlock
+ *
+ * Gets a new block, updates the current offset, calculates the skip bytes.
+ */
+void
+CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info)
+{
+ ParallelCopyDataBlock *data_block = NULL;
+ ParallelCopyDataBlock *curr_data_block = cstate->pcdata->curr_data_block;
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ uint8 move_bytes = 0;
+ uint32 block_pos;
+ uint32 prev_block_pos;
+ int read_bytes = 0;
+
+ prev_block_pos = pcshared_info->cur_block_pos;
+
+ block_pos = WaitGetFreeCopyBlock(pcshared_info);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_COUNT)
+ move_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+
+ if (curr_data_block != NULL)
+ curr_data_block->skip_bytes = move_bytes;
+
+ data_block = &pcshared_info->data_blocks[block_pos];
+
+ if (move_bytes > 0 && curr_data_block != NULL)
+ memmove(&data_block->data[0], &curr_data_block->data[cstate->raw_buf_index], move_bytes);
+
+ elog(DEBUG1, "LEADER - field info %d is spread across data blocks - moved %d bytes from current block %u to %u block",
+ field_info, move_bytes, prev_block_pos, block_pos);
+
+ read_bytes = CopyGetData(cstate, &data_block->data[move_bytes], 1, (DATA_BLOCK_SIZE - move_bytes));
+
+ if (field_info == FIELD_NONE && cstate->reached_eof)
+ return;
+
+ if (cstate->reached_eof)
+ EOF_ERROR;
+
+ elog(DEBUG1, "LEADER - bytes read from file %d", read_bytes);
+
+ if (field_info == FIELD_SIZE || field_info == FIELD_DATA)
+ {
+ ParallelCopyDataBlock *prev_data_block = NULL;
+
+ prev_data_block = curr_data_block;
+ prev_data_block->following_block = block_pos;
+
+ if (prev_data_block->curr_blk_completed == false)
+ prev_data_block->curr_blk_completed = true;
+
+ pg_atomic_add_fetch_u32(&prev_data_block->unprocessed_line_parts, 1);
+ }
+
+ cstate->pcdata->curr_data_block = data_block;
+ cstate->raw_buf_index = 0;
+}
+
+/*
+ * CopyReadBinaryTupleLeader
+ *
+ * Leader reads data from the binary format file into data blocks and
+ * identifies tuple boundaries/offsets so that workers can process the
+ * data in those blocks.
+ */
+bool
+CopyReadBinaryTupleLeader(CopyState cstate)
+{
+ ParallelCopyShmInfo *pcshared_info = cstate->pcdata->pcshared_info;
+ int16 fld_count;
+ uint32 line_size = 0;
+ uint32 start_block_pos;
+ uint32 start_offset;
+
+ if (cstate->pcdata->curr_data_block == NULL)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_NONE);
+
+ /*
+ * No data was read from the file here. One way to reach this point is
+ * that the binary file just has a valid signature but nothing else.
+ */
+ if (cstate->reached_eof)
+ return true;
+ }
+
+ if ((cstate->raw_buf_index + sizeof(fld_count)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_COUNT);
+
+ memcpy(&fld_count, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ start_offset = cstate->raw_buf_index;
+ cstate->raw_buf_index += sizeof(fld_count);
+ line_size += sizeof(fld_count);
+ start_block_pos = pcshared_info->cur_block_pos;
+
+ CopyReadBinaryFindTupleSize(cstate, &line_size);
+
+ pg_atomic_add_fetch_u32(&cstate->pcdata->curr_data_block->unprocessed_line_parts, 1);
+
+ if (line_size > 0)
+ (void) UpdateSharedLineInfo(cstate, start_block_pos, start_offset,
+ line_size, LINE_LEADER_POPULATED, -1);
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryFindTupleSize
+ *
+ * Leader identifies the boundaries/offsets of each attribute/column and
+ * thereby computes the tuple/row size. It moves on to the next data
+ * block if the attribute/column is spread across data blocks.
+ */
+void
+CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size)
+{
+ int32 fld_size;
+ ListCell *cur;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ Form_pg_attribute att = TupleDescAttr(tup_desc, (att_num - 1));
+
+ cstate->cur_attname = NameStr(att->attname);
+ fld_size = 0;
+
+ if ((cstate->raw_buf_index + sizeof(fld_size)) >= DATA_BLOCK_SIZE)
+ CopyReadBinaryGetDataBlock(cstate, FIELD_SIZE);
+
+ memcpy(&fld_size, &cstate->pcdata->curr_data_block->data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ *line_size += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ /* fld_size -1 represents the null value for the field. */
+ if (fld_size == -1)
+ continue;
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ *line_size += fld_size;
+
+ if ((DATA_BLOCK_SIZE - cstate->raw_buf_index) >= fld_size)
+ {
+ cstate->raw_buf_index += fld_size;
+ elog(DEBUG1, "LEADER - tuple lies in the same data block");
+ }
+ else
+ {
+ int32 required_blks = 0;
+ int32 curr_blk_bytes = (DATA_BLOCK_SIZE - cstate->raw_buf_index);
+ int i = 0;
+
+ GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes);
+
+ i = required_blks;
+
+ while (i > 0)
+ {
+ CopyReadBinaryGetDataBlock(cstate, FIELD_DATA);
+ i--;
+ }
+
+ GET_RAW_BUF_INDEX(cstate->raw_buf_index, fld_size, required_blks, curr_blk_bytes);
+
+ /*
+ * raw_buf_index should never cross data block size, as the
+ * required number of data blocks would have been obtained in the
+ * above while loop.
+ */
+ Assert(cstate->raw_buf_index <= DATA_BLOCK_SIZE);
+ }
+ cstate->cur_attname = NULL;
+ }
+}
+
+/*
+ * CopyReadBinaryTupleWorker
+ *
+ * Each worker reads data from the data blocks and caches the tuple data
+ * in local memory.
+ */
+bool
+CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls)
+{
+ int16 fld_count;
+ ListCell *cur;
+ FmgrInfo *in_functions = cstate->in_functions;
+ Oid *typioparams = cstate->typioparams;
+ TupleDesc tup_desc = RelationGetDescr(cstate->rel);
+ bool done = false;
+
+ done = GetWorkerLine(cstate);
+ cstate->raw_buf_index = 0;
+
+ if (done && cstate->line_buf.len == 0)
+ return true;
+
+ memcpy(&fld_count, &cstate->line_buf.data[cstate->raw_buf_index], sizeof(fld_count));
+ fld_count = (int16) pg_ntoh16(fld_count);
+
+ CHECK_FIELD_COUNT;
+
+ cstate->raw_buf_index += sizeof(fld_count);
+
+ foreach(cur, cstate->attnumlist)
+ {
+ int att_num = lfirst_int(cur);
+ int m = att_num - 1;
+ Form_pg_attribute att = TupleDescAttr(tup_desc, m);
+
+ cstate->cur_attname = NameStr(att->attname);
+
+ values[m] = CopyReadBinaryAttributeWorker(cstate,
+ &in_functions[m],
+ typioparams[m],
+ att->atttypmod,
+ &nulls[m]);
+ cstate->cur_attname = NULL;
+ }
+
+ return false;
+}
+
+/*
+ * CopyReadBinaryAttributeWorker
+ *
+ * Worker identifies each attribute/column's data and converts it from
+ * binary format to the data type of the attribute/column.
+ */
+Datum
+CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull)
+{
+ int32 fld_size;
+ Datum result;
+
+ memcpy(&fld_size, &cstate->line_buf.data[cstate->raw_buf_index], sizeof(fld_size));
+ cstate->raw_buf_index += sizeof(fld_size);
+ fld_size = (int32) pg_ntoh32(fld_size);
+
+ /* fld_size -1 represents the null value for the field. */
+ if (fld_size == -1)
+ {
+ *isnull = true;
+ return ReceiveFunctionCall(flinfo, NULL, typioparam, typmod);
+ }
+
+ CHECK_FIELD_SIZE(fld_size);
+
+ /* Reset attribute_buf to empty, and load raw data in it */
+ resetStringInfo(&cstate->attribute_buf);
+
+ enlargeStringInfo(&cstate->attribute_buf, fld_size);
+
+ memcpy(&cstate->attribute_buf.data[0], &cstate->line_buf.data[cstate->raw_buf_index], fld_size);
+ cstate->raw_buf_index += fld_size;
+
+ cstate->attribute_buf.len = fld_size;
+ cstate->attribute_buf.data[fld_size] = '\0';
+
+ /* Call the column type's binary input converter */
+ result = ReceiveFunctionCall(flinfo, &cstate->attribute_buf,
+ typioparam, typmod);
+
+ /* Trouble if it didn't eat the whole buffer */
+ if (cstate->attribute_buf.cursor != cstate->attribute_buf.len)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION),
+ errmsg("incorrect binary data format")));
+
+ *isnull = false;
+ return result;
+}
+
+/*
* GetLinePosition
*
* Return the line position once the leader has populated the data.
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 9b19dcb..49f438f 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -59,6 +59,109 @@
/*
+ * CHECK_FIELD_COUNT - Handles the error cases for field count
+ * for binary format files.
+ */
+#define CHECK_FIELD_COUNT \
+{\
+ if (fld_count == -1) \
+ { \
+ if (IsParallelCopy() && \
+ !IsLeader()) \
+ return true; \
+ else if (IsParallelCopy() && \
+ IsLeader()) \
+ { \
+ if (cstate->pcdata->curr_data_block->data[cstate->raw_buf_index + sizeof(fld_count)] != 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return true; \
+ } \
+ else \
+ { \
+ /* \
+ * Received EOF marker. In a V3-protocol copy, wait for the \
+ * protocol-level EOF, and complain if it doesn't come \
+ * immediately. This ensures that we correctly handle CopyFail, \
+ * if client chooses to send that now. \
+ * \
+ * Note that we MUST NOT try to read more data in an old-protocol \
+ * copy, since there is no protocol-level EOF marker then. We \
+ * could go either way for copy from file, but choose to throw \
+ * error if there's data after the EOF marker, for consistency \
+ * with the new-protocol case. \
+ */ \
+ char dummy; \
+ if (cstate->copy_dest != COPY_OLD_FE && \
+ CopyReadBinaryData(cstate, &dummy, 1) > 0) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("received copy data after EOF marker"))); \
+ return false; \
+ } \
+ } \
+ if (fld_count != cstate->max_fields) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("row field count is %d, expected %d", \
+ (int) fld_count, cstate->max_fields))); \
+}
+
+/*
+ * CHECK_FIELD_SIZE - Handles the error case for field size
+ * for binary format files.
+ */
+#define CHECK_FIELD_SIZE(fld_size) \
+{ \
+ if (fld_size < -1) \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("invalid field size")));\
+}
+
+/*
+ * EOF_ERROR - Error statement for EOF for binary format
+ * files.
+ */
+#define EOF_ERROR \
+{ \
+ ereport(ERROR, \
+ (errcode(ERRCODE_BAD_COPY_FILE_FORMAT), \
+ errmsg("unexpected EOF in COPY data")));\
+}
+
+/*
+ * GET_RAW_BUF_INDEX - Calculates the raw buf index for the cases
+ * where the data spread is across multiple data blocks.
+ */
+#define GET_RAW_BUF_INDEX(raw_buf_index, fld_size, required_blks, curr_blk_bytes) \
+{ \
+ raw_buf_index = fld_size - (((required_blks - 1) * DATA_BLOCK_SIZE) + curr_blk_bytes); \
+}
+
+/*
+ * GET_REQUIRED_BLOCKS - Calculates the number of data
+ * blocks required for the cases where the data spread
+ * is across multiple data blocks.
+ */
+#define GET_REQUIRED_BLOCKS(required_blks, fld_size, curr_blk_bytes) \
+{ \
+ /* \
+ * field size can spread across multiple data blocks, \
+ * calculate the number of required data blocks and try to get \
+ * those many data blocks. \
+ */ \
+ required_blks = (int32)(fld_size - curr_blk_bytes)/(int32)DATA_BLOCK_SIZE; \
+ /* \
+ * check if we need one more data block for the remaining field \
+ * data bytes that are not a multiple of the data block size. \
+ */ \
+ if ((fld_size - curr_blk_bytes)%DATA_BLOCK_SIZE != 0) \
+ required_blks++; \
+}
+
+/*
* Represents the different source/dest cases we need to worry about at
* the bottom level
*/
@@ -236,6 +339,17 @@ typedef struct ParallelCopyLineBuf
} ParallelCopyLineBuf;
/*
+ * Represents the usage mode for CopyReadBinaryGetDataBlock.
+ */
+typedef enum FieldInfoType
+{
+ FIELD_NONE = 0,
+ FIELD_COUNT,
+ FIELD_SIZE,
+ FIELD_DATA
+} FieldInfoType;
+
+/*
* This structure helps in storing the common data from CopyStateData that are
* required by the workers. This information will then be allocated and stored
* into the DSM for the worker to retrieve and copy it to CopyStateData.
@@ -258,6 +372,7 @@ typedef struct SerializedParallelCopyState
/* Working state for COPY FROM */
AttrNumber num_defaults;
Oid relid;
+ bool binary;
} SerializedParallelCopyState;
/*
@@ -284,6 +399,9 @@ typedef struct ParallelCopyData
/* Current position in worker_line_buf */
uint32 worker_line_buf_pos;
+
+ /* For binary formatted files */
+ ParallelCopyDataBlock *curr_data_block;
} ParallelCopyData;
typedef int (*copy_data_source_cb) (void *outbuf, int minread, int maxread);
@@ -470,4 +588,12 @@ extern uint32 UpdateSharedLineInfo(CopyState cstate, uint32 blk_pos, uint32 offs
uint32 line_size, uint32 line_state, uint32 blk_line_pos);
extern void EndLineParallelCopy(CopyState cstate, uint32 line_pos, uint32 line_size,
uint32 raw_buf_ptr);
+extern int CopyGetData(CopyState cstate, void *databuf, int minread, int maxread);
+extern int CopyReadBinaryData(CopyState cstate, char *dest, int nbytes);
+extern bool CopyReadBinaryTupleLeader(CopyState cstate);
+extern bool CopyReadBinaryTupleWorker(CopyState cstate, Datum *values, bool *nulls);
+extern void CopyReadBinaryFindTupleSize(CopyState cstate, uint32 *line_size);
+extern Datum CopyReadBinaryAttributeWorker(CopyState cstate, FmgrInfo *flinfo,
+ Oid typioparam, int32 typmod, bool *isnull);
+extern void CopyReadBinaryGetDataBlock(CopyState cstate, FieldInfoType field_info);
#endif /* COPY_H */
--
1.8.3.1
I did performance testing on the v7 patch set[1] with a custom
postgresql.conf[2]. The results are triplets of the form (exec time in
sec, number of workers, gain), where gain is the speedup over the
0-worker (serial) run.
Use case 1: 10 million rows, 5.2GB data, 2 indexes on integer columns,
1 index on text column, binary file
(1104.898, 0, 1X), (1112.221, 1, 1X), (640.236, 2, 1.72X), (335.090,
4, 3.3X), (200.492, 8, 5.51X), (131.448, 16, 8.4X), (121.832, 20,
9.1X), (124.287, 30, 8.9X)
Use case 2: 10 million rows, 5.2GB data, 2 indexes on integer columns,
1 index on text column, copy from stdin, csv format
(1203.282, 0, 1X), (1135.517, 1, 1.06X), (655.140, 2, 1.84X),
(343.688, 4, 3.5X), (203.742, 8, 5.9X), (144.793, 16, 8.31X),
(133.339, 20, 9.02X), (136.672, 30, 8.8X)
Use case 3: 10 million rows, 5.2GB data, 2 indexes on integer columns,
1 index on text column, text file
(1165.991, 0, 1X), (1128.599, 1, 1.03X), (644.793, 2, 1.81X),
(342.813, 4, 3.4X), (204.279, 8, 5.71X), (139.986, 16, 8.33X),
(128.259, 20, 9.1X), (132.764, 30, 8.78X)
The above results are similar to those with earlier versions of the patch set.
On Fri, Oct 9, 2020 at 3:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
Sure, you need to change the code such that when force_parallel_mode =
'regress' is specified then it always uses one worker. This is
primarily for testing purposes and will help during the development of
this patch, as it will make all existing Copy tests use quite a good
portion of the parallel infrastructure.
I performed force_parallel_mode = regress testing and found 2 issues,
the fixes for the same are available in the v7 patch set[1].
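
For reference, the shape of such a change is roughly the following
(sketch only, not the actual v7 diff; DecideCopyWorkers is a
hypothetical helper and its placement is an assumption):

static int
DecideCopyWorkers(CopyState cstate)
{
    int     nworkers = cstate->nworkers;    /* value of the PARALLEL option */

    /*
     * Under force_parallel_mode = regress, always use one worker so the
     * existing COPY regression tests exercise the parallel
     * infrastructure wherever parallel copy is permitted at all.
     */
    if (force_parallel_mode == FORCE_PARALLEL_REGRESS &&
        IsParallelCopyAllowed(cstate))
        nworkers = 1;

    return nworkers;
}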
Overall, we have the below test cases to cover the code and for performance measurements. We plan to run these tests whenever a new set of patches is posted.
1. csv
2. binary

Don't we need the tests for plain text files as well?
I added a text use case; the perf results mentioned above are on the v7 patch set[1].
3. force parallel mode = regress
4. toast data csv and binary
5. foreign key check, before row, after row, before statement, after statement, instead of triggers
6. partition case
7. foreign partitions and partitions having trigger cases
8. where clause having parallel unsafe and safe expression, default parallel unsafe and safe expression
9. temp, global, local, unlogged, inherited tables cases, foreign tables

Sounds like good coverage. So, are you doing all this testing
manually? How are you maintaining these tests?
All test cases listed above, except for the cases that are meant to
measure perf gain with huge data, are present in the v7-0005 patch in
the v7 patch set[1].
[1]: /messages/by-id/CALDaNm1n1xW43neXSGs=c7zt-mj+JHHbubWBVDYT9NfCoF8TuQ@mail.gmail.com
[2]:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = off
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 9, 2020 at 10:42 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Oct 8, 2020 at 12:14 AM vignesh C <vignesh21@gmail.com> wrote:
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com>
wrote:
I am convinced by the reason given by Kyotaro-San in that another
thread [1] and performance data shown by Peter that this can't be an
independent improvement and rather in some cases it can do harm. Now,
if you need it for a parallel-copy path then we can change it
specifically to the parallel-copy code path but I don't understand
your reason completely.
Whenever we need data to be populated, we will get a new data block and
pass it to CopyGetData to populate the data. In the case of file copy,
the server will completely fill the data block; as long as data is
available, a partial data block is never returned except at EOF. But in
the case of STDIN copy, even though there is 8K of space available in
the data block and 8K of data available on STDIN, CopyGetData will
return as soon as the libpq buffer holds more than minread bytes. We
pass a new data block every time to load data, so each time we pass an
8K data block, CopyGetData loads only a few bytes into it and returns.
I wanted to keep the same data-population logic for both file copy and
STDIN copy, i.e. copy full 8K data blocks and then process the
populated data. There is an alternative solution: I can add some
special handling for STDIN wherein the existing data block is passed in
along with the index from which the data should be copied. Thoughts?
What you are proposing as an alternative solution, isn't that what we
are doing without the patch? IIUC, you require this because of your
corresponding changes to handle COPY_NEW_FE in CopyReadLine(), is that
right? If so, what is the difficulty in making it behave similar to
the non-parallel case?
+ if (cstate->copy_dest == COPY_NEW_FE)
+     minread = RAW_BUF_SIZE - nbytes;
  inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
-                       1, RAW_BUF_SIZE - nbytes);
+                       minread, RAW_BUF_SIZE - nbytes);

No comment to explain why this change is done?

The alternate solution is similar to how the existing copy handles
STDIN copies. I have made changes in the v7 patch attached in [1] to
have parallel copy handle STDIN data the same way as non-parallel copy,
so this change (and hence the need for a comment explaining it) has
been removed from the 0001 patch.
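For context, a hedged sketch of the non-parallel contract being
restored (this mirrors the CopyGetData() call in copy.c as described
above, not the patch itself):

/*
 * CopyGetData(cstate, databuf, minread, maxread) returns after reading
 * at least minread and at most maxread bytes.  The non-parallel path
 * passes minread = 1, so a COPY FROM STDIN read may legitimately return
 * a partially filled block; the removed change instead demanded a full
 * block (minread = RAW_BUF_SIZE - nbytes) for COPY_NEW_FE.
 */
inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
                      1, RAW_BUF_SIZE - nbytes);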
[1]: /messages/by-id/CALDaNm1n1xW43neXSGs=c7zt-mj+JHHbubWBVDYT9NfCoF8TuQ@mail.gmail.com
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Fri, Oct 9, 2020 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Oct 8, 2020 at 12:14 AM vignesh C <vignesh21@gmail.com> wrote:
On Mon, Sep 28, 2020 at 12:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
+ */
+typedef struct ParallelCopyLineBoundary
Are we doing all this state management to avoid using locks while
processing lines? If so, I think we can use either spinlock or LWLock
to keep the main patch simple and then provide a later patch to make
it lock-less. This will allow us to first focus on the main design of
the patch rather than trying to make this datastructure processing
lock-less in the best possible way.
The steps will be more or less the same if we use a spinlock too: steps
1, 3 & 4 will be common, and we would use lock & unlock instead of
steps 2 & 5. I feel we can retain the current implementation.
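For illustration, a minimal sketch of the suggested spinlock variant
(the struct, field, and state names here are hypothetical, not the
patch's actual code):

#include "storage/spin.h"

#define LINE_LEADER_POPULATED 1     /* hypothetical state value */

typedef struct ParallelCopyLineBoundary
{
    slock_t     mutex;              /* protects line_state & line_size */
    uint32      line_state;
    uint32      line_size;
} ParallelCopyLineBoundary;

/*
 * Leader: publish a line's size and state under the lock; the separate
 * state-update steps (2 & 5) collapse into acquire/release.
 */
static void
leader_publish_line(ParallelCopyLineBoundary *lb, uint32 size)
{
    SpinLockAcquire(&lb->mutex);
    lb->line_size = size;
    lb->line_state = LINE_LEADER_POPULATED;
    SpinLockRelease(&lb->mutex);
}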
I'll study this in detail and let you know my opinion on the same but
in the meantime, I don't follow one part of this comment: "If they
don't follow this order the worker might process wrong line_size and
leader might populate the information which worker has not yet
processed or in the process of processing."
Do you want to say that the leader might overwrite some information
which the worker hasn't read yet? If so, it is not clear from the comment.
Another minor point about this comment:
Here leader and worker must follow these steps to avoid any corruption
or hang issue. Changed it to:
* The leader & worker process access the shared line information by following
* the below steps to avoid any data corruption or hang:
Actually, I wanted more on the lines of why such corruption or hang can
happen? It might help reviewers to understand why you have followed
such a sequence.
There are 3 variables which the leader & worker are working on:
line_size, line_state & data. The leader updates line_state, populates
the data, and then updates line_size & line_state. The worker waits for
line_state to be updated and, once it is, reads the data based on
line_size. If the two are not synchronized, a wrong line_size can be
read and a wrong amount of data consumed; anything can happen. This is
the usual concurrency case with readers/writers, so I felt that much
detail need not be mentioned.
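To make that ordering concrete, here is a hedged sketch of the
publish/consume sequence described above (the struct, field, and state
names are illustrative, not the patch's definitions; the atomics and
barrier come from port/atomics.h):

#include "port/atomics.h"
#include "miscadmin.h"

#define LINE_LEADER_POPULATED 1         /* hypothetical state value */

typedef struct ParallelCopyLineBoundary
{
    uint32           line_size;         /* bytes of line data published */
    pg_atomic_uint32 line_state;        /* publication flag */
} ParallelCopyLineBoundary;

/* Leader: data and line_size are written first, line_state last. */
static void
leader_publish(ParallelCopyLineBoundary *lb, uint32 size)
{
    lb->line_size = size;               /* data already copied into place */
    pg_memory_barrier();                /* size visible before state flips */
    pg_atomic_write_u32(&lb->line_state, LINE_LEADER_POPULATED);
}

/* Worker: wait on line_state; only then is line_size safe to read. */
static uint32
worker_wait_for_line(ParallelCopyLineBoundary *lb)
{
    while (pg_atomic_read_u32(&lb->line_state) != LINE_LEADER_POPULATED)
        CHECK_FOR_INTERRUPTS();         /* spin until the leader publishes */
    pg_memory_barrier();                /* read state before size/data */
    return lb->line_size;
}

If the worker read line_size before line_state was published, or the
leader flipped line_state before writing line_size, the worker could
consume a stale size and read garbage - the corruption the comment
warns about.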
How did you ensure that this is fixed? Have you tested it? If so,
please share the test. I see a basic problem with your fix:

+ /* Report WAL/buffer usage during parallel execution */
+ bufferusage = shm_toc_lookup(toc, PARALLEL_COPY_BUFFER_USAGE, false);
+ walusage = shm_toc_lookup(toc, PARALLEL_COPY_WAL_USAGE, false);
+ InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
+                       &walusage[ParallelWorkerNumber]);

You need to call InstrStartParallelQuery() before the actual operation
starts; without that, the stats won't be accurate. Also, after calling
WaitForParallelWorkersToFinish(), you need to accumulate the stats
collected from workers which neither you have done nor is possible
with the current code in your patch because you haven't made any
provision to capture them in BeginParallelCopy.
I suggest you look into lazy_parallel_vacuum_indexes() and
begin_parallel_vacuum() to understand how the buffer/wal usage stats
are accumulated. Also, please test this functionality using
pg_stat_statements.
Made changes accordingly.
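For reference, the accumulation pattern here (modeled on
lazy_parallel_vacuum_indexes(); the bufferusage/walusage arrays and the
pcxt variable are assumptions about the patch's naming) is roughly:

/* Worker side: bracket the parallel-copy work with instrumentation. */
InstrStartParallelQuery();
/* ... this worker's share of the COPY work happens here ... */
InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber],
                      &walusage[ParallelWorkerNumber]);

/* Leader side: once the workers finish, fold their stats into ours. */
WaitForParallelWorkersToFinish(pcxt);
for (int i = 0; i < pcxt->nworkers_launched; i++)
    InstrAccumParallelQuery(&bufferusage[i], &walusage[i]);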
I have verified it using:
postgres=# select * from pg_stat_statements where query like '%copy%';
-[ RECORD 1 ]-------+------------------------------------------------------
userid              | 10
dbid                | 13743
queryid             | -6947756673093447609
query               | copy hw from '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format csv, delimiter ',')
plans               | 0
total_plan_time     | 0
min_plan_time       | 0
max_plan_time       | 0
mean_plan_time      | 0
stddev_plan_time    | 0
calls               | 1
total_exec_time     | 265.195105
min_exec_time       | 265.195105
max_exec_time       | 265.195105
mean_exec_time      | 265.195105
stddev_exec_time    | 0
rows                | 175000
shared_blks_hit     | 1916
shared_blks_read    | 0
shared_blks_dirtied | 946
shared_blks_written | 946
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0
wal_records         | 1116
wal_fpi             | 0
wal_bytes           | 3587203
-[ RECORD 2 ]-------+------------------------------------------------------
userid              | 10
dbid                | 13743
queryid             | 8570215596364326047
query               | copy hw from '/home/vignesh/postgres/postgres/inst/bin/hw_175000.csv' with(format csv, delimiter ',', parallel '2')
plans               | 0
total_plan_time     | 0
min_plan_time       | 0
max_plan_time       | 0
mean_plan_time      | 0
stddev_plan_time    | 0
calls               | 1
total_exec_time     | 35668.402482
min_exec_time       | 35668.402482
max_exec_time       | 35668.402482
mean_exec_time      | 35668.402482
stddev_exec_time    | 0
rows                | 175000
shared_blks_hit     | 3101
shared_blks_read    | 36
shared_blks_dirtied | 952
shared_blks_written | 919
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0
wal_records         | 1119
wal_fpi             | 6
wal_bytes           | 3624405
(2 rows)

I am not able to properly parse the data, but if I understand, the WAL
data for the non-parallel (1116 | 0 | 3587203) and parallel (1119 | 6 |
3624405) cases doesn't seem to be the same. Is that right? If so, why?
Please ensure that no checkpoint happens in both cases.
I have disabled checkpoint; the results with the checkpoint disabled
are given below:
                        | wal_records | wal_fpi | wal_bytes
Sequential Copy         |        1116 |       0 |   3587669
Parallel Copy(1 worker) |        1116 |       0 |   3587669
Parallel Copy(4 worker) |        1121 |       0 |   3587668
I noticed that for 1 worker, wal_records & wal_bytes are the same as
for sequential copy, but with other worker counts there is a
difference in wal_records & wal_bytes. I think the difference should
be OK, because with more than 1 worker the order in which the records
are processed differs based on which worker picks which records from
the input file. In the case of sequential copy/1 worker the records
are always processed in the same order, hence the wal_bytes are the
same.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Sat, Oct 3, 2020 at 6:20 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
Hello Vignesh,
I've done some basic benchmarking on the v4 version of the patches (but
AFAIK the v5 should perform about the same), and some initial review.
For the benchmarking, I used the lineitem table from TPC-H - for a 75GB
data set, this largest table is about 64GB once loaded, with another
54GB in 5 indexes. This is on a server with 32 cores, 64GB of RAM and
NVMe storage.
The COPY duration with a varying number of workers (specified using the
parallel COPY option) looks like this:

workers   duration
------------------
      0       1366
      1       1255
      2        704
      3        526
      4        434
      5        385
      6        347
      7        322
      8        327

So this seems to work pretty well - initially we get almost linear
speedup, then it slows down (likely due to contention for locks, I/O
etc.). Not bad.
Thanks for testing with different workers & posting the results.
I've only done a quick review, but overall the patch looks in fairly
good shape.
1) I don't quite understand why we need INCREMENTPROCESSED and
RETURNPROCESSED, considering they just do ++ or return. They just
obfuscate the code, I think.
I have removed the macros.
2) I find it somewhat strange that BeginParallelCopy can just decide not
to do parallel copy after all. Why not make this decision in the
caller? Or maybe it's fine this way, not sure.
I have moved the check IsParallelCopyAllowed to the caller.
3) AFAIK we don't modify typedefs.list in patches, so these changes
should be removed.
I have seen typedefs.list being changed in many commits, and it also
helps in running pgindent, so I'm retaining this change.
4) IsTriggerFunctionParallelSafe actually checks all triggers, not just
one, so the comment needs minor rewording.
Modified the comments.
Thanks for the comments & for sharing the test results, Tomas. These
changes are fixed in one of my earlier mails [1].
[1]: /messages/by-id/CALDaNm1n1xW43neXSGs=c7zt-mj+JHHbubWBVDYT9NfCoF8TuQ@mail.gmail.com
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Thu, Oct 8, 2020 at 8:43 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
On Thu, Oct 8, 2020 at 5:44 AM vignesh C <vignesh21@gmail.com> wrote:
Attached v6 patch with the fixes.
Hi Vignesh,
I noticed a couple of issues when scanning the code in the following
patch:
v6-0003-Allow-copy-from-command-to-process-data-from-file.patch
In the following code, it will put a junk uint16 value into *destptr
(and thus may well cause a crash) on a Big Endian architecture
(Solaris Sparc, s390x, etc.):
You're storing a (uint16) string length in a uint32 and then pulling
out the lower two bytes of the uint32 and copying them into the
location pointed to by destptr:

+static void
+CopyStringToSharedMemory(CopyState cstate, char *srcPtr, char *destptr,
+                         uint32 *copiedsize)
+{
+    uint32 len = srcPtr ? strlen(srcPtr) + 1 : 0;
+
+    memcpy(destptr, (uint16 *) &len, sizeof(uint16));
+    *copiedsize += sizeof(uint16);
+    if (len)
+    {
+        memcpy(destptr + sizeof(uint16), srcPtr, len);
+        *copiedsize += len;
+    }
+}

I suggest you change the code to:
uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
memcpy(destptr, &len, sizeof(uint16));

[I assume the string length here can't ever exceed (65535 - 1), right?]
Looking a bit deeper into this, I'm wondering if in fact your
EstimateStringSize() and EstimateNodeSize() functions should be using
BUFFERALIGN() for EACH stored string/node (rather than just calling
shm_toc_estimate_chunk() once at the end, after the length of packed
strings and nodes has been estimated), to ensure alignment of start of
each string/node. Other Postgres code appears to be aligning each
stored chunk using shm_toc_estimate_chunk(). See the definition of
that macro and its current usages.
I'm not handling this; it is similar to how it is handled in other places.
Then you could safely use:
uint16 len = srcPtr ? (uint16)strlen(srcPtr) + 1 : 0;
*(uint16 *)destptr = len;
*copiedsize += sizeof(uint16);
if (len)
{
memcpy(destptr + sizeof(uint16), srcPtr, len);
*copiedsize += len;
}

and in the CopyStringFromSharedMemory() function, you could then safely use:
len = *(uint16 *)srcPtr;
The compiler may be smart enough to optimize-away the memcpy() in this
case anyway, but there are issues in doing this for architectures that
take a performance hit for unaligned access, or don't support
unaligned access.
Changed it to uint32, so that there are no issues if the length exceeds
65535, and also to avoid problems on Big Endian architectures.
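To see why the original coding was unsafe, a small standalone C
demonstration (not patch code):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
    uint32_t    len32 = 5;      /* string length held in a uint32 */
    uint16_t    stored;

    /* Mimics the original code: copy sizeof(uint16) bytes of a uint32. */
    memcpy(&stored, &len32, sizeof(uint16_t));

    /*
     * Little-endian machines store the low-order bytes first, so this
     * prints 5.  Big-endian machines store the HIGH-order bytes first,
     * so it prints 0 - a junk length.
     */
    printf("stored length = %u\n", (unsigned) stored);
    return 0;
}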
Also, in CopyXXXXFromSharedMemory() functions, you should use palloc()
instead of palloc0(), as you're filling the entire palloc'd buffer
anyway, so no need to ask for additional MemSet() of all buffer bytes
to 0 prior to memcpy().
I have changed palloc0 to palloc.
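The point is simply that palloc0() zeroes memory that the subsequent
memcpy() overwrites in full; a two-line sketch (dest, srcPtr and len
are illustrative names, not the patch's):

char   *dest = palloc(len);     /* palloc0(len) would MemSet it to 0 first */
memcpy(dest, srcPtr, len);      /* ... only for every byte to be overwritten */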
Thanks Greg for reviewing & providing your comments. These changes are
fixed in one of my earlier mails [1].
[1]: /messages/by-id/CALDaNm1n1xW43neXSGs=c7zt-mj+JHHbubWBVDYT9NfCoF8TuQ@mail.gmail.com
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 14, 2020 at 6:51 PM vignesh C <vignesh21@gmail.com> wrote:
On Fri, Oct 9, 2020 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I am not able to properly parse the data, but if I understand, the WAL
data for the non-parallel (1116 | 0 | 3587203) and parallel (1119 | 6 |
3624405) cases doesn't seem to be the same. Is that right? If so, why?
Please ensure that no checkpoint happens in both cases.
I have disabled checkpoint; the results with the checkpoint disabled
are given below:
                        | wal_records | wal_fpi | wal_bytes
Sequential Copy         |        1116 |       0 |   3587669
Parallel Copy(1 worker) |        1116 |       0 |   3587669
Parallel Copy(4 worker) |        1121 |       0 |   3587668
I noticed that for 1 worker, wal_records & wal_bytes are the same as
for sequential copy, but with other worker counts there is a
difference in wal_records & wal_bytes. I think the difference should
be OK, because with more than 1 worker the order in which the records
are processed differs based on which worker picks which records from
the input file. In the case of sequential copy/1 worker the records
are always processed in the same order, hence the wal_bytes are the
same.
Are all records of the same size in your test? If so, then why should
the order matter? Also, even though the number of wal_records has
increased, wal_bytes has not increased; rather it is one byte less.
Can we identify what is going on here? I don't intend to say that it
is a problem but we should know the reason clearly.
--
With Regards,
Amit Kapila.
Hi Vignesh,
After taking a look over the patch, I have some suggestions for
0003-Allow-copy-from-command-to-process-data-from-file.patch.
1.
+static uint32
+EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist,
+ char **whereClauseStr, char **rangeTableStr,
+ char **attnameListStr, char **notnullListStr,
+ char **nullListStr, char **convertListStr)
+{
+ uint32 strsize = MAXALIGN(sizeof(SerializedParallelCopyState));
+
+ strsize += EstimateStringSize(cstate->null_print);
+ strsize += EstimateStringSize(cstate->delim);
+ strsize += EstimateStringSize(cstate->quote);
+ strsize += EstimateStringSize(cstate->escape);
It uses the function EstimateStringSize to get the strlen of null_print,
delim, quote and escape. But the length of null_print seems to have been
stored in null_print_len, and delim/quote/escape must be 1 byte, so
calling strlen again seems unnecessary.
How about "strsize += sizeof(uint32) + cstate->null_print_len + 1"?
2.
+ strsize += EstimateNodeSize(cstate->whereClause, whereClauseStr);
+ copiedsize += CopyStringToSharedMemory(cstate, whereClauseStr,
+ shmptr + copiedsize);
Some string lengths are counted twice.
'whereClauseStr' has strlen called on it once in EstimateNodeSize and
again in CopyStringToSharedMemory.
I don't know whether it's worth refactoring the code to avoid the
duplicate strlen. What do you think?
Best regards,
houzj
On Sun, Oct 18, 2020 at 7:47 AM Hou, Zhijie <houzj.fnst@cn.fujitsu.com> wrote:
Hi Vignesh,
After taking a look over the patch,
I have some suggestions for
0003-Allow-copy-from-command-to-process-data-from-file.patch.
1.
+static uint32
+EstimateCstateSize(ParallelContext *pcxt, CopyState cstate, List *attnamelist,
+                   char **whereClauseStr, char **rangeTableStr,
+                   char **attnameListStr, char **notnullListStr,
+                   char **nullListStr, char **convertListStr)
+{
+    uint32 strsize = MAXALIGN(sizeof(SerializedParallelCopyState));
+
+    strsize += EstimateStringSize(cstate->null_print);
+    strsize += EstimateStringSize(cstate->delim);
+    strsize += EstimateStringSize(cstate->quote);
+    strsize += EstimateStringSize(cstate->escape);

It uses the function EstimateStringSize to get the strlen of null_print,
delim, quote and escape. But the length of null_print seems to have been
stored in null_print_len, and delim/quote/escape must be 1 byte, so
calling strlen again seems unnecessary.
How about "strsize += sizeof(uint32) + cstate->null_print_len + 1"?
+1. This seems like a good suggestion, but add comments for
delim/quote/escape to indicate that we are considering one byte for
each. I think this will obviate the need for the function
EstimateStringSize. Another thing in this regard is that we normally
use the add_size function to compute such sizes, but I don't see it
being used in this and the nearby computations. It helps us to detect
overflow of addition, if any.
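For illustration, a hedged sketch of that estimate using add_size()
(declared in storage/shmem.h; it errors out on overflow rather than
silently wrapping), with field names taken from the quoted snippet:

Size        strsize = MAXALIGN(sizeof(SerializedParallelCopyState));

/* null_print's length is already cached in null_print_len. */
strsize = add_size(strsize, sizeof(uint32) + cstate->null_print_len + 1);

/* delim, quote and escape are single-byte strings: 1 byte plus the NUL. */
strsize = add_size(strsize, 3 * (sizeof(uint32) + 2));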
EstimateCstateSize()
{
..
+
+ strsize++;
..
}
Why do we need this additional one-byte increment? Does it make sense
to add a small comment for the same?
2.
+ strsize += EstimateNodeSize(cstate->whereClause, whereClauseStr);
+ copiedsize += CopyStringToSharedMemory(cstate, whereClauseStr,
+                                        shmptr + copiedsize);

Some string lengths are counted twice.
'whereClauseStr' has strlen called on it once in EstimateNodeSize and
again in CopyStringToSharedMemory.
I don't know whether it's worth refactoring the code to avoid the
duplicate strlen. What do you think?
It doesn't seem worth it to me. We would need to use additional
variables to save those lengths, and I think that would add more
code/complexity than it saves. See EstimateParamListSpace and
SerializeParamList, where we get the typeLen each time; that way the
code looks neat to me, and we are not going to save much by not
following a similar approach here.
--
With Regards,
Amit Kapila.
On Thu, Oct 15, 2020 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Oct 14, 2020 at 6:51 PM vignesh C <vignesh21@gmail.com> wrote:
On Fri, Oct 9, 2020 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
I am not able to properly parse the data, but if I understand, the WAL
data for the non-parallel (1116 | 0 | 3587203) and parallel (1119 | 6 |
3624405) cases doesn't seem to be the same. Is that right? If so, why?
Please ensure that no checkpoint happens in both cases.
I have disabled checkpoint; the results with the checkpoint disabled
are given below:
                        | wal_records | wal_fpi | wal_bytes
Sequential Copy         |        1116 |       0 |   3587669
Parallel Copy(1 worker) |        1116 |       0 |   3587669
Parallel Copy(4 worker) |        1121 |       0 |   3587668
I noticed that for 1 worker, wal_records & wal_bytes are the same as
for sequential copy, but with other worker counts there is a
difference in wal_records & wal_bytes. I think the difference should
be OK, because with more than 1 worker the order in which the records
are processed differs based on which worker picks which records from
the input file. In the case of sequential copy/1 worker the records
are always processed in the same order, hence the wal_bytes are the
same.
Are all records of the same size in your test? If so, then why should
the order matter? Also, even though the number of wal_records has
increased, wal_bytes has not increased; rather it is one byte less.
Can we identify what is going on here? I don't intend to say that it
is a problem but we should know the reason clearly.
The earlier run that I executed was with varying record sizes. The
below results are after modifying the records to keep them the same size:
                         | wal_records | wal_fpi | wal_bytes
Sequential Copy          |        1307 |       0 |   4198526
Parallel Copy(1 worker)  |        1307 |       0 |   4198526
Parallel Copy(2 worker)  |        1308 |       0 |   4198836
Parallel Copy(4 worker)  |        1307 |       0 |   4199147
Parallel Copy(8 worker)  |        1312 |       0 |   4199735
Parallel Copy(16 worker) |        1313 |       0 |   4200311
Still, I noticed some difference in wal_records & wal_bytes. I feel
the difference is because of the following reasons:
Each worker prepares 1000 tuples and then does heap_multi_insert for
those 1000 tuples. In our case approximately 185 tuples fit in 1 page,
so 925 tuples (5 x 185) go into 5 WAL records and the remaining 75
tuples into the next WAL record. The wal dump is like below:
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn:
0/0160EC80, prev 0/0160DDB0, desc: MULTI_INSERT+INIT 185 tuples flags
0x00, blkref #0: rel 1663/13751/16384 blk 0
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn:
0/0160FB28, prev 0/0160EC80, desc: MULTI_INSERT+INIT 185 tuples flags
0x00, blkref #0: rel 1663/13751/16384 blk 1
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn:
0/016109E8, prev 0/0160FB28, desc: MULTI_INSERT+INIT 185 tuples flags
0x00, blkref #0: rel 1663/13751/16384 blk 2
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn:
0/01611890, prev 0/016109E8, desc: MULTI_INSERT+INIT 185 tuples flags
0x00, blkref #0: rel 1663/13751/16384 blk 3
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn:
0/01612750, prev 0/01611890, desc: MULTI_INSERT+INIT 185 tuples flags
0x00, blkref #0: rel 1663/13751/16384 blk 4
rmgr: Heap2 len (rec/tot): 1550/ 1550, tx: 510, lsn:
0/016135F8, prev 0/01612750, desc: MULTI_INSERT+INIT 75 tuples flags
0x02, blkref #0: rel 1663/13751/16384 blk 5
After the 1st 1000 tuples are inserted, when the worker tries to
insert another 1000 tuples it will first use the last page, which
still has free space for 110 more tuples:
rmgr: Heap2 len (rec/tot): 2470/ 2470, tx: 510, lsn:
0/01613C08, prev 0/016135F8, desc: MULTI_INSERT 110 tuples flags 0x00,
blkref #0: rel 1663/13751/16384 blk 5
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn:
0/016145C8, prev 0/01613C08, desc: MULTI_INSERT+INIT 185 tuples flags
0x00, blkref #0: rel 1663/13751/16384 blk 6
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn:
0/01615470, prev 0/016145C8, desc: MULTI_INSERT+INIT 185 tuples flags
0x00, blkref #0: rel 1663/13751/16384 blk 7
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn:
0/01616330, prev 0/01615470, desc: MULTI_INSERT+INIT 185 tuples flags
0x00, blkref #0: rel 1663/13751/16384 blk 8
rmgr: Heap2 len (rec/tot): 3750/ 3750, tx: 510, lsn:
0/016171D8, prev 0/01616330, desc: MULTI_INSERT+INIT 185 tuples flags
0x00, blkref #0: rel 1663/13751/16384 blk 9
rmgr: Heap2 len (rec/tot): 3050/ 3050, tx: 510, lsn:
0/01618098, prev 0/016171D8, desc: MULTI_INSERT+INIT 150 tuples flags
0x02, blkref #0: rel 1663/13751/16384 blk 10
This behavior will be the same for sequential copy and copy with 1
worker, as the sequence of inserts & the pages used to insert into are
in the same order. These 2 reasons together result in the varying
wal_bytes & wal_records with multiple workers: 1) When more than 1
worker is involved, the sequence in which the pages are selected is
not guaranteed, so the MULTI_INSERT tuple count varies & the
MULTI_INSERT/MULTI_INSERT+INIT description varies. 2) wal_records will
increase with a larger number of workers because, when the tuples are
split across the workers, one of the workers will have a few more WAL
records, as the last heap_multi_insert gets split across the workers
and generates new wal records like:
rmgr: Heap2 len (rec/tot): 600/ 600, tx: 510, lsn:
0/019F8B08, prev 0/019F7C48, desc: MULTI_INSERT 25 tuples flags 0x00,
blkref #0: rel 1663/13751/16384 blk 1065
Attached is the tar of the WAL dump which was used for this analysis.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachments:
wal_dump.tar (application/x-tar)