An idea for parallelizing COPY within one backend
As far as I can see, the main difficulty in making COPY run faster (on
the server) is the rather involved conversion from plain-text lines
into tuples. Trying to get rid of this conversion by having the client
send something that resembles the data stored in on-disk tuples is not a
good answer, either, because it ties the client too closely to
backend-version-specific implementation details.
But those problems only arise if the *client* needs to deal with the
binary format. What I envision is parallelizing that conversion step on
the server, controlled by a backend process, kind of like a filter
between the server and the client.
Upon reception of a COPY INTO command, a backend would
.) Retrieve all catalog information required to convert a plain-text
line into a tuple
.) Fork off a "dealer" and N "worker" processes that take over the
client connection. The "dealer" distributes lines received from the
client to the N workers, while the original backend receives them
as tuples back from the workers.
Neither the "dealer" nor the "workers" would need access to either
the shared memory or the disk, thereby not messing with the "one backend
is one transaction is one session" dogma.
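To make that a bit more concrete, here is a rough stand-alone sketch of the
process layout, written completely outside the backend. Everything in it is
made up for illustration: convert_line_to_tuple() stands in for calling the
real datatype input functions, stdin stands in for the client connection, and
the parent drains the workers one after another, so a real implementation
would have to multiplex those reads with select() or similar.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

#define NUM_WORKERS 4

/* Stand-in for the expensive plain-text -> tuple conversion. */
static void
convert_line_to_tuple(const char *line, char *out, size_t outlen)
{
    snprintf(out, outlen, "tuple(%s)", line);
}

int
main(void)
{
    int to_worker[NUM_WORKERS][2];      /* dealer -> worker */
    int to_backend[NUM_WORKERS][2];     /* worker -> original backend */
    int i, j;

    /* Create all pipes up front so each child can close what it doesn't use. */
    for (i = 0; i < NUM_WORKERS; i++)
        if (pipe(to_worker[i]) < 0 || pipe(to_backend[i]) < 0)
            exit(1);

    /* The N workers: read raw lines, convert them, send "tuples" back. */
    for (i = 0; i < NUM_WORKERS; i++)
        if (fork() == 0)
        {
            FILE   *in = fdopen(to_worker[i][0], "r");
            FILE   *out = fdopen(to_backend[i][1], "w");
            char    line[1024], tuple[1100];

            /* Close the write ends we don't own, so EOF can propagate. */
            for (j = 0; j < NUM_WORKERS; j++)
            {
                close(to_worker[j][1]);
                if (j != i)
                    close(to_backend[j][1]);
            }
            while (fgets(line, sizeof(line), in) != NULL)
            {
                line[strcspn(line, "\n")] = '\0';
                convert_line_to_tuple(line, tuple, sizeof(tuple));
                fprintf(out, "%s\n", tuple);
            }
            fclose(out);
            _exit(0);
        }

    /* The dealer: distribute "client" lines (stdin here) round-robin. */
    if (fork() == 0)
    {
        char    line[1024];
        int     next = 0;

        for (j = 0; j < NUM_WORKERS; j++)
            close(to_backend[j][1]);
        while (fgets(line, sizeof(line), stdin) != NULL)
        {
            if (write(to_worker[next][1], line, strlen(line)) < 0)
                _exit(1);
            next = (next + 1) % NUM_WORKERS;
        }
        _exit(0);               /* releases the to_worker write ends */
    }

    /* The original backend: collect the converted tuples. */
    for (j = 0; j < NUM_WORKERS; j++)
    {
        close(to_worker[j][1]);
        close(to_backend[j][1]);
    }
    for (i = 0; i < NUM_WORKERS; i++)
    {
        FILE   *in = fdopen(to_backend[i][0], "r");
        char    tuple[1200];

        while (fgets(tuple, sizeof(tuple), in) != NULL)
            printf("backend received: %s", tuple);
        fclose(in);
    }
    while (wait(NULL) > 0)
        ;
    return 0;
}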
Now I'm eagerly waiting to hear all the reasons why this idea is broken
as hell ;-)
regards, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes:
...
Neither the "dealer" nor the "workers" would need access to either
the shared memory or the disk, thereby not messing with the "one backend
is one transaction is one session" dogma.
...
Unfortunately, this idea has far too narrow a view of what a datatype
input function might do. Just for starters, consider "enum" input,
which certainly requires catalog access. We have also explicitly
acknowledged the idea that datatype I/O functions might try to store
typmod-related data in some special catalog somewhere.
regards, tom lane
Hi,
On Wednesday, 27 February 2008, Florian G. Pflug wrote:
Upon reception of a COPY INTO command, a backend would
.) Fork off a "dealer" and N "worker" processes that take over the
client connection. The "dealer" distributes lines received from the
client to the N workers, while the original backend receives them
as tuples back from the workers.
This looks so much like what pgloader does now (version 2.3.0~dev2, release
candidate) at the client side, when configured for it, that I can't help
answering the mail :)
http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading
section_threads = N
split_file_reading = False
Of course, the backends still have to parse the input given by pgloader, which
only pre-processes data. I'm not sure having the client prepare the data some
more (binary format or whatever) is a wise idea, as you mentioned and wrt
Tom's follow-up. But maybe I'm all wrong, so I'm all ears!
Regards,
--
dim
On Wed, 2008-02-27 at 09:09 +0100, Dimitri Fontaine wrote:
Hi,
On Wednesday, 27 February 2008, Florian G. Pflug wrote:
Upon reception of a COPY INTO command, a backend would
.) Fork off a "dealer" and N "worker" processes that take over the
client connection. The "dealer" distributes lines received from the
client to the N workers, while the original backend receives them
as tuples back from the workers.
This looks so much like what pgloader does now (version 2.3.0~dev2, release
candidate) at the client side, when configured for it, that I can't help
answering the mail :)
http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading
section_threads = N
split_file_reading = False
Of course, the backends still have to parse the input given by pgloader, which
only pre-processes data. I'm not sure having the client prepare the data some
more (binary format or whatever) is a wise idea, as you mentioned and wrt
Tom's follow-up. But maybe I'm all wrong, so I'm all ears!
ISTM the external parallelization approach is more likely to help us
avoid bottlenecks, so I support Dimitri's approach.
We also need error handling, which pgloader already has.
Writing error handling and parallelization into COPY isn't going to be
easy, and it's not very justifiable either if we already have both.
There might be a reason to rewrite it in C one day, but that will be a
fairly easy task if we ever need to do it.
--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
Tom Lane wrote:
"Florian G. Pflug" <fgp@phlo.org> writes:
...
Neither the "dealer" nor the "workers" would need access to either
the shared memory or the disk, thereby not messing with the "one backend
is one transaction is one session" dogma.
...
Unfortunately, this idea has far too narrow a view of what a datatype
input function might do. Just for starters, consider "enum" input,
which certainly requires catalog access. We have also explicitly
acknowledged the idea that datatype I/O functions might try to store
typmod-related data in some special catalog somewhere.
Hm... how many in-core datatypes are there which need catalog access in
their input or output functions? Maybe we could change the API for
I/O functions in a way that allows us to request all needed information
to be cached?
regards, Florian Pflug
Dimitri Fontaine wrote:
Of course, the backends still have to parse the input given by pgloader, which
only pre-processes data. I'm not sure having the client prepare the data some
more (binary format or whatever) is a wise idea, as you mentioned and wrt
Tom's follow-up. But maybe I'm all wrong, so I'm all ears!
As far as I understand, pgloader starts N threads or processes that open
up N individual connections to the server. In that case, moving the
text->binary conversion from the backend into the loader won't give any
additional performance, I'd say.
The reason that I'd love some within-one-backend solution is that it'd
allow you to utilize more than one CPU for a restore within a *single*
transaction. This is something that a client-side solution won't be able
to deliver, unless major changes to the architecture of postgres happen
first...
regards, Florian Pflug
On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote:
Dimitri Fontaine wrote:
Of course, the backends still have to parse the input given by
pgloader, which only pre-processes data. I'm not sure having the
client prepare the data some more (binary format or whatever) is a
wise idea, as you mentioned and wrt Tom's follow-up. But maybe I'm
all wrong, so I'm all ears!
As far as I understand, pgloader starts N threads or processes that
open up N individual connections to the server. In that case, moving
the text->binary conversion from the backend into the loader won't
give any additional performance, I'd say.
The reason that I'd love some within-one-backend solution is that
it'd allow you to utilize more than one CPU for a restore within a
*single* transaction. This is something that a client-side solution
won't be able to deliver, unless major changes to the architecture
of postgres happen first...
It seems like multiple backends should be able to take advantage of
2PC for transaction safety.
Cheers,
M
Tom Lane wrote:
"Florian G. Pflug" <fgp@phlo.org> writes:
...
Neither the "dealer" nor the "workers" would need access to either
the shared memory or the disk, thereby not messing with the "one backend
is one transaction is one session" dogma.
...
Unfortunately, this idea has far too narrow a view of what a datatype
input function might do. Just for starters, consider "enum" input,
which certainly requires catalog access. We have also explicitly
acknowledged the idea that datatype I/O functions might try to store
typmod-related data in some special catalog somewhere.
regards, tom lane
Would it be possible to determine, when the copy is starting, whether this
case holds, and not use the parallel parsing idea in those cases?
I'm a big user of copy, generally into very simple tables- few indexes,
simple column types (numeric, varchar, and int almost exclusively), no
fancy features. A parallel copy input in the "simple" cases would be of
great advantage to me, even if it doesn't parallelize "complicated" cases.
Brian
A.M. wrote:
On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote:
The reason that I'd love some within-one-backend solution is that it'd
allow you to utilize more than one CPU for a restore within a *single*
transaction. This is something that a client-side solution won't be
able to deliver, unless major changes to the architecture of postgres
happen first...
It seems like multiple backends should be able to take advantage of 2PC
for transaction safety.
Yeah, but it wouldn't take advantage of, say, the hack to disable WAL
when the table was created in the same transaction.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
A.M. wrote:
On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote:
Dimitri Fontaine wrote:
Of course, the backends still have to parse the input given by
pgloader, which only pre-processes data. I'm not sure having the
client prepare the data some more (binary format or whatever) is a
wise idea, as you mentioned and wrt Tom's follow-up. But maybe I'm
all wrong, so I'm all ears!
As far as I understand, pgloader starts N threads or processes that
open up N individual connections to the server. In that case, moving
the text->binary conversion from the backend into the loader won't
give any additional performance, I'd say.
The reason that I'd love some within-one-backend solution is that
it'd allow you to utilize more than one CPU for a restore within a *single*
transaction. This is something that a client-side solution won't be
able to deliver, unless major changes to the architecture of postgres
happen first...
It seems like multiple backends should be able to take advantage of 2PC
for transaction safety.
Yes, whatever is coordinating the multiple backends (a master backend? I
haven't followed this thread closely) would then have to have logic to
finish the prepared transactions if you crash after you've committed one
but not all of them. IOW, it would need a mini transaction log of its own.
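The happy path of such a coordinator might look roughly like this with libpq
and two-phase commit. Everything concrete here is made up (connection string,
table name, GID format), it assumes max_prepared_transactions > 0, and a real
coordinator would also have to record the GIDs durably so that it can COMMIT
PREPARED or ROLLBACK PREPARED any leftovers after a crash, as described above.

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

#define NUM_LOADERS 2

static void
die(PGconn *conn, const char *what)
{
    fprintf(stderr, "%s failed: %s\n", what, PQerrorMessage(conn));
    exit(1);
}

static void
exec_or_die(PGconn *conn, const char *sql)
{
    PGresult   *res = PQexec(conn, sql);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        die(conn, sql);
    PQclear(res);
}

int
main(void)
{
    PGconn *conns[NUM_LOADERS];
    char    sql[256];
    int     i;

    for (i = 0; i < NUM_LOADERS; i++)
    {
        conns[i] = PQconnectdb("dbname=test");      /* made-up connection string */
        if (PQstatus(conns[i]) != CONNECTION_OK)
            die(conns[i], "connect");

        /* Each loader does its share of the work in its own transaction. */
        exec_or_die(conns[i], "BEGIN");
        snprintf(sql, sizeof(sql),
                 "INSERT INTO target_table VALUES (%d)", i);  /* stand-in for a COPY chunk */
        exec_or_die(conns[i], sql);

        /* Phase 1: prepare -- the work is durable, but not yet visible. */
        snprintf(sql, sizeof(sql), "PREPARE TRANSACTION 'bulkload_part_%d'", i);
        exec_or_die(conns[i], sql);
    }

    /* Phase 2: only once every part is prepared, commit them all. */
    for (i = 0; i < NUM_LOADERS; i++)
    {
        snprintf(sql, sizeof(sql), "COMMIT PREPARED 'bulkload_part_%d'", i);
        exec_or_die(conns[i], sql);
        PQfinish(conns[i]);
    }
    return 0;
}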
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Brian Hurt wrote:
Tom Lane wrote:
"Florian G. Pflug" <fgp@phlo.org> writes:
...
Neither the "dealer" nor the "workers" would need access to either
the shared memory or the disk, thereby not messing with the "one backend
is one transaction is one session" dogma.
...
Unfortunately, this idea has far too narrow a view of what a datatype
input function might do. Just for starters, consider "enum" input,
which certainly requires catalog access. We have also explicitly
acknowledged the idea that datatype I/O functions might try to store
typmod-related data in some special catalog somewhere.
Would it be possible to determine, when the copy is starting, whether this
case holds, and not use the parallel parsing idea in those cases?
In theory, yes. In practice, I don't want to be the one who has to answer
to an angry user who just suffered a major drop in COPY performance
after adding an ENUM column to his table.
I was thinking more along the lines of letting a datatype specify a
function "void* ioprepare(typmod)" which returns some opaque object
specifying all that the input and output functions need to know.
We could then establish the rule that input/output functions may not
access the catalog, and instead pass them a pointer to that opaque object.
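To illustrate the shape of the API I have in mind (purely hypothetical, none
of these names exist in the backend, and the typedefs at the top are just
stand-ins to keep the sketch self-contained):

#include <stddef.h>

/* Stand-ins for the usual PostgreSQL typedefs, just for this sketch. */
typedef unsigned int Oid;
typedef int int32;
typedef unsigned long Datum;

/*
 * Built once per COPY column in the main backend, where full catalog access
 * is available.  Whatever ends up in here gets handed to the worker
 * processes, so it should be a flat, copyable blob.
 */
typedef struct PreparedTypeIO
{
    void   *opaque;             /* type-specific cached catalog data */
    size_t  size;               /* size of the blob, so it can be shipped around */
} PreparedTypeIO;

/* Runs in the main backend; may do arbitrary catalog lookups. */
typedef PreparedTypeIO *(*ioprepare_fn) (Oid typid, int32 typmod);

/*
 * Runs in a worker: no catalog or shared-memory access allowed, everything
 * it needs has to come from the prepared blob.  An enum type, for example,
 * could put a sorted label->ordinal table into the blob and resolve labels
 * with a binary search instead of a syscache lookup.
 */
typedef Datum (*input_fn_prepared) (const char *value,
                                    const PreparedTypeIO *prepared);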
All pretty pie-in-the-sky at the moment, though...
regards, Florian Pflug
On Wed, Feb 27, 2008 at 9:26 PM, Florian G. Pflug <fgp@phlo.org> wrote:
I was thinking more along the lines of letting a datatype specify a
function "void* ioprepare(typmod)" which returns some opaque object
specifying all that the input and output functions need to know.
We could then establish the rule that input/output functions may not
access the catalog, and instead pass them a pointer to that opaque object.
Callers of IO functions don't always know which type they're dealing
with - we had to go to some lengths to pass type information along
with the enum value itself so that it could be looked up in the
syscache in the output function. I think the main culprits are the
P/Ls, but I think there was a security-related concern about passing
the type through to the IO function as well. If you want to do
something like this, it would certainly be possible to cache the enum
info for a particular type, but you might want to have a separate set
of IO functions just for this case.
On the plus side, if such a cache were to be used by IO generally, we
could reimplement enums to just store the ordinal on disk and save a
couple of bytes, like I wanted but was unable to do the first time
around. :)
Enums are an easy case, though, as there's very little data to deal
with. I don't know about other UDTs out there - do any require more
extensive catalog access?
Cheers
Tom
Alvaro Herrera <alvherre@commandprompt.com> writes:
Yeah, but it wouldn't take advantage of, say, the hack to disable WAL
when the table was created in the same transaction.
In the context of a parallelizing pg_restore this would be fairly easy
to get around. We could probably teach the thing to combine table
creation and loading steps into one action (transaction) in most cases.
If that couldn't work because of some weird dependency chain, the
data loading transaction could be done as
BEGIN;
TRUNCATE table;
COPY table FROM stdin;
...
COMMIT;
which I believe already invokes the no-WAL optimization, and could
certainly be made to do so if not.
Obviously, pg_restore would have to be aware of whether or not it had
created that table in the current run (else it mustn't TRUNCATE),
but it would be tracking more or less exactly that info anyway to handle
dependency ordering.
regards, tom lane
Florian G. Pflug wrote:
Would it be possible to determine, when the copy is starting, whether this
case holds, and not use the parallel parsing idea in those cases?
In theory, yes. In practice, I don't want to be the one who has to
answer to an angry user who just suffered a major drop in COPY
performance after adding an ENUM column to his table.
I am yet to be convinced that this is even theoretically a good path to
follow. Any sufficiently large table could probably be partitioned and
then we could use the parallelism that is being discussed for pg_restore
without any modification to the backend at all. Similar tricks could be
played by an external bulk loader for third party data sources.
cheers
andrew
Andrew Dunstan wrote:
Florian G. Pflug wrote:
Would it be possible to determine, when the copy is starting, whether
this case holds, and not use the parallel parsing idea in those cases?
In theory, yes. In practice, I don't want to be the one who has to
answer to an angry user who just suffered a major drop in COPY
performance after adding an ENUM column to his table.
I am yet to be convinced that this is even theoretically a good path
to follow. Any sufficiently large table could probably be partitioned
and then we could use the parallelism that is being discussed for
pg_restore without any modification to the backend at all. Similar
tricks could be played by an external bulk loader for third party data
sources.
I was just floating this as an idea - I don't know enough about the
backend to know if it was a good idea or not; it sounds like "not".
Brian
Andrew Dunstan wrote:
Florian G. Pflug wrote:
Would it be possible to determine, when the copy is starting, whether this
case holds, and not use the parallel parsing idea in those cases?
In theory, yes. In practice, I don't want to be the one who has to
answer to an angry user who just suffered a major drop in COPY
performance after adding an ENUM column to his table.
I am yet to be convinced that this is even theoretically a good path to
follow. Any sufficiently large table could probably be partitioned and
then we could use the parallelism that is being discussed for pg_restore
without any modification to the backend at all. Similar tricks could be
played by an external bulk loader for third party data sources.
That assumes that some specific bulk loader like pg_restore, pgloader
or similar is used to perform the load. Plain libpq users would either
need to duplicate the logic these loaders contain, or wouldn't be able
to take advantage of fast loads.
Plus, I'd see this as a kind of testbed for gently introducing
parallelism into postgres backends (especially thinking about sorting
here). CPUs gain more and more cores, so in the long run I fear that we
will have to find ways to utilize more than one of those to execute a
single query.
But of course the architectural details need to be sorted out before any
credible judgement about the feasibility of this idea can be made...
regards, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes:
Plus, I'd see this as a kind of testbed for gently introducing
parallelism into postgres backends (especially thinking about sorting
here).
This thinking is exactly what makes me scream loudly and run in the
other direction. I don't want threads introduced into the backend,
whether "gently" or otherwise. The portability and reliability hits
that we'll take are too daunting. Threads that invoke user-defined
code (as anything involved with datatype-specific operations must)
are especially fearsome, as there is precisely 0 chance of that code
being thread-safe.
regards, tom lane
Tom Lane wrote:
"Florian G. Pflug" <fgp@phlo.org> writes:
Plus, I'd see this as a kind of testbed for gently introducing
parallelism into postgres backends (especially thinking about sorting
here).
This thinking is exactly what makes me scream loudly and run in the
other direction. I don't want threads introduced into the backend,
whether "gently" or otherwise. The portability and reliability hits
that we'll take are too daunting. Threads that invoke user-defined
code (as anything involved with datatype-specific operations must)
are especially fearsome, as there is precisely 0 chance of that code
being thread-safe.
Exactly my thinking. That is why I was looking for a way to introduce
parallelism *without* threading. Though it's not so much the
user-defined code that scares me, but rather the portability issues. The
differences between NPTL and non-NPTL threads on Linux alone make me
shudder...
What I was saying is that there might be a chance to get some parallelism
without threading, by executing well-defined pieces of code with
controlled dependencies in separate processes. COPY seemed like an ideal
testbed for that idea, since the conversion of received lines into
tuples seemed reasonably self-contained, with few outside
dependencies. If the idea can't be made to work there, it probably won't
work anywhere. If it turns out that it does (with an API change for
input/output functions) however, then it *might* be possible to apply it
to other relatively self-contained parts in the future...
To restate, I don't want threaded backends. Not in the foreseeable
future at least. But I'd still love to see a single transaction using
more than one core.
regards, Florian Pflug