Partitioning option for COPY
Hi all,
I have extracted the partitioning option for COPY (removed the error
logging part) from the previous patch. The documentation and test suite
sample are provided as well.
More details are on the wiki page at
http://wiki.postgresql.org/wiki/Auto-partitioning_in_COPY. Ignore the
error logging related comments that do not apply here.
Looking forward to your feedback
Emmanuel
--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
Attachments:
aster-copy-partitioning-patch-8.5-context.txttext/plain; name=aster-copy-partitioning-patch-8.5-context.txtDownload+1242-77
Emmanuel Cecchet <manu@asterdata.com> wrote:
I have extracted the partitioning option for COPY (removed the error
logging part) from the previous patch.
We can use an INSERT trigger to route tuples into partitions even now.
Why do you need an additional router for COPY? Also, it would be nicer
that the router can works not only in COPY but also in INSERT.
BTW, I'm working on meta data of partitioning now. Your "partitioning"
option in COPY could be replaced with the catalog.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Hi,
I have extracted the partitioning option for COPY (removed the error
logging part) from the previous patch.We can use an INSERT trigger to route tuples into partitions even now.
Why do you need an additional router for COPY?
Tom has already explained on the list why using a trigger was a bad idea
(and I know we can use a trigger since I am the one who wrote it).
If you look at the code you will see that you can do optimizations in
the COPY code that you cannot do in the trigger.
Also, it would be nicer
that the router can works not only in COPY but also in INSERT.
As 8.5 will at best provide a syntactic hack on top of the existing
constraint implementation, I think that it will not hurt to have routing
in COPY since we will not have it anywhere otherwise.
BTW, I'm working on meta data of partitioning now. Your "partitioning"
option in COPY could be replaced with the catalog.
This implementation is only for the current 8.5 and it will not be
needed anymore once we get a fully functional partitioning in Postgres
which seems to be for a future version.
Best regards,
Emmanuel
--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
Emmanuel Cecchet <manu@asterdata.com> wrote:
If you look at the code you will see that you can do optimizations in
the COPY code that you cannot do in the trigger.
Since the optimizations is nice, I hope it will work not only in COPY
but also in INSERT. An idea is moving the partitioning cache into
Relation cache, and also moving the routing routines into heap_insert().
My concern is just in the modified position; I think you don't have to
change your logic itself.
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
We can use an INSERT trigger to route tuples into partitions even now.
Why do you need an additional router for COPY? Also, it would be nicer
that the router can works not only in COPY but also in INSERT.
Yeah, but performance on an insert trigger is impractical for large
volumes of data.
--Josh Berkus
Emmanuel Cecchet wrote:
Hi all,
Hi!,
partitioning option for COPY
Here's the review:
== Submission ==
The patch is contextual, applies cleanly to current HEAD, compiles fine.
The docs build cleanly.
== Docs ==
They're reasonably clear, although they still mention ERROR_LOGGING,
which was taken out of this patch. They could use some wordsmithing, but
I didn't go into details, as there were more severe issues with the patch.
One thing that made me cautious was the mention that triggers modifying
tuples will make random errors appear. As is demonstrated later,
triggers are a big issue.
== Regression tests ==
They ran fine, there's one additional regression test that exercises the
new option.
== Style/nitpicks ==
Minor gripes include:
o instead of using an ad-hoc data structure for the LRU cache list, I'd
suggest an OidList from pg_list.h.
o some mentions of "method" in comments should be changed to "function"
o trailing whitespace in the patch (it's not the end of the world, of
course)
== Issues ==
Attached are 3 files that demonstrate problems the patch has.
o test1.sql always segfaults for me, poking around with gdb suggests
it's a case of an uninitialised cache list (another reason to use the
builtin one).
o test2.sql demonstrates, that indices on child tables are not being
updated, probably because after resultRelInfo in
check_tuple_constraints() gets created is never has ri_NumIndices set,
and so the code that was supposed to take care of indices is never
called. Looks like a copy-paste error.
o test3.sql demonstrates, that some triggers that I would expect to be
fired are in fact not fired. I guess it's the same reason as mentioned:
ri_TrigDesc never gets set, so the code that calls triggers is dead.
I stopped there, because unfortunately, apart from all that there's one
fundamental problem with this patch, namely "we probably don't want it".
As it stands it's more of a proof of concept than a really usable
solution, it feels like built from spare (copied from around copy.c)
parts. IMHO it's much too narrow for a general partitioning solution,
even if the design it's based upon would be accepted. It's assuming a
lot of things about the presence of child tables (with proper
constraints), the absence of triggers, and so on.
Granted, it solves a particular problem (bulk loading into a partitioned
table, with not extra features like triggers and with standard
inheritance/exclusive check constraints setup), but that's not good
enough in my opinion, even if all other issues would be addressed.
Now I'm not a real Postgres user, it's been a while since I worked in a
PG shop (or a DB shop for that matter), but from what I understand from
following this community for a while, a patch like that doesn't have a
lot of chances to be committed. That said, my puny experience with real
PG installations and their needs must be taken into account here.
I'll mark this patch as "Waiting on Author", but I have little doubt
that even after fixing those probably trivial segfaults etc. the patch
would be promptly rejected by a committer. I suggest withdrawing it from
this commitfest and trying to work out a more complete design first that
would address the needs of a bigger variety of users, or joining some of
the already underway efforts to bring full-featured partitioning into
Postgres.
Best,
Jan
Jan Urbański wrote:
Emmanuel Cecchet wrote:
Hi all,
Hi!,
partitioning option for COPY
Attached are 3 files that demonstrate problems the patch has.
And the click-before-you-think prize winner is... me.
Test cases attached, see the comments for expected/actual results.
Jan
Jan,
Here is a new version of the patch. Find the response to your comments
embedded in the text.
partitioning option for COPY
Here's the review:
== Submission ==
The patch is contextual, applies cleanly to current HEAD, compiles fine.
The docs build cleanly.== Docs ==
They're reasonably clear, although they still mention ERROR_LOGGING,
which was taken out of this patch. They could use some wordsmithing, but
I didn't go into details, as there were more severe issues with the patch.
Removed the text related to ERROR_LOGGING.
One thing that made me cautious was the mention that triggers modifying
tuples will make random errors appear. As is demonstrated later,
triggers are a big issue.
Whichever way routing is implemented we will have to decide what we want
to do with triggers. We can decide to fire them or not (there was
already a debate whether COPY is an insert statement or not and should
fire the statement trigger for insert). This is not a design problem
with this patch, we just have to chose what we want to do with triggers
when partitioning is involved. IMHO we should disable them altogether
but there are scenarios where one could argue that there are still useful.
== Regression tests ==
They ran fine, there's one additional regression test that exercises the
new option.== Style/nitpicks ==
Minor gripes include:
o instead of using an ad-hoc data structure for the LRU cache list, I'd
suggest an OidList from pg_list.h.
Will do if we decide to go further with this patch.
o some mentions of "method" in comments should be changed to "function"
o trailing whitespace in the patch (it's not the end of the world, of
course)
I guess the committer will run pg_indent anyway so I'm not too worried
about spaces.
== Issues ==
Attached are 3 files that demonstrate problems the patch has.
o test1.sql always segfaults for me, poking around with gdb suggests
it's a case of an uninitialised cache list (another reason to use the
builtin one).
I was never able to reproduce that problem. I don't know where this
comes from.
o test2.sql demonstrates, that indices on child tables are not being
updated, probably because after resultRelInfo in
check_tuple_constraints() gets created is never has ri_NumIndices set,
and so the code that was supposed to take care of indices is never
called. Looks like a copy-paste error.
Fixed, actually there was a leak in relcache for the index.
o test3.sql demonstrates, that some triggers that I would expect to be
fired are in fact not fired. I guess it's the same reason as mentioned:
ri_TrigDesc never gets set, so the code that calls triggers is dead.
There is a problem with after row triggers that I did not completely
figure out. For some reason, if I use the regular mechanism by calling
ExecARInsertTrigger that differ the execution of the trigger until the
after row event is triggered, the child relation is not closed and there
is a leak in the relcache. I forced the after row triggers to execute
synchronously after inserting in the child table to work around the
problem. If someone has an explanation, I am willing to do a cleaner
implementation!
I stopped there, because unfortunately, apart from all that there's one
fundamental problem with this patch, namely "we probably don't want it".As it stands it's more of a proof of concept than a really usable
solution, it feels like built from spare (copied from around copy.c)
parts. IMHO it's much too narrow for a general partitioning solution,
even if the design it's based upon would be accepted. It's assuming a
lot of things about the presence of child tables (with proper
constraints), the absence of triggers, and so on.Granted, it solves a particular problem (bulk loading into a partitioned
table, with not extra features like triggers and with standard
inheritance/exclusive check constraints setup), but that's not good
enough in my opinion, even if all other issues would be addressed.
Well, as Postgres does not have any support for real partitioning
besides inheritance, and so far it is unlikely that another
implementation will happen in the 8.5 timeframe, this feature fills the
need for people doing data warehouses. This is a scenario used with
every single Aster customer. Now if the Postgres community does not
think that the Aster use case is general enough or of interest to be
integrated in the code base, this is a different issue and I won't spent
time arguing if this is a philosophical/political issue.
Note that the new patch works with triggers but you can easily generate
corrupt data if your triggers are modifying the data on which the
routing decision is based.
Now I'm not a real Postgres user, it's been a while since I worked in a
PG shop (or a DB shop for that matter), but from what I understand from
following this community for a while, a patch like that doesn't have a
lot of chances to be committed. That said, my puny experience with real
PG installations and their needs must be taken into account here.
I don't really understand why a new option of COPY should be solving a
general problem. It's an option, and like every option, it is to solve a
particular use case. I don't see what is wrong with that.
I'll mark this patch as "Waiting on Author", but I have little doubt
that even after fixing those probably trivial segfaults etc. the patch
would be promptly rejected by a committer. I suggest withdrawing it from
this commitfest and trying to work out a more complete design first that
would address the needs of a bigger variety of users, or joining some of
the already underway efforts to bring full-featured partitioning into
Postgres.
I have integrated your tests in the regression test suite and I was
never able to reproduce the segfault you mentioned. What platform are
you using?
Thanks for your valuable feedback
Emmanuel
--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
Attachments:
aster-copy-partitioning-patch-8.5-contextv2.txttext/plain; name=aster-copy-partitioning-patch-8.5-contextv2.txtDownload+1421-72
Hi,
I'll hopefully look at the next version of the patch tommorrow.
Emmanuel Cecchet wrote:
o test1.sql always segfaults for me, poking around with gdb suggests
it's a case of an uninitialised cache list (another reason to use the
builtin one).I was never able to reproduce that problem. I don't know where this
comes from.
I have integrated your tests in the regression test suite and I was
never able to reproduce the segfault you mentioned. What platform are
you using?
In the meantime I tried the test1.sql file again and it still segfaulted
for me.
I'm using 32bit Linux, PG compiled with:
$ ./configure CFLAGS=-O0 --enable-cassert --enable-debug --without-perl
--without-python --without-openssl --without-tcl
and then I start postmaster, fire up psql, attach gdb to the backend, do
\i test1.sql and get:
Program received signal SIGSEGV, Segmentation fault.
0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
1821 child_relation_id =
child_oid_cell->oid_value;
(gdb) bt
#0 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
#1 0x081950e3 in CopyFrom (cstate=0x871e0dc) at copy.c:2480
#2 0x08192532 in DoCopy (stmt=0x86fb144, queryString=0x86fa73c "copy
parent from stdin with (partitioning);") at copy.c:1227
(gdb) p child_oid_cell
$1 = (OidCell *) 0x7f7f7f7f
(gdb) p child_oid_cell->oid_value
Cannot access memory at address 0x7f7f7f7f
That 0x7f7f7f7f looks like clobbered memory, the memory management funcs
do that when cassert is enabled, IIRC.
Cheers,
Jan
--
Jan Urbanski
GPG key ID: E583D7D2
ouden estin
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <wulczer@wulczer.org> writes:
Program received signal SIGSEGV, Segmentation fault.
0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
1821 child_relation_id =
child_oid_cell->oid_value;
(gdb) p child_oid_cell
$1 = (OidCell *) 0x7f7f7f7f
This looks like the patch is trying to create a data structure in a
memory context that's not sufficiently long-lived for the use of the
structure. If you do this in a non-cassert build, it will seem to
work, some of the time, if the memory in question happens to not
get reallocated to something else.
A good rule of thumb is to never do code development in a non-cassert
build. You're just setting yourself up for failure.
regards, tom lane
Tom Lane wrote:
A good rule of thumb is to never do code development in a non-cassert
build.
And the same rule goes for review, too; I'll update the review
guidelines to spell that out more clearly. Basically, if you're doing
any work on new code, you should have cassert turned on, *except* if
you're doing performance testing. The asserts slow things down enough
(particularly with large shared_buffers values) to skew performance
tests, but in all other coding situations you should have them enabled.
--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com
Tom Lane wrote:
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <wulczer@wulczer.org> writes:
Program received signal SIGSEGV, Segmentation fault.
0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
1821 child_relation_id =
child_oid_cell->oid_value;
(gdb) p child_oid_cell
$1 = (OidCell *) 0x7f7f7f7fThis looks like the patch is trying to create a data structure in a
memory context that's not sufficiently long-lived for the use of the
structure. If you do this in a non-cassert build, it will seem to
work, some of the time, if the memory in question happens to not
get reallocated to something else.
I was using the CacheMemoryContext. Could someone tell me why this is
wrong and what should have been the appropriate context to use?
Thanks
Emmanuel
--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
Emmanuel Cecchet <manu@asterdata.com> writes:
Tom Lane wrote:
This looks like the patch is trying to create a data structure in a
memory context that's not sufficiently long-lived for the use of the
structure. If you do this in a non-cassert build, it will seem to
work, some of the time, if the memory in question happens to not
get reallocated to something else.I was using the CacheMemoryContext. Could someone tell me why this is
wrong and what should have been the appropriate context to use?
Well, (a) I doubt you really were creating the list in
CacheMemoryContext, else it'd have not gotten clobbered; (b) creating
statement-local data structures in CacheMemoryContext is entirely
unacceptable anyway, because then they represent a permanent memory
leak.
The right context for statement-lifetime data structures is generally
the CurrentMemoryContext the statement code is called with.
regards, tom lane
Tom Lane wrote:
Emmanuel Cecchet <manu@asterdata.com> writes:
Tom Lane wrote:
This looks like the patch is trying to create a data structure in a
memory context that's not sufficiently long-lived for the use of the
structure. If you do this in a non-cassert build, it will seem to
work, some of the time, if the memory in question happens to not
get reallocated to something else.I was using the CacheMemoryContext. Could someone tell me why this is
wrong and what should have been the appropriate context to use?Well, (a) I doubt you really were creating the list in
CacheMemoryContext, else it'd have not gotten clobbered; (b) creating
statement-local data structures in CacheMemoryContext is entirely
unacceptable anyway, because then they represent a permanent memory
leak.
Well I thought that this code would do it:
child_table_lru = (OidLinkedList *)MemoryContextAlloc(
+ CacheMemoryContext, sizeof(OidLinkedList));
...
+ /* Add the new entry in head of the list */
+ new_head = (OidCell *) MemoryContextAlloc(
+ CacheMemoryContext, sizeof(OidCell));
The right context for statement-lifetime data structures is generally
the CurrentMemoryContext the statement code is called with.
Actually the list is supposed to stay around between statement
executions. You don't want to restart with a cold cache at every
statement so I really want this structure to stay in memory at a more
global level.
Emmanuel
--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
Emmanuel Cecchet <manu@asterdata.com> writes:
Actually the list is supposed to stay around between statement
executions. You don't want to restart with a cold cache at every
statement so I really want this structure to stay in memory at a more
global level.
Cache? Why do you need a cache for COPY? Repeated bulk loads into the
same table within a single session doesn't seem to me to be a case that
is common enough to justify a cache.
(BTW, the quoted code seems to be busily reinventing OID Lists. Don't
do that.)
regards, tom lane
Tom Lane wrote:
Emmanuel Cecchet <manu@asterdata.com> writes:
Actually the list is supposed to stay around between statement
executions. You don't want to restart with a cold cache at every
statement so I really want this structure to stay in memory at a more
global level.Cache? Why do you need a cache for COPY? Repeated bulk loads into the
same table within a single session doesn't seem to me to be a case that
is common enough to justify a cache.
Actually the cache is only activated if you use the partitioning option.
It is just a list of oids of child tables where tuples were inserted.
It is common to have multiple COPY operations in the same session when
you are doing bulk loading in a warehouse.
(BTW, the quoted code seems to be busily reinventing OID Lists. Don't
do that.)
Yes, I understood that I should use an OidList instead. But I was trying
to understand what I did wrong here (besides reinventing the oid list ;-)).
Why do I get this segfault if I use memory from CacheMemoryContext?
Emmanuel
--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com
Emmanuel Cecchet <manu@asterdata.com> writes:
Tom Lane wrote:
Cache? Why do you need a cache for COPY?
Actually the cache is only activated if you use the partitioning option.
It is just a list of oids of child tables where tuples were inserted.
Umm ... why is that useful enough to be cached?
Why do I get this segfault if I use memory from CacheMemoryContext?
Well, CacheMemoryContext will never be reset, so either you freed the
data structure yourself or there's something wrong with the pointer
you think is pointing at the data structure ...
regards, tom lane
Hi Jan,
Here is a new version of the patch with the following modifications:
- used oid list from pg_list.h
- properly handles triggers and generate an error if needed (updated doc
as well)
- added your test cases + extra bad trigger cases
Emmanuel
Hi,
I'll hopefully look at the next version of the patch tommorrow.
Emmanuel Cecchet wrote:
o test1.sql always segfaults for me, poking around with gdb suggests
it's a case of an uninitialised cache list (another reason to use the
builtin one).I was never able to reproduce that problem. I don't know where this
comes from.I have integrated your tests in the regression test suite and I was
never able to reproduce the segfault you mentioned. What platform are
you using?In the meantime I tried the test1.sql file again and it still segfaulted
for me.
I'm using 32bit Linux, PG compiled with:$ ./configure CFLAGS=-O0 --enable-cassert --enable-debug --without-perl
--without-python --without-openssl --without-tcland then I start postmaster, fire up psql, attach gdb to the backend, do
\i test1.sql and get:Program received signal SIGSEGV, Segmentation fault.
0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
1821 child_relation_id =
child_oid_cell->oid_value;
(gdb) bt
#0 0x0819368b in route_tuple_to_child (parent_relation=0xb5d93040,
tuple=0x873b08c, hi_options=0, parentResultRelInfo=0x871e204) at copy.c:1821
#1 0x081950e3 in CopyFrom (cstate=0x871e0dc) at copy.c:2480
#2 0x08192532 in DoCopy (stmt=0x86fb144, queryString=0x86fa73c "copy
parent from stdin with (partitioning);") at copy.c:1227(gdb) p child_oid_cell
$1 = (OidCell *) 0x7f7f7f7f(gdb) p child_oid_cell->oid_value
Cannot access memory at address 0x7f7f7f7fThat 0x7f7f7f7f looks like clobbered memory, the memory management funcs
do that when cassert is enabled, IIRC.Cheers,
Jan
--
Emmanuel Cecchet
FTO @ Frog Thinker
Open Source Development & Consulting
--
Web: http://www.frogthinker.org
email: manu@frogthinker.org
Skype: emmanuel_cecchet
Attachments:
aster-copy-partitioning-patch-8.5-contextv4.txttext/plain; name=aster-copy-partitioning-patch-8.5-contextv4.txtDownload+1465-74
Emmanuel Cecchet wrote:
Hi Jan,
Here is a new version of the patch with the following modifications:
- used oid list from pg_list.h
- properly handles triggers and generate an error if needed (updated doc
as well)
- added your test cases + extra bad trigger cases
Hi,
that got broken by the WHEN triggers patch
(c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
TriggerEnabled function signature, the code currently does not compile.
I'll continue reading, in the meantime could you send a updated patch?
Thanks,
Jan
=?UTF-8?B?SmFuIFVyYmHFhHNraQ==?= <wulczer@wulczer.org> writes:
that got broken by the WHEN triggers patch
(c6e0a36243a54eff79b47b3a0cb119fb67a55165), which changed the
TriggerEnabled function signature, the code currently does not compile.
[ squint... ] What is that patch doing touching the innards of
trigger.c in the first place? I can't see any reason for trigger.c
to be associated with partitioning.
regards, tom lane