pg_reorg in core?

Started by Michael Paquier over 13 years ago, 40 messages, hackers
#1Michael Paquier
michael@paquier.xyz

Hi all,

During the last PGCon, I heard that some community members would be
interested in having pg_reorg directly in core.
Just to recall, pg_reorg is a utility developed by NTT that allows a table
to be redistributed without taking locks on it.
The technique it uses to reorganize the table is to create a temporary copy
of the table to be redistributed with a CREATE TABLE AS
whose definition changes if table is redistributed with a VACUUM FULL or
CLUSTER.
Then it follows this mechanism:
- triggers are created to redirect all the DMLs that occur on the table to
an intermediate log table.
- creation of indexes on the temporary table based on what the user wishes
- Apply the logs registered during the index creation
- Swap the names of freshly created table and old table
- Drop the useless objects
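
To make the sequence above concrete, here is a minimal sketch of the same copy, log, replay and swap idea. It is not pg_reorg's code: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and all names (t, t_log, t_new, reorg_demo) are invented for the illustration. It installs the triggers before taking the copy so that no write can fall between the two, and it ignores locking entirely.

```python
import sqlite3


def reorg_demo():
    """Toy walk-through of copy / log / replay / swap, using SQLite only."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT);
        INSERT INTO t VALUES (3, 'c'), (1, 'a'), (2, 'b');

        -- Log table plus triggers that record every DML issued on t.
        CREATE TABLE t_log (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                            op TEXT, id INTEGER, val TEXT);
        CREATE TRIGGER t_ins AFTER INSERT ON t BEGIN
            INSERT INTO t_log (op, id, val) VALUES ('I', NEW.id, NEW.val);
        END;
        CREATE TRIGGER t_upd AFTER UPDATE ON t BEGIN
            INSERT INTO t_log (op, id, val) VALUES ('U', NEW.id, NEW.val);
        END;
        CREATE TRIGGER t_del AFTER DELETE ON t BEGIN
            INSERT INTO t_log (op, id, val) VALUES ('D', OLD.id, NULL);
        END;

        -- Copy of the table in the wanted order, then its index.
        CREATE TABLE t_new AS SELECT id, val FROM t ORDER BY id;
        CREATE UNIQUE INDEX t_new_id ON t_new (id);
    """)

    # Writes arriving while the copy is being built end up in t_log.
    db.execute("INSERT INTO t VALUES (4, 'd')")
    db.execute("UPDATE t SET val = 'B' WHERE id = 2")
    db.execute("DELETE FROM t WHERE id = 3")

    # Replay the log, in order, against the copy.
    # (Simplification: the demo assumes id itself is never updated.)
    log = db.execute("SELECT op, id, val FROM t_log ORDER BY seq").fetchall()
    for op, row_id, val in log:
        if op == "I":
            db.execute("INSERT INTO t_new VALUES (?, ?)", (row_id, val))
        elif op == "U":
            db.execute("UPDATE t_new SET val = ? WHERE id = ?", (val, row_id))
        else:
            db.execute("DELETE FROM t_new WHERE id = ?", (row_id,))

    # Swap the names, then drop the objects that are no longer needed.
    db.executescript("""
        DROP TRIGGER t_ins;
        DROP TRIGGER t_upd;
        DROP TRIGGER t_del;
        ALTER TABLE t RENAME TO t_old;
        ALTER TABLE t_new RENAME TO t;
        DROP TABLE t_old;
        DROP TABLE t_log;
    """)
    return db.execute("SELECT id, val FROM t ORDER BY id").fetchall()
```

After the swap, the table named t is the reordered copy with the three logged changes applied to it.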

The code is hosted by pg_foundry here: http://pgfoundry.org/projects/reorg/.
I am also maintaining a fork in github in sync with pgfoundry here:
https://github.com/michaelpq/pg_reorg.

Just, do you guys think it is worth adding a functionality like pg_reorg in
core or not?

If yes, well I think the code of pg_reorg is going to need some
modifications to make it more compatible with contrib modules using only
EXTENSION.
For the time being pg_reorg is divided into 2 parts, binary and library.
The library part is the SQL portion of pg_reorg, containing a set of C
functions that are called by the binary part. This has been extended to
support CREATE EXTENSION recently.
The binary part creates a command pg_reorg in charge of calling the set of
functions created by the lib part, being just a wrapper of the library part
to control the creation and deletion of the objects.
It is also in charge of deleting the temporary objects by callback if an
error occurs.

With the binary command, it is possible to reorganize a single table or a
whole database; reorganizing a database simply loops over each table of
that database.

My idea is to remove the binary part and to rely only on the library part
to make pg_reorg a single extension with only system functions like other
contrib modules.
In order to do that what is missing is a function that could be used as an
entry point for table reorganization, a function of the type
pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
All the functionality of pg_reorg could then be reproduced:
- pg_reorg_table(tableoid) for a VACUUM FULL reorganization
- pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has
a CLUSTER key
- pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based
on a wanted column.
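
As a rough illustration of how the three call forms above would be told apart, here is a small sketch. It only models the dispatch; plan_reorg and the mode strings are hypothetical names, not part of pg_reorg. In SQL, the one-argument and two-argument forms would be separate overloads, so a sentinel stands in for "second argument not given".

```python
_NOT_GIVEN = object()  # distinguishes "no second argument" from an explicit None


def plan_reorg(table_oid, order_by=_NOT_GIVEN):
    """Return (mode, ordering column) for the three proposed call forms.

    Hypothetical helper: it only decides which kind of reorganization
    a call would request and touches no database.
    """
    if order_by is _NOT_GIVEN:
        # pg_reorg_table(tableoid): VACUUM FULL style, no reordering
        return ("vacuum_full", None)
    if order_by is None:
        # pg_reorg_table(tableoid, NULL): CLUSTER on the table's existing key
        return ("cluster", None)
    # pg_reorg_table(tableoid, columnname): CLUSTER on the wanted column
    return ("cluster", order_by)
```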

Is it worth the shot?

Regards,
--
Michael Paquier
http://michael.otacoo.com

#2Josh Kupershmidt
schmiddy@gmail.com
In reply to: Michael Paquier (#1)
Re: pg_reorg in core?

On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Hi all,

During the last PGCon, I heard that some community members would be
interested in having pg_reorg directly in core.

I'm actually not crazy about this idea, at least not given the current
state of pg_reorg. Right now, there are quite a few fixes and
features which remain to be merged in to cvs head, but at least we can
develop pg_reorg on a schedule independent of Postgres itself, i.e. we
can release new features more often than once a year. Perhaps when
pg_reorg is more stable, and the known bugs and missing features have
been ironed out, we could think about integrating into core.

Granted, a nice thing about integrating with core is we'd probably
have more of an early warning when reshuffling of PG breaks pg_reorg
(e.g. the recent splitting of the htup headers), but such changes have
been quick and easy to fix so far.

Just to recall, pg_reorg is a utility developed by NTT that allows a table to
be redistributed without taking locks on it.
The technique it uses to reorganize the table is to create a temporary copy
of the table to be redistributed with a CREATE TABLE AS
whose definition changes if table is redistributed with a VACUUM FULL or
CLUSTER.
Then it follows this mechanism:
- triggers are created to redirect all the DMLs that occur on the table to
an intermediate log table.

N.B. CREATE TRIGGER takes an AccessExclusiveLock on the table, see below.

- creation of indexes on the temporary table based on what the user wishes
- Apply the logs registered during the index creation
- Swap the names of freshly created table and old table
- Drop the useless objects

The code is hosted by pg_foundry here: http://pgfoundry.org/projects/reorg/.
I am also maintaining a fork in github in sync with pgfoundry here:
https://github.com/michaelpq/pg_reorg.

Just, do you guys think it is worth adding a functionality like pg_reorg in
core or not?

If yes, well I think the code of pg_reorg is going to need some
modifications to make it more compatible with contrib modules using only
EXTENSION.
For the time being pg_reorg is divided into 2 parts, binary and library.
The library part is the SQL portion of pg_reorg, containing a set of C
functions that are called by the binary part. This has been extended to
support CREATE EXTENSION recently.
The binary part creates a command pg_reorg in charge of calling the set of
functions created by the lib part, being just a wrapper of the library part
to control the creation and deletion of the objects.
It is also in charge of deleting the temporary objects by callback if an
error occurs.

By using the binary command, it is possible to reorganize a single table or
a database, in this case reorganizing a database launches only a loop on
each table of this database.

My idea is to remove the binary part and to rely only on the library part to
make pg_reorg a single extension with only system functions like other
contrib modules.

In order to do that what is missing is a function that could be used as an
entry point for table reorganization, a function of the type
pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
All the functionalities of pg_reorg could be reproducible:
- pg_reorg_table(tableoid) for a VACUUM FULL reorganization
- pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has a
CLUSTER key
- pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based on
a wanted column.

Is it worth the shot?

I haven't seen this documented as such, but AFAICT the reason that
pg_reorg is split into a binary and set of backend functions which are
called by the binary is that pg_reorg needs to be able to control its
steps in several transactions so as to avoid holding locks
excessively. The reorg_one_table() function uses four or five
transactions per table, in fact. If all the logic currently in the
pg_reorg binary were moved into backend functions, calling
pg_reorg_table() would have to be a single transaction, and there
would be no advantage to using such a function vs. CLUSTER or VACUUM
FULL.
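
To illustrate the point about transaction boundaries, here is a minimal sketch of a client-side driver that commits after every step, which a single function call running inside one transaction could not do. It is not the code of reorg_one_table(): Python's sqlite3 module stands in for a PostgreSQL connection, and the step names and statements are invented.

```python
import sqlite3


def run_steps(conn, steps):
    """Run each named step in its own transaction.

    Returns the names of the steps that committed. A failing step is
    rolled back on its own and stops the run; earlier steps stay committed.
    """
    committed = []
    for name, statements in steps:
        cur = conn.cursor()
        try:
            for sql in statements:
                cur.execute(sql)
            conn.commit()    # end this transaction before starting the next step
        except Exception:
            conn.rollback()  # undo only the failing step
            break
        committed.append(name)
    return committed


def demo():
    conn = sqlite3.connect(":memory:")
    steps = [
        ("setup", ["CREATE TABLE work (x INTEGER)"]),
        ("copy", ["INSERT INTO work VALUES (1)",
                  "INSERT INTO work VALUES (2)"]),
        # The second statement fails because the table does not exist.
        ("broken", ["INSERT INTO work VALUES (3)",
                    "INSERT INTO missing VALUES (1)"]),
        ("never_reached", ["DELETE FROM work"]),
    ]
    committed = run_steps(conn, steps)
    rows = [r[0] for r in conn.execute("SELECT x FROM work ORDER BY x")]
    return committed, rows
```

Because each step commits separately, whatever a step held is given up before the next one starts, and a failure part-way leaves the earlier steps in place for a cleanup routine to deal with.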

Also, having a separate binary we should be able to perform some neat
tricks such as parallel index builds using multiple connections (I'm
messing around with this idea now). AFAIK this would also not be
possible if pg_reorg were contained solely in the library functions.
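
A rough sketch of what parallel index builds over multiple connections could look like from a client program follows. This is only an assumption about the general shape, not the code being experimented with: connect is a caller-supplied factory returning a DB-API style connection, and build_indexes_in_parallel is a made-up name.

```python
from concurrent.futures import ThreadPoolExecutor


def build_indexes_in_parallel(connect, index_statements, max_workers=4):
    """Run each CREATE INDEX statement on its own connection.

    `connect` is a hypothetical zero-argument factory returning an object
    with cursor(), commit() and close(), as DB-API connections provide.
    Returns the statements in input order once all of them have finished.
    """
    def build(sql):
        conn = connect()      # one dedicated connection per index build
        try:
            cur = conn.cursor()
            cur.execute(sql)
            conn.commit()
        finally:
            conn.close()
        return sql

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps input order and re-raises the first worker error.
        return list(pool.map(build, index_statements))
```

A single backend function runs on one connection, which is why this kind of fan-out needs a separate client.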

Josh

#3Michael Paquier
michael@paquier.xyz
In reply to: Josh Kupershmidt (#2)
Re: pg_reorg in core?

On Fri, Sep 21, 2012 at 12:07 PM, Josh Kupershmidt <schmiddy@gmail.com> wrote:

On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Hi all,

During the last PGCon, I heard that some community members would be
interested in having pg_reorg directly in core.

I'm actually not crazy about this idea, at least not given the current
state of pg_reorg. Right now, there are quite a few fixes and
features which remain to be merged in to cvs head, but at least we can
develop pg_reorg on a schedule independent of Postgres itself, i.e. we
can release new features more often than once a year. Perhaps when
pg_reorg is more stable, and the known bugs and missing features have
been ironed out, we could think about integrating into core.

What could be also great is to move the project directly into github to
facilitate its maintenance and development.
My own copy is based on and synced with what is in pgfoundry, as I don't have
any admin access on pgfoundry (and honestly don't think I can get it either),
even though I am from NTT. Hey, are there some people with admin rights here?

Granted, a nice thing about integrating with core is we'd probably
have more of an early warning when reshuffling of PG breaks pg_reorg
(e.g. the recent splitting of the htup headers), but such changes have
been quick and easy to fix so far.

Yes, that is also why I am proposing to integrate it into core. Its
maintenance pace would be faster and easier than it is now in pgfoundry.
However, if hackers do not think that it is worth adding it to core... Well
separate development as done now would be fine but slower...
Also, just by watching the extension modules in contrib, I haven't seen one
using both the library and binary at the same time like pg_reorg does.

- creation of indexes on the temporary table based on what the user wishes
- Apply the logs registered during the index creation
- Swap the names of freshly created table and old table
- Drop the useless objects

The code is hosted by pg_foundry here: http://pgfoundry.org/projects/reorg/.
I am also maintaining a fork in github in sync with pgfoundry here:
https://github.com/michaelpq/pg_reorg.

Just, do you guys think it is worth adding a functionality like pg_reorg in
core or not?

If yes, well I think the code of pg_reorg is going to need some
modifications to make it more compatible with contrib modules using only
EXTENSION.
For the time being pg_reorg is divided into 2 parts, binary and library.
The library part is the SQL portion of pg_reorg, containing a set of C
functions that are called by the binary part. This has been extended to
support CREATE EXTENSION recently.
The binary part creates a command pg_reorg in charge of calling the set of
functions created by the lib part, being just a wrapper of the library part
to control the creation and deletion of the objects.
It is also in charge of deleting the temporary objects by callback if an
error occurs.

By using the binary command, it is possible to reorganize a single table or
a database, in this case reorganizing a database launches only a loop on
each table of this database.

My idea is to remove the binary part and to rely only on the library part to
make pg_reorg a single extension with only system functions like other
contrib modules.

In order to do that what is missing is a function that could be used as an
entry point for table reorganization, a function of the type
pg_reorg_table(tableoid) and pg_reorg_table(tableoid, text).
All the functionalities of pg_reorg could be reproducible:
- pg_reorg_table(tableoid) for a VACUUM FULL reorganization
- pg_reorg_table(tableoid, NULL) for a CLUSTER reorganization if table has a
CLUSTER key
- pg_reorg_table(tableoid, columnname) for a CLUSTER reorganization based on
a wanted column.

Is it worth the shot?

I haven't seen this documented as such, but AFAICT the reason that
pg_reorg is split into a binary and set of backend functions which are
called by the binary is that pg_reorg needs to be able to control its
steps in several transactions so as to avoid holding locks
excessively. The reorg_one_table() function uses four or five
transactions per table, in fact. If all the logic currently in the
pg_reorg binary were moved into backend functions, calling
pg_reorg_table() would have to be a single transaction, and there
would be no advantage to using such a function vs. CLUSTER or VACUUM
FULL.

Of course, but features like CREATE INDEX CONCURRENTLY use multiple
transactions. Wouldn't it be possible to use something similar to make the
modifications visible to other backends?

Also, having a separate binary we should be able to perform some neat
tricks such as parallel index builds using multiple connections (I'm
messing around with this idea now). AFAIK this would also not be
possible if pg_reorg were contained solely in the library functions.

Interesting idea, this could accelerate the whole process. I am just
wondering about possible consistency issues like the logs being replayed
before swap.
--
Michael Paquier
http://michael.otacoo.com

#4Hitoshi Harada
umi.tanuki@gmail.com
In reply to: Michael Paquier (#1)
Re: pg_reorg in core?

On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Hi all,

During the last PGCon, I heard that some community members would be
interested in having pg_reorg directly in core.
Just to recall, pg_reorg is a utility developed by NTT that allows a table to
be redistributed without taking locks on it.
The technique it uses to reorganize the table is to create a temporary copy
of the table to be redistributed with a CREATE TABLE AS
whose definition changes if table is redistributed with a VACUUM FULL or
CLUSTER.
Then it follows this mechanism:
- triggers are created to redirect all the DMLs that occur on the table to
an intermediate log table.
- creation of indexes on the temporary table based on what the user wishes
- Apply the logs registered during the index creation
- Swap the names of freshly created table and old table
- Drop the useless objects

I'm not familiar with pg_reorg, but I wonder why we need a separate
program for this task. I know pg_reorg is ok as an external program
per se, but if we could optimize CLUSTER (or VACUUM which I'm a little
pessimistic about) in the same way, it's much nicer than having
additional binary + extension. Isn't it possible to do the same thing
above within the CLUSTER command? Maybe CLUSTER .. CONCURRENTLY?

Thanks,
--
Hitoshi Harada

#5Josh Kupershmidt
schmiddy@gmail.com
In reply to: Michael Paquier (#3)
Re: pg_reorg in core?

On Thu, Sep 20, 2012 at 8:33 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Sep 21, 2012 at 12:07 PM, Josh Kupershmidt <schmiddy@gmail.com>
wrote:

On Thu, Sep 20, 2012 at 7:05 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

What could be also great is to move the project directly into github to
facilitate its maintenance and development.

No argument from me there, especially as I have my own fork in github,
but that's up to the current maintainers.

Granted, a nice thing about integrating with core is we'd probably
have more of an early warning when reshuffling of PG breaks pg_reorg
(e.g. the recent splitting of the htup headers), but such changes have
been quick and easy to fix so far.

Yes, that is also why I am proposing to integrate it into core. Its
maintenance pace would be faster and easier than it is now in pgfoundry.

If the argument for moving pg_reorg into core is "faster and easier"
development, well I don't really buy that. Yes, there would presumably
be more eyeballs on the project, but you could make the same argument
about any auxiliary Postgres project which wants more attention, and
we can't have everything in core. And I fail to see how being in-core
makes development "easier"; I think everyone here would agree that the
bar to commit things to core is pretty darn high. If you're concerned
about the [lack of] development on pg_reorg, there are plenty of
things to fix without moving the project. I recently posted an "issues
roundup" to the reorg list, if you are interested in pitching in.

Josh

#6M.Sakamoto
sakamoto_masahiko_b1@lab.ntt.co.jp
In reply to: Josh Kupershmidt (#5)
Re: pg_reorg in core?

Hi,
I'm sakamoto, maintainer of reorg.

What could be also great is to move the project directly into github to
facilitate its maintenance and development.

No argument from me there, especially as I have my own fork in github,
but that's up to the current maintainers.

Yup, I think development on CVS (on pgFoundry) is a bit awkward for
me, and github would be a suitable place.

To be honest, we have few development resources available, so
no additional features have been added recently. But the features and fixes
still to be done have piled up, as Josh summed up.

In the short term, within this month I'll release a minor version update
of reorg to support PostgreSQL 9.2. And I think it's time to
reconsider the way we maintain pg_reorg.
I'm happy that Josh and Michael are interested in reorg,
and I would like you to become maintainers :)

I think we can discuss this on the reorg list.

M.Sakamoto NTT OSS Center

#7Daniele Varrazzo
daniele.varrazzo@gmail.com
In reply to: Josh Kupershmidt (#5)
Re: pg_reorg in core?

On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt <schmiddy@gmail.com> wrote:

If the argument for moving pg_reorg into core is "faster and easier"
development, well I don't really buy that.

I don't see any problem in having pg_reorg in PGXN instead.

I've tried adding a META.json to the project and it seems to work fine
with the pgxn client. It is available, together with other patches, in my own
github fork.

https://github.com/dvarrazzo/pg_reorg/

I haven't submitted it to PGXN as I prefer the original author to keep
the ownership.

-- Daniele

#8Michael Paquier
michael@paquier.xyz
In reply to: Daniele Varrazzo (#7)
Re: pg_reorg in core?

On Fri, Sep 21, 2012 at 9:33 PM, Daniele Varrazzo <
daniele.varrazzo@gmail.com> wrote:

On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt <schmiddy@gmail.com>
wrote:

If the argument for moving pg_reorg into core is "faster and easier"
development, well I don't really buy that.

I don't see any problem in having pg_reorg in PGXN instead.

I've tried adding a META.json to the project and it seems working fine
with the pgxn client. It is together with other patches in my own
github fork.

https://github.com/dvarrazzo/pg_reorg/

I haven't submitted it to PGXN as I prefer the original author to keep
the ownership.

Thanks, I merged your patches with the dev branch for the time being.
It would be great to have some input from the maintainers of pg_reorg in
pgfoundry to see if they agree about putting it in pgxn.
--
Michael Paquier
http://michael.otacoo.com

#9Michael Paquier
michael@paquier.xyz
In reply to: Hitoshi Harada (#4)
Re: pg_reorg in core?

On Fri, Sep 21, 2012 at 1:00 PM, Hitoshi Harada <umi.tanuki@gmail.com> wrote:

I'm not familiar with pg_reorg, but I wonder why we need a separate
program for this task. I know pg_reorg is ok as an external program
per se, but if we could optimize CLUSTER (or VACUUM which I'm a little
pessimistic about) in the same way, it's much nicer than having
additional binary + extension. Isn't it possible to do the same thing
above within the CLUSTER command? Maybe CLUSTER .. CONCURRENTLY?

CLUSTER might be better suited in this case, as the purpose is to reorder the
table.
The same technique used by pg_reorg (i.e. a table coupled with triggers) could
reduce locking on the table.
Also, it should be possible to control each sub-operation in the same
way as CREATE INDEX CONCURRENTLY does.
By the way, whichever operation is used, VACUUM or CLUSTER, I have a couple
of doubts:
1) Wouldn't it be too costly for a core operation, as pg_reorg really needs
many temporary objects? It might be possible to reduce the number of objects
created if it were added to core, though...
2) Do you think the current CLUSTER is enough, and is there interest in
implementing such an optimization directly in core?
--
Michael Paquier
http://michael.otacoo.com

#10sakamoto
dsakamoto@lolloo.net
In reply to: Michael Paquier (#8)
Re: pg_reorg in core?

(2012/09/21 22:32), Michael Paquier wrote:

On Fri, Sep 21, 2012 at 9:33 PM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:

On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt
<schmiddy@gmail.com> wrote:

I haven't submitted it to PGXN as I prefer the original author to keep
the ownership.

Thanks, I merged your patches with the dev branch for the time being.
It would be great to have some input from the maintainers of pg_reorg
in pgfoundry to see if they agree about putting it in pgxn.

Hi, I'm Sakamoto, reorg maintainer.
I'm very happy that Josh, Michael and Daniele are interested in reorg.

I'm working on the next version of reorg, 1.1.8, which will be released
in a couple of days.
And I have come to think that this is a good point to reconsider the way we
develop and maintain it.
To be honest, we have few development resources available, so no additional
features have been added recently. But there are features and fixes still to
be done (as Josh summed up, thanks).

I think it is a good idea to develop on github. Would Michael's repo be the root?
After the release of 1.1.8, I will freeze the CVS repository and create a
mirror on github.
# Or Michael's repo will do :)

I have received some patches from Josh and Daniele. They should be developed
in the next major version, 1.2, so some of them may not be included in 1.1.8
(because it's a minor version update), but I really appreciate them.

I think we can discuss further on the reorg list.

Sakamoto

#11Michael Paquier
michael@paquier.xyz
In reply to: sakamoto (#10)
Re: pg_reorg in core?

On Sat, Sep 22, 2012 at 9:08 AM, sakamoto <dsakamoto@lolloo.net> wrote:

(2012/09/21 22:32), Michael Paquier wrote:

On Fri, Sep 21, 2012 at 9:33 PM, Daniele Varrazzo
<daniele.varrazzo@gmail.com> wrote:

On Fri, Sep 21, 2012 at 5:17 AM, Josh Kupershmidt
<schmiddy@gmail.com> wrote:

I haven't submitted it to PGXN as I prefer the original author to keep
the ownership.

Thanks, I merged your patches with the dev branch for the time being.
It would be great to have some input from the maintainers of pg_reorg in
pgfoundry to see if they agree about putting it in pgxn.

Hi, I'm Sakamoto, reorg maintainer.

I'm very happy that Josh, Michael and Daniele are interested in reorg.

I'm working on the next version of reorg, 1.1.8, which will be released in
a couple of days.
And I have come to think that this is a good point to reconsider the way we
develop and maintain it.
To be honest, we have few development resources available, so no additional
features have been added recently. But there are features and fixes still to
be done (as Josh summed up, thanks).

I think it is a good idea to develop on github. Would Michael's repo be the root?
After the release of 1.1.8, I will freeze the CVS repository and create a
mirror on github.
# Or Michael's repo will do :)

As you wish. You could create a root repository under a new organization, or
under your own account, or use my repo.
The result will be the same. I leave it to your judgment.

I have received some patches from Josh and Daniele. They should be developed
in the next major version, 1.2, so some of them may not be included in 1.1.8
(because it's a minor version update), but I really appreciate them.

Great!
--
Michael Paquier
http://michael.otacoo.com

#12Chris Browne
cbbrowne@acm.org
In reply to: sakamoto (#10)
Re: pg_reorg in core?

If the present project is having a tough time doing enhancements, I should
think it mighty questionable to try to draw it into core, that presses it
towards a group of already very busy developers.

On the other hand, if the present development efforts can be made more
public, by having them take place in a more public repository, that at
least has potential to let others in the community see and participate.
There are no guarantees, but privacy is liable to hurt.

I wouldn't expect any sudden huge influx of developers, but a steady
visible stream of development effort would be mighty useful to a "merge
into core" argument.

A *lot* of projects are a lot like this. On the Slony project, we have
tried hard to maintain this sort of visibility. Steve Singer, Jan Wieck
and I do our individual efforts on git repos visible at GitHub to ensure
ongoing efforts aren't invisible inside a corporate repo. It hasn't led to
any massive influx of extra developers, but I am always grateful to see Peter
Eisentraut's bug reports.

#13sakamoto
dsakamoto@lolloo.net
In reply to: Chris Browne (#12)
Re: pg_reorg in core?

(2012/09/22 10:02), Christopher Browne wrote:

If the present project is having a tough time doing enhancements, I
should think it mighty questionable to try to draw it into core, that
presses it towards a group of already very busy developers.

On the other hand, if the present development efforts can be made more
public, by having them take place in a more public repository, that at
least has potential to let others in the community see and
participate. There are no guarantees, but privacy is liable to hurt.

I wouldn't expect any sudden huge influx of developers, but a steady
visible stream of development effort would be mighty useful to a
"merge into core" argument.

A *lot* of projects are a lot like this. On the Slony project, we
have tried hard to maintain this sort of visibility. Steve Singer,
Jan Wieck and I do our individual efforts on git repos visible at
GitHub to ensure ongoing efforts aren't invisible inside a corporate
repo. It hasn't led to any massive influx of extra developers, but I am
always grateful to see Peter Eisentraut's bug reports.

Agreed. What the reorg project needs first is transparency, including
issue tracking, bugs, a list of todo items, clarified release schedules,
quality assurance and so forth.
Only after all that is done can the discussion about putting it into core be started.

Until now, reorg has been developed and maintained in a corporate repository.
But now that its activity has slowed down, what I should do as a maintainer is
try to make the development process more public and find someone to cooperate with :)

Sakamoto

#14Satoshi Nagayasu
snaga@uptime.jp
In reply to: sakamoto (#13)
Re: pg_reorg in core?

(2012/09/22 11:01), sakamoto wrote:

(2012/09/22 10:02), Christopher Browne wrote:

If the present project is having a tough time doing enhancements, I
should think it mighty questionable to try to draw it into core, that
presses it towards a group of already very busy developers.

On the other hand, if the present development efforts can be made more
public, by having them take place in a more public repository, that at
least has potential to let others in the community see and
participate. There are no guarantees, but privacy is liable to hurt.

I wouldn't expect any sudden huge influx of developers, but a steady
visible stream of development effort would be mighty useful to a
"merge into core" argument.

A *lot* of projects are a lot like this. On the Slony project, we
have tried hard to maintain this sort of visibility. Steve Singer,
Jan Wieck and I do our individual efforts on git repos visible at
GitHub to ensure ongoing efforts aren't invisible inside a corporate
repo. It hasn't led to any massive influx of extra developers, but I am
always grateful to see Peter Eisentraut's bug reports.

Agreed. What the reorg project needs first is transparency, including
issue tracking, bugs, a list of todo items, clarified release schedules,
quality assurance and so forth.
Only after all that is done can the discussion about putting it into core be
started.

Until now, reorg has been developed and maintained in a corporate repository.
But now that its activity has slowed down, what I should do as a maintainer is
try to make the development process more public and find someone to cooperate with :)

I think it's time to consider some *umbrella project* for maintaining
several small projects outside the core.

As you pointed out, the problem here is that it's difficult to keep
enough eyeballs and development resource on tiny projects outside
the core.

For example, NTT OSSC has created lots of tools, but they're having
some difficulty keeping them maintained because of their limited
development resources. There are different code repositories, different
web sites, different issue tracking systems and different dev mailing
lists for the different small projects. The same goes for my xlogdump.

Actually, that's the reason why it's difficult to keep enough eyeballs
on small third-party projects. And also the reason why some developers
want to push their tools into the core, isn't it? :)

To solve this problem, I would like to have some umbrella project.
It would be called "pg dba utils", or something like this.
This umbrella project may contain several third-party tools (pg_reorg,
pg_rman, pg_filedump, xlogdump, etc, etc...) as its sub-modules.

It may also have a single web site, code repository, issue tracking
system and developer mailing list in order to share development
resources for testing, maintaining and releasing. I think it would help
third-party projects keep enough eyeballs even outside the core.

Of course, if a third-party project has a faster development pace
and enough eyeballs to maintain it, it's ok for it to be an independent project.
However, when a tool has already matured and has fewer eyeballs,
it needs to be merged into this umbrella project.

Any comments?

Sakamoto

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp

#15Pavel Stehule
pavel.stehule@gmail.com
In reply to: Satoshi Nagayasu (#14)
Re: pg_reorg in core?

2012/9/22 Satoshi Nagayasu <snaga@uptime.jp>:

(2012/09/22 11:01), sakamoto wrote:

(2012/09/22 10:02), Christopher Browne wrote:

If the present project is having a tough time doing enhancements, I
should think it mighty questionable to try to draw it into core, that
presses it towards a group of already very busy developers.

On the other hand, if the present development efforts can be made more
public, by having them take place in a more public repository, that at
least has potential to let others in the community see and
participate. There are no guarantees, but privacy is liable to hurt.

I wouldn't expect any sudden huge influx of developers, but a steady
visible stream of development effort would be mighty useful to a
"merge into core" argument.

A *lot* of projects are a lot like this. On the Slony project, we
have tried hard to maintain this sort of visibility. Steve Singer,
Jan Wieck and I do our individual efforts on git repos visible at
GitHub to ensure ongoing efforts aren't invisible inside a corporate
repo. It hasn't led to any massive influx of extra developers, but I am
always grateful to see Peter Eisentraut's bug reports.

Agreed. What the reorg project needs first is transparency, including
issue tracking, bugs, a list of todo items, clarified release schedules,
quality assurance and so forth.
Only after all that is done can the discussion about putting it into core be
started.

Until now, reorg has been developed and maintained in a corporate repository.
But now that its activity has slowed down, what I should do as a maintainer is
try to make the development process more public and find someone to cooperate with :)

I think it's time to consider some *umbrella project* for maintaining
several small projects outside the core.

As you pointed out, the problem here is that it's difficult to keep
enough eyeballs and development resource on tiny projects outside
the core.

For example, NTT OSSC has created lots of tools, but they're having
some difficulty keeping them maintained because of their limited
development resources. There are different code repositories, different
web sites, different issue tracking systems and different dev mailing
lists for the different small projects. The same goes for my xlogdump.

Actually, that's the reason why it's difficult to keep enough eyeballs
on small third-party projects. And also the reason why some developers
want to push their tools into the core, isn't it? :)

To solve this problem, I would like to have some umbrella project.
It would be called "pg dba utils", or something like this.
This umbrella project may contain several third-party tools (pg_reorg,
pg_rman, pg_filedump, xlogdump, etc, etc...) as its sub-modules.

It may also have a single web site, code repository, issue tracking
system and developer mailing list in order to share development
resources for testing, maintaining and releasing. I think it would help
third-party projects keep enough eyeballs even outside the core.

Of course, if a third-party project has a faster development pace
and enough eyeballs to maintain it, it's ok for it to be an independent
project. However, when a tool has already matured and has fewer
eyeballs, it should be merged into this umbrella project.

Any comments?

good idea

Pavel

Sakamoto

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp

#16Daniele Varrazzo
daniele.varrazzo@gmail.com
In reply to: M.Sakamoto (#6)
Re: pg_reorg in core?

On Fri, Sep 21, 2012 at 9:45 AM, M.Sakamoto
<sakamoto_masahiko_b1@lab.ntt.co.jp> wrote:

Hi,
I'm sakamoto, maintainer of reorg.

What could be also great is to move the project directly into github to
facilitate its maintenance and development.

No argument from me there, especially as I have my own fork in github,
but that's up to the current maintainers.

Yup, I think development on CVS (on pgfoundry) is a bit awkward for
me and github would be a suitable place.

Hello Sakamoto-san

I have created a "reorg" organization on github: https://github.com/reorg/
You are welcome to become one of the owners of the organization. I
have already added Itagaki Takahiro as owner because he has a github
account. If you open a github account or give me the email of one you
own I will invite you as organization owner. Michael is also a member of
the organization.

I have re-converted the original CVS repository as Michael's
conversion was missing the commit email info, but I have rebased his
commits on the new master. My intention is to track CVS commits into
the cvs branch of the repos and merge them into the master, until
official development is moved to git.

The repository is at <https://github.com/reorg/pg_reorg>. Because I'm
not sure yet about a few details (from the development model to the
committers' emails) it may be rebased in the near future, until
everything has been decided.

Thank you very much.

-- Daniele

#17Peter Eisentraut
peter_e@gmx.net
In reply to: Satoshi Nagayasu (#14)
Re: pg_reorg in core?

On Sat, 2012-09-22 at 16:25 +0900, Satoshi Nagayasu wrote:

I think it's time to consider some *umbrella project* for maintaining
several small projects outside the core.

Well, that was pgfoundry, and it didn't work out.

#18Chris Browne
cbbrowne@acm.org
In reply to: Peter Eisentraut (#17)
Re: pg_reorg in core?

On Sat, Sep 22, 2012 at 7:45 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

On Sat, 2012-09-22 at 16:25 +0900, Satoshi Nagayasu wrote:

I think it's time to consider some *umbrella project* for maintaining
several small projects outside the core.

Well, that was pgfoundry, and it didn't work out.

There seem to be some efforts to update it, but yeah, the software
behind it didn't age gracefully, and it seems doubtful to me that
people will be flocking back to pgfoundry.

The other ongoing attempt at an "umbrella" is PGXN, and it's different
enough in approach that, while it's not obvious that it'll succeed, if
it fails, the failure wouldn't involve the same set of issues that
made pgfoundry problematic.

PGXN notably captures metadata about the project; resources (e.g.
SCM) don't have to be kept there.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

#19Greg Sabino Mullane
greg@turnstep.com
In reply to: Peter Eisentraut (#17)
Re: pg_reorg in core?

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

I think it's time to consider some *umbrella project* for maintaining
several small projects outside the core.

Well, that was pgfoundry, and it didn't work out.

I'm not sure that is quite analogous to what was being proposed.
I read it as more of "let's package a bunch of these small utilities
together into a single project", such that installing one installs them
all (e.g. aptitude install pg_tools), and they all have a single bug
tracker, etc. That tracker could be github, of course.

I'm not convinced of the merit of that plan, but that's an alternative
interpretation that doesn't involve our beloved pgfoundry. :)

Oh, and -1 for putting it in core. Way too early, and not
important enough.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 201209222334
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAlBeg/AACgkQvJuQZxSWSsjL5ACgimT71B4lSb1ELhgMw5EBzAKs
xHIAn08vxGzmM6eSmDfZfxlJDTousq7h
=KgXW
-----END PGP SIGNATURE-----

#20Satoshi Nagayasu
snaga@uptime.jp
In reply to: Greg Sabino Mullane (#19)
Re: pg_reorg in core?

2012/09/23 12:37, Greg Sabino Mullane wrote:

I think it's time to consider some *umbrella project* for maintaining
several small projects outside the core.

Well, that was pgfoundry, and it didn't work out.

I'm not sure that is quite analogous to what was being proposed.
I read it as more of "let's package a bunch of these small utilities
together into a single project", such that installing one installs them
all (e.g. aptitude install pg_tools), and they all have a single bug
tracker, etc. That tracker could be github, of course.

Exactly --- I do not care about the SCM system, though. :)

I'm not convinced of the merit of that plan, but that's an alternative
interpretation that doesn't involve our beloved pgfoundry. :)

For example, xlogdump had not been maintained for 5 years when
I picked it up last year. And the latest pg_filedump that supports 9.2
has not been released yet. pg_reorg as well.

If those tools are in a single project, it would be easier to keep
attention on them. Then, developers can easily build *all of them*
at once, fix them, and post any patch on the single mailing list.
Actually, it would save developers from wasting their time.

From my viewpoint, it's not just an SCM or distribution issue.
It's about how such small projects around the core can survive
even if they cannot get into the core.

Regards,

Oh, and -1 for putting it in core. Way too early, and not
important enough.

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp

#21Michael Paquier
michael@paquier.xyz
In reply to: Satoshi Nagayasu (#20)
#22Daniele Varrazzo
daniele.varrazzo@gmail.com
In reply to: Michael Paquier (#21)
#23Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Daniele Varrazzo (#22)
#24Chris Browne
cbbrowne@acm.org
In reply to: Alvaro Herrera (#23)
#25Simon Riggs
simon@2ndQuadrant.com
In reply to: Michael Paquier (#9)
#26Roberto Mello
rmello@cc.usu.edu
In reply to: Satoshi Nagayasu (#14)
#27Satoshi Nagayasu
snaga@uptime.jp
In reply to: Simon Riggs (#25)
#28Josh Berkus
josh@agliodbs.com
In reply to: Satoshi Nagayasu (#27)
#29Simon Riggs
simon@2ndQuadrant.com
In reply to: Josh Berkus (#28)
#30Josh Berkus
josh@agliodbs.com
In reply to: Simon Riggs (#29)
#31Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#30)
#32Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#31)
#33Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#32)
#34Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#33)
#35Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Simon Riggs (#25)
#36Michael Paquier
michael@paquier.xyz
In reply to: Dimitri Fontaine (#35)
#37Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#34)
#38Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#37)
#39Andres Freund
andres@anarazel.de
In reply to: Michael Paquier (#38)
#40Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#30)