Improving prep_buildtree used in VPATH builds

Started by Gurjeet Singhover 15 years ago25 messageshackers
Jump to latest
#1Gurjeet Singh
singh.gurjeet@gmail.com

Attached is the patch that tries to speedup prep_buildtree script, which is
used in VPATH builds, from our configure script.

The idea is to ask `find` to emit directory listing in depth-first order so
that the `mkdir -p` will create the deepest directory first and any
subsequent `mkdir -p` on an intermediate directory will not have to do
anything.

Currently I am seeing a performance improvement of this script by only about
500 ms; say 11.8 seconds vs. 11.3 secs. But I remember distinctly that
yesterday I was able to see an improvement of 11% on the same virtual
machine, averaged on multiple runs; 42 sec vs 37 sec. It might be the case
that the host OS or my Linux virtual machine were loaded at that time and
the filesystem could not cache enough inodes.

Seems like it would improve performance in general, but more so under load
conditions when you actually need it. I am not sure if -depth option is
supported by all incarnations of 'find'.

I have been away from Postgres development for quite a while, so would
appreciate if someone could tell me if such a patch should be submitted for
commitfest (since this is not actually a source patch).

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Attachments:

prep_buildtree.patchapplication/octet-stream; name=prep_buildtree.patchDownload+1-1
#2Robert Haas
robertmhaas@gmail.com
In reply to: Gurjeet Singh (#1)
Re: Improving prep_buildtree used in VPATH builds

On Sun, Sep 26, 2010 at 10:15 PM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:

I have been away from Postgres development for quite a while, so would
appreciate if someone could tell me if such a patch should be submitted for
commitfest (since this is not actually a source patch).

By all means add it to the open CF.

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#3Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Robert Haas (#2)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 3:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Sep 26, 2010 at 10:15 PM, Gurjeet Singh <singh.gurjeet@gmail.com>
wrote:

I have been away from Postgres development for quite a while, so would
appreciate if someone could tell me if such a patch should be submitted

for

commitfest (since this is not actually a source patch).

By all means add it to the open CF.

https://commitfest.postgresql.org/action/commitfest_view/open

When trying to submit new patch, the 'CommitFest' drop-down has just one
entry '(None Selected)', and 'Submit' would refuse to go through without a
topic.

What should be the value of 'Message-ID for original patch' ?
the URL: http://archives.postgresql.org/pgsql-hackers/2010-09/msg01837.php
or the ID: AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com<AANLkTinw0HL%2BjQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com>

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

#4Robert Haas
robertmhaas@gmail.com
In reply to: Gurjeet Singh (#3)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 10:09 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:

On Mon, Sep 27, 2010 at 3:02 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Sep 26, 2010 at 10:15 PM, Gurjeet Singh <singh.gurjeet@gmail.com>
wrote:

I have been away from Postgres development for quite a while, so would
appreciate if someone could tell me if such a patch should be submitted
for
commitfest (since this is not actually a source patch).

By all means add it to the open CF.

https://commitfest.postgresql.org/action/commitfest_view/open

When trying to submit new patch, the 'CommitFest' drop-down has just one
entry '(None Selected)', and 'Submit' would refuse to go through without a
topic.

Oh, blah. I just added a Miscellaneous topic. You can add others
yourself, just look on the CF page for "CommitFest topics".

What should be the value of 'Message-ID for original patch' ?
the URL: http://archives.postgresql.org/pgsql-hackers/2010-09/msg01837.php
or the ID: AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

The latter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#4)
Re: Improving prep_buildtree used in VPATH builds

On 09/27/2010 10:11 AM, Robert Haas wrote:

What should be the value of 'Message-ID for original patch' ?
the URL: http://archives.postgresql.org/pgsql-hackers/2010-09/msg01837.php
or the ID: AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

The latter.

Could this perhaps be made clearer on the page, perhaps with an example?
It confused me recently too.

cheers

andrew

#6Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Andrew Dunstan (#5)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 4:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 09/27/2010 10:11 AM, Robert Haas wrote:

What should be the value of 'Message-ID for original patch' ?

the URL:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01837.php
or the ID: AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com<AANLkTinw0HL%2BjQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com>

The latter.

Could this perhaps be made clearer on the page, perhaps with an example? It
confused me recently too.

Or maybe populate the drop-down with every available topic from previous
CFs, and add an additional '(Add new topic)' to the drop-down which would
take you to topic creation page.

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurjeet@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

#7Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#5)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 10:15 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 09/27/2010 10:11 AM, Robert Haas wrote:

What should be the value of 'Message-ID for original patch' ?
the URL:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01837.php
or the ID: AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

The latter.

Could this perhaps be made clearer on the page, perhaps with an example? It
confused me recently too.

Can you suggest something more specific?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#8Robert Haas
robertmhaas@gmail.com
In reply to: Gurjeet Singh (#6)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 10:18 AM, Gurjeet Singh <singh.gurjeet@gmail.com> wrote:

On Mon, Sep 27, 2010 at 4:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 09/27/2010 10:11 AM, Robert Haas wrote:

What should be the value of 'Message-ID for original patch' ?
the URL:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01837.php
or the ID: AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

The latter.

Could this perhaps be made clearer on the page, perhaps with an example?
It confused me recently too.

Or maybe populate the drop-down with every available topic from previous
CFs, and add an additional '(Add new topic)' to the drop-down which would
take you to topic creation page.

Andrew's question seemed to be about the message-ID. I agree the
topic thing is confusing, though. I'm wondering if it would be
sufficient to do the following - if no topic are available, instead of
showing the form, it says something like:

No topics have been created for this CommitFest yet. Before adding
your patch, you must add one or more items to the <link>topic
list</link>.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#8)
Re: Improving prep_buildtree used in VPATH builds

Robert Haas <robertmhaas@gmail.com> writes:

Andrew's question seemed to be about the message-ID. I agree the
topic thing is confusing, though. I'm wondering if it would be
sufficient to do the following - if no topic are available, instead of
showing the form, it says something like:

No topics have been created for this CommitFest yet. Before adding
your patch, you must add one or more items to the <link>topic
list</link>.

I liked the idea of pre-populating with the historical set of topics.
If you encourage the first few submitters to a new CF to invent their
own topic categories without any guidance, you're going to get some
crazy topics.

regards, tom lane

#10Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#9)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 10:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Andrew's question seemed to be about the message-ID.  I agree the
topic thing is confusing, though.  I'm wondering if it would be
sufficient to do the following - if no topic are available, instead of
showing the form, it says something like:

No topics have been created for this CommitFest yet.  Before adding
your patch, you must add one or more items to the <link>topic
list</link>.

I liked the idea of pre-populating with the historical set of topics.
If you encourage the first few submitters to a new CF to invent their
own topic categories without any guidance, you're going to get some
crazy topics.

Well, the historical set of topics varies from CommitFest to
CommitFest, by design. There are some that recur pretty regularly, of
course, like Security, Performance, and Miscellaneous. But not every
CF will have a section for ECPG or Refactoring, for example. In one
CF, we may have six ECPG patches, so ECPG gets its own topic; in
another CF, 1 ECPG patch + 2 libpq patches + 1 psql patch get merged
together under a section called Interfaces. This generally makes it
easier to group things in ways that are useful in practice than a
fixed list of topics, so I'm in favor of keeping it that way.

This is surely a surmountable issue but the exact right thing to do is
not altogether obvious to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#11Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#9)
Re: Improving prep_buildtree used in VPATH builds

Tom Lane <tgl@sss.pgh.pa.us> wrote:

I liked the idea of pre-populating with the historical set of
topics.

+1

-Kevin

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#10)
Re: Improving prep_buildtree used in VPATH builds

Robert Haas <robertmhaas@gmail.com> writes:

Well, the historical set of topics varies from CommitFest to
CommitFest, by design. There are some that recur pretty regularly, of
course, like Security, Performance, and Miscellaneous. But not every
CF will have a section for ECPG or Refactoring, for example. In one
CF, we may have six ECPG patches, so ECPG gets its own topic; in
another CF, 1 ECPG patch + 2 libpq patches + 1 psql patch get merged
together under a section called Interfaces. This generally makes it
easier to group things in ways that are useful in practice than a
fixed list of topics, so I'm in favor of keeping it that way.

If it's intentional that the topic for the same patch might vary
depending on what else is submitted in the same CF, then I think that
asking submitters to select topics is the wrong thing from the get-go.
The patches should be uncategorized initially, and then someone like the
CF manager should group them into topics after-the-fact.

regards, tom lane

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#7)
Re: Improving prep_buildtree used in VPATH builds

On 09/27/2010 10:39 AM, Robert Haas wrote:

On Mon, Sep 27, 2010 at 10:15 AM, Andrew Dunstan<andrew@dunslane.net> wrote:

On 09/27/2010 10:11 AM, Robert Haas wrote:

What should be the value of 'Message-ID for original patch' ?
the URL:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01837.php
or the ID: AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

The latter.

Could this perhaps be made clearer on the page, perhaps with an example? It
confused me recently too.

Can you suggest something more specific?

Well, it could say something like:

The Message-ID can be found in the headers of the relevant email to
the pgsql-hackers mailing list, and also in the mailing list
archives at http://archives.postgresql.org. It looks something like
this (the format varies somewhat depending on the sender's Mail User
Agent): AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

That would certainly have given me, and I suspect Gurjeet, enough clue.

cheers

andrew

#14Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#12)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Well, the historical set of topics varies from CommitFest to
CommitFest, by design.  There are some that recur pretty regularly, of
course, like Security, Performance, and Miscellaneous.  But not every
CF will have a section for ECPG or Refactoring, for example.  In one
CF, we may have six ECPG patches, so ECPG gets its own topic; in
another CF, 1 ECPG patch + 2 libpq patches + 1 psql patch get merged
together under a section called Interfaces.  This generally makes it
easier to group things in ways that are useful in practice than a
fixed list of topics, so I'm in favor of keeping it that way.

If it's intentional that the topic for the same patch might vary
depending on what else is submitted in the same CF, then I think that
asking submitters to select topics is the wrong thing from the get-go.
The patches should be uncategorized initially, and then someone like the
CF manager should group them into topics after-the-fact.

That's actually not a bad idea, although it would require a bit of
hacking given the way the schema is currently set up. The current
system has been working well enough that I'm inclined to do something
simpler for the present, like maybe just auto-create MIscellaneous for
each new CF. That would have more or less the same effect for about
one-tenth the work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#15Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#13)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 11:15 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 09/27/2010 10:39 AM, Robert Haas wrote:

On Mon, Sep 27, 2010 at 10:15 AM, Andrew Dunstan <andrew@dunslane.net>
wrote:

On 09/27/2010 10:11 AM, Robert Haas wrote:

What should be the value of 'Message-ID for original patch' ?
the URL:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01837.php
or the ID: AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

The latter.

Could this perhaps be made clearer on the page, perhaps with an example? It
confused me recently too.

Can you suggest something more specific?

Well, it could say something like:

The Message-ID can be found in the headers of the relevant email to the
pgsql-hackers mailing list, and also in the mailing list archives at
http://archives.postgresql.org. It looks something like this (the format
varies somewhat depending on the sender's Mail User Agent):
AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

That would certainly have given me, and I suspect Gurjeet, enough clue.

Where on the page would you suggest that we put that text?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#16Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Gurjeet Singh (#1)
Re: Improving prep_buildtree used in VPATH builds

Excerpts from Gurjeet Singh's message of dom sep 26 22:15:59 -0400 2010:

Currently I am seeing a performance improvement of this script by only about
500 ms; say 11.8 seconds vs. 11.3 secs. But I remember distinctly that
yesterday I was able to see an improvement of 11% on the same virtual
machine, averaged on multiple runs; 42 sec vs 37 sec. It might be the case
that the host OS or my Linux virtual machine were loaded at that time and
the filesystem could not cache enough inodes.

Hmm. On my otherwise idle desktop machine, I can't measure a difference.
But this machine has enough RAM for inode cache.

With patch:

real 0m3.092s
user 0m0.900s
sys 0m2.220s

real 0m3.116s
user 0m0.928s
sys 0m2.176s

real 0m3.128s
user 0m1.040s
sys 0m2.108s

Without patch:

real 0m3.109s
user 0m0.852s
sys 0m2.180s

real 0m3.101s
user 0m0.884s
sys 0m2.264s

real 0m3.121s
user 0m0.968s
sys 0m2.140s

Seems like it would improve performance in general, but more so under load
conditions when you actually need it. I am not sure if -depth option is
supported by all incarnations of 'find'.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#17Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#15)
Re: Improving prep_buildtree used in VPATH builds

On 09/27/2010 11:16 AM, Robert Haas wrote:

On Mon, Sep 27, 2010 at 11:15 AM, Andrew Dunstan<andrew@dunslane.net> wrote:

On 09/27/2010 10:39 AM, Robert Haas wrote:

On Mon, Sep 27, 2010 at 10:15 AM, Andrew Dunstan<andrew@dunslane.net>
wrote:

On 09/27/2010 10:11 AM, Robert Haas wrote:

What should be the value of 'Message-ID for original patch' ?
the URL:
http://archives.postgresql.org/pgsql-hackers/2010-09/msg01837.php
or the ID: AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

The latter.

Could this perhaps be made clearer on the page, perhaps with an example? It
confused me recently too.

Can you suggest something more specific?

Well, it could say something like:

The Message-ID can be found in the headers of the relevant email to the
pgsql-hackers mailing list, and also in the mailing list archives at
http://archives.postgresql.org. It looks something like this (the format
varies somewhat depending on the sender's Mail User Agent):
AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

That would certainly have given me, and I suspect Gurjeet, enough clue.

Where on the page would you suggest that we put that text?

Following "Enter your comments below. If you wish your comment to
reference a message from the mailing list archives, enter the message ID
into the space provided." The point is that because the app nicely turns
the Message-id into a URL that links to the archives, it's not entirely
clear whether the user needs to enter the whole URL or not.

Another way to handle this might be to extract it from an http URL if
given one.

It's a minor nit - it didn't seem worth raising at the time, and I only
commented when I saw that someone else had the same small confusion I
had had.

cheers

andrew

#18Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andrew Dunstan (#17)
Re: Improving prep_buildtree used in VPATH builds

Excerpts from Andrew Dunstan's message of lun sep 27 11:28:33 -0400 2010:

Another way to handle this might be to extract it from an http URL if
given one.

+1 for this approach

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#19Robert Haas
robertmhaas@gmail.com
In reply to: Andrew Dunstan (#17)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 11:28 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

Could this perhaps be made clearer on the page, perhaps with an example?
It confused me recently too.

Can you suggest something more specific?

Well, it could say something like:

The Message-ID can be found in the headers of the relevant email to the
pgsql-hackers mailing list, and also in the mailing list archives at
http://archives.postgresql.org. It looks something like this (the format
varies somewhat depending on the sender's Mail User Agent):
AANLkTinw0HL+jQmrtwXC9Y2tqhcfHFgFekxyyfYGvQrB@mail.gmail.com

That would certainly have given me, and I suspect Gurjeet, enough clue.

Where on the page would you suggest that we put that text?

Following "Enter your comments below. If you wish your comment to reference
a message from the mailing list archives, enter the message ID into the
space provided."

Done.

Another way to handle this might be to extract it from an http URL if given
one.

I'm going to leave this idea for another day, thought it's not a bad
one if someone feels motivated to write the code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#20Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#14)
Re: Improving prep_buildtree used in VPATH builds

On Mon, Sep 27, 2010 at 11:15 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Sep 27, 2010 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

Well, the historical set of topics varies from CommitFest to
CommitFest, by design.  There are some that recur pretty regularly, of
course, like Security, Performance, and Miscellaneous.  But not every
CF will have a section for ECPG or Refactoring, for example.  In one
CF, we may have six ECPG patches, so ECPG gets its own topic; in
another CF, 1 ECPG patch + 2 libpq patches + 1 psql patch get merged
together under a section called Interfaces.  This generally makes it
easier to group things in ways that are useful in practice than a
fixed list of topics, so I'm in favor of keeping it that way.

If it's intentional that the topic for the same patch might vary
depending on what else is submitted in the same CF, then I think that
asking submitters to select topics is the wrong thing from the get-go.
The patches should be uncategorized initially, and then someone like the
CF manager should group them into topics after-the-fact.

That's actually not a bad idea, although it would require a bit of
hacking given the way the schema is currently set up.  The current
system has been working well enough that I'm inclined to do something
simpler for the present, like maybe just auto-create MIscellaneous for
each new CF.  That would have more or less the same effect for about
one-tenth the work.

Eh, on further review, I decided to do something even simpler still,
which is to say unbreak the warning message that's supposed to appear
in this case. Doing one of the things listed above is probably
better, but this took approximately 60 seconds, so let's wait and see
whether it helps. If not, I'll whack it around some more.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#16)
#22Gurjeet Singh
singh.gurjeet@gmail.com
In reply to: Tom Lane (#21)
#23Greg Smith
gsmith@gregsmith.com
In reply to: Gurjeet Singh (#1)
#24Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Greg Smith (#23)
#25Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#24)