Hi
Hi,
despite being chronically short of time I'd like to be involved with some
database development in free software. It seems to me that PostgreSQL is
the most complete package out there, and it also has some nice features
like object identity. I consider myself fairly knowledgeable when it
comes to databases (at least it brought me my Ph.D.), so I think I could
do some work on it. But before I do so I'd like to ask some questions about the
status of development:
1) Is there an embedded SQL preprocessor available? I read about an
alpha version, but is it still maintained?
2) How about ODBC? I know there is a driver but apparently I cannot use
all ODBC features. I have an application on Windows that I'd like to use
to stress-test PostgreSQL a little bit, but I get an SQLError -1
every time I try to fetch a record. Is this a known bug?
3) How would you describe the general goal of this project: Offer as
good a database system as you can create as free software? That is,
should it be able to handle huge amounts of data, or is it seen more as
a tool for a private person/small business?
4) Which areas need help?
Please tell me what you think.
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
On Wed, 4 Feb 1998, Michael Meskes wrote:
Welcome :)
2) How about ODBC? I know there is a driver but apparently I cannot use
all ODBC features. I have an application on Windows that I'd like to use
to stress-test PostgreSQL a little bit, but I get an SQLError -1
every time I try to fetch a record. Is this a known bug?
Julie (maintainer of PostODBC(src/interfaces/odbc)) has just
joined this list, and will hopefully be able to answer more completely,
but the PostODBC driver is currently listed as v0.21, so is still young...
Any aid you can provide in helping to mature it is most welcome,
both on our part, and, I imagine, Julie's...
3) How would you describe the general goal of this project: Offer as
good a database system as you can create as free software? That is,
should it be able to handle huge amounts of data, or is it seen more as
a tool for a private person/small business?
Both... We have at least two people so far, that I'm aware of, who
are dealing with systems of several GB of data through it, including one
of the core developers... Like *BSD/Linux "competing" with the more
entrenched OSs like Solaris (and Microsloth), we're moving towards the
concept of being a viable alternative to monolithic RDBMS systems
such as Oracle/Informix... but we are also working at making sure that we
don't lose sight of the majority of our users, who tend to be people using
it for WWW-based projects.
4) Which areas need help?
http://www.postgresql.org/docs/todo.shtml :)
There is a long list of things that are "Wish list" items...if you
decide to dive into something, all we ask is that you post something here
so that others know and to prevent "recreating the wheel"...just in case
someone else is already working on it.
Okay. Sounds good. :-)
How do I get the latest source tree and how is it synchronised?
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
On Wed, 4 Feb 1998, Meskes, Michael wrote:
Okay. Sounds good. :-)
How do I get the latest source tree and how is it synchronised?
We recommend that everyone doing development work use or grab
CVSup (see ftp://ftp.postgresql.org/pub/CVSup for details and some
binaries), which is the fastest/most up-to-date way of grabbing the sources.
Right now, being in beta mode, there is a daily snapshot being
made and placed in ftp://ftp.postgresql.org/pub/postgresql.snapshot.tar.gz
After v6.4 is released, that will go back down to weekly snapshots
vs. daily.
1) Is there an embedded SQL preprocessor available? I read about an
alpha version, but is it still maintained?
I remember mention of it on the list a while ago, but have not heard
anything since. Also, no code for this was contributed to the
distribution, and it should be included. It would be _great_ to have a real
preprocessor.
3) How would you describe the general goal of this project: Offer as
good a database system as you can create as free software? That is,
should it be able to handle huge amounts of data, or is it seen more as
a tool for a private person/small business?
As scrappy pointed out, we would like to not forget either kind of user.
However, afaik PostgreSQL is the closest to a "heavyweight" RDBMS available
in the freeware world, so if we had to choose we would probably put
ourselves on the "huge amounts of data" end of the spectrum.
4) Which areas need help?
The ToDo has lots of projects. Also, as you get started and learn the
details of Postgres internals, the docs project could use some contributions
:).
There are other areas which are not yet called out in the ToDo list,
including:
- recovery of munged databases (I think disk-full conditions may cause damage)
- transaction-only and session-only tables, variables, etc.
- re-think the table/type interface to allow more extensibility of types
But, picking simpler items from the ToDo is probably the best way to get
familiar with pieces of the code.
Welcome.
- Tom
Okay, I'm just checking out sources.
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
I think I will start to get into the code slowly. But I agree that a
preprocessor would be fine. I found some old 0.1 code on the internet.
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
On Wed, 4 Feb 1998, Meskes, Michael wrote:
Okay, I'm just checking out sources.
Development issues should be discussed here, in
pgsql-hackers, but any patches should be submitted to
pgsql-patches... other than that... enjoy :)
Are changes incorporated via patches? That means there's one person
merging all the patches?
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
On Wed, 4 Feb 1998, Meskes, Michael wrote:
Are changes incorporated via patches? That means there's one person
merging all the patches?
Generally, I keep on top of the patches (or try to), but Bruce,
Vadim and Thomas all have direct access to the CVS repository for applying
patches and changes also.
3) How would you describe the general goal of this project: Offer as
good a database system as you can create as free software? That is,
should it be able to handle huge amounts of data, or is it seen more as
a tool for a private person/small business?
About a year ago, our goal was to make it more reliable. At this point,
our goal is SQL compliance, speed, and features. See the TODO list in
the distribution or on our web page for an idea of our open items.
4) Which areas need help?
What we really need are more people who can get a handle on the entire
source tree, so we can implement some of the more complex SQL features,
or improve some of the older code.
--
Bruce Momjian
maillist@candle.pha.pa.us
About a year ago, our goal was to make it more reliable. At this point,
our goal is SQL compliance, speed, and features. See the TODO list in
the distribution or on our web page for an idea of our open items.
Good summary, Bruce. We haven't emphasized long-term goals, since these
shorter-term ones don't happen overnight. In the long run, though, other
topics of advanced interest, such as storage managers, distributed databases,
logging and error recovery, more OO features, adaptation to emerging
standards (SQL3 and beyond), certainly will be pertinent.
4) Which areas need help?
What we really need are more people who can get a handle on the entire
source tree, so we can implement some of the more complex SQL features,
or improve some of the older code.
We have just recently gotten to the point where a very few of the current
developers have an understanding of _most_ of the backend code. Achieving
longer-term advances requires that the development be planned by folks with
that level of understanding, though others can and do contribute
substantially.
Everyone starts on Postgres without understanding the code. As we do small
projects, we learn as we go and become more capable of dealing with larger
problems. Enjoy.
- Tom
Okay. Sounds good. :-)
How do I get the latest source tree and how is it synchronised?
Here is the developers FAQ. It will be on our web page soon, too.
---------------------------------------------------------------------------
Developers Frequently Asked Questions (FAQ) for PostgreSQL
Last updated: Wed Feb 4 12:43:43 EST 1998
Current maintainer: Bruce Momjian (maillist@candle.pha.pa.us)
The most recent version of this document can be viewed at the PostgreSQL Web
site, http://postgreSQL.org.
------------------------------------------------------------------------
Questions answered:
1) What tools are available for developers?
2) What books are good for developers?
3) Why do we use palloc() and pfree() to allocate memory?
4) Why do we use Node and List to make data structures?
5) How do I add a feature or fix a bug?
6) How do I download/update the current source tree?
------------------------------------------------------------------------
1) What tools are available for developers?
Aside from the User documentation mentioned in the regular FAQ, there are
several development tools available. First, all the files in the /tools
directory are designed for developers.
RELEASE_CHANGES changes we have to make for each release
SQL_keywords standard SQL'92 keywords
backend web flowchart of the backend directories
ccsym find standard defines made by your compiler
entab converts tabs to spaces, used by pgindent
find_static finds functions that could be made static
find_typedef get a list of typedefs in the source code
make_ctags make vi 'tags' file in each directory
make_diff make *.orig and diffs of source
make_etags make emacs 'etags' files
make_keywords.README make comparison of our keywords and SQL'92
make_mkid make mkid ID files
mkldexport create AIX exports file
pgindent indents C source files
Let me note some of these. If you point your browser at the tools/backend
directory, you will see all the backend components in a flow chart. You can
click on any one to see a description. If you then click on the directory
name, you will be taken to the source directory, to browse the actual source
code behind it. We also have several README files in some source directories
to describe the function of the module. The browser will display these when
you enter the directory also. The tools/backend directory is also contained
on our web page under the title Backend Flowchart.
Second, you really should have an editor that can handle tags, so you can
tag a function call to see the function definition, and then tag inside that
function to see an even lower-level function, and then back out twice to
return to the original function. Most editors support this via tags or etags
files.
Third, you need to get mkid from ftp.postgresql.org. By running
tools/make_mkid, an archive of source symbols can be created that can be
rapidly queried like grep or edited.
make_diff has tools to create patch diff files that can be applied to the
distribution.
pgindent will format source files to match our standard format, which has
four-space tabs, and an indenting format specified by flags to your
operating system's utility indent.
2) What books are good for developers?
I have two good books, An Introduction to Database Systems, by C.J. Date,
Addison-Wesley, and A Guide to the SQL Standard, by C.J. Date, et al.,
Addison-Wesley.
3) Why do we use palloc() and pfree() to allocate memory?
palloc() and pfree() are used in place of malloc() and free() because we
automatically free all memory allocated when a transaction completes. This
makes it easier to make sure we free memory that gets allocated in one
place, but only freed much later. There are several contexts that memory can
be allocated in, and this controls when the allocated memory is
automatically freed by the backend.
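For illustration, a minimal sketch of how a backend routine might use palloc()/pfree().
The function and its purpose are made up for this example, and the snippet assumes the
usual backend headers (postgres.h and utils/palloc.h) rather than being a standalone
program:

#include <string.h>

/*
 * Hypothetical helper: builds a greeting string in palloc'd memory.
 * Even if the caller never calls pfree(), the allocation is released
 * when its memory context (normally at transaction end) is freed.
 */
static char *
make_greeting(const char *name)
{
    char *buf = (char *) palloc(strlen(name) + 8);  /* "Hello, " + name + '\0' */

    strcpy(buf, "Hello, ");
    strcat(buf, name);
    return buf;                 /* caller may pfree(buf) explicitly */
}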
4) Why do we use Node and List to make data structures?
We do this because it gives a consistent, flexible way to pass data inside
the backend. Every node has a NodeTag which specifies what
type of data is inside the Node. Lists are lists of Nodes. lfirst(),
lnext(), and foreach() are used to get, skip, and traverse through Lists.
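A small sketch of what walking such a List looks like, assuming the nodes/pg_list.h
and nodes/nodes.h facilities described above; print_node_tags is an invented name,
not a real backend function:

static void
print_node_tags(List *items)
{
    List *l;

    foreach(l, items)                   /* visit each cell of the list */
    {
        Node *n = (Node *) lfirst(l);   /* the element stored in this cell */

        elog(NOTICE, "found a node with tag %d", (int) nodeTag(n));
    }
}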
5) How do I add a feature or fix a bug?
The source code is over 250,000 lines. Many problems/features are isolated
to one specific area of the code. Others require knowledge of much of the
source. If you are confused about where to start, ask the hackers list, and
they will be glad to assess the complexity and give pointers on where to
start.
Another thing to keep in mind is that many fixes and features can be added
with surprisingly little code. I often start by adding code, then looking at
other areas in the code where similar things are done, and by the time I am
finished, the patch is quite small and compact.
When adding code, keep in mind that it should use the existing facilities in
the source, for performance reasons and for simplicity. Often a review of
existing code doing similar things is helpful.
6) How do I download/update the current source tree?
There are several ways to obtain the source tree. Occasional developers can
just get the most recent source tree snapshot from ftp.postgresql.org. For
regular developers, you can get cvsup, which is available from
ftp.postgresql.org too. cvsup allows you to download the source tree, then
occasionally update your copy of the source tree with any new changes. Using
cvsup, you don't have to download the entire source each time, only the
changed files. cvsup does not, however, let you write your changes back into
the master source tree.
To get your changes into the master source tree, there are two ways. You can
generate a patch against your current source tree, perhaps using the make_diff
tools mentioned above, and send it to the patches list. It will be reviewed,
and applied in a timely manner. If the patch is major, and we are in beta
testing, the developers may wait for the final release before applying your
patches.
For hard-core developers, Marc(scrappy@postgresql.org) will give you a Unix
shell account on postgresql.org, and you can ftp your files into your
account, patch, and cvs install the changes directly in the source tree.
--
Bruce Momjian
maillist@candle.pha.pa.us
Thanks to Neil, our webman, this is now on the web page, and will be
updated regularly.
--
Bruce Momjian
maillist@candle.pha.pa.us
Okay, here's my idea of how I will get into it:
1) Try to get my apps going. That is, I'll try to port my Oracle
applications to PostgreSQL to have some programs to do testing.
Therefore I need:
2) Add Linus' ecpg precompiler for embedded SQL to the dist.
3) Try to implement some speed-ups I encounter while playing with my
apps.
I take it some of you have already run the backend against a profiler.
Could anyone please tell me which functions need work the most?
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
Thomas G. Lockhart wrote:
- recovery of munged databases (I think disk full conditions may cause
damage)
I have filled up the disk lots of times, but never had any problems
with corrupted data. When inserting, the backend simply says:
"mytable: cannot extend" and the insert fails.
The only problem I've had is with SELECT and ORDER BY, where the
sort files fill up the disk, and the backend either hangs or returns
zero rows. This can of course cause serious problems if an application
does something bad when a select returns zero rows (like re-initializing
the whole database or something... ooooh noooo...).
/* m */
Okay, here's my idea of how I will get into it:
1) Try to get my apps going. That is, I'll try to port my Oracle
applications to PostgreSQL to have some programs to do testing.
Therefore I need:
2) Add Linus' ecpg precompiler for embedded SQL to the dist.
3) Try to implement some speed-ups I encounter while playing with my
apps.

I take it some of you have already run the backend against a profiler.
Could anyone please tell me which functions need work the most?
I have already hit many of the speedups people have reported in the
profiler by changing the logic, or inlining part/all of the function. I
recommend running your own profile and seeing what it shows.
--
Bruce Momjian
maillist@candle.pha.pa.us
Meskes, Michael wrote:
I take it some of you have already run the backend against a profiler.
Could anyone please tell me which functions need work the most?
SYSTEM: Linux 2.0.29, gcc 2.7.2
-------------------------------
[make distclean]
[CVSup'ed current source 1998-02-05]
[./configure]
[make]
[CUSTOM_LDFLAGS not settable in Makefile.custom ...]
[... setting LDFLAGS=-pg in Makefile.global]
--> fmgr.c: In function `fmgr_pl':
--> fmgr.c:46: `fmgr_pl_finfo' undeclared (first use this function)
[fix] cp backend/utils/fmgr.h backend/utils/fmgr/fmgr.h
[make install]
[Run initdb]
[Regression tests]
strings .. failed [ok]
lseg .. failed [new operator <= on lseg]
horology .. failed [daylight savings error?]
triggers .. failed
select_views .. failed
PROFILE OF REGRESSION TEST:
---------------------------
% time  cumulative seconds  self seconds  calls  self ms/call  total ms/call  name
35.16 7.97 7.97 mcount (profiler overhead)
5.91 9.31 1.34 9924 0.14 0.20 heapgettup
4.54 10.34 1.03 234597 0.00 0.01 hash_search
2.12 10.82 0.48 151781 0.00 0.00 SpinAcquire
2.03 11.28 0.46 46635 0.01 0.02 SearchSysCache
1.63 11.65 0.37 171345 0.00 0.00 tag_hash
1.46 11.98 0.33 20511 0.02 0.02 yylex
1.41 12.30 0.32 31311 0.01 0.03 LockAcquire
1.28 12.59 0.29 1522 0.19 0.61 yyparse
1.24 12.87 0.28 41176 0.01 0.01 fmgr_isbuiltin
1.01 13.10 0.23 201572 0.00 0.00 AllocSetAlloc
0.84 13.29 0.19 186214 0.00 0.00 OrderedElemPop
0.75 13.46 0.17 42896 0.00 0.00 nocachegetattr
0.71 13.62 0.16 29869 0.01 0.04 LockRelease
0.66 13.77 0.15 151781 0.00 0.00 SpinRelease
0.66 13.92 0.15 88766 0.00 0.01 newNode
0.66 14.07 0.15 52248 0.00 0.00 _bt_compare
... around 1850 functions follows (0.66 - 0.0%)
heapgettup (called by, calling):
--------------------------------
index % time self children called name
0.01 0.01 96/9924 heap_markpos [448]
0.21 0.11 1566/9924 heap_restrpos [102]
1.12 0.58 8262/9924 heap_getnext [20]
[14]     13.8    1.34    0.69    9924         heapgettup [14]
0.04 0.14 9924/9924 RelationGetBufferWithBuffer [148]
0.03 0.15 5642/5702 ReleaseAndReadBuffer [145]
0.10 0.00 26276/42896 nocachegetattr [158]
0.01 0.08 7111/9607 HeapTupleSatisfiesVisibility [185]
0.04 0.00 117785/126582 char16eq [339]
0.02 0.00 111941/111994 int4eq [427]
0.02 0.00 109647/112329 nameeq [426]
0.00 0.01 1770/31585 ReleaseBuffer [134]
0.01 0.00 14000/50173 PageGetMaxOffsetNumber [341]
0.01 0.00 5185/5433 chareq [537]
0.01 0.00 15566/90147 BufferGetBlock [311]
0.01 0.00 17336/240911 BufferIsValid [207]
0.00 0.00 4/1973 int2eq [575]
0.00 0.00 7412/7412 nextpage [1186]
0.00 0.00 951/1358 SetBufferCommitInfoNeedsSave [1253]
--------------------------------
The *complete* gprof output is here:
ftp://postgresql.org/pub/incoming/regression-profile-980205.gz 156k
/* m */
Interesting. Nothing is jumping out at me. Looks like we could try to
clean up heapgettup() to see if there is anything in there that can be
speeded up.
None of the calls looks like it should be inlined. Do you see any that
look good for inlining?
--
Bruce Momjian
maillist@candle.pha.pa.us
Here are my ideas.
Can you look at fmgr_isbuiltin? It is doing a binary search looking to
see if it is builtin. Is it wrong? Is there a better way to do this?
Seems like a lot of cpu for this.
Also, locking seems to be taking quite a bit of cpu, especially if you
add up all the locking calls together. Can you check and see if there
is a cleaner way to do this?
hash_search also looks like it could be improved. This is called a lot
by the locking code. I know because when I added real deadlock
detection, I had to work with this stuff.
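For readers following along: the lookup Bruce describes is a classic binary search
over a table sorted by function OID, so the log2(N) probe is paid on every
function-manager call. A generic sketch of that pattern, with invented types and
names rather than the real fmgr code:

typedef struct
{
    unsigned int oid;           /* function OID; table assumed sorted ascending */
    void        *func;          /* pointer to the builtin implementation */
} BuiltinEntry;

static void *
lookup_builtin(const BuiltinEntry *table, int nentries, unsigned int oid)
{
    int low = 0;
    int high = nentries - 1;

    while (low <= high)
    {
        int mid = (low + high) / 2;

        if (table[mid].oid == oid)
            return table[mid].func;     /* found it */
        if (table[mid].oid < oid)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return NULL;                        /* not a builtin */
}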
--
Bruce Momjian
maillist@candle.pha.pa.us
Bruce Momjian writes:
Interesting. Nothing is jumping out at me. Looks like we could try to
clean up heapgettup() to see if there is anything in there that can be
speeded up. None of the calls looks like it should be inlined. Do you see any that
look good for inlining?
Not exactly. But to get my application running we have to do something to
speed it up. This morning I started my program on a database with 15165
records in one table, something like 100 in a second table, plus 2 in a third.
Then my software tries to add records to the first table and for each record
added it updates a record in the other two. This doesn't count the select
etc.
Anyway, the data isn't processed fast enough. I tried adding about 600
records, which should be done in 5 minutes (the time in which the data was
sent), but our system needed almost 8 minutes to insert the data. And this
doesn't cause Oracle too much trouble.
So I guess there's need for some speed-up. :-)
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
Bruce Momjian wrote:
Interesting. Nothing is jumping out at me. Looks like we could try to
clean up heapgettup() to see if there is anything in there that can be
speeded up. None of the calls looks like it should be inlined. Do you see any that
look good for inlining?
ExecScan() seems to be the only func which calls SeqNext(), which in
turn accounts for 60% of the calls to heap_getnext(), which does 80% of
the calls to heapgettup().
- Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
- In heapgettup(), 50% is the func itself and 50% is called funcs.
Top four CPU consumers:
0.04 0.14 9924/9924 RelationGetBufferWithBuffer [148]
0.03 0.15 5642/5702 ReleaseAndReadBuffer [145]
0.10 0.00 26276/42896 nocachegetattr [158]
0.01 0.08 7111/9607 HeapTupleSatisfiesVisibility [185]
RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
- Looking at RelationGetBufferWithBuffer():
0.00 0.10 4603/32354 ReadBuffer [55]
ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
-> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
-> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
-> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
-> Conclusion: INLINE BufferAlloc().
- Looking at BufferAlloc():
0.04 0.25 37974/37974 BufTableLookup [114]
0.10 0.00 32340/151781 SpinAcquire [81]
0.10 0.00 37470/40585 PinBuffer [209]
0.08 0.00 38478/43799 RelationGetLRelId [234]
0.04 0.00 37974/151781 SpinRelease [175]
-> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
-> BufferAlloc() is the only func calling BufTableLookup().
-> Conclusion: INLINE BufTableLookup().
- Looking at BufTableLookup():
86% of CPU time is in calling hash_search(). The rest is own time.
- Looking at hash_search():
0.13 0.41 179189/179189 call_hash [69]
0.00 0.00 6/6 bucket_alloc [1084]
-> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
- Looking at call_hash():
0.37 0.00 171345/171345 tag_hash [94]
0.04 0.00 7844/7844 string_hash [348]
-> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
-> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
-> Could we use a lookup table instead of doing hash calculations? Would not that
-> be much faster?
It looks to me as if there are too many levels of function calls.
Perhaps all functions which are called by only one other func should be inlined?
Guesstimate:
This would speed up heapgettup() by 10% ???
Other functions would speed up too.
/* m */
Not exactly. But to get my application running we have to do something to
speed it up. This morning I started my program on a database with 15165
records in one table, something like 100 in a second table, plus 2 in a third.
Then my software tries to add records to the first table and for each record
added it updates a record in the other two. This doesn't count the select
etc.

Anyway, the data isn't processed fast enough. I tried adding about 600
records, which should be done in 5 minutes (the time in which the data was
sent), but our system needed almost 8 minutes to insert the data. And this
doesn't cause Oracle too much trouble.

So I guess there's need for some speed-up. :-)
I (and others) had done some benchmarking on simple inserts (6 months ago?) and
had concluded that the speed was similar to other commercial systems (I was
comparing against Ingres). I recall getting ~50TPS.
This was all before Bruce did his work on startup and runtime speeds. You
really think your performance is that far off? You are doing selects on the big
table before inserting? Do you have indices set up?? Our results were for
inserts on a heap table, which has the least overhead...
- Tom
Just curious, but do you have -F set to disable fsync()? We
really really should disable that by default :(
On Fri, 6 Feb 1998, Vadim B. Mikheev wrote:
Just curious, but do you have -F set to disable fsync()? We
really really should disable that by default :(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oh no!
Isn't that our standard recommended operating parameter anyway?
Oops, that's just our performance-improving recommendation... sorry... :)
And did you use BEGIN/END ?
Auto-commit is ON in postgres...
Just curious, but do you have -F set to disable fsync()? We
really really should disable that by default :(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oh no!
Vadim
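Vadim's point, sketched with libpq: with auto-commit on, every INSERT is its own
commit (and, without -F, its own fsync), while an explicit BEGIN/END turns a whole
batch into a single commit. This is only an illustration, not code from the
discussion; the table name is hypothetical and error handling is omitted for brevity:

#include <stdio.h>
#include <libpq-fe.h>

static void
bulk_insert(PGconn *conn, int nrows)
{
    char query[256];
    int  i;

    PQclear(PQexec(conn, "BEGIN"));     /* one transaction for the whole batch */
    for (i = 0; i < nrows; i++)
    {
        sprintf(query, "INSERT INTO perftest VALUES (%d, 'row')", i);
        PQclear(PQexec(conn, query));
    }
    PQclear(PQexec(conn, "END"));       /* a single commit at the end */
}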
Bruce Momjian wrote:
Interesting. Nothing is jumping out at me. Looks like we could try to
clean up heapgettup() to see if there is anything in there that can be
speeded up. None of the calls looks like it should be inlined. Do you see any that
look good for inlining?

ExecScan() seems to be the only func which calls SeqNext(), which in
turn accounts for 60% of the calls to heap_getnext(), which does 80% of
the calls to heapgettup().
This is certainly the type of analysis that will help us. I disagree on
inlining everything, because the code maintenance becomes a nightmare,
but we certainly could improve things.
Currently, if a function is declared static, it is only called from
within that file. If there is only one call to that function in the
file, it could be inlined. I suggest we only attack things like the
below where we have a large amount of function call overhead. Also,
there are cases where inlining the 'top' of the function can be really
good. For heap_getattr(), many cases were handled in the top that just
did a quick return. In those cases, I took the top and made a macro out
of it, and only called the function if I needed real processing.
This is a great help, and any patches would be good. The other nice
thing about this is that it is pretty compartmentalized. You just need
to check that the new code behaves like the old code, and you are safe.
This is great analysis of the problem, too.
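A sketch of the macro-plus-function pattern Bruce describes for heap_getattr(),
with invented names (this is not the actual heap_getattr definition): the cheap
common case expands inline at every call site, and only the general case pays for
a function call.

typedef struct
{
    int  cached;                /* fast-path answer already available? */
    long cached_value;
} Thing;

/* the original function body, kept in one place for the general case */
static long
thing_get_value_slow(Thing *t)
{
    /* ... expensive general-case processing would go here ... */
    return t->cached_value;
}

/*
 * Fast path handled in the macro; slow path falls through to the function.
 * Note the argument is evaluated more than once, as with any such macro.
 */
#define thing_get_value(t) \
    ((t)->cached ? (t)->cached_value : thing_get_value_slow(t))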
- Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
Sure. Just comment it so it is clear for readers, so it is not pulled
out into a separate function some day.
- In heapgettup(), 50% is the func itself and 50% is called funcs.
Top four CPU consumers:
0.04 0.14 9924/9924 RelationGetBufferWithBuffer [148]
0.03 0.15 5642/5702 ReleaseAndReadBuffer [145]
0.10 0.00 26276/42896 nocachegetattr [158]
0.01 0.08 7111/9607 HeapTupleSatisfiesVisibility [185]

RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
OK.
- Looking at RelationGetBufferWithBuffer():
0.00 0.10 4603/32354 ReadBuffer [55]
ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)

-> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
-> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
-> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
-> Conclusion: INLINE BufferAlloc().
My only warning is that I have never inlined a function, AND left
another copy of the function non-inlined. The reason is that you are
left with two copies of the function, and that can get really confusing.
I don't know how others feel about that, but the code is already so
complicated, I hesitate to start duplicating blocks of code. Of course,
if you make it a macro, you have it in one place, and the preprocessor
is duplicating it for you.
I have converted a function to a macro, and I have inlined
frequently-called functions.
- Looking at BufferAlloc():
0.04 0.25 37974/37974 BufTableLookup [114]
0.10 0.00 32340/151781 SpinAcquire [81]
0.10 0.00 37470/40585 PinBuffer [209]
0.08 0.00 38478/43799 RelationGetLRelId [234]
0.04 0.00 37974/151781 SpinRelease [175]
-> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
-> BufferAlloc() is the only func calling BufTableLookup().
-> Conclusion: INLINE BufTableLookup().
- Looking at BufTableLookup():
86% of CPU time is in calling hash_search(). The rest is own time.
- Looking at hash_search():
0.13 0.41 179189/179189 call_hash [69]
0.00 0.00 6/6 bucket_alloc [1084]
-> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
- Looking at call_hash():
0.37 0.00 171345/171345 tag_hash [94]
0.04 0.00 7844/7844 string_hash [348]
-> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
-> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
-> Could we use a lookup table instead of doing hash calculations? Would not that
-> be much faster?
It looks to me as if there are too many levels of function calls.
Perhaps all functions which are called by only one other func should be inlined?
Guesstimate:
This would speed up heapgettup() by 10% ???
Other functions would speed up too.
Makes sense. If the functions are in the same source file, and are
declared static, doesn't the compiler inline some of them?
See my heap_getattr() macro for a way perhaps to macro-ize some of
these. The macro looks much clearer than the old style macros.
--
Bruce Momjian
maillist@candle.pha.pa.us
I (and others) had done some benchmarking on simple inserts (6 months ago?) and
had concluded that the speed was similar to other commercial systems (I was
comparing against Ingres). I recall getting ~50TPS.

This was all before Bruce did his work on startup and runtime speeds. You
really think your performance is that far off? You are doing selects on the big
table before inserting? Do you have indices set up?? Our results were for
inserts on a heap table, which has the least overhead...

Just curious, but do you have -F set to disable fsync()? We
really really should disable that by default :(
Agreed. But maybe we should wait until we get pg_log syncing so we have
30-second reliability. Don't know where that fits on Vadim's list.
--
Bruce Momjian
maillist@candle.pha.pa.us
50 sounds okay and sufficient.
But I do not get that much. I get about 50 of my transactions (2 updates
+ 1 insert + some selects) per minute!
I do not select the big table before inserting. And yes, I do use
indices. But for one key there are only two different values in the big
table.
Also, please take note that I'm still using 6.2.1 for this test. I will
switch to 6.3 as soon as the ODBC driver is working again.
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Use Debian GNU/Linux! | Tel: (+49) 2405/4670-44
Show quoted text
----------
From: Thomas G. Lockhart[SMTP:lockhart@alumni.caltech.edu]
Sent: Freitag, 6. Februar 1998 16:09
To: Michael Meskes
Cc: Bruce Momjian; PostgreSQL Hacker
Subject: Re: [HACKERS] Profile of current backend
I (and others) had done some benchmarking on simple inserts (6 months
ago?) and had concluded that the speed was similar to other commercial
systems (I was comparing against Ingres). I recall getting ~50TPS.
This was all before Bruce did his work on startup and runtime speeds. You
really think your performance is that far off? You are doing selects on
the big table before inserting? Do you have indices set up?? Our results
were for inserts on a heap table, which has the least overhead...
- Tom
No, and that explains it all. Silly me. Argh!
I used the standard start script that comes with Debian (in fact the
6.2.1 version I use is the Debian package) and it doesn't use -F. And
with more disk access we have no chance.
I'll check that as soon as I find the time.
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Use Debian GNU/Linux! | Tel: (+49) 2405/4670-44
----------
From: The Hermit Hacker[SMTP:scrappy@hub.org]
Sent: Friday, 6 February 1998 16:20
To: Thomas G. Lockhart
Cc: Michael Meskes; Bruce Momjian; PostgreSQL Hacker
Subject: Re: [HACKERS] Profile of current backend
Just curious, but do you have -F set to disable fsync()? We
really really should disable that by default :(
BEGIN/END of what? Transactions? Since I use ODBC, auto-commit is
usually on. But all writing is done in one transaction, i.e. we switch
auto-commit off and handle it ourselves.
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Use Debian GNU/Linux! | Tel: (+49) 2405/4670-44
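For comparison with the ODBC path described above, here is a minimal libpq sketch of the pattern being asked about: one explicit BEGIN/END around all the writes, so there is a single commit instead of one per statement. The database and table names are simply borrowed from the test program later in the thread; this is not the application in question.
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>
int
main()
{
    PGconn *conn = PQsetdb(NULL, NULL, NULL, NULL, "mm");   /* database name assumed */
    PGresult *res;
    char query[128];
    int i;
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
        exit(1);
    }
    res = PQexec(conn, "BEGIN");            /* open one transaction for all inserts */
    PQclear(res);
    for (i = 0; i < 1000; i++)
    {
        sprintf(query, "insert into perftest(number, ascii) values (%d, '%d')", i, i);
        res = PQexec(conn, query);
        PQclear(res);
    }
    res = PQexec(conn, "END");              /* one commit, and so one sync, for the batch */
    PQclear(res);
    PQfinish(conn);
    return 0;
}
With auto-commit, each of those inserts would be its own transaction and, without -F, its own fsync().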
----------
From: Vadim B. Mikheev[SMTP:vadim@sable.krasnoyarsk.su]
Sent: Friday, 6 February 1998 17:00
To: The Hermit Hacker
Cc: Thomas G. Lockhart; Michael Meskes; Bruce Momjian; PostgreSQL
Hacker
Subject: Re: [HACKERS] Profile of current backend
And did you use BEGIN/END ?
Auto-commit is ON in postgres...
Just curious, but do you have -F set to disable fsync()? We
really really should disable that by default :(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oh no!
Vadim
The Hermit Hacker writes:
Just curious, but do you have -F set to disable fsync()? We
really really should disable that by default :(
I tried with -F and it runs nicely. No difference to see between PostgreSQL
and Oracle. I just ran another test which includes table creation, inserts
and drop (source follows). Here's the result:
Oracle 7.3.3.4.0:
I needed 21 seconds and -345682 microseconds for this test
PostgreSQL without -F:
I needed 152 seconds and -623545 microseconds for this test
PostgreSQL with -F:
I needed 5 seconds and 84411 microseconds for this test
Wow!
Here's the source (yes, our precompiler can handle this kind of program
already :-)):
#include <stdio.h>
#include <stdlib.h>	/* for exit() */
#include <sys/time.h>
#include <unistd.h>
exec sql include sqlca;
#define SQLCODE sqlca.sqlcode
void
db_error (char *msg)
{
sqlca.sqlerrm.sqlerrmc[sqlca.sqlerrm.sqlerrml] = '\0';
printf ("%s: db error %s\n", msg, sqlca.sqlerrm.sqlerrmc);
exit (1);
}
int
main ()
{
exec sql begin declare section;
long i;
exec sql end declare section;
struct timeval tvs, tve;
gettimeofday(&tvs, NULL);
exec sql connect 'mm';
if (SQLCODE)
db_error ("connect");
exec sql create table perftest(number int4, ascii char16);
if (SQLCODE)
db_error ("create t");
exec sql create unique index number on perftest(number);
if (SQLCODE)
db_error ("create i");
for (i = 0;i < 1407; i++)
{
exec sql begin declare section;
char text[16];
exec sql end declare section;
sprintf(text, "%ld", i);
exec sql insert into perftest(number, ascii) values (:i, :text);
if (SQLCODE)
db_error ("insert");
exec sql commit;
if (SQLCODE)
db_error ("commit");
}
exec sql drop index number;
if (SQLCODE)
db_error ("drop i");
exec sql drop table perftest;
if (SQLCODE)
db_error ("drop t");
exec sql commit;
if (SQLCODE)
db_error ("commit");
gettimeofday(&tve, NULL);
printf("I needed %ld seconds and %ld microseconds for this test\n", tve.tv_sec - tvs.tv_sec, tve.tv_usec - tvs.tv_usec);
return (0);
}
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
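A side note on the timings above: the negative microsecond figures come from printing tve.tv_usec - tvs.tv_usec directly; when the end timestamp's microsecond field is smaller than the start's, that difference goes negative even though the elapsed time is fine (so "21 seconds and -345682 microseconds" is really 20.654318 seconds). A small helper, sketched here separately rather than as a change to the quoted program, normalizes the difference:
#include <sys/time.h>
/* elapsed = end - start, with tv_usec kept in the range 0..999999 */
static void
timeval_diff(struct timeval *start, struct timeval *end, struct timeval *elapsed)
{
    elapsed->tv_sec = end->tv_sec - start->tv_sec;
    elapsed->tv_usec = end->tv_usec - start->tv_usec;
    if (elapsed->tv_usec < 0)
    {
        elapsed->tv_sec -= 1;
        elapsed->tv_usec += 1000000;
    }
}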
Here's the source (yes, our precompiler can handle this kind of program
already :-)):
Looks nice!
- Tom
The Hermit Hacker writes:
Just curious, but do you have -F set to disable fsync()? We
really really should disable that by default :(
I tried with -F and it runs nicely. No difference to see between PostgreSQL
and Oracle. I just ran another test which includes table creation, inserts
and drop (source follows). Here's the result:
Oracle 7.3.3.4.0:
I needed 21 seconds and -345682 microseconds for this test
PostgreSQL without -F:
I needed 152 seconds and -623545 microseconds for this test
PostgreSQL with -F:
I needed 5 seconds and 84411 microseconds for this test
Wow!
This is good news. Is this with 6.3 or 6.2.1?
--
Bruce Momjian
maillist@candle.pha.pa.us
The numbers are from 6.3.
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
-----Original Message-----
From: Bruce Momjian [SMTP:maillist@candle.pha.pa.us]
Sent: Tuesday, February 10, 1998 4:41 PM
To: meskes@topsystem.de
Cc: scrappy@hub.org; pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] Profile of current backend
The Hermit Hacker writes:
Just curious, but do you have -F set to disable fsync()? We
really really should disable that by default :(
I tried with -F and it runs nicely. No difference to see between PostgreSQL
and Oracle. I just ran another test which includes table creation, inserts
and drop (source follows). Here's the result:
Oracle 7.3.3.4.0:
I needed 21 seconds and -345682 microseconds for this test
PostgreSQL without -F:
I needed 152 seconds and -623545 microseconds for this test
PostgreSQL with -F:
I needed 5 seconds and 84411 microseconds for this test
Wow!
This is good news. Is this with 6.3 or 6.2.1?
--
Bruce Momjian
maillist@candle.pha.pa.us
Can we go anywhere with this?
Bruce Momjian wrote:
Interesting. Nothing is jumping out at me. Looks like we could try to
clean up heapgettup() to see if there is anything in there that can be
speeded up.
None of the calls looks like it should be inlined. Do you see any that
look good for inlining?
ExecScan() seems to be the only func which calls SeqNext(), which in
turn accounts for 60% of the calls to heap_getnext(), which does 80% of
the calls to heapgettup().
- Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
- In heapgettup(), 50% is the func itself and 50% is called funcs.
Top four CPU consumers:
0.04 0.14 9924/9924 RelationGetBufferWithBuffer [148]
0.03 0.15 5642/5702 ReleaseAndReadBuffer [145]
0.10 0.00 26276/42896 nocachegetattr [158]
0.01 0.08 7111/9607 HeapTupleSatisfiesVisibility [185]
RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
- Looking at RelationGetBufferWithBuffer():
0.00 0.10 4603/32354 ReadBuffer [55]
ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
-> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
-> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
-> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
-> Conclusion: INLINE BufferAlloc().
- Looking at BufferAlloc():
0.04 0.25 37974/37974 BufTableLookup [114]
0.10 0.00 32340/151781 SpinAcquire [81]
0.10 0.00 37470/40585 PinBuffer [209]
0.08 0.00 38478/43799 RelationGetLRelId [234]
0.04 0.00 37974/151781 SpinRelease [175]
-> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
-> BufferAlloc() is the only func calling BufTableLookup().
-> Conclusion: INLINE BufTableLookup().
- Looking at BufTableLookup():
86% of CPU time is in calling hash_search(). The rest is own time.
- Looking at hash_search():
0.13 0.41 179189/179189 call_hash [69]
0.00 0.00 6/6 bucket_alloc [1084]
-> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
- Looking at call_hash():
0.37 0.00 171345/171345 tag_hash [94]
0.04 0.00 7844/7844 string_hash [348]
-> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
-> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
-> Could we use a lookup table instead of doing hash calculations? Would not that
-> be much faster?
It looks to me as if there are too many levels of function calls.
Perhaps all functions which are called by only one other func should be inlined?
Guesstimate:
This would speed up heapgettup() by 10% ???
Other functions would speed up too.
/* m */
--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
In src/test/regress:
it is time to remove the "PST8PDT" stuff from the files
README and regress.sh
"set timezone to 'PST8PDT'" should be added to sql/datetime.sql
/* m */
it is time to remove the "PST8PDT" stuff from the files
README and regress.sh
"set timezone to 'PST8PDT'" should be added to sql/datetime.sql
Hmm. It would have to be added in several regression tests, not just the
datetime one. Also, having it isolated to the client configuration in
regress.sh gives you the opportunity to fix it or adjust it for your
platform.
I'm not sure what the README says about it; that may be obsolete. Is the
PGTZ setting in regress.sh causing you trouble??
- Tom
Bruce Momjian writes:
Can we go anywhere with this?
Did anyone do anything with this already?
ExecScan() seems to be the only func which calls SeqNext(), which in
turn accounts for 60% of the calls to heap_getnext(), which does 80% of
the calls to heapgettup().
- Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
- In heapgettup(), 50% is the func itself and 50% is called funcs.
Top four CPU consumers:
0.04 0.14 9924/9924 RelationGetBufferWithBuffer [148]
0.03 0.15 5642/5702 ReleaseAndReadBuffer [145]
0.10 0.00 26276/42896 nocachegetattr [158]
0.01 0.08 7111/9607 HeapTupleSatisfiesVisibility [185]
RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
- Looking at RelationGetBufferWithBuffer():
0.00 0.10 4603/32354 ReadBuffer [55]
ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
-> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
-> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
-> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
-> Conclusion: INLINE BufferAlloc().
- Looking at BufferAlloc():
0.04 0.25 37974/37974 BufTableLookup [114]
0.10 0.00 32340/151781 SpinAcquire [81]
0.10 0.00 37470/40585 PinBuffer [209]
0.08 0.00 38478/43799 RelationGetLRelId [234]
0.04 0.00 37974/151781 SpinRelease [175]
-> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
-> BufferAlloc() is the only func calling BufTableLookup().
-> Conclusion: INLINE BufTableLookup().
- Looking at BufTableLookup():
86% of CPU time is in calling hash_search(). The rest is own time.
- Looking at hash_search():
0.13 0.41 179189/179189 call_hash [69]
0.00 0.00 6/6 bucket_alloc [1084]
-> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
- Looking at call_hash():
0.37 0.00 171345/171345 tag_hash [94]
0.04 0.00 7844/7844 string_hash [348]
-> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
-> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
-> Could we use a lookup table instead of doing hash calculations? Would not that
-> be much faster?
It looks to me as if there are too many levels of function calls.
Perhaps all functions which are called by only one other func should be inlined?
Isn't this a good solution? A function only called by one other function has
its right to exist only for readability. And this optimization could be done
automatically.
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
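One caveat worth noting: C compilers of this era can only inline within a single translation unit. When the lone caller and the callee are static in the same source file, gcc can inline the call by itself (for example with -finline-functions or a high enough optimization level) and the source need not change; when they live in separate files, the practical options are a macro in a header or merging the functions by hand. A generic sketch of the first case, not backend code:
/* a helper with exactly one caller, in the same file */
static int
square(int x)
{
    return x * x;
}
int
sum_of_squares(int n)
{
    int i, total = 0;
    for (i = 0; i < n; i++)
        total += square(i);     /* the compiler may expand this call in place */
    return total;
}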
Yes, but since it is, why don't we inline it? Is there a possible
caveat?
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
-----Original Message-----
From: Bruce Momjian [SMTP:maillist@candle.pha.pa.us]
Sent: Monday, March 23, 1998 4:08 PM
To: meskes@topsystem.de
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Profile of current backend
Isn't this a good solution? A function only called by one other function has
its right to exist only for readability. And this optimization could be done
automatically.
Wouldn't be such a big deal if it was not called so many times.
--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
Bruce Momjian writes:
Can we go anywhere with this?
Did anyone do anything with this already?
No one yet.
ExecScan() seems to be the only func which calls SeqNext(), which in
turn accounts for 60% of the calls to heap_getnext(), which does 80% of
the calls to heapgettup().
- Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
- In heapgettup(), 50% is the func itself and 50% is called funcs.
Top four CPU consumers:
0.04 0.14 9924/9924 RelationGetBufferWithBuffer [148]
0.03 0.15 5642/5702 ReleaseAndReadBuffer [145]
0.10 0.00 26276/42896 nocachegetattr [158]
0.01 0.08 7111/9607 HeapTupleSatisfiesVisibility [185]
RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
- Looking at RelationGetBufferWithBuffer():
0.00 0.10 4603/32354 ReadBuffer [55]
ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
-> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
-> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
-> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
-> Conclusion: INLINE BufferAlloc().
- Looking at BufferAlloc():
0.04 0.25 37974/37974 BufTableLookup [114]
0.10 0.00 32340/151781 SpinAcquire [81]
0.10 0.00 37470/40585 PinBuffer [209]
0.08 0.00 38478/43799 RelationGetLRelId [234]
0.04 0.00 37974/151781 SpinRelease [175]
-> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
-> BufferAlloc() is the only func calling BufTableLookup().
-> Conclusion: INLINE BufTableLookup().
- Looking at BufTableLookup():
86% of CPU time is in calling hash_search(). The rest is own time.
- Looking at hash_search():
0.13 0.41 179189/179189 call_hash [69]
0.00 0.00 6/6 bucket_alloc [1084]
-> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
- Looking at call_hash():
0.37 0.00 171345/171345 tag_hash [94]
0.04 0.00 7844/7844 string_hash [348]
-> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
-> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
-> Could we use a lookup table instead of doing hash calculations? Would not that
-> be much faster?
It looks to me as if there are too many levels of function calls.
Perhaps all functions which are called by only one other func should be inlined?
Isn't this a good solution? A function only called by one other function has
its right to exist only for readability. And this optimization could be done
automatically.
Wouldn't be such a big deal if it was not called so many times.
--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
No, it wasn't me. Sorry.
I believe it was matti@algonet.se (Mattias Kregert). At least his
address is on one of the earlier replies.
Michael
--
Dr. Michael Meskes, Project-Manager | topsystem Systemhaus GmbH
meskes@topsystem.de | Europark A2, Adenauerstr. 20
meskes@debian.org | 52146 Wuerselen
Go SF49ers! Go Rhein Fire! | Tel: (+49) 2405/4670-44
Use Debian GNU/Linux! | Fax: (+49) 2405/4670-10
-----Original Message-----
From: Bruce Momjian [SMTP:maillist@candle.pha.pa.us]
Sent: Monday, March 23, 1998 4:58 PM
To: meskes@topsystem.de
Cc: meskes@topsystem.de; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Profile of current backend
Yes, but since it is, why don't we inline it? Is there a possible
caveat?
Inline it. I just haven't had time or studied that part of the code. I
had hoped the original researcher would do the honors, but I don't
remember who that was, or if it was you.
--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
Yes, but since it is, why don't we inline it? Is there a possible
caveat?
Inline it. I just haven't had time or studied that part of the code. I
had hoped the original researcher would do the honors, but I don't
remember who that was, or if it was you.
--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)
Bruce Momjian wrote:
Interesting. Nothing is jumping out at me. Looks like we could try to
clean up heapgettup() to see if there is anything in there that can be
speeded up.
None of the calls looks like it should be inlined. Do you see any that
look good for inlining?
ExecScan() seems to be the only func which calls SeqNext(), which in
turn accounts for 60% of the calls to heap_getnext(), which does 80% of
the calls to heapgettup().
- Put SeqNext() into ExecScan() to lower function call overhead? [minimal optim.]
- In heapgettup(), 50% is the func itself and 50% is called funcs.
Top four CPU consumers:
0.04 0.14 9924/9924 RelationGetBufferWithBuffer [148]
0.03 0.15 5642/5702 ReleaseAndReadBuffer [145]
0.10 0.00 26276/42896 nocachegetattr [158]
0.01 0.08 7111/9607 HeapTupleSatisfiesVisibility [185]
RelationGetBufferWithBuffer() seems to be called from here only. If so, inline.
- Looking at RelationGetBufferWithBuffer():
0.00 0.10 4603/32354 ReadBuffer [55]
ReadBuffer() is the biggest cpu consumer called by RelationGetBufferWithBuffer(). (55%)
-> *** 97% of ReadBuffer() CPU time is in calling ReadBufferWithBufferLock()
-> 85% of ReadBufferWithBufferLock() CPU time is in calling BufferAlloc().
-> ReadBufferWithBufferLock() is the only func calling BufferAlloc().
-> Conclusion: INLINE BufferAlloc().
- Looking at BufferAlloc():
0.04 0.25 37974/37974 BufTableLookup [114]
0.10 0.00 32340/151781 SpinAcquire [81]
0.10 0.00 37470/40585 PinBuffer [209]
0.08 0.00 38478/43799 RelationGetLRelId [234]
0.04 0.00 37974/151781 SpinRelease [175]
-> 40% of BufferAlloc() CPU time is in calling BufTableLookup().
-> BufferAlloc() is the only func calling BufTableLookup().
-> Conclusion: INLINE BufTableLookup().
- Looking at BufTableLookup():
86% of CPU time is in calling hash_search(). The rest is own time.
- Looking at hash_search():
0.13 0.41 179189/179189 call_hash [69]
0.00 0.00 6/6 bucket_alloc [1084]
-> Conclusion: INLINE call_hash() [and bucket_alloc()] into hash_search().
- Looking at call_hash():
0.37 0.00 171345/171345 tag_hash [94]
0.04 0.00 7844/7844 string_hash [348]
-> Conclusion: INLINE tag_hash() [and string_hash()] into call_hash().
-> Perhaps disk_hash() could be used in some way? It is currently #ifdef'd away.
-> Could we use a lookup table instead of doing hash calculations? Would not that
-> be much faster?
It looks to me as if there are too many levels of function calls.
Perhaps all functions which are called by only one other func should be inlined?
Guesstimate:
This would speed up heapgettup() by 10% ???
Other functions would speed up too.
I have reviewed your findings, and have inlined 9 small functions that
are called for every row. Unfortunately, several of the ones you
mention above are called via function pointers, so it was not possible
to inline them.
In my testing, I created a table with 160k rows, and did a SELECT that
would do a sequential scan of the table, and looked at the functions
that were called the most.
---------------------------------------------------------------------------
% cumulative self self total
time seconds seconds calls ms/call ms/call name
54.3 2.65 2.65 mcount (472)
16.2 3.44 0.79 2999 0.26 0.26 _read [18]
3.5 3.61 0.17 160148 0.00 0.01 _heapgettup [15]
2.5 3.73 0.12 174007 0.00 0.00 _SpinAcquire [36]
2.3 3.84 0.11 981732 0.00 0.00 _BufferIsValid [37]
2.3 3.95 0.11 160148 0.00 0.01 _heap_getnext [14]
2.0 4.05 0.10 160096 0.00 0.00 _ExecEvalVar [38]
2.0 4.15 0.10 160096 0.00 0.00 _ExecMakeFunctionResult <cycle 3> [26]
1.4 4.22 0.07 174007 0.00 0.00 _SpinRelease [41]
1.2 4.28 0.06 18176 0.00 0.00 _hash_search [40]
1.0 4.33 0.05 480288 0.00 0.00 _ExecEvalExpr <cycle 3> [34]
1.0 4.38 0.05 167313 0.00 0.00 _RelationGetLRelId [47]
1.0 4.43 0.05 160096 0.00 0.00 _ExecEvalFuncArgs <cycle 3> [48]
0.8 4.47 0.04 160097 0.00 0.01 _SeqNext [13]
0.8 4.51 0.04 160096 0.00 0.00 _ExecQual [22]
0.6 4.54 0.03 160096 0.00 0.00 _ExecEvalOper <cycle 3> [51]
0.4 4.56 0.02 163015 0.00 0.00 _IncrBufferRefCount [61]
0.4 4.58 0.02 161342 0.00 0.00 _fmgr_c <cycle 5> [29]
0.4 4.60 0.02 160451 0.00 0.00 _TransactionIdEquals [62]
0.4 4.62 0.02 160148 0.00 0.00 _RelationGetBufferWithBuffer [28]
0.4 4.64 0.02 160098 0.00 0.00 _TupIsNull [63]
0.4 4.66 0.02 160097 0.00 0.00 _ExecIncrSlotBufferRefcnt [45]
0.4 4.68 0.02 160096 0.00 0.00 _ExecQualClause [23]
0.4 4.70 0.02 6161 0.00 0.00 _strcpy [64]
0.4 4.72 0.02 31 0.65 0.65 _open [65]
0.2 4.73 0.01 164288 0.00 0.00 _ReleaseBuffer [81]
0.2 4.74 0.01 164048 0.00 0.00 _PageGetMaxOffsetNumber [97]
0.2 4.75 0.01 160450 0.00 0.00 _HeapTupleSatisfiesVisibility [50]
0.2 4.76 0.01 160449 0.00 0.00 _HeapTupleSatisfiesNow [98]
0.2 4.77 0.01 160100 0.00 0.00 _ExecClearTuple [49]
0.2 4.78 0.01 160098 0.00 0.00 _ExecStoreTuple [46]
0.2 4.79 0.01 160096 0.00 0.00 _bpchareq [102]
0.2 4.80 0.01 15696 0.00 0.00 _call_hash [99]
0.2 4.81 0.01 12187 0.00 0.00 _malloc [96]
0.2 4.82 0.01 9182 0.00 0.00 _PortalHeapMemoryAlloc [73]
0.2 4.83 0.01 7257 0.00 0.00 _newNode [54]
0.2 4.84 0.01 2973 0.00 0.20 _ReleaseAndReadBuffer [25]
0.2 4.85 0.01 1241 0.01 0.02 _LockAcquire [58]
0.2 4.86 0.01 189 0.05 0.05 _IsBootstrapProcessingMode [103]
0.2 4.87 0.01 3 3.33 3.33 _shmdt [101]
0.2 4.88 0.01 1 10.00 2172.84 _ExecScan [11]
0.0 4.88 0.00 166214 0.00 0.00 _BufferGetBlock [72]
0.0 4.88 0.00 160449 0.00 0.00 _heapisoverride [477]
0.0 4.88 0.00 15303 0.00 0.00 _tag_hash [478]
0.0 4.88 0.00 11718 0.00 0.00 _AllocSetContains [479]
0.0 4.88 0.00 11718 0.00 0.00 _OrderedSetContains [480]
0.0 4.88 0.00 11366 0.00 0.00 _OrderedElemPush [481]
0.0 4.88 0.00 11366 0.00 0.00 _OrderedElemPushHead [482]
0.0 4.88 0.00 11366 0.00 0.00 _OrderedElemPushInto [483]
0.0 4.88 0.00 11361 0.00 0.00 _AllocSetAlloc [104]
0.0 4.88 0.00 11361 0.00 0.00 _MemoryContextAlloc [67]
0.0 4.88 0.00 11339 0.00 0.00 _palloc [68]
0.0 4.88 0.00 10515 0.00 0.00 _free [484]
0.0 4.88 0.00 10488 0.00 0.00 _AllocSetFree [485]
0.0 4.88 0.00 10488 0.00 0.00 _OrderedElemPop [486]
0.0 4.88 0.00 8176 0.00 0.00 _get_expr [487]
0.0 4.88 0.00 7953 0.00 0.00 _AllocSetGetFirst [488]
0.0 4.88 0.00 7953 0.00 0.00 _OrderedSetGetHead [489]
0.0 4.88 0.00 7952 0.00 0.00 _OrderedElemGetBase [490]
0.0 4.88 0.00 7424 0.00 0.00 _memmove [491]
---------------------------------------------------------------------------
Of these, I inlined:
#define TransactionIdEquals(id1, id2) \
#define TupIsNull(slot) \
#define ExecIncrSlotBufferRefcnt(slot) \
#define BufferIsValid(bufnum) \
#define IncrBufferRefCount(buffer) \
#define BufferGetBlock(buffer) \
#define PageGetMaxOffsetNumber(page) \
#define HeapTupleSatisfiesVisibility(tuple, seeself) \
#define heapisoverride() \
When I say inlined, I mean I made macros of them. I did not inline the
code into other functions, because you lose the modularity of the
function name, and many are called multiple times.
These were all small functions, only a few lines of actual code.
BufferIsValid was called several times per row.
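The macro bodies are trimmed to their first lines in the list above. Purely as an illustration of the shape of such a conversion (hypothetical names and a made-up condition, not the actual backend definitions):
/* before: a one-line check compiled as a real function call per row */
static int
buffer_is_valid(int bufnum)
{
    return bufnum >= 0;
}
/* after: the same check as a macro, expanded in place at each call site */
#define BUFFER_IS_VALID(bufnum) ((bufnum) >= 0)
Since checks like these run at least once per row of the 160k-row scan, the whole saving is the removed call overhead, which is consistent with the drop in mcount time described below.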
With that said, the execution times did not change at all. What I did
see is that the mcount row in gprof output:
54.3 2.65 2.65 mcount (472)
decreased from 2.65 to 1.79, which means that less time was spent in
function calls. Not sure why that time did not translate to better
overall performance, but the inlining is a clear win for these small
functions. mcount is artificially high anyway because it is doing the
profiling. The resulting gprof output shows a much smaller list of
functions called 160k times.
A patch will be posted to the patches list, for any of you who subscribe
to that.
--
Bruce Momjian | 830 Blythe Avenue
maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026
+ If your life is a hard drive, | (610) 353-9879(w)
+ Christ can be your backup. | (610) 853-3000(h)