I am confused after reading codes of PostgreSQL three week

Started by hom almost 15 years ago, 17 messages
#1 hom
obsidianhom@gmail.com

Hi,

I'm trying to learn how a database is implemented, and I have been reading
the PG source code for a month.

Now I only know a little about how PG works. :(

I only know that PG works like this, but I don't know why it works like this. :( :(

Even worse, I feel I can't understand the source code any better. It may be
that I can't split the large modules into small pieces, which would help me
understand them.

Is there any article or some other way that could help me understand the source code?

Thanks for your help ~

--
Best Wishes!

                                     hom

#2 Bruce Momjian
bruce@momjian.us
In reply to: hom (#1)
Re: I am confused after reading codes of PostgreSQL three week

hom wrote:

> Is there any article or some other way that could help me understand
> the source code?

I assume you have looked at these places:

http://wiki.postgresql.org/wiki/Developer_FAQ
http://www.postgresql.org/developer/coding

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#3 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: hom (#1)
Re: I am confused after reading codes of PostgreSQL three week

hom <obsidianhom@gmail.com> wrote:

> I'm trying to learn how a database is implemented, and I have been
> reading the PG source code for a month.

That's ambitious.

find -name '*.h' -or -name '*.c' \
| egrep -v '^\./src/test/.+/tmp_check/' \
| xargs cat | wc -l
1059144

Depending on how you do the math, that's about 50,000 lines of code
per day to get through it in the time you mention.

> Is there any article or some other way that could help me understand
> the source code?

Your best bet would be to follow links from the Developers tab on
the main PostgreSQL web site:

http://www.postgresql.org/developer/

In particular the Developer FAQ page:

http://wiki.postgresql.org/wiki/Developer_FAQ

And the "Coding" links:

http://www.postgresql.org/developer/coding

may help.

Before reading code in a directory, be sure to read any README
file(s) in that directory carefully.

It helps to read this list.

In spite of reviewing all of that myself, it was rather intimidating
when I went to work on a major patch 14 months ago. Robert Haas
offered some good advice which served me well in that effort --
divide the effort into a series of incremental steps, each of which
deals with a small enough portion of the code to get your head
around. As you work in any one narrow area, it becomes increasingly
clear; with that as a base you can expand your scope.

When you're working in the code, it is tremendously helpful to use
an editor with ctags support (or similar IDE functionality).

I hope this is helpful. Good luck.

-Kevin

#4 Markus Wanner
markus@bluegap.ch
In reply to: Kevin Grittner (#3)
Re: I am confused after reading codes of PostgreSQL three week

Hom,

On 03/17/2011 04:49 PM, Kevin Grittner wrote:

> That's ambitious.

Absolutely, yes. Exercise patience with yourself.

A method that hasn't been mentioned yet is digging out your debugger
and attaching it to a connected Postgres backend. You can then issue a
query you are interested in and follow the backend doing its work.

That's particularly helpful in trying to find a certain spot of
interest. Of course, it doesn't help much in getting the big picture.

Good luck on your journey through the code base.

Regards

Markus Wanner

#5 Brendan Jurd
direvus@gmail.com
In reply to: hom (#1)
Re: I am confused after reading codes of PostgreSQL three week

On 18 March 2011 01:57, hom <obsidianhom@gmail.com> wrote:

> I'm trying to learn how a database is implemented

This objective is so vast and so vague that it's difficult to give
meaningful help.

I'd emphasise Kevin Grittner's very worthwhile advice. Try to break
your question down into smaller, more specific ones. With a question
like "how does postgres work" you're likely to flounder. But with a
more targeted question, e.g., "what format does postgres use to save
data to disk" or "how does postgres implement ORDER BY", you can make
easier progress, and perhaps you could get more useful pointers from
the people on this list.

Have you read through the "Overview of System Internals" chapter in
the documentation [1]? Perhaps it will help you identify the areas
you wish to explore further, and form more specific questions.

[1] http://www.postgresql.org/docs/current/static/overview.html

Cheers,
BJ

#6 Vaibhav Kaushal
vaibhavkaushal123@gmail.com
In reply to: Brendan Jurd (#5)
Re: I am confused after reading codes of PostgreSQL three week

Hi,

That was the question I was facing 5 months ago, and trust me, I am still
facing it now. Even with an average of 6+ hours going into the PostgreSQL
code, and even following best practices (as suggested by the developers),
I still think I know less than 10 percent. It is too huge to be swallowed
at once.

I too had to break it down into pieces, and because everything is so
interconnected with everything else, it is quite complicated in the
beginning. Start with one piece (planner, parser, executor, storage
management, whatever), and slowly it should help you get the bigger picture.

regards,
Vaibhav

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7 hom
obsidianhom@gmail.com
In reply to: Bruce Momjian (#2)
Re: I am confused after reading codes of PostgreSQL three week

Thanks Bruce.
I am also reading your book <PostgreSQL Introduction and Concepts>. :)

--
Best Wishes!

                                     hom

#8 hom
obsidianhom@gmail.com
In reply to: Kevin Grittner (#3)
Re: I am confused after reading codes of PostgreSQL three week

Thanks Kevin.
I will follow your advice, and I will also post my questions to the
mailing list for help.
Thanks a lot.

--
Best Wishes!

                                     hom

#9 hom
obsidianhom@gmail.com
In reply to: Markus Wanner (#4)
Re: I am confused after reading codes of PostgreSQL three week

Thanks Markus.
It's a hard time at the beginning.
I should be patient. :)

--
Best Wishes!

                                     hom

#10 hom
obsidianhom@gmail.com
In reply to: Brendan Jurd (#5)
Re: I am confused after reading codes of PostgreSQL three week

Thanks Brendan.
I had a quick glance at "Overview of System Internals" before.
I think it is time to read it again.

--
Best Wishes!

                                     hom

#11 hom
obsidianhom@gmail.com
In reply to: Vaibhav Kaushal (#6)
Re: I am confused after reading codes of PostgreSQL three week

Thanks Vaibhav.
I have stepped into the parser before, but I ran into a problem:

When I step into scanner_init() in the debugger, Eclipse always opens scan.l,
and the execution order does not match the file.
I think it should actually be scan.c, but I don't know how to trace
into scan.c. :(
PS: I have turned the "Search for duplicate source files" option on.

I have posted this to the mailing list, but it has not been solved.

Here is the link:
http://postgresql.1045698.n5.nabble.com/Open-unmatch-source-file-when-step-into-parse-analyze-in-Eclipse-td3408033.html

--
Best Wishes!

                                     hom

#12 Vaibhav Kaushal
vaibhavkaushal123@gmail.com
In reply to: hom (#11)
Re: I am confused after reading codes of PostgreSQL three week

Hello hom,

Frankly I am a learner as well. The experts here are almost always ready
to help and would be a better source of information.

Moreover, I am also using Eclipse, but I do not use it for building the
source. I use it only as a source code browser (it's easy in a GUI, isn't
it?). I am trying to learn about the executor, so I can't say much about
the parser. However, I suppose you need to know the rules of the tools
flex and bison to understand the parser. And why are you in scan.c? It
is generated by flex, dear. Read scan.l and gram.y instead. These are
the files responsible for the major work done by the parser.

If you are keen on the parser, go learn lex and yacc (or flex and
bison ... they are almost the same) and then go through the scan.l and
gram.y files. It is actually an _extremely_ tough job to read the
generated files. Once again, do turn off the "Search for duplicate
source files" option. There are no duplicate files in the source tree.

Also, if you are using a copy of the source tree that has already been
built in the workspace, things can be a little different.

@others: Well, I do know that there are a few books on the market
written by the devs, but how much do they help when I have already been
banging my head against the source for the last 5 months?

Regards,
Vaibhav

#13 hom
obsidianhom@gmail.com
In reply to: Vaibhav Kaushal (#12)
Re: I am confused after reading codes of PostgreSQL three week

Thanks Vaibhav.

I traced into scan.c because I want to know how the parse tree is
built, and I was debugging the source step by step.
Then Eclipse picked up scan.l, and the execution order did not
match the code.

Actually, I have no idea which module of the source I should read first.
I have had a quick glance at the source, and I know a little about how a
query executes.
But the modules are so interconnected. I don't know which part I should dig into.

Now I plan to study mmgr in depth. Would that be suitable?

--
Best Wishes!

                                     hom

#14 Martijn van Oosterhout
kleptog@svana.org
In reply to: hom (#13)
Re: I am confused after reading codes of PostgreSQL three week

On Sun, Mar 20, 2011 at 11:50:01AM +0800, hom wrote:

> I traced into scan.c because I want to know how the parse tree is
> built, and I was debugging the source step by step.
> Then Eclipse picked up scan.l, and the execution order did not
> match the code.

Umm, the scanners and parsers produced by flex and bison are huge
table-driven state machines, which makes what is happening in terms of
the "parse tree" extremely difficult to follow.

If you want to follow what's happening, see the following page:

http://dinosaur.compilertools.net/bison/bison_11.html

It explains how to make the parser dump what it's doing. As the page says,
stepping through the processed file reveals little, because it's the
same code being executed over and over again; only the variables
change.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Patriotism is when love of your own people comes first; nationalism,
when hate for people other than your own comes first.
- Charles de Gaulle

#15 Nicolas Barbier
nicolas.barbier@gmail.com
In reply to: hom (#13)
Re: I am confused after reading codes of PostgreSQL three week

2011/3/20 hom <obsidianhom@gmail.com>:

> I traced into scan.c because I want to know how the parse tree is
> built, and I was debugging the source step by step.

I suggest you learn how flex/bison work first. The contents of the *.c
files generated by flex/bison are not generally meant to be read by
humans; rather, you should read their original sources (*.l and *.y).

> Then Eclipse picked up scan.l, and the execution order did not
> match the code.

Eclipse seems to understand that any code corresponding to the
generated .c file actually originates in the .l file, but apparently
fails to match (some of?) the line numbers. OTOH, I cannot really
imagine how it is supposed to match them as long as you are not
executing lines that are literally copied from the .l file (e.g., as
long as the lexer or parser code itself is being executed), so that
may be normal.

Again: Do not try to read the generated .c files, but rather read the
corresponding .l and .y files. The tarballs may include those
generated .c files, but as you will find out when checking out the
repository itself, they are not really considered "source" (i.e., they
are not included). When debugging, skip over the lexer and parser code
itself, just put your breakpoints in the C code in the .l and .y files
(I hope Eclipse might match *those* line numbers at least, and make the
breakpoints work).

Nicolas

#16 hom
obsidianhom@gmail.com
In reply to: Martijn van Oosterhout (#14)
Re: I am confused after reading codes of PostgreSQL three week

Thanks Martijn.
I am trying out lex and yacc on my Linux machine. :)

--
Best Wishes!

                                     hom

#17 hom
obsidianhom@gmail.com
In reply to: Nicolas Barbier (#15)
Re: I am confused after reading codes of PostgreSQL three week

Thanks Nicolas.
I put breakpoints in scan.l, but sometimes they don't work.
But it doesn't matter. I plan to spend more time on mmgr, storage, and access. :)

--
Best Wishes!

                                     hom