On How To Shorten the Steep Learning Curve Towards PG Hacking...

Started by Kang Yuzhe about 9 years ago, 30 messages, hackers


Kang Yuzhe

tiggreen87@gmail.com

about 9 years ago

Dear PG Hackers/Experts,

I am a newbie to PG hacking.
I have been reading the PG code base to find my place in it, but without
success.

There are hundreds of hands-on tutorials on PG application development on
the web. Alas, there are almost none on PG hacking.

I have found reading and hacking the PG source code to be one of the most
frustrating experiences of my life. I believe that PG hacking should not
be a painful journey but an enjoyable one!

It is my strong belief that out of my PG hacking frustrations, there may
come insights for the PG experts on how to devise hands-on guides to PG
internals so that newcomers become great coders as quickly as possible.

I also believe that we should spend our time reading great papers and books
on data management problems, NOT the PG code base.

Here are my suggestions for how the experts could shorten the steep
learning curve towards PG hacking.

1. Prepare hands-on guides to PG internals

For example, a complete hands-on walkthrough of the PG internals behind
the SQL-standard SELECT/INSERT. The point is that the experts could pick one
fairly complex feature and walk it from the parser to the executor in a
hands-on manner, explaining every technical detail step by step.

2. Write a book on PG Internals.

There is one book on PG internals. Unfortunately, it's in Chinese.
Why not in English??
It is my strong belief that if there were a great book on PG internals,
with hands-on coverage of some of the basic features of the PG internals
machinery, PG hacking would be almost as easy as PG application development.

If the experts help newbies understand the PG code base as quickly as
possible, that means more reviewers, more contributors and more users of PG,
which in turn means more PG usability, more PG popularity and a stronger PG
community.

These are my personal feelings, and I am ready to be corrected and advised
on the right way towards PG hacking.

Regards,
Zeray

Michael Paquier

michael@paquier.xyz

about 9 years ago

In reply to: Kang Yuzhe (#1)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

On Mon, Mar 27, 2017 at 9:00 PM, Kang Yuzhe <tiggreen87@gmail.com> wrote:

1. Prepare hands-on guides to PG internals

For example, a complete hands-on walkthrough of the PG internals behind
the SQL-standard SELECT/INSERT. The point is that the experts could pick one
fairly complex feature and walk it from the parser to the executor in a
hands-on manner, explaining every technical detail step by step.

There are resources on the net, in English as well. Take for example
this manual explaining the internals of Postgres by Hironobu Suzuki:
http://www.interdb.jp/pg/
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tsunakawa, Takayuki

tsunakawa.takay@jp.fujitsu.com

about 9 years ago

In reply to: Kang Yuzhe (#1)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kang Yuzhe

1. Prepare hands-on guides to PG internals

For example, a complete hands-on walkthrough of the PG internals behind
the SQL-standard SELECT/INSERT. The point is that the experts could pick one
fairly complex feature and walk it from the parser to the executor in a
hands-on manner, explaining every technical detail step by step.

I sympathize with you. What level of detail do you have in mind? The query processing framework is described in the manual:

Chapter 50. Overview of PostgreSQL Internals
https://www.postgresql.org/docs/devel/static/overview.html

More detailed source code analysis is provided for very old PostgreSQL 7.4, but I guess it's not much different now. The document is in Japanese only:

http://ikubo.x0.com/PostgreSQL/pg_source.htm

Are you thinking of something like this?

MySQL Internals Manual
https://dev.mysql.com/doc/internals/en/

Regards
Takayuki Tsunakawa


Kang Yuzhe

tiggreen87@gmail.com

about 9 years ago

In reply to: Tsunakawa, Takayuki (#3)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

Thanks Tsunakawa for such an informative reply.

Almost all of the docs related to PG internals cover introductory
concepts only.
There is an even more useful PG internals site, "The Internals of
PostgreSQL" at http://www.interdb.jp/pg/, a translation of the Japanese PG
Internals.

The query processing framework described in the manual, as you
mentioned, is informative and introductory in nature.
In theory, the query processing framework described in the manual is
understandable.

Unfortunately, understanding how the query processing framework really
works in the PG codebase is another story.
It has become a difficult task for me to walk through the PG source code
to see, for example, how SELECT/INSERT/TRUNCATE really work across the
different modules under "src/..".

I wish there were something like "Hands-On with PostgreSQL Internals"
https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-postgresql-internals/
for more complex PG features.

For example, the SQL-standard MERGE is not yet supported by PG. I wish there
were a Hands-On with PostgreSQL Internals for MERGE/UPSERT, showing how it
is implemented in the parser/executor/storage etc. modules, with a detailed
explanation of the code, debugging, and other important concepts related to
systems programming.

Regards,
Zeray

On Tue, Mar 28, 2017 at 6:04 AM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:


Adrien Nayrat

adrien.nayrat@dalibo.com

about 9 years ago

In reply to: Kang Yuzhe (#1)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

On 03/27/2017 02:00 PM, Kang Yuzhe wrote:

1. Prepare hands-on guides to PG internals

For example, a complete hands-on walkthrough of the PG internals behind
the SQL-standard SELECT/INSERT. The point is that the experts could pick one
fairly complex feature and walk it from the parser to the executor in a
hands-on manner, explaining every technical detail step by step.

Hi,

Bruce Momjian has made several presentations about Postgres internals:
http://momjian.us/main/presentations/internals.html

Regards
--
Adrien NAYRAT

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

about 9 years ago

In reply to: Kang Yuzhe (#4)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

Hi,

On 2017/03/28 15:40, Kang Yuzhe wrote:

Thanks Tsunakawa for such an informative reply.

Almost all of the docs related to PG internals cover introductory
concepts only.
There is an even more useful PG internals site, "The Internals of
PostgreSQL" at http://www.interdb.jp/pg/, a translation of the Japanese PG
Internals.

The query processing framework described in the manual, as you
mentioned, is informative and introductory in nature.
In theory, the query processing framework described in the manual is
understandable.

Unfortunately, understanding how the query processing framework really
works in the PG codebase is another story.
It has become a difficult task for me to walk through the PG source code
to see, for example, how SELECT/INSERT/TRUNCATE really work across the
different modules under "src/..".

I wish there were something like "Hands-On with PostgreSQL Internals"
https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-postgresql-internals/
for more complex PG features.

For example, the SQL-standard MERGE is not yet supported by PG. I wish there
were a Hands-On with PostgreSQL Internals for MERGE/UPSERT, showing how it
is implemented in the parser/executor/storage etc. modules, with a detailed
explanation of the code, debugging, and other important concepts related to
systems programming.

I am not sure if I can show you that one place where you could learn all
of that, but many people who started with PostgreSQL development at some
point started by exploring the source code itself (either for learning or
to write a feature patch), articles on PostgreSQL wiki, and many related
presentations accessible using the Internet. I liked the following among
many others:

Introduction to Hacking PostgreSQL:
http://www.neilconway.org/talks/hacking/

Inside the PostgreSQL Query Optimizer:
http://www.neilconway.org/talks/optimizer/optimizer.pdf

Postgres Internals Presentations:
http://momjian.us/main/presentations/internals.html

Thanks,
Amit


Craig Ringer

craig@2ndquadrant.com

about 9 years ago

In reply to: Amit Langote (#6)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

On 29 March 2017 at 10:53, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Hi,

On 2017/03/28 15:40, Kang Yuzhe wrote:

Thanks Tsunakawa for such an informative reply.

Almost all of the docs related to PG internals cover introductory
concepts only.
There is an even more useful PG internals site, "The Internals of
PostgreSQL" at http://www.interdb.jp/pg/, a translation of the Japanese PG
Internals.

The query processing framework described in the manual, as you
mentioned, is informative and introductory in nature.
In theory, the query processing framework described in the manual is
understandable.

Unfortunately, understanding how the query processing framework really
works in the PG codebase is another story.
It has become a difficult task for me to walk through the PG source code
to see, for example, how SELECT/INSERT/TRUNCATE really work across the
different modules under "src/..".

I wish there were something like "Hands-On with PostgreSQL Internals"
https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-postgresql-internals/
for more complex PG features.

For example, the SQL-standard MERGE is not yet supported by PG. I wish there
were a Hands-On with PostgreSQL Internals for MERGE/UPSERT, showing how it
is implemented in the parser/executor/storage etc. modules, with a detailed
explanation of the code, debugging, and other important concepts related to
systems programming.

I am not sure if I can show you that one place where you could learn all
of that, but many people who started with PostgreSQL development at some
point started by exploring the source code itself (either for learning or
to write a feature patch), articles on PostgreSQL wiki, and many related
presentations accessible using the Internet. I liked the following among
many others:

Personally I have to agree that the learning curve is very steep. Some
of the docs and presentations help, but there's a LOT to understand.

When you're getting started you're lost in a world of language you
don't know, and trying to understand one piece often gets you lost in
other pieces. In no particular order:

* Memory contexts and palloc
* Managing transactions and how that interacts with memory contexts
and the default memory context
* Snapshots, snapshot push/pop, etc
* LWLocks, memory barriers, spinlocks, latches
* Heavyweight locks (and the different APIs to them)
* GUCs, their scopes, the rules around their callbacks, etc
* dynahash
* catalogs and oids and access methods
* The heap AM like heap_open
* relcache, catcache, syscache
* genam and the systable_ calls and their limitations with indexes
* The SPI
* When to use each of the above 4!
* Heap tuples and minimal tuples
* VARLENA
* GETSTRUCT, when you can/can't use it, other attribute fetching methods
* TOAST and detoasting datums.
* forming and deforming tuples
* LSNs, WAL/xlog generation and redo. Timelines. (ARGH, timelines).
* cache invalidations, when they can happen, and how to do anything
safely around them.
* TIDs, cmin and cmax, xmin and xmax
* postmaster, vacuum, bgwriter, checkpointer, startup process,
walsender, walreceiver, all our auxiliary procs and what they do
* relmapper, relfilenodes vs relation oids, filenode extents
* on-disk structure, page headers, pages
* shmem management, buffers and buffer pins
* bgworkers
* PG_TRY() and PG_CATCH() and their limitations
* elog and ereport and errcontexts, exception unwinding/longjmp and
how it interacts with memory contexts, lwlocks, etc
* The nest of macros around datum manipulation and functions, PL
handlers. How to find the macros for the data types you want to work
with.
* Everything to do with the C API for arrays (is horrible)
* The details of the parse/rewrite/plan phases with rewrite calling
back into parse, paths, the mess with inheritance_planner, reading and
understanding plantrees
* The permissions and grants model and how to interact with it
* PGPROC, PGXACT, other main shmem structures
* Resource owners (which I still don't fully "get")
* Checkpoints, pg_control and ShmemVariableCache, crash recovery
* How globals are used in Pg and how they interact with fork()ing from
postmaster
* SSI (haven't gone there yet myself)
* ....

Personally I recall finding the magic of resource owner and memory
context changing under me when I started/stopped xacts in a bgworker,
along with the need to manage snapshots and SPI state to be distinctly
confusing.

There are various READMEs, blog posts, presentation slides/videos, etc
that explain bits and pieces. But not much exists to tie it together
into a comprehensible whole with simple, minimal explanations for each
part so someone who's new to it all can begin to get a handle on it,
find resources to learn more about subsystems they need to care about,
etc.

Lots of it boils down to "read the code". But so much code! You don't
know if what you're reading is really relevant or if it's even
correct, or if it makes assumptions that differ from your situation.
There are lots of coding rules that aren't necessarily obvious unless
you read the right place, e.g. that you don't need to and shouldn't
LWLockRelease() before elog(ERROR). That SPI doesn't manage snapshots
or xacts for you (but will often silently work anyway!). etc.

I've long intended to start a blog series on postgresql innards
concepts, partly with the intent of turning it into such an overview.
I find that people are better at shouting you down when you're wrong
than they are at writing new material or reviewing proposed docs, so
it's often a good way to fact-check things ;) . Plus it's a good way
to learn. Time is always short though.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


Kang Yuzhe

tiggreen87@gmail.com

about 9 years ago

In reply to: Craig Ringer (#7)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

Thank you all for pointing me to useful docs on PG kernel stuff, as well as
for being sympathetic to me and to this newbie question, which appears to be
valid and interesting but has yet to be addressed by the PG experts.

Last but not least, *Craig Ringer*, you just nailed it!! You also made me
feel and think that my question is worth asking.

Regards,
Zeray

On Wed, Mar 29, 2017 at 6:36 AM, Craig Ringer <craig@2ndquadrant.com> wrote:


Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

about 9 years ago

In reply to: Craig Ringer (#7)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

On 2017/03/29 12:36, Craig Ringer wrote:

On 29 March 2017 at 10:53, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Hi,

On 2017/03/28 15:40, Kang Yuzhe wrote:

Thanks Tsunakawa for such an informative reply.

Almost all of the docs related to PG internals cover introductory
concepts only.
There is an even more useful PG internals site, "The Internals of
PostgreSQL" at http://www.interdb.jp/pg/, a translation of the Japanese PG
Internals.

The query processing framework described in the manual, as you
mentioned, is informative and introductory in nature.
In theory, the query processing framework described in the manual is
understandable.

Unfortunately, understanding how the query processing framework really
works in the PG codebase is another story.
It has become a difficult task for me to walk through the PG source code
to see, for example, how SELECT/INSERT/TRUNCATE really work across the
different modules under "src/..".

I wish there were something like "Hands-On with PostgreSQL Internals"
https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-postgresql-internals/
for more complex PG features.

For example, the SQL-standard MERGE is not yet supported by PG. I wish there
were a Hands-On with PostgreSQL Internals for MERGE/UPSERT, showing how it
is implemented in the parser/executor/storage etc. modules, with a detailed
explanation of the code, debugging, and other important concepts related to
systems programming.

I am not sure if I can show you that one place where you could learn all
of that, but many people who started with PostgreSQL development at some
point started by exploring the source code itself (either for learning or
to write a feature patch), articles on PostgreSQL wiki, and many related
presentations accessible using the Internet. I liked the following among
many others:

Personally I have to agree that the learning curve is very steep. Some
of the docs and presentations help, but there's a LOT to understand.

I agree too. :)

When you're getting started you're lost in a world of language you
don't know, and trying to understand one piece often gets you lost in
other pieces. In no particular order:

* Memory contexts and palloc
* Managing transactions and how that interacts with memory contexts
and the default memory context
* Snapshots, snapshot push/pop, etc
* LWLocks, memory barriers, spinlocks, latches
* Heavyweight locks (and the different APIs to them)
* GUCs, their scopes, the rules around their callbacks, etc
* dynahash
* catalogs and oids and access methods
* The heap AM like heap_open
* relcache, catcache, syscache
* genam and the systable_ calls and their limitations with indexes
* The SPI
* When to use each of the above 4!
* Heap tuples and minimal tuples
* VARLENA
* GETSTRUCT, when you can/can't use it, other attribute fetching methods
* TOAST and detoasting datums.
* forming and deforming tuples
* LSNs, WAL/xlog generation and redo. Timelines. (ARGH, timelines).
* cache invalidations, when they can happen, and how to do anything
safely around them.
* TIDs, cmin and cmax, xmin and xmax
* postmaster, vacuum, bgwriter, checkpointer, startup process,
walsender, walreceiver, all our auxiliary procs and what they do
* relmapper, relfilenodes vs relation oids, filenode extents
* on-disk structure, page headers, pages
* shmem management, buffers and buffer pins
* bgworkers
* PG_TRY() and PG_CATCH() and their limitations
* elog and ereport and errcontexts, exception unwinding/longjmp and
how it interacts with memory contexts, lwlocks, etc
* The nest of macros around datum manipulation and functions, PL
handlers. How to find the macros for the data types you want to work
with.
* Everything to do with the C API for arrays (is horrible)
* The details of the parse/rewrite/plan phases with rewrite calling
back into parse, paths, the mess with inheritance_planner, reading and
understanding plantrees
* The permissions and grants model and how to interact with it
* PGPROC, PGXACT, other main shmem structures
* Resource owners (which I still don't fully "get")
* Checkpoints, pg_control and ShmemVariableCache, crash recovery
* How globals are used in Pg and how they interact with fork()ing from
postmaster
* SSI (haven't gone there yet myself)
* ....

That is indeed a big list of things to know and (have to) worry about. If
we indeed come up with a PG-hackers-handbook someday, things in your list
could be organized such that it's clear to someone wanting to contribute
code which of those things they need to *absolutely* worry about and which
they don't.

Personally I recall finding the magic of resource owner and memory
context changing under me when I started/stopped xacts in a bgworker,
along with the need to manage snapshots and SPI state to be distinctly
confusing.

There are various READMEs, blog posts, presentation slides/videos, etc
that explain bits and pieces. But not much exists to tie it together
into a comprehensible whole with simple, minimal explanations for each
part so someone who's new to it all can begin to get a handle on it,
find resources to learn more about subsystems they need to care about,
etc.

Lots of it boils down to "read the code". But so much code! You don't
know if what you're reading is really relevant or if it's even
correct, or if it makes assumptions that differ from your situation.
There are lots of coding rules that aren't necessarily obvious unless
you read the right place, e.g. that you don't need to and shouldn't
LWLockRelease() before elog(ERROR). That SPI doesn't manage snapshots
or xacts for you (but will often silently work anyway!). etc.

I've long intended to start a blog series on postgresql innards
concepts, partly with the intent of turning it into such an overview.
I find that people are better at shouting you down when you're wrong
than they are at writing new material or reviewing proposed docs, so
it's often a good way to fact-check things ;) . Plus it's a good way
to learn. Time is always short though.

Agreed on all counts. Look forward to the blog. :)

Thanks,
Amit



Kang Yuzhe

tiggreen87@gmail.com

about 9 years ago

In reply to: Amit Langote (#9)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

Thanks, Amit, for further confirming Craig's intention.

I am looking forward to seeing your "PG internal machinery under the
microscope" blog. May health, persistence and courage be with YOU.

Regards,
Zeray

On Wed, Mar 29, 2017 at 10:36 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:

Show quoted text

On 2017/03/29 12:36, Craig Ringer wrote:

On 29 March 2017 at 10:53, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>

wrote:

Hi,

On 2017/03/28 15:40, Kang Yuzhe wrote:

Thanks Tsunakawa for such an informative reply.

Almost all of the docs related to the internals of PG are of

introductory

concepts only.
There is even more useful PG internals site entitled "The Internals of
PostgreSQL" in http://www.interdb.jp/pg/ translation of the Japanese

PG

Internals.

The query processing framework that is described in the manual as you
mentioned is of informative and introductory nature.
In theory, the query processing framework described in the manual is
understandable.

Unfortunate, it is another story to understand how query processing
framework in PG codebase really works.
It has become a difficult task for me to walk through the PG source

code

for example how SELECT/INSERT/TRUNCATE in the the different modules

under

"src/..". really works.

I wish there were Hands-On with PostgreSQL Internals like
https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-

postgresql-internals/

for more complex PG features.

For example, MERGE SQL standard is not supported yet by PG. I wish

there

were Hands-On with PostgreSQL Internals for MERGE/UPSERT. How it is
implemented in parser/executor/storage etc. modules with detailed
explanation for each code and debugging and other important concepts
related to system programming.

I am not sure if I can show you that one place where you could learn all
of that, but many people who started with PostgreSQL development at some
point started by exploring the source code itself (either for learning

or

to write a feature patch), articles on PostgreSQL wiki, and many related
presentations accessible using the Internet. I liked the following among
many others:

Personally I have to agree that the learning curve is very steep. Some
of the docs and presentations help, but there's a LOT to understand.

I agree too. :)

When you're getting started you're lost in a world of language you
don't know, and trying to understand one piece often gets you lost in
other pieces. In no particular order:

* Memory contexts and palloc
* Managing transactions and how that interacts with memory contexts
and the default memory context
* Snapshots, snapshot push/pop, etc
* LWLocks, memory barriers, spinlocks, latches
* Heavyweight locks (and the different APIs to them)
* GUCs, their scopes, the rules around their callbacks, etc
* dynahash
* catalogs and oids and access methods
* The heap AM like heap_open
* relcache, catcache, syscache
* genam and the systable_ calls and their limitations with indexes
* The SPI
* When to use each of the above 4!
* Heap tuples and minimal tuples
* VARLENA
* GETSTRUCT, when you can/can't use it, other attribute fetching methods
* TOAST and detoasting datums.
* forming and deforming tuples
* LSNs, WAL/xlog generation and redo. Timelines. (ARGH, timelines).
* cache invalidations, when they can happen, and how to do anything
safely around them.
* TIDs, cmin and cmax, xmin and xmax
* postmaster, vacuum, bgwriter, checkpointer, startup process,
walsender, walreceiver, all our auxiliary procs and what they do
* relmapper, relfilenodes vs relation oids, filenode extents
* ondisk structure, page headers, pages
* shmem management, buffers and buffer pins
* bgworkers
* PG_TRY() and PG_CATCH() and their limitations
* elog and ereport and errcontexts, exception unwinding/longjmp and
how it interacts with memory contexts, lwlocks, etc
* The nest of macros around datum manipulation and functions, PL
handlers. How to find the macros for the data types you want to work
with.
* Everything to do with the C API for arrays (is horrible)
* The details of the parse/rewrite/plan phases with rewrite calling
back into parse, paths, the mess with inheritance_planner, reading and
understanding plantrees
* The permissions and grants model and how to interact with it
* PGPROC, PGXACT, other main shmem structures
* Resource owners (which I still don't fully "get")
* Checkpoints, pg_control and ShmemVariableCache, crash recovery
* How globals are used in Pg and how they interact with fork()ing from
postmaster
* SSI (haven't gone there yet myself)
* ....
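For the first bullets: palloc() allocates in the current memory context, and resetting or deleting a context frees everything allocated in it at once, so much backend code never calls pfree() explicitly (src/backend/utils/mmgr/README explains the real design). The C sketch below is a deliberately simplified toy of that ownership idea, assuming a flat list of malloc()ed chunks; ToyContext, toy_alloc, toy_switch_to and toy_reset are hypothetical names, and PostgreSQL's actual allocator (AllocSet, with blocks, free lists and a tree of contexts) is considerably more involved.

```c
#include <stddef.h>
#include <stdlib.h>

/* Each allocation is a chunk header linked into its owning context;
 * the caller's memory follows the header. */
typedef struct ToyChunk {
    struct ToyChunk *next;
} ToyChunk;

typedef struct ToyContext {
    const char *name;
    ToyChunk *chunks;   /* every chunk allocated in this context */
    size_t nchunks;
} ToyContext;

/* The "current" context, similar in spirit to CurrentMemoryContext. */
static ToyContext *current_ctx = NULL;

/* Make ctx current and return the previous one so the caller can
 * restore it later, as code does with MemoryContextSwitchTo(). */
static ToyContext *toy_switch_to(ToyContext *ctx) {
    ToyContext *old = current_ctx;
    current_ctx = ctx;
    return old;
}

/* palloc-like: allocate in the current context. Real palloc raises an
 * ERROR on out-of-memory; this toy simply aborts. User memory is only
 * pointer-aligned here, which a real allocator must do better. */
static void *toy_alloc(size_t size) {
    ToyChunk *c = malloc(sizeof(ToyChunk) + size);
    if (c == NULL)
        abort();
    c->next = current_ctx->chunks;
    current_ctx->chunks = c;
    current_ctx->nchunks++;
    return c + 1;
}

/* Free everything allocated in ctx in one go: no per-object free. */
static void toy_reset(ToyContext *ctx) {
    ToyChunk *c = ctx->chunks;
    while (c != NULL) {
        ToyChunk *next = c->next;
        free(c);
        c = next;
    }
    ctx->chunks = NULL;
    ctx->nchunks = 0;
}
```

The pattern mirrors the common "switch to a short-lived context, allocate freely, switch back, reset" idiom: keep the value returned by toy_switch_to(), restore it when done, then reset the short-lived context to release all of its allocations together.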

That is indeed a big list of things to know and (have to) worry about. If
we indeed come up with a PG-hackers-handbook someday, things in your list
could be organized such that it's clear to someone wanting to contribute
code which of those things they need to *absolutely* worry about and which
they don't.

Personally, I recall finding it distinctly confusing when the resource
owner and memory context magically changed under me as I started/stopped
xacts in a bgworker, along with the need to manage snapshots and SPI
state.

There are various READMEs, blog posts, presentation slides/videos, etc
that explain bits and pieces. But not much exists to tie it together
into a comprehensible whole with simple, minimal explanations for each
part so someone who's new to it all can begin to get a handle on it,
find resources to learn more about subsystems they need to care about,
etc.

Lots of it boils down to "read the code". But so much code! You don't
know if what you're reading is really relevant or if it's even
correct, or if it makes assumptions that differ from your situation.
There are lots of coding rules that aren't necessarily obvious unless
you read the right place, e.g. that you don't need to and shouldn't
LWLockRelease() before elog(ERROR). That SPI doesn't manage snapshots
or xacts for you (but will often silently work anyway!). etc.

I've long intended to start a blog series on postgresql innards
concepts, partly with the intent of turning it into such an overview.
I find that people are better at shouting you down when you're wrong
than they are at writing new material or reviewing proposed docs, so
it's often a good way to fact-check things ;) . Plus it's a good way
to learn. Time is always short though.

Agreed on all counts. Look forward to the blog. :)

Thanks,
Amit

#11

Kevin Grittner

Kevin.Grittner@wicourts.gov

about 9 years ago

In reply to: Craig Ringer (#7)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

On Tue, Mar 28, 2017 at 10:36 PM, Craig Ringer <craig@2ndquadrant.com> wrote:

Personally I have to agree that the learning curve is very steep. Some
of the docs and presentations help, but there's a LOT to understand.

Some small patches can be kept to a fairly narrow set of areas, and
if you can find a similar capability from which to crib technique for
handling some of the more mysterious areas it might brush up
against. When I started working on my first *big* patch that was
bound to touch many areas (around the start of development for 9.1)
I counted lines of code and found over a million lines just in .c
and .h files. We're now closing in on 1.5 million lines. That's
not counting over 376,000 lines of documentation in .sgml files,
over 12,000 lines of text in README* files, over 26,000 lines of
perl code, over 103,000 lines of .sql code (60% of which is in
regression tests), over 38,000 lines of .y code (for flex/bison
parsing), about 9,000 lines of various types of code just for
generating the configure file, and over 439,000 lines of .po files
(for message translations). I'm sure I missed a lot of important
stuff there, but it gives some idea of the challenge it is to get your
head around it all.

My first advice is to try to identify which areas of the code you
will need to touch, and read those over. Several times. Try to
infer the API to areas *that* code needs to reference from looking
at other code (as similar to what you want to work on as you can
find), reading code comments and README files, and asking
questions. Secondly, there is a lot that is considered to be
"coding rules" that is, as far as I've been able to tell, only
contained inside the heads of veteran PostgreSQL coders, with
occasional references in the discussion list archives. Asking
questions, proposing approaches before coding, and showing work in
progress early and often will help a lot in terms of discovering
these issues and allowing you to rearrange things to fit these
conventions. If someone with the "gift of gab" is able to capture
these and put them into a readily available form, that would be
fantastic.

* SSI (haven't gone there yet myself)

For anyone wanting to approach this area, there is a fair amount to
look at. There is some overlap, but in rough order of "practical"
to "theoretical foundation", you might want to look at:

https://www.postgresql.org/docs/current/static/transaction-iso.html

https://wiki.postgresql.org/wiki/SSI

The SQL standard

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/storage/lmgr/README-SSI;hb=refs/heads/master

http://www.vldb.org/pvldb/vol5.html

http://hdl.handle.net/2123/5353

Papers cited in these last two. I have found papers authored by
Alan Fekete or Atul Adya particularly enlightening.

If any of the other areas that Craig listed have similar work
available, maybe we should start a Wiki page where we list areas of
code (starting with the list Craig included) as section headers, and
put links to useful reading below each?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Simon Riggs

simon@2ndQuadrant.com

about 9 years ago

In reply to: Kang Yuzhe (#1)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

On 27 March 2017 at 13:00, Kang Yuzhe <tiggreen87@gmail.com> wrote:

I have found PG source code reading and hacking to be one of the most
frustrating experiences in my life. I believe that PG hacking should not be
a painful journey but an enjoyable one!

It is my strong belief that out of my PG hacking frustrations, there may
come insights for the PG experts on ways to devise hands-on material on PG
internals so that newcomers can become great coders as quickly as possible.

I'm here now because PostgreSQL has clear, well designed and
maintained code with accurate docs, great comments and a helpful team.

I'd love to see detailed cases where another project is better in a
measurable way; I am willing to learn from that.

Any journey to expertise takes 10,000 hours. There is no way to shorten that.

What aspect of your journey caused you pain?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#13

Alvaro Herrera

alvherre@2ndquadrant.com

about 9 years ago

In reply to: Craig Ringer (#7)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

Craig Ringer wrote:

Personally I have to agree that the learning curve is very steep. Some
of the docs and presentations help, but there's a LOT to understand.

There is a wiki page "Developer_FAQ" which is supposed to help answer
these questions. It is currently not very useful, because people
stopped adding to it very early and it is now mostly unmaintained, but
I'm sure it could become a very useful central resource for this kind of
information.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14

Kang Yuzhe

tiggreen87@gmail.com

about 9 years ago

In reply to: Alvaro Herrera (#13)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

Thanks, Alvaro, for taking your time and pointing me to "Developer_FAQ". I
knew this web page, and there is good stuff in it.
The most important thing about "Developer_FAQ", I believe, is that it lists
vital books for PG developers.

Compared with the real challenge I am facing in finding my way down the
rabbit hole (the PG source code), "Developer_FAQ" is indeed less useful.

Of course, I am just beginning, and I hope that one day, with your support,
I will find my space in PG development.

My question is: why is there so much hands-on material about PG application
development (e.g. connecting to PG using Java/JDBC) but almost no hands-on
lessons about PG hacking? For example, I want to add the keyword
"Encrypted" in CREATE TABLE t1(a int, b int encrypted) or
CREATE TABLE t1(a int, b int) encrypted. Alas, it's not an easy task.
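One concrete starting point for such an exercise: SQL keywords are listed in src/include/parser/kwlist.h, and the grammar itself lives in src/backend/parser/gram.y, which is where new CREATE TABLE syntax would go (ENCRYPTED already appears as an unreserved keyword, used by CREATE ROLE ... ENCRYPTED PASSWORD). kwlist.h has to be kept in alphabetical order because, in the 9.6-era sources, the scanner finds keywords by binary search. The standalone C sketch below models only that sorted-table lookup with the C library's bsearch(); the tiny keyword table and the names toy_keywords and toy_keyword_lookup are illustrative assumptions, not PostgreSQL's ScanKeywordLookup().

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Lower-case keyword table. It MUST stay sorted, or bsearch() can
 * silently fail to find entries. (Illustrative subset only.) */
static const char *const toy_keywords[] = {
    "create", "encrypted", "int", "table",
};
#define TOY_NUM_KEYWORDS (sizeof(toy_keywords) / sizeof(toy_keywords[0]))

/* bsearch() comparator: key is a C string, elem points at a table slot. */
static int toy_keyword_cmp(const void *key, const void *elem) {
    return strcmp((const char *) key, *(const char *const *) elem);
}

/* Return the keyword's index, or -1 if word is an ordinary identifier.
 * The word is case-folded first, since SQL keywords are case-insensitive. */
static int toy_keyword_lookup(const char *word) {
    char lower[64];
    size_t len = strlen(word);
    if (len >= sizeof(lower))
        return -1;                  /* longer than any keyword */
    for (size_t i = 0; i <= len; i++)   /* also copies the '\0' */
        lower[i] = (char) tolower((unsigned char) word[i]);
    const char *const *hit = bsearch(lower, toy_keywords, TOY_NUM_KEYWORDS,
                                     sizeof(toy_keywords[0]), toy_keyword_cmp);
    return hit != NULL ? (int) (hit - toy_keywords) : -1;
}
```

Here toy_keyword_lookup("ENCRYPTED") finds the entry regardless of case, while an identifier such as t1 returns -1. An entry inserted out of order could be missed by the binary search, which is why the ordering rule matters when adding a keyword.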

Regards,
Zeray

On Mon, Apr 17, 2017 at 8:29 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:


Craig Ringer wrote:

Personally I have to agree that the learning curve is very steep. Some
of the docs and presentations help, but there's a LOT to understand.

There is a wiki page "Developer_FAQ" which is supposed to help answer
these questions. It is currently not very useful, because people
stopped adding to it very early and it is now mostly unmaintained, but
I'm sure it could become a very useful central resource for this kind of
information.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#15

Kang Yuzhe

tiggreen87@gmail.com

about 9 years ago

In reply to: Kevin Grittner (#11)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

Thanks, Kevin, for taking your time and explaining the real difficulty of
finding one's space/way in PG development. And thanks for your genuine
advice, which I have taken AS IS.
My question is: why is there so much hands-on material about PG application
development (e.g. connecting to PG using Java/JDBC) but almost no hands-on
lessons about PG hacking? For example, I want to add the keyword
"Encrypted" in "CREATE TABLE t1(a int, b int encrypted)" or
"CREATE TABLE t1(a int, b int) encrypted". Alas, it's not an easy task.

Lastly, I have come to understand that the PG community is not harsh to
newbies, and thus I am feeling at home.

Regards,
Zeray

On Mon, Apr 17, 2017 at 6:53 PM, Kevin Grittner <kgrittn@gmail.com> wrote:


On Tue, Mar 28, 2017 at 10:36 PM, Craig Ringer <craig@2ndquadrant.com>
wrote:

Personally I have to agree that the learning curve is very steep. Some
of the docs and presentations help, but there's a LOT to understand.

Some small patches can be kept to a fairly narrow set of areas, and
if you can find a similar capability from which to crib technique for
handling some of the more mysterious areas it might brush up
against. When I started working on my first *big* patch that was
bound to touch many areas (around the start of development for 9.1)
I counted lines of code and found over a million lines just in .c
and .h files. We're now closing in on 1.5 million lines. That's
not counting over 376,000 lines of documentation in .sgml files,
over 12,000 lines of text in README* files, over 26,000 lines of
perl code, over 103,000 lines of .sql code (60% of which is in
regression tests), over 38,000 lines of .y code (for flex/bison
parsing), about 9,000 lines of various types of code just for
generating the configure file, and over 439,000 lines of .po files
(for message translations). I'm sure I missed a lot of important
stuff there, but it gives some idea of the challenge it is to get your
head around it all.

My first advice is to try to identify which areas of the code you
will need to touch, and read those over. Several times. Try to
infer the API to areas *that* code needs to reference from looking
at other code (as similar to what you want to work on as you can
find), reading code comments and README files, and asking
questions. Secondly, there is a lot that is considered to be
"coding rules" that is, as far as I've been able to tell, only
contained inside the heads of veteran PostgreSQL coders, with
occasional references in the discussion list archives. Asking
questions, proposing approaches before coding, and showing work in
progress early and often will help a lot in terms of discovering
these issues and allowing you to rearrange things to fit these
conventions. If someone with the "gift of gab" is able to capture
these and put them into a readily available form, that would be
fantastic.

* SSI (haven't gone there yet myself)

For anyone wanting to approach this area, there is a fair amount to
look at. There is some overlap, but in rough order of "practical"
to "theoretical foundation", you might want to look at:

https://www.postgresql.org/docs/current/static/transaction-iso.html

https://wiki.postgresql.org/wiki/SSI

The SQL standard

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob_plain;f=src/backend/storage/lmgr/README-SSI;hb=refs/heads/master

http://www.vldb.org/pvldb/vol5.html

http://hdl.handle.net/2123/5353

Papers cited in these last two. I have found papers authored by
Alan Fekete or Atul Adya particularly enlightening.

If any of the other areas that Craig listed have similar work
available, maybe we should start a Wiki page where we list areas of
code (starting with the list Craig included) as section headers, and
put links to useful reading below each?

--
Kevin Grittner
VMware vCenter Server
https://www.vmware.com/

#16

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

about 9 years ago

In reply to: Kang Yuzhe (#14)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

On 2017/04/18 15:31, Kang Yuzhe wrote:

My question is: why is there so much hands-on material about PG application
development (e.g. connecting to PG using Java/JDBC) but almost no hands-on
lessons about PG hacking? For example, I want to add the keyword
"Encrypted" in CREATE TABLE t1(a int, b int encrypted) or
CREATE TABLE t1(a int, b int) encrypted. Alas, it's not an easy task.

Regarding this part, at one of the links shared above [1], you can find
presentations with hands-on instructions about how to implement a new SQL
functionality by modifying various parts of the source code. See these:

Implementing a TABLESAMPLE clause (by Neil Conway)
http://www.neilconway.org/talks/hacking/ottawa/ottawa_slides.pdf

Add support for the WHEN clause to the CREATE TRIGGER statement (by Neil
Conway)
http://www.neilconway.org/talks/hacking/hack_slides.pdf

(by Gavin Sherry)
https://linux.org.au/conf/2007/att_data/Miniconfs(2f)PostgreSQL/attachments/hacking_intro.pdf

Handout: The Implementation of TABLESAMPLE
http://www.neilconway.org/talks/hacking/ottawa/ottawa_handout.pdf

Handout: Adding WHEN clause to CREATE TRIGGER
http://www.neilconway.org/talks/hacking/hack_handout.pdf

Some of the details might be dated, because they were written more than 10
years ago, but they will definitely get you motivated to dive further into
the source code.

Thanks,
Amit

[1]: http://www.neilconway.org/talks/hacking/

#17

Kang Yuzhe

tiggreen87@gmail.com

about 9 years ago

In reply to: Simon Riggs (#12)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

Thanks, Simon, for taking your time to warn me about the harsh
truth: there is no shortcut to expertise. One has to fail and rise
on any journey to expertise.
Overall, you are right. But I do believe that there is a way (some
techniques) to speed up any journey to expertise. One of them is
mentorship. For example (just an example), if you show me how to design and
implement an FDW for Hadoop/HBase, I believe that I will manage to design and
implement an FDW for Cassandra/MongoDB.

The path to expertise when working alone, the hard way, is completely
different from the path when working with you as a mentor. I believe
that we humans have the power to imitate first and innovate afterwards.

There are many books on PG business application development:
1. PostgreSQL Essential Reference / Barry Stinson
2. PostgreSQL: Introduction and Concepts / Bruce Momjian
3. PostgreSQL Cookbook / Over 90 hands-on recipes to effectively manage,
administer, and design solutions using PostgreSQL
4. PostgreSQL Developer's Handbook
5. PostgreSQL 9.0 High Performance
6. PostgreSQL Server Programming
7. PostgreSQL for Data Architects / Discover how to design, develop, and
maintain your database application effectively with PostgreSQL
8. Practical PostgreSQL
9. The Practical SQL Handbook: Using SQL Variants, Fourth Edition
10. PostgreSQL: The comprehensive guide to building, programming, and
administering PostgreSQL databases, Second Edition
11. Beginning Databases with PostgreSQL: From Novice to Professional, Second
Edition
12. PostgreSQL Succinctly
13. PostgreSQL: Up and Running
....

But almost nothing about the internals of PostgreSQL:
1. The Internals of PostgreSQL:
http://www.interdb.jp/pg/index.html (translated from a Japanese book)
2. PostgreSQL数据库内核分析 ("PostgreSQL Database Kernel Analysis"), a book
in Chinese on the internals of PostgreSQL
3. PG docs/site
4. some material here and there, which is less useful

Lastly, I have come to understand that the PG community is not
harsh/intimidating to newbies, and thus I am feeling at home.

Regards,
Zeray

On Mon, Apr 17, 2017 at 7:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:


On 27 March 2017 at 13:00, Kang Yuzhe <tiggreen87@gmail.com> wrote:

I have found PG source code reading and hacking to be one of the most
frustrating experiences in my life. I believe that PG hacking should not be
a painful journey but an enjoyable one!

It is my strong belief that out of my PG hacking frustrations, there may
come insights for the PG experts on ways to devise hands-on material on PG
internals so that newcomers can become great coders as quickly as possible.

I'm here now because PostgreSQL has clear, well designed and
maintained code with accurate docs, great comments and a helpful team.

I'd love to see detailed cases where another project is better in a
measurable way; I am willing to learn from that.

Any journey to expertise takes 10,000 hours. There is no way to shorten that.

What aspect of your journey caused you pain?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#18

Craig Ringer

craig@2ndquadrant.com

about 9 years ago

In reply to: Alvaro Herrera (#13)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

On 18 April 2017 at 01:29, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Craig Ringer wrote:

Personally I have to agree that the learning curve is very steep. Some
of the docs and presentations help, but there's a LOT to understand.

There is a wiki page "Developer_FAQ" which is supposed to help answer
these questions. It is currently not very useful, because people
stopped adding to it very early and it is now mostly unmaintained, but
I'm sure it could become a very useful central resource for this kind of
information.

I add to it when I think of things.

But it'll become an unmaintainable useless morass if random things are
just indiscriminately added. Something more structured is needed to
cover subsystems, coding rules ("don't LWLockRelease() before
ereport(ERROR, ...)"), etc.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#19

Craig Ringer

craig@2ndquadrant.com

about 9 years ago

In reply to: Kang Yuzhe (#17)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

On 18 April 2017 at 15:41, Kang Yuzhe <tiggreen87@gmail.com> wrote:

Thanks, Simon, for taking your time to warn me about the harsh
truth: there is no shortcut to expertise. One has to fail and rise
on any journey to expertise.

Yeah, just because Pg is hard doesn't mean it's notably bad or worse
than other things. I generally find working on code in other projects,
even smaller and simpler ones, a rather unpleasant change.

That doesn't mean we can't do things to help interested new people get
and stay engaged and grow into productive devs to grow our pool.

Overall, you are right. But I do believe that there is a way (some
techniques) to speed up any journey to expertise. One of them is mentorship.
For example (just an example), if you show me how to design and implement an
FDW for Hadoop/HBase, I believe that I will manage to design and implement an
FDW for Cassandra/MongoDB.

TBH, that's the sort of thing where looking at existing examples is
often the best way forward and will stay that way.

What I'd like to do is make it easier to understand those examples by
providing background and overview info on subsystems, so you can read
the code and have more idea what it does and why.
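As one example of that kind of background: a foreign data wrapper's handler function returns an FdwRoutine, a struct of callbacks that the planner and executor invoke at fixed points (for scans, BeginForeignScan, IterateForeignScan, ReScanForeignScan and EndForeignScan, plus planning callbacks such as GetForeignRelSize, GetForeignPaths and GetForeignPlan); contrib's file_fdw and postgres_fdw are the usual examples to read. The C sketch below is a toy model of just the scan-side callback pattern over an in-memory array; ToyFdwRoutine, toy_array_fdw_handler and the other names are hypothetical, not the real API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Per-scan state the wrapper keeps between callback invocations. */
typedef struct {
    const int *remote_rows;   /* stand-in for a remote result set */
    size_t nrows, pos;
} ToyScanState;

/* Callback table, loosely modelled on the scan part of FdwRoutine. */
typedef struct {
    ToyScanState *(*begin_scan)(const int *rows, size_t nrows);
    bool (*iterate_scan)(ToyScanState *state, int *out);
    void (*end_scan)(ToyScanState *state);
} ToyFdwRoutine;

static ToyScanState *array_begin(const int *rows, size_t nrows) {
    ToyScanState *s = malloc(sizeof *s);
    if (s == NULL)
        abort();
    s->remote_rows = rows;
    s->nrows = nrows;
    s->pos = 0;
    return s;
}

/* Fetch one row; false means no more rows, so the scan is done. */
static bool array_iterate(ToyScanState *s, int *out) {
    if (s->pos >= s->nrows)
        return false;
    *out = s->remote_rows[s->pos++];
    return true;
}

static void array_end(ToyScanState *s) { free(s); }

/* The "handler": what this wrapper exposes to the core system. */
static const ToyFdwRoutine *toy_array_fdw_handler(void) {
    static const ToyFdwRoutine routine = { array_begin, array_iterate, array_end };
    return &routine;
}

/* Core-side driver: it only talks to the wrapper through the table,
 * so another wrapper just supplies different function pointers. */
static long toy_sum_foreign_table(const ToyFdwRoutine *fdw,
                                  const int *rows, size_t n) {
    ToyScanState *s = fdw->begin_scan(rows, n);
    long sum = 0;
    int v;
    while (fdw->iterate_scan(s, &v))
        sum += v;
    fdw->end_scan(s);
    return sum;
}
```

With rows {1, 2, 3, 4}, toy_sum_foreign_table(toy_array_fdw_handler(), rows, 4) returns 10. Because the core side only calls through the table, a wrapper for another data source mostly means supplying different functions, which is why reading one existing wrapper end to end carries over well to writing a new one.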

But almost nothing about The Internals of PostgreSQL:

Not surprising. They'd go out of date fast, be a huge effort to write
and maintain, and sell poorly given the small audience.

Print books probably aren't the way forward here.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#20

Kang Yuzhe

tiggreen87@gmail.com

about 9 years ago

In reply to: Amit Langote (#6)

Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...

Thanks Amit for taking your time and pointing to some useful stuff on the
Internals of PostgreSQL.

One thing I have learned is that the PG community is not as hostile/harsh
to newbies as I imagined. Rather, it's the reverse.
I am feeling at home here.

Amit, would you please help me with how to apply some patches to the PG
source code? For example, there are two patches attached here: one for the
CORRESPONDING clause and one for the SQL-standard MERGE.

There are some errors saying a hunk failed (src/backend/parser/gram.y.rej).

postgresql-9.6.2$ patch --dry-run -p1 < corresponding_clause_v12.patch
patching file doc/src/sgml/queries.sgml
Hunk #1 succeeded at 1603 (offset 2 lines).
Hunk #2 succeeded at 1622 (offset 2 lines).
Hunk #3 succeeded at 1664 (offset 2 lines).
patching file doc/src/sgml/sql.sgml
patching file src/backend/nodes/copyfuncs.c
Hunk #1 succeeded at 2807 (offset -188 lines).
Hunk #2 succeeded at 2823 (offset -188 lines).
Hunk #3 succeeded at 4251 (offset -340 lines).
patching file src/backend/nodes/equalfuncs.c
Hunk #1 succeeded at 995 (offset -55 lines).
Hunk #2 succeeded at 1009 (offset -55 lines).
Hunk #3 succeeded at 2708 (offset -230 lines).
patching file src/backend/nodes/nodeFuncs.c
Hunk #1 succeeded at 3384 (offset -60 lines).
patching file src/backend/nodes/outfuncs.c
Hunk #1 succeeded at 2500 (offset -164 lines).
Hunk #2 succeeded at 2793 (offset -179 lines).
Hunk #3 succeeded at 2967 (offset -184 lines).
patching file src/backend/nodes/readfuncs.c
Hunk #1 succeeded at 414 (offset -2 lines).
patching file src/backend/nodes/value.c
patching file src/backend/optimizer/prep/prepunion.c
Hunk #1 succeeded at 92 (offset 1 line).
Hunk #2 succeeded at 112 (offset 1 line).
Hunk #3 succeeded at 190 (offset 1 line).
Hunk #4 succeeded at 273 (offset 1 line).
Hunk #5 succeeded at 339 (offset 1 line).
Hunk #6 succeeded at 445 (offset 1 line).
Hunk #7 succeeded at 1057 (offset 1 line).
Hunk #8 succeeded at 1080 (offset 1 line).
Hunk #9 succeeded at 2190 (offset -13 lines).
patching file src/backend/parser/analyze.c
Hunk #1 succeeded at 75 (offset -1 lines).
Hunk #2 succeeded at 1600 (offset -61 lines).
Hunk #3 succeeded at 1882 (offset -69 lines).
Hunk #4 succeeded at 1892 (offset -69 lines).
Hunk #5 succeeded at 1994 (offset -69 lines).
patching file src/backend/parser/gram.y
Hunk #1 succeeded at 158 (offset -8 lines).
Hunk #2 FAILED at 394.
Hunk #3 succeeded at 573 with fuzz 2 (offset -41 lines).
Hunk #4 succeeded at 3328 (offset -251 lines).
Hunk #5 succeeded at 10182 (offset -699 lines).
Hunk #6 succeeded at 13470 (offset -771 lines).
Hunk #7 succeeded at 13784 (offset -773 lines).
Hunk #8 succeeded at 14581 (offset -811 lines).
Hunk #9 succeeded at 14589 (offset -811 lines).
1 out of 9 hunks FAILED -- saving rejects to file src/backend/parser/gram.y.rej
patching file src/backend/parser/parse_type.c
Hunk #1 succeeded at 736 (offset 1 line).
patching file src/backend/utils/adt/ruleutils.c
Hunk #1 succeeded at 5166 (offset -276 lines).
patching file src/include/nodes/parsenodes.h
Hunk #1 succeeded at 1285 (offset -175 lines).
Hunk #2 succeeded at 1321 (offset -175 lines).
Hunk #3 succeeded at 1350 (offset -175 lines).
patching file src/include/nodes/value.h
patching file src/include/parser/kwlist.h
Hunk #1 succeeded at 95 (offset -2 lines).
patching file src/test/regress/expected/create_view.out
Hunk #1 succeeded at 1571 (offset -55 lines).
patching file src/test/regress/expected/rules.out
Hunk #1 succeeded at 2260 (offset -85 lines).
patching file src/test/regress/expected/union.out
Hunk #1 succeeded at 59 with fuzz 2.
Hunk #3 succeeded at 479 (offset -1 lines).
Hunk #4 succeeded at 609 (offset -1 lines).
Hunk #5 succeeded at 684 (offset -1 lines).
Hunk #6 succeeded at 785 with fuzz 1 (offset -1 lines).
Hunk #7 succeeded at 838 (offset -1 lines).
patching file src/test/regress/sql/create_view.sql
Hunk #1 succeeded at 524 (offset -27 lines).
patching file src/test/regress/sql/union.sql
Hunk #1 succeeded at 20 with fuzz 2.
Hunk #2 succeeded at 69 with fuzz 2.
Hunk #3 succeeded at 149 (offset -1 lines).
Hunk #4 succeeded at 194 (offset -1 lines).
Hunk #5 succeeded at 218 (offset -1 lines).
Hunk #6 succeeded at 252 with fuzz 2 (offset -1 lines).
Hunk #7 succeeded at 281 (offset -1 lines).
..../postgresql-9.6.2$

Regards,
Zeray

On Wed, Mar 29, 2017 at 5:53 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>
wrote:

Hi,

On 2017/03/28 15:40, Kang Yuzhe wrote:

Thanks Tsunakawa for such an informative reply.

Almost all of the docs related to the internals of PG cover introductory
concepts only.
There is an even more useful PG internals site entitled "The Internals of
PostgreSQL" at http://www.interdb.jp/pg/, a translation of the Japanese PG
internals book.

The query processing framework described in the manual, as you
mentioned, is informative and introductory in nature.
In theory, the query processing framework described in the manual is
understandable.

Unfortunately, understanding how the query processing framework really
works in the PG codebase is another story.
It has become a difficult task for me to walk through the PG source code,
for example to see how SELECT/INSERT/TRUNCATE really work in the different
modules under "src/..".

I wish there were Hands-On with PostgreSQL Internals like
https://bkmjournal.wordpress.com/2017/01/22/hands-on-with-postgresql-internals/
for more complex PG features.

For example, the SQL-standard MERGE is not yet supported by PG. I wish there
were a Hands-On with PostgreSQL Internals for MERGE/UPSERT, showing how it is
implemented in the parser/executor/storage modules, with a detailed
explanation of each piece of code, debugging, and other important concepts
related to systems programming.

I am not sure if I can show you that one place where you could learn all
of that, but many people who started with PostgreSQL development at some
point started by exploring the source code itself (either for learning or
to write a feature patch), articles on PostgreSQL wiki, and many related
presentations accessible using the Internet. I liked the following among
many others:

Introduction to Hacking PostgreSQL:
http://www.neilconway.org/talks/hacking/

Inside the PostgreSQL Query Optimizer:
http://www.neilconway.org/talks/optimizer/optimizer.pdf

Postgres Internals Presentations:
http://momjian.us/main/presentations/internals.html

Thanks,
Amit