code contributions for 2024, WIP version
Hi,
As many of you are probably aware, I have been doing an annual blog
post on who contributes to PostgreSQL development for some years now.
It includes information on lines of code committed to PostgreSQL, and
also emails sent to the list. This year, I got a jump on analyzing the
commit log, and a draft of the data covering January-November of 2024
has been uploaded in pg_dump format to here:
https://sites.google.com/site/robertmhaas/contributions
I'm sending this message to invite anyone who is interested to review
the data in the commits2024 table and send me corrections. For
example, it's possible that there are cases where I've failed to pick
out the correct primary author for a commit; or where somebody's name
is spelled in two different ways; or where somebody's name is not
spelled the way that they prefer.
You'll notice that the table has columns "lines" and "xlines". I have
set xlines=0 in cases where (a) I considered the commit to be a large,
mechanical commit such as a pgindent run or translation updates; or
(b) the commit was reverting some other commit that occurred earlier
in 2024; or (c) the commit was subsequently reverted. When I run the
final statistics, those commits will still count for the statistics
that count the number of commits, but the lines they inserted will not
be counted as lines of code contributed in 2024. Also for clarity,
please be aware that the "ncauthor" column is not used in the final
reporting; that is just there so that I can set
author=coalesce(ncauthor,committer) at a certain phase of the data
preparation. Corrections should be made to the author column, not
ncauthor.
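To make the column semantics concrete, here is a minimal Python sketch of the rules described above. The dictionary keys mirror the column names in this message; the "mechanical", "is_revert" and "was_reverted" flags are hypothetical stand-ins for judgments that were made by hand, and are not columns in the dump.

```python
from collections import Counter

def prepare(commit):
    """Return (author, xlines) for one commit row, following the rules above."""
    # author = coalesce(ncauthor, committer): fall back to the committer
    # when no non-committer author was recorded.
    author = commit["ncauthor"] if commit["ncauthor"] is not None else commit["committer"]
    # Hypothetical flags: any one of them zeroes the counted lines.
    excluded = commit["mechanical"] or commit["is_revert"] or commit["was_reverted"]
    xlines = 0 if excluded else commit["lines"]
    return author, xlines

def summarize(commits):
    """Count commits and counted lines per author."""
    commit_counts, line_counts = Counter(), Counter()
    for commit in commits:
        author, xlines = prepare(commit)
        commit_counts[author] += 1      # excluded commits still count as commits
        line_counts[author] += xlines   # ...but contribute no lines
    return commit_counts, line_counts

# Made-up rows, for illustration only.
rows = [
    {"committer": "Committer A", "ncauthor": "Author B", "lines": 120,
     "mechanical": False, "is_revert": False, "was_reverted": False},
    {"committer": "Committer A", "ncauthor": None, "lines": 5000,
     "mechanical": True, "is_revert": False, "was_reverted": False},
    {"committer": "Committer A", "ncauthor": None, "lines": 40,
     "mechanical": False, "is_revert": False, "was_reverted": False},
]
commit_counts, line_counts = summarize(rows)
```

In this made-up data, "Committer A" is credited with two commits but only 40 lines, because the 5000-line mechanical commit is excluded from the line total.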
If you would like to correct the data, please send me your corrections
off-list, as a reply to this email, ideally in the form of one or more
UPDATE statements. If you would like to complain about the
methodology, I can't stop you, but please bear in mind that (1) this
is already a lot of work and (2) I've always been upfront in my blog
post about what the limitations of the methodology are and I do my
best not to suggest that this method is somehow perfect or
unimpeachable and (3) you're welcome to publish your own blog post
where you compute things differently. I'm open to reasonable
suggestions for improvement, but if your overall view is that this
sucks or that I suck for doing it, I'm sorry that you feel that way
but giving me that feedback probably will not induce me to do anything
differently.
Donning my asbestos underwear, I remain yours faithfully,
--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
Donning my asbestos underwear, I remain yours faithfully,
Thanks for taking the time to compile all that. That's really nice.
--
Michael
On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
Donning my asbestos underwear, I remain yours faithfully,
Thanks for taking the time to compile all that. That's really nice.
+1, I always look forward to the blog post.
--
nathan
On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
Donning my asbestos underwear, I remain yours faithfully,
Thanks for taking the time to compile all that. That's really nice.
+1, I always look forward to the blog post.
Thanks, glad it's appreciated.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 12/3/24 10:44, Robert Haas wrote:
On Tue, Dec 3, 2024 at 10:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Tue, Dec 03, 2024 at 10:16:35AM +0900, Michael Paquier wrote:
On Mon, Dec 02, 2024 at 04:10:22PM -0500, Robert Haas wrote:
Donning my asbestos underwear, I remain yours faithfully,
Thanks for taking the time to compile all that. That's really nice.
+1, I always look forward to the blog post.
Thanks, glad it's appreciated.
It is definitely appreciated.
While I know you said "you will do you" when it comes to your annual
blog, there are a number of similar efforts -- top of mind is the
analysis done (as I understand it) by Daniel Gustafsson and Claire
Giordano [1], as well as ongoing/recurring analysis done by the
contributor committee. And there is the adjacent related discussion
around commit messages/authors. It makes me wonder if there isn't a way
to make all of our lives easier going forward.
[1] https://speakerdeck.com/clairegiordano/whats-in-a-postgres-major-release-an-analysis-of-contributions-in-the-v17-timeframe-claire-giordano-pgconf-eu-2024
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Tue, Dec 3, 2024 at 11:19 AM Joe Conway <mail@joeconway.com> wrote:
While I know you said "you will do you" when it comes to your annual
blog, there are a number of similar efforts -- top of mind is the
analysis done (as I understand it) by Daniel Gustafsson and Claire
Giordano [1], as well as ongoing/recurring analysis done by the
contributor committee. And there is the adjacent related discussion
around commit messages/authors. It makes me wonder if there isn't a way
to make all of our lives easier going forward.
Yes, I'm game to try to figure out how to combine our efforts. I don't
think it's a bad thing that different people have different takes;
this is complicated and looking at it through just one lens is
limiting. But people duplicating work is, well, not so good.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 3 Dec 2024, at 17:41, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Dec 3, 2024 at 11:19 AM Joe Conway <mail@joeconway.com> wrote:
While I know you said "you will do you" when it comes to your annual
blog, there are a number of similar efforts -- top of mind is the
analysis done (as I understand it) by Daniel Gustafsson and Claire
Giordano [1], as well as ongoing/recurring analysis done by the
contributor committee. And there is the adjacent related discussion
around commit messages/authors. It makes me wonder if there isn't a way
to make all of our lives easier going forward.
Yes, I'm game to try to figure out how to combine our efforts. I don't
think it's a bad thing that different people have different takes;
this is complicated and looking at it through just one lens is
limiting. But people duplicating work is, well, not so good.
If we settled on a meta-data standard for how to identify authors, reviewers,
backpatches etc I think that would go a very long way to lower the complexity
of getting to the data and keep folks focused on doing interesting analysis.
--
Daniel Gustafsson
Hello Robert,
On 2024-Dec-02, Robert Haas wrote:
As many of you are probably aware, I have been doing an annual blog
post on who contributes to PostgreSQL development for some years now.
It includes information on lines of code committed to PostgreSQL, and
also emails sent to the list. This year, I got a jump on analyzing the
commit log, and a draft of the data covering January-November of 2024
has been uploaded in pg_dump format to here:
https://sites.google.com/site/robertmhaas/contributions
I'm sending this message to invite anyone who is interested to review
the data in the commits2024 table and send me corrections.
No corrections here -- I noticed nothing wrong with the commits I am
involved with, in a quick read. I did notice that for patches with
multiple authors, only the first one is listed. For instance,
53c2a97a926's author ("Improve performance of subsystems on top of
SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out. I realize
that addressing this would complicate the schema and queries, but maybe
it's worth thinking about for next time. We have plenty of patches with
multiple authors, after all.
Hmm, maybe
UPDATE commits2024 SET xlines = 0 WHERE commitid in
('43ce181059d', '4632e5cf4bc', '6377e12a5a5', 'ff9f72c68f6',
'21ef4d4d897', '592a2283721');
How did you come up with the 'lines' number for each commit anyway?
Judging by 592a2283721 it's not just the number of lines added, since
that commit added 3 lines and you have lines=2.
An unrelated (and possibly useless) thing is that some committers seem
firmly in the camp of ending commit titles with a period, others are
firmly in the other camp; only two people seem not to have made up their
minds about that:
     committer      │ with end period │ without end period │ fraction with end period
────────────────────┼─────────────────┼────────────────────┼──────────────────────────
 Etsuro Fujita      │               6 │                  0 │                   100.00
 Peter Geoghegan    │              39 │                  0 │                   100.00
 Tatsuo Ishii       │               8 │                  0 │                   100.00
 Amit Kapila        │              87 │                  0 │                   100.00
 Fujii Masao        │              35 │                  0 │                   100.00
 Tom Lane           │             296 │                  1 │                    99.66
 Nathan Bossart     │             131 │                  1 │                    99.24
 Jeff Davis         │              88 │                  1 │                    98.88
 Noah Misch         │              61 │                  1 │                    98.39
 Thomas Munro       │              59 │                  1 │                    98.33
 Masahiko Sawada    │              39 │                  1 │                    97.50
 Dean Rasheed       │              23 │                  1 │                    95.83
 Robert Haas        │              77 │                 10 │                    88.51
 Joe Conway         │               1 │                  2 │                    33.33
 Alexander Korotkov │               4 │                153 │                     2.55
 Andrew Dunstan     │               1 │                 40 │                     2.44
 Bruce Momjian      │               2 │                 82 │                     2.38
 Heikki Linnakangas │               4 │                174 │                     2.25
 Peter Eisentraut   │               6 │                309 │                     1.90
 Amit Langote       │               1 │                 54 │                     1.82
 Álvaro Herrera     │               1 │                118 │                     0.84
 Michael Paquier    │               1 │                275 │                     0.36
 Andres Freund      │               0 │                 26 │                     0.00
 Richard Guo        │               0 │                 27 │                     0.00
 Daniel Gustafsson  │               0 │                 99 │                     0.00
 Magnus Hagander    │               0 │                  4 │                     0.00
 John Naylor        │               0 │                 33 │                     0.00
 Melanie Plageman   │               0 │                  6 │                     0.00
 David Rowley       │               0 │                106 │                     0.00
 Tomas Vondra       │               0 │                 33 │                     0.00
Query was:
select committer,
count(*) filter (where subject like '%.') as "with end period",
count(*) filter (where subject not like '%.') "without end period",
((count(*) filter (where subject like '%.'))::numeric / count(*) * 100)::numeric(5,2) as "fraction with end period"
from commits2024
group by committer
order by 4 desc, split_part(committer, ' ', 2);
Thanks!
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"The problem with the facetime model is not just that it's demoralizing, but
that the people pretending to work interrupt the ones actually working."
-- Paul Graham, http://www.paulgraham.com/opensource.html
On 5 Dec 2024, at 17:46, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
We have plenty of patches with
multiple authors, after all.
+1, thanks for raising this. A lot of stuff is actually joint work.
It’s much more fun to develop something in a group of co-authors.
Best regards, Andrey Borodin.
On Thu, Dec 5, 2024 at 7:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
No corrections here -- I noticed nothing wrong with the commits I am
involved with, in a quick read. I did notice that for patches with
multiple authors, only the first one is listed. For instance,
53c2a97a926's author ("Improve performance of subsystems on top of
SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out. I realize
that addressing this would complicate the schema and queries, but maybe
it's worth thinking about for next time. We have plenty of patches with
multiple authors, after all.
I agree, but I don't know how to apportion the work between the
authors. I think dividing credit equally between two or three authors
would often be very unfair to the first author. If we want to annotate
commit messages in a way that allows me to apportion credit more
fairly, I'm totally game to do that, but otherwise I think that giving
the credit to the first author is probably more fair on average.
Hmm, maybe
UPDATE commits2024 SET xlines = 0 WHERE commitid in
('43ce181059d', '4632e5cf4bc', '6377e12a5a5', 'ff9f72c68f6',
'21ef4d4d897', '592a2283721');
Thanks.
How did you come up with the 'lines' number for each commit anyway?
Judging by 592a2283721 it's not just the number of lines added, since
that commit added 3 lines and you have lines=2.
git log --before=${YEAR}-12-31 --after=${YEAR}-01-01 --shortstat -w -M
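For anyone who wants to reproduce a per-commit number from that output, here is a small Python sketch. It assumes the value of interest is the "insertions(+)" figure on each --shortstat summary line; that reading is an assumption, not something stated in this thread. With -w, git ignores whitespace-only changes when comparing lines, and -M turns on rename detection, so the counts can be lower than those of a plain diff.

```python
import re

# Matches "12 insertions(+)" as well as the singular "1 insertion(+)".
INSERTIONS = re.compile(r"(\d+) insertions?\(\+\)")

def insertions_from_shortstat(line):
    """Return the insertion count from one `git log --shortstat` summary line."""
    match = INSERTIONS.search(line)
    # A commit that only deletes lines has no "insertion" part at all.
    return int(match.group(1)) if match else 0

# Sample summary lines in git's usual shortstat format (made-up numbers).
samples = [
    " 1 file changed, 2 insertions(+), 1 deletion(-)",
    " 3 files changed, 1 insertion(+)",
    " 2 files changed, 7 deletions(-)",
]
counts = [insertions_from_shortstat(s) for s in samples]
```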
--
Robert Haas
EDB: http://www.enterprisedb.com
While I know you said "you will do you" when it comes to your annual
blog, there are a number of similar efforts -- top of mind is the
analysis done (as I understand it) by Daniel Gustafsson and Claire
Giordano [1], as well as ongoing/recurring analysis done by the
contributor committee. And there is the adjacent related discussion
around commit messages/authors. It makes me wonder if there isn't a way
to make all of our lives easier going forward.
Perhaps slightly off topic, so how does one provide input to the
contributor committee?
--
Thomas John Kincaid
On 2024-Dec-05, Robert Haas wrote:
On Thu, Dec 5, 2024 at 7:46 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
No corrections here -- I noticed nothing wrong with the commits I am
involved with, in a quick read. I did notice that for patches with
multiple authors, only the first one is listed. For instance,
53c2a97a926's author ("Improve performance of subsystems on top of
SLRU") is listed as Andrey Borodin, leaving Dilip Kumar out. I realize
that addressing this would complicate the schema and queries, but maybe
it's worth thinking about for next time. We have plenty of patches with
multiple authors, after all.
I agree, but I don't know how to apportion the work between the
authors. I think dividing credit equally between two or three authors
would often be very unfair to the first author. If we want to annotate
commit messages in a way that allows me to apportion credit more
fairly, I'm totally game to do that, but otherwise I think that giving
the credit to the first author is probably more fair on average.
Just give credit to all lines for all authors, would be my approach. Is
that unfair? Perhaps, but I'd rather err on the side of giving too much
credit, than on not giving enough.
How did you come up with the 'lines' number for each commit anyway?
Judging by 592a2283721 it's not just the number of lines added, since
that commit added 3 lines and you have lines=2.
git log --before=${YEAR}-12-31 --after=${YEAR}-01-01 --shortstat -w -M
Ah, it's -w that makes the difference, got it.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Right now the sectors on the hard disk run clockwise, but I heard a rumor that
you can squeeze 0.2% more throughput by running them counterclockwise.
It's worth the effort. Recommended." (Gerry Pourwelle)
On Thu, Dec 5, 2024 at 10:39:38AM -0500, Tom Kincaid wrote:
While I know you said "you will do you" when it comes to your annual
blog, there are a number of similar efforts -- top of mind is the
analysis done (as I understand it) by Daniel Gustafsson and Claire
Giordano [1], as well as ongoing/recurring analysis done by the
contributor committee. And there is the adjacent related discussion
around commit messages/authors. It makes me wonder if there isn't a way
to make all of our lives easier going forward.
Perhaps slightly off topic, so how does one provide input to the
contributor committee?
The committee is responsible for updating the contributors list web page:
https://www.postgresql.org/community/contributors/
and does analysis of contributions to the Postgres community to help
update the list.
Their email address is at the bottom of that page:
contributors@postgresql.org
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Do not let urgent matters crowd out time for investment in the future.
On Thu, Dec 5, 2024 at 11:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
Just give credit to all lines for all authors, would be my approach. Is
that unfair? Perhaps, but I'd rather err on the side of giving too much
credit, than on not giving enough.
I'm not against somebody putting that together, but I don't think it
would be useful for me. I think it would inflate the numbers for
committers by quite a lot more than what is fair, because if I commit
a 1000 line patch and I add 50 lines of code, I'm going to get an
awful lot more credit than I deserve. It would probably also inflate
or distort the numbers for some other people as well. But what I would
say is -- if you think it's a useful thing, try doing it.
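As a rough illustration of the trade-off being discussed, the following Python sketch compares the two schemes on the example above. The data layout and names are hypothetical, and treating the committer as a listed co-author is an assumption made only for this illustration.

```python
from collections import Counter

def first_author_credit(commits):
    """Credit every line of a commit to its first listed author only."""
    totals = Counter()
    for authors, lines in commits:
        totals[authors[0]] += lines
    return totals

def full_credit_to_all(commits):
    """Credit every line of a commit to each of its listed authors."""
    totals = Counter()
    for authors, lines in commits:
        for author in authors:
            totals[author] += lines
    return totals

# The example from this message: a 1000-line patch to which the committer
# added about 50 lines, and is therefore listed as a co-author.
commits = [(["Patch Author", "Committer"], 1000)]

by_first = first_author_credit(commits)   # Committer gets 0 lines
by_all = full_credit_to_all(commits)      # Committer gets 1000 lines
```

Neither result matches the committer's actual share of roughly 50 lines; the two schemes simply err in opposite directions.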
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
A draft of my analysis of code contributions for 2025 can be found at
https://sites.google.com/site/robertmhaas/contributions in
contributions2025-wip.dmp. In contrast to previous years, I was able
to do much more of this in an automated way this year: the principal
author of the commit was computed by grabbing the first Author or
Co-Authored-By tag from the commit message rather than by manual
inspection of all the commit messages. Yay!
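A minimal Python sketch of that kind of extraction follows. It is an illustration under assumptions, not the script actually used: it takes the first line whose tag is "Author" or "Co-authored-by" (compared case-insensitively) and strips a trailing "<email>" part. The sample commit message and names are made up.

```python
def principal_author(message):
    """Return the name from the first Author/Co-authored-by tag, or None."""
    for line in message.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("author", "co-authored-by"):
            # Drop a trailing "<email>" part, if present.
            return value.split("<", 1)[0].strip()
    return None  # no tag found; a caller might fall back to the committer

sample = """Fix an off-by-one error in the frobnicator

The loop ran one element too far.

Author: Jane Doe <jane@example.com>
Co-authored-by: John Roe <john@example.com>
Reviewed-by: Some Reviewer <reviewer@example.com>
"""
name = principal_author(sample)
```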
Just like last year, I invite corrections from anyone who is
interested in providing them. The table of interest is commits2025,
which has columns lines and xlines. xlines is what will be used to
produce the final blog post. As usual, I've set xlines=0 if a commit
seemed to be a large, mechanical commit that shouldn't count toward
someone's lines contributed. Also, this year, for certain patches that
touched the Unicode translate tables, instead of setting xlines=0,
I've decremented it by the size of the changes to the Unicode tables,
to avoid overcounting the significance of those commits relative to
others while still giving credit for the net new code. I did not
bother to account for reverts as carefully this year, because,
thankfully, most of them touched only relatively small numbers of
lines of code, and so it didn't seem to me that they affected the
statistics very much. If I did account for reverts more carefully,
what I would do is set xlines=0 for both reverts and the reverted
commits, but only when both occurred in the same calendar year. I'm
open to feedback on whether that should be pursued further in the
interest of accuracy, but so far it didn't seem especially important
given the shape of this year's data.
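The adjustments described above could be sketched in Python roughly as follows. The field names ("year", "xlines", "reverts") and the way a revert points at its target are hypothetical; the second function shows the same-year revert rule that was not applied carefully this year.

```python
def adjusted_xlines(lines, mechanical=False, unicode_table_lines=0):
    """Lines to count for one commit after the adjustments described above."""
    if mechanical:
        return 0  # large mechanical commits contribute no lines
    # Subtract generated Unicode table changes, keeping the net new code.
    return max(lines - unicode_table_lines, 0)

def zero_same_year_reverts(commits):
    """Zero xlines for a revert and its target when both fall in the same year.

    commits maps a commit id to {"year": int, "xlines": int,
    "reverts": commit id of the reverted commit, or None}.
    """
    for commit in commits.values():
        target = commits.get(commit["reverts"])
        if target is not None and target["year"] == commit["year"]:
            commit["xlines"] = 0
            target["xlines"] = 0
    return commits

# Made-up data: "bbb" reverts "aaa" within 2025; "ccc" reverts a commit
# from an earlier year that is not in this table, so it is left alone.
commits = zero_same_year_reverts({
    "aaa": {"year": 2025, "xlines": 300, "reverts": None},
    "bbb": {"year": 2025, "xlines": 310, "reverts": "aaa"},
    "ccc": {"year": 2025, "xlines": 50, "reverts": "zzz"},
})
```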
My main reason for putting this out for possible corrections is to fix
author names. If the primary author of a commit is not as listed, or
where I have multiple spellings for the same person's name, or where
someone's name is not spelled as they prefer, corrections are welcome.
Secondarily, if you think I should set xlines=0 for some mechanical
commit that was not identified as such in my initial analysis, you can
also tell me about that. As before, please send corrections off-list
as proposed UPDATE statements against the commits2025 table.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,
On Tue, Jan 13, 2026 at 02:54:59PM -0500, Robert Haas wrote:
Hi,
A draft of my analysis of code contributions for 2025 can be found at
https://sites.google.com/site/robertmhaas/contributions in
contributions2025-wip.dmp.
Thanks for taking the time to do that!
My main reason for putting this out for possible corrections is to fix
author names.
I did a quick scan and it looks like "Hou Zhijie" is listed twice: one as
"Zhijie Hou" and one as "Hou Zhijie". So it looks like those related numbers
should be added and displayed as a single entry.
Looking at the commit log for 2025, they were all associated with the same
email "houzj.fnst@fujitsu.com".
So, generally speaking, maybe the counts should be based on the email
address instead and then pick up one of the "name surname"?
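Something along these lines, as a rough Python sketch: group the author names by email address and report the addresses that appear under more than one spelling. The input layout is hypothetical, and the second row is made-up sample data. People also change email addresses over time, so this would only flag candidates for manual review.

```python
from collections import defaultdict

def names_by_email(commits):
    """commits: iterable of (author_name, author_email) pairs."""
    names = defaultdict(set)
    for name, email in commits:
        names[email.lower()].add(name)
    # Keep only the addresses that appear under more than one spelling.
    return {email: sorted(spellings)
            for email, spellings in names.items() if len(spellings) > 1}

duplicates = names_by_email([
    ("Hou Zhijie", "houzj.fnst@fujitsu.com"),
    ("Jane Doe", "jane@example.com"),
    ("Zhijie Hou", "houzj.fnst@fujitsu.com"),
])
```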
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, Jan 14, 2026 at 06:01:56PM +0000, Bertrand Drouvot wrote:
Hi,
On Tue, Jan 13, 2026 at 02:54:59PM -0500, Robert Haas wrote:
Hi,
A draft of my analysis of code contributions for 2025 can be found at
https://sites.google.com/site/robertmhaas/contributions in
contributions2025-wip.dmp.
Thanks for taking the time to do that!
My main reason for putting this out for possible corrections is to fix
author names.
I did a quick scan and it looks like "Hou Zhijie" is listed twice: one as
"Zhijie Hou" and one as "Hou Zhijie". So it looks like those related numbers
should be added and displayed as a single entry.
Maybe all those ones could be double checked? (already done for "Hou Zhijie").
postgres=# SELECT a1.author, a2.author,
                  similarity(a1.author, a2.author) as similarity_score
           FROM top_authors2025 a1
           JOIN top_authors2025 a2 ON a1.author < a2.author
           WHERE similarity(a1.author, a2.author) > 0.6
           ORDER BY similarity_score DESC;
        author        |        author         | similarity_score
----------------------+-----------------------+------------------
 Hou Zhijie [*]       | Zhijie Hou [*]        |                1
 Maksim Melnikov [*]  | Melnikov Maksim [*]   |                1
 Andrei Lepikhov [*]  | Andrey Lepikhov [*]   |        0.7777778
 Mihail Nikalayeu [*] | Mikhail Nikalayeu [*] |             0.75
 Lukas Fitti [*]      | Lukas Fittl [*]       |       0.71428573
 Dmitry Koval [*]     | Dmitry Kovalenko [*]  |        0.6666667
(6 rows)
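The similarity() function used here comes from the pg_trgm extension. For readers who want to see why reordered names score 1, here is a Python sketch that approximates its behavior under the documented rules: text is lowercased and split into alphanumeric words, each word is padded with two spaces in front and one behind, and the score is the number of shared trigrams divided by the number of distinct trigrams. It is an approximation for illustration, not the extension's actual code.

```python
import re

def trigrams(text):
    """Return the set of word-based trigrams, pg_trgm style (approximation)."""
    grams = set()
    # Non-alphanumeric characters such as "[*]" only separate words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

swapped = similarity("Hou Zhijie [*]", "Zhijie Hou [*]")
spelling = similarity("Andrei Lepikhov [*]", "Andrey Lepikhov [*]")
```

Because the trigrams are collected per word into a set, word order does not matter, which is why the two swapped names above are reported as identical. A high score only marks a pair as a candidate; whether two names belong to the same person still needs a human check.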
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Wed, 14 Jan 2026, 23:52 Bertrand Drouvot, <bertranddrouvot.pg@gmail.com>
wrote:
On Wed, Jan 14, 2026 at 06:01:56PM +0000, Bertrand Drouvot wrote:
Hi,
On Tue, Jan 13, 2026 at 02:54:59PM -0500, Robert Haas wrote:
Hi,
A draft of my analysis of code contributions for 2025 can be found at
https://sites.google.com/site/robertmhaas/contributions in
contributions2025-wip.dmp.
Thanks for taking the time to do that!
My main reason for putting this out for possible corrections is to fix
author names.
I did a quick scan and it looks like "Hou Zhijie" is listed twice: one as
"Zhijie Hou" and one as "Hou Zhijie". So it looks like those related numbers
should be added and displayed as a single entry.
Maybe all those ones could be double checked? (already done for "Hou Zhijie").
postgres=# SELECT a1.author, a2.author,
                  similarity(a1.author, a2.author) as similarity_score
           FROM top_authors2025 a1
           JOIN top_authors2025 a2 ON a1.author < a2.author
           WHERE similarity(a1.author, a2.author) > 0.6
           ORDER BY similarity_score DESC;
        author        |        author         | similarity_score
----------------------+-----------------------+------------------
 Hou Zhijie [*]       | Zhijie Hou [*]        |                1
 Maksim Melnikov [*]  | Melnikov Maksim [*]   |                1
 Andrei Lepikhov [*]  | Andrey Lepikhov [*]   |        0.7777778
 Mihail Nikalayeu [*] | Mikhail Nikalayeu [*] |             0.75
 Lukas Fitti [*]      | Lukas Fittl [*]       |       0.71428573
 Dmitry Koval [*]     | Dmitry Kovalenko [*]  |        0.6666667
(6 rows)
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi
Dmitry Koval and Dmitry Kovalenko are not the same person.
In the other cases, yes, they are the same person.
On Wed, Jan 14, 2026 at 1:52 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
postgres=# SELECT a1.author, a2.author,
                  similarity(a1.author, a2.author) as similarity_score
           FROM top_authors2025 a1
           JOIN top_authors2025 a2 ON a1.author < a2.author
           WHERE similarity(a1.author, a2.author) > 0.6
           ORDER BY similarity_score DESC;
        author        |        author         | similarity_score
----------------------+-----------------------+------------------
 Hou Zhijie [*]       | Zhijie Hou [*]        |                1
 Maksim Melnikov [*]  | Melnikov Maksim [*]   |                1
 Andrei Lepikhov [*]  | Andrey Lepikhov [*]   |        0.7777778
 Mihail Nikalayeu [*] | Mikhail Nikalayeu [*] |             0.75
 Lukas Fitti [*]      | Lukas Fittl [*]       |       0.71428573
 Dmitry Koval [*]     | Dmitry Kovalenko [*]  |        0.6666667
(6 rows)
I have made these corrections:
update commits2025 set author = 'Hou Zhijie' where author = 'Zhijie Hou';
update commits2025 set author = 'Maksim Melnikov' where author =
'Melnikov Maksim';
update commits2025 set author = 'Andrei Lepikhov' where author =
'Andrey Lepikhov';
update commits2025 set author = 'Lukas Fittl' where author = 'Lukas Fitti';
Please let me know if you see anything else.
Thanks,
--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, 15 Jan 2026, 22:46 Robert Haas <robertmhaas@gmail.com> wrote:
I have made these corrections:
update commits2025 set author = 'Hou Zhijie' where author = 'Zhijie Hou';
update commits2025 set author = 'Maksim Melnikov' where author =
'Melnikov Maksim';
update commits2025 set author = 'Andrei Lepikhov' where author =
'Andrey Lepikhov';
update commits2025 set author = 'Lukas Fittl' where author = 'Lukas Fitti';
Looks like you missed me,
Mihail Nikalayeu [*] | Mikhail Nikalayeu [*] | 0.75
Thanks!