10% drop in code line count in PG 17

Started by Bruce Momjian about 2 months ago, 23 messages
#1 Bruce Momjian
bruce@momjian.us

While working on a talk, I studied the number of code line changes in
each major release, and found PG 17 surprisingly reduced code line count
by 10%. To get the code line count, I used /pgtop/src/tools/codelines,
which runs:

find . -name '*.[chyl]' | xargs cat| wc -l

Any ideas on the cause of this decrease? I skimmed the major release
notes but didn't see anything obvious. I see removal of support for
OpenSSL 1.0.1 and AIX.

---------------------------------------------------------------------------

version | reldate | months | relnotes | lines | change | % change
----------+------------+--------+----------+---------+---------+----------
4.2 | 1994-03-17 | | | 250872 | |
1.0 | 1995-09-05 | 18 | | 172470 | -78402 | -31
1.01 | 1996-02-23 | 6 | | 179463 | 6993 | 4
1.09 | 1996-11-04 | 8 | | 178976 | -487 | 0
6.0 | 1997-01-29 | 3 | | 189399 | 10423 | 5
6.1 | 1997-06-08 | 4 | | 200709 | 11310 | 5
6.2 | 1997-10-02 | 4 | | 225848 | 25139 | 12
6.3 | 1998-03-01 | 5 | | 260809 | 34961 | 15
6.4 | 1998-10-30 | 8 | | 297918 | 37109 | 14
6.5 | 1999-06-09 | 7 | | 331278 | 33360 | 11
7.0 | 2000-05-08 | 11 | | 383270 | 51992 | 15
7.1 | 2001-04-13 | 11 | | 410500 | 27230 | 7
7.2 | 2002-02-04 | 10 | 250 | 394274 | -16226 | -3
7.3 | 2002-11-27 | 10 | 305 | 453282 | 59008 | 14
7.4 | 2003-11-17 | 12 | 263 | 508523 | 55241 | 12
8.0 | 2005-01-19 | 14 | 230 | 654437 | 145914 | 28
8.1 | 2005-11-08 | 10 | 174 | 630422 | -24015 | -3
8.2 | 2006-12-05 | 13 | 215 | 684646 | 54224 | 8
8.3 | 2008-02-04 | 14 | 214 | 762697 | 78051 | 11
8.4 | 2009-07-01 | 17 | 314 | 939098 | 176401 | 23
9.0 | 2010-09-20 | 15 | 237 | 999862 | 60764 | 6
9.1 | 2011-09-12 | 12 | 203 | 1069547 | 69685 | 6
9.2 | 2012-09-10 | 12 | 238 | 1148192 | 78645 | 7
9.3 | 2013-09-09 | 12 | 177 | 1195627 | 47435 | 4
9.4 | 2014-12-18 | 15 | 211 | 1261024 | 65397 | 5
9.5 | 2016-01-07 | 13 | 193 | 1340005 | 78981 | 6
9.6 | 2016-09-29 | 9 | 214 | 1380458 | 40453 | 3
10 | 2017-10-05 | 12 | 189 | 1495196 | 114738 | 8
11 | 2018-10-18 | 12 | 170 | 1562537 | 67341 | 4
12 | 2019-10-03 | 11 | 180 | 1616912 | 54375 | 3
13 | 2020-09-24 | 12 | 178 | 1656030 | 39118 | 2
14 | 2021-09-30 | 12 | 220 | 1779777 | 123747 | 7
15 | 2022-10-13 | 12 | 184 | 1815646 | 35869 | 2
16 | 2023-09-14 | 11 | 206 | 1869401 | 53755 | 2
17 | 2024-09-26 | 12 | 182 | 1673116 | -196285 | -10
18 | 2025-09-25 | 12 | 210 | 1750814 | 77698 | 4
Averages | | 11 | 215 | | | 5.89
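One rough way to narrow down a drop like this is to total the lines per directory and compare the output between two release checkouts. The sketch below is a hypothetical helper (`lines_by_dir` is not part of the PostgreSQL tree). It assumes a POSIX shell, git, and standard `wc`/`awk`/`sort`, and groups tracked `*.[chyl]` files by their first two path components.

```shell
# Sketch (POSIX sh + git + awk): tracked *.[chyl] line totals grouped by the
# first two path components (e.g. src/backend). lines_by_dir is hypothetical.
lines_by_dir() {
  git ls-files -z -- '*.[chyl]' |
    xargs -0 wc -l |
    awk '{
      n = $1
      sub(/^[ \t]*[0-9]+[ \t]+/, "")   # drop the count, keep the path
      if ($0 == "total") next          # skip wc summary lines
      k = split($0, part, "/")
      if (k >= 3) dir = part[1] "/" part[2]
      else if (k == 2) dir = part[1]
      else dir = "."
      sum[dir] += n
    }
    END { for (d in sum) print sum[d], d }' |
    LC_ALL=C sort -k2
}
```

Running it once on a REL_16_0 checkout and once on a REL_17_0 checkout, then diffing the two outputs, should show which directories account for the change.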

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#1)
Re: 10% drop in code line count in PG 17

Bruce Momjian <bruce@momjian.us> writes:

> While working on a talk, I studied the number of code line changes in
> each major release, and found PG 17 surprisingly reduced code line count
> by 10%. To get the code line count, I used /pgtop/src/tools/codelines,
> which runs:
>
> find . -name '*.[chyl]' | xargs cat| wc -l
>
> Any ideas on the cause of this decrease?

My first thought was that it had to do with the conversion of
src/backend/nodes/ to be largely auto-generated code. If you
are using codelines against just what is in git, that would look
like a decrease. However, I see that came in during v16 not v17,
so that's not the explanation. I'm betting it's some similar
effect though: code getting moved out of the set of files that
will match '*.[chyl]'.

Also ... are you in fact counting only what is in git? Because
I get different answers:

$ git clean -dfxq
$ git checkout REL_17_0
HEAD is now at d7ec59a63d7 Stamp 17.0.
$ src/tools/codelines
1664472
$ git checkout REL_16_0
HEAD is now at c372fbbd8e9 Doc: fix release date in release-16.sgml.
$ src/tools/codelines
1595197
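A non-destructive way to check for the discrepancy described above, without running `git clean`, is to compare the `find`-based count with a count restricted to files that git tracks. This is a sketch assuming a POSIX shell run from the top of the tree. The function names are hypothetical, and `-type f` is a small addition to the original `find` command.

```shell
# Sketch (POSIX sh + git): compare the find-based count that
# src/tools/codelines used with a count limited to files git tracks.
# Function names are hypothetical; run from the top of the tree.

# Every *.[chyl] file on disk, tracked or not, roughly as the old script counted.
count_find_lines() {
  find . -name '*.[chyl]' -type f -print0 | xargs -0 cat | wc -l | tr -d ' '
}

# Only files recorded in the git index; untracked build products are excluded.
count_tracked_lines() {
  git ls-files -z -- '*.[chyl]' | xargs -0 cat | wc -l | tr -d ' '
}

# A large "untracked" figure points at leftovers from earlier builds.
compare_line_counts() {
  found=$(count_find_lines)
  tracked=$(count_tracked_lines)
  echo "find: $found  tracked: $tracked  untracked: $((found - tracked))"
}
```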

regards, tom lane

#3 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: 10% drop in code line count in PG 17

On Wed, Nov 19, 2025 at 03:21:33PM -0500, Tom Lane wrote:

> Bruce Momjian <bruce@momjian.us> writes:
>
>> While working on a talk, I studied the number of code line changes in
>> each major release, and found PG 17 surprisingly reduced code line count
>> by 10%. To get the code line count, I used /pgtop/src/tools/codelines,
>> which runs:
>>
>> find . -name '*.[chyl]' | xargs cat| wc -l
>>
>> Any ideas on the cause of this decrease?
>
> My first thought was that it had to do with the conversion of
> src/backend/nodes/ to be largely auto-generated code. If you
> are using codelines against just what is in git, that would look
> like a decrease. However, I see that came in during v16 not v17,
> so that's not the explanation. I'm betting it's some similar
> effect though: code getting moved out of the set of files that
> will match '*.[chyl]'.

Huh.

> Also ... are you in fact counting only what is in git? Because
> I get different answers:
>
> $ git clean -dfxq
> $ git checkout REL_17_0
> HEAD is now at d7ec59a63d7 Stamp 17.0.
> $ src/tools/codelines
> 1664472
> $ git checkout REL_16_0
> HEAD is now at c372fbbd8e9 Doc: fix release date in release-16.sgml.
> $ src/tools/codelines
> 1595197

No, I just followed the shell comment I wrote above the 'find' command
shown above:

# This script is used to compute the total number of "C" lines in the
# release This should be run from the top of the Git tree after a 'make
# distclean'

And that tree has been built many times. Should I change my procedure?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#3)
Re: 10% drop in code line count in PG 17

Bruce Momjian <bruce@momjian.us> writes:

> On Wed, Nov 19, 2025 at 03:21:33PM -0500, Tom Lane wrote:
>
>> Also ... are you in fact counting only what is in git? Because
>> I get different answers:
>
> No, I just followed the shell comment I wrote above the 'find' command
> shown above:
>
> # This script is used to compute the total number of "C" lines in the
> # release This should be run from the top of the Git tree after a 'make
> # distclean'
>
> And that tree has been built many times. Should I change my procedure?

Does "git status --ignored" show any leftover junk files?

I've found that "make distclean" isn't 100% reliable if you aren't
religious about doing it before every git pull or other change of
git HEAD. The pull might bring in new makefiles with a different
idea of what needs to be cleaned. For .c files I'd kind of expect
leftovers to be obvious because they won't get hidden by .gitignore
rules, but maybe you hit some case where they're still hidden.

I've largely migrated to using "git clean -dfxq", which has about
the same results in modern branches, but is faster and never (IME)
misses anything.
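To see which files an incomplete `make distclean` left behind before deleting anything, git can list untracked files directly. Similarly, `git clean -ndx` previews what `git clean -dfxq` would remove, because `-n` makes it a dry run. The helper names below are hypothetical, and this is a sketch assuming a POSIX shell at the top of the tree.

```shell
# Sketch (POSIX sh + git): list *.[chyl] files present on disk but not
# tracked by git. Helper names are hypothetical.

# All untracked matches, including ones hidden by .gitignore: with no
# --exclude* option, git ls-files --others applies no ignore rules.
list_untracked_sources() {
  git ls-files --others -- '*.[chyl]'
}

# Only the subset that .gitignore rules hide from plain "git status".
list_ignored_sources() {
  git ls-files --others --ignored --exclude-standard -- '*.[chyl]'
}
```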

regards, tom lane

#5 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#4)
1 attachment(s)
Re: 10% drop in code line count in PG 17

On Wed, Nov 19, 2025 at 04:22:37PM -0500, Tom Lane wrote:

> Bruce Momjian <bruce@momjian.us> writes:
>
>> On Wed, Nov 19, 2025 at 03:21:33PM -0500, Tom Lane wrote:
>>
>>> Also ... are you in fact counting only what is in git? Because
>>> I get different answers:
>>
>> No, I just followed the shell comment I wrote above the 'find' command
>> shown above:
>>
>> # This script is used to compute the total number of "C" lines in the
>> # release This should be run from the top of the Git tree after a 'make
>> # distclean'
>>
>> And that tree has been built many times. Should I change my procedure?
>
> Does "git status --ignored" show any leftover junk files?
>
> I've found that "make distclean" isn't 100% reliable if you aren't
> religious about doing it before every git pull or other change of
> git HEAD. The pull might bring in new makefiles with a different
> idea of what needs to be cleaned. For .c files I'd kind of expect
> leftovers to be obvious because they won't get hidden by .gitignore
> rules, but maybe you hit some case where they're still hidden.
>
> I've largely migrated to using "git clean -dfxq", which has about
> the same results in modern branches, but is faster and never (IME)
> misses anything.

I think you are right. Attached is the difference between the output
for 16 & 17. Let me do some more research and run all the versions
again and report back, thanks.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

Attachments:

16_17.txt (text/plain; charset=us-ascii)
#6 Álvaro Herrera
alvherre@kurilemu.de
In reply to: Tom Lane (#4)
Re: 10% drop in code line count in PG 17

On 2025-Nov-19, Tom Lane wrote:

>> No, I just followed the shell comment I wrote above the 'find' command
>> shown above:
>>
>> # This script is used to compute the total number of "C" lines in the
>> # release This should be run from the top of the Git tree after a 'make
>> # distclean'
>>
>> And that tree has been built many times. Should I change my procedure?
>
> Does "git status --ignored" show any leftover junk files?

Maybe it'd be better to use `git ls-files` to create the list of files.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"But static content is just dynamic content that isn't moving!"
http://smylers.hates-software.com/2007/08/15/fe244d0c.html

#7 David Rowley
dgrowleyml@gmail.com
In reply to: Bruce Momjian (#5)
Re: 10% drop in code line count in PG 17

On Thu, 20 Nov 2025 at 10:58, Bruce Momjian <bruce@momjian.us> wrote:

> I think you are right. Attached is the difference between the output
> for 16 & 17. Let me do some more research and run all the versions
> again and report back, thanks.

Maybe you'd be better with git ls-files if you only want just what's
in the repo. Something like:

for b in "REL8_0_0" "REL8_1_0" "REL8_2_0" "REL8_3_0" "REL8_4_0" \
    "REL9_0_0" "REL9_1_0" "REL9_2_0" "REL9_3_0" "REL9_4_0" "REL9_5_0" \
    "REL9_6_0" "REL_10_0" "REL_11_0" "REL_12_0" "REL_13_0" "REL_14_0" \
    "REL_15_0" "REL_16_0" "REL_17_0" "REL_18_0" "master"; do
    git checkout -f $b > /dev/null 2>&1 && echo -n "$b " &&
        git ls-files -- '*.[chyl]' | xargs cat | wc -l
done

Careful with the git checkout "-f" though.
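A variant that avoids `git checkout -f` entirely reads each tag straight from the object database with `git grep`, so the working tree is never touched. This is a sketch with hypothetical function names. `git grep -c` prints a per-file count of matching lines, and the empty pattern matches every line. The total can therefore differ slightly from `cat | wc -l` when a file's last line lacks a trailing newline.

```shell
# Sketch (POSIX sh + git): count *.[chyl] lines at a revision without
# checking it out. Function names are hypothetical.

count_lines_at() {
  # "git grep -c" prints "<rev>:<path>:<count>" per file; the empty
  # pattern matches every line and --text disables binary-file handling.
  git grep -c --text -e '' "$1" -- '*.[chyl]' |
    awk -F: '{ total += $NF } END { print total + 0 }'
}

count_lines_at_tags() {
  for tag in "$@"; do
    printf '%s %s\n' "$tag" "$(count_lines_at "$tag")"
  done
}
```

For example: `count_lines_at_tags REL_16_0 REL_17_0 REL_18_0`.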

David

#8 Daniel Gustafsson
daniel@yesql.se
In reply to: Bruce Momjian (#1)
Re: 10% drop in code line count in PG 17

On 19 Nov 2025, at 20:59, Bruce Momjian <bruce@momjian.us> wrote:

> While working on a talk, I studied the number of code line changes in
> each major release,

This script will only pick up C, but will pick up C in src/test but not any
Perl code using the C modules in src/test etc. These days we also have C++ and
some Python in the tree. Maybe it's time to revise it for today's codebase
which is quite different from when it was written 20 years ago?

--
Daniel Gustafsson

#9 Álvaro Herrera
alvherre@kurilemu.de
In reply to: David Rowley (#7)
Re: 10% drop in code line count in PG 17

On 2025-Nov-20, David Rowley wrote:

> Maybe you'd be better with git ls-files if you only want just what's
> in the repo. Something like:
>
> for b in "REL8_0_0" "REL8_1_0" "REL8_2_0" "REL8_3_0" "REL8_4_0" \
>     "REL9_0_0" "REL9_1_0" "REL9_2_0" "REL9_3_0" "REL9_4_0" "REL9_5_0" \
>     "REL9_6_0" "REL_10_0" "REL_11_0" "REL_12_0" "REL_13_0" "REL_14_0" \
>     "REL_15_0" "REL_16_0" "REL_17_0" "REL_18_0" "master"; do
>     git checkout -f $b > /dev/null 2>&1 && echo -n "$b " &&
>         git ls-files -- '*.[chyl]' | xargs cat | wc -l
> done

Maybe this should also consider .pl and .pm files ... we now have almost
90k lines of Perl code in branch master:

$ git ls-files -- '*.pl' | xargs cat | wc -l
77234
$ git ls-files -- '*.pm' | xargs cat | wc -l
10386
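If other languages are to be included, per-pattern counts over tracked files can be gathered in one call. `count_by_pattern` is a hypothetical name. The patterns in the usage line below are an assumption drawn from the languages mentioned in this thread, not an agreed definition of "code".

```shell
# Sketch (POSIX sh + git): tracked-file line counts per pathspec pattern.
# count_by_pattern is hypothetical; which patterns to pass is a choice.
count_by_pattern() {
  for pat in "$@"; do
    n=$(git ls-files -z -- "$pat" | xargs -0 cat | wc -l | tr -d ' ')
    printf '%s %s\n' "$pat" "$n"
  done
}
```

For example: `count_by_pattern '*.[chyl]' '*.pl' '*.pm' '*.cpp' '*.py'`.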

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php

#10 Aleksander Alekseev
aleksander@tigerdata.com
In reply to: Bruce Momjian (#1)
Re: 10% drop in code line count in PG 17

Hi Bruce,

> While working on a talk, I studied the number of code line changes in
> each major release, and found PG 17 surprisingly reduced code line count
> by 10%. To get the code line count, I used /pgtop/src/tools/codelines,
> which runs:
>
> find . -name '*.[chyl]' | xargs cat| wc -l

FWIW I get different results with `cloc`:

$ git checkout REL_18_STABLE
$ git clean -dfx # be careful! this will drop your local .clangd settings etc
$ cloc ./

github.com/AlDanial/cloc v 1.98 T=3.38 s (1448.6 files/s, 915951.4 lines/s)
---------------------------------------------------------------------------------------
Language              files      blank    comment       code
---------------------------------------------------------------------------------------
C                      1555     189668     393758     940984
PO File                 466     180914     221367     543216
SQL                     791      30420      23631     124104
C/C++ Header            973      18935      64368     114176
Perl                    335      13402      12264      60254
XML                       3          4         15      30922
... skipped ...
---------------------------------------------------------------------------------------
SUM:                   4895     446873     728634    1919686
---------------------------------------------------------------------------------------

$ git checkout REL_17_STABLE
$ cloc ./

github.com/AlDanial/cloc v 1.98 T=2.68 s (1764.0 files/s, 1104266.3 lines/s)
---------------------------------------------------------------------------------------
Language              files      blank    comment       code
---------------------------------------------------------------------------------------
C                      1507     181725     376154     905987
PO File                 466     174902     211970     529317
SQL                     754      28606      21625     115742
C/C++ Header            943      18255      61771     100741
Perl                    309      11882      10905      52974
XML                       3          4         15      30922
... skipped ...
---------------------------------------------------------------------------------------
SUM:                   4733     428393     695356    1839064
---------------------------------------------------------------------------------------

Overall, there is a 4% increase according to this tool. What is
convenient about `cloc` - you can count only what you want, e.g. code
without comments, etc.

--
Best regards,
Aleksander Alekseev

#11 Aleksander Alekseev
aleksander@tigerdata.com
In reply to: Aleksander Alekseev (#10)
Re: 10% drop in code line count in PG 17

Hi,

>> While working on a talk, I studied the number of code line changes in
>> each major release, and found PG 17 surprisingly reduced code line count
>> by 10%. To get the code line count, I used /pgtop/src/tools/codelines,
>> which runs:
>
> [..]
> Overall, there is a 4% increase according to this tool. What is
> convenient about `cloc` - you can count only what you want, e.g. code
> without comments, etc.

Oops, I didn't notice that you were comparing PG16 and PG17. Still,
the result with `cloc` is similar, +4% approximately. Apologies for
the noise.

--
Best regards,
Aleksander Alekseev

#12 Bruce Momjian
bruce@momjian.us
In reply to: David Rowley (#7)
1 attachment(s)
Re: 10% drop in code line count in PG 17

On Thu, Nov 20, 2025 at 12:23:25PM +1300, David Rowley wrote:

> On Thu, 20 Nov 2025 at 10:58, Bruce Momjian <bruce@momjian.us> wrote:
>
>> I think you are right. Attached is the difference between the output
>> for 16 & 17. Let me do some more research and run all the versions
>> again and report back, thanks.
>
> Maybe you'd be better with git ls-files if you only want just what's
> in the repo. Something like:
>
> for b in "REL8_0_0" "REL8_1_0" "REL8_2_0" "REL8_3_0" "REL8_4_0" \
>     "REL9_0_0" "REL9_1_0" "REL9_2_0" "REL9_3_0" "REL9_4_0" "REL9_5_0" \
>     "REL9_6_0" "REL_10_0" "REL_11_0" "REL_12_0" "REL_13_0" "REL_14_0" \
>     "REL_15_0" "REL_16_0" "REL_17_0" "REL_18_0" "master"; do
>     git checkout -f $b > /dev/null 2>&1 && echo -n "$b " &&
>         git ls-files -- '*.[chyl]' | xargs cat | wc -l
> done

Yes, I like "git ls-files" since it gives the same count as Tom's
version but doesn't modify the git tree. The old script pre-dates git
and I didn't consider "git" could give us a better solution. Attached
is the applied patch.

And here are the updated line counts. I went all the way back to 7.1
which is the last stable git branch.

---------------------------------------------------------------------------

version | reldate | months | changes | C lines | C changes | % C change
----------+------------+--------+---------+---------+-----------+------------
4.2 | 1994-03-17 | | | 250872 | |
1.0 | 1995-09-05 | 18 | | 172470 | -78402 | -31
1.01 | 1996-02-23 | 6 | | 179463 | 6993 | 4
1.09 | 1996-11-04 | 8 | | 178976 | -487 | 0
6.0 | 1997-01-29 | 3 | | 189399 | 10423 | 5
6.1 | 1997-06-08 | 4 | | 200709 | 11310 | 5
6.2 | 1997-10-02 | 4 | | 225848 | 25139 | 12
6.3 | 1998-03-01 | 5 | | 260809 | 34961 | 15
6.4 | 1998-10-30 | 8 | | 297918 | 37109 | 14
6.5 | 1999-06-09 | 7 | | 331278 | 33360 | 11
7.0 | 2000-05-08 | 11 | | 383270 | 51992 | 15
7.1 | 2001-04-13 | 11 | | 380642 | -2628 | 0
7.2 | 2002-02-04 | 10 | 250 | 425898 | 45256 | 11
7.3 | 2002-11-27 | 10 | 305 | 439816 | 13918 | 3
7.4 | 2003-11-17 | 12 | 263 | 522371 | 82555 | 18
8.0 | 2005-01-19 | 14 | 230 | 586127 | 63756 | 12
8.1 | 2005-11-08 | 10 | 174 | 625253 | 39126 | 6
8.2 | 2006-12-05 | 13 | 215 | 684726 | 59473 | 9
8.3 | 2008-02-04 | 14 | 214 | 765100 | 80374 | 11
8.4 | 2009-07-01 | 17 | 314 | 817849 | 52749 | 6
9.0 | 2010-09-20 | 15 | 237 | 870790 | 52941 | 6
9.1 | 2011-09-12 | 12 | 203 | 932936 | 62146 | 7
9.2 | 2012-09-10 | 12 | 238 | 987460 | 54524 | 5
9.3 | 2013-09-09 | 12 | 177 | 1040813 | 53353 | 5
9.4 | 2014-12-18 | 15 | 211 | 1096707 | 55894 | 5
9.5 | 2016-01-07 | 13 | 193 | 1167110 | 70403 | 6
9.6 | 2016-09-29 | 9 | 214 | 1219720 | 52610 | 4
10 | 2017-10-05 | 12 | 189 | 1316447 | 96727 | 7
11 | 2018-10-18 | 12 | 170 | 1369590 | 53143 | 4
12 | 2019-10-03 | 11 | 180 | 1423215 | 53625 | 3
13 | 2020-09-24 | 12 | 178 | 1473738 | 50523 | 3
14 | 2021-09-30 | 12 | 220 | 1558178 | 84440 | 5
15 | 2022-10-13 | 12 | 184 | 1587763 | 29585 | 1
16 | 2023-09-14 | 11 | 206 | 1608031 | 20268 | 1
17 | 2024-09-26 | 12 | 182 | 1673116 | 65085 | 4
18 | 2025-09-25 | 12 | 210 | 1750814 | 77698 | 4
Averages | | 11 | 215 | | | 5.60
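The "C changes" and "% C change" columns can be recomputed from consecutive line counts. Judging from the rows above, the percentage appears to be truncated toward zero rather than rounded. For 18, 77698 / 1673116 is about 4.64%, but the table lists 4. The sketch below reproduces that convention with awk's `int()`. `release_deltas` is a hypothetical name, and the "version lines" input format is an assumption.

```shell
# Sketch (POSIX sh + awk): read "version lines" pairs (oldest first) on stdin
# and print "version lines change pct" for every row after the first.
# release_deltas is hypothetical; pct is truncated toward zero by int().
release_deltas() {
  awk '
    NF >= 2 {
      if (have_prev) {
        change = $2 - prev
        pct = int(change * 100 / prev)   # truncate, do not round
        printf "%s %d %d %d\n", $1, $2, change, pct
      } else {
        printf "%s %d\n", $1, $2        # first row has no predecessor
      }
      prev = $2
      have_prev = 1
    }'
}
```

For example, `printf '16 1608031\n17 1673116\n18 1750814\n' | release_deltas` prints `17 1673116 65085 4` and `18 1750814 77698 4` after the first row, matching the table.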

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

Attachments:

master.diff (text/x-diff; charset=us-ascii)
diff --git a/src/tools/codelines b/src/tools/codelines
index 11e86accf27..93ad571acf9 100755
--- a/src/tools/codelines
+++ b/src/tools/codelines
@@ -3,5 +3,5 @@
 # src/tools/codelines
 
 # This script is used to compute the total number of "C" lines in the release
-# This should be run from the top of the Git tree after a 'make distclean'
-find . -name '*.[chyl]' | xargs cat| wc -l
+# This should be run from the top of the Git tree.
+git ls-files -- '*.[chyl]' | xargs cat | wc -l
#13 Bruce Momjian
bruce@momjian.us
In reply to: Álvaro Herrera (#9)
Re: 10% drop in code line count in PG 17

On Thu, Nov 20, 2025 at 11:42:39AM +0100, Álvaro Herrera wrote:

> On 2025-Nov-20, David Rowley wrote:
>
>> Maybe you'd be better with git ls-files if you only want just what's
>> in the repo. Something like:
>>
>> for b in "REL8_0_0" "REL8_1_0" "REL8_2_0" "REL8_3_0" "REL8_4_0" \
>>     "REL9_0_0" "REL9_1_0" "REL9_2_0" "REL9_3_0" "REL9_4_0" "REL9_5_0" \
>>     "REL9_6_0" "REL_10_0" "REL_11_0" "REL_12_0" "REL_13_0" "REL_14_0" \
>>     "REL_15_0" "REL_16_0" "REL_17_0" "REL_18_0" "master"; do
>>     git checkout -f $b > /dev/null 2>&1 && echo -n "$b " &&
>>         git ls-files -- '*.[chyl]' | xargs cat | wc -l
>> done
>
> Maybe this should also consider .pl and .pm files ... we now have almost
> 90k lines of Perl code in branch master:
>
> $ git ls-files -- '*.pl' | xargs cat | wc -l
> 77234
> $ git ls-files -- '*.pm' | xargs cat | wc -l
> 10386

Well, I am trying to count only the code that is part of a cluster
install, or optionally an install for extensions. Aren't most of the
Perl files testing? Not sure we want to count that.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#14 Bruce Momjian
bruce@momjian.us
In reply to: Daniel Gustafsson (#8)
Re: 10% drop in code line count in PG 17

On Thu, Nov 20, 2025 at 10:38:49AM +0100, Daniel Gustafsson wrote:

> On 19 Nov 2025, at 20:59, Bruce Momjian <bruce@momjian.us> wrote:
>
>> While working on a talk, I studied the number of code line changes in
>> each major release,
>
> This script will only pick up C, but will pick up C in src/test but not any
> Perl code using the C modules in src/test etc. These days we also have C++ and
> some Python in the tree. Maybe it's time to revise it for today's codebase
> which is quite different from when it was written 20 years ago?

Yeah, that's part of a larger discussion. In an email I just sent I
suggested we are trying to count files that are part of a cluster
install, rather than testing files, but again, needs discussion.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#15 Bruce Momjian
bruce@momjian.us
In reply to: Aleksander Alekseev (#11)
Re: 10% drop in code line count in PG 17

On Thu, Nov 20, 2025 at 04:49:51PM +0300, Aleksander Alekseev wrote:

> Hi,
>
>>> While working on a talk, I studied the number of code line changes in
>>> each major release, and found PG 17 surprisingly reduced code line count
>>> by 10%. To get the code line count, I used /pgtop/src/tools/codelines,
>>> which runs:
>>
>> [..]
>> Overall, there is a 4% increase according to this tool. What is
>> convenient about `cloc` - you can count only what you want, e.g. code
>> without comments, etc.
>
> Oops, I didn't notice that you were comparing PG16 and PG17. Still,
> the result with `cloc` is similar, +4% approximately. Apologies for
> the noise.

Yes, that is another discussion we can have --- whether line count alone
is what we want.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#16 Daniel Gustafsson
daniel@yesql.se
In reply to: Bruce Momjian (#14)
Re: 10% drop in code line count in PG 17

On 20 Nov 2025, at 21:30, Bruce Momjian <bruce@momjian.us> wrote:

> Yeah, that's part of a larger discussion. In an email I just sent I
> suggested we are trying to count files that are part of a cluster
> install, rather than testing files, but again, needs discussion.

Right, but that was sort of my point, you are counting lines which aren't part
of the cluster install since src/test has lots of C code which is just tests.

$ find src/test/ -name '*.[chyl]' | xargs cat|wc -l
23587

And the cluster install does contain C++ which isn't counted for.

$ find . -name '*.cpp' | xargs cat|wc -l
1485

Counting just lines in a cluster install is a valid use case but the script
might need some adaptations to match the current tree.
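One approximation of "what a cluster install contains" is to drop src/test with git's `:(exclude)` pathspec magic and add `*.cpp`. Treating everything outside src/test as installed code is an assumption: src/tools, for example, is not installed. `count_install_lines` is a hypothetical name, and the function should be run from the top of the tree.

```shell
# Sketch (POSIX sh + git): tracked C-family (*.[chyl]) and C++ lines
# outside src/test, using the :(exclude) pathspec magic.
# count_install_lines is hypothetical; run from the top of the tree.
count_install_lines() {
  git ls-files -z -- '*.[chyl]' '*.cpp' ':(exclude)src/test/*' |
    xargs -0 cat | wc -l | tr -d ' '
}
```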

--
Daniel Gustafsson

#17 David Rowley
dgrowleyml@gmail.com
In reply to: Bruce Momjian (#12)
Re: 10% drop in code line count in PG 17

On Fri, 21 Nov 2025 at 09:27, Bruce Momjian <bruce@momjian.us> wrote:

> # This script is used to compute the total number of "C" lines in the release
> -# This should be run from the top of the Git tree after a 'make distclean'
> -find . -name '*.[chyl]' | xargs cat| wc -l
> +# This should be run from the top of the Git tree.
> +git ls-files -- '*.[chyl]' | xargs cat | wc -l

I think you need to keep the "top of the Git tree" comment as git
ls-files is context-based.
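Because `git ls-files` only lists paths at or below the current directory, a script can be made independent of where it is invoked. One way is to change first to the top level reported by `git rev-parse --show-toplevel`. Below is a minimal sketch with a hypothetical function name. It runs in a subshell so that the caller's directory is unchanged.

```shell
# Sketch (POSIX sh + git): count tracked *.[chyl] lines for the whole tree
# regardless of the current directory. count_tree_lines is hypothetical.
count_tree_lines() {
  (
    # Subshell: the cd does not leak into the caller.
    cd "$(git rev-parse --show-toplevel)" || exit 1
    git ls-files -z -- '*.[chyl]' | xargs -0 cat | wc -l | tr -d ' '
  )
}
```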

David

#18 Bruce Momjian
bruce@momjian.us
In reply to: David Rowley (#17)
Re: 10% drop in code line count in PG 17

On Fri, Nov 21, 2025 at 10:16:56AM +1300, David Rowley wrote:

> On Fri, 21 Nov 2025 at 09:27, Bruce Momjian <bruce@momjian.us> wrote:
>
>> # This script is used to compute the total number of "C" lines in the release
>> -# This should be run from the top of the Git tree after a 'make distclean'
>> -find . -name '*.[chyl]' | xargs cat| wc -l
>> +# This should be run from the top of the Git tree.

---------------------------------------------------

>> +git ls-files -- '*.[chyl]' | xargs cat | wc -l
>
> I think you need to keep the "top of the Git tree" comment as git
> ls-files is context-based.

Uh, the current file has this comment:

# This script is used to compute the total number of "C" lines in the release
# This should be run from the top of the Git tree.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#19 David Rowley
dgrowleyml@gmail.com
In reply to: Bruce Momjian (#18)
Re: 10% drop in code line count in PG 17

On Fri, 21 Nov 2025 at 10:26, Bruce Momjian <bruce@momjian.us> wrote:

> On Fri, Nov 21, 2025 at 10:16:56AM +1300, David Rowley wrote:
>
>> I think you need to keep the "top of the Git tree" comment as git
>> ls-files is context-based.
>
> Uh, the current file has this comment:

Oh. I misread the patch. Mistakenly thought you'd removed that entire
line. (I normally use a difftool, but didn't in this instance).

David

#20 Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#14)
Re: 10% drop in code line count in PG 17

On Thu, Nov 20, 2025 at 03:30:15PM -0500, Bruce Momjian wrote:

> On Thu, Nov 20, 2025 at 10:38:49AM +0100, Daniel Gustafsson wrote:
>
>> On 19 Nov 2025, at 20:59, Bruce Momjian <bruce@momjian.us> wrote:
>>
>>> While working on a talk, I studied the number of code line changes in
>>> each major release,
>>
>> This script will only pick up C, but will pick up C in src/test but not any
>> Perl code using the C modules in src/test etc. These days we also have C++ and
>> some Python in the tree. Maybe it's time to revise it for today's codebase
>> which is quite different from when it was written 20 years ago?
>
> Yeah, that's part of a larger discussion. In an email I just sent I
> suggested we are trying to count files that are part of a cluster
> install, rather than testing files, but again, needs discussion.

Actually, another discussion would be why we have src/tools/codelines in
the git tree at all. I added it in 2005 to use in counting code lines,
and I thought we could consider it our standard method, but I am not
sure anyone aside from me even uses it, and it is clear there are
multiple methods people consider valid. Should we just remove it?

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.

#21 Aleksander Alekseev
aleksander@tigerdata.com
In reply to: Bruce Momjian (#20)
Re: 10% drop in code line count in PG 17

Hi Bruce,

> Actually, another discussion would be why we have src/tools/codelines in
> the git tree at all. I added it in 2005 to use in counting code lines,
> and I thought we could consider it our standard method, but I am not
> sure anyone aside from me even uses it, and it is clear there are
> multiple methods people consider valid. Should we just remove it?

I think we should.

--
Best regards,
Aleksander Alekseev

#22 Peter Eisentraut
peter@eisentraut.org
In reply to: Bruce Momjian (#20)
Re: 10% drop in code line count in PG 17

On 21.11.25 01:49, Bruce Momjian wrote:

> Actually, another discussion would be why we have src/tools/codelines in
> the git tree at all. I added it in 2005 to use in counting code lines,
> and I thought we could consider it our standard method, but I am not
> sure anyone aside from me even uses it, and it is clear there are
> multiple methods people consider valid. Should we just remove it?

I think so.

#23 Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#22)
Re: 10% drop in code line count in PG 17

On Fri, Nov 21, 2025 at 01:13:50PM +0100, Peter Eisentraut wrote:

> On 21.11.25 01:49, Bruce Momjian wrote:
>
>> Actually, another discussion would be why we have src/tools/codelines in
>> the git tree at all. I added it in 2005 to use in counting code lines,
>> and I thought we could consider it our standard method, but I am not
>> sure anyone aside from me even uses it, and it is clear there are
>> multiple methods people consider valid. Should we just remove it?
>
> I think so.

Removed.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Do not let urgent matters crowd out time for investment in the future.