Tarball builds in the new world order

Started by Tom Laneover 1 year ago13 messages
#1Tom Lane
tgl@sss.pgh.pa.us

With one eye on the beta-release calendar, I thought it'd be a
good idea to test whether our tarball build script works for
the new plan where we'll use "git archive" instead of the
traditional process.

It doesn't.

It makes tarballs all right, but whatever commit ID you specify
is semi-ignored, and you get a tarball corresponding to HEAD
of master. (The PDFs come from the right version, though!)

The reason for that is that the mk-one-release script does this
(shorn of not-relevant-here details):

export BASE=/home/pgsql
export GIT_DIR=$BASE/postgresql.git

mkdir pgsql

# Export the selected git ref
git archive ${gitref} | tar xf - -C pgsql

cd pgsql
./configure

# Produce .tar.gz and .tar.bz2 files
make dist

Since 619bc23a1, what "make dist" does is

$(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
$(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@

Since GIT_DIR is set, git consults that repo not the current working
directory, so HEAD means whatever it means in a just-fetched repo,
and mk-one-release's efforts to select the ${gitref} commit mean
nothing. (If git had tried to consult the current working directory,
it would've failed for lack of any .git subdir therein.)

I really really don't want to put version-specific coding into
mk-one-release, but fortunately I think we don't have to.
What I suggest is doing this in mk-one-release:

-make dist
+make dist PG_COMMIT_HASH=${gitref}

and changing the "make dist" rules to write $(PG_COMMIT_HASH) not
HEAD. The extra make variable will have no effect in the back
branches, while it should cause the right thing to happen with
the new implementation of "make dist".

This change seems like a good thing anyway for anyone who's tempted
to use "make dist" manually, since they wouldn't necessarily want
to package HEAD either. Now, if we just do it exactly like that
then trying to "make dist" without setting PG_COMMIT_HASH will
fail, since "git archive" has no default for its <tree-ish>
argument. I can't quite decide if that's a good thing, or if we
should hack the makefile a little further to allow PG_COMMIT_HASH
to default to HEAD.

Thoughts, better ideas?

regards, tom lane

#2Greg Sabino Mullane
htamfids@gmail.com
In reply to: Tom Lane (#1)
Re: Tarball builds in the new world order

On Tue, Apr 23, 2024 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

This change seems like a good thing anyway for anyone who's tempted
to use "make dist" manually, since they wouldn't necessarily want
to package HEAD either. Now, if we just do it exactly like that
then trying to "make dist" without setting PG_COMMIT_HASH will
fail, since "git archive" has no default for its <tree-ish>
argument. I can't quite decide if that's a good thing, or if we
should hack the makefile a little further to allow PG_COMMIT_HASH
to default to HEAD.

Just having it fail seems harsh. What if we had plain "make dist" at least
output a friendly hint about "please specify a hash"? That seems better
than an implicit HEAD default, as they can manually set it to HEAD
themselves per the hint.

Cheers,
Greg

#3Peter Eisentraut
peter@eisentraut.org
In reply to: Tom Lane (#1)
Re: Tarball builds in the new world order

On 24.04.24 00:05, Tom Lane wrote:

It makes tarballs all right, but whatever commit ID you specify
is semi-ignored, and you get a tarball corresponding to HEAD
of master. (The PDFs come from the right version, though!)

The reason for that is that the mk-one-release script does this
(shorn of not-relevant-here details):

export BASE=/home/pgsql
export GIT_DIR=$BASE/postgresql.git

mkdir pgsql

# Export the selected git ref
git archive ${gitref} | tar xf - -C pgsql

Where does ${gitref} come from? Why doesn't this line use git archive
HEAD | ... ?

What I suggest is doing this in mk-one-release:

-make dist
+make dist PG_COMMIT_HASH=${gitref}

and changing the "make dist" rules to write $(PG_COMMIT_HASH) not
HEAD. The extra make variable will have no effect in the back
branches, while it should cause the right thing to happen with
the new implementation of "make dist".

I suppose we could do something like that, but we'd also need to come up
with a meson version.

(Let's not use "hash" though, since other ways to commit specify a
commit can be used.)

This change seems like a good thing anyway for anyone who's tempted
to use "make dist" manually, since they wouldn't necessarily want
to package HEAD either.

A tin-foil-hat argument is that we might not want to encourage that,
because for reproducibility, we need a known git commit and also a known
implementation of make dist. If in the future someone uses the make
dist implementation of PG19 to build a tarball for PG17, it might not
come out the same way as using the make dist implementation of PG17.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#3)
Re: Tarball builds in the new world order

Peter Eisentraut <peter@eisentraut.org> writes:

On 24.04.24 00:05, Tom Lane wrote:

# Export the selected git ref
git archive ${gitref} | tar xf - -C pgsql

Where does ${gitref} come from? Why doesn't this line use git archive
HEAD | ... ?

${gitref} is an argument to the script, specifying the commit
to be packaged. HEAD would certainly not work when trying to
package a back-branch release, and in general it would hardly
ever be what you want if your goal is to make a reproducible
package.

What I suggest is doing this in mk-one-release:
-make dist
+make dist PG_COMMIT_HASH=${gitref}

I suppose we could do something like that, but we'd also need to come up
with a meson version.

Packaging via meson is years away yet IMO, so I'm unconcerned
about that for now. See below.

(Let's not use "hash" though, since other ways to commit specify a
commit can be used.)

OK, do you have a different term in mind?

This change seems like a good thing anyway for anyone who's tempted
to use "make dist" manually, since they wouldn't necessarily want
to package HEAD either.

A tin-foil-hat argument is that we might not want to encourage that,
because for reproducibility, we need a known git commit and also a known
implementation of make dist. If in the future someone uses the make
dist implementation of PG19 to build a tarball for PG17, it might not
come out the same way as using the make dist implementation of PG17.

Of course. The entire reason why this script invokes "make dist",
rather than implementing the behavior for itself, is so that
branch-specific behaviors can be accounted for in the branches
not here. (To be clear, the script has no idea which branch
it's packaging --- that's implicit in the commit ID.)

Because of that, I really don't want to rely on some equivalent
meson infrastructure until it's available in all the live branches.
v15 will be EOL in 3.5 years, and that's more or less the time frame
that we've spoken of for dropping the makefile infrastructure, so
I don't think that's an unreasonable plan.

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Sabino Mullane (#2)
Re: Tarball builds in the new world order

Greg Sabino Mullane <htamfids@gmail.com> writes:

On Tue, Apr 23, 2024 at 6:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Now, if we just do it exactly like that
then trying to "make dist" without setting PG_COMMIT_HASH will
fail, since "git archive" has no default for its <tree-ish>
argument. I can't quite decide if that's a good thing, or if we
should hack the makefile a little further to allow PG_COMMIT_HASH
to default to HEAD.

Just having it fail seems harsh. What if we had plain "make dist" at least
output a friendly hint about "please specify a hash"? That seems better
than an implicit HEAD default, as they can manually set it to HEAD
themselves per the hint.

Yeah, it would be easy to do something like

ifneq ($(PG_COMMIT_HASH),)
$(GIT) ...
else
@echo "Please specify PG_COMMIT_HASH." && exit 1
endif

I'm just debating whether that's better than inserting a default
value.

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#5)
2 attachment(s)
Re: Tarball builds in the new world order

Concretely, I'm proposing the attached. Peter didn't like
PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
wedded to that if a better name is proposed.

regards, tom lane

Attachments:

0002-fix-tarball-build-script.patchtext/x-diff; charset=us-ascii; name=0002-fix-tarball-build-script.patchDownload
--- mk-one-release.orig	2024-04-23 17:30:08.983226671 -0400
+++ mk-one-release	2024-04-26 15:17:29.713669677 -0400
@@ -39,13 +39,17 @@ mkdir pgsql
 git archive ${gitref} | tar xf - -C pgsql
 
 # Include the git ref in the output tarballs
+# (This has no effect with v17 and up; instead we rely on "git archive"
+# to include the commit hash in the tar header)
 echo ${gitref} >pgsql/.gitrevision
 
 cd pgsql
 ./configure
 
 # Produce .tar.gz and .tar.bz2 files
-make dist
+# (With v17 and up, this will repeat the "git archive" call; the contents
+# of the working directory don't matter directly to the results.)
+make dist PG_COMMIT_REFSPEC=${gitref}
 
 # Generate md5 and sha256 sums, and copy files to output
 for x in *.tar.*; do
0001-make-packaged-commit-selectable.patchtext/x-diff; charset=us-ascii; name=0001-make-packaged-commit-selectable.patchDownload
diff --git a/GNUmakefile.in b/GNUmakefile.in
index 30553b2a95..27357e5e3b 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -102,10 +102,18 @@ distdir-location:
 # on, Unix machines.
 
 $(distdir).tar.gz:
-	$(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
+ifneq ($(PG_COMMIT_REFSPEC),)
+	$(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ $(PG_COMMIT_REFSPEC) -o $(abs_top_builddir)/$@
+else
+	@echo "Please specify PG_COMMIT_REFSPEC." && exit 1
+endif
 
 $(distdir).tar.bz2:
-	$(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
+ifneq ($(PG_COMMIT_REFSPEC),)
+	$(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ $(PG_COMMIT_REFSPEC) -o $(abs_top_builddir)/$@
+else
+	@echo "Please specify PG_COMMIT_REFSPEC." && exit 1
+endif
 
 distcheck: dist
 	rm -rf $(dummy)
#7Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Tom Lane (#6)
Re: Tarball builds in the new world order

On 2024-Apr-26, Tom Lane wrote:

--- mk-one-release.orig	2024-04-23 17:30:08.983226671 -0400
+++ mk-one-release	2024-04-26 15:17:29.713669677 -0400
@@ -39,13 +39,17 @@ mkdir pgsql
git archive ${gitref} | tar xf - -C pgsql
# Include the git ref in the output tarballs
+# (This has no effect with v17 and up; instead we rely on "git archive"
+# to include the commit hash in the tar header)
echo ${gitref} >pgsql/.gitrevision

Why is it that the .gitrevision file is only created here, instead of
being added to the tarball that "git archive" produces? Adding an
argument like
--add-virtual-file $(distdir)/.gitrevision:$(GIT_REFSPEC)

to the git archive call should suffice.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu. Five minutes later I realize that it's also talking
about food" (Donald Knuth)

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#7)
Re: Tarball builds in the new world order

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

Why is it that the .gitrevision file is only created here, instead of
being added to the tarball that "git archive" produces? Adding an
argument like
--add-virtual-file $(distdir)/.gitrevision:$(GIT_REFSPEC)
to the git archive call should suffice.

I think we don't want to do that. In the first place, it's redundant
because "git archive" includes the commit hash in the tar header,
and in the second place it gets away from the concept that the tarball
contains exactly what is in our git tree.

Now admittedly, if anyone's built tooling that relies on the presence
of the .gitrevision file, they might prefer that we keep on including
it. But I'm not sure anyone has, and in any case I think switching
to the git-approved way of incorporating the hash is the best thing
in the long run.

What I'm thinking of doing, as soon as we've sorted the tarball
creation process, is to make a test tarball available to the
packagers group so that anyone interested can start working on
updating their packaging process for the new approach. Hopefully,
if anyone's especially unhappy about omitting .gitrevision, they'll
speak up.

regards, tom lane

#9Peter Eisentraut
peter@eisentraut.org
In reply to: Tom Lane (#6)
Re: Tarball builds in the new world order

On 26.04.24 21:24, Tom Lane wrote:

Concretely, I'm proposing the attached. Peter didn't like
PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
wedded to that if a better name is proposed.

This seems ok to me, but note that we do have an equivalent
implementation in meson. If we don't want to update that in a similar
way, maybe we should disable it.

#10Peter Eisentraut
peter@eisentraut.org
In reply to: Tom Lane (#6)
Re: Tarball builds in the new world order

On 26.04.24 21:24, Tom Lane wrote:

Concretely, I'm proposing the attached. Peter didn't like
PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
wedded to that if a better name is proposed.

Um, "refspec" leads me here
<https://git-scm.com/book/en/v2/Git-Internals-The-Refspec&gt;, which seems
like the wrong concept. I think the more correct concept is "revision"
(https://git-scm.com/docs/gitrevisions), so something like PG_GIT_REVISION?

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#9)
2 attachment(s)
Re: Tarball builds in the new world order

Peter Eisentraut <peter@eisentraut.org> writes:

On 26.04.24 21:24, Tom Lane wrote:

Concretely, I'm proposing the attached. Peter didn't like
PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
wedded to that if a better name is proposed.

This seems ok to me, but note that we do have an equivalent
implementation in meson. If we don't want to update that in a similar
way, maybe we should disable it.

OK. After poking at that for awhile, it seemed like "default to
HEAD" fits into meson a lot better than "throw an error if the
variable isn't set", so I switched to doing it like that.
One reason is that AFAICT you can only set the variable during
"meson setup" not during "ninja". This won't matter to the
tarball build script, which does a one-off configuration run
anyway. But for manual use, a movable target like HEAD might be
more convenient given that behavior.

I tested this by building tarballs using the makefiles on a RHEL8
box, and using meson on my MacBook (with recent MacPorts tools).
I got bit-for-bit identical files, which I found rather impressive
given the gap between the platforms. Maybe this "reproducible builds"
wheeze will actually work.

I also changed the variable name to PG_GIT_REVISION per your
other suggestion.

regards, tom lane

Attachments:

v2-0001-make-packaged-commit-selectable.patchtext/x-diff; charset=us-ascii; name=v2-0001-make-packaged-commit-selectable.patchDownload
diff --git a/GNUmakefile.in b/GNUmakefile.in
index 30553b2a95..9e41794f60 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -87,6 +87,9 @@ update-unicode: | submake-generated-headers submake-libpgport
 distdir	= postgresql-$(VERSION)
 dummy	= =install=
 
+# git revision to be packaged
+PG_GIT_REVISION ?= HEAD
+
 GIT = git
 
 dist: $(distdir).tar.gz $(distdir).tar.bz2
@@ -102,10 +105,10 @@ distdir-location:
 # on, Unix machines.
 
 $(distdir).tar.gz:
-	$(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
+	$(GIT) -C $(srcdir) -c core.autocrlf=false archive --format tar.gz -9 --prefix $(distdir)/ $(PG_GIT_REVISION) -o $(abs_top_builddir)/$@
 
 $(distdir).tar.bz2:
-	$(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ HEAD -o $(abs_top_builddir)/$@
+	$(GIT) -C $(srcdir) -c core.autocrlf=false -c tar.tar.bz2.command='$(BZIP2) -c' archive --format tar.bz2 --prefix $(distdir)/ $(PG_GIT_REVISION) -o $(abs_top_builddir)/$@
 
 distcheck: dist
 	rm -rf $(dummy)
diff --git a/meson.build b/meson.build
index cdfd31377d..1c0579d5a6 100644
--- a/meson.build
+++ b/meson.build
@@ -3469,6 +3469,8 @@ bzip2 = find_program('bzip2', required: false, native: true)
 
 distdir = meson.project_name() + '-' + meson.project_version()
 
+pg_git_revision = get_option('PG_GIT_REVISION')
+
 # Note: core.autocrlf=false is needed to avoid line-ending conversion
 # in case the environment has a different setting.  Without this, a
 # tarball created on Windows might be different than on, and unusable
@@ -3483,7 +3485,7 @@ tar_gz = custom_target('tar.gz',
             '-9',
             '--prefix', distdir + '/',
             '-o', join_paths(meson.build_root(), '@OUTPUT@'),
-            'HEAD', '.'],
+            pg_git_revision],
   output: distdir + '.tar.gz',
 )
 
@@ -3497,7 +3499,7 @@ if bzip2.found()
               '--format', 'tar.bz2',
               '--prefix', distdir + '/',
               '-o', join_paths(meson.build_root(), '@OUTPUT@'),
-              'HEAD', '.'],
+              pg_git_revision],
     output: distdir + '.tar.bz2',
   )
 else
diff --git a/meson_options.txt b/meson_options.txt
index 249ecc5ffd..246cecf382 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -55,6 +55,9 @@ option('atomics', type: 'boolean', value: true,
 option('spinlocks', type: 'boolean', value: true,
   description: 'Use spinlocks')
 
+option('PG_GIT_REVISION', type: 'string', value: 'HEAD',
+  description: 'git revision to be packaged by pgdist target')
+
 
 # Compilation options
 
v2-0002-fix-tarball-build-script.patchtext/x-diff; charset=us-ascii; name=v2-0002-fix-tarball-build-script.patchDownload
--- mk-one-release.orig	2024-04-23 17:30:08.983226671 -0400
+++ mk-one-release	2024-04-29 12:10:38.106723387 -0400
@@ -39,13 +39,17 @@ mkdir pgsql
 git archive ${gitref} | tar xf - -C pgsql
 
 # Include the git ref in the output tarballs
+# (This has no effect with v17 and up; instead we rely on "git archive"
+# to include the commit hash in the tar header)
 echo ${gitref} >pgsql/.gitrevision
 
 cd pgsql
 ./configure
 
 # Produce .tar.gz and .tar.bz2 files
-make dist
+# (With v17 and up, this will repeat the "git archive" call; the contents
+# of the working directory don't matter directly to the results.)
+make dist PG_GIT_REVISION=${gitref}
 
 # Generate md5 and sha256 sums, and copy files to output
 for x in *.tar.*; do
#12Peter Eisentraut
peter@eisentraut.org
In reply to: Tom Lane (#11)
Re: Tarball builds in the new world order

On 29.04.24 18:14, Tom Lane wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

On 26.04.24 21:24, Tom Lane wrote:

Concretely, I'm proposing the attached. Peter didn't like
PG_COMMIT_HASH, so I have PG_COMMIT_REFSPEC below, but I'm not
wedded to that if a better name is proposed.

This seems ok to me, but note that we do have an equivalent
implementation in meson. If we don't want to update that in a similar
way, maybe we should disable it.

OK. After poking at that for awhile, it seemed like "default to
HEAD" fits into meson a lot better than "throw an error if the
variable isn't set", so I switched to doing it like that.
One reason is that AFAICT you can only set the variable during
"meson setup" not during "ninja". This won't matter to the
tarball build script, which does a one-off configuration run
anyway. But for manual use, a movable target like HEAD might be
more convenient given that behavior.

This patch looks good to me.

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#12)
Re: Tarball builds in the new world order

Peter Eisentraut <peter@eisentraut.org> writes:

On 29.04.24 18:14, Tom Lane wrote:

OK. After poking at that for awhile, it seemed like "default to
HEAD" fits into meson a lot better than "throw an error if the
variable isn't set", so I switched to doing it like that.

This patch looks good to me.

Pushed, thanks.

regards, tom lane