Git revision in tarballs

Started by Magnus Haganderover 4 years ago14 messages
#1Magnus Hagander
magnus@hagander.net
1 attachment(s)

I think it'd be useful to be able to identify exactly which git commit
was used to produce a tarball. This would be especially useful when
downloading snapshot tarballs where that's not entirely clear, but can
also be used to verify that the release tarballs matches what's
expected (in the extremely rare case that a tarball is rewrapped for
example).

What do people think of the attached?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Attachments:

gitrevision.patchtext/x-patch; charset=US-ASCII; name=gitrevision.patchDownload
diff --git a/GNUmakefile.in b/GNUmakefile.in
index 2352fc1171..f63fb037ae 100644
--- a/GNUmakefile.in
+++ b/GNUmakefile.in
@@ -114,6 +114,8 @@ distdir:
 	cp $(distdir)/doc/src/sgml/INSTALL $(distdir)/
 	$(MAKE) -C $(distdir) distclean
 	rm -f $(distdir)/README.git
+	git rev-parse HEAD > $(distdir)/.gitrevision
+	if test -n "`git status --untracked-files=no --porcelain`"; then sed -i "s/$$/ (modified)/" $(distdir)/.gitrevision ; fi
 
 distcheck: dist
 	rm -rf $(dummy)
#2Josef Šimánek
josef.simanek@gmail.com
In reply to: Magnus Hagander (#1)
Re: Git revision in tarballs

čt 15. 7. 2021 v 10:33 odesílatel Magnus Hagander <magnus@hagander.net> napsal:

I think it'd be useful to be able to identify exactly which git commit
was used to produce a tarball. This would be especially useful when
downloading snapshot tarballs where that's not entirely clear, but can
also be used to verify that the release tarballs matches what's
expected (in the extremely rare case that a tarball is rewrapped for
example).

What do people think of the attached?

The only problem I do see is adding "git" as a new dependency. That
can potentially cause troubles.

For the file name, I have seen GIT_VERSION or REVISION file names used
before in another projects. Using ".gitrevision" doesn't make sense to
me since it will be hidden on Unix by default and I'm not sure that is
intended.

Show quoted text

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#3Magnus Hagander
magnus@hagander.net
In reply to: Josef Šimánek (#2)
Re: Git revision in tarballs

On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek <josef.simanek@gmail.com> wrote:

čt 15. 7. 2021 v 10:33 odesílatel Magnus Hagander <magnus@hagander.net> napsal:

I think it'd be useful to be able to identify exactly which git commit
was used to produce a tarball. This would be especially useful when
downloading snapshot tarballs where that's not entirely clear, but can
also be used to verify that the release tarballs matches what's
expected (in the extremely rare case that a tarball is rewrapped for
example).

What do people think of the attached?

The only problem I do see is adding "git" as a new dependency. That
can potentially cause troubles.

But only for *creating* the tarballs, and not for using them. I'm not
sure what the usecase would be to create a tarball from an environment
that doesn't have git?

For the file name, I have seen GIT_VERSION or REVISION file names used
before in another projects. Using ".gitrevision" doesn't make sense to
me since it will be hidden on Unix by default and I'm not sure that is
intended.

It was definitely intended, as I'd assume it's normally a file that
most people don't care about, but more something that scripts that
verify things would. But I'm more than happy to change it to a
different name if that's preferred. I looked around a bit and couldn't
find any general consensus for a name for such a file, but I may not
have looked carefully enough.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#4Michael Paquier
michael@paquier.xyz
In reply to: Magnus Hagander (#3)
Re: Git revision in tarballs

On Thu, Jul 15, 2021 at 01:44:45PM +0200, Magnus Hagander wrote:

But only for *creating* the tarballs, and not for using them. I'm not
sure what the usecase would be to create a tarball from an environment
that doesn't have git?

Which would likely mean somebody creating a release tarball in an
environment doing a build with what is already a release tarball.
Adding a dependency to git in this code path does not sound that bad
to me, FWIW.
--
Michael

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#3)
Re: Git revision in tarballs

Magnus Hagander <magnus@hagander.net> writes:

On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek <josef.simanek@gmail.com> wrote:

The only problem I do see is adding "git" as a new dependency. That
can potentially cause troubles.

But only for *creating* the tarballs, and not for using them. I'm not
sure what the usecase would be to create a tarball from an environment
that doesn't have git?

I agree, this objection seems silly. If we ever move off of git, the
process could be adapted at that time. However, there *is* a reasonable
question whether this ought to be handled by "make dist" versus the
tarball-wrapping script.

For the file name, I have seen GIT_VERSION or REVISION file names used
before in another projects. Using ".gitrevision" doesn't make sense to
me since it will be hidden on Unix by default and I'm not sure that is
intended.

It was definitely intended, as I'd assume it's normally a file that
most people don't care about, but more something that scripts that
verify things would. But I'm more than happy to change it to a
different name if that's preferred. I looked around a bit and couldn't
find any general consensus for a name for such a file, but I may not
have looked carefully enough.

We already have that convention in place:

$ ls -a
./ .gitignore README.git contrib/
../ COPYRIGHT aclocal.m4 doc/
.dir-locals.el GNUmakefile config/ src/
.editorconfig GNUmakefile.in config.log tmp_install/
.git/ HISTORY config.status*
.git-blame-ignore-revs Makefile configure*
.gitattributes README configure.ac

So ".gitrevision" or the like seems fine to me.

My thoughts about the proposed patch are (1) you'd better have a
.gitignore entry too, and (2) what is the mechanism that removes
this file? It seems weird to have a make rule that makes a
generated file but none to remove it. Perhaps maintainer-clean
should remove it?

Both of those issues vanish if this is delegated to the tarball
making script; as does the need to cope with a starting point
that isn't a specific commit. So on the whole I'm leaning to
the idea that it would be better done over there.

regards, tom lane

#6Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#5)
Re: Git revision in tarballs

On Thu, Jul 15, 2021 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

On Thu, Jul 15, 2021 at 1:40 PM Josef Šimánek <josef.simanek@gmail.com> wrote:

The only problem I do see is adding "git" as a new dependency. That
can potentially cause troubles.

But only for *creating* the tarballs, and not for using them. I'm not
sure what the usecase would be to create a tarball from an environment
that doesn't have git?

I agree, this objection seems silly. If we ever move off of git, the
process could be adapted at that time. However, there *is* a reasonable
question whether this ought to be handled by "make dist" versus the
tarball-wrapping script.

For the file name, I have seen GIT_VERSION or REVISION file names used
before in another projects. Using ".gitrevision" doesn't make sense to
me since it will be hidden on Unix by default and I'm not sure that is
intended.

It was definitely intended, as I'd assume it's normally a file that
most people don't care about, but more something that scripts that
verify things would. But I'm more than happy to change it to a
different name if that's preferred. I looked around a bit and couldn't
find any general consensus for a name for such a file, but I may not
have looked carefully enough.

We already have that convention in place:

$ ls -a
./ .gitignore README.git contrib/
../ COPYRIGHT aclocal.m4 doc/
.dir-locals.el GNUmakefile config/ src/
.editorconfig GNUmakefile.in config.log tmp_install/
.git/ HISTORY config.status*
.git-blame-ignore-revs Makefile configure*
.gitattributes README configure.ac

So ".gitrevision" or the like seems fine to me.

My thoughts about the proposed patch are (1) you'd better have a
.gitignore entry too, and (2) what is the mechanism that removes
this file? It seems weird to have a make rule that makes a
generated file but none to remove it. Perhaps maintainer-clean
should remove it?

maintainer-clean sounds reasonable for that, yes.

Both of those issues vanish if this is delegated to the tarball
making script; as does the need to cope with a starting point
that isn't a specific commit. So on the whole I'm leaning to
the idea that it would be better done over there.

I'd be fine with either. The argument for putting it in the makefile
would be, uh, maybe it makes it a tad bit easier to verify builds
because you get it in your local build as well. But it's not like it's
very *hard* to do it...

Putting it in the tarball making script certainly works for me,
though, if that's what people prefer. And that does away with the
"clean" part as that one blows away the whole directory between each
run.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#6)
Re: Git revision in tarballs

Magnus Hagander <magnus@hagander.net> writes:

Putting it in the tarball making script certainly works for me,
though, if that's what people prefer. And that does away with the
"clean" part as that one blows away the whole directory between each
run.

Actually, we *have* to do it over there, because what that script
does is

# Export the selected git ref
git archive ${i} | tar xf - -C pgsql

cd pgsql
./configure
# some irrelevant stuff
make dist

So there's no .git subdirectory in the directory it runs "make dist"
in. Now maybe it'd work anyway because of the GIT_DIR environment
variable, but what I think is more likely is that the file would
end up containing the current master-branch HEAD commit, whereas
the thing we actually want to record here is ${i}.

regards, tom lane

#8Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#7)
Re: Git revision in tarballs

On Thu, Jul 15, 2021 at 4:35 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Magnus Hagander <magnus@hagander.net> writes:

Putting it in the tarball making script certainly works for me,
though, if that's what people prefer. And that does away with the
"clean" part as that one blows away the whole directory between each
run.

Actually, we *have* to do it over there, because what that script
does is

# Export the selected git ref
git archive ${i} | tar xf - -C pgsql

cd pgsql
./configure
# some irrelevant stuff
make dist

So there's no .git subdirectory in the directory it runs "make dist"
in. Now maybe it'd work anyway because of the GIT_DIR environment
variable, but what I think is more likely is that the file would
end up containing the current master-branch HEAD commit, whereas
the thing we actually want to record here is ${i}.

Hah, yeah, that certainly decides it. And I was even poking around
that script as well today, just nt at the same time :)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#9Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Magnus Hagander (#1)
Re: Git revision in tarballs

On 15.07.21 10:33, Magnus Hagander wrote:

I think it'd be useful to be able to identify exactly which git commit
was used to produce a tarball. This would be especially useful when
downloading snapshot tarballs where that's not entirely clear, but can
also be used to verify that the release tarballs matches what's
expected (in the extremely rare case that a tarball is rewrapped for
example).

Or we could do what git-archive does:

Additionally the commit ID is stored in a global extended
pax header if the tar format is used; it can be extracted using git
get-tar-commit-id. In ZIP files it is stored as
a file comment.

#10Daniel Gustafsson
daniel@yesql.se
In reply to: Peter Eisentraut (#9)
Re: Git revision in tarballs

On 21 Jul 2021, at 20:25, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 15.07.21 10:33, Magnus Hagander wrote:

I think it'd be useful to be able to identify exactly which git commit
was used to produce a tarball. This would be especially useful when
downloading snapshot tarballs where that's not entirely clear, but can
also be used to verify that the release tarballs matches what's
expected (in the extremely rare case that a tarball is rewrapped for
example).

Or we could do what git-archive does:

Additionally the commit ID is stored in a global extended
pax header if the tar format is used; it can be extracted using git
get-tar-commit-id. In ZIP files it is stored as
a file comment.

That does adds Git as a dependency for consuming the tarball though, which
might not be a problem but it's a change from what we require today.

--
Daniel Gustafsson https://vmware.com/

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#10)
Re: Git revision in tarballs

Daniel Gustafsson <daniel@yesql.se> writes:

On 21 Jul 2021, at 20:25, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 15.07.21 10:33, Magnus Hagander wrote:

I think it'd be useful to be able to identify exactly which git commit
was used to produce a tarball.

Or we could do what git-archive does:
Additionally the commit ID is stored in a global extended
pax header if the tar format is used; it can be extracted using git
get-tar-commit-id. In ZIP files it is stored as
a file comment.

That does adds Git as a dependency for consuming the tarball though, which
might not be a problem but it's a change from what we require today.

It also requires keeping the tarball itself around, which you might not
have done, or you might not remember which directory you extracted which
tarball into. So on the whole that solution seems strictly worse.

FYI, the "put it into .gitrevision" solution is already implemented
in the new tarball-building script that Magnus and I have been
working on off-list.

regards, tom lane

#12Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#11)
Re: Git revision in tarballs

On 21 Jul 2021, at 21:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:

FYI, the "put it into .gitrevision" solution is already implemented
in the new tarball-building script that Magnus and I have been
working on off-list.

+1, I think that's the preferred option.

--
Daniel Gustafsson https://vmware.com/

#13Peter Eisentraut
peter.eisentraut@enterprisedb.com
In reply to: Daniel Gustafsson (#10)
Re: Git revision in tarballs

On 21.07.21 21:12, Daniel Gustafsson wrote:

Or we could do what git-archive does:

Additionally the commit ID is stored in a global extended
pax header if the tar format is used; it can be extracted using git
get-tar-commit-id. In ZIP files it is stored as
a file comment.

That does adds Git as a dependency for consuming the tarball though, which
might not be a problem but it's a change from what we require today.

How so?

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#13)
Re: Git revision in tarballs

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

On 21.07.21 21:12, Daniel Gustafsson wrote:

Additionally the commit ID is stored in a global extended
pax header if the tar format is used; it can be extracted using git
get-tar-commit-id. In ZIP files it is stored as
a file comment.

That does adds Git as a dependency for consuming the tarball though, which
might not be a problem but it's a change from what we require today.

How so?

It's only a dependency if you want to know the commit ID, which
perhaps isn't something you would have use for if you don't have
git installed ... but I don't think that's totally obvious.

Personally I'd be more worried about rendering the tarballs
totally corrupt from the perspective of somebody using an old
"tar" that hasn't heard of extended pax headers. Maybe there
are no such versions in the wild anymore; but I do not see any
advantages to this approach that would justify taking any risk for.

regards, tom lane