Report: removing the inconsistencies in our CVS->git conversion

Started by Tom Lane over 15 years ago; 59 messages; hackers
#1 Tom Lane
tgl@sss.pgh.pa.us

I've spent much of the weekend examining the discrepancies between our CVS
repository and the tarballs available from our FTP archives, and after
that trying to remove infelicities in the cvs2git output. There are a
couple of remaining oddities that I would classify as probable cvs2git
bugs, but an awful lot of it is inconsistencies in the CVS repository
itself, some of which I can explain and some that I can't. Read on for
many boring details.

One thing that only old-timers will recall is that originally the PG code
base was divided into multiple repositories. There was one for the server
code and one for the client interfaces, and I believe that at the very
beginning much of the documentation was in yet a third place. The oldest
stuff that's now in src/interfaces/ was in the client repository. It
looks to me like when the earliest tarballs were made up, the
subdirectories that were in the client repository were dumped directly
under src/ instead of src/interfaces; that is, the directory layout of
those tarballs does not exactly match the current CVS repository layout.

I also found out that somebody seems to have manually moved the RCS file
for src/backend/commands/version.c into src/backend/commands/_deadcode,
and that a couple of subdirectories apparently were manually renamed
somewhere along the line.

The upshot of all this is that if you want to match the old tarballs to
current CVS contents, you need to make these hacks:

# hacks to make certain old versions diff successfully
if ((-d "postgresql-v$tag/src" and
     not -d "postgresql-v$tag/src/interfaces") or
    -d "postgres95/src") {
    print "moving src/interfaces for $tag\n";
    system("mv cvsout/src/interfaces/* cvsout/src") == 0 || die "mv failed: $?";
    system("rmdir cvsout/src/interfaces") == 0 || die "rmdir failed: $?";
}
if (-d "postgresql-v$tag/src/pgsql_perl5") {
    print "moving perl5 for $tag\n";
    system("mv cvsout/src/perl5 cvsout/src/pgsql_perl5") == 0 || die "mv failed: $?";
}
if (-f "postgresql-$tag/src/backend/commands/version.c" or
    -f "postgresql-v$tag/src/backend/commands/version.c" or
    -f "postgres95/src/backend/commands/version.c") {
    print "moving version.c for $tag\n";
    system("mv cvsout/src/backend/commands/_deadcode/version.c cvsout/src/backend/commands") == 0 || die "mv failed: $?";
    system("rmdir cvsout/src/backend/commands/_deadcode 2>/dev/null");
}
if (-d "postgresql-$tag/src/test/locale/ISO8859-7") {
    print "moving ISO8859-7 for $tag\n";
    system("mv cvsout/src/test/locale/gr_GR.ISO8859-7 cvsout/src/test/locale/ISO8859-7") == 0 || die "mv failed: $?";
}

Just for the record, these are the versions for which these tests hit:

moving src/interfaces for 1.08
moving version.c for 1.08
moving src/interfaces for 1.09
moving version.c for 1.09
moving src/interfaces for 6.1
moving perl5 for 6.1
moving version.c for 6.1
moving src/interfaces for 6.1.1
moving perl5 for 6.1.1
moving version.c for 6.1.1
moving version.c for 6.2
moving version.c for 6.2.1
moving version.c for 6.3.2
moving ISO8859-7 for 6.5
moving ISO8859-7 for 6.5.1
moving ISO8859-7 for 6.5.2
moving ISO8859-7 for 6.5.3

With those changes, I am able to match all the available archival tarballs
to various places in the CVS history. The exact spots where they match
are detailed in the attached "matches" file. The file also shows the
cvsroot path and CVS module name that was in use at each time; you need
to duplicate that if you want $Header$ lines to match what's in the
tarballs. (I set up symlinks to the base repository on my machine so that
CVS could check out successfully for each of these scenarios.)
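Tom's own comparison script is Perl and was sent as an attachment, so it is not reproduced here. As a rough illustration of what matching a tarball to a CVS checkout involves, here is a minimal Python sketch (the function names are hypothetical) that walks an unpacked tarball and a checkout and reports files present on only one side or differing in content. It assumes the directory-layout hacks above have already been applied, skips the CVS/ administrative directories that a checkout creates, and does nothing about $Header$ keywords, so those will show up as differences unless the checkout used the same cvsroot path and module name.

```python
import filecmp
import os


def relative_files(root: str) -> set[str]:
    """All file paths under root, relative to root, skipping CVS/ admin dirs."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # A "cvs checkout" (unlike "cvs export") creates CVS/ bookkeeping
        # directories that no release tarball contains; don't descend into them.
        dirnames[:] = [d for d in dirnames if d != "CVS"]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(tarball_root: str, checkout_root: str) -> dict[str, list[str]]:
    """Report files only in the tarball, only in the checkout, or differing."""
    tar_files = relative_files(tarball_root)
    cvs_files = relative_files(checkout_root)
    # shallow=False forces a byte-for-byte comparison instead of trusting
    # size/mtime, which never agree between a tarball and a fresh checkout.
    differing = [
        rel for rel in sorted(tar_files & cvs_files)
        if not filecmp.cmp(os.path.join(tarball_root, rel),
                           os.path.join(checkout_root, rel),
                           shallow=False)
    ]
    return {
        "only_in_tarball": sorted(tar_files - cvs_files),
        "only_in_checkout": sorted(cvs_files - tar_files),
        "differing": differing,
    }
```

A tarball "matches" a point in the history when all three lists come back empty.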

There are still a couple of unexplainable discrepancies, though.
In particular, the 1.08 and 1.09 tarballs contain this fix:
http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/interfaces/libpgtcl/Attic/pgtclCmds.c.diff?r1=1.10;r2=1.11
which is odd because it wasn't applied to CVS till months after those
tarballs were made. Even odder, the file timestamp on pgtclCmds.c in
the tarballs agrees with CVS revision 1.2, which is what ought to be in
those tarballs according to CVS. It may be that this fix was made in the
separate client-code repository and not propagated to the core till later;
but that theory doesn't explain the exact timestamp match.

Anyway, the distressing thing about what the "matches" file shows is that
we do not have CVS tags for a lot of the older tarballs. Even worse,
there are a couple of CVS tags that look like they ought to match released
tarballs, but do not: the tags were evidently applied a few commits before
the tarball was actually made. In particular, the tags REL6_5, REL7_1,
and REL7_1_2 don't match the tarballs they ought to. I don't have a whole
lot of faith in some of the other early tags either, because we don't seem
to have an archived tarball to compare them to.

Having completed that comparison, I then moved on to trying to get rid of
the discrepancies in the git conversion; particularly, trying to get rid
of the "manufactured commits". I didn't have much success in that for the
cases where the manufactured commit was caused by a back-branch file
addition. The case I showed before where things cleaned up nicely (for
pg_dump's it.po) depended on the fact that the place where the branch
would naturally sprout off happened to be a "dead" revision on HEAD.
That's not the case anywhere else, so I gave up on the complicated patch
for it.po. The patches I'm using instead just inject a dead ".0" revision
immediately after the branch point, and are pretty small and easy to
verify. I only bothered to do this for the cases where the back-branch
addition happened significantly later than the main-branch addition. If
they were done in a group of related commits with nothing else in between,
I left well enough alone. We still have "manufactured" commits either
way, but they are just cosmetic so I guess we should live with them.

I also found numerous places where we'd been sloppy about placing tags.
That explains some of the weird things cvs2git did. In particular:

* We had the already-known problem that gram.c and some other derived
files had commits made after they should have been dead.

* Bruce had transiently added those files on the WIN32_DEV branch as
well, to general disapproval, and this seemed to also give cvs2git
indigestion. The attached proposed fixup script deals with this by
deleting those revisions altogether. This is a loss of history, but
not one that I care about.

* The HISTORY and INSTALL files have REL7_3_10 tags and should not.
As mentioned earlier, I think this is because they were deleted after the
original placement of that tag, and weren't correctly fixed when the
tag was moved up to branch end a few days later.

* The regression tests files recently added to contrib/xml2 have REL8_0_23
tags. I have no idea how that happened, because they certainly didn't
exist when 8.0.23 was released.

* There are a bunch of files that should have REL7_3_5 tags and lack them.
They are in just a few subdirectories, so probably what happened was that
the "cvs tag" operation was issued in an incomplete checkout tree.

* Similarly, gram.c should have a release-6-3 tag and lacks it.

* There are a bunch of files that have REL7_1 tags when what they should
have are REL7_1_BETA tags. These appear to be exactly the files that were
deleted between the initial placement of the REL7_1 tag and Marc's later
ex-post-facto renaming of the tag to REL7_1_BETA. I'm guessing another
case of "cvs tag" missing files that weren't in the checkout.

* There are a number of files that lack the REL2_0 tag and REL2_0B branch,
though they should have it according to file dates. These appear to be
exactly the files that were in the separate documentation repository at
the time, so that probably tells us the mechanism for missing them.
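The incomplete-checkout explanation for the missing REL7_3_5 tags rests on the untagged files clustering in a few subdirectories. A minimal Python sketch of that check, assuming the per-file symbolic tag names have already been extracted (for example from rlog output; the parsing is not shown) into a hypothetical path-to-tags mapping. It deliberately ignores whether each file was actually alive when the tag was placed, which a real check would have to account for.

```python
import posixpath
from collections import defaultdict


def missing_tag_by_directory(file_tags: dict[str, set[str]], tag: str) -> dict[str, list[str]]:
    """Group the files that lack `tag` by the directory they live in."""
    missing: dict[str, list[str]] = defaultdict(list)
    for path, tags in file_tags.items():
        if tag not in tags:
            missing[posixpath.dirname(path)].append(posixpath.basename(path))
    # Sorted output makes clusters of untagged files easy to spot.
    return {d: sorted(names) for d, names in sorted(missing.items())}
```

If the result contains only a handful of directories, each with many files, a tag operation run in a partial tree is the likely cause.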

After fixing all the above items using the attached script, I have what
seems to be a reasonably clean conversion. I still have the three
oddities alluded to over in the "uh-oh" thread, but I'm not sure any of
them should be considered blockers for making the conversion. There are
also some cosmetic issues remaining, like what committer to blame the
various inserted commits on and whether we want to keep partial tags.
But this message is long enough already so I'll get to those issues
separately.

Attached are an updated version of Max's README file about how to perform
the conversion, the repository fixup script needed for that, the Perl
script I used for comparing CVS to tarballs, and the input file for the
Perl script, which shows which CVS tag or checkout date to compare against
each of the available tarballs.

regards, tom lane

#2 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#1)
Re: Report: removing the inconsistencies in our CVS->git conversion

On Sun, Sep 12, 2010 at 11:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I've spent much of the weekend examining the discrepancies between our CVS
repository and the tarballs available from our FTP archives, and after
that trying to remove infelicities in the cvs2git output.  There are a
couple of remaining oddities that I would classify as probable cvs2git
bugs, but an awful lot of it is inconsistencies in the CVS repository
itself, some of which I can explain and some that I can't.  Read on for
many boring details.

First of all, WOW, and thank you very much for putting in the time to
make this happen.

With those changes, I am able to match all the available archival tarballs
to various places in the CVS history.  The exact spots where they match
are detailed in the attached "matches" file.  The file also shows the

Regrettably, all of your attachments came through as part of the
actual email, both in my GMail and in the archives. I hate
technology.

Having completed that comparison, I then moved on to trying to get rid of
the discrepancies in the git conversion; particularly, trying to get rid
of the "manufactured commits".  I didn't have much success in that for the
cases where the manufactured commit was caused by a back-branch file
addition. [...]  We still have "manufactured" commits either
way, but they are just cosmetic so I guess we should live with them.

I'm not really following what the history looks like here. What are
the contents (git show) of the manufactured commit?

I also found numerous places where we'd been sloppy about placing tags.
That explains some of the weird things cvs2git did.  In particular:

* We had the already-known problem that gram.c and some other derived
files had commits made after they should have been dead.

* Bruce had transiently added those files on the WIN32_DEV branch as
well, to general disapproval, and this seemed to also give cvs2git
indigestion.  The attached proposed fixup script deals with this by
deleting those revisions altogether.  This is a loss of history, but
not one that I care about.

* The HISTORY and INSTALL files have REL7_3_10 tags and should not.
As mentioned earlier, I think this is because they were deleted after the
original placement of that tag, and weren't correctly fixed when the
tag was moved up to branch end a few days later.

* The regression tests files recently added to contrib/xml2 have REL8_0_23
tags.  I have no idea how that happened, because they certainly didn't
exist when 8.0.23 was released.

* There are a bunch of files that should have REL7_3_5 tags and lack them.
They are in just a few subdirectories, so probably what happened was that
the "cvs tag" operation was issued in an incomplete checkout tree.

* Similarly, gram.c should have a release-6-3 tag and lacks it.

* There are a bunch of files that have REL7_1 tags when what they should
have are REL7_1_BETA tags.  These appear to be exactly the files that were
deleted between the initial placement of the REL7_1 tag and Marc's later
ex-post-facto renaming of the tag to REL7_1_BETA.  I'm guessing another
case of "cvs tag" missing files that weren't in the checkout.

* There are a number of files that lack the REL2_0 tag and REL2_0B branch,
though they should have it according to file dates.  These appear to be
exactly the files that were in the separate documentation repository at
the time, so that probably tells us the mechanism for missing them.

I wonder if we should consider fixing some or all of these things on
the master CVS repository. I wouldn't be too eager to inject those
fake .0 commits for fear of breakage, but moving tags to where they
ought to have been all along seems like it might be a good thing to do
independent of git.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: Report: removing the inconsistencies in our CVS->git conversion

Robert Haas <robertmhaas@gmail.com> writes:

Regrettably, all of your attachments came through as part of the
actual email, both in my GMail and in the archives. I hate
technology.

Sorry about that. Here's another try with the stuff in a tarball.
This time, I also remembered to include cvs2git.options; although
I think it's the same as Max's original except for

-    r'cvsroot/pgsql',
+    r'/cvsroot/pgsql',

I'll address the other points in a bit.

regards, tom lane

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#2)
Re: Report: removing the inconsistencies in our CVS->git conversion

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, Sep 12, 2010 at 11:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Having completed that comparison, I then moved on to trying to get rid of
the discrepancies in the git conversion; particularly, trying to get rid
of the "manufactured commits".  I didn't have much success in that for the
cases where the manufactured commit was caused by a back-branch file
addition. [...]  We still have "manufactured" commits either
way, but they are just cosmetic so I guess we should live with them.

I'm not really following what the history looks like here. What are
the contents (git show) of the manufactured commit?

A typical example is

commit 4d2ac8075a93c685dbbe920f4bac23288dd7cf11
Author: PostgreSQL Daemon <webmaster@postgresql.org>
Date:   Tue Nov 22 18:17:36 2005 +0000

    This commit was manufactured by cvs2svn to create branch 'REL7_4_STABLE'.

    Cherrypick from master 2005-11-22 18:17:34 UTC Bruce Momjian <bruce@momjian.us> 'Re-run pgindent, fixing a problem where comment lines after a blank':
        src/port/unsetenv.c

diff --git a/src/port/unsetenv.c b/src/port/unsetenv.c
new file mode 100644
index 0000000..bdfb3f6
--- /dev/null
+++ b/src/port/unsetenv.c
@@ -0,0 +1,56 @@
+ [ entire contents of unsetenv.c here ]

In the cases where I inserted a dead .0 revision, this is followed by
something like

commit a1bdd263ca8ff657365a97a560f6371f39295efc
Author: Bruce Momjian <bruce@momjian.us>
Date:   Tue Nov 22 18:17:37 2005 +0000

    Mark branch as deleted.

diff --git a/src/port/unsetenv.c b/src/port/unsetenv.c
deleted file mode 100644
index bdfb3f6..0000000
--- a/src/port/unsetenv.c
+++ /dev/null
@@ -1,56 +0,0 @@
- [ entire contents of unsetenv.c here too ]

I'm a bit disappointed by the fact that we get either of these. I had
gathered from Max's comments that the dead-revision-at-the-base-of-the-
branch trick is considered standard in newer CVS versions, and so I'd
hoped that cvs2git would understand the construct and not generate
either of these commits. Possibly the hacked-up revisions I inserted
are enough different from the regular kind to confuse it.

I also found numerous places where we'd been sloppy about placing tags.

I wonder if we should consider fixing some or all of these things on
the master CVS repository. I wouldn't be too eager to inject those
fake .0 commits for fear of breakage, but moving tags to where they
ought to have been all along seems like it might be a good thing to do
independent of git.

Yeah, that's something I was wondering too. Applying these fixes to the
master repository would also reduce the number of things we have to
remember to do during the final conversion. OTOH, there's that risk of
breaking something.

regards, tom lane

#5 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#4)
Re: Report: removing the inconsistencies in our CVS->git conversion

On Mon, Sep 13, 2010 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Sun, Sep 12, 2010 at 11:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Having completed that comparison, I then moved on to trying to get rid of
the discrepancies in the git conversion; particularly, trying to get rid
of the "manufactured commits".  I didn't have much success in that for the
cases where the manufactured commit was caused by a back-branch file
addition. [...]  We still have "manufactured" commits either
way, but they are just cosmetic so I guess we should live with them.

I'm not really following what the history looks like here.  What are
the contents (git show) of the manufactured commit?

A typical example is

commit 4d2ac8075a93c685dbbe920f4bac23288dd7cf11
Author: PostgreSQL Daemon <webmaster@postgresql.org>
Date:   Tue Nov 22 18:17:36 2005 +0000

   This commit was manufactured by cvs2svn to create branch 'REL7_4_STABLE'.

   Cherrypick from master 2005-11-22 18:17:34 UTC Bruce Momjian <bruce@momjian.us> 'Re-run pgindent, fixing a problem where comment lines after a blank':
       src/port/unsetenv.c

diff --git a/src/port/unsetenv.c b/src/port/unsetenv.c
new file mode 100644
index 0000000..bdfb3f6
--- /dev/null
+++ b/src/port/unsetenv.c
@@ -0,0 +1,56 @@
+ [ entire contents of unsetenv.c here ]

In the cases where I inserted a dead .0 revision, this is followed by
something like

commit a1bdd263ca8ff657365a97a560f6371f39295efc
Author: Bruce Momjian <bruce@momjian.us>
Date:   Tue Nov 22 18:17:37 2005 +0000

   Mark branch as deleted.

If we have two commits one right after the other that cancel each
other out, we might be able to write them both out of the history
using git-filter-branch. But if Max or Michael can shed any light on
why it's happening, that might lead to a simpler solution.
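Before rewriting history, one would want to confirm mechanically that the two commits really do cancel. A minimal Python sketch, assuming each commit has already been reduced to a hypothetical {path: (old_blob, new_blob)} summary like the index lines in the git show output above (0000000..bdfb3f6, then bdfb3f6..0000000). It checks only the pair in isolation, not what else happens on the branch around them.

```python
# A commit summary maps each touched path to (old_blob, new_blob); an
# all-zero blob id means "file absent", as in git's diff "index" lines.
Change = tuple[str, str]


def commits_cancel(first: dict[str, Change], second: dict[str, Change]) -> bool:
    """True if `second` exactly undoes every change made by `first`."""
    if not first or first.keys() != second.keys():
        return False
    # Undoing a change means going from first's new blob back to its old one.
    return all(second[path] == (new, old) for path, (old, new) in first.items())
```

For the unsetenv.c pair above, the add and the delete swap the same two blob ids, so the check succeeds.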

I also found numerous places where we'd been sloppy about placing tags.

I wonder if we should consider fixing some or all of these things on
the master CVS repository.  I wouldn't be too eager to inject those
fake .0 commits for fear of breakage, but moving tags to where they
ought to have been all along seems like it might be a good thing to do
independent of git.

Yeah, that's something I was wondering too.  Applying these fixes to the
master repository would also reduce the number of things we have to
remember to do during the final conversion.  OTOH, there's that risk of
breaking something.

Hand-written patches that apply directly to the RCS files seem like
they'd be a risk for breakage, but I don't see why moving tags around
would be all that dangerous, especially in cases where you can do it
by running 'cvs' itself rather than 'rcs'. That should just be
routine stuff, no?
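For the tag-moving route, the operations themselves are ordinary cvs tag invocations: -F moves an existing tag, -r selects the revision to attach it to, and -d deletes a tag. A dry-run Python sketch that only builds the argument lists; the function names, and the revision numbers in any example, are hypothetical. Note that cvs tag only acts on files present in the working copy, the same property Tom suspects produced the missing REL7_3_5 tags in the first place.

```python
def move_tag_commands(tag: str, revisions: dict[str, str]) -> list[list[str]]:
    """Build one 'cvs tag -F -r REV TAG FILE' argv list per file (dry run only)."""
    # -F moves the tag if the file already carries it; -r pins the revision.
    return [["cvs", "tag", "-F", "-r", rev, tag, path]
            for path, rev in sorted(revisions.items())]


def delete_tag_commands(tag: str, paths: list[str]) -> list[list[str]]:
    """Build a single 'cvs tag -d TAG FILE...' argv list (dry run only)."""
    return [["cvs", "tag", "-d", tag, *paths]]
```

Each argv list could then be reviewed and run with subprocess.run(argv, check=True) from a complete checkout of the right branch.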

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#5)
Re: Report: removing the inconsistencies in our CVS->git conversion

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Sep 13, 2010 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I wonder if we should consider fixing some or all of these things on
the master CVS repository.  I wouldn't be too eager to inject those
fake .0 commits for fear of breakage, but moving tags to where they
ought to have been all along seems like it might be a good thing to do
independent of git.

Yeah, that's something I was wondering too.  Applying these fixes to the
master repository would also reduce the number of things we have to
remember to do during the final conversion.  OTOH, there's that risk of
breaking something.

Hand-written patches that apply directly to the RCS files seem like
they'd be a risk for breakage, but I don't see why moving tags around
would be all that dangerous, especially in cases where you can do it
by running 'cvs' itself rather than 'rcs'. That should just be
routine stuff, no?

Hrm, well, keep in mind that most of these problems were *created* by
careless use of "cvs tag". At the moment I'm leaning towards the idea
that we should leave the CVS repository as it is, rather than take any
risk of making things worse.

regards, tom lane

#7 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#6)
Re: Report: removing the inconsistencies in our CVS->git conversion

On Mon, Sep 13, 2010 at 1:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Sep 13, 2010 at 11:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I wonder if we should consider fixing some or all of these things on
the master CVS repository.  I wouldn't be too eager to inject those
fake .0 commits for fear of breakage, but moving tags to where they
ought to have been all along seems like it might be a good thing to do
independent of git.

Yeah, that's something I was wondering too.  Applying these fixes to the
master repository would also reduce the number of things we have to
remember to do during the final conversion.  OTOH, there's that risk of
breaking something.

Hand-written patches that apply directly to the RCS files seem like
they'd be a risk for breakage, but I don't see why moving tags around
would be all that dangerous, especially in cases where you can do it
by running 'cvs' itself rather than 'rcs'.  That should just be
routine stuff, no?

Hrm, well, keep in mind that most of these problems were *created* by
careless use of "cvs tag".  At the moment I'm leaning towards the idea
that we should leave the CVS repository as it is, rather than take any
risk of making things worse.

I have never heard anyone describe you as careless, and I doubt I ever
will. I feel pretty much 100% safe having you retag those releases to
match the tarballs.

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#4)
Re: Report: removing the inconsistencies in our CVS->git conversion

I wrote:

I'm a bit disappointed by the fact that we get either of these. I had
gathered from Max's comments that the dead-revision-at-the-base-of-the-
branch trick is considered standard in newer CVS versions, and so I'd
hoped that cvs2git would understand the construct and not generate
either of these commits. Possibly the hacked-up revisions I inserted
are enough different from the regular kind to confuse it.

Hah: a bit of digging in the cvs2svn sources found this:

def _is_unneeded_initial_branch_delete(self, lod_items, metadata_db):
    """Return True iff the initial revision in LOD_ITEMS can be deleted."""

    if not lod_items.cvs_revisions:
        return False

    cvs_revision = lod_items.cvs_revisions[0]

    if cvs_revision.ntdbr:
        return False

    if not isinstance(cvs_revision, CVSRevisionAbsent):
        return False

    if cvs_revision.branch_ids:
        return False

    log_msg = metadata_db[cvs_revision.metadata_id].log_msg
    return bool(re.match(
        r'file .* was added on branch .* on '
        r'\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}( [\+\-]\d{4})?'
        '\n$',
        log_msg,
        ))

So it looks like I have to make the dead revisions' log messages match
that regexp. Off to make another try.
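The log-message condition can be exercised on its own before rerunning the conversion. A minimal Python sketch: the pattern is copied from the cvs2svn snippet above, while make_dead_revision_log is a hypothetical helper for producing a conforming message. It checks only the log message; the other conditions in that function (the revision must be dead, must not be an ntdbr revision, and must have no branches sprouting from it) are not modeled.

```python
import re

# Pattern copied from cvs2svn's _is_unneeded_initial_branch_delete.
# Note the mandatory trailing newline and the optional "+HHMM" zone.
INITIAL_BRANCH_DELETE_RE = re.compile(
    r'file .* was added on branch .* on '
    r'\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}( [\+\-]\d{4})?'
    '\n$'
)


def looks_like_initial_branch_delete(log_msg: str) -> bool:
    """Apply the same re.match test cvs2svn uses on the dead revision's log."""
    return bool(INITIAL_BRANCH_DELETE_RE.match(log_msg))


def make_dead_revision_log(filename: str, branch: str, timestamp: str) -> str:
    """Hypothetical helper: build a log message in the shape cvs2svn expects."""
    return f"file {filename} was added on branch {branch} on {timestamp}\n"
```

A message like "Mark branch as deleted." fails this test, which would explain why the hand-inserted dead revisions were not recognized.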

regards, tom lane

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: Report: removing the inconsistencies in our CVS->git conversion

I wrote:

    return bool(re.match(
        r'file .* was added on branch .* on '
        r'\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}( [\+\-]\d{4})?'
        '\n$',
        log_msg,
        ))

So it looks like I have to make the dead revisions' log messages match
that regexp. Off to make another try.

It works! Now I don't see either the manufactured commits or the
patched-in deletions.

I had not previously bothered to patch the places where a file was added
on the branch immediately after being added on the main, but now it
seems worth doing. That will get us down to a *very* small number of
manufactured commits in the final version.

regards, tom lane

#10 Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#9)
Re: Report: removing the inconsistencies in our CVS->git conversion

On Mon, Sep 13, 2010 at 21:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

    return bool(re.match(
        r'file .* was added on branch .* on '
        r'\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}( [\+\-]\d{4})?'
        '\n$',
        log_msg,
        ))

So it looks like I have to make the dead revisions' log messages match
that regexp.  Off to make another try.

It works!  Now I don't see either the manufactured commits or the
patched-in deletions.

I had not previously bothered to patch the places where a file was added
on the branch immediately after being added on the main, but now it
seems worth doing.  That will get us down to a *very* small number of
manufactured commits in the final version.

That's awesome!

Thanks so much for doing this. I've come to realize I know far too
little about *cvs* to work on those things myself :S

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

#11 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#1)
Re: Report: removing the inconsistencies in our CVS->git conversion

Tom Lane wrote:

the tarball was actually made. In particular, the tags REL6_5, REL7_1,
and REL7_1_2 don't match the tarballs they ought to. I don't have a whole
lot of faith in some of the other early tags either, because we don't seem
to have an archived tarball to compare them to.

I believe I have those on a CDROM here.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#11)
Re: Report: removing the inconsistencies in our CVS->git conversion

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

the tarball was actually made. In particular, the tags REL6_5, REL7_1,
and REL7_1_2 don't match the tarballs they ought to. I don't have a whole
lot of faith in some of the other early tags either, because we don't seem
to have an archived tarball to compare them to.

I believe I have those on a CDROM here.

If you can recover any of the releases that aren't on ftp-archive,
please send me copies.

regards, tom lane

#13 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#12)
Re: Report: removing the inconsistencies in our CVS->git conversion

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

the tarball was actually made. In particular, the tags REL6_5, REL7_1,
and REL7_1_2 don't match the tarballs they ought to. I don't have a whole
lot of faith in some of the other early tags either, because we don't seem
to have an archived tarball to compare them to.

I believe I have those on a CDROM here.

If you can recover any of the releases that aren't on ftp-archive,
please send me copies.

Sure. I have a copy of our ftp site /pub as of 6.3 and have put it
online:

http://momjian.us/expire/pgsql_ftp_6.3/

Unfortunately I don't see anything there that isn't already here:

ftp://ftp-archives.postgresql.org/pub/source/

but let me know if you find something new.

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#13)
Re: [HACKERS] Report: removing the inconsistencies in our CVS->git conversion

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

If you can recover any of the releases that aren't on ftp-archive,
please send me copies.

Sure. I have a copy of our ftp site /pub as of 6.3 and have put it
online:
http://momjian.us/expire/pgsql_ftp_6.3/

Unfortunately I don't see anything there that isn't already here:
ftp://ftp-archives.postgresql.org/pub/source/

Well, you seem to have 6.0 and 6.3 initial releases, which I didn't
have before, so thanks for that. But you're failing to realize that
you have unadulterated historical gold here:

http://momjian.us/expire/pgsql_ftp_6.3/majordomo/

What that looks like to me is an archive of our mailing lists from 1996
to 1998. The material at archives.postgresql.org doesn't go back that
far, at least not for all the lists. Is anybody up for merging that
traffic into the main archives?

regards, tom lane

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#9)
Re: Report: removing the inconsistencies in our CVS->git conversion

I wrote:

I had not previously bothered to patch the places where a file was added
on the branch immediately after being added on the main, but now it
seems worth doing. That will get us down to a *very* small number of
manufactured commits in the final version.

Attached is an updated repository.fixups script that inserts dead
revisions in every case where a new file was back-patched into an
existing branch. With that, we are down to a total of nine manufactured
commits, to wit:

* Four that create the partial tags SUPPORT, MANUAL_1_0, creation, and
Release-1-6-0. I think we agreed that we can just drop these tags and
allow their manufactured commits to be garbage-collected.

* Two that create the tags Release_2_0 and Release_2_0_0. I think these
probably represent a cvs2git bug, as there is no apparent reason why it
didn't just apply the tags to the immediately preceding mainline commits
instead. In any case, we can get rid of them by moving the tags to the
appropriate commits manually.

* One that creates the branch REL2_0B. This is caused by a known,
longstanding cvs2git deficiency: it fails to pick the optimal place
to branch from when file deletions are involved. We're just going to
have to live with that, I think; it's a pretty minor infelicity anyway.

* One that creates the partial branch ecpg_big_bison. I think we have
to live with this too. I don't want to drop the branch altogether,
as that would represent a loss of development history. The only other
alternative I can think of is to try to convert it into a full branch,
but I'm unsure what the implications would be of that.

* And lastly, there's a weird manufactured commit that adds a passel of
files on REL7_3_STABLE branch, only to have them deleted again by the
following real commit. This is a result of the fact that the branch
point was moved long after creation, as discussed here:
http://archives.postgresql.org/pgsql-hackers/2002-11/msg00127.php
We could maybe try to get rid of both the manufactured commit and
the deletion commit, but I'm inclined not to. The underlying history
is really as dirty as this commit makes it look.

The long and the short of it is that I'm now satisfied with the git
conversion. There is still the issue of adding/adjusting release tags
for ancient releases, but the lack of those is surely not the
conversion's fault.

regards, tom lane

PS: This attachment is text/x-patch instead of text/plain ... does
it come through as an attachment for you, Robert?

#16 Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#15)
Re: Report: removing the inconsistencies in our CVS->git conversion

Tom Lane <tgl@sss.pgh.pa.us> writes:

PS: This attachment is text/x-patch instead of text/plain ... does
it come through as an attachment for you, Robert?

From my MUA, I can say that it's not so much a problem of MIME type as of
Content-Disposition: yours are always inline.

http://www.gnus.org/manual/emacs-mime_11.html#SEC11
http://en.wikipedia.org/wiki/MIME#Content-Disposition

Regards,
--
dim

#17 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
Re: Report: removing the inconsistencies in our CVS->git conversion

On Tue, Sep 14, 2010 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

* Four that create the partial tags SUPPORT, MANUAL_1_0, creation, and
Release-1-6-0.  I think we agreed that we can just drop these tags and
allow their manufactured commits to be garbage-collected.

+1.

* Two that create the tags Release_2_0 and Release_2_0_0.  I think these
probably represent a cvs2git bug, as there is no apparent reason why it
didn't just apply the tags to the immediately preceding mainline commits
instead.  In any case, we can get rid of them by moving the tags to the
appropriate commits manually.

+1.

* One that creates the branch REL2_0B.  This is caused by a known,
longstanding cvs2git deficiency: it fails to pick the optimal place
to branch from when file deletions are involved.  We're just going to
have to live with that, I think; it's a pretty minor infelicity anyway.

Fine with me.

* One that creates the partial branch ecpg_big_bison.  I think we have
to live with this too.  I don't want to drop the branch altogether,
as that would represent a loss of development history.  The only other
alternative I can think of is to try to convert it into a full branch,
but I'm unsure what the implications would be of that.

I doubt there's a clean way to do that. I am not sure there's much
point in moving the tag over to git - anyone wanting to do something
useful with it will need to use CVS anyway, won't they?

* And lastly, there's a weird manufactured commit that adds a passel of
files on REL7_3_STABLE branch, only to have them deleted again by the
following real commit.  This is a result of the fact that the branch
point was moved long after creation, as discussed here:
http://archives.postgresql.org/pgsql-hackers/2002-11/msg00127.php
We could maybe try to get rid of both the manufactured commit and
the deletion commit, but I'm inclined not to.  The underlying history
is really as dirty as this commit makes it look.

OK.

The long and the short of it is that I'm now satisfied with the git
conversion.  There is still the issue of adding/adjusting release tags
for ancient releases, but the lack of those is surely not the
conversion's fault.

Great.

PS: This attachment is text/x-patch instead of text/plain ... does
it come through as an attachment for you, Robert?

Yep, thanks. I'd like to have Magnus run a test conversion with all
the latest and greatest stuff and throw it up somewhere so we can all
poke at it.

Incidentally, with respect to timing, do we want to press on with this
conversion now or wait until after the CommitFest is done?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#17)
Re: Report: removing the inconsistencies in our CVS->git conversion

Robert Haas <robertmhaas@gmail.com> writes:

Incidentally, with respect to timing, do we want to press on with this
conversion now or wait until after the CommitFest is done?

I'd kind of like to do it before we start the commitfest. These
repository patches will go stale if we wait too long, and a month
is probably too long. In any case I'd rather get it done while all
the information is fresh in mind.

The main schedule constraint I can see at the moment is that 9.0 wrap is
scheduled for Thursday, and I think we probably don't want to do it
before the wrap.

Another issue is that we need a chunk of Magnus' time to shepherd the
conversion, and I don't know what his availability is.

regards, tom lane

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#17)
Re: Report: removing the inconsistencies in our CVS->git conversion

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Sep 14, 2010 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

* One that creates the partial branch ecpg_big_bison.  I think we have
to live with this too.  I don't want to drop the branch altogether,
as that would represent a loss of development history.  The only other
alternative I can think of is to try to convert it into a full branch,
but I'm unsure what the implications would be of that.

I doubt there's a clean way to do that. I am not sure there's much
point in moving the tag over to git - anyone wanting to do something
useful with it will need to use CVS anyway, won't they?

Well ... I guess the other attitude we could take is that that was a
private development branch of Michael's. If we'd been working in git
at the time, that branch would never have been seen outside his personal
repository, most likely. The changes did eventually get merged back to
HEAD, so we'd not be losing anything critical if we just dropped the
branch altogether. Anybody else have an opinion on what to do with it?

regards, tom lane

#20Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Dimitri Fontaine (#16)
Re: Report: removing the inconsistencies in our CVS->git conversion

Excerpts from Dimitri Fontaine's message of Tue Sep 14 11:10:50 -0400 2010:

Tom Lane <tgl@sss.pgh.pa.us> writes:

PS: This attachment is text/x-patch instead of text/plain ... does
it come through as an attachment for you, Robert?

From my MUA, I can say that it's not so much a problem of MIME type as
of the Content-Disposition: yours are always inline.

Hmm, I see it as a separate attachment in this case. The original mail
was indeed "collapsed" in that all the text attachments looked like a
single text stream.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#21Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#21)
#23Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#23)
#25Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#24)
#26Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#25)
#27Michael Meskes
meskes@postgresql.org
In reply to: Tom Lane (#19)
#28Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#28)
#30Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#29)
#31Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#29)
#32Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#30)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#32)
#34Magnus Hagander
magnus@hagander.net
In reply to: Robert Haas (#32)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#34)
#36Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#37)
#39Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#38)
#40Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#38)
#41Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#41)
#43Andrew Dunstan
andrew@dunslane.net
In reply to: Magnus Hagander (#41)
#44Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#42)
#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#44)
#46Magnus Hagander
magnus@hagander.net
In reply to: Magnus Hagander (#41)
#47Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#44)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#47)
#49Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#47)
#50Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#48)
#51Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#49)
#52Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#51)
#53Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#52)
#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#53)
#55Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#54)
#56Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#42)
#57Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#56)
#58Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#57)
#59Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#58)