, , , , , , , ,

Remarks on SVN-trac to GitHub migration

GRASS GIS is an open source geoinformation system which is developed by a globally distributed team of developers. Besides the source code developers also message translators, people who write documentation, those who report bugs and wishes and more are involved.

Early days… from pre-Internet to CVS and SVN

While GRASS GIS is under development since 1982 (no typo!) it has been put into a centralized source code management system in December 1999. Why so late? Because the World Wide Web (WWW) became available in the 1990s along with tools like browsers and such, followed by the development of distributed source code management tools. We moved on 29th Dec 1999 (think Y2K bug) the entire code into our instance of CVS (Concurrent Versioning System). With OSGeo being founded in 2006, we migrated the CVS repository to SVN (Subversion for the source code management) and trac (bug and wish tracker) on 8 Dec 2007. See here for historic details on our various bug trackers.

Time to move on: git

Now, after more than 10 years using SVN/trac time had come to move on and join the large group of projects managing their source code in git (see also our related Wiki page on migration). Git comes with numerous advantages, yet we needed to decide which hosting platform to use. Options where github.com, gitlab.com, gitlab or gitea on OSGeo infrastructure, or other platforms. Through a survey we found out that the preference among contributors is GitHub. While not being open source itself it offers several advantages: it is widely known (good to get new developers interested and involved), numerous OSGeo projects are hosted there under the GitHub “OSGeo organization“.

If all fails (say, one day GitHub no longer being a reasonable choice) the import of our project from GitHub to GitLab is always possible. Indeed, we meanwhile mirror our code on OSGeo’s gitea server.

Relevant script code and migration ticket:

Relevant steps:

  • migrated SVN trunk -> git master
  • migrated and tagged release branches (milestones)
  • deleted “develbranch6” (we compared it to “releasebranch_6_4” and didn’t discover relevant differences)
  • Fix commit messages (yes, we really wanted to be brave, updating decades of commit messages!):
    • references to old RT tracker tickets (used Dec 2000 – Dec 2006)
    • references to old GForge tracker tickets (used Jan 2007 – Dec 2008)
    • references to other trac tickets (#x -> https://trac.osgeo.org/…)

Source code migration: the new git repositories

  • github repository “grass” (repo)

    • Source code from 1999 to present day (SVN-trunk -> git-master)
    • all 7.x release branches
  • github repository “grass-legacy” (repo)

    • separate repository for older GRASS GIS releases (3.2, 4.x, 5.x, 6.x), hence source code now available in git since 1987!
  • github repository “grass-addons” (repo)

    • repository for addons
  • github repository “grass-promo” (repo)
    • repository for promotional material
  • github repository “grass-website” (repo)
    • repository for upcoming new Website

Remarks on the “grass-legacy” repository

What special about it:

  • the source code goes back to 1987!
  • file timestamps (which I tried to preserve for decades :-) have been used to reconstruct the source code history (e.g., releasebranch_3_2)
  • junk files removed (plenty of leftover old binary files, files consisting of a special char only etc)
  • having this grass-legacy repo available in parallel to the main grass repo which contains the  recent source code we have a continuous source code coverage from 1987 to today in git.
  • size is about 250MB

What’s missing

  • the 4.3 source code doesn’t have distinct timestamps. Someone must have once packaged without mtime preservation… a pity. Perhaps a volunteer may fix that by carrying over the timestamps from GRASS GIS 4.2 in case the md5sum of a file is identical (or so).

Trac issue migration

A series of links had to be updated. Martin Landa invested days and days on that (thanks!!). He used the related GDAL efforts as a basis (Even Rouault: thanks!). As the date for the trac migration we selected 2007-12-09 (r25479) as it was the first SVN commit (after the years in CVS). The migration of trac bugs to github (i.e. transfer of trac ticket content) required several steps:

Link updates in the ticket texts:

  • links to other tickets (now to be pointed to full trac URL). Note that there were many styles of referring in the commit log message which had to be parsed accordingly
  • links to trac wiki (now to be pointed to full trac URL)
  • links source code in SVN (now to be pointed to full trac URL)
  • images and attachments (now to be pointed to full trac URL)

Transferring:

  • “operating system” trac label into the github issue text itself (following the new issue reporting template)
  • converting milestones/tickets/comments/labels
  • converting trac usernames to Github usernames
  • setting assignees if possible, set new “grass-svn2git” an assignee otherwise
  • slowing down transfer to match the 60 requests per second API limit rate at github

Fun with user name mapping

Given GRASS GIS’ history of 35+ years we had to invest major effort in identifying and mapping user names throughout the decades (see also bug tracker history). The following circumstances could be identified:

  • user present in CVS but not in SVN
  • user present in SVN but not in CVS
  • user present in both with identical name
  • user present in both with different name (well, in our initial CVS days in 1999 we often naivly picked our surnames like “martin”, “helena”, “markus”, “michael” … cute yet no scaling very much over the years!) as some were changed in the CVS to SVN migration in 2007, leading to
    • colliding user names
  • some users already having a github account (with mostly different name again)

We came up with several lookup tables, aiming at catching all variants. Just a “few” hours to dig in old source code files and in emails for finding all the missing email addresses…

Labels for issues

We cleaned up the trac component of the bug reports, coming up with the following categories which have to be visually grouped by color since the label list is just sorted alphabetically in github/gitlab:

  • Issue category:
    • bug
    • enhancement
  • Issue solution (other than fixing and closing it normally):
    • duplicate
    • invalid
    • wontfix
    • worksforme
  • Priority:
    • blocker
    • critical
    • feedback needed
  • Components:
    • docs
    • GUI
    • libs
    • modules
    • packaging
    • python
    • translations
    • unittests
    • Windows specific

Note that the complete issue migration is still to be done (as of Nov. 2019). Hopefully addressed at the GRASS GIS Community Sprint Prague 2019.

Setting up the github repository

In order to avoid users being flooded by emails due to the parsing of user contributions which normally triggers an email from github) we reached out to GitHub support in order to temporarily disable these notifications until all source code and selected issues were migrated.

The issue conversion rate was 4 min per trac bug to be converted and uploaded to github. Fairly slow but likely due to the API rate limit imposed and the fact that the migration script above generates a lot of API requests rather than combined ones..
Note to future projects to be migrated: use the new gihub import API (unfortunately we got to know about its existence too late in our migration process).

Here out timings which occurred during the GRASS GIS project migration from SVN to github:

  • grass repo: XX hours (all GRASS GIS 7.x code)
  • grass-legacy repo: XX hours (all GRASS GIS 3.x-6.x code)
  • NNN issues: XX hours – forthcoming.

New issue reporting template

In order to guide the user when reporting new issues, we will develop a small template – forthcoming.

Email notifications: issues to grass-dev and commits to grass-commit

We changed the settings from SVN post-hook to Github commit notifications and they flow in smoothly into the grass-commit mailing list. Join it to follow the development.

Overall, after now several months of using our new workflow we can state that things work fine.

1 reply

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *