fluffy rambles: My useless thoughts on Microsoft GitHub

June 6, 2018 6:43 PM

A lot has been written about the impending buyout of GitHub by Microsoft. Since I’m a regular user of the former and not much of a user of the latter, people would probably expect me to be against this happening, but my feelings on it are largely positive.

And this comes from someone who used to refer to them as “M$!” (All I can say is I’ve grown a lot since the 90s.)

The general sentiment in many circles is that it’s horrible for Microsoft, a company which has historically not been great at open source in general, to now be the steward of so much open source code. However, over the past ten years or so, Microsoft have completely changed their attitudes towards OSS, and I have no worries about them at this time. (Contrast this to, say, SourceForge or Google.)

Anyway, I won’t spend a lot of time talking about why I think the GitHub buyout is okay or why I’m not worried about Microsoft; Peter Bright says pretty much everything I’d be able to say about that. (And, in the interest of full disclosure, my startup is a paying customer of GitHub, as was the previous company I worked at.)

People do raise a perfectly good point about how bad it is for any one company to be the primary repository for so much of the F/OSS out there, and I do agree with that. It’s only fairly recently (2014 or thereabouts) that I’ve started using GitHub myself; I used to use a self-hosted GitWeb site for sharing all of my code, but it had the problem of nobody knowing how to submit patches. (And honestly, GitWeb is kinda hard to use even for browsing.)

Also, consider that git itself is already a distributed source control system: every ordinary clone carries the full history, so if Microsoft were to kill GitHub overnight, your repositories would already be backed up. In fact, the easiest way to make a backup of a git repo is to simply use it!

Really, the main things that GitHub brings to the table are:

Discovery
Issue tracking
Wiki hosting
Ease of forking and submitting patches
The less-bad variant of large file support

Discovery is sort of a non-issue; popular projects link to wherever their code is hosted anyway, and most discovery of such software likely comes from general-purpose search engines (ideally finding the projects' own sites). Being on GitHub does help with search traffic, granted, but it’s not the be-all and end-all of project discovery. I find most of the open source software I use through the projects' own sites, and only follow their “Fork me on GitHub!” links if I need to see the source for some reason.

Issue tracking is definitely a problem, but that’s not specific to GitHub. Git itself doesn’t have any sort of built-in issue tracking at all, and I always felt like that was a huge lost opportunity; forking a codebase should also fork the issues, and it’d be great if closing an issue were tied to the commit that fixes it (so, merge a repo with a fixed issue, merge the issue update too). This is something that Fossil gets right, and it feels like something that git could also support via metadata conventions. There is no reason there couldn’t be an issues/ directory at the root of a repo, where each issue is a Markdown file and contributors add comments to the end. (And these would of course all be maintained in git’s blockchain.)
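To make the issues/ idea a bit more concrete, here is a minimal Python sketch of what such a convention might look like. Nothing here is an existing git feature or anyone's actual tool: the function names, the "Status:" line, and the comment separator are all made up for illustration, and a real convention would need to settle things like naming and merge conflicts.

```python
from pathlib import Path


def open_issue(repo_root: Path, slug: str, title: str, body: str) -> Path:
    """Create issues/<slug>.md with a title, a status line, and a description.

    The file layout is a hypothetical convention, not something git defines.
    """
    issues_dir = repo_root / "issues"
    issues_dir.mkdir(exist_ok=True)
    path = issues_dir / f"{slug}.md"
    path.write_text(f"# {title}\n\nStatus: open\n\n{body}\n", encoding="utf-8")
    return path


def add_comment(path: Path, author: str, text: str) -> None:
    """Append a comment to the end of the issue file."""
    with path.open("a", encoding="utf-8") as f:
        f.write(f"\n---\n\n**{author}:** {text}\n")


def close_issue(path: Path) -> None:
    """Flip the first status line from open to closed.

    Committing this change together with the fix is what would tie the
    issue's closure to the commit, so merging the fix merges the closure.
    """
    content = path.read_text(encoding="utf-8")
    path.write_text(
        content.replace("Status: open", "Status: closed", 1), encoding="utf-8"
    )
```

Because the issue is just a file in the working tree, forking the repo forks the issues for free, and the ordinary commit history doubles as the issue's audit trail.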

(As an aside, did you realize git is a blockchain? I have often joked in the past about how some software company could probably make a killing with investors by just rebranding git as “blockchain-based source control,” because git already is that. It’s just not the environment-destroying kind, as the proof of work is based on pulls and merges rather than based on consensus hashing.)
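For anyone who wants to see the chain part of that claim, here is a rough Python sketch. The object hashing (SHA-1 over a "type size NUL" header plus the content) is how git computes object IDs in SHA-1 repositories; the commit format is deliberately a toy, though, since real commits also carry author and committer lines, and the helper names are mine.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    # git hashes "<type> <size>" + NUL byte + content (SHA-1 repositories).
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# ID of a tree with no entries, computed rather than hard-coded.
EMPTY_TREE = git_object_id("tree", b"")


def toy_commit(tree_id: str, parent_id: "str | None", message: str) -> str:
    # Simplified commit body: real commits also include author/committer lines.
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    lines += ["", message]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())


def build_chain(messages: "list[str]", tree_id: str = EMPTY_TREE) -> "list[str]":
    # Each commit's ID covers its parent's ID, so the IDs form a hash chain.
    ids = []
    parent = None
    for message in messages:
        parent = toy_commit(tree_id, parent, message)
        ids.append(parent)
    return ids
```

Calling build_chain(["first", "second", "third"]) and then again with only the first message edited gives three completely different IDs, because every commit's hash covers its parent's hash. That is the tamper-evidence property people usually mean by "blockchain," minus the mining.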

Wiki hosting is another version of issue tracking; the documentation for a system should be part of the repo, and it’s telling that GitHub’s wiki is already just a git repo that contains Markdown files. Why not just make it part of the same repo?

Code forking is easy on GitHub, but it’s also not that hard to do in vanilla git. After all, the way you fork a git repo is the same as how you back it up: you clone it, and then you have your own version history that may or may not merge back upstream based on a submitted patch being accepted.

Patch submissions are tricky, but even before GitHub there were concepts of pull requests, which worked a bit differently than GitHub pull requests — it was a specially-formatted email message that basically said “hey, could you pull this branch from this repo and consider merging it in?” And now you know why it’s called a “pull request” in the first place, when that name never really made any sense in the GitHub context!

And for folks who didn’t have their own repo hosting (which is uncommon — any static file hosting on the web can be a git repo!) you could also submit a patch by email.

That really just leaves Git LFS as the thing to worry most about. Before Git LFS there was git-annex, which basically just provided a mechanism for referencing an external file store via UNIX symbolic links, and the actual question of hosting/forking/replication was a gigantic 🤷🏼‍♀️. LFS is a huge improvement over that, but the overhead and infrastructure needs of setting up an LFS-aware repo leave a lot to be desired. LFS itself does benefit from the “free” repo cloning that is offered by git’s distributed nature, but your remote repositories all need to be LFS-aware.

(Want your large binary files to get distributed backups? That’s already supported by vanilla git — it’s called “not using LFS.”)
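If it's not obvious why the remotes have to be LFS-aware, it helps to look at what LFS actually commits. As I understand the LFS spec, the repo only stores a tiny text "pointer" in place of each large file, and the real bytes live on a separate LFS server. Here is a Python sketch that builds such a pointer; the function name is mine, and this is an illustration of the format rather than a replacement for the git-lfs client.

```python
import hashlib

# Version line used by the Git LFS pointer format.
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def lfs_pointer(content: bytes) -> str:
    """Build the pointer text that stands in for `content` in the repo.

    The pointer records only the SHA-256 of the content and its size in
    bytes; the content itself has to be fetched from an LFS-aware server.
    """
    oid = hashlib.sha256(content).hexdigest()
    return f"version {LFS_SPEC_URL}\noid sha256:{oid}\nsize {len(content)}\n"
```

A ten-megabyte file and a ten-byte file both turn into a pointer of well under 200 bytes, which is great for clone size, but it also means a plain clone from a host that doesn't speak LFS gets you the pointers and nothing else.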

Some alternatives

So, while I personally have no problem with this GitHub sea change, a lot of people are scrambling to find alternatives. While it’s silly to only now have a problem with centralized hosting at the mercy of others, it’s good that this conversation is finally happening all the same.

For hosted services, a lot of folks use Bitbucket, which I should point out is currently owned by Atlassian, who also don’t have a particularly great track record with open standards or cooperation. (Ask anyone who’s ever had to put up with JIRA or Confluence for their opinions on them!) Oh, and if you just wanted to use Trello as an “independent” issue tracking system, I’ve got news for you.

On the “I just want to self-host everything that GitHub had to offer” end of the spectrum there’s GitLab, which requires quite a lot of infrastructure and administrative overhead; the cost of running a GitLab instance can be pretty enormous, much more so than simply buying GitHub’s paid hosting! If you want to buy into someone else’s GitLab hosting overhead, a lot of people seem to be switching to Framagit, but do you really want to dive right back into relying on someone else’s hosting? And GitHub at least (now) has deep pockets!

Or if you want a lighter-weight variant, Felix has pointed me to Gitea, which looks cute. (Much later update: Gitea is what I now use for my personal repository. It’s pretty great.)

If you’re not particularly wedded to git itself, consider Fossil, which (as I mentioned above) has issues and wikis integrated, and also allows some amount of bidirectional sync with git! Also the folks I know who use it praise it for being way more humane than git anyway. So, it’s worth trying out, at least.

Anyway, if you have any preferred GitHub alternatives, or just want to say why I’m silly for overlooking something important in all this, feel free to share your comments here!