Recreating GitHub gists

Parts (1 out of ?)

Overview

There are many interesting details about git: how it is designed to work over ssh, and how its content-addressable object store works not only for blobs but for other object types too, such as trees, commits, and tags. There is the way git has, over the years, optimized its on-disk storage via packfiles, indexes, bitmaps, and commit graphs. There are the built-in features for extending and customizing functionality via hooks. And there is how git can share object data across many repos, making “forks” lightweight.
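To make "content addressable" a bit more concrete, here is a minimal sketch of how git derives the ID of a blob in a SHA-1 repository: it hashes a small header (`blob`, the content length, a NUL byte) followed by the content itself. The function name `blobID` is my own, and this only covers loose blob IDs, not trees, commits, or SHA-256 repositories.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// blobID returns the object ID git assigns to a blob with the given content
// in a SHA-1 repository: sha1("blob <size>\x00" + content).
// The name blobID is illustrative, not part of any git API.
func blobID(content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "blob %d\x00", len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Compare with: printf 'hello\n' | git hash-object --stdin
	fmt.Println(blobID([]byte("hello\n")))
}
```

Because the ID depends only on the content, two gists containing the same file end up with the same blob ID, which is part of what makes sharing objects between repos possible.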

GitHub has done a good job of both building code collaboration tools around git and improving the scale, performance, and data integrity features of the git project directly as they ran into limitations. Some of git’s extensibility features, performance improvements, and features that make data replication and ref integrity possible (like transactions on update-ref) can be traced back to contributions GitHub made to solve its own needs.

Since I got interested in digging into how this all works, I decided to build a project alongside that will help me explore that functionality. I’m starting to build a service similar to GitHub gists. The plan is a standalone service that can handle auth for git over ssh, plus simple viewing of gist contents online. Web auth, web editing, and other features may come over time, but to start I want to better understand git over ssh.
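As a rough idea of what "git over ssh" means for the server: a git client doing a fetch or push asks the remote to run `git-upload-pack '<path>'` or `git-receive-pack '<path>'`, and when sshd is configured with a forced command it passes the client's request along in the `SSH_ORIGINAL_COMMAND` environment variable. The sketch below only parses and validates that string. The function name `parseGitCommand` and the exact validation rules are assumptions of mine, not a finished design.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// parseGitCommand splits a request such as "git-upload-pack '/abc123.git'"
// into the git service and a repo-relative path, rejecting everything else.
// This is a hypothetical helper; the validation here is deliberately strict.
func parseGitCommand(cmd string) (string, string, error) {
	service, arg, ok := strings.Cut(strings.TrimSpace(cmd), " ")
	if !ok {
		return "", "", errors.New("missing repository argument")
	}
	switch service {
	case "git-upload-pack", "git-receive-pack":
		// fetch/clone and push respectively
	default:
		return "", "", fmt.Errorf("unsupported command %q", service)
	}
	repo := strings.Trim(strings.TrimSpace(arg), "'")
	repo = strings.TrimPrefix(repo, "/")
	if repo == "" || strings.Contains(repo, "..") {
		return "", "", errors.New("invalid repository path")
	}
	return service, repo, nil
}

func main() {
	original := os.Getenv("SSH_ORIGINAL_COMMAND")
	if original == "" {
		// No command means an interactive login, which this service would not offer.
		fmt.Println("This service does not provide shell access.")
		return
	}
	service, repo, err := parseGitCommand(original)
	if err != nil {
		fmt.Fprintln(os.Stderr, "rejected:", err)
		os.Exit(1)
	}
	// A real handler would check permissions here, then exec the git service.
	fmt.Printf("would run %s for repo %s\n", service, repo)
}
```

Keeping the parsing separate from the eventual exec makes it easy to test the rejection cases without an ssh server in the loop.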

GitHub Gists is a very underrated service. Sure, at its core it seems like a text box on the web that can post or edit some text and give others a link to that text. And yes, in part it’s similar to the other code snippet services that were available around the same time. But what makes gists different is that each gist is a git repo. It seems to be built on a lot of the same tooling and infrastructure as the rest of GitHub’s repository service, though with some gist-specific limits.

You can clone a GitHub gist, push changes, link to past versions of a file, diff commits, even fork gists. Gists offer a somewhat slimmed-down git service, though. They don’t support branches. At least, the web UI doesn’t: it does seem you can push and pull changes to branches in gists, but there is no way to view those on the web, even by manually putting the branch name as the ref in the URL. They also don’t support folders. All files have to be in the root of the repo, and that is enforced when changes are pushed via git:

remote: Gist does not support directories.
remote: These are the directories that are causing problems:

I’ll want to implement that too, so we can work with receive hooks to enforce a limit like this.
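Here is a minimal sketch of how a pre-receive hook could enforce that. Git feeds the hook one line per updated ref on stdin (`<old-oid> <new-oid> <ref-name>`), and a non-zero exit rejects the whole push, with anything the hook prints relayed to the client behind a `remote:` prefix. The helper names `topLevelDirs` and `listFiles` and the wording of the message are my own. I don’t know how GitHub actually implements its check.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// topLevelDirs returns the distinct top-level directories implied by a list
// of repo-relative file paths, in first-seen order. (Hypothetical helper.)
func topLevelDirs(paths []string) []string {
	seen := map[string]bool{}
	var dirs []string
	for _, p := range paths {
		dir, _, nested := strings.Cut(p, "/")
		if nested && !seen[dir] {
			seen[dir] = true
			dirs = append(dirs, dir)
		}
	}
	return dirs
}

// listFiles asks git for every file path reachable from the given commit.
// -z gives NUL-separated, unquoted paths.
func listFiles(commit string) ([]string, error) {
	out, err := exec.Command("git", "ls-tree", "-r", "--name-only", "-z", commit).Output()
	if err != nil {
		return nil, err
	}
	return strings.Split(strings.TrimRight(string(out), "\x00"), "\x00"), nil
}

func main() {
	rejected := false
	scanner := bufio.NewScanner(os.Stdin)
	for scanner.Scan() {
		// Each line is "<old-oid> <new-oid> <ref-name>".
		fields := strings.Fields(scanner.Text())
		if len(fields) != 3 || strings.Trim(fields[1], "0") == "" {
			continue // malformed line, or a ref deletion (all-zero new oid)
		}
		paths, err := listFiles(fields[1])
		if err != nil {
			fmt.Fprintln(os.Stderr, "could not inspect push:", err)
			os.Exit(1)
		}
		if dirs := topLevelDirs(paths); len(dirs) > 0 {
			rejected = true
			fmt.Fprintln(os.Stderr, "Directories are not supported:", strings.Join(dirs, ", "))
		}
	}
	if rejected {
		os.Exit(1) // non-zero exit makes git refuse every ref in this push
	}
}
```

This only inspects the tip of each pushed ref, so a directory that was added and then removed within the same push would slip through. Whether that matters depends on how strict the limit needs to be.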

List of Features

A very high-level list of features I’d like to build and explore:

  • Git repo host over both ssh and http
  • Web UI for creating, editing, deleting, commenting
  • List all public gists, profile pages listing public gists per user

Things that GitHub does that we might skip (but that could be really interesting to explore, so we might eventually get to them):

  • Git data replication (similar to GitHub Spokes)
  • Forking gists, using git alternates to keep objects shared so that forks stay lightweight
  • Rendering for multiple types of files, like Python notebooks or GeoJSON maps

Features gists don’t have, but that would be good additions:

  • Publish files / access via CDN (like bl.ocks.org enables for gists or a lightweight gh-pages for sharing demos)
  • Enforce auth on access to gists (not just “secret” via a longer uuid)

Some technical choices

As a starting point:

  • Go: I’ve been writing Go recently for work and want to keep using it here.
  • SQLite for relational data (users / public keys, gist names / uuids / ownership). Keeping this service
  • OpenSSH rather than a custom ssh server. OpenSSH has plenty of hooks to customize it.
  • No git data replication / backup. Will rely on persistent disk / VM disk snapshots for backups (unless/until we get to something like Spokes).
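One of the OpenSSH hooks I have in mind is `AuthorizedKeysCommand`, an sshd_config directive that lets sshd ask an external program for authorized_keys lines instead of reading a static file. Each line can carry options such as `command="..."` (a forced command) and `restrict` (which turns off port forwarding, PTY allocation, and similar). The sketch below only formats such lines. The `/usr/local/bin/gist-shell` path, the numeric user ID scheme, and the sample key are all placeholders of mine, and a real version would read keys from the database rather than a hard-coded slice.

```go
package main

import (
	"fmt"
	"strings"
)

// authorizedKeysLine builds one authorized_keys entry that pins a public key
// to a forced command identifying the user. The gist-shell binary path is a
// hypothetical name; userID is numeric so nothing needs quoting or escaping.
func authorizedKeysLine(userID int64, publicKey string) string {
	return fmt.Sprintf(`command="/usr/local/bin/gist-shell %d",restrict %s`,
		userID, strings.TrimSpace(publicKey))
}

// storedKey stands in for a row from a keys table (assumed shape).
type storedKey struct {
	userID    int64
	publicKey string
}

func main() {
	// Placeholder data; the key material below is not a real key.
	keys := []storedKey{
		{userID: 1, publicKey: "ssh-ed25519 AAAAexamplekeydata alice@example"},
	}
	for _, k := range keys {
		fmt.Println(authorizedKeysLine(k.userID, k.publicKey))
	}
}
```

With lines like these, every connection to the shared ssh user ends up running the same forced command, which already knows which account the key belongs to and can make the access decision from there.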