The Phacility Blog

Encouraging Open Source Etiquette

As a potential new contributor to an open source project, it can be very frustrating to submit an issue or pull request on GitHub and never hear back from the maintainer. This article briefly discusses the problem and proposes a tool which might improve the state of the world by making it easier for contributors to estimate what level of response they'll get from a project before they begin work.

Problem: Unmaintained Projects

This article is partially a response to an article titled "A plea for better open source etiquette" by Jake Benilov, which hit the front page of Hacker News today (cf. HN comment thread) and discusses this issue in more detail from the point of view of a contributor.

Broadly, some open source projects are abandoned, effectively unmaintained, or maintained with such low priority that sending patches or issues to them is like screaming into a void. As a potential contributor, spending time to diagnose and understand a bug or issue, submit it, and then to get nothing back is extremely frustrating. Notably, any response (even a negative one) is better for a contributor than no response.

Anecdotally, this problem is fairly widespread. My experience is that maybe half of what I push to upstreams will never receive a response. (Encouragingly, most of the responsive upstreams are responsive immediately.)

Reframing the Problem

The world would perhaps be better if every open source project were responsively maintained, but this is obviously unrealistic. It's also hard to change much here: raising awareness may help, but probably only at the margins.

And this isn't really the problem: even unmaintained projects are generally a net positive. If you're submitting an issue or pull request, you likely found the code useful, and would have been worse off if it wasn't available at all.

The root problem of unmaintained or unresponsive projects is very difficult to solve, but there's a problem one level removed which seems fairly easy to solve: make it simple for potential contributors to determine how responsive a project is.

If contributors know ahead of time that a project is a black hole, they can avoid investing time to format issues for the upstream and thereby avoid the frustration of having them ignored (or at least have realistic expectations as they pitch their code into the salty depths). Similarly, if contributors know that a project is generally responsive, it is likely to encourage them to format and submit issues.

Proposal: Track Project Responsiveness

At least for projects on GitHub (which is a large enough set to be meaningful), this is a very tractable problem. A tool to solve it might look something like this:

  • Create a site like StarLogs which uses the GitHub API, but whose output is a responsiveness assessment for the project.
  • Create an image badge like TravisCI's so responsive maintainers can show off their good track record in their README and potential contributors can recognize it.
  • If this ends up actually being a good idea, add things like an "I was going to contribute to this project but it looks like it's dead so I didn't" button in some future iteration. (Among possible future features, I'm singling this one out because I suspect the catharsis provided by a "You Suck (but Your Code is Okay)" button might be effective in addressing frustration.)
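The data-gathering half of such a site could start from GitHub's REST issues endpoint, which returns pull requests alongside plain issues (pull requests carry an extra `pull_request` key). Here is a minimal Python sketch, assuming unauthenticated access and a single page of results. The helper names are hypothetical, and a real tool would also need pagination, authentication, and rate-limit handling:

```python
import json
import urllib.request
from datetime import datetime

API_ROOT = "https://api.github.com"


def issues_url(owner: str, repo: str, page: int = 1) -> str:
    """URL for one page (up to 100 items) of a repo's issues and pull requests, open or closed."""
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?state=all&per_page=100&page={page}"


def fetch_page(owner: str, repo: str, page: int = 1) -> list:
    """Fetch and decode one page of results (no auth, no retries, no rate-limit handling)."""
    request = urllib.request.Request(
        issues_url(owner, repo, page),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def split_items(items: list) -> tuple:
    """Separate plain issues from pull requests; GitHub marks PRs with a 'pull_request' key."""
    issues = [item for item in items if "pull_request" not in item]
    pulls = [item for item in items if "pull_request" in item]
    return issues, pulls


def parse_time(stamp: str) -> datetime:
    """Parse GitHub's ISO 8601 timestamps, e.g. '2013-11-12T20:15:00Z', as UTC."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))
```

Something like `issues, pulls = split_items(fetch_page("git", "git"))` would give the raw material. Per-item comment and event timelines live behind further endpoints, one request per item, which is where most of the real work (and rate-limit pain) would be.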

Responsiveness data is both easy to compute and demonstrably useful to contributors:

Ease of Computation: "Project Responsiveness" is a complicated question, but it doesn't need a precise answer: for a prototype, it is sufficient to categorize projects into "at least somewhat responsive" and "pretty much dead". This is probably possible to do well enough via the GitHub API, by examining response rates and response times for issues and pull requests and then applying some complex mathematical wizardry like "mean" or maybe "median".
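A rough sketch of the "mean or median" wizardry, in Python. It assumes each issue or pull request has already been reduced to when it was opened and when it first got any response. The `Item` type, the 14-day grace period for new unanswered items, and the 50% response-rate cutoff are all illustrative assumptions, not tuned values:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional, Tuple


@dataclass
class Item:
    """An issue or pull request, reduced to when it was opened and first answered."""
    opened: datetime
    first_response: Optional[datetime]  # None if nobody has responded yet


def response_stats(
    items: List[Item], now: datetime, grace: timedelta = timedelta(days=14)
) -> Tuple[Optional[float], Optional[float]]:
    """Return (response rate, median days to first response) over items old enough to judge."""
    # Unanswered items younger than the grace period may still get a reply, so skip them.
    judged = [i for i in items if i.first_response is not None or now - i.opened >= grace]
    if not judged:
        return None, None
    answered = [i for i in judged if i.first_response is not None]
    rate = len(answered) / len(judged)
    if not answered:
        return rate, None
    days = [(i.first_response - i.opened).total_seconds() / 86400 for i in answered]
    return rate, median(days)


def classify(items: List[Item], now: datetime, min_rate: float = 0.5) -> str:
    """Two buckets for a prototype, plus 'unknown' when there is nothing to judge."""
    rate, _ = response_stats(items, now)
    if rate is None:
        return "unknown"
    return "at least somewhat responsive" if rate >= min_rate else "pretty much dead"
```

Skipping young unanswered items keeps a project from being marked dead just because someone filed something yesterday.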

Demonstrably Useful: This data is useful enough that contributors sometimes make an effort to do this today. Benilov mentions raising a tiny pull request to test the waters. I've combed through closed pulls and issues to try to divine this, myself. Things like general project activity can be used to infer this, but there are plenty of responsively maintained projects which don't see much new code. All told, the existing methods for estimating project responsiveness are time consuming and extremely crude: it would be far easier, faster, and more valuable to look at a single aggregate statistic.

For such a tool to be successful, it may be important to consider incentives for maintainers. The tool is most useful if adoption is widespread, so that many projects show a badge. Generally, the guiding principle here should be that the largest gain is made by rejecting dead projects, not by selecting elite projects.

To this end, the bar for positive recognition as a responsibly maintained project should be very low (maybe "often responds within 14 days") and should award you a big gold "responsible maintainer" star. Future iterations could establish higher grades (from, say, "usually responds within 7 days" to "usually responds within 24 hours"), but the ranks should probably be like eBay: "A", "A+", "A++", etc. From the point of view of a contributor, a project that usually does a reasonable job of responding fairly quickly is already an "A" compared to projects which are a pit of nothingness.
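One way to turn those tiers into grades, as a Python sketch: count every unanswered item as an infinitely slow response, then read "usually responds within N days" as "the median response time is at most N days". The tier table and that reading of "usually" (also applied to the lower "often responds within 14 days" bar, for simplicity) are assumptions:

```python
import math
from statistics import median
from typing import List, Optional

NEVER = math.inf  # sentinel for items that never got a response

# Hypothetical tiers, strictest first, using the example thresholds from the text (in days).
TIERS = [
    ("A++", 1.0),   # usually responds within 24 hours
    ("A+", 7.0),    # usually responds within 7 days
    ("A", 14.0),    # responds within 14 days: the low bar for a star
]


def grade(response_days: List[float]) -> Optional[str]:
    """Grade from per-item days to first response; use NEVER for items never answered."""
    if not response_days:
        return None
    typical = median(response_days)
    for label, limit in TIERS:
        if typical <= limit:
            return label
    return None  # no star: at least half the items waited too long or got no reply
```

Counting silence as infinity means a project that ignores more than half of what it receives has an infinite median and gets no star, which matches the goal of rejecting dead projects rather than crowning elite ones.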

The bar for "responding" should also be low: any response should count, not just a resolution. Even a maintainer saying they're too busy to look at something for a while is vastly better for contributors than no response.
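In code, "any response counts" might mean taking the earliest action of any kind (comment, label, close, review) by someone other than the person who opened the item. The `Event` type and its `kind` values are hypothetical stand-ins for whatever the GitHub API actually returns, and the optional `maintainers` filter is an assumption about how one might ignore drive-by "+1" comments from other users:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass
class Event:
    """A hypothetical, simplified timeline event on an issue or pull request."""
    actor: str    # login of whoever acted
    kind: str     # e.g. "comment", "labeled", "closed", "review"; every kind counts
    at: datetime


def first_response(
    opener: str, events: Iterable[Event], maintainers: Optional[Set[str]] = None
) -> Optional[datetime]:
    """Earliest action of any kind by someone other than the opener, or None.

    If `maintainers` is given, only actions by those logins count as a response.
    """
    times = [
        e.at
        for e in events
        if e.actor != opener and (maintainers is None or e.actor in maintainers)
    ]
    return min(times, default=None)
```

Note that nothing here checks whether the response was positive: a label or a "not now" comment counts exactly as much as a merge.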

Generally, my intuition is that project responsiveness is a heavily bimodal distribution with active projects at one end and the Great Open Source Boneyard at the other. This tool could back that view of the world up with data, or reveal some other distribution and adjust product choices accordingly.

The responsiveness statistic should also be heavily weighted for recency so that projects can easily improve their report by improving their behavior.
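Recency weighting could be as simple as exponential decay, where each item's weight halves every `half_life_days`. This Python sketch assumes a 90-day half-life, which is an arbitrary illustrative choice:

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def weighted_response_rate(
    items: Iterable[Tuple[datetime, bool]], now: datetime, half_life_days: float = 90.0
) -> Optional[float]:
    """Response rate where each item's weight halves every `half_life_days` of age.

    `items` holds (opened, answered) pairs; returns None when there are no items.
    """
    answered_weight = total_weight = 0.0
    for opened, answered in items:
        age_days = (now - opened).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)
        total_weight += weight
        if answered:
            answered_weight += weight
    return answered_weight / total_weight if total_weight else None
```

With this weighting, an issue ignored a year ago counts for roughly 6% of an issue filed today (0.5^(365/90) ≈ 0.06), so a maintainer who starts responding should see their score recover within a few months.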

Otherwise, it should be easy for contributors to look up projects, and clear to both contributors and maintainers how statistics are computed.

This idea seems like it is not especially novel, so maybe this tool already exists. But tools I'm aware of, like Ohloh, compute project activity -- this is correlated, but not actually the statistic we're interested in. Project responsiveness can be very high even when project activity is very low. But maybe something well-executed and sufficiently similar to this already exists and demonstrates that this is an awful idea or not useful enough that anyone cares.

If this tool sounds interesting and useful, you should think about building it. You'd have at least one user.

Also, GitHub Should Allow Projects to Disable Pull Requests

A very small part of this issue is that GitHub does not allow projects to disable pull requests (projects can disable Issues and Wikis, but not Pull Requests). It should: responsive, well-maintained projects with another documented channel for submitting changes can't really do anything reasonable here.

The poster child here is probably the Git mirror itself, which "solves" this issue by putting "This is a publish-only repository and all pull requests are ignored. Please follow Documentation/SubmittingPatches procedure for any of your improvements." in the summary. Despite this, 36 users have submitted pull requests. Many of them are probably frustrated. Allowing projects to disable Pull Requests would at least shut down one small source of frustration.