Stripe CTF3

I recently participated in Stripe's CTF3, a programming challenge. This is the first time I've participated in Stripe's CTF, but I really enjoyed it. I thought the problems were very interesting and the technical aspects of the challenge were well executed.

I'm satisfied with how I did overall: although I dropped to 30th by the end of the competition, I was the second public competitor to capture all the flags, and was at the top of the leaderboard for several days.

Here's a rough description of how I approached the problems, and how I was able to churn out some highly mediocre solutions to them as quickly as I did.

Level 0: Introduction

This level required you to implement what was essentially a simple spellchecker: given an input dictionary and a corpus, your program had to output the corpus with brackets around words which do not appear in the dictionary. For example, for the input:

The quick borfwn fox jumped over the zaly dog

...the correct output might have been:

The quick <borfwn> fox jumped over the <zaly> dog

I initially solved this in PHP by storing the dictionary in a hash table. This was good enough to capture the flag.

I later returned to this problem to optimize it. A key part of the problem was that the dictionary was static and known ahead of time, so your program could be hard-coded against the dictionary and did not need to read it.

To take advantage of this, I wrote a PHP script which would output a C header file containing the data for a trie. This was much faster, and nearly instantaneous on the local test cases.

Generally, processing the actual data seemed to be so fast that doing well on this challenge came down to the number of milliseconds your binary required to start up. I didn't empirically verify this by deliberately adding sleeps, so I may have been chasing the wrong thing, but algorithmic changes I made after this point had no apparent effect, while changes that reduced the binary's startup time seemed to.

My first version of the generated C trie was about 100MB, but I was able to make the trie generator smarter and reduce the amount of data require to represent the trie. In particular:

  • After generating the trie, I collapsed all equivalent states. This reduced the trie to about 20% of the original size.
  • I sorted the states in the trie by the number of nonempty node transitions leading away from the state. Most trie states have only one transition, so I stored all of those transitions and a count of the number of states with only one transition, then all the states with two transitions and a count of the number of states with two transitions, etc.

These optimizations gave me a 1.1MB binary from a 2.4MB dictionary. I ended up 28th, about 25% slower than the leader. Offhand, I don't know what I could have done to improve my time from here.

Level 1: Gitcoin

This level required you to mine Gitcoins, which are similar to Bitcoins. Essentially, you were given a Git repository and needed to generate a new commit with a certain number of leading zeroes. To do this, you made up random commit messages until one produced a commit with enough zeroes.

The problem came with skeleton code, written in shell script.

I initially solved this by taking the critical part of the script, which performed hashing, and implementing it in C instead. This was good enough to capture the flag.

I later returned to this problem to optimize it. My C hasher remained mostly intact, but I wrote smarter driver code in PHP which could start a hasher per CPU core, watch for new commits from the upstream, and recover from various kinds of failure. Once this was stable, I deployed it on one machine in EC2 and let it run continuously.

At first, I was pretty competitive: I earned coins in almost every round, and I won a few rounds. However, after four or five days I no longer had enough compute power to compete with the strongest competitors and rarely earned coins. In theory I could have deployed on more boxes, but I was fairly sure that would work properly and it didn't seem very interesting. Instead, I just called it a day and shut down my mining operations.

Despite my comparatively limited throughput, I ended up in 11th place overall, largely because each round awarded bonus points and I'd managed to get my miner stable on the first day and earn points in each round after that until near the end of the contest.

Level 2: DDOS Defense

This level required you to write an HTTP proxy to protect a service from malicious attackers. It came with skeleton code written in Javascript.

I initially solved the problem by limiting each client to one concurrent connection, changing about 20 lines of the skeleton code. This was good enough to capture the flag.

I didn't find this problem especially interesting, and never returned to it.

Level 3: Instant Code Search

This level required you to write a distributed search engine. It came with skeleton code written in Scala.

The problem would start several indexers, then issue queries. You had to respond to the queries as quickly as possible. The skeleton code was very inefficient, in that each indexer indexed the entire corpus, and then the master chose one indexer at random and used its results.

My initial solution was to make each of the three indexers index just one third of the corpus, then combine the solutions from each indexer.

I had never written Scala before, and the project took a full minute to rebuild on my machine. I spent several hours getting this solution to work, including at least 30 minutes of struggling with string concatenation, and it wasn't fast enough to capture the flag.

However, it got me much closer and revealed that the total size of the corpus was far smaller than the memory limit for the indexers, so I just put the entire corpus in memory and that was good enough for a capture.

I found this problem interesting, but recompiling Scala was really painful, so I never returned to it.

Level 4: SQLCluster

This level required you to build a highly-available cluster of SQL servers. The simulation made the network this cluster was running on artificially unreliable, so you needed to be robust against various failure modes.

It came with skeleton code written in Go which worked very poorly, and linked to some academic papers with a lot of words in them. I'd never written Go before either, but found it pretty easy to figure out and it only took a second or two to build on my machine. I was very happy about this after the Scala challenge.

All that robust high-availability stuff sounded really really hard, so I commented out all the replication and failover logic and made every node except the primary proxy connections to the primary. This was good enough to capture the flag, and kept me at the top of the leaderboard for about 24 hours.

I later revisited this problem, and made several improvements:

When a node proxied a query, sometimes the query would execute but the network link would drop before the result returned. If the proxy repeated the request, it would double-execute. If the proxy returned a failure code, the test harness would be left waiting for the missing result forever. This didn't always happen, but killed my score when it did, since I'd get no points for the rest of the round (and lose points with every network request).

To resolve this, I tagged each incoming SQL query with a unique comment, then put a cache of query results on the primary. This let proxies replay requests as often as they wanted: if the primary had already executed a request, it would just return the cached result.

I also noticed that the SQL queries used only a tiny subset of SQL, so I implemented the table and SQL parser in memory. I couldn't figure out how to loop over a list of strings in Go, so I used copy-paste.

These changes catapulted me to the head of the leaderboard until Sunday, when Stripe adjusted the scoring algorithm to award more points to solutions that were implemented in the sprit of the challenge rather than to the letter of the scoring algorithm.

I had some plans for revising my solution, but didn't have a chance to return to the problem after the scoring change and ended up at rank 30.