Automatic Mitigation of Meltdown

Let’s look at what Meltdown is and how it works, as well as how it is stopped. A lot has been written about the Meltdown vulnerability, but it is still commonly misunderstood. A few diagrams may help.

First, let’s consider a simplified memory hierarchy for a computer: main memory, split into user memory and kernel memory; the cache (typically on the CPU chip); and then the CPU itself.

The bug is pretty simple. For decades now, processors have had a flag that tells them what privilege level a certain instruction is running in. If an instruction in user space tries to access memory in kernel space (where all the important stuff resides), the processor will throw an exception, and all will be well.

On certain processors though, the speculative executor fails to check this bit, thus causing side-effects in user space (caching of a page), which the user space instructions can test for. The attack is both clever and remarkably simple.

Let’s walk through it graphically. Assume your memory starts with this flushed cache state — nothing sits in the cache right now (the “flush” part of what is a “flush-reload” attack):

Step 1: Find cached pages

First let’s allocate 256 pages in user space that we can access. Assuming a page size of 4K, we just allocate 256 times 4K bytes of memory. It doesn’t matter where those pages reside in user-space memory, so long as we get the page size correct. In C-style pseudo-code:

char userspace[256 * 4096];

I’ll mark those in the userspace diagram — for brevity, I’ll only show a few pages, and I’m going to show cached pages popped up like this:

This allows for easier reading (and easier drawing for me!).

So let’s start with an empty (flushed) cache:

Consider what the cache state would be if we accessed a byte in page 10. Any byte in page 10 would do the trick, so let’s just use the very first byte (at offset 0).

The following code accesses that byte:

char dummy = userspace[10 * 4096];

This leads the state to be:

Now what if we measured the time to access each page and stored it?

int accessTimes[256];
for (int i = 0; i < 256; i++) {
    t1 = now();
    char dummy = userspace[i * 4096];
    t2 = now();
    accessTimes[i] = t2 - t1;
}
Since page 10 was cached, page 10’s access time would be significantly faster than all other pages which need a roundtrip to main memory. Our access times array would look something like this:

accessTimes = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 10, 100, 100....];

The value at index 10 (page 10) is an order of magnitude faster to access than anything else. So page 10 was cached, whereas the others were not. Note though that all of the pages did get cached as part of this access loop. This is the “reload” part of the flush-reload side-channel — because we reloaded all pages into the cache.

At this point we can figure out which pages are cached with ease if we flush the cache, allow someone else to affect it, then reload it.
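The “find the fast page” part of the reload step can be sketched as a tiny helper. This is a minimal illustration — the function name and the argmin approach are mine, not from the original code:

```c
#include <stddef.h>

/* Given the access times collected by the loop above, the page that
 * loaded an order of magnitude faster than the rest is the one that
 * was already cached before we started probing. */
int fastest_page(const int accessTimes[], size_t numPages) {
    size_t fastest = 0;
    for (size_t i = 1; i < numPages; i++) {
        if (accessTimes[i] < accessTimes[fastest]) {
            fastest = i;    /* track the index with the lowest time */
        }
    }
    return (int)fastest;
}
```

With the example times above, `fastest_page(accessTimes, 256)` returns 10 — the page number that was cached.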

Step 2: Speculate on kernel memory

This step is easy. Let’s assume we have a pointer to kernel memory:

char *kernel = (char *)0x1000; // or whatever the case is

If we tried to access it using an unprivileged instruction, it would fail — our user space instructions don’t have a privileged bit set:

char important = kernel[10];

Speculating this, though, is easy. The instruction above would speculate just fine. It would then throw an exception, which means we never architecturally get the value of important.

Step 3: Affect userspace based on speculated value

However, what happens if we speculated this?

char dummy = userspace[kernel[10] * 4096];

We know userspace has 256 * 4096 bytes — we allocated it. Since we’re only reading one byte from the kernel address, the maximum value is 255.

What happens when this line is speculated? Even though the processor detected the segmentation fault and prevented you from reading the value, did you notice that it cached the user-space page? The page whose number was the value of kernel memory!

Suppose the value of kernel[10] was 17. Let’s run through this:

  1. The processor speculatively read kernel[10] using out-of-order execution. That value was 17.
  2. The processor then dereferenced the 17th 4K-wide page in the array “userspace”: userspace[17 * 4096]
  3. The processor detected that you weren’t allowed to access kernel[10], and so threw an exception — you never architecturally see the value. Bad programmer!
  4. The processor did not roll back the cache, however — the user-space page it touched speculatively stays cached. It won’t let you read kernel memory directly, though. It’s got your back…

What was the state of cache at the end of this?

That’s cool! Using Step 1, we would find the 17th page’s access time to be the fastest — by a large margin over the others! That tells us the value of kernel[10] was 17, even though we never actually read kernel[10]!

Pretty neat huh? By going over the kernel byte by byte, we can get the value of every kernel address, by affecting cache pages.
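The byte-by-byte loop can be modeled in a toy simulation. To keep it runnable, the microarchitecture is replaced by a boolean “page is cached” array — a real exploit needs the transient access and cycle-accurate timing, and all names here are mine:

```c
#include <stdbool.h>
#include <string.h>

enum { NUM_PAGES = 256 };

static bool cached[NUM_PAGES];   /* toy stand-in for the CPU cache */

/* Stand-in for the speculated `userspace[kernel[i] * 4096]` access:
 * its only observable effect is that one probe page becomes cached. */
static void transient_touch(unsigned char secretByte) {
    cached[secretByte] = true;
}

/* Steps 1-3 combined for one kernel byte: flush, speculate, reload. */
unsigned char leak_one_byte(unsigned char secretByte) {
    memset(cached, 0, sizeof(cached));   /* "flush" every probe page */
    transient_touch(secretByte);         /* speculate on kernel memory */
    for (int page = 0; page < NUM_PAGES; page++) {
        if (cached[page]) {              /* "reload": the fast page */
            return (unsigned char)page;  /* its index IS the secret */
        }
    }
    return 0;
}
```

Run this once per kernel address and you reconstruct kernel memory one byte at a time — which is exactly what the published Meltdown proof-of-concepts do, at hundreds of kilobytes per second.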

What went wrong? How are we fixing it?

Meltdown is a genuine “bug” — the flaw is not the side-channel itself. The bug is straightforward — CPU speculative execution should not cross security boundaries — and ultimately it should be fixed in the CPU itself.

It’s not the cache that’s misbehaving — even though that’s where most operating-system vendors are applying their fixes. More precisely, they are attempting to further isolate kernel and userspace memory, using something called Kernel Page Table Isolation (KPTI), previously called KAISER. It maps only a few “stub” kernel pages into the process’s virtual memory, keeping the rest of the kernel out (and thus unreachable by the speculative execution engine).

Unfortunately, this isolation comes at a cost — every transition into the kernel now requires a more expensive page-table switch.

Polymorphic Linux stops ROP attacks; increases difficulty of others

Since Polymorphic Linux was intended to stop ROP attacks dead in their tracks, polymorphic kernels defeat all ROP attacks in kernel space. This matters especially when KASLR (kernel address space layout randomization) is defeated (which is so trivial that the Meltdown paper leaves it as an exercise for the reader).

Furthermore, since polymorphic binaries have different signatures, layouts, instructions and gadgets, they make it difficult by at least an order of magnitude to craft further attacks. Polymorphic binaries force the extra step of analysis and understanding per binary. This means that a lateral attack (one that moves from machine to machine in a network) becomes much harder.

Look out for my next post on Spectre. It’s a bit more difficult to explain and definitely harder than Meltdown to craft…

Semantic Versioning has failed Agile

This is an Engineering post on how we build software at Polyverse, what processes we follow and why we follow them.

A couple of weeks ago, I attended a CoffeeOps meetup at Chef HQ. One of my answers detailing how we do agile, CI/CD, etc. got people excited. That prompted me to describe in detail exactly how our code is built, shipped, and how we simplified many of the challenges we saw in other processes. It should be no surprise that we make heavy use of Docker for getting cheap, reliable, and consistent environments.

Dropping Irrelevant Assumptions

I first want to take a quick moment to explain how I try to approach any new technology, methodology or solution, so that I make the best use of it.

Four years ago, when we brought Git into an org, a very experienced and extremely capable engineer raised their hand and asked, “I’ve heard that Git doesn’t have the feature to revert a single file back in history. Is this true? If it is true, then I want to understand why we are going backwards.”

I will never forget that moment. As a technical person, truthfully, that person was absolutely RIGHT! However, moving a single file backwards was something we did because we didn’t have the ability to cheaply tag the “state” of a repo, so we built up terrible habits such as “branch freeze”, timestamp-based checkouts, “gauntlets”, etc. It was one of the most difficult questions to answer, without turning them antagonistic, and without sounding like you’re evading the issue.

I previously wrote a similar answer on Quora about Docker and why the worst thing you can do is to compare containers to VMs.

It is very dangerous to stick to old workarounds when a paradigm shift occurs. Can we finally stop it with object pools for trivial objects in Java?

What problems were we trying to solve?

We had the same laundry list of problems nearly any organization of any type (big, small, startup, distributed, centralized, etc.) has:

  1. First and foremost, we wanted to be agile, as an adjective and a verb, not the noun. We had to demonstrably move fast.
  2. The most important capability we really wanted to get right was the ability to talk about a “thing” consistently across the team, with customers, with partners, with tools and everything else. After decades of experience, we’d all had difficulty in communicating precise “versions” of things. Does that version contain a specific patch? Does it have all the things you expect? Or does it have things you don’t expect?
  3. We wanted to separate the various concerns of shipping and pay the technical debt at the right layer at all times: developer concerns, QA assertions, and release concerns. Traditional methods (even those used in “agile”) were too mingled for the modern tooling we had. For example, “committing code” is precisely that — take some code and push it. “Good code” is a QA assertion — regardless of whether it is committed or not. “Good enough for customers” is a release concern. When you have powerful tools like Docker and Git, trying to mangle all three in some kind of “master” or “release” branch seemed medieval!
  4. We wanted no “special” developers. Nobody is any more or less important than anyone else. All developers pay off all costs at the source. One person wouldn’t be unfairly paying for the tech debt of another person.
  5. We believed that “code is the spec, and the spec is in your code”. If you want to know something, you should be able to consistently figure it out in one place — the code. This is the DRY principle (Don’t Repeat Yourself.) If code is unreadable, make it readable, accessible, factored, understandable. Don’t use documentation to cover it up.
  6. We wanted the lowest cognitive load on the team. This means using familiar and de-facto standards wherever possible, even if occasionally at the cost of simplicity. We have a serious case of reverse “Not-Invented-Here” syndrome. 🙂 We first look at everything that’s “Not-Invented-Here”, and only if it doesn’t suit our purpose for a real and practical problem do we sponsor building it in-house.

Now let’s look at how we actually build code.

Content-based non-linear versioning

The very fundamental foundation of everything at Polyverse is content-based versioning. A content-based version is a cryptographically secure hash over a set of bits, and that hash is used as the “version” for those bits.

This is a premise you will find everywhere in our processes. If you want to tell your colleague what version of Router you want, you’d say something like: router@72e5e550d1835013832f64597cb1368b7155bd53. That is the version of the router you’re addressing. It is unambiguous, and you can go to the Git repository that holds our router, and get PRECISELY what your colleague is using by running git checkout 72e5e550d1835013832f64597cb1368b7155bd53.
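The property that makes this work can be shown with any stable hash. This is an illustration of the idea, not of Git’s actual algorithm — Git uses SHA-1 over the commit object; the sketch below uses FNV-1a only because it fits in a few lines, and the function name is mine:

```c
#include <stdint.h>
#include <stddef.h>

/* A content-based "version": a hash computed from the bits themselves.
 * The same bits always produce the same version; any change to the
 * bits produces a different one. */
uint64_t content_version(const void *bits, size_t len) {
    const unsigned char *p = bits;
    uint64_t hash = 14695981039346656037ULL;   /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= p[i];                          /* mix in each byte */
        hash *= 1099511628211ULL;              /* FNV-1a prime */
    }
    return hash;
}
```

Because the version is derived from the content, it can never lie: nobody can “forget to bump it”, and two people quoting the same version are guaranteed to be looking at the same bits.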

This theme also carries over to our binaries. While there is semantic versioning in there, you’ll easily baffle anyone on the team if you asked them for “Router 1.0.2”. Not that it is difficult to look up, but that number is just a text string anyone could have placed there, and as a mental model it would make everyone a little uneasy. Culturally we simply aren’t accustomed to talking in imprecise terms like that. You’d be far more comfortable saying Router with sha 5c0fd5d38f55b49565253c8d469beb9f3fcf9003.

Philosophically we view Git repos as “commit-clouds”. The repos are just an amorphous cloud of various commit shas. Any and every commit is a “version”. You’ll note that this not only is an important way to talk about artifacts precisely, but more so, it truly separates “concerns”. There is no punishment for pushing arbitrary amounts of code to Git on arbitrary branches. There is no fear of rapidly branching and patching. There is no cognitive load for quickly working with a customer to deliver a rapid feature off of a different branch. It just takes away all the burden of having to figure out what version you assign to indicate “Last known good build of v1.2.3 with patches x, y, but not z”, and “Last known good build of v1.2.3 with patches x, z, but not y”.

Instead, anyone can look up your “version” and go through the content tree, as well as Git history and figure out precisely what is contained in there.

Right about now, I usually get pushback surrounding the questions: how do you know what is the latest? And how do you know where to merge?

That is precisely that “perforce vs git” mental break we benefit from. You see, versions don’t really work linearly. I’ve seen teams extremely frightened of reverting commits and terrified of removing breaking features rapidly. Remember that “later” does not necessarily mean “better” or “comprehensive”. If A comes later than B, it does not imply that A has more features than B, or that A is more stable than B, or that A is more useful than B. It simply means that somewhere in A’s history, is a commit node B. I fundamentally wanted to break this mental model of “later” in order to break the hierarchy in a team.

This came from two very real examples from my past:

  1. I’ll never forget my first ever non-shadowed on-call rotation at Amazon. I called it the build-from-hell. For some reason, a breaking feature delayed our release by a couple of days. We were on a monthly release cycle. No big deal.
    However, once the feature was broken, and because there was an implied but unspoken model of “later is better”, nobody wanted to use the word “revert”. In fact, it was a taboo word that any committer would take very personally. At some point, other people started to pile onto the same build in order to take advantage of the release (because the next release was still a month away). This soon degraded into a cycle where we delayed the release by about 3 weeks, and in frustration we pulled the trigger and deployed it, because reverting was not in the vocabulary — I might as well have called the committer a bunch of profane names, an imbecile and a moron; it would have gone over better.
  2. The second event led to the very first Sev-1 of my life, at 4am on one fateful morning. For the uninitiated, a Sev-1 is code for “holy shit, something’s broken so bad we’re not making any money.”
    After four hours of investigation, it turned out that someone had made a breaking API change in a Java library and had forgotten to bump up the version number — so when a dependent object tried to call into that library, the dispatch failed. Yes, we blamed them and ostracized them, and added “process” around versioning, but that was my inflection point — it’s so stupid!
    What if that library had been a patch? What if it had 3 specific patches but not 2 others? Does looking at version “2.4.5-rc5” vs “2.4.5-rc4” tell you that rc5 has more patches? Fewer patches? Is more stable? Which one is preferable? Are these versions branch labels? Git tags? How do you ensure someone didn’t reassign that tag by accident? Use signed tags? All of this is dumb and unnecessary! The tool gives you PRECISE content identifiers. It was fundamentally built for that! Why not use it? Back in SVN/CVS/Perforce, we had no ability to specifically point to a code snapshot, and it was time we broke the habits we picked up because of that. This is why Git does not allow reverting a single file. 🙂

The key takeaway here was — these are not development concerns. We conflated release concerns with identity concerns. They are not the same. First, we need a way to identify and speak about the precise same thing. Then we can assert over that thing various attributes and qualities we want.

We didn’t want people to have undue process to make rapid changes. What’s wrong with making breaking API changes? Nothing at all! That’s how progress happens. Developers should be able to have a bunch of crazy ideas in progress at all times and commits should be cheap and easy! They should also have a quick, easy and reliable way of throwing their crazy ideas over the wall to someone else and say, “Hey can you check this version and see how it does?”, without having to go through a one-page naming-convention doc and updating metadata files. That was just so medieval!

What about dependency indirection? One reason people use symbolic references (like semantic versioning) is so that we can refer to “anything greater than 2.3.4” and not worry about the specific thing that’s used.

For one, do you REALLY ever deploy to production and allow late-binding? As numerous incidents have demonstrated, no sane Ops person would ever do this!

In my mind, having the ability to deterministically talk about something, far outweighs the minor inconvenience of having to publish dependency updates. I’ll describe how we handle dependencies in just a minute.

Heavy reliance on Feature Detection

Non-linear content-based versioning clearly raises red flags — especially when you’re built around an actor-based model of microservices passing messages all over the place.

However, there’s been a solution staring us right in the face for the past decade. One that we learned from the web developers — use feature detection, not version detection!

When you have loosely-coupled microservices that have no strict API bindings, but rather pass messages to each other, the best way to determine if a service provides a feature you want, is to just ask it!

We found quite easily that when you’re not building RPC-style systems (and I consider callbacks to still be RPC-style), you don’t even need feature-detection. If a service doesn’t know what to do with a message, it merely ignores it, and the feature simply doesn’t exist in the system. If you’re not waiting for a side-effect — not just syntactically, but semantically — you end up with a very easy model.
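That log-and-drop contract can be sketched in a few lines. This is a minimal illustration under my own assumptions — the types, the message names, and the enum are mine, not Polyverse’s actual message schema:

```c
#include <string.h>

typedef enum { HANDLED, IGNORED } Outcome;

/* A service handles the messages it knows and silently drops the
 * rest, so a sender never needs to check the receiver's version. */
Outcome handle_message(const char *messageType) {
    if (strcmp(messageType, "scramble") == 0) {
        /* ... do the work this service knows how to do ... */
        return HANDLED;
    }
    /* Unknown message: log-and-drop. The feature simply doesn't
     * exist in this deployment, and nothing breaks. */
    return IGNORED;
}
```

A newer component can start emitting a message type older peers have never seen; they ignore it, and the system keeps running with the feature absent rather than failing.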

Now the comment in the previous section — about a developer being able to throw a version over the wall and ask another developer what they think of it — makes a lot more sense. Someone can plug a new version into the system and quickly assert whether it works with the other components and what features it enables.

This means that at any given time, all services can arbitrarily publish a bunch of newer features without affecting others for the most part. This is also what allows us to have half a dozen things in progress at all times, and we can quickly test whether something causes a regression, and whether something breaks a scenario. We can label that against a “version” and we know what versions don’t work.

Naturally this leads us to a very obvious conclusion, where “taking dependencies” is no longer required at a service/component level. They wouldn’t be loosely-coupled, actor-model-based, Erlang-inspired microservices if they had dependency trees. What makes more sense is…

Composition, not Dependencies

When you have content-based non-linear versioning allowing aggressive idea execution, combined with services that really aren’t all that concerned about what their message receivers do, and will simply log an error and drop weird messages sent to themselves, you end up with a rather easy solution to dependency management — composition.

If you’ve read my previous posts, or if you’ve seen some of our samples, you’ll have noticed a key configuration value that shows up all over the place called the VFI. It’s a JSON blob that looks something like this:

{
  "etcd": {
    "type": "dockerImage",
    "address": ""
  },
  "nsq": {
    "type": "dockerImage",
    "address": "nsqio/nsq:v0.3.8"
  },
  "polyverse": {
    "type": "dockerImage",
    "address": ""
  },
  "status": {
    "type": "dockerImage",
    "address": ""
  },
  "supervisor": {
    "type": "dockerImage",
    "address": ""
  },
  "router": {
    "type": "dockerImage",
    "address": ""
  },
  "containerManager": {
    "type": "dockerImage",
    "address": ""
  },
  "eventService": {
    "type": "dockerImage",
    "address": ""
  },
  "api": {
    "type": "dockerImage",
    "address": ""
  }
}
NOTE: If you work at Amazon or have worked there before, you’ll recognize where the word came from. When we started at Polyverse, I really wanted a composition blob that described a set of components together, and I started calling it a VFI, and now it’s become a proper noun. It really has lost all meaning as an acronym. It’s simply its own thing at this point.

What you’re seeing here is a set of components along with, as you guessed it, the addresses where they might be obtained. (In this example the addresses are symbolic — they’re Docker image labels; in highly-secure deployments, however, we use the one true way to address something — content-based shas. You might easily see a VFI that has “&lt;somesha&gt;” in the address field.)

Again, you’ll notice that this isn’t a fight against proper dependencies, but rather an acknowledgement that “router” is not where information for “all of polyverse working” should be captured. It is a proper separation of concerns.

The router is concerned with whether it builds, passes tests, boots up and has dependencies it requires for its own runtime. What doesn’t happen is a router taking dependencies at the component level, on what the container manager should be, could be, would be, etc. And more so, it does not have the burden of ensuring that “cycling works”.

Too often these dependency trees impose heavy burdens on the developers of a single component. In the past I’ve seen situations where, if you’re a web-server builder and you get a broken downstream dependency related to authentication, you are now somehow de-facto responsible for paying the price of the entire system working end-to-end. The burden of work increases as you move further upstream, closer to your customer. One bad actor downstream has you paying the cost. Sure, we can reduce the cost by continually integrating faster, but unless “reverting” is an option on the table, you’re still the person who has to do it.

This is why security teams are so derided by operations teams. Until the recent advent of DevSecOps, they always added a downstream burden — they would publish a library that is “more secure” but breaks a fundamental API, and you as the developer (and occasionally the operator) paid the price of updating all API calls, testing, and verifying that everything works.

Our VFI structure flips this premise on its head. If the router-developer has a working VFI, and somehow the downstream container manager developer broke something, then their “version” in that VFI is not sanctioned. The burden is now on them to go fix it. However, since the router doesn’t require a dependency update or a rebuild, simply plugging their fixed version into the VFI is sufficient to get the upgrade pushed into production quite easily.

You’ll also notice how this structure puts our experimentation ability on steroids. Given content-based versioning, and feature-detection, we can plug a thousand different branches, with a thousand different features, experiments, implementations, etc. in a VFI, and move rapidly. If we have to make a breaking change to an API, we don’t really have to either “freeze a branch” or do code lockdowns. We just replace API V1 with V2, and then as various components make their changes, we update those in the VFI and roll out the change reliably, accurately, predictably, and most importantly, easily. We remove the burden on the API changer to somehow coordinate this massive org-wide migration, and yet we also unburden the consumers from doing some kind of lock-step code update.

All the while, we preserve our ability to make strict assertions about every component, and an overall VFI — is it secure? is it building? Is it passing tests? Does it support features? Has something regressed? We further preserve our ability to know what is being used and executed at all times, and where it came from.

Naturally, VFIs themselves are content-versioned. 🙂 You’ll find us sending each other sample VFIs like so: Runtime@b383463cf0f42b9a2095b40fc4cc597443da47f2

Anyone in the company can use our VFI CLI to expand this into a json-blob, and that blob is guaranteed to be exactly what I wanted someone else to see, with almost no chance of mistake or miscommunication.

Isn’t this cool? We can stay loosey-goosey and experimentally hipster, and yet talk precisely and accurately about everything we consume!

You’ll almost never hear “Does the router work?” because nobody really cares if the router works or not. You’ll always hear conversations like, “What’s the latest VFI that supports scrambling?”, or “What’s the latest stable VFI that supports session isolation?”

Assertions are made over VFIs. We bless a VFI as an overall locked entity, and that is why long-time customers have been getting a monthly email from us with these blobs. 🙂 When we need to roll out a surgical patch, the overhead is so minimal it is uncanny. If someone makes a surgical change to one component, they test that component, publish a new VFI with that component’s version, and test the VFI for overall scenarios. The remaining eight components — reliable, stable, and tested — see no touch or churn.

Components are self-describing

Runtime Independence

Alex and I are Erlang fanboys and it shows. 100% of Polyverse is built on a few core principles, and everything we call a “component” is really an Actor stylized strictly after the Actor Model.

A component is first and foremost a runtime definition; it is something that one can run completely on its own, and it contains all dependencies, supporting programs, and anything else it needs to reliably and accurately execute. As you might imagine, we’re crazy about Docker.

Components have a few properties:

  1. A component must always exist in a single Git repository. If it has source dependencies, they must be folded in (denormalized.)
  2. Whenever possible, a component’s runtime manifestation must be a Docker image, and it must fold in all required runtime dependencies within itself.

This sounds simple enough, but one very important contract every component has is that it may not implicitly know about the existence of any other component. This is a critical contract we enforce.

If there is one thing I passionately detest above all else in software engineering, it is implicit coupling. Implicit coupling is “magic”. It is when you build a component that looks entirely decoupled from the rest syntactically, but isn’t. If Component A somehow relies on Component B existing and acting in a very specific way, then Component A should have expressed that coupling explicitly. As an operator, it is a nightmare to run these systems! You don’t know what Component A wants, and, to keep up public displays of propriety, it doesn’t want to tell you. In theory Component A requires nothing else to work! In practice, Component A requires Component B to be connected to it in a very specific, magical way.

We go to great lengths to prevent this from happening. When required, components are explicitly coupled, and are self-describing as to what they need. That means all our specs are in the code. If it is not defined in code, it is not a dependency.

Build Independence

We then take this runtime definition back to the development pipeline, and ensure that all components can be built with two guaranteed assumptions:

  1. That there be a “/bin/bash” that they can rely on being present (so they can add the #!/bin/bash line.)
  2. That there be a working Docker engine.

All our components must meet the following build contract:

docker build .

It really is that simple. This means that, combined with the power of content-addresses, VFIs and commit-clouds, we always have a reliable and repeatable build process on every developer’s desktop — Windows, Linux or Mac. We can be on the road, and if we need a component we can do “docker build .”. We can completely change out the build system, and the interface still remains identical. Whether we’re cross-compiling for ARM or for an x86 server, we all have a clear definition of “build works” or “build fails”. It really is that simple.

Furthermore, because even our builders are technically “components”, they follow the same rules of content-addressing. That means at any given time you can go back two years into a Git repo and build that component using an outdated build system that will continue to work identically.

We store all build configuration as part of the component’s repo, which ensures that when we address “router@&lt;sha&gt;” we are not only talking about the code, but also the exact manner in which that version needed to be built.

Here too you’ll notice the affinity to two things at the same time:

  1. Giving developers complete freedom to do what they want, how they want — whatever QA frameworks, scripts, dependencies, packages, etc. that they need, they get. They don’t have to go talk to the “Jenkins guy” or the “builder team”. If they want something they can use it in their builder images.
  2. Giving consumers the capacity to reason about “what” it is someone is talking about in a comprehensive fashion. Not just code, but how it was built, what tests were run, what frameworks were used, what artifacts were generated, how they were generated, etc.

Where it all comes together

Now that we’ve talked about the individual pieces, I’ll describe the full development/build/release cycle. This should give you an overview of how things work:

  1. When code is checked in it gets to be pushed without any checks and balances, unless you’re trying to get onto master.
  2. You’re allowed to push something to master without asking either, with the contract that you will be reverted if anyone finds a red flag (automation or human.)
  3. Anyone can build any component off any branch at any time they want. Usually this means that nobody is all that aggressive about getting onto master.
  4. An automated process runs the docker build command and publishes images tagged with their Git commit-shas. For now it only runs off master.
  5. If the component builds, it is considered “unit tested” — each component knows what it needs in order to test itself.
  6. A new VFI is generated with the last-known-good-state of working components, and this new component tag is updated in the VFI, and the VFI is submitted for proper QA.
  7. Assertions such as feature availability, stability, etc. are made over a VFI. The components really don’t know each other and don’t take version-based dependencies across each other. This makes VFI generation dirt-cheap.
  8. When we release, we look for the VFI that has the most assertions tagged around it (reviewed, error-free, warning-free, statically-verified, smoke-test-passed, patch-verified, etc. etc.)
  9. In the dev world, everyone is generating hundreds of VFIs per day, just trying out their various things. They don’t need to touch other components for the most part, and there is little dependency-churn.

I hope this post sheds some light on how we do versioning, why we do it this way, and what benefits we gain. I personally happen to think we lose almost none of the “assertions” we need to ship reliable, stable and predictable code, while simultaneously giving all developers a lot of freedom to experiment, test, prototype and have fun.