The Docless Manifesto

“undocumented” code is the new ideal

Documentation is necessary. It’s essential. It’s valuable. For all its purported benefits though, even towards the end of 2018, it remains relegated to a stringly-typed escape-hatch to English, at best.

Let’s dissect that problem statement below, but feel free to skip the section if you nodded in agreement.

The Problem: Documentation is a Stringly-Typed escape hatch to English (at best)

Stringly-Typing: A performance-review-safe way of saying “no typing”.

Stringly-typing is the result of two constraints placed on developers:

  1. We must be strongly typed; we’re not hipsters — we enforce strong opinions on consumers of our program/library/platform.
  2. We must be extensible; we don’t necessarily know that we have the right opinion, and we’re afraid of being abstracted away or replaced if we end up being wrong.

So how do you have an opinion that can never be wrong? You “enforce” the most generic opinion humanly possible:

// C/C++
void *compare(void *a, void *b);
// Java - but hey, at least primitive types are ruled out!
static Object compare(Object a, Object b); // inside class UnnecessaryNounHolderForFunction
// Golang
func compare(a interface{}, b interface{}) (interface{}, error)

There it is! Now we can never have a wrong opinion, but gosh darnit, we enforce the opinion we have — don’t you be sending stuff the programming language doesn’t support! We will not stand for barbarism.

How does anyone know what those functions do?

We can solve this problem in one of two ways:

  1. You can just read the source code and stay up-to-date on what it does at any given iteration (and make the code easy to read/comprehend.)
  2. Or we can double-down on how we got here in the first place: If we had a highly opinionated but extensible way to document what it does, we could document anything: Welcome to Code Comment Documentation!

The strong opinion: You can add a string of bytes (non-bytes will NOT be tolerated; we’re not barbaric) that can represent anything (extensible). It is left to the reader to interpret or trust anything said there — ASCII, Unicode, Roman script, Devanagari, English, French, Deutsch, whatever — it’s extensible, but at least it’s all bytes. We won’t parse it, store it, maintain it, or make any sense of it in the compiler — we will contractually ignore it.

Why don’t we program in English in the first place?

Past the low-hanging fruit about English (or any natural language) being specific to a region, culture, or people, there is a much deeper reason English is not used to write computer programs: English is semantically ambiguous.

It is remarkably difficult to get two people, let alone a machine, to make sense of an English statement in the exact same way. Which is why we build programming languages that define strict semantics for every statement.

So how does a program written in a deterministic programming language, whose meaning the reader couldn’t make sense of, become easier to comprehend when rewritten as a code comment in a language that fundamentally cannot convey precise meaning? It doesn’t, of course. But much like CPR, which works far less often than most people think, it is a form of therapy for the developer to feel like they did everything they could.

A second problem, which annoys me personally, is the blatant violation of DRY (Don’t Repeat Yourself). You write code twice — once in your programming language, and once again in English (at best — because that’s all I read). I usually have to read both if I’m to make any reliable bet on your system. Only one of the two is understood unambiguously by the computer, you, and me. The other is not understood by the computer at all, and you and I may not even interpret it the same way.

We can do better!

We have the benefit of hindsight. We have seen various constructs across programming languages. We can and should do better!

  1. Unit tests are spec and examples — they demonstrate where inputs come from, how to mock them, and what side effects should or should not happen. A unit test is at once a usage document and an assurance document (see the sketch just after this list).
  2. Better type-systems allow us to capture what we accept.
  3. Function/Parameter annotations can provide enumerable information to a caller.
  4. Function Clauses and Pattern Matching allow the same function to be better expressed in smaller chunks with limited context.
  5. Guards can merge input-validation and corresponding “what-I-don’t-accept” code comments into one — a machine-parsable enforceable spec + a beautiful document to read.
  6. Documentation as a first-class part of language grammar (that homoiconic influence from Lisp again.) In at least one system in Polyverse, we made code-comments enumerable through an interface (i.e. object.GetUsage() returns usage info, object.GetDefaultValue() returns a default value, etc. etc.) This means we can query a running compiled system for its documentation. We no longer have version-sync problems across code, or fear of loss of documentation. If all we have left is an executable, we can ask it to enumerate its own public structs, methods, fields, types, etc. and recreate a full doc for that executable.
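Go, for instance, already makes point 1 a first-class construct: an Example function in a test file shows up in godoc as usage documentation and is executed by go test, which fails if the printed output drifts from the declared output. A minimal sketch, assuming a hypothetical package intcmp with a Compare function:

// intcmp_test.go (the package and function names are made up for illustration)
package intcmp

import "fmt"

// ExampleCompare is at once a usage document and an assurance document:
// godoc renders it as example usage, and `go test` executes it, failing
// if the printed output ever drifts from the "Output:" comment below.
func ExampleCompare() {
    fmt.Println(Compare(3, 7))
    // Output: -1
}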

All of this information is available to us. We intentionally lose it. It annoys me every time I read someone else’s code. What else did they want to say? What else did they want to tell me? How is it that I force them to only communicate with cave paintings, and blame them for not explaining it better?

This led me to write:

The Docless Manifesto

A Docless Architecture strives to remove arbitrary, duplicated, string-based, disjoint meta-programming in favor of a system that is self-describing — that is, one that strives not to require documentation in the conventional sense.

The Docless Manifesto has two primary directives: Convey Intent and Never Lose Information.

Convey Intent

The question to ask when writing any function, program or configuration, is whether you are conveying intent, more so than getting the job done. If you convey intent, others can finish the job. If you only do the job, others can’t necessarily extract intent. Obviously, when possible, accomplish both, but when impossible, bias towards conveying intent above getting the job done.

Use the following tools whenever available, and bias towards building the tool when given a choice.

  1. Exploit Conventions: If you want someone to run make, give them a Makefile. If you want someone to run npm, give them a package.json. Convey the obvious using decades of well-known short-hand. Use it to every advantage possible.
  2. Exploit Types, Pattern-Matching, Guards, Immutable types: If you don’t want negative numbers, and your language supports it, accept unsigned values. If you won’t modify inputs in your function, ask for a first-class immutable type. If you may return a value, return a Maybe<Type>. Ask your language designers for more primitives to express intent (or a mechanism to add your own.)
  3. Make Discovery a requirement: Capture information in language-preserved constructs. Have a Documented interface that might support methods GetUsage, GetDefaultValue, etc. (see the sketch just after this list). The more information is in-language, the more tools you can build around it. More importantly, you ensure it is available in the program, not alongside the program. It is available at development time, compile time, runtime, debug time, etc. (yes, imagine your debugger looking for implementations of Documented and pulling usage docs for you in real time from a running executable.)
  4. Create new constructs when necessary: A “type” is more than simply a container or a struct or a class. It is information. If you want a non-nullable object, create a new type or alias. Create enumerations. Create functions. Breaking code out is much less about cyclomatic complexity and more about conveying intent.
    A great example of a created-construct is the Builder Pattern. At the fancy upper-end you can provide a full decision-tree purely through discoverable and compile-time checked interfaces.
  5. Convey Usage through Spec Unit-Tests, not a Readme: The best libraries I’ve ever consumed gave me unit tests as example usage (or non-usage). This has three benefits at once. First, if the tests pass, I know the examples are perfectly in sync without guessing (discoverability), and the developer knows exactly where to update the examples (more discoverability). Second, it allows me to test MY assumptions by extending those tests. The third and most important benefit of all: Your Spec is in Code, and you are ALWAYS meeting your Spec. There is no state where you have a Spec, separate and out of sync, from the Code.
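Here is a minimal sketch of what the Documented interface from point 3 might look like in Go. The names (Documented, GetUsage, GetDefaultValue, Timeout) are illustrative, not an actual Polyverse API:

package config

// Documented is implemented by anything that can describe itself at runtime.
type Documented interface {
    GetUsage() string
    GetDefaultValue() string
}

// Timeout is a configuration knob that carries its own documentation.
type Timeout struct {
    Seconds int
}

func (t Timeout) GetUsage() string        { return "Maximum time to wait for a response, in seconds." }
func (t Timeout) GetDefaultValue() string { return "30" }

// Describe enumerates usage text from anything that implements Documented;
// a CLI, a debugger, or the running executable itself could call it.
func Describe(items ...Documented) []string {
    docs := make([]string, 0, len(items))
    for _, item := range items {
        docs = append(docs, item.GetUsage()+" (default: "+item.GetDefaultValue()+")")
    }
    return docs
}

Because the information is in-language, the same interface serves generated docs, the debugger, and a live process equally well.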

Never Lose Information

This is the defining commandment of my Manifesto!

Violating this commandment borders on the criminal in my eyes. Building tools that violate this commandment will annoy me.

The single largest problem with programming isn’t that we don’t have information. We have a LOT of it. The problem is that we knowingly lose it because we don’t need it immediately. Then we react by adding other clumsy methods around this fundamentally flawed decision.

My current favorite annoyance is code-generation. Personally I’d prefer a 2-pass compiler where code is generated at compile-time, but language providers don’t like it. Fine, I’ll live with it. What bugs me is the loss of the semantically rich spec in the process. It probably contained intent I would benefit from as a consumer.

Consider a function that only operates on integers between 5 and 10.

When information is captured correctly, you can generate beautiful documents from it, run static analysis, do fuzz testing, unambiguously read it, and just do a lot more — from ONE definition.

typedef FuncDomain where FuncDomain in int and 5 <= FuncDomain <= 10.
function CalculateStuff(input: FuncDomain) output {
}

When information is instead captured for you (in English) and for the programming language (through input validation), you lose a lot of the above, and take on the additional burden of unit-testing the limits.

// Make sure input is between 5 and 10
function CalculateStuff(input: int) output {
    if (input < 5 OR input >= 10) {
        // throw error, panic, exit, crash, blame caller,
        // shame them for not reading documentation
    }
}

This borders on criminal. What happens when you pass it a 5? Aha, we have a half-open interval. Documentation bug? Make sure input is between 5 and 10, 5 inclusive, 10 exclusive. Or [5, 10). We just wrote a fairly accurate program. It’s a pity it wasn’t THE program.

We can certainly blame the programmer for not writing the input validation code twice. Or we may blame the consumer for having trusted the doc blindly (which seems to be like the Pirate’s Code… more what you’d call a guideline). The root problem? We LOST information we shouldn’t have — by crippling the developer! Then we shamed them for good measure.

Finally, the absolute worst is when you have no input validation at all, only code comments explaining what you can and cannot do.
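Even in a language without first-class range types, most of that information can still be kept in-language rather than in a comment. A minimal sketch in Go (names are illustrative): the only way to obtain a FuncDomain is through the constructor, so the 5-to-10 constraint is stated exactly once, in code.

package funcdomain

import "fmt"

// FuncDomain is an integer known to be between 5 and 10, inclusive.
// The field is unexported, so callers outside the package can only obtain
// a meaningful value through New; the constraint lives in code, not in a comment.
type FuncDomain struct {
    value int
}

// New is the single place the constraint is stated and enforced.
func New(v int) (FuncDomain, error) {
    if v < 5 || v > 10 {
        return FuncDomain{}, fmt.Errorf("FuncDomain must be between 5 and 10 inclusive, got %d", v)
    }
    return FuncDomain{value: v}, nil
}

// Int returns the validated value.
func (d FuncDomain) Int() int { return d.value }

// CalculateStuff no longer needs to re-validate or re-document its input range.
func CalculateStuff(input FuncDomain) int {
    return input.Int() * 2
}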

I could write books on unforgivable information-loss. We do this a lot more than you’d think. If I weren’t on the sidelines, Kubernetes would hit this nerve for me: you begin with strongly-typed structs, code-generated from a spec because the language refuses to give you generics or immutable structures, communicated through a stringly-typed API (for extensibility), giving birth to an entire cottage industry of side-channel documentation systems.

In a world where information should be ADDED and ENRICHED as you move up the stack, you face the opposite. You begin with remarkably rich specs and intentions, and you end up sending strings and seeing what happens.

So here’s my call to Docless! Ask yourself two questions before throwing some string of bytes at me and calling it “Documentation”.

  1. Have you taken every effort to convey intent?
  2. Have you missed an opportunity to store/capture/expose information that you already had before duplicating the same thing in a different format?

The SLA-Engineering Paradox

Why outcome-defined projects tend to drive more innovation than recipe-driven projects

In the beginning, there was no Software Engineering. But as teams got distributed across the world, they needed to communicate what they wanted from each other and how they wanted it.

Thus, Software Engineering was born, and it was…. okay-ish. Everything ran over-budget, over-time, and ultimately proved unreliable. Unlike construction, from which SE borrowed its craft, computing had nothing resembling the brick, a component unchanged since prehistory.

But in a small corner of the industry a group of utilitarian product people were building things that had to work now: they knew what they wanted out of them, and they knew what they were willing to pay for them. They created an outcome-driven approach. This approach was simple: they asked for what they wanted, and tested whether they got what they asked for. These outcomes were called SLAs (Service Level Agreements) or SLOs (Service Level Objectives). I’m going to call them SLAs throughout this post.

The larger industry decided that it wanted to be perfect. It wanted to build the best things, not good-enough things. Since commoners couldn’t be trusted to pick and choose the best, it would architect the best, and leave commoners only to execute. It promoted a recipe-driven approach. If the recipe was perfectly designed, perfectly expressed, and perfectly executed, the outcome would, by definition, be the best possible.

The industry would have architects to create perfect recipes, and line cooks who would execute them. All the industry would need to build were tools to help those line cooks. This led to Rational Rose, UML, SOAP, CORBA, XML DTDs, and to some extent SQL, but more painfully Transact-SQL, stored procedures, and so on.

The paradox of software engineering over the past two decades is that, more often than not, people who specified what they wanted without regard to recipes tended to produce better results, and create more breakthroughs and game changers, than those who built the perfect recipes and accepted whatever resulted from them as perfect.

(I can hear you smug functional-programming folks. Cut it out.)

Today I want to explore the reasons behind this paradox. But first… let’s understand where the two camps came from intuitively.

The intuitive process to get the very best

If you want to design/architect the most perfect system in the world, it makes sense to start with a “greedy algorithm”. At each step, a greedy algorithm chooses the absolute very best technology, methodology and implementation the world has to offer. You vet this thing through to the end. Once you are assured you could not have picked a better component, you use it. As any chef will tell you, a high-quality ingredient cooked with gentle seasoning will outshine a mediocre ingredient hidden by spices.

The second round of iteration is trading individual best-in-class picks for better-together groups. The perfect burger might be more desirable with mediocre potato chips than with the world’s greatest cheesecake. In this iteration you give up the perfection of one component for the betterment of the whole.

In technology terms, this would mean that perhaps a SQL server and web framework from the same vendor go better together, even if the web server is missing a couple of features here and there.

These decisions, while seemingly cumbersome, would only need to be made once, and then you were unbeatable. You could not be said to have missed anything, overlooked anything, or given up any opportunity to provide the very best.

What you now end up with, after following this recipe, is the absolute state-of-the-art solution possible. Nobody else can do better. To quote John Hammond, “spared no expense.”

SLA-driven engineering should miss opportunities

The SLA-driven approach of product design defines components only by their attributes — what resources they may consume, what output they must provide, and what side-effects they may have.

The system is then designed based on an assumption of component behaviors. If the system stands up, the components are individually built to meet very-well-understood behaviors.

Simply by looking at the SLAs, you can determine whether a component is even possible to build. Like a home cook, you taste everything while it is being made, and course-correct. If the overall SLA is 0.2% salt and you’re at 0.1%, you add 0.1% salt in the next step.

The beauty of this is that when SLAs can’t be met, you get the same outcome as recipe-driven design: the teams find the best the world has to offer and tell you what it can do — you can do no better. No harm done. In the best case, however, once they meet your SLA, they have no incentive to go above and beyond. You get mediocre.

This is a utilitarian industrial system. It appears that there is no aesthetic sense, art, or even effort to do anything better. There is no drive to exceed.

SLA-driven engineering should be missing out on all the best and greatest things in the world.

What we expected

The expectation here was obvious to all.

If you told someone to build you a website that works in 20ms, they would do the least possible job to make it happen.

If you gave them the perfect recipe instead, you might have gotten 5ms. And even if you got 25ms or 30ms, as long as they followed your recipe faithfully, you had, by definition, hit theoretical perfection.

A recipe is more normalized; it is compact. Once we capture that it will be a SQL server with password protection, security groups, roles, principals, user groups, onboarding and offboarding processes, everything else follows.

On the other hand, the SLA folks are probably paying more and getting less, and what’s worse, they don’t even know by how much. Worse still, they are also defining de-normalized SLAs: if they fail to mention something, they don’t get it. Aside from salt, can the chef add or remove arbitrary ingredients? Oh, the horror!

The Paradox

Paradoxically we observed the complete opposite.

SLA-driven development gave us Actor Models, NoSQL, BigTable, Map-Reduce, AJAX (what “web apps” were called only ten years ago), Mechanical Turk, concurrency, and most recently, Blockchain. Science and tech that was revolutionary, game-changing, elegant, simple, usable, and widely applicable.

Recipe-driven development on the other hand kept burdening us with UML, CORBA, SOAP, XML DTDs, a shadow of Alan Kay’s OOP, Spring, XAML, Symmetric Multiprocessing (SMP), parallelism, etc. More burden. More constraints. More APIs. More prisons.

It was only ten years ago that “an app in your browser” would have led to Java Applets, Flash, Silverlight, or some such “native plugin.” Making ECMAScript faster would never have been a recipe-driven outcome.

The paradoxical pain is amplified when you consider that recipes didn’t give us provably correct systems, and failed abysmally at empirically correct systems that the SLA camp explicitly checked for. A further irony is that under SLA constraints, provable correctness is valued more — because it makes meeting SLAs easier. The laziness of SLA-driven folks not only seems to help, but is essential!

A lot like social dogma, when recipes don’t produce the best results, we have two instinctive reactions. The first is to dismiss the SLA-solution through some trivialization. The second is to double-down and convince ourselves we didn’t demand enough. We need to demand MORE recipes, more strictness, more perfection.

Understanding the paradox

Let’s try to understand the paradox. It is obvious when we take the time to understand context and circumstances in a social setting where a human is a person with ambitions, emotions, creativity and drive, vs. an ideal setting where “a human” is an emotionless unambitious drone who fills in code in a UML box.

SLAs contextualize your problem

This is the obvious easy one. If you designed web search without SLAs, you would end up with a complex, distributed SQL server with all sorts of scaffolding to make it work under load. If you first looked at SLAs, you would reach a radically new idea: these are key-value pairs. Why do I need B+ trees and complex file systems?

Similarly, if you had asked, “What is the best system to do parallel index generation?”, the state of the art at the time was PVM/MPI. Having pluggable Map/Reduce operations was considered so cool that every resume in the mid-2000s had to mention these functions (something so common and trivial today that your average JavaScript web developer uses it to process streams.) The concept of running a million cheap machines was neither the best nor the state of the art. 🙂 No recipe would have arrived at it.

SLAs contextualize what is NOT a problem

We all carry this implied baggage with us. Going by the examples above, today none of us consider using a million commodity machines to do something as being unusual. However, with no scientific reason whatsoever, it was considered bad and undesirable until Google came along and made it okay.

Again, having SLAs helped Google focus on what is NOT a problem. What is NOT the objective. What is NOT a goal. This is more important than you’d think. The ability to ignore unnecessary baggage is a powerful tool.

Do you need those factories? Do you need that to be a singleton? Does that have to be global? SLAs fight over-engineering.

SLAs encourage system solutions

What? That sounds crazy! It’s true nonetheless.

When programmers were told they could no longer have buffer overflows, they began working on Rust. When developers were told to solve the dependency problem, they flocked to containers. When developers were told to make apps portable, they made Javascript faster.

On the other hand, recipe-driven development, not being forced and held to an outcome, kept going the complex way. Its solution to buffer overflows was more static analysis, more scanning, more compiler warnings, more errors, more interns, more code reviews. Its solution to the dependency problem was more XML-based dependency manifests and all the baggage of semantic versioning. Its solution to portable apps was more virtual machines, yet one more standard, more plugins, etc.

Without SLAs Polyverse wouldn’t exist

How much do I believe in SLA-driven development? Do I have proof that the approach of first having the goal and working backwards works? Polyverse is living proof.

When we saw the Premera hack, if we had wondered what the BEST in the world was, we’d have thought of even MORE scanning, MORE detection, MORE AI, MORE neural networks, and maybe some MORE blockchain. Cram in as much as we can.

But when we asked ourselves “If the SLA is to not lose more than ten records a year or bust,” we reached a very different systemic conclusion.

When Equifax happened, if we’d asked ourselves what the state of the art was, we would have done MORE code reviews, MORE patching, FASTER patching, etc.

Instead we asked, “What has to happen to make the attack impossible?” We came up with Polyscripting.

Welcome to the most counterintuitive, yet widely and empirically proven, widely used conclusion in Software Engineering. Picking the best doesn’t give you the best. Picking a goal leads to something better than anything the world has to offer today.

Google proved it. Facebook proved it. Apple proved it. Microsoft proved it. They all believe in it. Define goals. Focus on them. Ignore everything else. Change the world.

Automatic Mitigation of Meltdown

Let’s look at what Meltdown is and how it works, as well as how it is stopped. A lot has been written about the Meltdown vulnerability, but it is still commonly misunderstood. A few diagrams may help.

First, let’s consider a simplified memory hierarchy for a computer: main memory, split into user memory and kernel memory; the cache (typically on the CPU chip); and then the CPU itself.


The bug is pretty simple. For about two decades now, processors have had a flag that tells them what privilege level a certain instruction is running in. If an instruction in user space tries to access memory in kernel space (where all the important stuff resides), the processor will throw an exception, and all will be well.

On certain processors though, the speculative executor fails to check this bit, thus causing side-effects in user space (caching of a page), which the user space instructions can test for. The attack is both clever and remarkably simple.

Let’s walk through it graphically. Assume your memory starts with this flushed cache state — nothing sits in the cache right now (the “flush” part of what is a “flush-reload” attack):


Step 1: Find cached pages

First let’s allocate 256 pages on the user space that we can access. Assuming a page size of 4K, we just allocate 256 times 4K bytes of memory. It doesn’t matter where those pages reside in user-space memory, so long as we got the page size correct. In C-style pseudo-code:

char userspace[256 * 4096];

I’ll mark those in the userspace diagram — for brevity, I’ll only show a few pages, and I’m going to show cached pages popped up like this:


This allows for easier reading (and easier drawing for me!).

So let’s start with an empty (flushed) cache:


We know what the cache state would be if we accessed a byte in page 10. Since any byte in page 10 would do the trick, let’s just use the very first byte (at location 0).

The following code accesses that byte:

char dummy = userspace[10 * 4096];

This leads the state to be:


Now what if we measured the time to access each page and stored it?

int accessTimes[256];
for (int i = 0; i < 256; i++) {
    uint64_t t1 = now(); // now() stands in for a high-resolution timer (e.g. rdtsc)
    char dummy = userspace[i * 4096];
    uint64_t t2 = now();
    accessTimes[i] = t2 - t1;
}

Since page 10 was cached, page 10’s access time would be significantly faster than all other pages which need a roundtrip to main memory. Our access times array would look something like this:

accessTimes = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 10, 100, 100....];

The value at index 10 (page 10) is an order of magnitude faster to access than anything else. So page 10 was cached, whereas the others were not. Note though that all of the pages did get cached as part of this access loop. This is the “reload” part of the flush-reload side-channel — because we reloaded all pages into the cache.

At this point we can figure out which pages are cached with ease if we flush the cache, allow someone else to affect it, then reload it.

Step 2: Speculate on kernel memory

This step is easy. Let’s assume we have a pointer to kernel memory:

char *kernel = (char *) 0x1000; // or whatever the case is

If we tried to access it using an unprivileged instruction, it would fail — our user space instructions don’t have a privileged bit set:

char important = kernel[10];

Speculating this is easy. The instruction above would speculate just fine. It would then throw an exception, which would cause us to never get the value of important.

Step 3: Affect userspace based on speculated value

However, what happens if we speculated this?

char dummy = userspace[kernel[10] * 4096];

We know userspace has 256 * 4096 bytes — we allocated it. Since we’re only reading one byte from the kernel address, the maximum value is 255.

What happens when this line is speculated? Even though the processor detected the segmentation fault and prevented you from reading the value, did you notice that it cached the user-space page? The page whose number was the value of kernel memory!

Suppose the value of kernel[10] was 17. Let’s run through this:

  1. The processor speculatively read kernel[10]. That value was 17.
  2. The processor then dereferenced the 17th 4K-wide page in the array “userspace”: userspace[17 * 4096]
  3. The processor detected that you weren’t allowed to access kernel[10], raised a fault, and refused to hand you the result. Bad programmer!
  4. The processor did not, however, undo the caching of that userspace page. It’s not going to let you read kernel memory out of the cache directly, though. It’s got your back…

What was the state of cache at the end of this?


That’s cool! Using Step 1, we would find page 17’s access time to be the fastest — by a large margin over the others! That tells us the value of kernel[10] was 17, even though we never (architecturally) read kernel[10]!

Pretty neat, huh? By walking kernel memory byte by byte, we can recover the value at every kernel address, just by observing which cache pages get affected.

What went wrong? How are we fixing it?

Meltdown is a genuine “bug” — the flaw is not the cache side-channel itself. The bug is straightforward — CPU speculative execution should not cross security boundaries — and ultimately it should be fixed in the CPU itself.

It’s not the cache that’s misbehaving — even though that’s where most operating-system vendors are applying their fix. More precisely, they are attempting to further isolate kernel and userspace memory using something called Kernel Page Table Isolation (KPTI), previously called KAISER. It maps only a few “stub” kernel pages into the process’s virtual memory, keeping the rest of the kernel out (and thus unreachable by the speculative execution engine).


Unfortunately, this separation comes at a cost — accessing kernel memory now requires more expensive hardware-assisted transitions.

Polymorphic Linux stops ROP attacks; increases difficulty of others

Since Polymorphic Linux was intended to stop ROP attacks dead in their tracks, ROP attacks in kernel space are defeated by using polymorphic kernels. This matters especially when KASLR (kernel address space layout randomization) is defeated (which is so trivial that the Meltdown paper leaves it as an exercise for the reader).

Furthermore, since polymorphic binaries have different signatures, layouts, instructions and gadgets, they make crafting further attacks at least an order of magnitude more difficult. Polymorphic binaries force an extra step of analysis and understanding per binary. This means that a lateral attack (one that moves from machine to machine in a network) becomes much harder.

Look out for my next post on Spectre. It’s a bit more difficult to explain and definitely harder than Meltdown to craft…

ASLR simplified!

ASLR explained in one simple picture

ASLR increases difficulty without adding complexity. In Part 1 and Part 2 of this series I demonstrated that crafting attacks can be a pleasant experience without a lot of furious typing. I’ve even shown you how defeating exploits is easy when we really understand how the attack works. Let’s dive deeper into ASLR, your first line of defense.


Let me explain what you’re seeing in this picture. I loaded a CentOS 7.2 libc-2.17, which we crafted an attack against in my previous post. When I loaded the exact same file on the right, I did it with an offset of 16 bytes (10 in hexadecimal).

I’m adding features to the tool when I need them for the story.

I picked 16 (hex 10) because it produces easy-to-interpret, uniform offsets across all addresses.


You’ll notice how the binary on the right is the same binary as on the left, but it’s moved (which is why the lines are all orange.) The gadgets still exist intact but they’re in a different location. Let’s tabulate the first 5 addresses:

1807d1 + 0x10 = 1807e1
1807f1 + 0x10 = 180801
1807b1 + 0x10 = 1807c1
1a44cc + 0x10 = 1a44dc
1770b0 + 0x10 = 1770c0

This is clever because, as you saw in the title image, if we try to execute our ROP chain, c6169 c7466 1b92, it works on the original binary, but it falls flat on the offset one.


In a nutshell, this is what ASLR does! If we offset the same library differently (and unpredictably) for every program on a machine, the chances that the same attack would work or spread are very low.

Remember, security is not about complexity and two people typing furiously on keyboards, entertaining as that is. Security is about doing what is necessary and sufficient to defeat an attack vector.

How is this movement possible?

Offsets are easy because, right around the time virtual memory became a thing with the i386, we moved away from segmented memory to paged memory. Operating systems, processors and compilers all came together to work on an offset model. This was originally intended not for security, but to let programs see a really large memory space when physically they would only ever use a little bit of it. It allowed every program to work from memory address 0 through MAX, and the operating system would map that to something real.

ASLR makes use of what already existed, which is why any program compiled for a modern operating system automatically benefits from it.

Can we discover more?

I’m particularly proud of this disassembler because you’re not looking at some block diagram I drew in Photoshop or your favorite visualizer program. You’re looking at a real binary of your choice that you uploaded, and you can now watch these offsets, gadgets and chains at work. This is ASLR on real gadgets in action!

The cliffhanger for this post is to figure out what techniques you might use to discover the offset… remember there’s only one piece of information we need to jump to any ROP location in the offset binary. All I would have to do is add 0x10 to each address in my chain, and I broke ASLR. Like so: c6179 c7476 1ba2


This gave me an idea. You’ll notice that somehow a pop rdi ; ret still shows up at the same address even in the offset library! Can we find more gadgets in common?

I filtered the offset library to show surviving gadgets, and some 2,279 gadgets survived.


I have to admit, I sometimes rig these posts to tell a story, but this one caught me off guard. I discovered that an offset alone isn’t enough, and that a sufficiently LARGE offset is needed if a lot of gadgets occur consecutively. This was crazy!

So the second cliffhanger for today is… given that they ALL offset by a fixed amount, is it possible to infer the offset trivially? The answer is of course yes, since the video in Part 2 demonstrated it happening. It’s one thing to read a dry answer and another to intuitively understand it.

Next up I’ll see if I can’t easily figure out an intuitive way to find the offset. I’m basically solving these problems as I write them — this is not some planned series. My team wanted this tool for some other demo, but it ended up being so much fun, I started writing these posts. So I honestly don’t know if I have an answer for intuitive offset-discovery.

Fun with binaries!

ASLR and DEP defeated with three instructions and one offset!

This is Part 2 of my previous post that demonstrated how you craft undetectable attacks against binaries, using our colorful Open Source Entropy Visualization tool. I left you with a cliffhanger… so let’s begin there!

Recap of the cliffhanger

The cliffhanger I left you with was that all we need are three tiny ROP gadgets, and the offset of mprotect, to make any arbitrary part of memory executable. First, I present my proof:

This is a video by Roy Sundahl, one of our most senior engineers, and our resident ROP expert who spends a lot of his time figuring out offensive tools.

Before we proceed, if you’re wondering why we can’t just block calls to mprotect, it turns out there’s some truth to Greenspun’s tenth rule. Let’s forgo the obvious candidates like interpreters and JITers. I learned that the tiniest of programs that might use regular expressions will need to call mprotect — including the innocuous “ls”.

Let’s cast a wider net!

Okay that exploit was cool, and you can do this for yourself by finding gadgets across all the libc’s in the samples.

But can we do more? Can we easily go after a range of machines *without* knowing a target signature? Let’s find out!

Here I’m comparing the same “version” of libc across CentOS 7.1 and 7.2. For a quick reference, on the right, rows with a red background are gadgets that survived perfectly, yellow background are gadgets that exist but at a different location, and no background are gadgets that didn’t exist in the first file.

We found some 2,503 gadgets common across them. Notice how little variation there is when the code was compiled at two different times, from what are probably two slightly different sources. The more gadgets that fall on the same addresses, the easier it is for us to cast a wide net, since it requires that many fewer custom craftings to go after a binary. To determine whether your exploit will work across both, first filter the right side by “Surviving Gadgets”, and then search for the gadgets you want.

Let’s try that across CentOS 7.1 and 7.2. First up, pop rdi ; ret? Yep! There it is! The first common address is: c6169.

Second up, pop rsi ; ret? Yep! There it is also! First common address is: c7466.

Finally, pop rdx ; ret? Yep! The first surviving address is: 1b92.

We got our complete ROP chain across both binaries: c6169 c7466 1b92. We can validate this by simulating execution across both binaries.

Now you know the complete power of the tool!

This is what the tool is intended to do! You can verify ROP chains across binaries without ever leaving your browser. You can now tell, visually and graphically, whether a particular attack will work against a given binary you run. It can be used to craft attacks, but it can also be used to ensure that a patch really worked.

There’s a bit of emotional comfort when you can execute a chain visually, see how the flow jumps around, and see that it doesn’t work.

Are Overflows/Leaks that common?

All this depends of course, on you being able to manipulate some little bit of stack space. Aren’t overflows so…. 2000s? We use bounds-checked modern languages that don’t suffer from these problems.

First of all, if you subscribe to our weekly breach reports, you’ll empirically find that overflows and memory leaks are pretty common. Even the internet’s favorite language, Javascript, is not immune.

Secondly, my best metric to find truth is to look for back-pressure (the sociological version of proof-by-contradiction). Look out for attempts at locking this down 100%, and then follow the backlash.

However, I also want you to get an intuitive understanding of where they arise and why they happen.

Even I have to admit that certain operations (such as sorting or XML/JSON parsing) are better implemented by manipulating memory buffers directly, despite my well-publicized extremist views favoring immutable data and list comprehensions.

So what does a “real overflow” look like? (Code in the samples directory.)

#include <stdio.h>
#define BUF_LEN 20
int main()
{
    char buf[BUF_LEN];
    int i = 0;
    while (i++ < BUF_LEN) {
        printf("Setting buf[%d] to zero.\n", i);
        buf[i] = 0; /* when i reaches BUF_LEN, this writes one byte past the end of buf */
    }
}

I just overwrote a byte on the stack frame. It’s obvious when I point it out. If you were working on this code and not looking for overruns, this is easy to miss. Ever seen the college textbook example of a quicksort using while-loops to avoid using the system stack? They are liberal with while(1)s all over the place.

Personal Rant: They are very common, and they are insanely difficult to find. This is why I’m such an extremist about immutability, list comprehensions, symbolic computation. For your business apps, you should NEVER, unless under extreme exceptions, listen to that “clever” developer who is doing you the favor of writing efficient code. Pat them on the back. Give them a promotion or whatever. Get them out of the way. Then find a lazy person who’ll use list-comprehensions and copy-on-change wherever possible! I’m a big believer in Joe Armstrong’s advice here: First make it work. Then make it beautiful. Finally, if necessary, make it fast.

In our analyses, more than 65% of critical CVEs since June 1st fell under this category. I could be off by a few points on that number since it changes as we compile our reports periodically and tweak how we classify them. But it’s well over 60%.

Putting it all together

In Part 1, I showed you what ROP gadgets are, how to find them, chain them, and exploit them.

In Part 2, I completed the story by demonstrating how to find common gadgets across a wide array of deployed binaries.

The purpose of the Entropy Visualizer is to enable all this decomposition in your browser. In fact this is an easier tool than most ROP finders I know. 🙂

Happy Hunting!

Let’s craft some real attacks!

If you read security briefings, you wake up every morning to “buffer overflow” vulnerabilities, “control flow” exploits, crafted attacks against specific versions of code, and whatnot.

Most of those descriptions are bland and dry. Moreover, much of it makes no intuitive sense, everyone has their fad of the week, and it is easy to feel disillusioned. What’s real, and what’s techno-babble? Didn’t we just pay for the firewalls and deploy the endless stream of patches? What is with all this machine-code nonsense?

A gripe I’ve always had with our industry is that the first solutions we come up with are architectural ivory towers. We try curing cancer on day one, and then in a few years we would sell our soul just to be able to add two numbers reliably. (Yeah, I’m still holding a grudge against UML, CORBA, SOAP, WSDL, and oh for god’s sake — DTDs!)

Let’s skip all that and actually begin by crafting a real attack visually and interactively! No more concepts. No more theory. No more descriptions of instruction set layouts and stacks and heaps! Liberal screenshots to follow! Brace yourself! This is as colorful as binaries will ever get!

Let’s play attacker for a bit

Intro to Tools

Let’s start by visiting this tool I wrote specifically for this blog post, and open a binary.

https://analyze.polyverse.io

(Source code here: https://github.com/Polyverse/binary-entropy-visualizer)

Every time I build a web app, I end up putting a CLI in there.

Now you can drag-drop a file on there to analyze it — yeah, that web page is going to do what advanced geeky nerdy tools are supposed to do on your desktop. For now it only supports Linux 64-bit binaries. Don’t look too hard; there are two samples provided in my GitHub repo: https://github.com/polyverse/binary-entropy-visualizer/tree/master/samples. Simply download either of the files ending in “.so”.

When you throw it on there, it should show you a progress bar with some analysis…..

Getting this screenshot was hard — it analyzes quickly.

If you want to know what it’s doing, click on the progress bar to see a complete log of actions taken.

Proof: Despite my best attempts, I hid a CLI in there for myself.

When analysis is complete, you should see a table. This is a table of “ROP gadgets.” You’re witnessing a live analysis in your browser of what people with six screens in dark rooms run with complex command lines and special programs.

But wait.. what about those other two sections?

We won’t go into what ROP gadgets are, what makes them a gadget and so on. Anyone who’s ever gone through Programming 101 will recognize it as “Assembly Language code”, another really fun thing that is always presented as dry and irritating. It’s also everywhere.

What is an exploit?

Execution of unwanted instructions

In the fashion of my patron saints, MacGyver (the old one) and the Mythbusters, I am not going to go into how you find a buffer overrun and get to inject stuff onto a stack and so on. Sorry. Plenty of classes online to learn how to do that, or you might want to visit Defcon.

Let’s just assume you have a process with a single byte buffer overrun. This isn’t as uncommon as you’d think. Off-by-one errors are plentiful out there. Sure, everyone should use Rust, but didn’t I just rant about how we all want to be “clever” and struggle to plug holes later?

Let’s simply accept that an “exploit” is a set of commands you send to a computer to do what you (the attacker) wants, but something the owner/developer/administrator (the victim) definitely does not want. No matter what name the exploit goes under, at the end of the day it comes down to executing instructions that the attacker wants, and the victim doesn’t. What does stealing/breaking a password do? Allow execution. What does a virus do? Executes instructions. What does SQL-injection do? Executes SQL instructions.

Remember this: execution of unwanted instructions is bad.

Always know what you want

We want a specific set of instructions to run, given below.

Okay let’s craft an exploit now. We’re going to simulate it. All within the browser.

Let’s say for absolutely arbitrary reasons that running the following instructions makes something bad happen. WOPR starts playing a game. Trust me: nobody wants that! You don’t have to understand assembly code. In your mind, the following should translate to, “Later. Let’s play Global Thermonuclear War.”

jbe 0x46c18 ; nop ; mov rax, rsi ; pop rbx ;
add byte ptr [rax], al ; add bl, ch ; cmpsb byte ptr [rsi], byte ptr [rdi] ; call rax
and al, 0xf8 ; 
mov ecx, edx ; cmp rdx, rcx ; je 0x12cb78 ; 
jne 0x1668e0 ; add rsp, 8 ; pop rbx ; pop rbp ; 
sub bl, bh ; jmp qword ptr [rax]
add byte ptr [r8-0x77], r9b ; fimul dword ptr [rax-0x77] ; 
or byte ptr [rdi], 0x94 ; 
push r15 ; in eax, dx ; jmp qword ptr [rdx]
jg 0x95257 ; jne 0x95828 ; 
jb 0x146d9a ; movaps xmmword ptr [rdi], xmm4 ; jmp r9
or byte ptr [rdx], al ; add ah, dl ; div dh ; call rsp
jg 0x97acb ; movdqu xmmword ptr [rdi + 0x10], xmm2 ; 
or dword ptr [rax], eax ; add byte ptr [rax], al ; add byte ptr [rax], al ; 
add byte ptr [rax], al ; enter 8, 0 ; 
xor ch, ch ; mov byte ptr [rdi + 0x1a], ch ;

So how do we do it? The most effective ways to do this are social engineering, spearphishing, password-guessing, etc. They are also ways that leave traces. They are effective and blunt, and, with enough data, they will be caught. Also, look at that code. Once someone figures out that this set of instructions causes bad things, it is easy to generate a signature to find any bits of code that match it, and prevent it from running.

But I wouldn’t be writing this post if that was the end of it.

Just because you can’t inject this code through those other methods doesn’t mean you can’t inject code that will cause this series of instructions to be executed. AI/analytics/machine learning: all suffer from one big flaw — the Turing Test.

A program isn’t malicious because it “has bad instructions.” There’s no such thing as “bad instructions”. Why would processors, and machines and servers and phones ship with “bad instructions?” No, there are bad sequences of instructions!

A program doesn’t necessarily have to carry the bad sequence within itself. All it has to do is carry friendly good sequences, which, on the target host, lead to bad sequences getting executed. If you haven’t guessed already, this behavior may not necessarily be malicious; it might even be accidental.

How to get what you want

Now go back to the tool if you haven’t closed it. Use the file “libc-2.17.so” from the samples, and load it.

Then enter this sequence of numbers in the little text box below “ROP Chain Execution:”

46c1c 7ac3f 46947 12cb5f 166900 183139 cfdcb 12f7ea 191614 95236 146d8a 1889ad 97abb 4392 17390e 98878

It should look something like this:


Go ahead and execute the chain.


Well guess what? An exact match to my instructions to activate WOPR!

The libc that you just analyzed is a fundamental and foundational library linked into practically any and every program on a Linux host. It is checked and validated and patched. Each of those instructions is a good instruction — approved and validated by the processor-maker, the compiler-maker, the package manager all the way down to your system administrator.

What’s a REAL sequence of bad instructions?

pop rdi, pop rsi, pop rdx, and offset of mprotect is all it takes!

I made up the sequence above. In a complete break from convention, I made it more complex just so it’d look cool. Real exploits require gadgets so simple, you’ll think I’m making this part up!

A real, known, dangerous exploit we simulated in our lab requires only three ROP gadget locations, and the offset to mprotect within libc. We can defeat ASLR remotely in seconds, and once we call mprotect, we can make anything we want executable.

You can see how easy it is to “Find Gadget” and create your own chain for:
pop rdi ; ret
pop rsi ; ret
pop rdx ; ret

This illustrates how simple exploits hide behind cumbersome tools, giving the illusion of difficulty or complexity.
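For intuition on why those three gadgets plus the offset of mprotect are all it takes: on x86-64 Linux, the first three integer arguments of a call travel in rdi, rsi and rdx, which is exactly what mprotect(addr, len, prot) needs. Here is a conceptual sketch of how such a chain lines up on the stack; the pop-gadget offsets are the ones quoted above, while the libc base, the mprotect offset and the payload address are placeholders, not real values:

package main

// Gadget offsets found above in the CentOS libc-2.17 sample.
const (
    popRdiRet = 0xc6169 // pop rdi ; ret
    popRsiRet = 0xc7466 // pop rsi ; ret
    popRdxRet = 0x1b92  // pop rdx ; ret
)

// buildChain mirrors the stack layout of the chain described in the text.
// libcBase, mprotectOffset and payloadPage are placeholders; this is
// illustration only, not a working exploit.
func buildChain(libcBase, mprotectOffset, payloadPage uint64) []uint64 {
    const protRWX = 0x7 // PROT_READ | PROT_WRITE | PROT_EXEC
    return []uint64{
        libcBase + popRdiRet, payloadPage, // rdi = address of the region to make executable
        libcBase + popRsiRet, 0x1000,      // rsi = length (one page)
        libcBase + popRdxRet, protRWX,     // rdx = new protection flags
        libcBase + mprotectOffset,         // finally "return" into mprotect(rdi, rsi, rdx)
    }
}

func main() {}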

Crafting your own real, serious payloads

So why is this ROP analyzer such a big deal? If you haven’t put two and two together, an exploit typically works like this:

  1. You figure out what you want (we covered this step above).
  2. You need to figure out a sequence of instruction groups, all ending with some kind of a jump/return/call that you can exploit to get the intermediate instructions in between executed.

Turns out that step 2 is not so easy. You need to know what groups of instructions you have to play with, so you can craft chains of them together.

This tool exports these little instruction-groups (called gadgets) from the binaries you feed it. You can then solve for which gadgets, in what sequence, will achieve your goal.

This is a complex computational problem that I won’t solve today.

Look out for Part 2 of my post which will go into what the other “Compare File” dialog is for… stay tuned! It’s dead trivial to figure out, anyway, so go do it if you want.