Semantic Versioning has failed Agile

This is an Engineering post on how we build software at Polyverse, what processes we follow and why we follow them.

A couple of weeks ago, I attended a CoffeeOps meetup at Chef HQ. One of my answers detailing how we do agile, CI/CD, etc. got people excited. That prompted me to describe in detail exactly how our code is built, shipped, and how we simplified many of the challenges we saw in other processes. It should be no surprise that we make heavy use of Docker for getting cheap, reliable, and consistent environments.

Dropping Irrelevant Assumptions

I first want to take a quick moment to explain how I try to approach any new technology, methodology or solution, so that I make the best use of it.

Four years ago, when we brought Git into an org, a very experienced and extremely capable engineer raised their hand and asked, “I’ve heard that Git doesn’t have the feature to revert a single file back in history. Is this true? If it is true, then I want to understand why we are going backwards.”

I will never forget that moment. As a technical person, truthfully, that person was absolutely RIGHT! However, moving a single file backwards was something we did because we didn't have the ability to cheaply tag the "state" of a repo, so we built up terrible habits such as "branch freezes", timestamp-based checkouts, "gauntlets", and so on. It was one of the most difficult questions to answer without turning the asker antagonistic and without sounding like I was evading the issue.

I previously wrote a similar answer on Quora about Docker and why the worst thing you can do is to compare containers to VMs.

It is very dangerous to stick to old workarounds when a paradigm shift occurs. Can we finally stop it with object pools for trivial objects in Java?

What problems were we trying to solve?

We had the same laundry list of problems nearly any organization of any type (big, small, startup, distributed, centralized, etc.) has:

  1. First and foremost, we wanted to be agile, as an adjective and a verb, not the noun. We had to demonstrably move fast.
  2. The most important capability we really wanted to get right was the ability to talk about a “thing” consistently across the team, with customers, with partners, with tools and everything else. After decades of experience, we’d all had difficulty in communicating precise “versions” of things. Does that version contain a specific patch? Does it have all the things you expect? Or does it have things you don’t expect?
  3. We wanted to separate the various concerns of shipping and pay the technical debt at the right layer at all times: developer concerns, QA assertions, and release concerns. Traditional methods (even those used in "agile") were too entangled for the modern tooling we had. For example, "committing code" is precisely that — take some code and push it. "Good code" is a QA assertion — regardless of whether it is committed or not. "Good enough for customers" is a release concern. When you have powerful tools like Docker and Git, trying to mangle all three into some kind of "master" or "release" branch seemed medieval!
  4. We wanted no “special” developers. Nobody is any more or less important than anyone else. All developers pay off all costs at the source. One person wouldn’t be unfairly paying for the tech debt of another person.
  5. We believed that “code is the spec, and the spec is in your code”. If you want to know something, you should be able to consistently figure it out in one place — the code. This is the DRY principle (Don’t Repeat Yourself.) If code is unreadable, make it readable, accessible, factored, understandable. Don’t use documentation to cover it up.
  6. We wanted the lowest cognitive load on the team. This means use familiar and de-facto standards wherever possible, even if it is occasionally at the cost of simplicity. We have a serious case of the “Not-Invented-Here” syndrome. 🙂 We first look at everything that’s “Not-Invented-Here”, and only if it doesn’t suit our purpose for a real and practical problem, do we sponsor building it in-house.

Now let’s look at how we actually build code.

Content-based non-linear versioning

The very fundamental foundation of everything at Polyverse is content-based versioning. A content-based version is a cryptographically secure hash over a set of bits, and that hash is used as the “version” for those bits.

This is a premise you will find everywhere in our processes. If you want to tell your colleague what version of Router you want, you’d say something like: router@72e5e550d1835013832f64597cb1368b7155bd53. That is the version of the router you’re addressing. It is unambiguous, and you can go to the Git repository that holds our router, and get PRECISELY what your colleague is using by running git checkout 72e5e550d1835013832f64597cb1368b7155bd53.

This theme also carries over to our binaries. While there is a semantic version in there, you'll easily baffle anyone on the team if you ask them for "Router 1.0.2". Not that it is difficult to look up, but that number is just a text string anyone could have placed there, and as a mental model it makes everyone a little uneasy. Culturally we simply aren't accustomed to talking in imprecise terms like that. You'd be far more comfortable asking for the Router with sha 5c0fd5d38f55b49565253c8d469beb9f3fcf9003.

Philosophically we view Git repos as "commit-clouds". The repos are just an amorphous cloud of various commit shas. Any and every commit is a "version". You'll note that this is not only an important way to talk about artifacts precisely; more importantly, it truly separates concerns. There is no punishment for pushing arbitrary amounts of code to Git on arbitrary branches. There is no fear of rapidly branching and patching. There is no cognitive load for quickly working with a customer to deliver a rapid feature off a different branch. It takes away all the burden of having to figure out what version to assign to indicate "last known good build of v1.2.3 with patches x and y, but not z" versus "last known good build of v1.2.3 with patches x and z, but not y".

Instead, anyone can look up your “version” and go through the content tree, as well as Git history and figure out precisely what is contained in there.

Right about now, I usually get pushback surrounding the questions: how do you know what is the latest? And how do you know where to merge?

That is precisely the "Perforce vs. Git" mental break we benefit from. You see, versions don't really work linearly. I've seen teams extremely frightened of reverting commits and terrified of rapidly removing broken features. Remember that "later" does not necessarily mean "better" or "more comprehensive". If A comes later than B, it does not imply that A has more features than B, or that A is more stable than B, or that A is more useful than B. It simply means that somewhere in A's history is a commit node B. I fundamentally wanted to break this mental model of "later" in order to break the hierarchy in a team.

This came from two very real examples from my past:

  1. I’ll never forget my first ever non-shadowed on-call rotation at Amazon. I called it the build-from-hell. For some reason, a breaking feature delayed our release by a couple of days. We were on a monthly release cycle. No big deal.
    However, once the broken feature was in, and because there was an implied but unspoken model of "later is better", nobody wanted to use the word "revert". In fact, it was a taboo word that would be taken very personally by any committer. At some point, other people started to pile onto the same build in order to take advantage of the release (because the next release was still a month away). This soon degraded into a cycle where we delayed the release by about 3 weeks, and in frustration we pulled the trigger and deployed it, because reverting was not in the vocabulary — I might as well have called the committer a bunch of profane names, an imbecile, and a moron, and we'd have been better off.
  2. The second event led to the very first Sev-1 of my life, at 4am on one fateful morning. For the uninitiated, a Sev-1 is code for "holy shit, something's broken so badly that we're not making any money off Amazon.com."
    After four hours of investigation it turned out that someone had made a breaking API change in a Java library and had forgotten to bump up the version number — so when a dependent object tried to call into that library, the dispatch failed. Yes, we blamed them and ostracized them and added "process" around versioning, but that was my inflection point — it's so stupid!
    What if that library had been a patch? What if it had 3 specific patches but not 2 others? Does looking at version "2.4.5-rc5" vs "2.4.5-rc4" tell you that rc5 has more patches? Fewer patches? Is more stable? Which one is preferable? Are these versions branch labels? Git tags? How do you ensure someone didn't reassign that tag by accident? Use signed tags? All of this is dumb and unnecessary! The tool gives you PRECISE content identifiers. It was fundamentally built for that! Why not use it? Back in SVN/CVS/Perforce we had no way to point to a specific snapshot of the code, and it was time we broke the habits we picked up because of that. This is why Git does not allow reverting a single file. 🙂

The key takeaway here was — these are not development concerns. We conflated release concerns with identity concerns. They are not the same. First, we need a way to identify and speak about the precise same thing. Then we can assert over that thing various attributes and qualities we want.

We didn’t want people to have undue process to make rapid changes. What’s wrong with making breaking API changes? Nothing at all! That’s how progress happens. Developers should be able to have a bunch of crazy ideas in progress at all times and commits should be cheap and easy! They should also have a quick, easy and reliable way of throwing their crazy ideas over the wall to someone else and say, “Hey can you check this version and see how it does?”, without having to go through a one-page naming-convention doc and updating metadata files. That was just so medieval!

What about dependency indirection? One reason people use symbolic references (like semantic versioning) is so that we can refer to “anything greater than 2.3.4” and not worry about the specific thing that’s used.

For one, do you REALLY ever deploy to production and allow late-binding? As numerous incidents have demonstrated, no sane Ops person would ever do this!

In my mind, having the ability to deterministically talk about something, far outweighs the minor inconvenience of having to publish dependency updates. I’ll describe how we handle dependencies in just a minute.

Heavy reliance on Feature Detection

Non-linear content-based versioning clearly raises red flags, especially when you're built around an actor-based model of microservices passing messages all over the place.

However, there’s been a solution staring us right in the face for the past decade. One that we learned from the web developers — use feature detection, not version detection!

When you have loosely-coupled microservices that have no strict API bindings, but rather pass messages to each other, the best way to determine if a service provides a feature you want, is to just ask it!

We found quite easily that when you're not building RPC-style systems (and I consider callbacks to still be RPC-style), you don't even need feature detection. If a service doesn't know what to do with a message, it merely ignores it, and the feature simply doesn't exist in the system. If you're not waiting for a side-effect — not just syntactically, but semantically — you end up with a very easy model.
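
To make that concrete, here is a minimal illustrative sketch (not actual Polyverse code; the message types and handler names are invented) of what feature detection over message passing looks like: a service answers a capability probe with the features it actually implements, and silently drops anything it doesn't understand.

// Illustrative only -- the message types and handler names below are made up.
var handlers = {
    // Feature detection: a peer asks what we can do instead of inspecting a version number.
    "describe-features": function (msg, reply) {
        reply({ features: Object.keys(handlers) });
    },
    "scramble-binary": function (msg, reply) {
        reply({ status: "scrambling", target: msg.target });
    }
};

function onMessage(msg, reply) {
    var handler = handlers[msg.type];
    if (!handler) {
        // Unknown message: log it and drop it. The feature simply doesn't exist here.
        console.log("ignoring unsupported message type: " + msg.type);
        return;
    }
    handler(msg, reply);
}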

Now the comment in the previous section, about a developer being able to throw a version over the wall and ask another developer what they think of it, makes a lot more sense. Someone can plug a new version into the system very easily, and quickly assert whether it works with the other components and what features it enables.

This means that at any given time, all services can arbitrarily publish a bunch of newer features without affecting others for the most part. This is also what allows us to have half a dozen things in progress at all times, and we can quickly test whether something causes a regression, and whether something breaks a scenario. We can label that against a “version” and we know what versions don’t work.

Naturally this leads us to a very obvious conclusion: "taking dependencies" is no longer required at the service/component level. They wouldn't be loosely-coupled, actor-model-based, Erlang-inspired microservices if they had dependency trees. What makes more sense is…

Composition, not Dependencies

When you have content-based non-linear versioning that allows aggressive execution of ideas, combined with services that really aren't all that concerned about what their message receivers do, and that will simply log an error and drop weird messages sent to them, you end up with a rather easy solution to dependency management — composition.

If you’ve read my previous posts, or if you’ve seen some of our samples, you’ll have noticed a key configuration value that shows up all over the place called the VFI. It’s a JSON blob that looks something like this:

{
  "etcd": {
    "type": "dockerImage",
    "address": "quay.io/coreos/etcd:v3.1.5"
  },
  "nsq": {
    "type": "dockerImage",
    "address": "nsqio/nsq:v0.3.8"
  },
  "polyverse": {
    "type": "dockerImage",
    "address": "polyverse-runtime.jfrog.io/polyverse:0e564bcc9d4c8f972fc02c1f5941cbf5be2cdb60"
  },
  "status": {
    "type": "dockerImage",
    "address": "polyverse-runtime.jfrog.io/status:01670250b9a6ee21a07355f3351e7182f55f7271"
  },
  "supervisor": {
    "type": "dockerImage",
    "address": "polyverse-runtime.jfrog.io/supervisor:0d58176ce3efa59e2d30b869ac88add4467da71f"
  },
  "router": {
    "type": "dockerImage",
    "address": "polyverse-runtime.jfrog.io/router:f0efcd118dca2a81229571a2dfa166ea144595a1"
  },
  "containerManager": {
    "type": "dockerImage",
    "address": "polyverse-runtime.jfrog.io/container-manager:b629e2e0238adfcc8bf7c8c36cff1637d769339d"
  },
  "eventService": {
    "type": "dockerImage",
    "address": "polyverse-runtime.jfrog.io/event-service:0bd8b2bb2292dbe6800ebc7d95bcdb7eb902e67d"
  },
  "api": {
    "type": "dockerImage",
    "address": "polyverse-runtime.jfrog.io/api:ed0062b413a4ede0e647d1a160ecfd3a8c476879"
  }
}

NOTE: If you work at Amazon or have worked there before, you’ll recognize where the word came from. When we started at Polyverse, I really wanted a composition blob that described a set of components together, and I started calling it a VFI, and now it’s become a proper noun. It really has lost all meaning as an acronym. It’s simply its own thing at this point.

What you're seeing here is a set of components and, as you guessed, the addresses where they might be obtained. In this example the addresses are symbolic — they're Docker image labels; however, in highly-secure deployments we use the one true way to address something — content-based shas. You might easily see a VFI that has "polyverse-runtime.jfrog.io/api@sha256:<somesha>" in the address field.

Again, you'll notice that this isn't a fight against proper dependencies, but rather an acknowledgement that the router is not where information about "all of Polyverse working" should be captured. It is a proper separation of concerns.

The router is concerned with whether it builds, passes its tests, boots up, and has the dependencies it requires for its own runtime. What doesn't happen is the router taking dependencies, at the component level, on what the container manager should be, could be, or would be. More importantly, it does not carry the burden of ensuring that "cycling works".

Too often these dependency trees impose heavy burdens on the developers of a single component. In the past I've seen situations where, if you're the web-server builder and a broken dependency related to authentication lands downstream, you are somehow de-facto responsible for paying the price of the entire system working end-to-end. The burden of work increases as you move further upstream, closer to your customer. One bad actor downstream has you paying the cost. Sure, we can reduce the cost by integrating continuously and faster, but unless "reverting" is an option on the table, you're still the person who has to do it.

This is why security teams are so derided by operations teams. Until recently, before the advent of DevSecOps, they always added a downstream burden — they would publish a library that is "more secure" but breaks a fundamental API, and you as the developer (and occasionally the operator) paid the price of updating all the API calls and testing and verifying that everything still works.

Our VFI structure flips this premise on its head. If the router developer has a working VFI, and somehow the downstream container-manager developer broke something, then their "version" in that VFI is not sanctioned. The burden is on them to go fix it. And since the router doesn't require a dependency update or a rebuild, simply plugging their fixed version into the VFI is sufficient to get the fix pushed into production quite easily.
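
As a sketch (the variable name and the sha placeholder here are hypothetical), "plugging in their fixed version" is nothing more than producing a new VFI that differs from the last known-good one in a single entry:

// Hypothetical sketch -- "knownGoodVfi" is the last known-good composition
// (the JSON blob shown above) parsed as an object; the sha is a placeholder.
var patched = JSON.parse(JSON.stringify(knownGoodVfi)); // cheap deep copy
patched.containerManager.address =
    "polyverse-runtime.jfrog.io/container-manager:<sha-of-the-fix>";
// "patched" is a brand-new VFI: every other component is untouched, one address changed.
// It gets its own content-based version and goes through the same assertions as any other VFI.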

You’ll also notice how this structure puts our experimentation ability on steroids. Given content-based versioning, and feature-detection, we can plug a thousand different branches, with a thousand different features, experiments, implementations, etc. in a VFI, and move rapidly. If we have to make a breaking change to an API, we don’t really have to either “freeze a branch” or do code lockdowns. We just replace API V1 with V2, and then as various components make their changes, we update those in the VFI and roll out the change reliably, accurately, predictably, and most importantly, easily. We remove the burden on the API changer to somehow coordinate this massive org-wide migration, and yet we also unburden the consumers from doing some kind of lock-step code update.

All the while, we preserve our ability to make strict assertions about every component, and an overall VFI — is it secure? is it building? Is it passing tests? Does it support features? Has something regressed? We further preserve our ability to know what is being used and executed at all times, and where it came from.

Naturally, VFIs themselves are content-versioned. 🙂 You'll find us sending each other sample VFIs like so: Runtime@b383463cf0f42b9a2095b40fc4cc597443da47f2

Anyone in the company can use our VFI CLI to expand this into a JSON blob, and that blob is guaranteed to be exactly what I wanted the other person to see, with almost no chance of mistake or miscommunication.

Isn’t this cool? We can stay loosey-goosey and experimentally hipster, and yet talk precisely and accurately about everything we consume!

You’ll almost never hear “Does the router work?” because nobody really cares if the router works or not. You’ll always hear conversations like, “What’s the latest VFI that supports scrambling?”, or “What’s the latest stable VFI that supports session isolation?”

Assertions are made over VFIs. We bless a VFI as an overall locked entity, and that is why long-time customers have been getting a monthly email from us with these blobs. 🙂 When we need to roll out a surgical patch, the overhead is so minimal it is uncanny. If someone makes a surgical change to one component, they test that component, publish a new VFI with that component's new version, and test the VFI for overall scenarios. The remaining eight components, already reliable, stable, and tested, see no touch or churn.

Components are self-describing

Runtime Independence

Alex and I are Erlang fanboys and it shows. 100% of Polyverse is built on a few core principles, and everything we call a “component” is really an Actor stylized strictly after the Actor Model.

A component is first and foremost a runtime definition; it is something that one can run completely on its own, and it contains all the dependencies, supporting programs, and anything else it needs to reliably and accurately execute. As you might imagine, we're crazy about Docker.

Components have a few properties:

  1. A component must always exist in a single Git repository. If it has source dependencies, they must be folded in (denormalized.)
  2. Whenever possible, a component's runtime manifestation must be a Docker image, and it must fold all required runtime dependencies into itself.

This sounds simple enough, but one very important contract every component has is that a component may not know implicitly about the existence of any other component. This is a critical contract we enforce.

If there is one thing I passionately detest above all else in software engineering, it is implicit coupling. Implicit coupling is "magic". It is when you build a component that is entirely syntactically decoupled from the rest. If Component A somehow relies on Component B existing and acting in a very specific way, then Component A should have explicitly expressed that coupling. As an operator, it is a nightmare to run these systems! You don't know what Component A wants, and, to keep up public displays of propriety, it doesn't want to tell you. In theory Component A requires nothing else to work! In practice, Component A requires Component B to be connected to it in a very specific, magical way.

We go to great lengths to prevent this from happening. When required, components are explicitly coupled, and are self-describing as to what they need. That means all our specs are in the code. If it is not defined in code, it is not a dependency.

Build Independence

We then take this runtime definition back to the development pipeline, and ensure that all components can be built with two guaranteed assumptions:

  1. That there be a “/bin/bash” that they can rely on being present (so they can add the #!/bin/bash line.)
  2. That there be a working Docker engine.

All our components must meet the following build contract:

docker build .

It really is that simple. This means that combined with the power of content-addresses, VFIs and commit-clouds, we always have a reliable and repeatable build process on every developers’ desktop — Windows, Linux or Mac. We can be on the road, and if we need a component we can do “docker build .” We can completely change out the build system, and the interface still remains identical. Whether we’re cross-compiling for ARM, or for an x86 server, we all have a clear definition of “build works” or “build fails”. It really is that simple.

Furthermore, because even our builders are “components” technically, they follow the same rules of content-addressing. That means at any given time you can go back two years into a Git repo, and build that component using an outdated build system that will continue to work identically.

We store all build configuration as part of the component's repo, which ensures that when we say "router@<sha>" we are not only talking about the code, but also the exact manner in which that version needed to be built, or wanted to be built.

Here too you’ll notice the affinity to two things at the same time:

  1. Giving developers complete freedom to do what they want, how they want — whatever QA frameworks, scripts, dependencies, packages, etc. that they need, they get. They don’t have to go talk to the “Jenkins guy” or the “builder team”. If they want something they can use it in their builder images.
  2. Giving consumers the capacity to reason about “what” it is someone is talking about in a comprehensive fashion. Not just code, but how it was built, what tests were run, what frameworks were used, what artifacts were generated, how they were generated, etc.

Where it all comes together

Now that we’ve talked about the individual pieces, I’ll describe the full development/build/release cycle. This should give you an overview of how things work:

  1. When code is checked in it gets to be pushed without any checks and balances, unless you’re trying to get onto master.
  2. You’re allowed to push something to master without asking either, with the contract that you will be reverted if anyone finds a red flag (automation or human.)
  3. Anyone can build any component off any branch at any time they want. Usually this means that nobody is all that aggressive about getting onto master.
  4. An automated process runs the docker build command and publishes images tagged with their Git commit-shas. For now it only runs off master.
  5. If the component builds, it is considered "unit tested". The component itself carries whatever it needs to do that.
  6. A new VFI is generated from the last known-good state of working components, the new component's tag is swapped in, and the VFI is submitted for proper QA.
  7. Assertions such as feature availability, stability, etc. are made over a VFI. The components really don’t know each other and don’t take version-based dependencies across each other. This makes VFI generation dirt-cheap.
  8. When we release, we look for the VFI that has the most assertions tagged against it (reviewed, error-free, warning-free, statically-verified, smoke-test-passed, patch-verified, and so on); see the sketch just after this list.
  9. In the dev world, everyone is generating hundreds of VFIs per day, just trying out their various things. They don’t need to touch other components for the most part, and there is little dependency-churn.

I hope this post sheds some light on how we do versioning, why we do it this way, and what benefits we gain. I personally think we lose almost none of the "assertions" we need to ship reliable, stable, and predictable code, while simultaneously giving all developers a lot of freedom to experiment, test, prototype, and have fun.

Calling deco at the first Deco Stop

Disclaimer: These numbers are most certainly "WRONG!" You should NOT use this post, or anything from a random online tool, to plan or execute dives. You WILL get bent. Not "may", but WILL. You know this. DO NOT rely on this tool.

Here’s a scenario that should never happen, but to quote the eloquent Mr. Mackey, “There are no stupid questions. Only stupid people.”

I decided to answer the following question, knowing full well that it sounds stupid:

Suppose you make it to your first deco stop and want to adjust your deco based on what happened between your max depth, and hitting the deco stop. Say you had a reg failure and had to fix it. Or a scooter got entangled in the line. Or you had a reverse squeeze so you had to pause for a bit longer. Now you’re AT your deco stop, and you’ve got two things – your average depth by the time you hit the stop, and your total runtime for the dive.

Given those two numbers, if you had to calculate deco, how much would it vary from a true multi-level calculation where you accounted for the pause as a scheduled "level stop"?

Side note: Once again, remember the purpose of these questions isn't about what should or should never happen, but to create a strong feedback loop: knowing the cost/risk you incur when the thing that should NEVER happen does. Basically, if it turns out that stopping midway, and not keeping track of where and for how long you stopped, adds a ridiculous amount of deco, then you HAVE to make sure you remember it. If it turns out it doesn't add more than a few minutes of deco based on the average depth observed at your gas switch, then in an emergency you can get the hell up to that gas switch, switch over, and run the numbers based on what you saw then.

The Program

Here’s a quick program I whipped up which lets you do just that:

//This function is a utility to get total dive time out of all segments
var calculateDiveTime = function(diveplan) {
    var totalTime = 0;
    for (var index = 0; index < diveplan.length; index++) {
        var segment = diveplan[index];
        totalTime = totalTime + segment.time;
    }
    return totalTime;
}

var buhlmann = dive.deco.buhlmann();
console.log("Second Level Depth, Time at Second Level, Avg Depth at Deco Stop, Multi-Level Deco Time, Avg Depth Deco Time");
for (var nextLevelTime = 5; nextLevelTime <= 30; nextLevelTime += 5) {
    for (var nextLevel = 190; nextLevel > 70; nextLevel -= 10) {
        var plan = new buhlmann.plan(buhlmann.ZH16BTissues);
        plan.addBottomGas("18/45", 0.21, 0.35);
        plan.addDecoGas("50%", 0.50, 0);
        plan.addDecoGas("100%", 1.0, 0);
        plan.addDepthChange(0, dive.feetToMeters(200), "18/45", 5);
        plan.addFlat(dive.feetToMeters(200), "18/45", 25);
        var bottomTime = 30; //5 + 25 to start with
        var cumulativeDepth = (25 * 200) + (5 * 100); //Avg depth so far (25 mins at 200, and 5 mins at 100 - the mid-point of the descent)

        //Add a depth change from 200 feet to the next level
        var depthDiff = 200 - nextLevel;
        var timeToLevel = depthDiff / 60;
        plan.addDepthChange(dive.feetToMeters(200), dive.feetToMeters(nextLevel), "18/45", timeToLevel);
        bottomTime += timeToLevel;
        cumulativeDepth += (timeToLevel * (nextLevel + (depthDiff / 2)));

        //Add a flat segment at the next level
        plan.addFlat(dive.feetToMeters(nextLevel), "18/45", nextLevelTime);
        bottomTime += nextLevelTime;
        cumulativeDepth += (nextLevelTime * nextLevel);

        //Ascend to the first deco stop at 70 feet
        depthDiff = nextLevel - 70;
        timeToLevel = depthDiff / 60; //This is aggressive since we won't hit 70 feet at 60 fpm
        plan.addDepthChange(dive.feetToMeters(nextLevel), dive.feetToMeters(70), "18/45", timeToLevel);
        bottomTime += timeToLevel;
        cumulativeDepth += (timeToLevel * (70 + (depthDiff / 2)));

        var avgDepthAtDecoBegin = cumulativeDepth / bottomTime;

        //Deco calculated from the true multi-level profile
        var decoPlan = plan.calculateDecompression(false, 0.2, 0.8, 1.6, 30);
        var totalTime = calculateDiveTime(decoPlan);
        var decoTimeFromMaxDepth = totalTime - bottomTime;

        //Deco calculated as if the whole dive were one flat segment at the average depth
        plan = new buhlmann.plan(buhlmann.ZH16BTissues);
        plan.addBottomGas("18/45", 0.21, 0.35);
        plan.addDecoGas("50%", 0.50, 0);
        plan.addDecoGas("100%", 1.0, 0);
        plan.addFlat(dive.feetToMeters(avgDepthAtDecoBegin), "18/45", bottomTime);
        decoPlan = plan.calculateDecompression(false, 0.2, 0.8, 1.6, 30);
        totalTime = calculateDiveTime(decoPlan);
        var decoTimeFromAvgDepth = totalTime - bottomTime;

        console.log(nextLevel + ", " + nextLevelTime + ", " + avgDepthAtDecoBegin + ", " + decoTimeFromMaxDepth + ", " + decoTimeFromAvgDepth);
    }
}

The Results (raw data)

The results I got were:

Second Level Depth, Time at Second Level, Avg Depth at Deco Stop, Multi-Level Deco Time, Avg Depth Deco Time
190, 5, 181.41255605381164, 56.29999999999997, 56.52945470852016
180, 5, 180.06726457399105, 55.29999999999996, 55.488450224215235
170, 5, 178.7219730941704, 54.29999999999996, 55.4474457399103
160, 5, 177.37668161434976, 53.29999999999997, 54.40644125560536
150, 5, 176.03139013452915, 51.99999999999998, 53.36543677130043
140, 5, 174.68609865470853, 50.999999999999964, 52.324432286995496
130, 5, 173.34080717488786, 49.99999999999997, 51.28342780269057
120, 5, 171.99551569506727, 49.999999999999964, 51.24242331838564
110, 5, 170.65022421524665, 48.69999999999998, 50.20141883408069
100, 5, 169.30493273542598, 47.699999999999974, 50.16041434977576
90, 5, 167.9596412556054, 45.69999999999998, 49.119409865470836
80, 5, 166.61434977578477, 45.69999999999998, 48.0784053811659
190, 10, 182.43083003952566, 66.29999999999997, 67.56049169960473
180, 10, 180.0592885375494, 64.29999999999998, 65.48820711462449
170, 10, 177.68774703557312, 63.299999999999976, 63.415922529644256
160, 10, 175.31620553359681, 60.99999999999997, 62.34363794466401
150, 10, 172.94466403162056, 58.99999999999998, 60.27135335968378
140, 10, 170.57312252964428, 56.99999999999998, 59.19906877470354
130, 10, 168.20158102766797, 54.699999999999974, 57.12678418972331
120, 10, 165.83003952569172, 52.69999999999997, 55.054499604743064
110, 10, 163.45849802371544, 50.69999999999997, 53.98221501976284
100, 10, 161.08695652173913, 48.39999999999998, 51.909930434782595
90, 10, 158.71541501976284, 47.399999999999984, 50.837645849802364
80, 10, 156.34387351778656, 45.39999999999997, 49.76536126482211
190, 15, 183.23321554770317, 77.59999999999997, 78.58494840989397
180, 15, 180.05300353356887, 73.29999999999995, 75.48801554770316
170, 15, 176.87279151943463, 71.29999999999995, 72.39108268551234
160, 15, 173.69257950530033, 67.99999999999997, 70.29414982332155
150, 15, 170.5123674911661, 64.99999999999997, 67.19721696113072
140, 15, 167.33215547703182, 62.69999999999998, 65.10028409893991
130, 15, 164.15194346289752, 59.699999999999974, 62.00335123674911
120, 15, 160.97173144876325, 57.69999999999998, 59.90641837455829
110, 15, 157.79151943462898, 54.39999999999997, 57.80948551236748
100, 15, 154.61130742049468, 51.39999999999998, 55.71255265017666
90, 15, 151.43109540636044, 49.13359999999998, 53.61561978798586
80, 15, 148.25088339222614, 46.13359999999998, 50.518686925795045
190, 20, 183.88178913738017, 87.59999999999998, 89.60471693290735
180, 20, 180.0479233226837, 83.29999999999998, 86.4878607028754
170, 20, 176.21405750798723, 80.29999999999998, 82.37100447284345
160, 20, 172.38019169329073, 75.99999999999999, 79.2541482428115
150, 20, 168.54632587859422, 71.99999999999997, 75.13729201277954
140, 20, 164.71246006389777, 68.69999999999999, 71.0204357827476
130, 20, 160.87859424920126, 64.69999999999997, 66.90357955271564
120, 20, 157.0447284345048, 61.399999999999984, 64.78672332268368
110, 20, 153.21086261980832, 57.39999999999997, 61.66986709265175
100, 20, 149.37699680511182, 54.13359999999999, 59.5530108626198
90, 20, 145.54313099041534, 51.13359999999998, 55.43615463258785
80, 20, 141.70926517571885, 47.13359999999998, 53.319298402555894
190, 25, 184.41690962099125, 97.59999999999998, 100.6210274052478
180, 25, 180.04373177842564, 92.29999999999998, 95.48773294460639
170, 25, 175.67055393586006, 88.29999999999998, 91.35443848396503
160, 25, 171.29737609329445, 82.99999999999999, 87.22114402332359
150, 25, 166.92419825072884, 78.99999999999997, 83.08784956268224
140, 25, 162.55102040816328, 74.69999999999999, 78.95455510204081
130, 25, 158.17784256559764, 70.69999999999997, 72.82126064139943
120, 25, 153.80466472303203, 65.39999999999998, 69.687966180758
110, 25, 149.43148688046648, 61.39999999999997, 65.5546717201166
100, 25, 145.05830903790087, 57.13359999999997, 61.42137725947519
90, 25, 140.6851311953353, 53.13359999999998, 58.28808279883379
80, 25, 136.31195335276968, 49.13359999999998, 55.1547883381924
190, 30, 184.86595174262735, 108.59999999999998, 113.63471420911527
180, 30, 180.04021447721178, 102.29999999999998, 107.4876257372654
170, 30, 175.21447721179626, 96.29999999999998, 101.34053726541555
160, 30, 170.3887399463807, 90.99999999999999, 94.19344879356568
150, 30, 165.56300268096513, 85.99999999999997, 89.04636032171581
140, 30, 160.73726541554961, 80.69999999999999, 83.89927184986593
130, 30, 155.91152815013405, 75.7, 78.75218337801608
120, 30, 151.08579088471848, 70.4, 74.60509490616622
110, 30, 146.26005361930297, 65.39999999999998, 70.45800643431636
100, 30, 141.4343163538874, 60.13359999999997, 65.31091796246646
90, 30, 136.60857908847183, 56.13359999999998, 61.1638294906166
80, 30, 131.78284182305632, 51.13359999999998, 57.01674101876673

The conclusions

Here are a couple of quick conclusions I was able to draw:
1. If all you did was compute deco based on your average depth + time after having hit the stop, the biggest difference I could find was a little over 6 minutes (and it was negative). Meaning, if we treated the entire dive as spent at the average depth, we'd be calling at most 6 minutes more of deco.
2. The maximum deviation, expressed as a percentage, was about 13 percent. Meaning that adding a safe 15% to the deco computed from average depth would be a good rule of thumb.
I haven’t played with greater depths or attempted to plot/chart these to get a visual “feel” for how the curves shape up. I haven’t tried VPM.

Recipes to play with “No Deco Limits”

For a technical, philosophical and every other view on No-Deco limits, go to scubaboard.

First let me elaborate on why the mathematical model is important and how to play with it.

Models allow reverse engineering (fitting)
This post is about understanding the mathematical model in terms of the NDLs we know and use. One of the most important things about any model is that once you have verified it in one direction (how much deco must I do for a specific dive), you can then run it in the other direction (how long can I dive before I have mandatory deco). You can then work out what parameters, adjustments, corrections, and variations other people were using when they came up with the numbers they gave you.

This is a subtle point, and the one that excites me the most. What this means is that if someone said to me, "Let's dive to 100 feet, for 40 minutes, on 32%, and ascend while stopping two minutes every ten feet," I now have the tools to guess their parameters.

Suppose they were using VPM; then I can reverse-engineer things like what bubble sizes they considered "critical", what perfusion rates they assumed, and so on. If they were using Buhlmann, I can reverse-engineer their gradient factors.
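
Here is a minimal sketch of that reverse-engineering, using the same planner calls as the recipes later in this post: sweep Buhlmann gradient-factor pairs and print the ones that roughly reproduce a quoted total runtime. The 60-minute target runtime and the GF grid below are made up purely for illustration.

// Sketch only: the target runtime and GF grid are invented for illustration.
var targetRuntime = 60;
for (var gfLow = 0.2; gfLow <= 0.9; gfLow += 0.1) {
    for (var gfHigh = gfLow; gfHigh <= 0.95; gfHigh += 0.05) {
        var buhlmann = dive.deco.buhlmann();
        var plan = new buhlmann.plan(buhlmann.ZH16BTissues);
        plan.addBottomGas("EAN32", 0.32, 0.0);
        plan.addFlat(dive.feetToMeters(100), "EAN32", 40);
        var diveplan = plan.calculateDecompression(false, gfLow, gfHigh, 1.6, 30);

        //Total runtime is the sum of every segment in the generated plan
        var runtime = 0;
        for (var i = 0; i < diveplan.length; i++) {
            runtime += diveplan[i].time;
        }

        if (Math.abs(runtime - targetRuntime) <= 1) {
            console.log("Plausible gradient factors: " + gfLow.toFixed(2) + "/" + gfHigh.toFixed(2));
        }
    }
}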

This is awesome because it allows me to break down the black box a little – instead of whining about “My computer said 10 minutes, and yours said 20 minutes”, I can whine in a far more specific and deeply annoying way – “Oh I see, my computer is assuming critical He bubble size to be 0.8 microns, but yours is assuming 0.5 microns.” Remember kids – you shouldn’t always whine, but when you do, make it count!

When your computer has a "conservativism factor", what does that mean? Is it simply brute-multiplying each stop by that factor? Is it multiplying only the shallow stops? Is it a coefficient in a curve-fitting model, if, say, it's trying to fit a spline or bezier to "smoothen out the ascent"? A conservativism factor of "4" tells you no more about what's going on than saying, "these are some adjustments/corrections I make."

While these ARE “just models”, models are nothing if they are not properly parameterized.

Here again, existing software fell short of what I could do with it. The GUI is a great tool for a narrow, specific task. But when exploring the model, nothing is more useful and powerful than being able to play with it programmatically. Once I begin posting recipes, you'll see what is so fascinating about "playing with it".

If you’re a fan of the Mythbusters, you will see them refer to this as, producing the result.

Models allow study of rates-of-change (sensitivity analysis)

The other very important aspect of a model, even if the constants are wrong, is the overall rate of change, or growth. This is also called sensitivity analysis (meaning: how sensitive is my model to which parameters).

Let us say we had a few things in our control – ppO2, ppN2, ppHe, bottom time, ascent rate, descent rate, and stops.

What a mathematical model allows us to learn (and should help us learn), is how sensitive the plans are to each of these parameters, even if specific constants are wrong.

Let me put it this way – if you wanted to guess the sensitivity of a "car" to things like weight, number of gear shifts, size of wheels, etc., and you had a Hummer to study, but had to somehow extend that knowledge to a sedan, how would you do it?

The "constants" are different in both. But the models aren't. An internal combustion engine has an ideal RPM where it provides maximum torque for minimum fuel consumption. The specific rev rate will be different, and you can't account for that. However, the "speed at which inefficiency changes" is common to all internal combustion engines. Unless the sedan is using a Wankel engine, the rate-of-change characteristics still apply. Even if the Hummer's ideal RPM is 2000 and the sedan's is 1500, the questions we can still study are: when I deviate 10% from the ideal, how does that affect fuel consumption and torque?

So even if the software/constants I wrote are entirely wrong (which they probably are), they still serve as a valuable tool for studying these changes.

A study in NDLs

Naturally, one of the first unit tests I wrote for the algorithm was against the PADI dive tables: https://github.com/nyxtom/dive/blob/master/test/dive_test.js#L646

The point here was to recreate an approximation of the dive tables. What fascinated me was how much subtle understanding there is behind that number though.

First let’s define an NDL as: Maximum time at a depth, with an ascent ceiling of zero.

What this means is, whether you use Buhlmann, or VPM or whatever model you like, the NDL is the time after which you can ascend straight to the surface (depth of zero meters.)

So what happens when we run pure Buhlmann without a gradient factor?

(This snippet is meant to be executed here: http://deco-planner.archisgore.com/)

var buhlmann = dive.deco.buhlmann();
var plan = new buhlmann.plan(buhlmann.ZH16BTissues);
plan.addBottomGas("Air", 0.21, 0.0);
plan.ndl(dive.feetToMeters(100), "Air", 1.0);

//Result is 16

That’s a bit strange isn’t it? The official NDL on air is closer to 19 or 20 minutes (with a “mandatory safety stop”.)

Does it mean my model is wrong? My software is wrong? Compare it with different depths, and you’ll find it gives consistently shorter NDLs. What gives?

Let’s try fudging the conservativism factor a bit.

var buhlmann = dive.deco.buhlmann();
var plan = new buhlmann.plan(buhlmann.ZH16BTissues);
plan.addBottomGas("Air", 0.21, 0.0);
plan.ndl(dive.feetToMeters(100), "Air", 1.1);


//Result is 19

That’s just about where we expect it to be. This tells me that the NDL could have been computed with a less conservative factor. But is there something I’m missing?

Wait a minute, this assumes you literally teleport to the surface. That’s not usually the case. Let’s run the same NDL with a 30-feet-per-minute ascent (this time we have to use the getCeiling method).

for (var bottomTime = 1; bottomTime <= 120; bottomTime++) {
    var buhlmann = dive.deco.buhlmann();
    var plan = new buhlmann.plan(buhlmann.ZH16BTissues);
    plan.addBottomGas("Air", 0.21, 0.0);
    plan.addFlat(dive.feetToMeters(100), "Air", bottomTime);
    plan.addDepthChange(dive.feetToMeters(100), 0, "Air", 3);

    if (plan.getCeiling(1.0) > 0) {
        console.log("NDL for 100 feet is: " + (bottomTime - 1));
        break;
    }
}
NDL for 100 feet is: 19

That's interesting. For the same parameters, if we assume a three-minute ascent, our NDL went up – we can stay down longer if we are ASSURED of a 30-feet-per-minute ascent at the end.

Now remember these numbers are entirely made up. My constants are probably helter-skelter. You shouldn’t use the SPECIFIC numbers on this model. But there’s something intuitive we discovered.

Let’s try it again with a 3 minute safety stop at 15 feet:

for (var bottomTime = 1; bottomTime <= 120; bottomTime++) {
    var buhlmann = dive.deco.buhlmann();
    var plan = new buhlmann.plan(buhlmann.ZH16BTissues);
    plan.addBottomGas("Air", 0.21, 0.0);
    plan.addFlat(dive.feetToMeters(100), "Air", bottomTime);
    plan.addFlat(dive.feetToMeters(15), "Air", 3);

    if (plan.getCeiling(1.0) > 0) {
        console.log("NDL for 100 feet is: " + (bottomTime - 1));
        break;
    }
}
NDL for 100 feet is: 22

Once again these numbers make sense – if we are ASSURED of a 3 minute stop at 15 feet, our NDL goes up. How interesting.

This gives you a better idea of a "dynamic" dive. You aren't exactly teleporting from depth to depth, and those ascents and descents matter. Try this for different gases.
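
For example, here is a minimal variation of the first snippet that compares two gases at the same depth. The "EAN32" label is just a name passed to addBottomGas, and whatever numbers this prints come from this toy model, not from any table, so treat them as illustrative only.

// Compare the straight-to-surface NDL at 100 feet on air vs. 32% nitrox.
var buhlmann = dive.deco.buhlmann();

var airPlan = new buhlmann.plan(buhlmann.ZH16BTissues);
airPlan.addBottomGas("Air", 0.21, 0.0);
console.log("NDL at 100ft on air: " + airPlan.ndl(dive.feetToMeters(100), "Air", 1.0));

var nitroxPlan = new buhlmann.plan(buhlmann.ZH16BTissues);
nitroxPlan.addBottomGas("EAN32", 0.32, 0.0);
console.log("NDL at 100ft on EAN32: " + nitroxPlan.ndl(dive.feetToMeters(100), "EAN32", 1.0));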

Dive Planner Recipes

This is really for my personal reference. If it helps you, I’m glad.

A couple of weeks ago, I wrote this tool (http://deco-planner.archisgore.com). You can go read the history, motivation, etc. on that page and in the GitHub repo ad nauseam.

NOTE: Why is this important/useful? Don’t computers tell you how much deco you should do? Yes they do exactly that, and do it pretty well. Now here’s what a computer won’t tell you – how much deco would you be looking at _if_ you extended the dive by 10 minutes? Let’s say that by extending it 10 minutes, or pushing it down by 10 feet more, your obligation jumps from 30 minutes to 50 minutes. That is objectively two-thirds more gas than you planned for. This tool/post is about understanding what those shapes are so you can decide, even if you had your computer telling you what your deco was, whether you’re going to like doing it or not.

This post is about how to use that tool effectively, with some pre-canned recipes, to generate information more cheaply and easily than any other tool I know of or can think of.

The first recipe (and the primary reason I built the entire damn thing) is to get an idea of how ratio-deco changes over different bottom times. Does it grow linearly? Non-linearly? Say you're at 150 feet for "x" minutes longer than your plan, and you just don't happen to have a computer to do your math. Do you have a vague idea of how the shape of the increments changes?

Let’s find the answer to that very question quickly.

Deco time change as a ratio of bottom time:

//This function is a utility to get total dive time out of all segments
var calculateDiveTime = function(diveplan) {
    var totalTime = 0;
    for (var index = 0; index < diveplan.length; index++) {
        var segment = diveplan[index];
        totalTime = totalTime + segment.time;
    }
    return totalTime;
}

//In this loop we'll run a 150 foot dive for bottom times between 1 and 120
// minutes, calculate total dive time, find deco time (by subtracting the
// bottom time), and store it in the decoTimes array.
var decoTimes = [];
for (var time = 1; time <= 120; time++) {
    var buhlmann = dive.deco.buhlmann();
    var plan = new buhlmann.plan(buhlmann.ZH16BTissues);
    plan.addBottomGas("2135", 0.21, 0.35);
    plan.addDecoGas("50%", 0.50, 0);
    plan.addDecoGas("Oxygen 100%", 1.0, 0.0);
    plan.addFlat(dive.feetToMeters(150), "2135", time);
    var decoPlan = plan.calculateDecompression(false, 0.2, 0.8, 1.6, 30);
    var totalTime = calculateDiveTime(decoPlan);
    var decoTime = totalTime - time;
    decoTimes.push(decoTime);
    console.log(decoTime);
}

What’s really cool is, I can now chart that decoTimes array using Excel or Numbers or whatever your spreadsheet is. I just paste it in plot.ly, and get this:

Deco Time change as a ratio of depth:

Now let's look at how decompression changes if my depth came out different than anticipated. We can generate deco schedules for that too:

//This function is a utility to get total dive time out of all segments
var calculateDiveTime = function(diveplan) {
    var totalTime = 0;
    for (var index = 0; index < diveplan.length; index++) {
        var segment = diveplan[index];
        totalTime = totalTime + segment.time;
    }
    return totalTime;
}

//In this loop we'll run a 30-minute dive for depths between 120 and 180 feet,
// calculate total dive time, and find deco time (by subtracting the 30-minute
// bottom time).
for (var depth = 120; depth <= 180; depth++) {
    var buhlmann = dive.deco.buhlmann();
    var plan = new buhlmann.plan(buhlmann.ZH16BTissues);
    plan.addBottomGas("2135", 0.21, 0.35);
    plan.addDecoGas("50%", 0.50, 0);
    plan.addDecoGas("Oxygen 100%", 1.0, 0.0);
    plan.addFlat(dive.feetToMeters(depth), "2135", 30);
    var decoPlan = plan.calculateDecompression(false, 0.2, 0.8, 1.6, 30);
    var totalTime = calculateDiveTime(decoPlan);
    var decoTime = totalTime - 30;
    console.log(decoTime);
}

And we get this:

Finally, let’s plot how VPM-B compares to Buhlmann. In this case, we have to add a depth change from 0 feet to 150 feet, because VPM is very sensitive to the slopes unlike Buhlmann which only worries about tissue loading (more on this later, I promise.)

Here’s the code to generate Buhlmann vs VPM deco times for the same dive profile:

//This function is a utility to get total dive time out of all segments
var calculateDiveTime = function(diveplan) {
    var totalTime = 0;
    for (var index = 0; index < diveplan.length; index++) {
        var segment = diveplan[index];
        totalTime = totalTime + segment.time;
    }
    return totalTime;
}

//In this loop we'll run a 150 foot dive for bottom times between 1 and 120
// minutes, and for each bottom time compute the deco time under both Buhlmann
// and VPM (subtracting the bottom time and the 5-minute descent).
for (var time = 1; time <= 120; time++) {
    var buhlmann = dive.deco.buhlmann();
    var bplan = new buhlmann.plan(buhlmann.ZH16BTissues);
    bplan.addBottomGas("2135", 0.21, 0.35);
    bplan.addDecoGas("50%", 0.50, 0);
    bplan.addDecoGas("Oxygen 100%", 1.0, 0.0);
    bplan.addDepthChange(0, dive.feetToMeters(150), "2135", 5);
    bplan.addFlat(dive.feetToMeters(150), "2135", time);
    var bdecoPlan = bplan.calculateDecompression(false, 0.2, 0.8, 1.6, 30);
    var btotalTime = calculateDiveTime(bdecoPlan);
    var bdecoTime = btotalTime - time - 5;

    var vpm = dive.deco.vpm();
    var vplan = new vpm.plan();
    vplan.addBottomGas("2135", 0.21, 0.35);
    vplan.addDecoGas("50%", 0.50, 0);
    vplan.addDecoGas("Oxygen 100%", 1.0, 0.0);
    vplan.addDepthChange(0, dive.feetToMeters(150), "2135", 5);
    vplan.addFlat(dive.feetToMeters(150), "2135", time);
    var vdecoPlan = vplan.calculateDecompression(false, 0.2, 0.8, 1.6, 30);
    var vtotalTime = calculateDiveTime(vdecoPlan);
    var vdecoTime = vtotalTime - time - 5;

    console.log(bdecoTime + " " + vdecoTime);
}

And the chart that comes out:

Scuba Diving tools

At some point I made a page to document random scuba tools I build/will-build/want-to-build/want-others-to-build.

The last part is a bit tricky. I want many things – and asking others to build something is painful. You don't always get what you want. You don't always like what you get. And you rarely get what you want, the way you like it, at a price you're willing to pay for it.

So on a very terribly-themed page (because I absolutely suck at designing web pages), here's a link to some tools I'm working on:

https://archisgore.com/scuba-diving-resources/

The next big couple of things coming up are a better UI (especially a plotter/charter) for the dive planner, and a Raspberry Pi Zero-based dive computer (on which you can write deco plans on a full Linux distro).

Don't hold your breath though. My history with these things is very haphazard, depending on how obsessively I feel the need/want.