Archis's Blog

April 16, 2015

How to (maybe) build a Dive Computer

Filed under: Uncategorized — archisgore @ 10:27 am

This post was entirely unnecessary. Go straight to the GitHub file below; all instructions are inline comments in the Arduino sketch.

https://github.com/archisgore/BFY_Bottom_Timer/blob/master/bfy_bottom_timer.ino

April 10, 2015

Problems you should never have

Filed under: Uncategorized — archisgore @ 5:48 am

A long time ago in a company far far away, I was asked the following question in my interview: “What should happen if a thread tries to acquire a lock it already holds?”

To summarize the problem visually, they wanted to know what happens here:

function foo() {
    l.lock()                      // first call: acquires l; every later call: acquires a lock this thread already holds
    if (someTerminatingCondition) {
        return                    // note: returns while still holding l
    }
    foo()                         // re-entrant call: l has not been released yet
    l.unlock()
}

If you haven’t guessed already, the problem here is not one of locking. The real problem is that, depending on whom you ask, you’ll get a different answer for what happens here. They asked me this with a very proud look on their faces, as if they had just figured out a way of handling “complexity” nobody else had thought of.

The real real problem is that now you’ve had to remember the thread-id somewhere inside the lock (or in some global space). Then you need to lock THAT state to ensure safety. Nobody’s really sure if only one unlock() should release the lock, or whether a proper unwinding is necessary. Nobody’s sure everyone can do a proper unwinding, so there’s a catch-all “unlockAll()” operation that’s called in case of an error. Most of this bullshit masquerades under the guise of intelligence: “Oh, you’re just not smart enough to understand advanced locking methodologies,” which is usually a proxy for, “If I don’t go on the offensive first, I risk looking incredibly stupid.”
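To make that hidden state concrete, here is a deliberately naive sketch (my own, not any particular library’s implementation) of the bookkeeping a recursive lock has to drag around just to answer “do I already hold this?”:

// A deliberately naive re-entrant lock, only to show the extra state it carries.
final class NaiveRecursiveLock {
    private Thread owner = null;   // which thread "already has" the lock
    private int holdCount = 0;     // how many lock() calls still await an unlock()

    public synchronized void lock() throws InterruptedException {
        Thread me = Thread.currentThread();
        if (owner == me) {         // the contentious case: we already hold it
            holdCount++;
            return;
        }
        while (owner != null) {
            wait();                // someone else holds it; block
        }
        owner = me;
        holdCount = 1;
    }

    public synchronized void unlock() {
        if (owner != Thread.currentThread()) {
            throw new IllegalMonitorStateException("not the owner");
        }
        holdCount--;               // does ONE unlock release it, or must every hold be unwound?
        if (holdCount == 0) {
            owner = null;
            notifyAll();
        }
    }
}

Notice that the lock needs its own synchronization (the synchronized methods above) just to protect its own bookkeeping, which is exactly the “lock THAT state” problem.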

This falls under a category of problems we see around us every single day. People avoid creating new objects in the name of efficiency (if your biggest delay is in object allocation, you’re either deluding yourself into thinking you’re some kind of mega-genius, or…. nope, you’re just deluding yourself.) People mangle and tangle text in the name of security (there’s a difference between running a string through a dozen encoders and actually encrypting it.) People expose unnecessary inner implementations in the name of testability (if a test can’t call your API to do something, then I assure you nobody else can either.)

All of these are problems that you should never have. I’m not arguing that in a “real world engineering problem” you wouldn’t want to compromise, but it is important to note that it is, in fact, still a compromise. To use a recursive lock is a failure to understand the problem, not a genius solution. Running obfuscation on a string is a failure to protect the string, not an ingenious workaround for key-management. If you’re seeing a performance improvement when you reduce object allocations, you have an infestation of bad design. A “design pattern” is a compromise to use a widely-seen idiom for that which otherwise is not obvious. That is why it’s called a pattern, not a simplification.

Similarly, when writing C code, I want people who never write ++(**i--), rather than people who can tell me what it really means. I should never have to deal with that problem ever. In 2015, logging should not be a problem you ever have. Concurrent lightweight threading is a problem you should never have. Configuration loading/management is a problem you should never have. Building/compiling is a problem you should never have. If you’re running out of stack space, you have a problem that won’t go away by writing non-recursive code. If you’re running out of disk space due to logs, and your reaction is to implement your own logger, you’re evading the decision of log retention/storage policies. If your web server can do a bunch of fancy things but can’t do efficient string-concatenation, you’re in deep shit (a web-page is just a really large concatenated string.)

Coming up soon…. a list of problems you should ALWAYS have, no matter what technological marvel you use. As much as I’m sick of hearing how someone’s invented the logger to end all loggers, I’m equally sick of hearing about the guys who’ve “solved distributed consistency.”

March 14, 2015

Strong/static-typing to nothingness

Filed under: Uncategorized — archisgore @ 8:10 pm

Perhaps someday, I’ll create a programmer-tropes.com website featuring our favorite tropes. Until then, blog entries will have to do.

A fairly annoying trope involves people using words like strong-typing, or static-typing, while carefully avoiding saying type-safety (which would imply some sort of responsibility.) When did you last see this sort of interface?

Map doSomething(Map parameters)

This is a fairly representative example, but it comes in many shapes and sizes. At other times you’ll get stringly-typed interfaces (a.k.a. everything is a string, except it really isn’t.)

This kind of thing makes for great jokes.

“Do you know what types your arguments are?”

“Why yes of course. I believe in strong and static typing. The compiler will complain if you don’t adhere to my type expectations.”

“Oh good good. That’s helpful. Can you tell me the types of your arguments?”

“Yup. Your parameter must be a valid entity allowed in the system. So long as it’s a struct/enum/class instance, you’re in safe hands. Can’t go wrong with it! We will verify and error out if you use anything else.”
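For contrast, here is a minimal sketch (every name in it is hypothetical) of what the same kind of call looks like when the types actually carry the contract:

// Hypothetical example: the signature itself now states what goes in and what comes out.
final class TransferRequest {
    final String fromAccount;
    final String toAccount;
    final long amountCents;
    TransferRequest(String fromAccount, String toAccount, long amountCents) {
        this.fromAccount = fromAccount;
        this.toAccount = toAccount;
        this.amountCents = amountCents;
    }
}

interface Payments {
    // compare with: Map doSomething(Map parameters)
    String transfer(TransferRequest request); // returns a receipt id
}

Now the compiler’s complaint actually means something, because the type expectation is more than “a valid entity allowed in the system.”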

January 30, 2015

Object serialization isn’t the challenge for message-passing. It’s the other way round.

Filed under: Uncategorized — archisgore @ 10:01 pm

Whenever OOP (well, I should say C++/Java/C#)[1] programmers think of boundaries, coupling, etc., the tendency is to think in terms of “names”. I won’t go into the nouns vs verbs diatribe. However, boundaries and coupling are thought of in terms of components, where a component is arbitrarily defined by each person. Inevitably these boundaries become muddled. Usually when we think of complex problems, it helps to go back to basics. There was one system I’ve used in my life that forced me never to have these problems around boundaries. It did so so elegantly and effectively that even today I can’t easily find something that good to use.[1]

When we think of calling methods, we commonly run into the challenge of sending data across component boundaries (whether your language calls it a “class” or not is irrelevant). In the way a program is written or thought of, it isn’t always clear what the right granularity for information should be. Let’s take the example of a quicksort in C.

I can write the signature as:

quicksort(array, left, right) //My recursive calls progressively narrow left->right windows.
quicksort(array, right) //In C I can pass a pointer to "left", I bound on the right
quicksort(array); //I somehow found a way to NULL-terminate my sub-arrays. Call it magic.

What’s interesting about this example is, until you look at all three signatures, you don’t realize that each of them conveys very different meaning. If you saw any one of these in a code review, it wouldn’t raise a large red flag. It’s all plain pure C. Nothing out of the ordinary is happening here. (Someone coming from a traditionally functional/declarative language might throw a fit just on principle, but that’s because in their world, all arithmetic has to be built from a single primitive function “increment”. I may be wrong here – someone in the comments is probably going to point out that increment is built using some other more basic functions.)

Now imagine doing the exact same thing in Java or C#. When was the last time you even got to consider the three options as being options? Due to memory constraints and the lack of pointers or null-terminability of arrays, you almost always end up using the first version of the method signature. Doing anything else would mean copying the subarray into new arrays (not that there’s anything wrong with that; it’s just that the languages choose to make it expensive.)

This is what I mean by object serialization. The cost of object composition or casting is orders of magnitude more expensive than passing the message to the other side. So you inevitably end up sending everything you know. “Here’s three globals, two locals, my array, another array I was working on, and the past 10 US presidents for good measure. You figure out what you need to sort this array.”

This is where we come to the title of this article. One of the common pushbacks I get when I ask for sending the “proper message” across a boundary is how difficult it is to serialize an object. That is, somehow the challenge message-passing systems have to “overcome” is to figure out how Java and C# can serialize their stuff better. I submit to you that that is where we lost our componentizing battles. The challenge presented to Java and C# should have been: make it easier to compose your stuff into messages.
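A small sketch of what “composing a message” means here (the names are mine): instead of handing the other side every object you happen to hold, you build the one small value the boundary actually needs.

// Hypothetical sketch: compose the message the boundary needs, not the objects you have.
final class SortRequest {
    final int[] values;                       // just the window to be sorted, nothing else
    SortRequest(int[] values) { this.values = values.clone(); }
}

final class Sorter {
    int[] handle(SortRequest message) {
        int[] result = message.values.clone();
        java.util.Arrays.sort(result);        // the other side neither knows nor cares where the data came from
        return result;
    }
}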

You want empirical evidence? Here goes. In 1995, pure-C programmers using multiple different compilers were able to build a Microsoft Office that could be extended almost indefinitely. Now these people weren’t security experts, but the proliferation of IE’s ActiveX nightmare only proves how capable their APIs were. Internet Explorer using COM was the closest the world got to Jeff Goldblum being able to execute PowerPC machine code on an alien mothership (where Steve Jobs probably designed their systems before coming to earth [citation needed]). While it was a godforsaken nightmare for Microsoft, how many of you wish your legitimate components could work THAT well with each other, huh? Be honest!

The takeaway is, the next time you define an interface, ask yourself this question:

If I was sending this information across a compiler and memory-layout boundary, is this the way I would send it?

(Also, all your dangling unclosed file descriptors will go away in the next few minutes.)

[1] There’s a reason I call out the three most common commercial object-oriented languages out there. I’m going to say something that’ll polarize my audience: do you know the best commercially-used and most widely deployed OOP system in existence today? It’s COM. COM isn’t easy to understand. It sure as hell isn’t simple. However, that thing is so darned elegant, you fall in love with it the more you use it. Elegance comes from using a very small number of powerful concepts repeatedly.

I’ve spent two years writing very performant OOP code for mobile devices using COM, and later, when I switched to C# for more mainstream server programming, I found that C# programmers had an array of things they couldn’t do that I had been doing comfortably in plain old C. Sure, you save some verbosity using “var foo = <Large type declaration>”, or you save some filtering verbosity using LINQ, and what not.

COM was built out of a need, out of desperation, and the elegance shows. When you write code that talks to a COM component, you have no idea what language it is written in (a proper OO boundary). You don’t quite know what methods it exposes (duck typing, dynamic typing). The object itself can expose methods not physically compiled by the language (it can have extension methods, runtime-exposed methods, etc.) However, the one key benefit of all this was that communication between two components was always very well enforced. That is the benefit C++/C#/Java programmers tend to miss. We got this as C++Light programmers (C++Light is when you’re mainly writing C code on a C++ compiler but use some C++ for strategic things like smart pointers.)

OOP was never about the language. It was about “objects”. COM forced you to think of components. You couldn’t have global statics because… well, how do you do that across language boundaries? When you’re linking two DLLs with vastly different memory layouts, you just can’t pull it off. So you are forced to think about how you share stuff.

January 11, 2015

To focus on the language, is to focus on the wrong idea.

Filed under: Uncategorized — archisgore @ 9:13 am

Anytime we talk languages, a funny thing happens. Nearly 100% of the time, the question you somehow find yourself debating is, “How is (if (= 1 2)) different from if (1 == 2)?” I wrote a blog post a few months ago on why programming languages matter, and I’m here to tell you I was wrong. Sure, intent is important. But beyond that, the reason you should care about what others are doing isn’t the ease of syntax…

Beyond expressing the “intent” of what you want to do, what you truly gain from different languages are the concepts. After reading my blog, read this entry by Alan Kay: http://lists.squeakfoundation.org/pipermail/squeak-dev/1998-October/017019.html

This is the man who coined the term “Object Oriented” and we’ve been using it wrong all along. This is like if Jesus were to come back today, and we began “correcting” him on what the Bible “truly means.” You’ll see how wrong I was. Alan Kay couldn’t care less about how he passes messages. Whether it looks like “import “messaging.h” send_message(foo, “bar”)”, or (sendmessage foo “bar”), or foo.bar() is truly irrelevant. If what we picked up from Smalltalk or OOP is which of these three is the “right” way of writing a line that sends the message, we’re focusing on entirely the wrong thing. It is also possible we are just bikeshedding.

The sad part is that I, too, sometimes get caught up in these wrong discussions. Occasionally I want data-to-be-code-and-code-to-be-data, but it’ll come out of my mouth as, “I wish I had Lisp right about now,” without realizing that Java and .Net and Perl and Python and Ruby are already interpreted. If I’m using XML, I’m the one at fault, not the language!

Let’s take Erlang as the example – if it gave me C syntax on the BEAM, I’d be in heaven. That’s the whole point of knowing it! If you’re looking at Erlang, and wondering, how:

if Condition -> do() end.

is any different from

if (condition) { do(); }

You’ll learn nothing at all and come out very confused about what a couple of dozen hipsters at Whatsapp were snorting. Both are Turing complete, which means either one can compute anything the other can (given infinite time and memory, of course.)

But if you look at Erlang and ask what “Processes are” and “Why the language cares”, you’ll end up learning a lot. Then the debate suddenly turns into, “Well I can do Actors in C, and I can create message queues there as well.”, to which I say, “What took you so long!? Let’s get on with it then. Erlang has shit for syntax, but Joe Armstrong sure had a great idea for how to build a safe system!”
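And indeed you can get on with it in a mainstream language; the shape of the idea is small enough to sketch (this is just the concept, not Erlang’s actual machinery, and all the names here are mine):

// A bare-bones "process": a mailbox plus a loop that owns all of its own state.
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class CounterProcess implements Runnable {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0;                                 // state nobody else can touch

    void send(String message) { mailbox.add(message); }    // async message passing

    @Override public void run() {
        try {
            while (true) {
                String message = mailbox.take();           // block until a message arrives
                if ("increment".equals(message)) count++;
                else if ("print".equals(message)) System.out.println(count);
                else if ("stop".equals(message)) return;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

A supervisor is then just something that watches the thread running this loop and restarts it when it dies. Erlang’s contribution is that none of this is a library you bolt on; it is the unit the whole language and runtime are built around.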

THIS is really why programming languages matter: the language itself is entirely useless; the concepts it brings, however, are not. Look at JSON for instance. JSON has a way of defining an object literal in the language. Sounds simple, right? This is why it’s powerful: you can literally create a new object from code. That is, your Data IS code. Your data isn’t parsed by code. It simply is! Does this remind you of Lisp? It should. Why’s this important? Because with Javascript you already have the best compiler, parser, and validator imaginable. To write an XML parser, validator and compiler ON TOP of Javascript seems like folly. Why would you not use the one that already exists, and in which you have already put your faith?

I believe one of the most critical mistakes made in the Java and C# world is the mistaken belief that they are still C, which used to be statically linked into OS-specific bare-metal code. They are not, though their developers haven’t figured this out in over two decades! Every classloader is a compiler and validator, which means there really is no reason for XML, Spring or XAML to exist. When you load a “class”, you’re loading config; what’s more, you’re loading it using a compiler/interpreter/validator/parser combo that you’ve already decided to trust. Why on earth would you put your faith in something written ON TOP of a perfectly functioning system? All Java had to do was define a good way of writing an object literal in code, and they’d have unleashed a killer product!
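A minimal sketch of what that might have looked like (all names hypothetical), using nothing but the compiler and classloader you already trust: the “config file” is just code, so loading the class is the parsing and the validation.

// Hypothetical: configuration as an object literal in plain Java, no XML layer on top.
final class ServerConfig {
    final String host;
    final int port;
    final int maxConnections;
    ServerConfig(String host, int port, int maxConnections) {
        this.host = host;
        this.port = port;
        this.maxConnections = maxConnections;
    }
}

final class ProductionConfig {
    // A missing field or a wrong type here fails at compile time, not in production.
    static final ServerConfig CONFIG = new ServerConfig("api.example.com", 8443, 1000);
}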

To look at Lisp and wonder about the syntax, is to look at the wrong thing. To look at Lisp, and come up with JSON, or a way to define object literals inline, is the right way to do it.

Similarly, what WOULD be profoundly useful is if we got around to developing some very good process-isolation tools for plain old C, allowed async message passing between processes, and then developed a supervisor that could watch and restart them. Let me put it this way: do you think Chrome is a fast/efficient/reliable browser? You’re using Erlang concepts implemented in C. If you like Chrome’s way of doing things, you’re already a supporter of the Erlang model. To focus on the syntax is to focus on the wrong idea. Sure, it looks like Prolog, which to FOPL and discrete mathematics fanatics is very elegant. It is also entirely beside the point! To focus on how the Chrome web browser runs each tab in a different process, thus ensuring the “browser” in its entirety doesn’t die easily, is to learn the right thing. All of this is done in age-old ANSI C.

So there you go: why programming languages matter is not only to capture “intent”, but to steal from what others have done, shamelessly and unabashedly.

November 12, 2014

The non-recursive quicksort

Filed under: Uncategorized — archisgore @ 5:48 pm

Many of you saw the title, assumed it was either clickbait, or written by a lunatic, or clickbait written by a lunatic. You are the righteous few. Move along, this entry isn’t for you.

Some of you saw the title, and went, “Ohh… I want to learn how to do that!”

I’m going to try and change your mind. I probably won’t succeed, but I’ll be damned if I don’t give it a fair shot. Ready for some convincin’?

Somewhere out there is someone who remembers that “recursion is bad.” They don’t particularly remember why that is, or why they believe it. They just “know”. If they see a function foo calling itself, they’ll freak out. Everything else is good, even a nested loop. They’ll even write explicit stacks, so long as the language keyword they use is “for” or “while”.

If, right about now, you’re wondering why they think they can do a better job in 20 minutes than gcc or msvc, which between them compile the Linux kernel, ALL of Microsoft Windows and Office, all of Debian, Ubuntu, Red Hat, the Apache web server, and countless other things, then I’m hopeful for you; perhaps I may yet convert you over.

The first hurdle you need to overcome is to not confuse reentrancy with recursion. The quicksort, like many algorithms, is recursive “by definition”. That is, if it isn’t defined in terms of partitioning the input and reapplying itself to the smaller partitions, then you don’t have a quicksort; you are using some other algorithm. Whether you use a function that can call itself and let the compiler handle how, or you use a loop statement and manage your own state (an explicit stack), the quicksort remains “recursive” in all its complexity. There’s simply no way to have a quicksort and not recurse. There are plenty of ways to have a quicksort and not use reentrancy, though.
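To make the distinction concrete, here is a sketch of a quicksort that never re-enters itself: the algorithm is still recursive (it still partitions and reapplies itself), we have merely moved the stack out of the compiler’s hands and into our own.

import java.util.ArrayDeque;
import java.util.Deque;

final class IterativeQuicksort {
    static void quicksort(int[] a) {
        Deque<int[]> stack = new ArrayDeque<>();     // our explicit stack of [left, right] windows
        stack.push(new int[] {0, a.length - 1});
        while (!stack.isEmpty()) {
            int[] window = stack.pop();
            int left = window[0], right = window[1];
            if (left >= right) continue;
            int p = partition(a, left, right);
            stack.push(new int[] {left, p - 1});     // the "recursive calls", just spelled differently
            stack.push(new int[] {p + 1, right});
        }
    }

    private static int partition(int[] a, int left, int right) {
        int pivot = a[right];
        int i = left - 1;
        for (int j = left; j < right; j++) {
            if (a[j] <= pivot) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[i + 1]; a[i + 1] = a[right]; a[right] = t;
        return i + 1;
    }
}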

When most people claim “recursion is bad”, they aren’t speaking of algorithmic recursion. They are speaking of reentrancy. When computer people don’t clearly remember why they do something, they convince themselves that it is for “performance.” Really try this – it doesn’t stop being funny.

Reentrancy WAS indeed bad, but not for the reasons you think. Reentrancy (use of the implicit call stack) was, even in the ’80s, ridiculously efficient, simply because at any given moment more (and smarter) people are getting paid a lot more money than you to implement a compiler’s stack handling, unless you are one of those people. This isn’t to say you aren’t smart enough, but if your incentives aren’t tied to a badass stack implementation, you simply aren’t going to beat the people whose incentives ARE!

The real reason you were told to avoid recursive function calls in college, was because of how memory layouts worked in the DOS days.

If you don’t have context on why this was important: back in the day, when I learned how to write programs, we didn’t always have reentrancy. For one, languages like BASIC and FORTRAN were taught widely, and many variants used subroutines as “addressable blocks”, nothing more. This meant that a subroutine call was almost like executing a JMP instruction on the silicon. You set up some variables (registers), and then JMP/CALL’d to the subroutine, which would modify the memory state and JMP/RET back. Having the RET back was kind of an amazing thing for its time. The processor could remember where you jumped from, and where it had to return. That was it for a stack. In an architecture where memory addressing was still based on segmentation, we had to determine our stack space upfront, and everything we allocated to the stack was that much less available for storing non-stack variables (the heap). We didn’t have virtual address spaces and paging.

So you’d keep stacks aggressively small, and protect their usage like gold. This is where the term “Stack Overflow” comes from, for you young ones.

So you see, being able to manage state in a system where shoving all your data on the implicit stack was a bad idea, was an engineering skill required of all programmers. You had to know how to store state in globals and safely backtrack, without tripping yourself up. And that happened a lot too. If your SP (Stack Pointer) was moved by even one byte beyond absolute perfection, you’d be in for a world of hurt.

Sometime in the mid-’90s though, the JVM came in, and paged memory came in, and process isolation came in, and reentrant kernels came in (thanks to Linux and Windows 95). In the early 2000s these ideas went mainstream. We forgot why the non-recursive quicksort was taught, and what the principle behind it was. It was somehow confused with optimization. Under a paged address space, the assumptions simply no longer held true. You could address anything in your processor’s addressable space and never trip up, unless you actually *used* more memory than the system had. You could grow your stack indefinitely large, or you could implement the stack data structure on the heap and use some sort of loop to simulate the same thing. You ended up using the same memory either way, but with far less effort one way. Say your physical memory is 256 gigs but your addressable memory is 4 terabytes: you should be able to safely allot 2 terabytes each to the stack and the heap, and unless you actually used all that memory, you’d still come out fine. Your OS’s virtual memory manager would have your back. Back in the DOS days, the OS did very little, and its “API” was a bunch of interrupts you called, which were a slightly fancier version of CALL/RET.

In the general scheme of things though, we lost memory of why this was taught and we mistook it for optimization. I remember a friend of mine in college spent weeks attempting to remove recursion from her code and replace it with loops, because she had gotten it into her head, after the non-recursive quicksort chapter, that she’d somehow make her program perform faster. I always wrote it off as a symptom of her being a student, but I realize that there are industry veterans who still hold the same opinions today, when they should know better!

In fact, it gets worse! If the cost of making a function call remains the same, and if you haven’t inlined every single push/pop instance in your subroutine, then calling that “push” function is the same as calling the current function reentrantly. A function call, is a function call, is a function call.

Ask anyone who’s competed in programming contests, written the non-polynomial primality tester, or solved the traveling salesman problem for Wall Street, and they’ll tell you that their “recursive” algorithm will outperform your “non-recursive” version five times a day and twice on a weekend.

Algorithmic optimizations are hard. Very hard. That’s also why they work.

This general principle applies to everything, whether you are trying to optimize collections by pre-allocating space for small sizes, or “helping” the compiler by using a 32-bit integer instead of a 64-bit one, or using a “for” instead of “foreach”, or avoiding creating an extra object, or any number of things. The questions to think about are: “Is someone way smarter than me, getting paid a lot more than me, to solve this problem? Are way more people than a single individual like me getting paid to do it?”

 

October 31, 2014

Immutability simplified

Immutability, as used in software engineering, brings much fear and paranoia. I figured out this morning why that is. It evokes visions of fixity, restriction, dogma, lack of change, straitjackets, chains, prisons! Read below to find out that immutability is about far more reckless and carefree change than you can possibly imagine.

First remember that immutability is simply a concept, a principle. Imagine a “thread of software”. You may call things threads, processes, strings, ropes, etc. These are all concepts. From the silicon’s view, it is getting a stream of instructions that it executes. It doesn’t know that you have multiplexed the stream with conceptual ideas. Immutability is simply the idea that if you guarantee me certain things won’t change without my knowledge, then I can save significant time and effort being wary of those changes. At what level you provide that guarantee is entirely up to you. I think the crucial component people leave out when they describe immutability is the hidden clause, “without my knowledge”.

If you think in terms of procedures, this can be a truly frightening word. After all, everything about software is change. We change state. We grew up learning about state machines. There are two parallel reactions I have seen on blogs that promote immutability.

The first one assumes that the author is trying to be a showoff. “What the author wants, is for me to get it right the first time. The author can do it right in one shot, but I can’t. We’re not all math geniuses professor. We’re just engineers who write code, and it works. Go live in your ivory tower you stuck up snob.”

This is simply based on a misunderstanding of what immutability means. In the second paragraph I spoke about change without others’ knowledge. We’ll get around to that again.

The second one assumes that immutability is this one single thing. Let’s explore that further. If you’re interested in general language discussions, it would help to read this post on strong and weak type systems by Eric Lippert. If you already know the distinction, I want to draw an analogy between strongly typed references and strongly typed values: even in a dynamic language, you can have values that are strongly typed, even if the references to those values aren’t.

In the same way, immutability has many many forms and shapes.

For example, in some languages,

x = 2
y = x
y = 3

Is a logical fallacy. The compiler would fail. As a mathematician, you clearly know that the code above is “not true”. Let us call this type of immutability an “immutability of reference”. These are not assignment operators, if that is how you think of them. These are facts being given to the compiler. You are informing the compiler that y is 2.
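Java gives you a weak flavour of the same thing with final; a tiny sketch:

final int x = 2;
final int y = x;
// y = 3;   // will not compile: cannot assign a value to final variable y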

But I’m sure we’ll all agree that the following (interpreted as C code) is very very bad:

int TWO = 2;
int *x = &TWO;
int *y = x;                              /* x and y now alias the same memory */
passYToThisFunctionToBeDereferenced(y);
*x = 5;                                  /* whoever is still using *y just had the value change underneath them */

What’s wrong here? By pointing x and y to the same address, and after having passed y to some function, I change the contents of the value itself. Remember, if I simply repointed x to some other place, all would be well. But if *y was being used as a loop invariant inside a function, and suddenly *y’s memory contents changed, wouldn’t that be an incredibly difficult bug to track down?

In fact, this is why all mainstream languages like C, C++, and Java pass by value: the target function receives a value, not the reference to a value.

And that, is immutability. :-) That’s literally all it is!

So then why is it such a big deal? Because copying objects is incredibly hard, expensive, and many times, downright impossible. Imagine you had an object that held a reference to a file handle, or a network socket. You can’t just copy it and pass the value. You need to pass the actual object!

Immutability doesn’t mean that you make data unmodifiable. In fact, far from it. If you are used to coding in the style of the C example I gave above, you are used to coding in an environment rife with fear and paranoia. You have wise words from old developers that warn you of the dangers of changing values without knowing what you are doing. Who knows what will happen if you change the memory that x points to. Your movements are restricted. You are chained. Your freedom curtailed. Innovation destroyed. Ideas discouraged. A truly frightening world it is. That’s no way for a programmer to live! That’s abysmal! We can do better! We MUST do better!

On the other hand, immutable data structures deliver just the opposite.

FooUser Y = new FooUser(); //Y needs a Foo to operate correctly
Foo x = new Foo();  //x can change. The object that x is referencing can't change.
Y.functionThatNeedsX(x); //This is safe. Can't hurt him no matter what we do
x = new Foo(); //innovate
x = new FooWrapper(x); //use that new idea
x = x.addStuff(2); //Wheeee. Freedom!

So how the hell does Y know that it needs to use a new Foo? The question writes its own answer:

Y.heyYouNeedToUseANewFoo(x);

That’s it! You TELL Y something has changed. You can tell Y as many times as you want, as many things as you want!

But aha! Y itself is an object, isn’t it? There are people holding a reference to Y, and you can’t just outright change their assumptions; otherwise, this is soon your documentation all over again:

class FooUser {
    /*Ancient wisdom. Don't call this if you're past the post-back phase, 
        except if you're running on Machine Y, and under these conditions. Or else!*/
    void heyYouNeedToUseANewFoo(Foo x) {
        this.x = x;
    }
}

Uhggg!!!! Who wants to live in this person’s world? It is positively horrid. It is restrictive. It is uninnovative. Can you imagine being told not to do things when you want, how you want, where you want? How the hell do you change the world with that kind of platform? Hell no! We do this:

class FooUser {
    /*Hey there, what's up. Call me for drinks later.       */
    FooUser heyYouNeedToUseANewFoo(Foo x) {
        FooUser fu = this.copy();
        fu.setFoo(x); //private method;
        return fu;
    }
}

Remember immutability is a concept. It is up to you how, where, and when you apply it. But everywhere that you CAN apply it becomes a happy place of innovation, where anyone can come in and do their thing. Ideas can be expressed faster. Fear is gone. Threads can play. Cores can play. Processes can play. Huge clusters can play.

I hope this clarifies that immutability ironically encourages massive change, while itself denoting lack of change.

October 10, 2014

When everything’s important, someone’s lying.

Filed under: Uncategorized — archisgore @ 5:56 am

A tough lesson in the software world, and a sobering reminder. Also a statistical truth (not mathematical, though.)

If you ever read about or studied probability theory, process control theory, quality assurance (modelling, predictability, reasonability, etc.), clinical trials, or just about any kind of applied science, you know that this is both a theoretical truth and empirically proven. You can always make improvements that tend towards correctness. But the only way to reach absolute perfection is to never conduct the experiment at all.

There is no way to reduce both false positives and false negatives to zero. This is why we have acceptance criteria. Cars today are safer than they were a decade ago. But the only way you ensure you are always absolutely safe in a car is to never sit in one. It sounds counter-intuitive and difficult to grasp.

I recently read an article about the expensive experiment that OOP was, and while you may disagree with the article, you will notice the No True Scotsman fallacy.

Sidebar: Why doing what you can do, instead of what you need to do, makes for great religion, but is rarely useful.

I make fun of the mystics and snakecharmers, but one thing a yoga guru said to me ten years ago, which seemed like a dumb thing at the time, holds true for so many adults. He gave the example of the person who has lost their keys outdoors but is searching for them indoors, because that’s where the light is. They are virtuous because they are at least doing something, unlike me, who is making the smartass observation that they will never find the keys indoors because that’s not where they lost them.

Why do I equate it with religion? Because religion allows you to do anything, and always be right. Need to feed the hungry? Sure. Need to NOT feed the hungry? Sure. Support gays? Sure. Not support gays? Sure. Invade Jerusalem? Sure. Defend against invaders? Sure. Forgive people? Sure. Not forgive people? Sure.

This has always been my observation with “frameworks”, “patterns”, “best practices” and especially “testing”. You can write a few hundred tests that do nothing, or you can write one test that guarantees correctness. But if you’re having system problems, and you’re doubling down on writing more tests, you’re looking indoors because that’s what you know how to do, instead of searching for the keys outdoors where you dropped them. It is exceptional religion. Also exceptionally useless. None of the above can fix a problem, and if you’re just about to say, “Well, what would you have me do? I have to do something!”, then you and I both know you’re lying.

Interesting note: Doing something is not necessarily bad. It is psychologically very helpful, in fact. CPR rarely works as well as the movies suggest. Neither do defibrillators. The first thing they teach you in a first-aid class is that CPR is 95% for the psychological protection of the rescuer, and 5% for the benefit of the victim. It helps you avoid PTSD because you had the illusion of agency: you acted, you did something!

 

This is why there are very few people in our industry who can really critically cut down to the things that ARE important and those that ARE NOT. Remember that just because something is NOT important doesn’t mean you go around sending a wave of crusaders to destroy it, another wholly inaccurate interpretation most people make.

Ask anyone about ACID consistency, or scalability, or correctness. Is testing important? YES! Is scaling important? YES! Is reliability important? YES! Are ALL requirements important? YES! Is there something you can live without? NO! Alright – someone’s lying. One of the above is most certainly NOT true.

When I’m reading documentation, or looking for tools to use, the biggest red flag I look out for is a system that claims to have made no compromises. Knowing that such a thing is theoretically impossible, I’m probably dealing with someone in denial, which is worse, because that means, like a religious zealot, I have no ability to hold a critical discussion with them. This isn’t so difficult to find. Can an iPhone be improved? Unless you live under a rock, the builder of the iPhone, Apple, releases improvements every year. Ask any fan of Apple the same question and you’ll invariably get the answer, “No!”. Guess which of them is lying? Can Android be improved? Can Java be improved? Can Linux be improved? On that last note, you’ll find Linus himself laying down a whole host of things he wished he’d had. He’s well aware of the compromises he’s made in the kernel, and continues to make, and he publishes them too. 10 points if you figured out which of them gives me the best confidence that I’m dealing with the right person.

People have been wrong in the past, and people will continue to be wrong long after we are gone. I have changed my mind about so many things, so often, it’s not even funny. The only reasonable chance we have of building a good system is by recognizing the compromises we make. As I wrote in my last entry, I’m available to give you hugs if “compromising” somehow diminishes your self-identity.

If you’re going to tell me that you want everything, perfect, all the time, I’m not putting my faith in you. Either you’re lying to me, which is bad, or you’re in denial, which is worse.

September 18, 2014

Software Design: Learning to identify Dislike vs Exception.

Filed under: Uncategorized — archisgore @ 12:20 pm

This is a subtle point I learned over the past year. You know how I’ve always hated Frameworks. The repeated irony in my blogs is that I’m a crazy academic process-Nazi when speaking and yet seem to disregard every bit of process, apparently when it suits me. I assure you there’s a method to the madness (see Sidebar 1).

Sidebar 1: I evaluate everything by statistical significance. Here’s the deal: suppose your child has been taken hostage and a terrorist demands money. You know the terrorist has no predictable consistency when it comes to releasing hostages after the ransom has been paid (i.e. there is no process that you know of). You have two choices: you either pay the ransom or you don’t. Let me add a twist (see Sidebar 2): the terrorist also has no predictable consistency of releasing hostages when the ransom isn’t paid. Let that sink in for a moment. The cover-your-ass thing to do is to always pay the ransom, because then you did everything you could. The cost-effective thing to do is to never pay the ransom, because there is no correlation between your action and the outcome, but in the first scenario you always lose money and in the second you never do. This one is genuinely difficult to get around mentally. You’ll accuse me of being inhuman. You’ll give me a lot of philosophy. The math doesn’t change.

Sidebar 2: One of the best critical-thinking tools you can use is to frame the converse/inverse/contrapositive of every statement you hear. It is a great tool to identify whether what you’re getting is “information” or not. Sometimes people imply causality without explicitly saying so. In normal words, they passive-aggressively accuse you. Consider this: “You were late today. I didn’t have anything to eat.” It is a very counter-productive way to have a conversation. I don’t know if I’m being criticized for being late, which I totally deserve. I don’t know if you want sympathy for not having eaten, which you totally deserve. But the question is, if you had somehow had something to eat, would my being late be acceptable? THAT is the question which determines whether or not you are having a constructive conversation. Use it the next time someone makes a comment without actually making a comment.

I treat all processes the same. A process that provides even the slightest bit of causality, I will uphold with an iron fist. A process that is designed to make us “feel good” does nothing for me. Either it is contributing to my success or it isn’t. If it isn’t, I have very little tolerance for the ceremony of “having done everything I could,” because it always costs a lot of money and doesn’t predict anything. Moreover, it risks setting up a dangerous trap where people invent new kinds of non-causal crap to do, to demonstrate they did everything they could. It’s like a religion where people hurt themselves or become poor for virtue: not that it objectively made them “better people”, but so long as they self-inflicted unnecessary pain, I’m somehow supposed to believe they overcame tragedy.

That leads us to the main point of software design. A widely misunderstood interpretation of the word “Exception” is that of dislike. I’m going to emphasize heavily that we understand the two are not one and the same.

I dislike many things about how software works. I wish nobody used recursive locks. I wish nobody created global variables. I wish nobody passed large objects by value. These are things I dislike. But I do all three of them all the time. Meaning, they are not done under exceptional circumstances. By definition, a situation has to occur less often than what I do most of the time for it to be called an “exception”.

Let’s pick on my favorite “framework” I love to beat to death – WCF (or SOAP, or CORBA). Every single SOAP implementation I have ever seen – literally EVERY SINGLE ONE has had an “exception” for passing HTTP headers. You should see where I’m going with this – it is not an exception. It is something the designers of SOAP disliked heavily. However, when HTTP is the most common protocol used for transporting SOAP messages, then manipulating HTTP headers is not an exceptional and extravagant scenario.

This is important because that is what ruined SOAP for me every time. Walk up to any old-school SOAP person, and you still have to put up the same fight all over again to “hack” your SOAP client to allow passing extra headers. You both know it’s necessary. But you both have to pretend like you’re somehow the exception to the rule, and that other SOAP implementations don’t do it.

If the SOAP authors had admitted it and written a note saying just how much they absolutely detest application developers manipulating low-level protocol details, we’d have vehemently agreed with them, given them a hug, bought them a beer, and bitched about it. Instead, they turned their dislike into the “rule”, thus making everyone call the most common scenario the exception.

It is important in designing a system to understand why and how it is going to be used. If you don’t want people writing HTTP headers, you don’t achieve that by making the code difficult to write; we’re used to writing difficult code. We get promoted by writing complex hard-to-understand code! You achieve it by providing a service design that won’t require it. Adding hoops and a thousand unit tests and process requirements doesn’t make a good software architecture. Removing the incentive to do the undesirable thing, on the other hand, works wonders.
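A sketch of the difference, with every name invented for illustration: the first client treats the common need as an escape hatch; the second admits it is common and models it directly, so nobody has to “hack” anything.

// Hypothetical client APIs, to illustrate "dislike" dressed up as "exception".

// Design A: the common case is an escape hatch you have to fight for.
interface SoapishClientA {
    Object call(String operation, Object body);
    Object callWithRawTransportHack(String operation, Object body, java.util.Map<String, String> headers);
}

// Design B: the designers admit the need exists and model it as a first-class, typed thing.
final class CallContext {
    final String correlationId;   // the things people actually smuggle through headers
    final String authToken;
    CallContext(String correlationId, String authToken) {
        this.correlationId = correlationId;
        this.authToken = authToken;
    }
}

interface SoapishClientB {
    Object call(String operation, Object body, CallContext context);
}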

The next time you’re building something, ask yourself this question: is your system only going to be “broken”, i.e. used in ways you dislike, under truly exceptional, truly uncommon, truly extravagant circumstances? If not, then educate your consumers on how to avoid the pitfalls. Don’t make their life annoying by adding barriers. I feel your pain when people do stuff you don’t want them to. I’m available for hugs if you need any. Education is the only way to fight fear and paranoia.

June 29, 2014

Why Programming Languages matter (and how you may choose wisely.)

A colleague asked me a couple of months ago what determines my choice of language, or even paradigm, and that had already motivated me to write a draft on this. I spent this past week in Silicon Valley at a conference. It was the first time that I actually interacted with the “hack3rs” and startup junkies and techies and whatnot. I learnt a lot of buzzwords I had never heard of before, but I did walk away with one major takeaway: even startups that are trying to be lean and simple tend to complicate things beyond belief!

You can separate out the ones who “get it” from those who don’t. The talks by Google, Facebook and Akamai engineers were captivating. They operate on a higher level of the problem space. This is not because of money, size or scale; it is simply because they choose to brutally simplify that which does not matter, so that they can focus on what does.

So in the backdrop of me pushing my own team to consider bringing in higher-level languages, or even using full object-orientation as it was meant to be used, I found that a lot of new-age startups didn’t “get it” either. They use Javascript and Python and Ruby, but they have no clue why.

Now admittedly there is value in writing a more compact loop, or avoiding boilerplate code. But if that is the only reason you are considering a programming language, you’re doing it entirely wrong. I’ll take an FFT written in BASIC any day of the week and twice on a weekend over a correlation DFT on a multi-core parallel futures-dependent async map-reduce framework on top of hyper-optimized vectorized C++.

So what should influence the choice of language? One question and one question alone: Am I saying what I want to say?

After putting aside reasonable performance requirements, all the requisite functionality, etc., the language needs to allow you to express your intent, not only to the compiler, but to a future reader. It is my belief that 99% of the reason behind all failed attempts at making software maintenance possible lies in the inability of the original programmers to express their intent. What is documentation if not expression of intent? What is a UML diagram if not expression of intent? What is Object Oriented Programming, if not expressing the intent of WHAT operations are possible on WHAT data? It isn’t as if the old C-style ModifyWindowEx(HWND wnd) didn’t work, but Window.modify() tells you, and the compiler, what is possible on that window and what is not. It is expression of intent.

Fortran was huge back in the day, because it expressed your formula. Instead of reading:

MOV AX, [5Dh]   ; load a
ADD AX, [6Fh]   ; add b
MOV [7Fh], AX   ; store the result into c

You could say:

c = a + b

So you know that the entity “a” was to be added to entity “b”, and the result stored in “c”. You don’t even need to know computers to tell me what that means.

The common misinterpretation of this concept is that “Functional languages allow you to say what you want, and imperative languages allow you to say how you want.”

That is a terrible way of looking at it. Because sometimes “how” you want something is what you’re trying to express.

Like in all my blog posts, I’m going to give you the fundamental question to ask yourself when choosing your language:

“Did I tell people what my intent here was?”

If the language doesn’t let you answer that question directly, you’re using a non-optimal fit. When you have to write documentation and code comments, that means your language failed at expressing intent. Take the example of this prototype:

char* reverseString(const char *foo);

Without extensive documentation on treatment of nulls, empty strings, and exception-handling capabilities, there is no way to understand what the author of the function intended this to be used for. This is BAD! Sure there may be tons of input validation inside the function, but now you have to write a dozen unit tests for a dozen scenarios to ensure that assumption isn’t broken.

What do I mean by intent-expressivity? Suppose C++ allowed hypothetical annotations that could be made part of the prototype metadata?

char* @Nullable reverseString(@NonNullable const char *foo);

If those annotations were stored in the prototype metadata, you would get two benefits:

1. You never need tests to ensure foo is non-null. Your compiler did whatever it had to, to give you a non-null char pointer.

2. You expressed to your caller, in no uncertain terms, that you will not tolerate nulls. You expressed it in a way that the compiler understood, and a smart static analysis tool would catch a class of bugs not possible in plain-old-C.

While this appears to be cosmetic syntactic sugar, it is far more than that; it is semantic sugar. Now any analyser, man or machine, knows that a null foo will not be entertained by my function. You’re locking down the domain and range of the function. It may look very silly to care about such a thing, but that is exactly what expressing intent means.
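For what it’s worth, Java grew exactly this kind of semantic sugar as annotations. A sketch follows; the annotation names vary from tool to tool, so treat these declarations as representative rather than canonical:

// Representative sketch: annotations a static analyser can read; real tools ship their own versions.
@interface Nullable {}
@interface NonNull {}

final class Strings {
    @Nullable
    static String reverse(@NonNull String foo) {
        // a null foo is simply not part of this function's domain
        return foo.isEmpty() ? null : new StringBuilder(foo).reverse().toString();
    }
}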

Functional Programming isn’t the answer to everything:

Another common misconception about me is that I want pure-functional languages. Oh boy do I love them dearly, and for good reason. See that expression above?

c = a + b

what if I wanted to add the result of two expressions?

c = (expr1) + (expr2)

What if expr1 has side-effects that affect expr2? This isn’t an unseen situation:

c = (a++) + (a + b);

The problem here is not the one you think it is. I know what you’re thinking: “Who knows how this language interprets that statement? What happens if the evaluation order got changed?”

And you’re WRONG! That kind of thinking is what allows such features to live. There is an easy answer to what you thought was the problem: go read the spec.

The real problem with the expression above is that I have no way of knowing whether that sequencing was incidental or intentional. I can deterministically answer what will happen; what I cannot answer is whether it was intended. Suppose I had to optimize the expression above for running in a loop, or make it safe to invoke from multiple threads, possibly running on different cores, or someone asked me, “If I set the value of variable z to 10 instead of 20, will it affect your value of c?”

Then it is theoretically impossible[1] to answer that question. Sure, we could heuristically make some assertion after adding a thousand caveats (or just one caveat), but as a reasoned outcome we cannot say that z somehow didn’t affect a or b. Furthermore, multiple evaluations of the expression above cause “c” to change.
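One way to make the intent unambiguous is simply to spell the sequencing out, so a reader knows it was chosen rather than accidental; a small sketch (assuming left-to-right evaluation, as in Java):

int a = 1, b = 2;          // example values
int first = a;             // we intend to use the OLD value of a here
a = a + 1;                 // ...and the increment is meant to happen before the next read
int second = a + b;        // this read deliberately sees the incremented a
int c = first + second;    // same result as c = (a++) + (a + b), but the ordering is now stated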

Why is that important?

Because the ability to reason is the ability to maintain. You want to know why CSS sucks? It isn’t horrible, like most people think, because people write it wrong, or because designers mix font rules with layout rules. CSS sucks because it literally removes any and all ability to reason about the intent behind any rule, without massive comments.

Remember that a rule-based declarative language isn’t exactly new or revolutionary. Prolog gave us CSS-style declarations over 40 years ago. Erlang gives them to us today in a widely used industrial language.

If I showed you the code below:

div .title #subtitle {color: blue}

I bet you would have absolutely no freakin’ clue what effect this has on a particular page without actually loading the page. It makes no mention of how it is supposed to be interpreted in relation to other rules. It makes no mention of how it relates to conflicting matches.

So for all you Ruby/Python/Node.js users out there, I have one piece of advice: if you truly want to out-do the “establishment” and gain an edge, do what Google and Facebook do. They use experimental technology, but they don’t do it to reduce boilerplate code in for-loops. They use it to express their intent for their loops. Rapid development is a good enough reason to pick an easier language. Accurate expression of intent is the best reason to pick any language.

When Imperative matters:

To finish up, I wanted to explain why imperative programming matters. Look at a device driver for instance:

setlpt1(00000000b);

setlpt1(00010000b);

setlpt1(00000000b);

That’s some primitive protocol I made up for the parallel port. Those statements are organized chronologically. Even 200 years from now, that is EXACTLY what they mean and what they must do. To use imperative programming where necessary provides a strong signal to the reader that this code is NOT to be messed with. There is no opportunity to reorder those operations. There is no opportunity to apply them to abstract “ports”; they only work for the “parallel port”, or “printer port”, of old.

Writing the above in a functional language, and then adding synchronization primitives to ensure they run sequentially, is folly.

Conclusion:

If there’s one takeaway I can summarize from this post, it’s this: the next time you write ANY code/spec/program, ask yourself, did you express your intent properly? Did your choice of tool/language allow you to express it semantically, without ambiguity of interpretation? Was it done in such a way that a future maintainer would know what your constraints are, without reading a single code comment or piece of documentation? If you answered yes to most of the above, you’re probably using the right language, and using the right language right.
