Archis's Blog

September 1, 2011

On code reuse and maintainability

Filed under: Science, Technology. Posted by archisgore @ 12:32 am

A wise man once said, “Any procedural program, given a sufficient level of complexity, will end up implementing some form of Lisp.” (If you can’t find a paraphrase of that quote, then attribute it to me – but I’m pretty sure I read it somewhere 10 years ago.)

Today, we continue the rant against frameworks, and look into code reuse and maintainability. Unlike regular posts, this is one area I’m not too sure about, and would love comments or counter-examples. Pull no punches!

What is reuse exactly? Syntax or semantics?

Let’s start with an example. Given a procedural language, I write a framework to do the following. Instead of the coder having to decide at coding-time what operation they want to perform on a float, they can delegate it to my run-time operation-definition framework, which allows data-driven, dynamically loaded operations.
1. The code itself will look like this:
y = f(x)
2. The executable will ship with a configuration file for my framework that says:
<define function="f" definition="System.Math.Sin"/>
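
To make the setup concrete, here is a minimal sketch of what such a framework might look like in Python. Everything in it is an assumption for illustration: the load_operations helper and the surrounding <config> element are hypothetical, and the definition points at Python's math.sin rather than System.Math.Sin.

```python
import importlib
import math
import xml.etree.ElementTree as ET

# Hypothetical config, mirroring the <define .../> line above but pointing
# at Python's math.sin instead of System.Math.Sin.
CONFIG = '<config><define function="f" definition="math.sin"/></config>'


def load_operations(config_xml):
    """Map each configured name to the callable its dotted path refers to."""
    operations = {}
    for node in ET.fromstring(config_xml).iter("define"):
        module_name, _, attr = node.get("definition").rpartition(".")
        module = importlib.import_module(module_name)
        operations[node.get("function")] = getattr(module, attr)
    return operations


ops = load_operations(CONFIG)
f = ops["f"]

x = 0.5
y = f(x)  # which function this calls depends entirely on the config
```

Nothing in the line y = f(x) says that a sine is being computed; that fact lives only in the config string.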

This one raises a lot of questions that I want to ask, but for now, I’d like to know: would you call this code reuse? I can certainly make great arguments for this style of coding (keep quiet, functional programmers; this one’s for someone who _has_ chosen a procedural language.) It allows me to change the definition of ‘f’ at runtime. I don’t have to worry if tomorrow my computation changes, because a simple config-change will make my code work for Cos, or Tan or whatever else the user needs. I can replace the definition of “Sin” to use a different implementation whenever I choose.

For me personally, this is bullshit! It’s the worst kind of code I have ever had the displeasure of dealing with. Only the last argument makes any sense, and there are ways around that. It is the most irresponsible style of coding too: the programmer, instead of taking responsibility for ensuring correctness of code, delegates every function out of their own scope. If you really want to do that, use a functional language already! People have been advocating them for over four decades now, and this is exactly the reason why! Stop contaminating my procedural code with a smarty-pants half-assed implementation of something for which robust implementations exist already. You can replace your code at runtime, and any interpreter worth its two cents has a decent JITer. Semantically, what would be the difference between sending the interpreter a new file to interpret and changing configuration for a running program? y=f(x) is certainly not going to have bugs (and can be tested easily.) Your probable bugs are going to be inside ‘f’ anyway. So while your core ‘executable’ can be declared stable, that stability is a false perception, because everything that can go wrong now lives in ‘f’.

The problem with this snippet, functional language or not, is that instead of ensuring correctness, it actually reduces it.

For one, what you see above is an example of syntactic reuse. You are reusing the syntax for making a function call. You may disagree with me on this, but to me personally, what is really valuable isn’t syntax reuse but semantic reuse. Implementing a good Sine function is damned difficult. That’s what I want to reuse. Calling into the Sine function isn’t what I worry about when I open up my editor. The correctness of my Sine function is what I want to reuse. If there’s a bug, and someone fixes it, I want the new Sine function. If I ever need to use a Sine function from a different library/implementation, well, seriously: change references to the new library and recompile your code (I know I’m making some atrocious demands here.)

The second problem is the really serious one. When I’m writing code as y = f(x), what the heck am I thinking? I mean seriously. If I am writing a program, I write it for a specific purpose. If I’m computing some vector component along one axis, I know why I’m computing it along that axis. Which means, when I write f(x), I had better damned well know what that ‘f’ should be. If that ‘f’ is ever going to change to ‘g’, then that’s because my problem statement has changed. It alters completely what I am doing (two axes are never the same.) If I start computing Cos(x), it is very very different from computing Sin(x) and I would need serious justification for why I want the Cosine now. I sure as hell don’t want to reconfigure a running program to do that. I may do a host of things with a running program: use a more accurate Sin implementation, use a faster Sin implementation. If I’m fundamentally changing the definition of the function, I’m in big trouble from the outset because I’m changing what my code is guaranteed to do.
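
The distinction between swapping an implementation and swapping a definition can be sketched in a few lines. This is an illustration under my own assumptions: the y_component function, its sin hook and the crude taylor_sin stand-in are all hypothetical names, not anything from a real library.

```python
import math


def y_component(magnitude, angle_rad, sin=math.sin):
    """Component of a 2-D vector along the y axis.

    The `sin` parameter is a hypothetical hook: it may be swapped for a
    faster or more accurate sine, but never for a different function.
    """
    return magnitude * sin(angle_rad)


def taylor_sin(x):
    """Crude alternative sine (7th-order Taylor series), fine near zero only."""
    return x - x**3 / 6 + x**5 / 120 - x**7 / 5040


angle = math.pi / 6
exact = y_component(2.0, angle)                   # library sine
approx = y_component(2.0, angle, sin=taylor_sin)  # same meaning, other implementation
wrong = y_component(2.0, angle, sin=math.cos)     # silently computes the x component
```

Here exact and approx agree closely because both still compute a sine. The third call runs without complaint, but the function's name and docstring are now lying about what it returns.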

Copy-fidelity

I know a lot of people don’t consider the fidelity their code preserves when it is xcopied from one place to another, but I assign it a very high value. The problem with the above snippet is that 9 times out of 10, someone’s going to pick up only the code file, without caring what the configuration file is. This is certainly not unreasonable, regardless of how senior or experienced you are. If you see a file called “eigenvalues.java”, you think to yourself, “Hmm…. maybe I should copy eigenvalues.java and use it to compute my eigenvalues.”

I find nothing wrong with this thinking. Very soon though, you see a compiler error: “Function ‘f’ not found.” You spend a couple of days (and if you’re lucky, you find a comment) figuring out that you have to host this class file in another loader called ‘function-replacement-framework.jar’. No big deal you say, I love myself some helper tools! This is when you go mad. “functionReplacementParserError: Please define ‘f’ in the configuration file.”

Now you are, in effect, figuring out how to compute eigenvalues so you can define what ‘f’ should be. I’m sorry, but you really should NEVER have to write a config file to define _what_ your code does! Regardless of how much computer-sciency stuff your parser and interpreter are doing, and how they’re generating binary classes at run-time which are processor-optimized by the JITer, this is some pretty bad design there.
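
The copy-fidelity failure above can be sketched roughly as follows. The names here (MissingDefinition, resolve, configured_step, explicit_step) are hypothetical stand-ins for the imaginary framework, and a plain dict plays the part of the configuration file.

```python
import math


class MissingDefinition(Exception):
    """Raised when the (hypothetical) framework has no definition for a name."""


def resolve(name, config):
    # Stand-in for the framework's config lookup.
    try:
        return config[name]
    except KeyError:
        raise MissingDefinition(
            f"Please define '{name}' in the configuration file."
        ) from None


def configured_step(x, config):
    # Meaning lives outside this file: copying the file alone is not enough.
    return resolve("f", config)(x)


def explicit_step(x):
    # Meaning lives in the code: copying the file carries it along.
    return math.sin(x)


try:
    configured_step(0.5, config={})  # the config file was left behind
except MissingDefinition as err:
    message = str(err)

value = explicit_step(0.5)
```

The second function survives being copied somewhere else. The first one only tells you, at run-time, that you now have to rediscover what ‘f’ was supposed to be.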

Maintainability

So I’ve been thinking about this for a few weeks now, and want opinions on it. I came to a good working definition that I think I’m going to use for a while. I draw this definition from how real-time systems are defined.

A brief overview for the uninitiated: while the common-sense notion of real-time systems is ‘they’re really really fast’, the working definition is ‘time-deterministic’. That is, whatever they do, they must do it in deterministic time: they have to guarantee that something will happen at (‘by’ if you’re soft-realtime) a specific time. Even in light-hearted situations such as filling ketchup in a bottle, you need hard time-determinism to ensure the bottle is under the nozzle ‘at’ a certain time, not ‘by’ a certain time (if it passes from under it too early, you’ve got a mess to clean up.)

In a similar fashion, I was wondering if maintainability is “doing less work”, or “ensuring correctness even if it is more work.” If I had to make the choice between say…
1. Being able to replace all definitions of “Sin” with “Cos”, using a single config change, which is admittedly a lot less work, but depends on the hope that everyone has taken care to handle the cases where x=0,
2. Or replace Sin with Cos in all the code manually, which is a lot more work, but you ensure that that specific place really is worthy of a “Cos” function as per your replacement-intention. This would guarantee determinism in correctness, but increase work significantly.
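
One way to read the x=0 worry in option 1 is sketched below. This is my own assumption about what such a case might look like: the reciprocal function and its f hook are hypothetical, and the guard is written by someone who had sine in mind.

```python
import math


def reciprocal(x, f=math.sin):
    """1 / f(x), written with sine in mind: sin(0) == 0, so x == 0 is special-cased."""
    if x == 0:
        return math.inf
    return 1.0 / f(x)


at_zero_sin = reciprocal(0.0)              # the guard is right for sine
at_zero_cos = reciprocal(0.0, f=math.cos)  # the guard now hides a perfectly valid answer
honest_cos = 1.0 / math.cos(0.0)           # what a manual, site-by-site edit would compute
```

The global swap leaves the sine-specific guard in place, so the call still returns infinity at zero even though 1 / cos(0) is simply 1. Editing that call site by hand is the point at which someone would notice.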

If the above sounds unlikely, I can certainly come up with something more concrete. We frequently use stdout and stderr for a program’s output. Would you prefer to control what goes where implicitly, or would you prefer to modify code yourself? (There’s a difference between code consciously having an if-then based on a configuration parameter, and code that just has fprintf(outstream, “<stuff>”) where you don’t know at coding-time what outstream could be, and which can’t function without a complex config file.)

I intentionally chose the stderr vs stdout example because it is vague. I can see it both ways. In some cases I am conflicted as to whether a warning should go to stderr or stdout. It depends on what kind of tools are going to capture the output, and how they may want to parse/interpret it (some tools, for instance, may consider anything written to stderr an indication of program failure.)
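
The two styles from the stdout/stderr example might look roughly like this. It is a sketch under stated assumptions: warn_explicit, warn_implicit and the warnings_to_stderr flag are hypothetical names, and the in-memory streams are only there so the behaviour can be inspected.

```python
import io
import sys


def warn_explicit(message, warnings_to_stderr, out=sys.stdout, err=sys.stderr):
    """The code itself decides; the flag only picks between two known options."""
    stream = err if warnings_to_stderr else out
    print(f"warning: {message}", file=stream)


def warn_implicit(message, outstream):
    """The code has no idea where this goes; that is decided elsewhere."""
    print(f"warning: {message}", file=outstream)


# In-memory streams stand in for the real stdout and stderr.
out, err = io.StringIO(), io.StringIO()
warn_explicit("disk almost full", warnings_to_stderr=True, out=out, err=err)
warn_implicit("disk almost full", outstream=out)
```

In the first function a reader can see both possible destinations and the condition that selects between them. In the second, the destination is whatever the caller (or a config file) happened to hand in.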

How would you define maintainable (or is it maintenable) code? How do you draw the line between abstraction of details and the core purpose of the tool, which is what makes the tool what it is? Does configuration, at some point, become complex enough that you’re really just programming in a declarative language through your config files, whereas your “code” is simply an interpreter at that point? If so, is it configuration any longer? If declarative code is more maintainable, why not use a declarative language from the ground up?
