Archis's Blog

April 21, 2013

What rescue training taught me about dealing with problems

Filed under: Personal, philosophy, Preaching, Technology — archisgore @ 8:46 am

My first MBA-style “how to do stuff” list. If I get time, I’ll even give it some forced-acronym like “The Five E’s of handling pressure” or something. This will sound a lot like a self-help book as well. You have been warned. If it’s any further warning, I would never read the crap below and follow it myself. So there!

I recently went through a major high-risk scenario at work. When you’re in my line of work, these things are your worst nightmares. We have computational theory itself against us – what we code cannot be verified for correctness by a machine. So we do the best we can, and hope it works. Literally. Even the “best of the best” amongst us is reducing the odds that something could go wrong, rather than improving the odds that everything is right. There’s a big semantic difference between the two.

A year ago, I’d have handled the situation very differently. The usual human responses include a lot of things. First is blame. Second is, “How do I get out of this as soon as possible?” Third is, “How do I justify my actions?” And so on. However, I realized I could handle the pressure with great enjoyment. I’ve been called a masochist before, but this wasn’t that. A few rules I recently learnt in my diving world played a huge role in preparing me to handle anything.

Anyone who has techie friends knows we love our jargon. You can’t walk into a bar in Bellevue without overhearing boasts of “mission critical” and “strategy” and “tactical decision” and all sorts of awesome that would make James Bond walk out in shame.

When you’re training for rescue scenarios however, it is VERY REAL. A wrong decision and someone dies. You can do everything right and someone can still die. Those words actually mean something. An “emergency” doesn’t mean “oh I need a promotion, so I’m going to make things seem important.” An “emergency” means “unless you act now, and use the next 30 seconds correctly, someone is going to die.”

But here are five things I learnt out of diving and rescue training that can really relieve your pressure when dealing with “mission critical” situations.

1. Have reserves for the worst: This is a fundamental rule you learn after you’ve been in horrid situations and have exhausted yourself earlier. Everyone else can say it, but you only mean it when your life is at risk. When you are trying to salvage a dive halfway through, and someone’s reg starts to freeflow and he runs out of air, and you drained your resources in debugging a smaller issue, someone is going to die. When prioritizing, ensure that if at that moment, your entire company’s service went down or your product on millions of machines suddenly has some critical vulnerability, you have the energy to deal with it. If you’re using your reserve energy for anything else, you’re doing it wrong. Take a rescue class and you’ll gain the backbone to tell people to go away. They are important, no doubt, but are they important enough to burn your reserves on at that moment? Reserves are called “reserves” for a reason.

2. You’re not in a life-threatening situation: When you really have been in a situation from which you’re glad to have simply come out alive, you’ll find stress in life goes way down. Whether it’s lost keys, a disappointing interruption in internet connectivity, some pissed-off coworker, or whatever the heck you get worked up about, it’s not like you took your last breath, your reg won’t open, and unless you figure it out in 30 seconds, you’re not sure you will ever breathe again.

3. Use your importance: A year ago, I would have double-checked, questioned, and hesitated. Life-threatening situations teach you one thing: you’re the best chance the victim has. By definition you are the person best equipped to make decisions. Use that power. Make those decisions. That is not the time to educate a bystander on the physiology of emergency oxygen. It is not a time to build “consensus”. Nobody can validate your actions. That is the time to get that oxygen in the mouth of the victim as soon as possible. Everything else be damned! Also learn to hand over charge when someone more qualified comes along (a medical doctor, for instance). Life-guarding school should be mandatory for every MBA dealing with “mission critical” stuff. When you’re dealing with an emergency, and someone says, “He should never have done that in the first place,” you learn to treat that statement as “noise” rather than a discussion to be had at that moment.

4. Partial aid provided is better than full aid withheld: This is one statement they will drill into your brain every other minute. It derives from point 3 above. When you’re the BEST hope the victim has, everything that you do is helping. If you forget rescue breathing, that’s bad, but not the worst. Others didn’t know WHAT to do at all. As an owner of some task at work or in life, YOU know how something works. YOU have more information than anyone else involved, no matter how disturbed you are, or how tired you are, or how pissed off you are. YOUR bad decision is statistically likely to be better than someone else’s random guess.

5. Look Cool Doing it: Perhaps the big point I learnt from GUE folks. If you’re doing something, do it well. There’s no excuse to not have your skills up-to-date. Others take their cues from you. When you falter, they lose confidence. In a way this derives from points 3 & 4: People have already decided you’re the best hope they have. If you’re the most qualified person, whose decision is likely to be the best one, and you panic, you’re making the situation much, much worse. You won’t know everything. But that’s better than not knowing anything. Think of your last doctor visit. If your doctor is worried, concerned, sweating, and informs you that you have a cold, you’re not going to go home very confident. If you’ve not delegated, then you’re in charge. When people begin panicking, your calm, assertive behavior can do wonders to get everyone to focus. There’s a reason I mentioned those IT showoffs. With all that language, it is very easy to lose sight of the fact that while things are BAD, they’re not THAT BAD. Don’t be the panicking doctor who kills their patient with a heart attack while informing them they have a cold.

I learnt that applying points 3 & 4 can be quite valuable in emergencies, whether it’s your service failing, a bad press release, a badly received feature, an angry coworker or whatever it is you’re dealing with. Learn to identify when you KNOW better than others. Learn to USE that advantage to take control because at that moment YOU’RE the best of the worst. Learn to identify a BETTER QUALIFIED PERSON fast and hand off! PROVIDE HELP even if it isn’t ALL the help needed. Some help is better than none. IGNORE noise. Remember that all hell can break loose. Your company could go bankrupt. You could get fired. And yet it’s not like someone is DYING. When you face your first panicked diver or your first low-on-air emergency (and I’ve thankfully never had someone go completely out of air on me), when you’re thinking about whether you’re going to see the surface alive, whether you’re going to surface alone, etc., you really do get a much better grasp on everything else that can go wrong on the surface – you’re ON THE SURFACE! :-)

August 27, 2010

The Electronic Voting Machine issue in India

Filed under: Politics, Technology — archisgore @ 5:53 pm

I never looked at my blog as anything more than selfish gratification until last week, when a person named Hari Prasad got arrested for allegedly having “stolen” an electronic voting machine.

First some background – ever since the EVMs were used in elections, my mom has been involved with a group of politically unaffiliated activists. Naturally I made quite a bit of fun of her (my family always enjoys a bit of a jest at each other’s expense). She used to visit me in Hyderabad often on account of her meetings with Mr. Hari Prasad, who has his offices in Madhapur. She introduced me to him on multiple occasions but I always took the meetings casually, being more involved in my “work or whatever.”

You may imagine my surprise when one morning I woke up and saw this same Hari Prasad as an internet sensation, making the headlines on Digg and Slashdot. Then you would have seen me telling everyone, “I know that guy personally and I know that he knows what he’s talking about.” I just found out my mom is in Mumbai awaiting his release and has been subpoenaed (not sure by which side right now), and decided to at least bring the issue to attention. Let me be honest: a year ago, she was in Hyderabad at least five or six times, and while I did believe what she said about the machines, I would never have imagined that they would be taken seriously, and let me be the first to say, I am sooooooo happy I was wayyyyy wrong! If you met those people, they’re really just electronics guys – they aren’t politicians, and they don’t know squat about that stuff. They know how chipsets work and serial ports work, and that’s all they are making claims about.

To reach Slashdot, get that much international attention, and get arrested is pretty impressive. What’s more, I called a few friends and family members in India just now, and nobody down there has any clue that this is even happening. That was a bit disturbing, frankly.

So what really is the deal with the voting machines? Quite a lot really – I’ve heard discussions and arguments ranging from seals having been found broken on the boxes in which the machines were being carried, to the fact that the chips on which the numbers are stored could be pulled out and replaced with relative ease – and this stuff is what they teach in Electronics 101.

I don’t have all the specs with me right now, but I’ve talked to these people enough to believe it warrants at least some looking-into by voters before you make up your minds. Whether you are for the winning party or not, as Perry Mason would say, everyone is entitled to a defense because it protects us from being falsely accused of a crime. In the same way, even if you love the winning party, it is in your best interests to at least give attention to the matter so that you are protected, should the system be compromised against you.

The real issue from a common-sense point of view that everyone really seems to overlook is this: the “count” stored on the machines is virtual. You see, everyone makes comparisons to conventional ballot boxes, and casual “what’s the difference really?” kind of arguments. What they don’t realise is that in the old system there were physically 10,000 pieces of paper that had to be tampered with. An interested political party just needed to hire a street-side loafer to follow the van that carries the sealed ballot boxes from the voting booths back to the election-commission offices to see that nobody brought in another set of a thousand or so pieces of paper to replace the originals. Then again, when the boxes are sealed/unsealed, there are witnesses who sign the locks. Even the ballot papers used for counting can be checked for authenticity and challenged (if you notice an inkjet printout, it’s a no-brainer). In short, the system has multiple checks in place to guard against tampering.

In an EVM though, all these checks and balances go out and what we get is: Party A: 5,000, Party B: 20,000. These are pure numbers. There is no public-key system that ensures even 25,000 different people walked into the booth. There is no way to “go back” or trace tampering. There is no log of when entries were made – even a text file that contains time-stamps of actions without any Personally Identifiable Information (PII) would make tampering that much harder, since a scammer would have to fabricate a large text file and make sure it’s consistent. Heck, someone could even look at which sectors/clusters each block of the text file was stored in for an indication of whether it was generated over a period of 5 hours or was just copied as one large blob.
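To make the time-stamp idea a little more concrete, here is a minimal sketch in Python of the kind of consistency check such a log would allow. Everything in it is an assumption for illustration: the log format (one ISO time-stamp per vote, nothing else), the function name check_log and the sample numbers are all made up, and this is not how any real EVM works.

```python
from datetime import datetime

def check_log(timestamps, tallies):
    """Return a list of inconsistencies between a time-stamp-only log and the final tallies."""
    problems = []
    previous = None
    for number, text in enumerate(timestamps, start=1):
        stamp = datetime.fromisoformat(text)
        # Votes are cast one after another, so time-stamps should never go backwards.
        if previous is not None and stamp < previous:
            problems.append(f"entry {number}: timestamp goes backwards")
        previous = stamp
    # Every counted vote should have exactly one log entry.
    total = sum(tallies.values())
    if len(timestamps) != total:
        problems.append(f"log has {len(timestamps)} entries but tallies add up to {total}")
    return problems

# Hypothetical data: three log entries, but the tallies claim four votes.
log = ["2010-01-01T09:00:05", "2010-01-01T09:01:40", "2010-01-01T09:01:10"]
tallies = {"Party A": 1, "Party B": 3}
for problem in check_log(log, tallies):
    print(problem)
# entry 3: timestamp goes backwards
# log has 3 entries but tallies add up to 4
```

A check this simple obviously doesn’t stop a determined forger, but it shows the point: with a log, the forger has one more thing to keep consistent.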

Does it make forgeries impossible? Of course not, and those claims existed against the old-school ballots too. But does the current machine make forging ridiculously simple? Yes. For anyone politically inclined, I would encourage you to at least check out his YouTube videos. I’ll provide more edits to this post with more details on where to find information.

August 17, 2010

What does 100% CPU usage mean?

Filed under: Technology — archisgore @ 1:32 am

Traditional scientists (mathematicians, physicists, and engineers in traditional disciplines of mechanics, dynamics, etc.) have always known that certain entities are temporal in nature. Temporal means time-dependent (using the term loosely here), meaning quantities or values that only make sense “across time”. The longer the time period you observe them for, the more sense they make, and the less you know about local details (you lose details of when something happened but you get a greater understanding of what exactly happened). This is much like Heisenberg’s principle applied to time-domain metrics. I don’t want to go into all the stuff Fourier gave us, but heck that dude really changed the way we do stuff today.

A quick refresher example (or introductory example for those who didn’t study a lot of signal analysis). Imagine you were sitting in a large theatre at 8:00pm with the theatre partially filled up (say 30% of the seats were filled when you entered at 8:00pm sharp). At 8:01, you observe one person coming into the room. Given this data, would you be able to say either by what time the theatre would be filled up, or how many people would be in the theatre at 8:30pm the same evening? Now you doze off for 3 minutes, and when you observe again at 8:05, you notice another person entering the room (due to the darkness, you don’t know how many people are in the theatre currently). Can you give a better estimate of the answers to the two questions? Suppose the first person who entered at 8:01 could assure you that between him entering and the next person entering at 8:05, nobody else entered while you were dozing off. Would that make your answer more accurate? If by 8:20 you got details of how many people entered when, would your estimate be even more accurate?

As you see, when making such estimates, specific point-data has very little value. Knowing that at 8:15 there were 40 people seated is quite pointless for figuring out when your show should start. Knowing the rate-of-entry per minute, even without knowing how many people are currently seated, is much more valuable. Knowing both is even more so. Knowing how the rate-of-entry changes each minute (whether it is decreasing, increasing or constant) is more valuable still.
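For those who prefer code to theatres, here is a rough Python sketch of the same argument. The function name estimate_fill_minute, the 100-seat capacity and the head counts are all made up for illustration, and it assumes a constant rate of entry, which a real audience won’t give you.

```python
def estimate_fill_minute(observations, capacity):
    """Estimate the minute at which the theatre fills up.

    observations is a list of (minute, people_seated) pairs, oldest first.
    Returns None when the data gives no usable rate of entry.
    """
    if len(observations) < 2:
        return None  # a single point in time tells us nothing about the rate
    (t0, n0), (t1, n1) = observations[0], observations[-1]
    if t1 <= t0 or n1 <= n0:
        return None  # no time elapsed, or nobody is arriving
    rate = (n1 - n0) / (t1 - t0)  # people per minute, assumed constant
    return t1 + (capacity - n1) / rate

# Point data only: 40 people seated at minute 15.
print(estimate_fill_minute([(15, 40)], 100))           # None

# Two observations: 30 seated at minute 0, 40 seated at minute 20.
print(estimate_fill_minute([(0, 30), (20, 40)], 100))  # 140.0
```

With one observation there is simply nothing to extrapolate from. With two, you get a rate of half a person per minute, and an answer (however crude) falls out.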

This is all fairly introductory textbook stuff for most other disciplines. In computing though, a lot of programmers, while aware of temporal quantities, either misinterpret them or overlook their usefulness. This can have a lot of implications in terms of quality, performance and correctness. I comment based on some observations and interpretations I have seen in this industry over the last few years, and on how we misinterpret benchmarks and metrics.

The most commonly misinterpreted statistic thrown around out there is “CPU usage”. We see people panic at “100%” CPU usage, while there are also apps out there that might show only 10% CPU usage but bring systems to a crawl. Ask a common coder, “My app uses 100% CPU sometimes, is that good?” and the immediate response is, “What kind of coder would write such a program?” Let’s look at how a CPU works for a minute.

The CPU runs on a wave (a square wave, to be precise: the clock signal). It has a certain number of vibrations (clock cycles) per second, and it does work every time it vibrates (just like the piston of an engine). What you see as CPU usage for a certain program is the share of the CPU’s vibrations the program used to do its own work (basically, if a car engine could divide its piston strokes between the axle and the air conditioner’s compressor, the share of strokes it allocated to the axle). A lot of times, we have a tendency to interpret “100%” CPU usage as bad. While this is generally the right reading in very generic terms, when you are developing a controlled system, it could lead to some quirky scenarios.

A CPU, just like the car engine, is running whether or not it is used to get work done. Of course, 100% usage naturally means nobody else gets a bit of that power when they need it, but there’s no reason why it shouldn’t be used whenever possible (when nobody else needs it). At times, you turn off your air conditioner so that you get a boost in acceleration. In the same way, for certain applications, if a program is deliberately NOT using 100% CPU, it is a very, very bad thing.
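If you want to see “share of the piston strokes” as a number, here is a small Python sketch that measures it for a piece of work: CPU time used by the process divided by wall-clock time elapsed. The helper names cpu_fraction, busy and idle are mine, and the exact figures will vary with your machine and whatever else is running, so treat it as an illustration rather than a benchmark.

```python
import time

def cpu_fraction(work):
    """Fraction of wall-clock time this process spent on the CPU while running work()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

def busy():
    # Spin for roughly 100 ms: the program takes every stroke it is given.
    deadline = time.perf_counter() + 0.1
    while time.perf_counter() < deadline:
        pass

def idle():
    # Sleep for roughly 100 ms: the program asks for almost no strokes at all.
    time.sleep(0.1)

print(f"busy loop: {cpu_fraction(busy):.0%}")
print(f"sleeping:  {cpu_fraction(idle):.0%}")
```

On an otherwise quiet machine the busy loop should land near 100% and the sleep near 0%, even though both took about the same time on the clock.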

I’ve been developing server applications for a while now, and this gets brought up a lot. When my webserver is hit with one request, the server goes to 100% CPU usage for about 2-3 milliseconds. Not only is this not bad, it is actually helpful, because what else did I expect the server to be doing anyway? What would it mean if I got 20 requests in one second? The server would still use 100% CPU, answer the requests in order, and answer them fairly fast too. I’ve seen people going nuts on forums when they see their OS running at 90% CPU – you can imagine how the question is framed: “If with only one request it used 100% CPU, how will it handle 2 requests at all?”
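The arithmetic behind that is worth spelling out. Below is a small Python sketch with made-up numbers: 20 requests arriving one every 50 ms, each keeping the CPU busy for 2.5 ms (my assumption, picked from the 2-3 millisecond range above). The utilisation function is hypothetical and assumes the busy intervals don’t overlap; times are in microseconds to keep the arithmetic exact.

```python
def utilisation(busy_intervals, window_start, window_end):
    """Percentage of the window [window_start, window_end) covered by busy intervals.

    All times are in microseconds; busy intervals are assumed not to overlap.
    """
    busy = 0
    for start, end in busy_intervals:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            busy += overlap
    return 100.0 * busy / (window_end - window_start)

# Hypothetical load: 20 requests, one every 50 ms, each using 2.5 ms of CPU.
requests = [(i * 50_000, i * 50_000 + 2_500) for i in range(20)]

print(utilisation(requests, 0, 2_500))      # 100.0  (just the first request)
print(utilisation(requests, 0, 1_000_000))  # 5.0    (the whole second)
```

Same server, same work, and the answer is either “100%” or “5%” depending on how long you look. Neither number is wrong; they just answer different questions.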

It’s not all that hard to reproduce on your home desktop either. Ever notice how media players always seem to be using 80% CPU and the system is still responsive (I compile large code-trees while playing a movie on my 2nd screen)? Well, why shouldn’t they, if nobody is using those ‘piston strokes’ for driving the axle? Conversely, sometimes a program goes unresponsive, and you open up Task Manager but see barely 1-2% of total CPU usage, and wonder why the program is stuck. Happens to me too – even on programs I’ve written, until I realise that the whole “noble coding” era has passed. Back then we used “efficient” workarounds for common functions to do things faster. Modern OSes expect you to be more semantic than syntactic – tell the OS what you need in no uncertain terms, and the OS knows best how to provide it to you. Try doing custom memory tricks, and you end up with inefficient code. This doesn’t mean that the “hacker culture”, where super-smart minds find new ways to improve speed and efficiency, no longer exists; it’s just that you can no longer read books on using “a+a” instead of “a*2” and hope to gain a lot of applause.
