Archis's Blog

August 27, 2010

The Electronic Voting Machine issue in India

Filed under: Politics, Technology — Tags: , , , , , — archisgore @ 5:53 pm

I never looked at my blog as anything more than selfish gratification, until quite recently when a person named Hari Prasad got arrested last week for allegedly having “stolen” an electronic voting machine.

First some background – ever since the EVMs were used in elections, my mom has been involved with a group of politically-un-allied activists. Naturally I made quite a bit of fun of her (my family always enjoys a bit of a jest at each others’ expense.) She used to visit me in Hyderabad often on account of her meetings with Mr. Hari Prasad who has his offices in Madhapur. She introduced me to him on multiple occasions but I always took the meetings casually, being more involved in my “work or whatever.”

You may imagine my surprise when one morning I wake up and see this same Hari Prasad an internet sensation making the headlines on Digg and Slashdot. Then you would have seen me telling everyone, “I know that guy personally and I know that he knows what he’s talking about.” I just found out my mom is in Mumbai awaiting his release and has been subpoena’d (not sure by which side right now), and decided to at least bring the issue to attention. Let me be honest, an year ago, she was in Hyderabad at least five or six times, and while I did believe what she said about the machines, I would never have imagined that they would be taken seriously, and let me be the first to say, I am sooooooo happy I was wayyyyy wrong! If you met those people, they’re really just electronicians – these guys aren’t politicians, and they don’t know squat about that stuff. They know how chipsets work and serial ports work, and that’s all they are making claims about.

To reach slashdot and get that much international attention, to get arrested is pretty impressive. What’s more, I called a few friends and family members in India right now, and nobody down there has any clue that this is even happening. That was a bit disturbing frankly.

So what really is the deal with the voting machines? Quite a lot really – I’ve heard discussions and arguments right from having found the seals broken on the boxes in which they were being carried, and the fact that the storage chips on which the numbers are stored could be plugged out and replaced with relative ease – and this stuff is what they teach in Electronics 101.

I don’t have all the specs with me right now, but I’ve been talking to these people enough where it warrants at least some looking-into by voters before you make up your minds. Whether you are for the winning party or not, as Perry Mason would say, everyone is entitled to a defense because it protects us from being falsely accused of a crime. In the same way, even if you love the winning party, it is in your best interests to at least give attention to the matter so that you are protected, should the system be compromised against you.

The real issue from a common-sense point of view that every really seems to overlook is this: that the “count” stored on the machines is virtual. You see, everyone makes comparisons to conventional ballot-boxes, and a casual “what’s the difference really?” kind of arguments. What they don’t realise is, in the old system there were physically 10,000 pieces of paper that necessitated tampering. An interested political party just needed to hire a street-side loafer to follow the van that carries the sealed ballot boxes from the voting booths back to the election-commission offices to see that nobody brought in another set of a thousand or so pieces of paper to replace the originals. Then again, when the boxes are sealed/unsealed, there are witnesses who sign the locks. Even the ballot papers used for counting can be verified for authenticity and their authenticity can be questioned (if you notice an inkjet printout, it’s a no-brainer.) In short, the system has multiple checks in place to ensure lack of tampering.

In an EVM though, all these checks and balances go out and what we get is: Party A: 5000, Party B 20,000. These are pure numbers. There is no public-key system that ensures even 25,000 different people walked into the booth. There is no way to “go back”, or trace tampering. There is no log of when entries were made – even a text file that contains time-stamps of actions without any Personally Identifiable Information (PII) would make tampering that much harder since a scammer would have to fabricate a large text-file and make sure it’s consistent. Heck, someone could even look at what sectors/clusters each of the block of the text-file was stored in to provide an indication whether it was generated over a period of 5 hours, or was just copied from one large blob.

Does it make forgeries impossible? Of course not, and those claims existed against the old-school ballots too. But does the current machine make forging ridiculiously simple? Yes. For anyone politically inclined, I would encourage you to at least check out his youtube videos. I’ll provide more edits to this post with more details on where to find information.

August 19, 2010

Polynomial approximations of neural networks

Filed under: Science — archisgore @ 11:23 am

Still in draft-purge mode, so lots of older shit from five years ago is coming out now. This post particularly makes no sense whatsoever anymore, partly because the neural network scene is much more mature today, and AI isn’t the sexy-new-thing grads look forward to. But…. since I had this written, wanted to go ahead and throw it out there. I must say I am very proud of the attempt even today, even though it was an embarrassment in the end.

Back in college I was bigtime into signal processing and brain-computer interfaces. The biggest challenge in such a system is data-filtering. You end up with a ton of data per second, and your system has to find the right data to process at the right time. To give you an idea (and this may be outdated stuff, since today people use ECoGs and fMRIs), an EEG  has about 64 channels, each sampling at about 100 Hz (assume 8-bit accuracy of each sample – in whatever units – usually micro-amperes).

Obviously there are a lot of conventional tools to process this data (Principal Component Analysis, Independent Component Analysis – which wasn’t of much help since it turned out there is very little cross-talk between channels, Support Vector Machines, etc.) But who in college is content with what we have? I secretly hoped the data had non-linear components (which it didn’t – all non-linear methods gave about 10% more accuracy than linear ones, and it seemed a lot like the reason was memorization rather than a better fit.)

So I had this crazy idea for sensitivity analysis. I had a well-trained Neural Network, and which the old method of branch-and-bound on the input vector to check variance in output is well known, I somehow wanted to model it in an equation, reason being, through sampling, you can find out that the answer is very sensitive to dimension X1, but not so sensitive to dimension X2. However, I wanted to know what Xi’s were most important and what weren’t and by how much. It’s the same concept you use in PCA – you want to pick ’r’ channels out of ‘n’ channels so that you get a ’90%’ accuracy. PCA gives you the contribution index of each channel to the final output. The idea is a trivial extension of the sampling method used above, but simply substitutes polynomials in place of the logarithmic/exponential functions used.

Enter the brand new idea of sampling each neuron at ‘n’ intervals in its range, and using a curve-fitting method to generate an ‘n’th order polynomial for each neuron, and then injecting that into the next neuron it feeds. Here’s the idea:

1. Each input’s domain is determined. Whether it’s an Int32, or Int64 or double or quad. You know the exact bounds of the set.  Therefore, whatever function you use, (in my case TanH),and since backpropagation requires the function to be continuous and 2nd-order differentiable, you know that it’s range is finite and well-known. This range feeds into the next neuron, so it applies to all neurons.

2. Since most used networks (most used by me) were 2-layer, it was a tractable problem.

3. What you do is, you sample each neuron in the first layer at ‘n’ evenly-spaced points in it’s domain, and you compute output-values. Then you fit a polynomial (Spline, Bezier, whatever) to mimic that exact shape (you don’t do this earlier, because this is a fixed graph that can’t be “trained” anymore using backpropagation).

4. You feed in the polynomial as a symbolic variable into the next neuron’s polynomial too, and then simpify the result.

5. What you would get is an n^2 (2 being the number of layers) order polynomial for each output in “m” unknowns (‘m’ being the number of input variables).

Based on the constants and exponents for each variable, you can judge what it contributes to the output. It felt like a good way to quantify in relative terms and symbolically, the importance of each variable, and to find which channels/dimensions could be dropped entirely without losing significant accuracy.

Frankly, if I worked with ANNs ever again, I’d still go this route, especially for problems where the nonlinearity is unknown and you find an ANN that gives you a good fit. Of course, if I worked with ANNs again, I’d catch up on where the world is today, but that’s another matter. :-)

August 17, 2010

What does 100% CPU usage mean?

Filed under: Technology — Tags: , , , — archisgore @ 1:32 am

Traditional scientists (mathematicians, physicists, and engineers in traditional disciplines of mechanics, dynamics, etc.) have always known that certain entities are temporal in nature. Temporal means time-dependent (using the term losely here), meaning quantities or values that only make sense “across time”. The larger time period you observe them for, the more sense they make, and the less you know about local details (you lose details of when  happened but you get a greater understanding of what exactly happened.) This is Heisenberg’s principle applied to time-domain metrics. I don’t want to go into all the stuff Fourier gave us, but heck that dude really changed the way we do stuff today.

A quick refresher example (or introductory example for those who didn’t study a lot of signal analysis). Imagine you were sitting in a large theatre at 8:00pm in the evening with the theatre partially filled up (say 30% of the seats were filled when you entered at 8:00pm sharp). At 8:01, you observe one person coming into the room. Given this data, would you be able to describe either, by what time the theatre would be filled up, or how many people would be in the theatre at 8:30pm the same evening? Now you doze off for 3 minutes, and again observe at 8:05, you notice another person entering the room (due to the darkness, you don’t know how many people are in the theatre currently). A better estimate of the answers to the two questions? Suppose the first person who entered at 8:01 clarified that between him entering and the next person entering at 8:05, he can assure you nobody else entered while you were dozing off, would that make your answer more accurate?  If by 8:20, you got details of how many people entered when, would your estimate be even more accurate?

As you see, when making such estimates, specific point-data has very little value. Given, at 8:15, there were 40 people seated, is quite pointless to figure out when your show should start. Without knowing how many people are currently seated, but the rate-of-entry per minute is much more valuable. Knowing both, is even more so. Knowing the rate-of-entry as it differs each minute is even more valuable (whether it is decreasing, increasing or constant.)

This is all fairly introductory textbook stuff for most other disciplines. In computing though, a lot many programmers while aware of temporal quantities, either misinterpret them or overlook their usefullness. This can mean a lot of implications in terms of quality, performance and correctness. I comment based on some observations and interpretations I have seen in this industry over the last few years and how we misinterpret benchmarks and metrics.

The most commonly misinterpreted statistic thrown around out there is “CPU usage”. We see people panic at “100%” CPU usage, while there are also apps out there who could have only 10% CPU usage but bring systems to a crawl. Ask a common coder, “My app uses 100% CPU sometimes, is that good?” and the immediate response is, “What kind of coder would write such a program?” Let’s look at how a CPU works for a minute.

The CPU runs on a wave (a square wave to be precise). It has a certain number of vibrations per second, and it does work every time it vibrates (just like the piston of an engine). What you see as CPU usage for a certain program, is the number of vibrations of the CPU the program used to do its own work (basically if a car engine could divide it’s piston strokes between the axle and the air conditioner’s compressor, the amount of strokes it allocated to the axle). A lot of times, we have a tendency to interpret “100%” CPU usage as bad. While this is generally the right metric to use in very generic terms, when you are developing a controlled system, it could lead to some quirky scenarios.

A CPU, just like the car engine, is running whether or not it is used to get work done. Of course, 100% usage naturally means nobody else gets a bit of that power when they need it, but there’s no reason why it shouldn’t be used whenever possible (when nobody else needs it). At times, you turn off your airconditioner, so that you get a higher boost in acceleration. In the same way, for certain applications, if a program is deliberately NOT using 100% CPU, it is a very very bad thing.

I’ve been developing server applications for a while now, and this gets brought up a lot. When my webserver is hit with one request, the server goes to 100% CPU usage for about 2-3 milliseconds. This isn’t only not bad, but actually helpful because what else did I expect the server to be doing anyway? What would it mean if I got 20 requests in one second? The server would still use 100% CPU and answer the requests in order and they get answered faily fast too. I’ve seen people going nuts on forums when they see their OS running at 90% CPU – you can imagine how the question is framed, “If with only a one request it used 100% CPU, how will it handle 2 requests at all?”

It’s not all that hard to reproduce on your home desktop too. Ever notice how media players always seem to be using 80% CPU and the system is still responsive (I compile large code-trees while playing a movie on my 2nd screen)? Well, why shouldn’t they, if nobody is using those ‘piston strokes’ for driving the axle? Contrary to that, sometimes a program goes unresponsive, and you open up task manager but you see barely 1-2% of total CPU usage, and wonder why the program is stuck? Happens to me too – even on programs I’ve written, until I realise that the whole “noble coding” era is passed. Back then we used to use more “efficient” workarounds to common functions to do things faster. Modern OSes expect you to be more semantic than syntactic – tell the OS what you need in no indirect terms, and the OS knows best how to provide it to you. Try doing custom memory tricks, and you end up with inefficient code. This doesn’t mean that the “hacker culture” doesn’t exist where super-smart minds exploit new ways to improve speed and efficiency, however it’s just that no longer can you read books on using “a+a” instead of “a*2″ and hope to gain a lot of applause.

Theme: Silver is the New Black. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 130 other followers