Archis's Blog

February 25, 2008

Android UI sucks!

Filed under: Uncategorized — Tags: — archisgore @ 9:53 am

First look at some of the screenshots from the link below: http://content.zdnet.com/2346-11470_22-188303-5.html

Let’s first make allowances for the following:
1. I work for Microsoft and will by definition criticise anything-Google.
2. ….. well, there is no point number two, but I know most of you will be just repeating point one constantly instead of really giving me reasons to like the Android UI.

Good heavens! After all that hype and after all the criticism I dished out to WiMo, I was expecting something more! I admit this may just be a prototype, but after the iPhone, I was expecting something a lot cooler.

The UI is just a been-there-done-that kinda thing. No fancy scrolling, or multitouch, or stuff like that. Some stuff I find on my Pocket PC at the bottom is at the top. Some stuff I find at the top on my PPC is sometimes at the top. But apart from that, I saw all the same options I’m used to seeing.

Sarcastic as I may be, an idiot I am not. One thing conspicuously missing from the screenshots is even the slightest hint of the browser/web interface – the real killer everyone is expecting from the device. So perhaps I may be a bit premature in my criticism, and perhaps the best of Android is yet to come……

February 16, 2008

Taking the politics out of computer science?

Filed under: Uncategorized — Tags: — archisgore @ 1:38 am

A common comment in any debate involving computer science or software or technology is, “Let’s leave the politics out of it.” More often than not, it is the losing side that makes this comment.

Over time, I have learnt that there is no such thing as a “non-political” decision. Every decision is opinionated (opinionated by the decision-makers). Every decision has a reason behind it – people don’t just take them randomly. Now while you may say that a decision was objectively taken to fit certain criteria, who guarantees that the criteria themselves are non-opinionated to begin with?

I’m not trying to lecture in some kind of patronising holistic manner here. I’m speaking out of experience. Till about two years ago, I lived in my own virtual shell where I believed there is no politics. I believed there can be places where people work purely towards altruistic goals. Over time I realised that my views were not necessarily mutually-exclusive. Altruism requires politics. Mohandas Gandhi (for those who may disagree with his Mahatma status) may have been altruistic, but you don’t get independence without laws, and a government. You don’t rally people towards a common cause without being political.

“Open Source”, “Freedom”, “Copyrights”, “Usability”, etc. are all politics. That doesn’t mean that being political makes them bad, or that they are not important or necessary. But you need to accept the fact that we live in a world with passions, whims, opinions, and egos.

And you know what? That’s what makes the world so great! Richard Stallman firmly believes in his altruistic vision where software shall be free. To come and tell me that he’s not a political guy would make me lose all respect for him. Of course he’s political. Of course he’s passionate about what he feels. On the other hand, I also know people who have been ruined by opening up their ideas and others making money out of them. Therefore there are people who passionately believe in intellectual property rights. After all, it takes hard work to publish a paper and if you’ve ever studied in the University of Pune, you’d know just how valuable “intellectual property” is. The value is as complex to measure as the intellectual property itself. Many a student has been harassed, tortured and abused before they could do anything productive. That’s the “price” of intellectual property, and it certainly doesn’t come cheap (let alone free).

It’s important to separate the election-oriented party-politics from politics in general. If we didn’t have passions, if we didn’t have opinions and if we didn’t have beliefs, humanity would have been lovable, cute, adorable and ultimately inconsequential pets of some alien species.

So think again before you try “taking the politics out of computer science”.

February 6, 2008

Probability, Randomness, Non-Determinism, Approximation – the subtle distinctions

Filed under: Uncategorized — Tags: — archisgore @ 1:46 pm

During discussions on various algorithms communities, and with friends and colleagues, I found many people having skewed notions of the words “probability”, “randomness” and non-determinism. They use the words out-of-context or with inaccurate semantics.

This post was motivated in part by a wish to dispel some inaccurate notions regarding these terms, and also to present some interesting examples which may help you appreciate the subtlety of expression science requires. I am writing this on a very tight schedule, so it may end abruptly (I shall follow up with entries in the near future).

This topic is highly “academic”, and you may come to appreciate that academicians do not throw fancy words around all the time: when they do use a word, they mean it! I hope to cultivate some regard amongst CS students for the highly precise nature of the mathematical science they chose to study.

Randomness:

Let’s begin with a fun example my dad used to tell me (he’s a statistician). Imagine there are two people reciting numbers progressively. And there are two observers writing down the numbers spoken by the former candidates. Some information regarding the numbers they’ve provided:

1. Candidate one has said: 1, 5, 2, 9, 7, 3, 2, 1

2. Candidate two has said: 2, 2, 2, 2, 2, 2, 2

Checkpoint1: Now I ask you, can you predict the next number that either of the candidates will utter? Also, please do write down the reason(s) for your answer.

Keep that answer to yourself for the moment. In the meantime, some more info about our contestants:

1. Candidate one is a professor of statistics who specialises in pseudo-random number generation.

2. Candidate two is a mentally retarded person (so far as our contemporary understanding of his condition goes).

Checkpoint2: Now I ask – can you predict the next number either of them will utter? Keep your answer to yourself once more.

In the meantime, some more gossip about these candidates: both people were initially told to produce random numbers.

Checkpoint3: Again, note down your answer and the reason for thinking so.

Let us now try and debate whether the next number obtained from either candidate would qualify as “random”. Let us assume that randomness is the property of being unable to predict the next number that will be produced.

1. Candidate one: He knew he was supposed to generate “random numbers”. He is a professor aware of the properties of randomness of a sequence of numbers. And the sequence he has generated does exhibit some of those properties. I shall venture to make a comment here and you can make up your own mind – given enough numbers that he has generated, and if we assumed he is going to follow the “statistical properties of randomness”, it would fairly narrow down the possibilities of the next number he would generate (since it would try to maximise the adherence to the properties of randomness).

2. Candidate two: He probably has no idea what he is supposed to do beyond generating numbers. While we do know that he has generated a sequence of ‘2’s consecutively, we cannot assume any properties or any process by which he is generating the sequence. Maybe he has OCD and is generating a specific number of ‘2’s before he switches to another number? Maybe the only number he knows is ‘2’? Maybe he’ll stop at the next number just for the heck of it?

Based on the definition of randomness that I stated above (conceding that it is my subjective definition), I am compelled to find that the series of ‘2’s is a random series, whereas the first series is not, in fact, random.

This is a critical property of randomness that very few people seem to grasp. Randomness of an outcome is a property of the process used to generate the outcome, and not the outcome that was actually generated.

To provide a simpler analogy, will you assume that just because your home had tap water for the previous one hour, you are going to get tap water indefinitely? Or conversely that because tap water at your house keeps getting disconnected abruptly, it will never come back? As you can see, the way you’d predict the outcome of the supply of tap water is based on the process used to provide it. It is the same with random series, and making assumptions based on the numbers themselves is synonymous with predicting tap water availability based on how much water you collected in the bucket.

Sidebar: Frequently, people use the word “random” to indicate “high variance”, which is technically inaccurate.

Law of Averages:

Another fun statistician joke: A guy goes to his doctor for heart surgery. The doc tells him his chances of survival are 100%. The happy, yet skeptical patient asks his doc, “How come? I was told that the survival rate in this surgery is only 10%.” The doc promptly replies, “That is precisely correct. And it is because all my previous nine patients have died during this surgery, that I am supremely confident of your survival!”

This example in part shows the danger of predicting future outcomes of a process based purely on the past outcomes, without looking into the process. Look at it this way: would you yourself go to a doc who’s had a 100% fatality rate for all his prior patients? However, if it were the best heart surgeon in the world who’s had extreme-stage cases for the past nine times, would you rather trust him, or another doc who has a zero fatality rate but has left patients with permanent disabilities where any other doc would have been able to provide a full recovery?
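To make the doctor's mistake concrete, here is a small simulation sketch. It assumes (and this is my assumption, not something the joke states) that each operation is an independent trial with a fixed 10% survival rate; the function name and parameters are made up for illustration. It looks only at the runs where the first nine patients died and asks how often the tenth survives.

```python
import random


def tenth_survival_after_nine_deaths(runs, survival_rate=0.10, seed=0):
    """Estimate P(10th patient survives | first nine patients died).

    Assumes independent operations with a fixed survival rate.
    """
    rng = random.Random(seed)
    qualifying = 0  # runs in which the first nine patients all died
    survived = 0    # of those runs, how many times the tenth survived
    for _ in range(runs):
        outcomes = [rng.random() < survival_rate for _ in range(10)]
        if not any(outcomes[:9]):
            qualifying += 1
            survived += outcomes[9]
    return survived / qualifying


print(tenth_survival_after_nine_deaths(100_000))
```

Under that independence assumption the printed fraction stays near 0.10, not 1.0: nine past deaths tell you nothing about the tenth patient unless you know something about the process (the surgeon, the cases) that produced them.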

Essentially, I want to reinforce two points:
1. Randomness is a property of the process, not the outcome of the process. Never ever make assumptions of randomness based on purely the outcomes of the process (cryptographers would tell you the extreme dangers of that).

2. Randomness is about unpredictability! It is not about a high variance in outcomes. It is about not being able to predict the outcome. Basically, even if you get a million ‘2’s in a row, if you are unable to predict the million-and-first number, it is a random sequence. On the other hand, if you find a sequence of four statistically random numbers, and can predict the 5th number, it is NOT a random sequence.
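The two points above can be sketched with Python's standard `random` module, which is a pseudo-random generator. The helper name and the seed value are hypothetical, chosen only for illustration. The digits it produces look statistically "random", yet anyone who knows the process (the algorithm plus the seed) can replay it and name the next digit in advance.

```python
import random


def spoken_digits(seed, count):
    """Return `count` digits from a pseudo-random generator started at `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(10) for _ in range(count)]


# What the observers wrote down so far.
observed = spoken_digits(seed=2008, count=8)

# Someone who knows the process simply replays it one step further.
replayed = spoken_digits(seed=2008, count=9)
prediction = replayed[8]

print("observed:", observed)
print("predicted next digit:", prediction)
```

By the definition used in this post, that sequence is not random to anyone who knows the seed, no matter how well it scores on statistical tests. This is also why the standard library documentation steers security-sensitive code towards the `secrets` module instead.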

Non-determinism of outcome vs. Non-determinism of process:

Everyone, I hope, is familiar with Schrödinger’s cat. I keep facing many queries in algorithms forums about “non-deterministic” algorithms, where people really cannot distinguish between determinism and accuracy.

Schrödinger’s cat is a perfect example of non-determinism or unpredictability or randomness. There is really no parameter in the experiment that allows us to explain the state of the cat while the box is closed. The outcome of this experiment is non-deterministic in the sense that a decision based on this outcome will change drastically since the outcome is discrete and binary.

Another example of non-determinism is that of probabilistic primality testing. Given a number, you may have false positives (and, depending on the test, false negatives).

Let me put it this way:
In non-deterministic outcomes, the outcomes themselves are well-defined and you know all possible outcomes, only that you don’t know whether the outcome you obtained was correct or incorrect. A non-deterministic Turing machine is a good example of this. Like Schrödinger’s cat, an NTM can be in multiple states at the same time. However, for each state, the decision to be taken next is well-defined and deterministic. Similarly in a probabilistic primality testing algorithm, each step is well-defined and accurately executed. The algorithm itself has no non-determinism built into it. Given the same input, the algorithm will behave exactly the same every time (although the answer you get may be different). To put it another way, an algorithm to test for the survival of the cat calls for the opening up of the box. This algorithm is invariant. You always deterministically open the box and look inside to obtain the outcome. The outcome itself is something you don’t know until the algorithm has completed.
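As a rough sketch of "well-defined steps, uncertain answer", here is a Fermat-style primality check. I am picking the Fermat test and a fixed list of bases purely for illustration; the post does not name a specific test, and the function name is made up. Every step is deterministic, yet a "probably prime" answer can still be wrong.

```python
def fermat_probably_prime(n, bases=(2, 3, 5, 7)):
    """Return False if n is certainly composite, True if n is 'probably prime'.

    Illustrative Fermat test with fixed bases: each step is fully
    deterministic, but a True answer may be a false positive.
    """
    if n < 2:
        return False
    for a in bases:
        if n == a:
            return True
        if n % a == 0:
            return False
        # Fermat's little theorem: for prime n, a^(n-1) mod n equals 1.
        if pow(a, n - 1, n) != 1:
            return False
    return True


print(fermat_probably_prime(97))               # 97 is prime
print(fermat_probably_prime(341, bases=(2,)))  # 341 = 11 * 31 fools base 2
print(fermat_probably_prime(341))              # base 3 exposes it
```

With base 2 alone, 341 is reported as "probably prime" even though it is 11 × 31. Adding more bases raises confidence in the answer, but the number itself does not become any more or less prime.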

Now don’t get me wrong. I’m not saying a non-deterministic-outcome algorithm has to be deterministic in its process. These two are orthogonal properties. Let’s look at non-deterministic-process algorithms:

Random algorithms, on the other hand, use some kind of entropy within the algorithm itself. The processes of such algorithms are themselves non-deterministic. Ideally, until you reach a decision point in the algorithm, you’d have no way of forecasting what decision would be taken at the point. Monte-Carlo methods are an example of such algorithms (but pseudo-random numbers do in fact allow us to predict their decision points). Let me put that another way – if you run the same Monte Carlo method twice, you’d follow different steps. So a Schrödinger’s cat experiment with Monte Carlo methods would vary based on the parameter you’re making non-deterministic – for example, it may open the box after different time intervals each time you ran it and you’d have no way (ideally) of predicting when the box would be opened.

An example of a non-deterministic process leading to a deterministic outcome is the use of Monte Carlo methods to approximate PI (the most common example of Monte Carlo methods), or, to get more deterministic, roots of polynomials. The algorithms used generally won’t ever follow the same state transitions twice, but they always lead to results which are deterministic and identical (in any case, even if the algorithm doesn’t reach the deterministic answer, there does exist a deterministic answer and the algorithm gets _closer_ to it progressively).
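A minimal sketch of the PI example, assuming the usual "darts in a unit square" formulation (the function name, sample count and seeds are my own choices): each seed makes the algorithm visit a completely different set of points, yet every run homes in on the same fixed number.

```python
import random


def estimate_pi(samples, seed=None):
    """Estimate pi from the fraction of random points inside a quarter circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


# Different seeds mean different paths through the algorithm.
for seed in (1, 2, 3):
    print(seed, estimate_pi(100_000, seed))
```

The seeds are only there to make the runs repeatable, which is exactly the caveat about pseudo-random numbers mentioned earlier. Pass `seed=None` and the generator is seeded from the operating system instead.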

Which brings us finally to:

Approximations:

Approximations or near-optimal solutions are the third orthogonal property we need to consider. The notion of “close to” the answer comes into play here. And this is difficult to differentiate from the property of high-probability of an answer.

An easy way to understand this contrast is to think of prime number generation vs. primality testing.

1. Assume a function F(x) which returns the x-th prime number. Due to computational limitations, there will be loss in numerical accuracy and the number we obtain won’t exactly be a prime itself (and in all likelihood not an integer at all). In this case, the better the numerical accuracy we provide during our computations, the “closer” we arrive to the prime number. If our numerical accuracy is infinite, we obtain the perfect integer. But in every case, we are aware that there exists in fact a prime number without any doubt in a neighbourhood of the number we obtain. You can bet your cat’s life expectancy to be proportional to closeness of the outcome, and you’d get a pretty healthy cat.

2. On the other hand, consider an F(x) that returns 1 if x is prime, and 0 if x is non-prime, and any x in (0, 1) by attributing a confidence level to the primality of x. Essentially if you tested a 100 Xi’s, (define Yi = F(Xi)) and if Yi > Yj, then more Xi’s are primes than Xj’s on average over time. However, by increasing Yi, it doesn’t make the number Xi any primer. There is no such thing as “more prime” or “less prime”. Even with a Yi of 0.99, Xi may turn out to be composite. Hence, this is not an “approximation”. This is non-deterministic, probabilistic and perhaps random. But it is not approximate. To say that “X is approximately a prime number” would be a gross misrepresentation (and I’ve heard this statement made more than once). If you bet your cat’s life on this, you’d be playing Russian roulette with your cat.

Essentially, in a surgery with chance of 20% tissue damage, all 100% patients would come out with (up to a maximum of) 20% damage to their muscles. In a surgery with 20% mortality rate, 80% people would come out alive.
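To illustrate the first kind of answer (an approximation with a notion of "closeness"), here is a sketch that uses a stand-in problem: a square root found by bisection, rather than the x-th-prime function described above. The function name and step counts are hypothetical. The point is only that the result comes with a guaranteed distance from the true value, and that more work shrinks that distance.

```python
def sqrt_bisect(target, steps):
    """Approximate sqrt(target) by bisection.

    Returns (estimate, error_bound): the true root lies within
    error_bound of estimate, up to floating-point rounding.
    """
    lo, hi = 0.0, max(1.0, target)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid * mid <= target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, (hi - lo) / 2


for steps in (5, 10, 20, 40):
    estimate, bound = sqrt_bisect(2.0, steps)
    print(steps, estimate, bound)
```

A probabilistic primality answer has no such error bound. A confidence of 0.99 is not a distance from anything; the number is either prime or it is not.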

Such distinctions become very important when you’re in any kind of business that supports decision-making (programming being one of them). Imagine if someone sold you an investment-advice program that had a 0.01% failure in predictions, and another guy sells you a program with 10% loss of accuracy in predictions. It’s important to know these distinctions before evaluating any decision choices. Don’t misunderstand me – I’m not biasing on either side. I personally would buy the first one (since I’d like to make lots of money fast and risk the total loss of all my money). But that doesn’t mean one shouldn’t understand and appreciate the distinction between the two.

This concludes the post for now. I need to go watch He-Man for a while and get motivated and punch in a lot of code.

January 17, 2008

Sun acquires MySQL – proof of money in “Open Source”?

Filed under: Uncategorized — Tags: — archisgore @ 9:21 am

As we all know, a lot of internet traffic was generated by this news over the last couple of days. In a local mailing list, I read this comment:

WOW!!! US$ 1Bn for that.
This could be any eye-opener for the people who say there is no money in FOSS :-)

Below is my response to this comment (which I have posted on that mailing list too – but my blog is a good place for me to talk instead of having to listen to abstract philosophy in every other sentence):

I’ll interpret that as, “For people who think money cannot be made from owning source code to a FOSS project”, which nobody has ever debated :-) . (I can feel the flames coming.)

The billion-dollar question (it’s a very rare occasion when one can use such a phrase figuratively and literally in one shot) was, “Can I make money by _using_ GPL’d code” and the question just got answered!

MySQL was GPL’d already – and Sun could have just bought a copy (or licence or support or whatever MySQL sells) to get their source code (assuming Sun was already stupid enough not to download a tarball (further assuming I’m not stupid to have assumed a tarball is available for public download)).

The real key here is ownership of the copyrights to the MySQL codebase. That’s what Sun paid 1Bn for. In that case, the money went to buying out a copyright. Can’t really understand the merits of the code being GPL’d or otherwise – people pay money for copyrighted code every day of the week.

Nobody had ever debated the prospects of making money so long as _you’re_ the owner of the codebase. In fact this is an eye-opener for those who were being told, “Hey, you can make money from MySQL because it’s GPL’d.” Now if a company with Sun’s magnitude and power and influence and fan-following can’t make money from the GPL’d version, it’s hard to believe that an individual might make money from it.

Of course, I’m not at all saying MySQL isn’t “free as in free speech” (lest I should be mailed a whole lot of philosophy), but since “money” is considered orthogonal to freedom, I’m simply talking on that aspect.

Since we’re on that topic, there was an interesting post on a zdnet blog recently about “definition of freedom” and a comparison between the BSD-style licence which allows you to make proprietary forks of their code and the GPL. Ultimately it’s up to the benevolence of the “owner of the code” (I have to agree with Torvalds here), to maintain “freedom”.

1. In the GPL licence, you have the freedom to do anything with the software except sell its source code separately (as a separate entity from the binaries). This means that whoever owns the copyrights to the code (Trolltech, MySQL, etc.) could just fork a proprietary version without you being able to do anything about it (do not misinterpret that; I admit _most_ of those who ask for transfer of copyright are in fact benevolent; all I’m saying is that this is not a licence-imposed necessity). The average guy still competes with other players who have the GPL’d code.

2. Under the BSD-style licence, if one guy makes a proprietary fork, then so can you. You can go head-to-head and try and kill each other off. If Berkeley makes a proprietary fork, every single person on earth can do the same and neutralize the playing field. (To answer all those who were wondering why after having funded PostgreSQL so much, Sun bought out MySQL). If they’d bought (without need) the PostgreSQL codebase, every programmer on earth worth his two cents would have made a proprietary fork to compete against Sun. With MySQL…. well you get the picture.

Please do not misinterpret this post. I’m sure Sun will ensure everyone gets all the code and we already know Google is planning to give back a lot of their in-house modifications to the public code-base. All I’m saying is, the 1Bn of money that was paid, was paid for the conventional thing you pay any proprietary company for – the ownership of copyrighted code.

October 19, 2007

Blazing-fast large-integer multiplication

Filed under: Uncategorized — Tags: — archisgore @ 9:23 pm

This question keeps getting asked again and again, on mailing lists, and forums, and programming contests.

What’s different for the BCS crowd, is that this question gets asked by “Alumni” to show off their mediocre skills. What’s more weird is that most “Alumni” that I know of have always demonstrated suboptimal solutions to this problem (refer to definition of “Alumni” below).

For the benefit of the small band of rebels that I’m trying to encourage to purge the empire, here are the hidden plans that destroy the empire. The FFT method would out-run absolutely any method on earth when you’re talking about integer multiplications to the scale of 1000! or above.

All feedback/comments will be appreciated.

The code and a description will be found here: http://www.geocities.com/archisgore/code_samples/fft_mult.html
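For readers who just want the gist of the FFT approach, below is a rough sketch. This is not the code at the link above, which is not reproduced here. It assumes base-10 digits, a simple recursive radix-2 FFT, and Python's built-in integers for the final carry step; all names are illustrative. The idea is that multiplying two numbers is a convolution of their digit sequences, and the FFT turns a convolution into a pointwise product.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_multiply(a, b):
    """Multiply two non-negative integers via FFT convolution of their digits."""
    if a == 0 or b == 0:
        return 0
    # Least-significant digit first, so index i is the coefficient of 10**i.
    da = [int(ch) for ch in reversed(str(a))]
    db = [int(ch) for ch in reversed(str(b))]
    size = 1
    while size < len(da) + len(db):
        size *= 2
    fa = fft(da + [0] * (size - len(da)))
    fb = fft(db + [0] * (size - len(db)))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform needs a 1/size scale; rounding removes float noise.
    coeffs = [int(round(c.real / size)) for c in product]
    # Evaluate the coefficient polynomial at 10, which also resolves carries.
    result = 0
    for c in reversed(coeffs):
        result = result * 10 + c
    return result


x = 12345678901234567890
y = 98765432109876543210
print(fft_multiply(x, y) == x * y)
```

A serious implementation would use an iterative FFT, pack several digits into each coefficient, and keep an eye on floating-point error for very long inputs. This sketch only aims to show the shape of the method.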

August 27, 2007

Project guidance

Filed under: Uncategorized — Tags: — archisgore @ 5:02 pm

I hope this mitigates those “I wanna do a project, do you have a project?” kind of questions and explains all my views in excruciating detail. For any questions, you’re always welcome to mail me and I’ll revert back with another entry. I am neither an “industry expert” (I’ve exhausted all industry-expert humor for now – but I promise to come up with new insults soon), nor an “alumni” (if you’re new to Pune culture, this isn’t an adjective which means past-student, but a very special honour – it’s a bit hard to explain here unless you’ve faced some “alumni” yourselves), so there’s little scope of me ever being able to speak on these topics directly to students.

Let’s get down to business. I’ll post suggestions here and justifications later, so that any of you who want to get a quick overview won’t waste time:

1. Projects should always be focussed on a “problem”. They must begin with a problem, and only end in solving the problem. If you’re beginning with, “I want to do something in Java/.Net/”, you’re in trouble. Now, this could be a valid assertion, assuming you’re talking about extending the language/technology in question. If you want to extend the “Java platform” itself, this makes sense. But if you’re talking about using Java to do X (where X is as yet undefined), you’d better rethink your choice of becoming a computer scientist. Not sure if you even know how a “problem” is defined.

2. A problem must be something real and substantial. It must be something people can relate to. Something that, if “solved”, must no longer exist. Let’s say you pick up “Base64 encoding a string” as a problem, then after your solution, base64 encoding of a string should no longer bother your target audience.

A problem must always have a target audience. If nobody needs a base64 encoding of a string, don’t do it. If someone needs a base64 encoding of a string, and won’t use your program for it, don’t do it. Do something that someone real uses. This doesn’t have to be a “live” project. Do something at home. I don’t care. But do something that works! Even if your mom or dad or grandpa uses it, it’s okay.

In my opinion, a great project is one that achieves something – something that at least someone cares about – even if it’s one person on earth (including you). Let’s take an example later.

3. Write “REAL” code: Go to your own college. I’m sure everyone’s complaining about some missing tool, or “I wish I had a tool to do this”. Now think to yourself. In your own college, one that is admittedly “academic” and “non-industry-oriented”, there isn’t a single tool written by many of you today, that solves any of the problems they crib about. Now if your so-called “academic” college won’t use it (which is why you bring in “industry experts” to deliver lectures), why should a multi-billion-dollar industry trust you?

Your own HoD won’t trust the code that you write, but overnight, you expect a multibillion-dollar company to just trust you? And let me make this very clear – the industry, once they hire you, implicitly trusts you. I had a disagreement about this issue with a senior many years ago, and he had said I would “learn that trust isn’t everything” when I joined the industry. As I have yet to apparently join the “industry”, I maintain my original opinion, and actively promote it. When I write code, I am questioned by nobody. My manager doesn’t ask me why I did something a certain way. If I say something can’t be done, he takes it at face value and defends my decision higher up. He’ll never ask for a second opinion. My mails are not monitored and I could burn all my team’s code on a CD and walk out with it without anyone checking. It’s that simple. You can now imagine just how much trust Microsoft has in all its 70,000+ employees worldwide (vendors and interns over and above this number). That’s how the industry works. And this trust has to be earned.

“How do I write real code?” It’s simple. Write code. Distribute it. If one person uses your code for their problem, it’s “real” code. Now it is very important that they really do use it. Some friend shouldn’t just fire up your program, run it, and then fire up something else later. Your program needs to be the only program they use for the specific problem your program attempts to solve.

4. “Impressive projects” (IPs for short), are mostly for cowards. Let me explain. In my day, I’ve seen (and continue to see) extremely fancy-sounding names like “kernel-module-janky-panky-thingy”, or “hyper-threaded-multi-headed-monster” to name the remotely sane ones. Now while those of you doing “small” projects like GUI’s for a database might get intimidated by these names, let me attempt to give you some confidence.

If the comment, “My project is 90% done”, can be made about any project, that project is cowardly. How many of those kernel modules being made is the college using in their labs? How many of those jhanky-panky things are being used by your own teachers? So far as I’ve noticed, ZERO! The cool thing about an IP (impressive project) is that the minute you begin it, you’ve already defined an exit-route. Regardless of what you end up doing, you just say, “Hey it’s 90% done”, and you got 100% marks for doing 90% of a very complex project.

The really brave are those who take up those silly-sounding database GUI projects. Due to their silly-sounding nature, everyone is already critical of them before the evaluation begins. Being a silly-sounding project, they really need to “deliver”. If it doesn’t work, they’re screwed. Forget 99%, they need it to be done 100% or even more sometimes. Think about this for a moment.

One of the reasons I have never been, nor will ever be, asked to get involved in industry-student activities (where industry jerks go to the college and tell students how stupid they are), is that I keep asking the wrong questions. I distinctly remember at least 3 IPs where I asked the teacher praising it, “So ma’am, does your machine use this file-system driver?”, and she didn’t reply, and I didn’t need to ask anything further. That’s why “defining the problem” is the most important step. Define it and solve it goddammit!

If you’re still not convinced, let me put this in another way. Most of you know I love making my own toothpaste just for fun. Now I recently made a toothpaste that’s 90% done. It can kill off absolutely any bacteria, virus, fungus and any other pathogens known to man! Wow! It’s super-awesome! Just one minor issue – we can’t use it on our teeth yet. But wait! It’s 90% done! We’re almost there! We solved the major issue that no other toothpaste in the world solves! We killed off all germs! Take that you idiotic dentists studying for 1000 years! I, a mere computer programmer, invented a better toothpaste than you! And now, I want you all to invest 100 million dollars in my new invention so I can make it safe for teeth, and we’ll all be rich!

Just tell me how many of you will invest in my super-awesome toothpaste? The moral of the story is, just go do a project you enjoy and have fun with, and don’t be intimidated by these IPs. If you’re doing a small GUI project, you’re braver than many of them, and you’ve got a lot more to be afraid of – because you will be scrutinized, you’ll be interrogated, you’ll be broken down into bits before you get a single point for your project, which needs to do everything you said it would do. Your project needs to be safe for teeth, _and_ kill as many germs as possible. So take heart and trust me – the industry will value you. Have you ever heard of the standard comment at college that goes something like…. “sometimes low-scorers are hired by the industry and high-scorers remain jobless”? Believe me – the industry isn’t stupid. They never hired low-scorers. They hired those who were scrutinized the most and didn’t have the escape routes that the others had.

As an ending note: if you do manage an IP and only deliver the solution – well, you’re beyond all praise. I honestly mean it. This blog entry was not meant for you.

5. Commitment: I know most of you have never met people who’re committal, but hey, they do exist you know! Now, I don’t mean this in the context of management-style blazer-wearing people. I mean it in the plain human definition of it. If you want to be brave, learn to write down the problem you’ll commit to solve, and only go to the examiner with, “It’s solved”, or “It’s not solved” (unless you’d rather back me a million dollars for my toothpaste).

Define the problem in use-case terms ONLY! That’s the only way to really commit. If you’re making a Linux installer, and if I were an examiner, I’d ask it to install Linux on my PC without asking you a single question otherwise. If it didn’t install Linux, but instead did some hyper-threaded-multi-headed thingy, I wouldn’t give a damn! Use-case scenarios are frightening to commit to, and that’s why you need to learn this while you’re students. Now is the time to make those mistakes.

And believe me, there’s a difference when I say this. I work for a company that produces products that _you_ personally use (well, I hope so anyway; and if you don’t use our products, I’ve proven my point even better). I don’t come to you and say, “I’m some bigshot industry fellow who knows everything and you’re stupid.” The code I write physically reaches you. While you’re reading this blog, you are my direct judges. You’re my evaluators. You’re directly responsible for my bread-and-butter. How many of you use Live Search? What if I told you Live Search uses some jhanky-panky-super-cool technology that Google doesn’t have, will you use it? If I said, Live uses C#, will you use it? Think of orkut. What technology does it use? What OS does it run on? What AJAX engine does it employ? Have you even once considered these questions? All you care about is the quality of search results, and the ability to _communicate_ using orkut. Orkut’s problem is a use-case scenario. “To enable Person A to talk to Person B”. Even if it’s 90% done, it’s worthless. Even if it’s 99% done, it’s worthless. Only if it enables Person A to talk to Person B, do you – the judges, the evaluators, the jury – use orkut! Then why is it, that having expressed a desire to work for this industry, you hate these “academic questions” asked by allegedly stupid university examiners? In my opinion, the university examiners ask the perfect questions – the very questions that you would ask me when I come and ask you to use my product. In a way, calling them stupid, is calling yourself stupid.

I admit, they don’t care what technology you’ve used, or how hard you’ve worked, or that cool kernel-thingy you did. But then again, neither do you care for all these things. Learn to live with it – better now than later. That’s what life in the industry-without-double-quotes is. That’s how we live daily. If Person A cannot talk to Person B, I’m worthless to the world regardless of what transport protocol I may have used and how many layers of encryption I may have put on it.

The one final suggestion I would make to you – learn to make commitments and learn to live with them. Never have excuses based on technology, hard work, or impressive-sounding names – that’s for cowards.

6. Ensure you’re thinking of deployment: How many projects in our labs are run directly from IDEs, and how many are binary executables? Even more so, how many of those executables are packaged? On Linux, there’s deb and rpm, on Windows there’s MSI. How many projects can you “install->double-click->run”? Again, would you use my products if I gave you a large source tree? This is a tough call, but it’s an important one. Being able to write something that’ll run on even one machine that you don’t have control over provides immense pleasure.

7. Use all the tools at your disposal: Use a source versioning system. Store your code in a CVS/SVN/VSS repository. Keep incremental changes as diffs.

Compare those diffs. Look at how you wrote the code. If something goes wrong, you can revert to a working build instantly. It’s important to be able to use these tools regardless of where you work. If you’re handling any content, a versioning system is critical – even if you’re only working with office documents and not source code. Your company/clients/stakeholders are going to want your SharePoint to hold all incremental changes to documents that are made. Use source-analysis tools to find memory leaks. Even knowing that a certain tool exists for a certain job can be valuable.
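
If you’ve never actually looked at a diff, here is a minimal sketch of the idea in Python, using the standard difflib module. It is only an illustration under my own assumptions: the file name notepad.c, the make_diff helper, and both revisions are made up, and a real versioning system like CVS or SVN does this bookkeeping (and the reverting) for you.

```python
import difflib


def make_diff(old_text, new_text, name="notepad.c"):
    """Return a unified diff between two revisions of one file.

    The name is a hypothetical file name used only for the diff headers.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=name + " (working build)",
        tofile=name + " (latest change)",
    ))


# Two made-up revisions of the same tiny source file.
working_build = "int main() {\n    return 0;\n}\n"
latest_change = "int main() {\n    return 1;\n}\n"

# The diff records only what changed between the two revisions.
diff = make_diff(working_build, latest_change)
print(diff)
```

The lines starting with a minus sign are what the working build had, and the lines starting with a plus sign are what you changed. Storing these small diffs, instead of whole copies, is what lets you see how you wrote the code and go back to the last working build.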

8. “If I do all this, when will I do my project?” The answer is simple – do a simple project, but one in which you can focus on all aspects of development. I had attempted to make this suggestion in the syllabus, but the university is still living in the 1980s so we’ve got to give them some time before they can catch up on almost 22 years. In the meantime, I personally recommend you build only notepad if you have to. Build sudoku. Build minesweeper. But build something that you yourself would use for hours on your own. Not for testing, but for actual “using”. Build something that allows you to explore various challenges – how to version source code being edited by 20 people at a time, how to write comments, how to build binaries, how to build packages, how to ensure and test that packages will install and run and take care of dependencies for unknown configurations. These are all fun and interesting aspects, but most importantly, they’re just plain old bread-and-butter aspects of a programmer.

When I say real code, go out to a public CVS. Contribute just 100 lines to Apache or MySQL, or Postgres, or Linux. Instead of a 10000-line kernel-thingy, submit 100 lines to _the_ kernel. 100 lines that will be used by hundreds of thousands of users all over the world! Now that’s “real” code! It’s not the quantity but the quality that counts.

Naturally, I don’t discourage building a kernel driver, but then have the guts to commit to it. It shouldn’t be done for “40 marks”. The only acceptance criterion is that it be merged into the kernel tree. There should be no other criteria. If you’re solving the problem of “non-existence of a driver”, after your solution, this problem must go away (as I mentioned above). After your solution, there should “exist a driver” – not on your PC or in your lab, but in the kernel production tree. Being able to make this commitment is what the thrill is all about. Make a commitment on notepad, but then ship it! It should go into production. If you can’t commit on even notepad, then you’re seriously in the wrong place and reading the wrong blog. It’s up to you – whether you want to be a coward and be non-committal, or be courageous and commit to a small deliverable!

August 15, 2007

Ubuntu Servers Hacked

Filed under: Uncategorized — Tags: — archisgore @ 11:42 am

Disclaimer: All content in this blog represents my opinions and only my opinions, and does not even remotely, incidentally, implicitly, or accidentally, or in any other way represent the opinion of my current, past or future employers or any institutions I may have been affiliated or associated with.

http://it.slashdot.org/article.pl?sid=07/08/15/1341224&from=rss

Now before we misinterpret it, let me make that clear. I don’t mean servers of other companies which are running Ubuntu. I meant servers of Ubuntu.

I’m sure Microsoft must have had something to do with this! But until we can somehow figure out how Microsoft was involved, let’s just assume it wasn’t them and look at alternatives:

I once had a major mail-debate with a prominent freedom-fighter regarding system administrators. His argument was “Linux requires admins with a high IQ” whereas “Windows doesn’t”. Now call me stupid, but I thought this was a good thing for Windows. After all, I was under the impression that software was meant to simplify our lives (I keep making wrong assumptions on so many fronts lately). Then he goes on to say that because Linux requires administrators with high IQ, they are always competent – by virtue of being Linux admins, whereas a Windows admin is not of high IQ because it’s Microsoft’s fault that Windows is easy to use. Yes, Microsoft is responsible that companies who use Windows hire admins with low IQ. (disclaimer – I do not subscribe to this opinion personally).

Anyways, a whole community’s eyes (thousands of eyes, to quote said mighty freedom-fighter) did not notice a (probably must be low-iq) procedure that I, as a college student was smart enough to know – use everything over SSL. I quote: “as using unencrypted FTP transfers with accounts”. I’m pretty sure there was something called ssl and secure ftp and ssh in college. We’re talking about servers running GPL’d programs, which, according to some, by virtue of applying the GPL, got thousands of eyes going over them, combined with the fact that they’re being run by the community (add a thousand or so more eyes to the mix), and add to that the fact that we’re dealing with the most popular Linux distro in the world – one that’s going to be the flagship into the entry of Linux on preinstalled desktops by Dell (add a few hundred yet more eyes). Hmm…. maybe I missed something…..

The community has its share of blame to throw at Canonical too. The kernel has backward compatibility issues with Canonical-provided hardware, which I presume was bought with all kinds of freedom-fighter-retaliation-threats to whoever sold it to them, which means that there’s no hardware lock-in (besides, the servers were working till now – I mean something was throwing that webpage at me!). Now I found out from slashdot that Vista’s just stupid because it’s not backward compatible with lots of hardware. However, the Linux kernel gets patches “overnight” at the click of your fingers (especially on hardware that was bought with that purpose in mind, and which worked with the previous version). However, it’s Canonical’s fault that they sponsored this hardware. Maybe I missed something again……

Wait, there’s more….. there were missing security patches. Damn that Windows Update! Applies patches automatically huh? Must be for those low-IQ admins. With high-IQ people who can apply patches “overnight”, why bother?

Wait there’s even more…. I’ve faced “case studies” and “statistics” of how Linux machines never ever fail, and any windows failure is always Microsoft’s personal fault, and the configurations had nothing to do with it. And let’s not forget the whole hardware compatibility Vista gave up – just how mean and evil of it! I guess the freedom-fighter-threatened vendor who sold that hardware (or Canonical to have bought it) must be so damned evil, that even after all those threats, he sold hardware that chose to be backward compatible with an upgrade in the kernel. Shouldn’t the stupid, arrogant, low-IQ vendor have anticipated future kernel-changes and made hardware that could live with them? Since the code is by-definition always right (thanks to those thousands + thousands + hundreds of eyes), the evil hardware decided not to allow the kernel to upgrade. Maybe it was that damned DRM again…

After all, it’s always everyone else’s fault, right?

May 14, 2007

GPLv3: Is Stallman taking sides?

Filed under: Uncategorized — Tags: — archisgore @ 12:46 am

A year ago, way before I had joined Microsoft, I had a discussion with some friends regarding the “Software as a service” scenario. Most specifically, the discussion began at LinuxAsia where some of my friends were saying that it’s sad that nobody from Maharashtra competed in the Summer of Code that year. Google, as you might know, is the great supporter and promoter of open source software, uhmmm…, just so long as it competes only with Microsoft.

I have asked this question on numerous mailing lists repeatedly, only to get the standard freedom-fighter response: you’re crap, you don’t know anything, we know everything, you can’t oppose us, anyone who raises doubts against Stallman is stupid, and so on and so forth….. (if you’ve been around the FOSS guys long enough, you might have seen this a lot of times already). The question is quite simple, “I want to study how Google searches the web. Kindly point me to the tarball.”

There was a nice, but highly ignored, article on Slashdot about a hypothetical “Honest Public License”, where software-as-a-service would also need to be open sourced. They had intended to add a clause in GPLv3 that would have made things interesting in the Patents vs. Stallman vs. Proprietary vs. Google vs. Microsoft vs. Torvalds vs. Raymond camp. The clause was justifiably removed due to technical reasons (and I rather agree with them).

So in response, GNU is going for an Affero GPL that addresses the problem of “Software as a service”, or the “ASP Loophole” as the GNU website calls it. The AGPLv2 is quite appropriate to solve the problem in mind. On the “ASP Loophole” thing, I wonder if the GNU is publicly declaring that ASP is the dominant and preferred method of developing webapps. If not, why not call it the PHP/JSP/ColdFusion/etc loophole? I’m not an expert, but I’m pretty sure people can prevent their PHP/JSP scripts from being visible on the web. But then again, not being a reincarnated Biblical Prophet, what would I know?

Everyone in the FOSS world praises Google for its webservices, and criticizes Microsoft for not following that model fast enough. Let us take this assumption to be true. If webservices are the future and there won’t be a space for Microsoft on the desktop, then the GPL is a “Library GPL” since your libraries will be on the web. In other words, the GPL is the LESSER GPL. If I used only SOAP webservices for my programs, I have effectively reduced the GPL to LGPL status.

What really bothers me is the fact that the GPL is still called the GPL. It raises the question of whether the GNU is taking sides, and taking a softer stance towards the likes of Google just because they’re a threat to Microsoft. If so, I wonder if I’d really care to listen to “matters of principle” from such a crowd. And I’d encourage you to do the same.

Am I saying everyone should be forced to open up their webservices code? Heavens no! And I assure you, when I work at Microsoft, I wouldn’t either. But I’d certainly like Stallman screaming that people do so. I’d like Stallman to point out the names of companies and people who don’t do it today. I’d like Stallman to discourage people from using those services which they cannot study, modify or improve; the way he does towards Microsoft products.

Not everyone uses the GPL today, and not everyone will use the AGPL tomorrow. But if Microsoft is evil for not GPL’ing their code, then any webservice provider should be treated the same for not AGPL’ing it. If you spread FUD about how Microsoft might be stealing your data from their SQL Server, or through Office, _because you can’t see the code_, then how’re you so sure a modified GPL’d program isn’t doing the same on any webservice? Now I may not be a reincarnated Biblical Prophet, but I do have some common sense.

The ‘trick’ with using the AGPL is that nobody knows what the hell an Affero GPL is. They’ve never heard of it. This means that a proprietary webservice provider can always be a “GPL Promoter” and people won’t know any better. I suggest that the AGPL be made the GPL, and the current GPL be called the Lesser GPL.

And now most of you are going to respond about how some things are “acceptable” and some are “not acceptable” (where Microsoft falls under unacceptable and everyone else is acceptable). There are two responses I’d like to give here:

1. Effectively, Stallman has become the Torvalds of the ’90s. Torvalds quoted, “He who writes the code, should have the freedom of deciding its license.” The documentation on GPLv3 specifically states, “Those who want to open source their webservices can optionally use the Affero GPL License while others can continue using the current GPLv3.” Isn’t this the same thing with some fancy prophetic terminology as opposed to the simple human-understandable one Torvalds used?

2. Who gets to draw the “line” between acceptable and unacceptable? Who gets to play God? Who gets the authority to judge what’s right and what’s wrong? Who gets to say, “Anything that’s non-GPL’d is pure evil, but anything that’s not AGPL’d but GPL’d is okay.”? What about protecting my freedom to study what services I use and improve them if I want? What about my freedom to copy the code of those services onto my own servers and run them on my intranet instead of wasting heavily-paid-for external bandwidth every time I want to edit docs hosted online?

March 22, 2007

Build 24 – a not so unique contest, but fun none-the-less!

Filed under: Uncategorized — Tags: — archisgore @ 12:41 pm

Microsoft occasionally has interesting contests and competitions on campus to keep us motivated and to come up with reasons to give out T-shirts, Pizza, and prizes.

The currently ongoing contest is called Build 24. It began yesterday at high noon and will go on till noon today.

During this contest, we have to come up with as many crazy whacky ideas, concepts, prototypes, demos, etc. as we can. There are a lot of teams participating, and loads of action going on. The manager sitting in the cabin opposite mine is working hard as my partner and I take a break. I’m watching a movie while he’s taking a nap.

I’ll post how it goes here. Whatever the case, it’s damn exciting and exhilarating. The judges’ panel is made up of elite people – Srini being amongst them.

I have quite a few whacky ideas of my own, but I’m not sure if I’ve got enough time to prototype them all. Since my job is to build rapid prototypes as fast as possible, it’s going to be an embarrassment if I can’t deliver at least two prototypes. If Peri had been the organiser, he’d have expected a lot more than that.

One of them is half done, but hit a technical snag. The other is awesome in design, but I’m not sure if I’ll get the time to investigate everything before prototyping it.

February 24, 2007

MSR’s Innovations: Photosynth – totally awesome!

Filed under: Uncategorized — Tags: — archisgore @ 8:13 am

Take a look at Photosynth.

Microsoft’s Photosynth seems to be completely groundbreaking technology. Another excellent example of what Microsoft Research is up to in those underground labs back in Redmond and right here in Bangalore.

I really don’t know how to explain this in words. After all, the name itself suggests something visual. So without further ado, I’ll just point you to a video that shows you.

As they say, actions speak louder than words. Actions with words (known in contemporary society as ‘video’), speak even louder:

Video: Photosynth Microsoft Research

Compare this with Flickr or any other photo-sharing site. Now when this comes out, naturally there’ll be a big debate on how Microsoft replicated current ideas and made them “slightly” better. But let’s hope Yahoo 360’s timestamping is reliable enough to help me provide proof against that in the future.

Photosynth can be found here http://labs.live.com/photosynth/ as a technology preview.
