Archis's Blog

March 10, 2013

“Sugar” in soda

Filed under: Science — archisgore @ 6:28 am

Been seeing a lot of visuals on the amount of “sugar” soda has. I decided to test the hypothesis. Let’s begin with results:

Started with non-diet sweetened soda:

20130309-170330.jpg

20130309-170342.jpg

20130309-170355.jpg

Then, I left it to evaporate. Let’s assume an evaporation loss of 0%. Add another 10% offset for stuff-that’s-not-water-or-sugar.

After a week of careful evaporation, here is what I was left with:

20130309-170630.jpg

Since high fructose corn syrup is sugar in liquid form, once both the rate of change and the change in content had stayed at zero for a week, I assumed I had gotten rid of the water and nothing else.

This is, empirically, the actual amount of sugar in your drink.

Now I don’t doubt sodas are bad. I don’t doubt the amount of sugar is bad. You probably shouldn’t be subjecting your body to that kind of ingestion. I know that the “amount” isn’t visually comparable to crystallized sucrose. That’s my point too. It may contain the “equivalent of” a pound of table sugar. It doesn’t actually contain a pound of table sugar.

Therefore, I want to know: when someone posts those visual amounts, by what scientific method do they identify the equivalent crystalline sucrose?

September 17, 2011

The rise of context-free language

Filed under: philosophy, Science — archisgore @ 6:48 am

Here’s an intriguing thought. I have a super-intelligent friend (one of those whose guide is a Turing Award winner) who works on NLP. We have our occasional long phone calls where some topic or other comes up for discussion. This time it was worth blogging about.

Quick overview – languages have rules, structures, etc. Sometimes, the rules become too complex, or at times, they are so specialized, they turn into a look-up table (i.e., not a lot of generalization.) Whenever you can’t generalize, you add entropy. Putting aside, for a moment, the poetic beauty of a language and the art of elocution, many rules are redundant.

Consider language as simply a tool, a means to an end rather than the end in itself, designed to express a thought. If so, the less ambiguity something has, the easier it is, and the better it solves its purpose. When one first begins to learn computer languages, or even when they think of “parsing” English, every single person that I know goes through the thought process above. Why not just have a language that isn’t as nuanced? Why not design a simpler language? Esperanto certainly came out of a need, but building out a complete new language may not have been the solution. It appears that the need is already being met by modification to English itself.

I am beginning to believe that the very efficiency computational linguists want in a simple-to-parse language is also the kind of simplicity the human brain wants. There is a certain idea you want to express. The nuances of whether I will do something, as opposed to whether something will be done by me, while undoubtedly helpful, may not be as necessary as we think. Facebook/Twitter are helping reinforce that idea. If you look at most non-proofread contemporary speech, it almost feels like a context-free language. It appears that what NLP wants, NLP may end up getting, simply because what makes NLP so hard is what also makes language itself so hard for most people.

Texting is the classic blatant example. Most texts are simply a gathering of words put together. There is a certain amount of context and syntax present to avoid ambiguity, but overall, the tools used to eliminate ambiguity are the ones that can do it in as blatant a way as possible, with as little subtlety as possible. Similarly, few FB/Twitter posts seem to be carefully crafted treatises, but generally just words that present an idea. The less context necessary for the idea to be parsed, the better it is communicated. Five years ago, a lot of “old school” people, including me, would complain of the utter lack of punctuation in sentences. Instead of adopting punctuation correctly, I found that people learnt to phrase their text in such a way that addition of commas and full-stops became unnecessary. A modern FB post is as decipherable without punctuation as it is with. That’s some creative adaptation, right there.

Another reason for this is search engines. Very rarely do you search for something like, “Give me movie times for today evening in Redmond.”

The same idea is expressed as simply as, “Movies redmond today”

Over time, it is not difficult to imagine this is how I might begin communication with a friend. Even the verb is implied and not explicitly stated! The parsing rules for this language are just ridiculously simple – tokenize the sentence, and you know what it’s saying.
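To make the "just tokenize it" claim concrete, here is a minimal sketch. The function names and the choice to ignore word order, case and punctuation are my assumptions for illustration, not a description of how any search engine actually parses queries.

```python
import re

def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def same_request(a, b):
    """Treat two terse messages as equivalent if they use the same set of tokens."""
    return set(tokenize(a)) == set(tokenize(b))

print(tokenize("Movies redmond today"))
print(same_request("Movies redmond today", "today, Redmond: movies"))
```

Nothing here looks at grammar at all, which is the point: the bag of tokens carries the whole request.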

Then again, I’m not blaming the internet or machines for this phenomenon. I think this is simply the first time that a large share of the entire earth’s population is literate (even 30 years ago, when I was born, I knew plenty of people who couldn’t read or write.) Written language was, no matter how many people may dislike this, an elite privilege – and to some extent, an end in itself. When you are a club of a handful of people, you can end up in an ego-pissing match. What we might call spoken ‘peasant’ language was always utterly simple and efficient (though I find a lot of ideas I cannot express to them due to the lack of a vocabulary that can convey subtle differences.)

I’m not advocating anything here, but we have to admit that any complex and large system always tends towards reducing entropy over time. It does not mean literary art will have no appreciation, but it is an interesting thought. This would be an interesting hypothesis to test out, if only for the academic validity of the idea. Is modern human language finding a path towards reduction in the energy and ambiguity required to express an idea? Is it a dual-feedback loop where NLP systems are getting better with feedback, but also driving certain generalizations back into the human world?

June 2, 2011

Astrologers…

Filed under: Entertainment, philosophy, Science — archisgore @ 2:10 am

Did I spell that right? I’m supposed to write a document that’s going to take 2 hours, and I’ve pushed it too far. Good time to get all my thoughts out to the world one at a time. Today, let’s rip on Astrologers a bit.

To give you an idea of the motivation, I picked up a hilarious book at the airport during my last India visit called “Am I a Hindu.” That’s going to lead to a few posts, but you’ll have to wait until the next time I run out of things to do, and face the inevitable document-writing task. Today, I began reading this book to take my mind off some blocking issues. I had read a part of it during my flight, and I recalled an emotional rollercoaster between humor, apathy and perhaps anger (or annoyance). I’m giving you this context because this post is regarding one argument that book made (I’m willing to discuss other arguments.)

Let’s get this out of the way - lack of disproof is not a proof. I’d love to talk to anyone who believes that isn’t true (that was sarcastic; if you think lack of disproof is proof itself, I probably don’t ever want to speak to you in my life). The Indian Government proclaims it’s a science, and I claim I’m king of the world. Neither of the clauses is relevant for this discussion. A common argument we hear from pro-Astrology people is, “Why is it so difficult to believe stars could affect the physical processes within you?”

It’s not difficult to believe at all. I never claimed a remote planet doesn’t have gravitational influence on me. I’m claiming you’re full of crap. When I rip on Astrologers and Prophesizers, I’m making fun of them. I’m claiming they’re full of bullshit. It’s about them! That’s as direct as I can say it (offense intended). I have no problem believing that we might be able to model those interactions, and what the result of that influence would be. I’m not saying it can’t be done. I’m saying you’re not the ones doing it.

“If people can predict weather, why can’t people predict fortunes?”

We could extend this argument infinitely – If people can predict weather, and people can predict fortunes, why can’t people predict when we’ll get a man on Mars? If people can predict weather, the stock market, the next Tsunami, and some Earthquakes, sure, it may be possible to predict fortunes too. Doesn’t change the fact that you’re not the one to do it.

I think the modern Astrology debate has gotten too impersonal. Perhaps we’re trying to be too politically correct, or the Astrologers are just better at reframing the problem than we are at noticing that it got reframed right under our noses. I believe in open-heart surgery, however, if you’re any one of the people reading this blog, I can safely say I’m not letting you come anywhere near my heart. If you tried to convince me, I’d find it mildly humorous and highly annoying. It’s the same with Astrologers – science doesn’t deny modelling. Modelling is a fundamental tenet of Science. What we’re saying is, you’re no good at it, and that you have no idea what you’re talking about.

It’s true that all models are merely approximations. That’s why in addition to predicting weather, weather-predictors are also constantly ‘learning’ from the outliers. They’re on the search for new variables, and better sampling methods. I’ve not seen major publications that have indicated the discovery of any new variables or processes, or models that provide a better fit than what historic Astrology demonstrates. Even if that’s accomplished, a theory that does not demonstrate a prediction record significantly higher than a random process is not considered a theory at all. Are astrologers willing to submit themselves to a controlled experiment where they can demonstrate their predictors are any better than a random predictor?
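One way such a controlled experiment could be scored is a one-sided binomial test: count how many yes/no predictions came true and ask how likely a coin-flipper would be to do at least that well. This is only a sketch under my own assumptions (independent predictions, a known chance rate, made-up numbers); the function name is hypothetical.

```python
from math import comb

def p_value_vs_chance(hits, trials, chance=0.5):
    """Probability that a random predictor gets at least `hits` of `trials` right."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(hits, trials + 1)
    )

# Hypothetical record: 60 correct out of 100 predictions that chance gets right half the time.
print(p_value_vs_chance(60, 100))
```

A small value means the record would be hard to explain by luck alone. A value near 1 means a random predictor would usually do as well or better.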

Hence the title of this post – ‘Astrologers’. Don’t make this about the “Science of Astrology”. I don’t claim it won’t work. This is about you – I claim you don’t work.

April 1, 2011

Making a Brain-Computer Interface

After five years of consistently blogging, and consistently failing to do something about it, the BCI-building has begun. The academician in me needed an outlet and I’ve been craving for something hardcore technical for a while now. So begins the first formal attempt at building a cheap home-made BCI. I’m going to label all entries so that it will be easy to follow progress on this.

Unlike my regular posts which are composed with at least some thought, this series of entries will be more like a journal. I’ll post entries with what I do, what methods I follow, everything I try. With any luck, there should be enough detail that anyone could replicate what I am doing with full fidelity. The attempt at any original research is not even a long-term goal. The goal here is to, and I say this as directly as possible, with no misconceptions whatsoever, and no subtext, “to have fun.”

If you have any interesting theories or experiments, I would love to hear your thoughts. If you’d like to participate, you’re welcome to join. Most followups will be brain-dumps of my thoughts – unedited, raw, and naive. A side-effect of this blog is to also demonstrate any points of failure, or stuff that doesn’t work. At this point, I have nothing working. I don’t want to claim that I knew everything, in case I succeed. I won’t use the excuse that I never wanted to succeed in case I fail. I want to make this work, and even if it doesn’t, it won’t change the fact that I still wanted it to work. If I make a mistake it’s going to be published, since I will try and publish what I intend to do before I do it as a validation.

At the moment, I picked the PocketEEG from PocketNeurobics (apparently an Australian company) to get started. I’ve been out of this world for a while (four years), and it takes a long time to catch up on the IEEE Journal of Biomedical Engineering where most BCI work is (used to be?) published. The WaveRider system is a clinical device that costs too much for me at the moment, but if this device fails, I’ll save up for the WaveRider and go with it. It’s a two-channel device.

I haven’t got the electrodes yet, so haven’t taken any readings, but plenty of work has to happen before the electrodes become relevant. The recommended software to be used with it is BioExplorer which I think costs too much. I instead tried to connect the open-source BioEra software. The UI takes me back to the good old college days before the polished world of iPhones and iPads. It has a way to define the processing pipeline but I rather hate doing it graphically. The dongle does provide a fake serial port on your PC that you can read from, but I didn’t want to go through that much trouble. I intend to use BioEra to capture the signal and send it across a local TCP channel into a server I’ll write that does what I want it to. BioEra seems to support plugins, but I love the flexibility of having my own executable as opposed to being “hosted in” another exe.

This pipeline investigation should take until the weekend, and hopefully I’ll have it coded over the weekend. If I’m lucky, and the electrodes arrive, I’ll be sure to post some recordings of motor activity of right and left hands, captured from the C3 and C4 points at the beta band.
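A rough sketch of the receiving end of that local TCP channel might look like the following. Everything specific here is an assumption for illustration: I have not checked what BioEra actually puts on the wire, so the sample format (little-endian signed 16-bit integers), the port number and the function names are all hypothetical.

```python
import socket
import struct

SAMPLE_FORMAT = "<h"  # ASSUMPTION: little-endian signed 16-bit samples
SAMPLE_SIZE = struct.calcsize(SAMPLE_FORMAT)

def iter_samples(conn):
    """Yield integer samples from a connected socket until the peer closes it."""
    pending = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        pending += chunk
        # TCP is a byte stream, so a chunk may end in the middle of a sample.
        usable = len(pending) - len(pending) % SAMPLE_SIZE
        if usable:
            for (value,) in struct.iter_unpack(SAMPLE_FORMAT, pending[:usable]):
                yield value
        pending = pending[usable:]

def serve_once(host="127.0.0.1", port=9000):
    """Accept a single local connection and return every sample it sends."""
    with socket.create_server((host, port)) as server:
        conn, _addr = server.accept()
        with conn:
            return list(iter_samples(conn))
```

Calling `serve_once()` blocks until a sender connects to the (hypothetical) port 9000 and then closes the connection. Keeping the decoding in `iter_samples` means the processing code never has to care where the bytes came from.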

The reason I don’t want to use the provided FFT block is because I much rather enjoy playing with parameters initially. Do I want a sliding window at every point, or will I perform it on intervals? If I intend to do something like auto-regression, I’d much rather use my own buffers and optimize the pipeline to operate on them. I’ll post what happens.
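The sliding-window versus interval question comes down to one parameter, the hop between windows. Here is a minimal sketch using a plain DFT instead of BioEra's FFT block. The 100 Hz sampling rate, the 13 to 30 Hz beta band limits and the synthetic 20 Hz test signal are assumptions for illustration, not measurements.

```python
import cmath
import math

def windows(samples, size, hop):
    """Yield windows of `size` samples; hop=1 slides every point, hop=size gives disjoint intervals."""
    for start in range(0, len(samples) - size + 1, hop):
        yield samples[start:start + size]

def band_power(window, rate_hz, low_hz, high_hz):
    """Sum the squared DFT magnitudes of the bins whose frequency lies in [low_hz, high_hz]."""
    n = len(window)
    power = 0.0
    for k in range(n // 2 + 1):
        if low_hz <= k * rate_hz / n <= high_hz:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(window))
            power += abs(coeff) ** 2
    return power

RATE_HZ = 100  # ASSUMPTION: sampling rate
signal = [math.sin(2 * math.pi * 20 * t / RATE_HZ) for t in range(300)]  # synthetic 20 Hz tone

# One beta-band value per second of data (disjoint intervals).
beta = [band_power(w, RATE_HZ, 13, 30) for w in windows(signal, 100, 100)]
print(len(beta))
```

Setting the hop to 1 gives a fresh estimate at every sample at a much higher compute cost, which is exactly the trade-off worth experimenting with.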

August 19, 2010

Polynomial approximations of neural networks

Filed under: Science — archisgore @ 11:23 am

Still in draft-purge mode, so lots of older shit from five years ago is coming out now. This post particularly makes no sense whatsoever anymore, partly because the neural network scene is much more mature today, and AI isn’t the sexy-new-thing grads look forward to. But…. since I had this written, wanted to go ahead and throw it out there. I must say I am very proud of the attempt even today, even though it was an embarrassment in the end.

Back in college I was bigtime into signal processing and brain-computer interfaces. The biggest challenge in such a system is data-filtering. You end up with a ton of data per second, and your system has to find the right data to process at the right time. To give you an idea (and this may be outdated stuff, since today people use ECoGs and fMRIs), an EEG has about 64 channels, each sampling at about 100 Hz (assume 8-bit accuracy of each sample – in whatever units – usually microvolts).
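For scale, the raw data rate implied by those figures works out as follows. This is plain arithmetic on the numbers above, and the function name is hypothetical.

```python
def raw_bytes_per_second(channels, rate_hz, bits_per_sample):
    """Uncompressed data rate of a multi-channel recording, in bytes per second."""
    return channels * rate_hz * bits_per_sample // 8

# 64 channels at 100 Hz with 8-bit samples
print(raw_bytes_per_second(64, 100, 8))  # 6400 bytes per second
```

Each second is also 6,400 separate values that a classifier could look at, which is where the filtering problem comes from.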

Obviously there are a lot of conventional tools to process this data (Principal Component Analysis, Independent Component Analysis – which wasn’t of much help since it turned out there is very little cross-talk between channels, Support Vector Machines, etc.) But who in college is content with what we have? I secretly hoped the data had non-linear components (which it didn’t – all non-linear methods gave about 10% more accuracy than linear ones, and it seemed a lot like the reason was memorization rather than a better fit.)

So I had this crazy idea for sensitivity analysis. I had a well-trained Neural Network, and while the old method of branch-and-bound on the input vector to check variance in output is well known, I somehow wanted to model it in an equation. The reason: through sampling, you can find out that the answer is very sensitive to dimension X1, but not so sensitive to dimension X2. However, I wanted to know which Xi’s were most important, which weren’t, and by how much. It’s the same concept you use in PCA – you want to pick ‘r’ channels out of ‘n’ channels so that you get a ’90%’ accuracy. PCA gives you the contribution index of each channel to the final output. The idea is a trivial extension of the sampling method used above, but simply substitutes polynomials in place of the logarithmic/exponential functions used.

Enter the brand new idea of sampling each neuron at ‘n’ intervals in its range, and using a curve-fitting method to generate an ‘n’th order polynomial for each neuron, and then injecting that into the next neuron it feeds. Here’s the idea:

1. Each input’s domain is determined. Whether it’s an Int32, an Int64, a double or a quad, you know the exact bounds of the set. Therefore, whatever function you use (in my case TanH), and since backpropagation requires the function to be continuous and 2nd-order differentiable, you know that its range is finite and well-known. This range feeds into the next neuron, so it applies to all neurons.

2. Since most used networks (most used by me) were 2-layer, it was a tractable problem.

3. What you do is, you sample each neuron in the first layer at ‘n’ evenly-spaced points in its domain, and you compute output-values. Then you fit a polynomial (Spline, Bezier, whatever) to mimic that exact shape (you don’t do this earlier, because this is a fixed graph that can’t be “trained” anymore using backpropagation).

4. You feed in the polynomial as a symbolic variable into the next neuron’s polynomial too, and then simplify the result.

5. What you would get is an n^2 (2 being the number of layers) order polynomial for each output in “m” unknowns (‘m’ being the number of input variables).

Based on the constants and exponents for each variable, you can judge what it contributes to the output. It felt like a good way to quantify in relative terms and symbolically, the importance of each variable, and to find which channels/dimensions could be dropped entirely without losing significant accuracy.
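The steps above can be sketched for a tiny network with two inputs, two hidden neurons and one output. This is a sketch under loudly stated assumptions, not my original code: the weights are made up, inputs are assumed to lie in [-1, 1], tanh is sampled at 6 evenly spaced points on [-1.5, 1.5], and the "importance" score (sum of absolute coefficients of every term that involves a variable) is just one crude way to read the result. Note that 6 samples pin down a degree-5 polynomial, so two layers give degree 25.

```python
import math

def fit_poly(xs, ys):
    """Coefficients, lowest power first, of the polynomial through the points (Lagrange form)."""
    coeffs = [0.0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis, denom = [1.0], 1.0
        for j, xj in enumerate(xs):
            if j == i:
                continue
            basis = [0.0] + basis                 # multiply the basis by x ...
            for k in range(len(basis) - 1):
                basis[k] -= xj * basis[k + 1]     # ... then subtract xj times the old basis
            denom *= xi - xj
        for k, c in enumerate(basis):
            coeffs[k] += yi * c / denom
    return coeffs

# A multivariate polynomial is a dict mapping an exponent tuple to a coefficient.
def p_add(a, b):
    out = dict(a)
    for e, c in b.items():
        out[e] = out.get(e, 0.0) + c
    return out

def p_mul(a, b):
    out = {}
    for ea, ca in a.items():
        for eb, cb in b.items():
            e = tuple(x + y for x, y in zip(ea, eb))
            out[e] = out.get(e, 0.0) + ca * cb
    return out

def affine(weights, bias):
    """The polynomial bias + sum(weights[i] * x_i)."""
    m = len(weights)
    poly = {(0,) * m: bias}
    for i, w in enumerate(weights):
        poly[tuple(int(j == i) for j in range(m))] = w
    return poly

def compose(coeffs, inner, m):
    """Substitute the polynomial `inner` into a univariate polynomial (Horner's rule)."""
    result = {}
    for c in reversed(coeffs):
        result = p_add(p_mul(result, inner), {(0,) * m: c})
    return result

def evaluate(poly, x):
    return sum(c * math.prod(xi ** e for xi, e in zip(x, exps))
               for exps, c in poly.items())

N_SAMPLES, HALF_WIDTH = 6, 1.5                        # hypothetical sampling choices
xs = [-HALF_WIDTH + 2 * HALF_WIDTH * i / (N_SAMPLES - 1) for i in range(N_SAMPLES)]
tanh_poly = fit_poly(xs, [math.tanh(x) for x in xs])

W1, B1 = [[0.8, -0.1], [0.05, 0.6]], [0.1, -0.2]      # hypothetical hidden layer
W2, B2 = [0.9, 0.1], 0.0                              # hypothetical output neuron

hidden = [compose(tanh_poly, affine(w, b), 2) for w, b in zip(W1, B1)]
pre_out = {(0, 0): B2}
for v, h in zip(W2, hidden):
    pre_out = p_add(pre_out, {e: v * c for e, c in h.items()})
output_poly = compose(tanh_poly, pre_out, 2)

def network(x):
    """The real tanh network, for comparison with its polynomial stand-in."""
    h = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in zip(W1, B1)]
    return math.tanh(sum(v * hi for v, hi in zip(W2, h)) + B2)

importance = [sum(abs(c) for e, c in output_poly.items() if e[i] > 0) for i in range(2)]
print(importance)
print(evaluate(output_poly, (0.5, -0.3)), network((0.5, -0.3)))
```

With these made-up weights the first input drives the output almost entirely, and the score for the first variable comes out larger than the second. The polynomial only tracks the network while every pre-activation stays inside the sampled interval, so the input bounds from step 1 matter.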

Frankly, if I worked with ANNs ever again, I’d still go this route, especially for problems where the nonlinearity is unknown and you find an ANN that gives you a good fit. Of course, if I worked with ANNs again, I’d catch up on where the world is today, but that’s another matter. :-)
