Still in draft-purge mode, so lots of older shit from five years ago is coming out now. This post in particular makes no sense anymore, partly because the neural network scene is much more mature today and AI isn't the sexy new thing grads look forward to. But since I had this written, I wanted to go ahead and throw it out there. I must say I'm still proud of the attempt, even though it was an embarrassment in the end.
Back in college I was big-time into signal processing and brain-computer interfaces. The biggest challenge in such a system is data filtering. You end up with a ton of data per second, and your system has to find the right data to process at the right time. To give you an idea (and this may be outdated stuff, since today people use ECoGs and fMRIs), an EEG has about 64 channels, each sampling at about 100 Hz (assume 8-bit accuracy for each sample – in whatever units, usually microvolts) – so on the order of 6,400 samples, about 6.4 KB of raw signal, every second.
Obviously there are a lot of conventional tools to process this data (Principal Component Analysis; Independent Component Analysis – which wasn't much help, since it turned out there is very little cross-talk between channels; Support Vector Machines; etc.). But who in college is content with what already exists? I secretly hoped the data had non-linear components (it didn't – the non-linear methods gave only about 10% more accuracy than the linear ones, and it looked a lot like the reason was memorization rather than a better fit).
So I had this crazy idea for sensitivity analysis. I had a well-trained neural network, and while the old method of branch-and-bound sampling over the input vector to check variance in the output is well known, I wanted to model the sensitivity as an equation instead. The reason: through sampling you can find out that the answer is very sensitive to dimension X1 but not so sensitive to dimension X2, but I wanted to know which Xi's were most important, which weren't, and by how much. It's the same concept you use in PCA – you want to pick 'r' channels out of 'n' channels so that you retain, say, 90% of the accuracy, and PCA gives you the contribution index of each channel to the final output. My idea is a fairly trivial extension of that sampling method, except it substitutes polynomials in place of the exponential/logistic activation functions.
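For concreteness, here's a minimal sketch of the sampling-style check I mean. It jitters one dimension at a time rather than doing real branch-and-bound, `model` stands in for any trained predictor, and the perturbation scale and baseline points are arbitrary illustrative choices:

```python
import numpy as np

def sensitivity_by_sampling(model, baselines, n_samples=100, scale=0.1, seed=0):
    """Jitter one input dimension at a time and measure how much the output moves."""
    rng = np.random.default_rng(seed)
    baselines = np.asarray(baselines, dtype=float)   # shape (k, m): k reference points
    m = baselines.shape[1]
    scores = np.zeros(m)
    for i in range(m):
        variances = []
        for x0 in baselines:
            xs = np.repeat(x0[None, :], n_samples, axis=0)
            xs[:, i] += rng.normal(scale=scale, size=n_samples)  # perturb only dimension i
            variances.append(np.var([model(x) for x in xs]))
        scores[i] = np.mean(variances)   # high variance => output is sensitive to X_i
    return scores
```

This tells you per-dimension sensitivity, but only at the points you happened to sample; the polynomial trick below tries to get the same information symbolically, over the whole domain at once.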
Enter the brand new idea: sample each neuron at 'n' points across its domain, use a curve-fitting method to generate an 'n'-th order polynomial for each neuron, and then inject that polynomial into the next neuron it feeds. Here's the idea:
1. Each input's domain is determined: whether it's an Int32, Int64, double, or quad, you know the exact bounds of the set. Therefore, whatever activation function you use (in my case tanh – and backpropagation requires it to be continuous and differentiable anyway), you know its range is finite and well known. That range feeds into the next neuron, so the same argument applies to every neuron.
2. Since the networks I used most were 2-layer, the problem stays tractable.
3. You sample each neuron in the first layer at 'n' evenly-spaced points in its domain and compute the output values. Then you fit a polynomial (spline, Bezier, whatever) to mimic that exact shape. (You don't do this any earlier, because what you get is a fixed graph that can't be "trained" with backpropagation anymore.)
4. You feed the polynomial, as a symbolic expression, into the next neuron's polynomial, and then simplify the result (there's a sketch of this right after the list).
5. What you get is a polynomial of order n^2 (2 being the number of layers) for each output, in 'm' unknowns ('m' being the number of input variables).
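A rough sketch of steps 3–5, using NumPy and SymPy on a toy two-layer tanh network. The random weights stand in for a trained network, and details like `fit_tanh_poly`, the low polynomial order, and the crude pre-activation bounds are my illustrative choices rather than anything from the original code:

```python
import numpy as np
import sympy as sp

rng = np.random.default_rng(0)
m = 3                                                  # number of input variables X1..Xm
W1, b1 = rng.normal(size=(4, m)), rng.normal(size=4)   # hidden layer: 4 tanh neurons (stand-in weights)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)   # output layer: 1 tanh neuron

def fit_tanh_poly(lo, hi, order=3, n_samples=50):
    """Step 3: sample tanh at evenly spaced points and fit an order-n polynomial to that shape."""
    xs = np.linspace(lo, hi, n_samples)
    return np.poly1d(np.polyfit(xs, np.tanh(xs), order))

def poly_activation(expr, poly):
    """Step 4: substitute a symbolic expression into the fitted polynomial."""
    deg = poly.order
    return sum(float(c) * expr ** (deg - k) for k, c in enumerate(poly.coeffs))

x = sp.symbols(f"x0:{m}")                              # symbolic input variables

hidden = []
for j in range(W1.shape[0]):
    pre = sum(float(W1[j, i]) * x[i] for i in range(m)) + float(b1[j])
    bound = float(np.abs(W1[j]).sum() + abs(b1[j]))    # crude domain bound, assuming inputs in [-1, 1]
    hidden.append(poly_activation(pre, fit_tanh_poly(-bound, bound)))

out_pre = sum(float(W2[0, j]) * h for j, h in enumerate(hidden)) + float(b2[0])
out_bound = float(np.abs(W2[0]).sum() + abs(b2[0]))
output_poly = sp.expand(poly_activation(out_pre, fit_tanh_poly(-out_bound, out_bound)))

# Step 5: one polynomial per output, degree roughly order**num_layers, in m unknowns.
print(sp.Poly(output_poly, *x).total_degree())
```

I've kept the polynomial order low here so the expansion stays small; in practice you'd trade that order against how well the fit tracks tanh near saturation.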
Based on the coefficients and exponents attached to each variable, you can judge what it contributes to the output. It felt like a good way to quantify, in relative terms and symbolically, the importance of each variable, and to find which channels/dimensions could be dropped entirely without losing significant accuracy.
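Continuing the sketch above, one crude way to turn the expanded polynomial into per-variable scores – the scoring rule (summing the absolute coefficients of every monomial a variable appears in) is just an illustration, and it's sensitive to how the inputs are scaled:

```python
poly = sp.Poly(output_poly, *x)
importance = {}
for i, xi in enumerate(x):
    # sum |coefficient| over monomials where xi appears with a nonzero exponent
    importance[xi] = float(sum(abs(c) for exps, c in poly.terms() if exps[i] >= 1))

# variables with the smallest scores are the candidates to drop
for xi, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(xi, score)
```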
Frankly, if I ever worked with ANNs again, I'd still go this route, especially for problems where the nonlinearity is unknown and an ANN happens to give you a good fit. Of course, if I did work with ANNs again, I'd first catch up on where the world is today, but that's another matter.