Thesis

A Neuromorphic Machine Learning Framework based on the Growth Transform Dynamical System

In neuromorphic machine learning, neural networks are designed in a bottom-up fashion - starting with a model of a single neuron and connecting many such neurons to form a network. Although these spiking neural networks use biologically relevant neural dynamics and learning rules, they are not optimized with respect to a network objective.

In contrast, in traditional machine learning, neural networks are designed in a top-down manner: starting from a network objective (loss function) and minimizing it to adjust the parameters of a non-spiking neuron model with static nonlinearities. Although these networks are highly optimized with respect to a network objective (usually a training error), they do not produce biologically relevant neural dynamics - potentially losing out on the performance and energy benefits of biological systems.

We developed a spiking neuron and population model that reconciles the top-down and bottom-up approaches to achieve the best of both worlds. The dynamical and spiking responses of the neurons are derived by directly minimizing a network objective function under realistic physical constraints using Growth Transforms. The model exhibits several single-neuron response characteristics and population dynamics seen in biological neurons. We are currently designing new scalable learning algorithms for the spiking neuron model for cognitive applications.
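
The computational primitive underlying this work, in case a sketch helps, is the multiplicative growth-transform update, which minimizes a cost over the probability simplex while enforcing the constraints by construction. The sketch below is illustrative only - the toy quadratic cost, the shift constant `lam`, and all sizes are my own choices, not the model from the thesis:

```python
import numpy as np

def growth_transform_step(p, grad_f, lam):
    """One multiplicative growth-transform update: decreases f while
    keeping p on the probability simplex (p >= 0, sum(p) = 1).
    lam must exceed max(grad_f) so every factor stays positive."""
    w = p * (lam - grad_f)
    return w / w.sum()

# Toy objective: f(p) = 0.5 p^T Q p, a quadratic cost over the simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4.0 * np.eye(4)              # positive definite
f = lambda p: 0.5 * p @ Q @ p

p = np.full(4, 0.25)                       # start at the simplex centre
for _ in range(200):
    p = growth_transform_step(p, Q @ p, lam=np.abs(Q).sum())

print(p.sum())                             # stays 1 up to rounding
print(f(p) <= f(np.full(4, 0.25)))         # True: objective never increases
```

The appeal of the update is that normalization and non-negativity are maintained by the multiplicative form itself, with no projection step - which is what makes it attractive as a neuron model with built-in physical constraints.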

Spiking support vector machines

We developed a theoretical framework, through the choice of different primal-dual mappings, to design binary SVMs that exhibit noise-shaping properties like sigma-delta modulators, and SVMs that encode information about the classification hyperplane using spikes and bursts. The network was able to allocate its switching energy so that only the support vector neurons (i.e., the neurons most important for classification) spike, while the remaining neurons stay silent. The network also exhibits firing-rate and time-to-first-spike encoding of the input stimuli, as in biological neurons.

Generalized support vector machines

We extended the domain of Growth Transforms to bounded real variables and used it to design novel variants of SVMs with different convex and quasi-convex loss functions through the choice of different primal-dual mappings. We proposed an efficient training algorithm based on polynomial growth transforms, and compared and contrasted the properties of the different SVMs on several synthetic and benchmark datasets. Simulation results showed better scalability and convergence properties than standard quadratic and nonlinear programming solvers.

Continuous-time analog optimization

We developed a novel continuous-time analog optimization circuit using a growth transform-based fixed-point algorithm. The sub-threshold current-mode growth transform circuit inherently enforces optimization constraints and naturally converges to a local minimum of an objective function. Circuit simulations for several quadratic and linear cost functions with normalization constraints show an excellent match with floating-point software simulation results. The circuit is generic enough to solve a multitude of objective functions simply by changing the external circuitry.

Personal Projects

The 3rd YouTube-8M video understanding challenge

We participated in this Kaggle competition hosted by Google Research in summer 2019, where the task was to build a segment-level classifier for temporal localization of topics within a video, using the YouTube-8M dataset of 8 million videos and 1000 classes. The challenge lay in the fact that the training set carried only noisy video-level labels, while accurate segment-level labels were available only for a much smaller validation set.

We formulated the problem as multiple-instance, multi-label learning and developed an attention-based mechanism that selectively emphasizes the important frames in a video via trainable attention weights. Model performance was further improved by constructing multiple sets of such attention networks to detect multiple high-level topics in a single video. We then fine-tuned the models on the segment-level dataset. Our final model was an ensemble of attention/multi-attention networks, attention-based deep bag-of-frames models, recurrent neural networks and convolutional neural networks. We ranked 13th out of 283 teams (top 5%) on the public leaderboard, and published our work at the 3rd Workshop on YouTube-8M Large Scale Video Understanding at ICCV 2019 in Seoul, Korea.
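
A minimal sketch of the attention-pooling idea (the function name, matrix shapes, and random initialization are hypothetical stand-ins, not the competition code): each frame gets a trainable relevance score, the scores are softmax-normalized over the video, and the segment embedding is the attention-weighted sum of frame features.

```python
import numpy as np

def attention_pool(H, V, w):
    """Attention-based MIL pooling: H is a (T, D) matrix of frame
    features; V (D, K) and w (K,) would be trainable in a real model.
    Returns a (D,) pooled embedding and the (T,) attention weights."""
    scores = np.tanh(H @ V) @ w            # (T,) unnormalized relevance
    a = np.exp(scores - scores.max())      # stable softmax over frames
    a /= a.sum()
    return a @ H, a

rng = np.random.default_rng(1)
T, D, K = 300, 128, 64                     # hypothetical sizes
H = rng.standard_normal((T, D))
z, a = attention_pool(H, 0.1 * rng.standard_normal((D, K)),
                      rng.standard_normal(K))
print(z.shape, a.sum())                    # embedding shape; weights sum to 1
```

Stacking several such pooling heads with independent (V, w) pairs gives the multi-attention variant: each head can latch onto a different high-level topic within the same video.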

Relevant skills: Video understanding, attention networks, ensemble learning
Tools: Python, TensorFlow, Google Cloud Platform

Speaker change detection from EEG and eye-tracking data

We recorded and analyzed EEG and pupillometry data from volunteers to find signatures of speaker change detection in multi-talker speech. Using machine learning models such as logistic regression and SVMs, we predicted speaker changes with 78% accuracy from eye-tracking data pooled across subjects.

In an extension of this work, we analyzed behavioral responses from a listening test in which volunteers heard two multi-talker speech stimulus sets, one in a language familiar to the participants (English) and one unfamiliar (Mandarin). We found a statistically significantly lower miss rate, higher false alarm rate, and longer response times for the familiar language than for the unfamiliar one. We also designed a machine system for the same task, using a state-of-the-art diarization pipeline with x-vector embeddings, and found that it falls far short of human performance in both languages.

This project started at the Telluride Neuromorphic Cognition Engineering Workshop 2019 and was extended later. It was in collaboration with Dr. Sriram Ganapathy and Venkat Krishnamohan (IISc), Dr. Neeraj Sharma (CMU) and Dr. Lauren Fink (Max Planck Institute).

Relevant skills: Time-series analysis, statistical analysis, supervised learning
Tools: Python (Pandas, NumPy), MATLAB

Graduate Course Projects

Sparse Representation for Face Recognition (Computer Vision)

I implemented a sparse coding algorithm called Sparse Representation based Classification (SRC) for face recognition and tested it on a subset (35 images from each of 19 subjects) of the publicly available 'Labeled Faces in the Wild' dataset - a challenging dataset with large variations in pose, illumination and occlusion. Faces were detected using the Viola-Jones detector. I compared two sets of features - down-sampled images and eigenfaces. For the latter, eigenfaces were constructed from the pool of training images, and both training and test sets were projected onto different numbers of eigenfaces to obtain features of different sizes.
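
For illustration, a minimal NumPy sketch of the SRC decision rule (the plain ISTA l1 solver, the toy block-structured dictionary, and all parameter values are my own simplifications, not the course implementation): the test vector is sparsely coded over the whole training dictionary, and the predicted class is the one whose atoms yield the smallest reconstruction residual.

```python
import numpy as np

def src_classify(A, labels, y, lam=0.05, n_iter=500):
    """Sparse Representation-based Classification (Wright et al. style):
    l1-code y over the dictionary A (columns = unit-norm training images),
    then pick the class with the smallest class-wise residual."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of A^T A
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):                # ISTA: gradient step + shrinkage
        x = x - (A.T @ (A @ x - y)) / L
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)
    residuals = {c: np.linalg.norm(y - A[:, labels == c] @ x[labels == c])
                 for c in np.unique(labels)}
    return min(residuals, key=residuals.get)

# Toy data: two classes living in disjoint coordinate blocks.
rng = np.random.default_rng(2)
A = np.zeros((20, 10))
A[:10, :5] = rng.standard_normal((10, 5))  # class 0 atoms
A[10:, 5:] = rng.standard_normal((10, 5))  # class 1 atoms
A /= np.linalg.norm(A, axis=0)             # unit-norm columns
labels = np.array([0] * 5 + [1] * 5)
y = A[:, 7] + 0.01 * rng.standard_normal(20)   # noisy copy of a class-1 atom
print(src_classify(A, labels, y))              # expected: 1
```

On well-aligned data the sparse code concentrates on atoms of the correct class; the finding below is that on difficult data this only holds if the features themselves are discriminative.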

In all cases, eigenfaces gave a much more discriminative set of features than merely downsampling the images. Thus, contrary to the theory behind sparse representation, for difficult face databases with misalignments, occlusions, and variations in pose and illumination, the choice of features remains critical even when sparsity in the recognition problem is properly harnessed.

Relevant skills: Sparse coding, feature engineering
Tools: MATLAB

Odor classification with Tempotron (Biological Neural Computation)

We implemented the Tempotron, a supervised synaptic learning rule that can distinguish between spike patterns with different second- or third-order spike-time (temporal) statistics. We applied the Tempotron model to toy datasets as well as to neural responses (spike data) recorded from the antennal lobe of the locust olfactory system in response to two odors (hexanol and octanol). The neural data likely has even higher-order temporal statistics, and our goal was to see whether the model could sufficiently capture the temporal structure of the dataset (25 trials for each odor). We obtained an odor classification accuracy of 90% with leave-one-out cross-validation.
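
A compact sketch of the Tempotron on a toy problem (the time constants, learning rate, weight initialization and the two-afferent patterns are illustrative assumptions, not the course settings): the neuron sums PSP kernels triggered by afferent spikes, and on a classification error the weights are nudged by the kernel values evaluated at the time of peak potential.

```python
import numpy as np

TAU, TAU_S, THETA, LR = 15.0, 3.75, 1.0, 0.02   # illustrative constants (ms)
TGRID = np.arange(0.0, 100.0, 1.0)              # 100 ms window, 1 ms steps

def psp(t):
    """Tempotron post-synaptic kernel: difference of exponentials,
    normalized to peak at 1, and zero for t < 0."""
    t_pk = TAU * TAU_S / (TAU - TAU_S) * np.log(TAU / TAU_S)
    v0 = 1.0 / (np.exp(-t_pk / TAU) - np.exp(-t_pk / TAU_S))
    return v0 * np.where(t > 0, np.exp(-t / TAU) - np.exp(-t / TAU_S), 0.0)

def voltage(w, spikes, tgrid):
    """Membrane potential on tgrid; spikes[i] lists afferent i's spike times."""
    return sum(w[i] * psp(tgrid[:, None] - np.asarray(s)[None, :]).sum(axis=1)
               for i, s in enumerate(spikes))

def train(patterns, labels, n_aff, epochs=100):
    w = np.full(n_aff, 0.3)                     # small positive init
    for _ in range(epochs):
        for spikes, y in zip(patterns, labels):
            v = voltage(w, spikes, TGRID)
            fired = v.max() >= THETA
            if fired != y:                      # error: nudge V at its peak
                t_max = TGRID[np.argmax(v)]     # toward/away from threshold
                sign = 1.0 if y else -1.0
                for i, s in enumerate(spikes):
                    w[i] += sign * LR * psp(t_max - np.asarray(s)).sum()
    return w

# Two afferents; the target pattern has near-coincident spikes, the
# null pattern the same spikes spread far apart in time.
patterns = [[[20.0], [22.0]], [[20.0], [70.0]]]
labels = [True, False]
w = train(patterns, labels, n_aff=2)
```

After training, the neuron fires on the coincident pattern but not on the spread-out one - precisely the kind of temporal-structure sensitivity that a rate-based classifier would miss.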

We also analyzed whether the spatiotemporal structure of the spike trains remained similar across trials over the entire duration of the odor stimulus. We found that the spike statistics remained sufficiently distinct for each odor to produce classification accuracies greater than 90% over the entire duration, but accuracies were usually higher in the steady state of odor presentation, indicating that it takes a few hundred milliseconds for the neural responses to settle into stable, odor-specific patterns.

Relevant skills: Neural computation, supervised learning
Tools: MATLAB

Undergraduate Thesis

Hand-shape based biometric authentication system using Collaborative Representation based Classification

For our undergraduate final-year project, we developed a hand-shape based biometric authentication system and verified it on a dataset of 300 hand images collected in the lab. The system used morphological operations to extract the hand contour, followed by a Radon transform in an optimal direction to produce a unique one-dimensional feature vector. For classification, we used Collaborative Representation based Classification (CRC), where the feature vector of a test image is coded over a dictionary of similarly processed training images from all subjects (or classes) and assigned to the class that produces the least reconstruction residual under regularized least squares (RLS). CRC is useful when there are very few training samples per class, as was the case for our dataset. We obtained a classification accuracy of 96.7% on our dataset.
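
A minimal sketch of the CRC-RLS rule (the toy data and regularization value are my own assumptions, not our hand-image pipeline): unlike SRC's iterative l1 coding, the RLS code has a closed form, so classification costs only one linear solve plus per-class residuals.

```python
import numpy as np

def crc_rls(A, labels, y, lam=0.001):
    """Collaborative Representation Classification with regularized least
    squares: code y over ALL training features at once, then assign the
    class whose coefficients give the smallest normalized residual."""
    n = A.shape[1]
    x = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)  # RLS code
    best, best_r = None, np.inf
    for c in np.unique(labels):
        m = labels == c
        r = np.linalg.norm(y - A[:, m] @ x[m]) / (np.linalg.norm(x[m]) + 1e-12)
        if r < best_r:
            best, best_r = c, r
    return best

# Toy example: class features drawn around distinct random prototypes.
rng = np.random.default_rng(3)
protos = rng.standard_normal((3, 40))
A = np.hstack([(protos[c] + 0.1 * rng.standard_normal((5, 40))).T
               for c in range(3)])            # 40-dim, 5 samples per class
A /= np.linalg.norm(A, axis=0)                # unit-norm columns
labels = np.repeat([0, 1, 2], 5)
y = protos[1] + 0.1 * rng.standard_normal(40)
y /= np.linalg.norm(y)
print(crc_rls(A, labels, y))                  # expected: 1
```

Because the projection matrix (A^T A + lam*I)^{-1} A^T depends only on the training set, it can be precomputed once, making CRC well suited to small-sample settings like ours.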