Machine Learning for Prediction of Tearing Mode Instabilities in Magnetically Confined Plasmas
This project applies Machine Learning techniques to predict the occurrence of instabilities in the core of a nuclear fusion reactor, using data from the tokamak at San Diego. Using techniques such as orthogonal series estimators for distribution-to-real regression on the radial variables, we have been able to produce significantly better results than the baseline model. The progress report for this ongoing project can be found here.
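For readers unfamiliar with the idea, here is a minimal sketch of distribution-to-real regression with an orthogonal series estimator. It is an illustration in Python with NumPy, not the project's actual code: the cosine basis on [0, 1], the ridge regressor, and all function names are assumptions made for this example. Each input is a set of samples from a distribution; we summarise it by its estimated projection coefficients and regress the real-valued target on those.

```python
import numpy as np

def cosine_basis(x, num_basis):
    """Evaluate the first num_basis cosine basis functions on [0, 1] at 1-D points x."""
    x = np.asarray(x, dtype=float)
    j = np.arange(num_basis)
    phi = np.sqrt(2.0) * np.cos(np.pi * j[None, :] * x[:, None])
    phi[:, 0] = 1.0  # the j = 0 basis element is the constant function
    return phi

def projection_coefficients(sample, num_basis):
    """Orthogonal series estimate: coefficient j is the sample mean of phi_j(x)."""
    return cosine_basis(sample, num_basis).mean(axis=0)

def fit_ridge(features, targets, reg=1e-3):
    """Closed-form ridge regression on the coefficient vectors (an assumed regressor)."""
    d = features.shape[1]
    gram = features.T @ features + reg * np.eye(d)
    return np.linalg.solve(gram, features.T @ targets)

def predict(weights, features):
    """Predict real-valued targets from coefficient vectors."""
    return features @ weights

if __name__ == "__main__":
    # Toy data (hypothetical): each input is a sample from Beta(a, 2),
    # and the target is the mean of that distribution, a / (a + 2).
    rng = np.random.default_rng(0)
    shapes = rng.uniform(1.0, 5.0, size=100)
    feats = np.array([projection_coefficients(rng.beta(a, 2.0, size=500), 8)
                      for a in shapes])
    weights = fit_ridge(feats, shapes / (shapes + 2.0))
    print(predict(weights, feats[:3]))
```

The number of basis functions acts as a smoothing parameter: fewer terms give a smoother density estimate, more terms keep finer detail at the cost of variance.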
R&D Machine Learning Intern at Bloomberg, NYC
During the summer of 2015, I worked for 15 weeks at Bloomberg in the Big Apple. I had a wonderful time applying the knowledge I had gained through courses to real-world problems in the field of Machine Learning. Under the mentorship of Dr. Leonid Razoumov, I worked on applying ensemble methods built with strong constituents to big data problems, in order to reduce the training time. We experimented with a variety of aggregation techniques to find ones that cost as little performance as possible compared to a single classifier.
Software Engineer Intern at Facebook-MPK
As you probably know, an internship at a top-tier firm in Silicon Valley is always more than just work. I met so many awesome people (yes, I met Zuck too!! proof: pic), and more importantly, they were all so friendly and helpful that I never really felt like I was struggling, something that I had feared before joining, this being my first ever project outside the academic world. The 'infinite' free food isn't a myth - I had to bike a lot in order to compensate for the extra calories I was taking in, and guess what - they even gave me a Facebook helmet for doing so. Trust me, internships don't get better than this. Read about it here. Other than that, I toured Los Angeles, Yosemite National Park, Great America amusement park, San Francisco, Alcatraz, etc. I did miss out on Vegas - so, that is on my bucket list now.
Visiting Scientist at Institute of Science and Technology, Austria
During the summer of 2012, I worked as a research intern with Group Henzinger at IST Austria, a small but serene institute with top-class researchers. There, I studied the effect of stochastic delay on biological processes such as protein production and transcriptional signaling. My work was to formally verify the correctness of the delayed Continuous Time Markov Chain (CTMC) model for gene regulatory circuits for a 1-particle system. After my return to IIT Bombay, I summarized all my work in a poster as part of a research symposium, and the interested onlooker can find it here.
An internship in Europe is incomplete if you do not visit the major Schengen states. And so we did, spending weekends in Switzerland, Italy, Paris, south Germany, Benelux, Bratislava, Vienna and western Austria. One of my favourite experiences was the Netherlands vs Bulgaria soccer friendly that we watched at the Amsterdam Arena, where I also happened to capture this pic from the top row at the other end of the stadium.
Quantum Computing and Natural Language Processing
This is the project that I took up as my undergraduate thesis. Along with my batchmate Nishanth and under the able guidance of Prof. Pushpak Bhattacharyya, I explored the applications of quantum computing principles to provide insights into better and more efficient algorithms for common natural language processing tasks, our motivation being as follows:
With the development of quantum mechanics, new paradigms have opened up in many sciences. One such paradigm is a novel way of performing computation directly using quantum mechanical principles. Quantum computing looks at the act of computing from a radically different viewpoint from that of classical theories of computation, the most popular among the latter being the Turing machine model. Once we have a quantum theory of computing, the next task is to develop algorithms for various problems using quantum computing, which brings us to the focus of the thesis. A close relation between quantum mechanics, natural language processing and the functioning of the mind has been proposed by many previous works such as A teleportation-based algorithm and this dissertation. In many tasks in Natural Language Processing, we have run into efficiency roadblocks these days. For example, the Word Sense Disambiguation (WSD) problem was conceived long ago, but the best known algorithms to date do not provide an accuracy better than 65-70%. Hence, in this thesis, we seek to study whether quantum computing can be used to give efficient algorithms for such NLP problems. The full report can be found here. We also submitted a short paper to ACL 2014, which was unfortunately deemed too futuristic.
Quantum Computer Simulator
This is a debugger-cum-simulator that I developed in a team of 3 in DrScheme for building quantum networks using binary trees and higher order functions. It was a course project for CS 154 - Abstractions and Paradigms in Programming in spring 2011. We also implemented algorithms such as Grover's, Deutsch-Jozsa and Fast Fourier Transform. This project has been documented at QuiCkS and stands as my contribution to the Open Source Community.
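To give a flavour of what such a simulator computes, here is a minimal state-vector sketch of the Deutsch-Jozsa algorithm. It is an illustration in Python with NumPy, not the original DrScheme code: it uses dense arrays rather than binary trees, models the oracle as a phase flip, and the function names are hypothetical.

```python
import numpy as np

def hadamard_all(n):
    """Return the 2^n x 2^n matrix for a Hadamard gate on each of n qubits."""
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    op = np.array([[1.0]])
    for _ in range(n):
        op = np.kron(op, h)
    return op

def deutsch_jozsa(f, n):
    """Classify f: {0, ..., 2^n - 1} -> {0, 1} as 'constant' or 'balanced'.

    f is promised to be one of the two; the oracle is queried once per
    basis state here only because we simulate it classically.
    """
    dim = 2 ** n
    state = np.zeros(dim)
    state[0] = 1.0                       # start in |0...0>
    h_n = hadamard_all(n)
    state = h_n @ state                  # uniform superposition
    phases = np.array([(-1.0) ** f(x) for x in range(dim)])
    state = phases * state               # phase oracle: |x> -> (-1)^f(x) |x>
    state = h_n @ state                  # interfere
    prob_zero = state[0] ** 2            # probability of measuring |0...0>
    return "constant" if prob_zero > 0.5 else "balanced"

if __name__ == "__main__":
    print(deutsch_jozsa(lambda x: 0, 3))
    print(deutsch_jozsa(lambda x: x & 1, 3))
```

The probability of measuring all zeros is 1 for a constant function and 0 for a balanced one, so a single oracle call suffices on real quantum hardware.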
Quantification of Entanglement
This project was a consequence of the NIUS (National Initiative on Undergraduate Sciences) Camp that I attended at the Homi Bhabha Centre for Science Education, Mumbai in June 2011. Mentored by Prof. Prasanta Panigrahi at the Indian Institute of Science Education and Research (IISER), Kolkata, I studied quantum entanglement and its applications to teleportation. I came across various new aspects of quantum computing such as the No-cloning and No-deletion theorems. My task there was to work on quantifying entanglement, and we came up with ideas like (i) representing the state as a point in an n-dimensional cube and looking at its distance from the centre, and (ii) using K-map reduction. Further, we also tried to explain the failure of the W-state in teleportation using this quantification.
Clustering in Time-Series Forecasting
I worked with Prof. Bernard Menezes at IIT Bombay on an R&D project in forecasting. In the past, he and his students identified clusters in time series pertaining to retail sales of various items - furniture, apparel, housing, beverages, etc. I tried to understand, at a very fundamental level, the basis of this clustering. This proved to be a challenging task, requiring a detailed study of the mathematical characteristics of various cluster-specific forecasting models.
During my time at IITB, I delivered seminars on the following topics:
- Efficacy of the Accelerated Proximal Gradient Method for large-scale convex optimization - Implemented various methods mentioned in this lecture, ran experiments in MATLAB and performed a convergence analysis.
- Maximum Entropy Markov Model and its application to Part-of-Speech tagging, for the introductory course on Artificial Intelligence. Click here for slides.
- Oracle Turing Machines and the Baker-Gill-Solovay Theorem, done as part of the course on Complexity Theory in spring 2013. Click here for slides.
- Computational Humour - Recognition and Generation; and a model for the Sense of Humour. This is a seminar we presented in November 2013 under the course on Natural Language Processing, Speech and the Web. Click here for slides.
Other Academic Projects
- Supervised Mind Reading: Uncovering Text from Neural Data (Spring 2015) The aim of this work is to predict meaningful natural language sentences given the magnetoencephalography (MEG) recordings corresponding to the brain activity when a person reads the same sentences. As an essential part of language processing in the brain, word integration should be included in any model that aims to capture the neural representation of sentence processing. We came up with two approaches to account for this. The project report can be found here.
- Regression on Distributions (Spring 2015) We studied and presented regression from distributions to reals/distributions, and the FuSSO (Functional Shrinkage and Selection Operator). We also covered non-parametric divergence estimation and its application to image classification. The project report can be found here.
- SAT and Genetic Algorithms for Sudokus (Spring 2015) This project addresses the problem of encoding Sudoku puzzles into conjunctive normal form (CNF), and subsequently solving them using polynomial-time propositional satisfiability (SAT) inference techniques. We discuss solving and generating Sudoku puzzles with evolutionary algorithms. Another goal is to test if we can use genetic algorithm solvers as rating machines to test the difficulty levels of new puzzles. The project report can be found here.
- Graph Mining (Fall 2014) Implemented various graph algorithms in SQL for computing degree distributions, PageRank, connected components, radii of the vertices, the eigen-decomposition of the adjacency matrix, and triangle counts. We also performed anomaly detection by extracting features from the egonets. The project report can be found here.
- Epilepsy Seizure Prediction Challenge (Fall 2014) This project is based on a Kaggle competition wherein the task is to distinguish between ten-minute-long data clips covering an hour prior to a seizure, and ten-minute iEEG clips of interictal activity. The goal of the competition is to demonstrate the existence and accurate classification of the preictal brain state in dogs and humans with naturally occurring epilepsy. The project report can be found here.
- CFGLP Compiler (Spring 2013) Enhanced the Control Flow Graph Language Processor to support control flow statements and procedures including recursive ones. This was done using lex for lexical scanning and yacc for parsing, and implemented in C++.
- VM on OS/161 (Spring 2013) Built a virtual memory system with swap space and implemented various page replacement policies on OS/161.
- Backup performance analysis (Fall 2012) Simulated the performance of disk-to-disk backup for a toyDB containing N pages, where the disks are within the same controller having a cache of M pages. The aim was to do as much sequential reading/writing as possible, and to use as much parallelism in IO across the spindles as possible. Then, I studied performance by varying M. Since the backup may take a long time, I implemented checkpointing too.
- Moodle XData (Fall 2012) Developed a web interface to IITB's XData SQL grading system to allow query testing and provide the facility for online assignments for a database course.
- Wireless Multi-Point Relay Simulator (Fall 2011) Built a wireless network simulator in C++ as a project for the course on Data Structures and Algorithms, using innovative concepts to enable maximum data transfer and minimum congestion. We implemented RTS/CTS-like features without having taken a course on Networks by then. The project summary/report can be found here.
- Intel 8085 Simulator (Fall 2010) Implemented in C++ with a 3-level debugger as part of our first CS course, CS 101 at IIT Bombay. The graphical interface is a complete IDE, developed using Qt, and allows the user to compose programs, save them to disk and retrieve them.