Research
I have kept myself busy doing research in AI throughout my undergraduate years at Michigan. Below is a summary of the problems I have worked on, my contributions, and the end results.
Publications, presentations, and posters
- Akram Helou et al. CanSat: Building a mini-satellite. Michigan Research Community Symposium, University of Michigan, Ann Arbor, 2007.
- Akram Helou et al. Investigating the properties of reward. Summer Undergraduate Research in Engineering, University of Michigan, Ann Arbor, 2009.
Research Projects
I- Unsupervised Learning of Class Invariant Features (Work In Progress)
Advisors: Professor Andrew Ng and Adam Coates
Motivation and problem: This project attempts to address three limitations of deep learning algorithms. The first is scaling deep learning algorithms to handle large amounts of high dimensional data. The second is learning features that are invariant not only to phase, frequency, and orientation but also to object class. The third is learning features that decompose images of complex scenes (or other data modalities) into higher level concepts such as object classes.
Approach: The approach taken in this project builds on previous work by Adam Coates on scaling deep learning algorithms and learning the Receptive Fields (RFs) of a deep architecture. The architecture consists of alternating layers of simple and complex units. The simple units' outputs are computed by convolving filters formed from randomly extracted patches with the training data. The complex cells pool over learned RFs, which group square-correlated simple units. The choices of data preprocessing, clustering method for learning RFs, and pooling method are part of what this project explores to address the three aforementioned limitations.
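As a rough illustration of this architecture (a minimal sketch, not our experimental code), the snippet below builds one simple/complex layer pair in Python: filters are randomly extracted, normalized patches, simple-unit responses come from filtering the image, and complex units max-pool over RF groups. All function names here are illustrative.

```python
import numpy as np
from scipy.signal import correlate2d

def extract_random_patches(images, num_filters, patch_size, rng):
    # Use randomly extracted, normalized image patches as the simple-layer filters.
    filters = []
    for _ in range(num_filters):
        img = images[rng.integers(len(images))]
        r = rng.integers(img.shape[0] - patch_size + 1)
        c = rng.integers(img.shape[1] - patch_size + 1)
        patch = img[r:r + patch_size, c:c + patch_size]
        filters.append((patch - patch.mean()) / (patch.std() + 1e-8))
    return filters

def simple_layer(image, filters):
    # Simple-unit responses: one response map per filter.
    return np.stack([correlate2d(image, f, mode='valid') for f in filters])

def complex_layer(responses, rf_groups):
    # Complex units pool (here, max) over the simple units in each learned RF group.
    return np.stack([responses[list(g)].max(axis=0) for g in rf_groups])
```

Stacking simple_layer/complex_layer pairs yields the alternating architecture described above.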
Adam and I discussed the various architectural choices. I contributed ideas for improving the RF learning algorithm by improving the clustering method used in the NIPS 2011 paper. Previously, the algorithm would seed R_n RFs, each with a randomly chosen feature, and then insert each remaining feature into the top T most similar RFs, where similarity is measured by the square correlation between the feature and the feature that seeded an RF. This clustering is naive because the feature that originally seeded an RF may no longer describe it well once the RF has grown to include more features. Additionally, some features may be closely related to fewer than T RFs. I tried to address the first problem with three different clustering heuristics; measuring similarity as the minimum square correlation between the candidate feature and all the features already in an RF yielded the best results. For the second problem, I used a cutoff on the square correlation value. A sketch of this heuristic appears below.
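The sketch below captures the improved heuristic under my assumptions about the setup: square correlation is computed over a batch of feature responses, and a feature joins at most T RFs whose minimum member similarity clears the cutoff. Variable names are illustrative.

```python
import numpy as np

def square_correlation(zi, zj):
    # Correlation of the squared responses of two features across a batch of inputs.
    a, b = zi ** 2, zj ** 2
    a, b = a - a.mean(), b - b.mean()
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def cluster_rfs(Z, num_rfs, top_t, cutoff, rng):
    # Z: (num_features, num_examples) matrix of feature responses.
    seeds = [int(s) for s in rng.choice(len(Z), size=num_rfs, replace=False)]
    rfs = [[s] for s in seeds]                # seed each RF with a random feature
    for f in range(len(Z)):
        if f in seeds:
            continue
        # Similarity of feature f to an RF = minimum square correlation against
        # *all* current members (the heuristic that worked best), not just the seed.
        sims = [min(square_correlation(Z[f], Z[m]) for m in rf) for rf in rfs]
        # Insert into at most top_t RFs, skipping RFs below the cutoff similarity.
        for idx in np.argsort(sims)[::-1][:top_t]:
            if sims[idx] >= cutoff:
                rfs[idx].append(f)
    return rfs
```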
I have implemented most of the code for preprocessing the data, instantiating the architecture, and running the various experiments to explore the best combination of data preprocessing, clustering, and pooling methods.
Results: On the relatively simple Faces in the Wild (FIW) dataset, we obtained good results for our 1-layer architecture (one simple layer followed by complex cells). Namely, our architecture outperformed a similar architecture that doesn't use RF learning, as measured by AUROC on face classification. Visually inspecting the learned RFs, we could see that they encoded faces in various poses. However, these results did not hold for a deeper architecture with at least 2 alternating layers of simple and complex cells: AUROC values decreased as we went deeper. In deeper architectures, we necessarily start with low dimensional filters that encode low level features. Such low level features, like edges, are shared amongst virtually all objects, and the square correlation metric picks up on this. Consequently, higher level RFs were not able to construct objects such as faces. Another issue is that some of the pooling units did not behave like biologically plausible pooling units (selective for orientation and frequency but invariant to phase).
Currently, as a first attempt to address the problems noted above, we have moved to video data and changed the RF learning algorithm to cluster features that are square correlated over time. A short video sequence is likely to capture a single object undergoing small pose changes, which might make it easier for square correlation to group low level features into objects. Additionally, video data might allow us to capture slowly changing invariances that may yield more biologically plausible pooling units.
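Under the same assumptions as the clustering sketch above (and reusing its square_correlation helper), the temporal variant might look like this: similarity is averaged over short clips so that features whose squared responses co-vary during small pose changes end up in the same RF.

```python
import numpy as np

def temporal_square_correlation(zi_clips, zj_clips):
    # zi_clips[k], zj_clips[k]: responses of features i and j over the frames of
    # short clip k. Averaging the per-clip square correlation favors feature pairs
    # that co-activate while a single object undergoes small pose changes.
    return float(np.mean([square_correlation(zi, zj)
                          for zi, zj in zip(zi_clips, zj_clips)]))
```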
II- Deep Learning For Scene Recognition
On my own, and as a way to get familiar with deep learning algorithms, I implemented Convolutional Deep Belief Networks (CDBNs) and applied them to the problem of learning good scene representations, with the aim of doing scene classification. On the Torralba scene recognition dataset, consisting of 2600 images of 8 scene categories, the learned CDBN representation outperformed the Gist representation.
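For flavor, here is a minimal sketch of the hidden-unit inference step in a convolutional RBM, the building block of a CDBN. It assumes binary hidden units and leaves out probabilistic max-pooling and training; it is not the code used for the paper.

```python
import numpy as np
from scipy.signal import correlate2d

def crbm_hidden_probs(v, filters, hidden_biases):
    # v: a (preprocessed) image; filters: list of K small weight matrices W^k.
    # For a binary-hidden convolutional RBM, P(h^k_ij = 1 | v) is the sigmoid of
    # the filter response at (i, j) plus the per-filter hidden bias b_k.
    return np.stack([1.0 / (1.0 + np.exp(-(correlate2d(v, W, mode='valid') + b)))
                     for W, b in zip(filters, hidden_biases)])
```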
Results: The paper detailing our research can be found here.
III- Learning Kinematic Models
At Cornell, I worked on a couple of projects that are not as interdisciplinary as my other projects. With Professor Ashutosh Saxena, I formulated and implemented a Gaussian mixture model for inferring the 2D kinematic structure of simple 2D objects from a sequence of still images. The model worked well for objects with two links and one joint.
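As a loose illustration of the idea (not the paper's exact formulation), one can cluster tracked point trajectories into rigid links with a Gaussian mixture; points on the same link move coherently across frames, and the joint lies where the clusters meet. The tracking input here is assumed, not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cluster_links(tracks, num_links=2):
    # tracks: (num_points, num_frames, 2) array of tracked 2D point positions.
    # Flatten each point's trajectory into one feature vector; a GMM over these
    # vectors groups points by the rigid link they move with.
    X = tracks.reshape(len(tracks), -1)
    gmm = GaussianMixture(n_components=num_links, covariance_type='full').fit(X)
    return gmm.predict(X)  # link assignment for each tracked point
```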
Results: The paper detailing our research can be found here.
IV- Continuous EEG-based cursor control using Support Vector Machines and Dynamic Time Warping
Introduction: Brain-Computer Interfaces (BCIs) offer a possibility of communication to individuals with severe paralysis or advanced neurodegenerative diseases. One common form of non-invasive BCI is based on sensorimotor rhythms (SMR) known as the μ and β rhythms. These rhythms are "resting rhythms" observed over motor cortex and are interrupted by real or imagined movement. The μ rhythm's frequency content is mostly in the 8-12 Hz band, while the β rhythm is closer to 18-26 Hz [1]. With training, a human can increase his or her control of the μ rhythm, and it can be used to control cursor movement in several dimensions.
The μ and β rhythms are typically measured by surface electroencephalography (EEG). EEG is safe, inexpensive, and portable, but has a very poor signal-to-noise ratio relative to many other BCI technologies. Because user training to produce reliable μ rhythm signals is time-consuming and the outcome is uncertain [2], adaptive signal processing and machine learning can offer substantial benefits [3].
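A standard way to turn these rhythms into learner inputs (a sketch of common practice, not necessarily our exact pipeline) is to band-pass filter each channel into the μ and β bands and take log band power per window, which can then feed an SVM or a regression model.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def band_power(eeg, fs, band):
    # eeg: (channels, samples) window of EEG; fs: sampling rate; band: (low, high) Hz.
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype='band')
    filtered = filtfilt(b, a, eeg, axis=1)
    # Log variance of the band-passed signal is a standard SMR band-power feature.
    return np.log(filtered.var(axis=1) + 1e-12)

# e.g., per-channel mu (8-12 Hz) and beta (18-26 Hz) features for one window:
# features = np.concatenate([band_power(x, 256, (8, 12)), band_power(x, 256, (18, 26))])
```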
Problem: One of our initial observations is that most methods used in settings where data was collected from many subjects across multiple days were rather simple, such as least-squares regression [1] or simple weighting of the μ and β rhythms based on visual inspection of both signals [10]. At the same time, more sophisticated methods such as SVMs were employed in the BCI competitions and led to good results. However, the datasets in the BCI competitions are very limited. Therefore, our first aim was to investigate whether sophisticated learners can do as well as, if not better than, simpler methods on an independent, sizable dataset.
Another important observation concerning previous work is that training and testing mostly occurred on the same day (session) for a single subject ([1] is a notable exception, but uses a simple technique). Thus, the question of whether a model trained on one or a few temporally close sessions can generalize to future sessions has not been satisfactorily addressed.
Results: The paper detailing our research can be found here.
V- Investigating the Properties of Rewards in Reinforcement Learning
Motivation and problem: In order to allow an RL agent to achieve the user's intended behavior, one must define an appropriate primary reward function. RL researchers have long chosen to abstract away the problem of studying the source and properties of such rewards, assuming them given in order to make rapid strides on the pure learning aspect of an RL problem. Despite the central role of rewards in RL, they are not well understood. In this research, we seek to formulate a theory for the study of the properties of various forms of rewards [1]. Specifically, we want to investigate whether different rewards make learning more efficient and scalable to complex problems. There is already evidence that certain types of rewards can make learning efficient and scalable to large problems. For instance, it has been shown that rewards that incite an agent to engage in activities for their own sake, rather than to solve practical problems, increase the agent's competence through the acquisition of an array of skills that can be transferred to extrinsically rewarded tasks [2]. Additionally, adding a Bayesian Exploration Bonus (BEB) to the reward function was shown to approximate the optimal but intractable exploration-exploitation tradeoff [3]. However, in neither of these cases was the reward the primary object of study.
Solution: To study rewards, we developed a general computational framework that searches the space of reward functions for the best rewards given a global performance function and a distribution of environments. My contributions to this work began with experimentally validating the properties of this framework. Some of the demonstrated properties include sensitivity to regularities in a distribution of environments, ignoring reward features that are inconsequential to the dynamics of the environments, encouraging intrinsic behaviors, optimizing independently of agent architecture, optimizing over agent architecture, and optimizing over a finite horizon as opposed to focusing on asymptotic performance, among others. To study the properties of rewards and how they can be derived, I implemented and contributed to the design of various domains where a regular extrinsic reward (one based on the global performance function) does not lead to optimal learning efficiency over a finite horizon. In such experiments, we want to learn the properties of an immense space of rewards. Unfortunately, the curse of dimensionality quickly became a computational bottleneck. I alleviated this problem by developing and implementing an algorithm that approximates the result of the optimal rewards framework by adaptively sampling reward coefficients and then generalizing using regression.
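The following sketch conveys the shape of that approximation under my simplifying assumptions: evaluate_agent stands in for the expensive inner loop of training and scoring an agent under a candidate reward, and the regressor and sampling ranges are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def approximate_best_reward(evaluate_agent, dim, rounds=5, per_round=50, seed=0):
    # evaluate_agent(theta) -> mean global performance of an agent trained with
    # the reward whose coefficient vector is theta (the costly inner loop).
    rng = np.random.default_rng(seed)
    thetas = rng.uniform(-1, 1, size=(per_round, dim))
    scores = np.array([evaluate_agent(t) for t in thetas])
    for _ in range(rounds - 1):
        # Generalize the samples with regression, then adaptively sample new
        # coefficients where the model predicts high performance.
        model = RandomForestRegressor(n_estimators=100).fit(thetas, scores)
        candidates = rng.uniform(-1, 1, size=(2000, dim))
        best = candidates[np.argsort(model.predict(candidates))[-per_round:]]
        thetas = np.vstack([thetas, best])
        scores = np.concatenate([scores, [evaluate_agent(t) for t in best]])
    return thetas[np.argmax(scores)]
```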
Results: These experiments have been instrumental in demonstrating the impact of some classes of rewards on the performance of an agent over finite horizons. Rewards that best took advantage of the properties of environments while encouraging intrinsic behavior unrelated to an external goal achieved the best performance. In fact, these experiments formed the basis for two papers. I cannot describe the main contribution of this line of work because the papers have not been accepted for publication yet. However, it is safe to say that the role of rewards in improving learning efficiency deserves more thorough investigation.
[2] Singh, Barto, and Chentanez. Intrinsically motivated reinforcement learning. NIPS, 2005.
[3] Kolter and Ng. Near-Bayesian exploration in polynomial time. ICML, 2009.
VI- Extending the Soar Cognitive Architecture with an Episodic Memory (EM) to improve reasoning and learning
EM’s importance: In Soar, EM is designed to store snapshots of working memory at every time step. EM is a fixed architectural mechanism. It was added to Soar because it was hypothesized to assist with sensing, reasoning, and learning, thus providing Soar with additional cognitive abilities [1]. For instance, an agent can remember the outcome following the selection of some action in some state to predict the outcome of the same action taken in a similar state; EM therefore enables action modeling. As an example of how EM can assist with learning, the stream of episodes stored in EM can serve as an experience source for model-based RL.
Problem: The original implementation of Soar's EM stored all working memory elements at every time step in a master tree, which simplified the graph structure of working memory. While this abstraction allowed performance gains, it prevented sound search and reconstruction of episodes, which was detrimental to real world applications that relied on the full richness of a graphical representation of episodes.
Solution: One specific problem with the old representation of episodes was the impossibility of storing multiple identically described edges (Multi-Valued Attributes, or MVAs) emanating from a single node across a sequence of time steps, since all such edges are merged into one in the master tree. In such a situation, it becomes impossible to consistently retrieve the correct episode or to faithfully reconstruct it in working memory. My contribution was solving this problem by modifying the tree representation of episodes to accommodate saving and restoring episodes with multiple identically described edges. This was done by noting that an MVA structure does not require a significant amount of memory to save, assuming that working memory is not in a degenerate case. It can be shown that an MVA structure can be preserved by recursively storing every parent of identically described edges, the edges themselves, and their children in a compact table representation; these tables are pointed to from the master tree representing all episodes. Such a representation retained the good performance of the tree representation while allowing for a faithful representation of all episodes and their accurate reconstruction into working memory.
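A toy data-structure sketch of the idea, with hypothetical names and Soar's actual storage details abstracted away: the master tree keeps one node per (parent, attribute) path, while MVA instances live in compact side tables that episodes reference by row.

```python
from collections import defaultdict

class EpisodeStore:
    # Hypothetical sketch: identically described edges (MVAs) are kept in side
    # tables rather than merged into the master tree, so episodes stay faithful.
    def __init__(self):
        self.mva_tables = defaultdict(list)  # (parent_id, attr) -> [(value, children)]
        self.episodes = []                   # per time step: active rows per table

    def add_mva_edge(self, parent_id, attr, value, children=()):
        # Record one instance of an identically described edge; an episode stores
        # this row index instead of pointing at a merged tree node.
        table = self.mva_tables[(parent_id, attr)]
        table.append((value, tuple(children)))
        return len(table) - 1

    def snapshot(self, active_rows):
        # active_rows: {(parent_id, attr): [row indices present at this time step]}
        self.episodes.append({k: list(v) for k, v in active_rows.items()})

    def reconstruct(self, t):
        # Faithfully rebuild episode t's identically described edges from the tables.
        return {key: [self.mva_tables[key][i] for i in rows]
                for key, rows in self.episodes[t].items()}
```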
VII- Clustering Unlabeled Peptides for Expression-Level Comparison
This research was done at Pfizer Global Research and Development's proteomics laboratory under the guidance of Dr. Julia Bandow.
Problem: My main task revolved around designing machine learning algorithms and building software to enhance the usability of data generated by liquid chromatography and mass spectrometry instrumentation. In the context of a disease, proteins whose expression changes between a healthy and an infected subject may reveal themselves to be possible biomarkers or candidate drug targets. Nonetheless, this assumes that we are able to identify the amino-acid sequence of most peptides, which is often not realizable. The result is an inability to cluster identical peptides together within an experiment, which prohibits any kind of expression-level comparison for these peptides across experiments. Consequently, a tremendous amount of data goes unutilized, data which might contain valuable information leading to the discovery of a new biomarker or drug target, or to the understanding of a disease's pathway.
Solution: The solution was to design classification algorithms that utilize additional data to hint at peptides that might be identical enough to be clustered together, without relying on any amino-acid sequence information. This would at least allow proteomics scientists to compare expression signals across experiments. The first idea was to couple the mass of each peptide in every experiment with a liquid chromatography simulation characterizing peptides by specific chemical properties; this was necessary because the masses provided by a mass spectrometer are not discriminative enough. The thresholds and margins characterizing similarity between peptides within an experiment were not known a priori and therefore had to be learned from the peptides that did have sequence information and then generalized. The algorithm searches for the best parameters by minimizing an error measure; the chosen parameters are then applied within an experiment, and the results were validated using tagged sets of peptides. What remains is to compare peptide expression signals across experiments. Since there is still no unique identifying amino-acid sequence for each peptide, I learn what constitutes similar peptide clusters across experiments using k-means, where k is initialized to the minimum number of peptides across experiments. This method consistently achieved roughly 90% accuracy within experiments and is thus an appropriate approach when proper amino-acid identification is unavailable.
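A compressed sketch of the two stages under stated assumptions: peptides are reduced to (mass, predicted retention time) pairs, the tolerance parameters are fit on the sequenced subset by grid search, and cross-experiment clustering uses k-means with k set to the smallest per-experiment peptide count. All names and the grid-search form are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def peptides_match(p1, p2, mass_tol, rt_tol):
    # p = (mass, predicted retention time from the chromatography simulation).
    return abs(p1[0] - p2[0]) <= mass_tol and abs(p1[1] - p2[1]) <= rt_tol

def fit_tolerances(labeled_pairs, grid):
    # labeled_pairs: (p1, p2, same) triples built from peptides that do have
    # sequence labels; pick the tolerances that minimize pairwise mistakes.
    def error(tols):
        return sum(peptides_match(a, b, *tols) != same for a, b, same in labeled_pairs)
    return min(grid, key=error)

def cluster_across_experiments(experiments):
    # experiments: list of (n_i, 2) arrays of (mass, retention time) per experiment.
    k = min(len(e) for e in experiments)  # k = fewest peptides seen in any experiment
    return KMeans(n_clusters=k, n_init=10).fit_predict(np.vstack(experiments))
```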
This work was abruptly halted because Pfizer's Ann Arbor location closed in 2007.