Minimal Data Set Optimal Classification
James R. Psota, Malik Magdon-Ismail, Yaser Abu-Mostafa
Abstract. We are developing classification techniques to detect the nature of a pump malfunction given pump vibration sensor data. The data set is very small, creating the need for an extremely robust classifier that incorporates all available information. We investigated several generalized nearest neighbor and Bayesian classifiers. By incorporating hints, or information about the problem known independently of the data set, we show that performance can be significantly improved.
Motivation and Aims. Honeywell Corporation approached the Learning Systems Group with a request for a system that classifies vibration sensor data. They provided us with information about the pump's prior operation in the form of example frequency responses for numerous faults and fault severities that the pump exhibited. Ultimately, we aim to use our knowledge of the pump's prior operation to develop classification techniques that can predict the current mode of the pump in terms of fault and fault severity. Such a system could conceivably forecast severely faulty operation, allowing necessary repairs and preventing costly malfunctions.
In order to attain reliable data describing a pump enduring a fault, the pump must be forced to exhibit that fault. This turns out to be very expensive, so our data set is very small. Its small size means that it carries considerable uncertainty and is therefore somewhat unreliable. Thus, we must rely on all the information we have about the problem, including that which is known independently of the data, to make our classifier as robust as possible.
Research. We investigated numerous approaches in search of the optimal classifier for this specific problem. First we considered the Nearest Neighbor approach, which classifies based on minimal Euclidean distance. For visualization purposes, we plotted two of the available twenty-four frequencies and noted the ensuing decision regions (Figure 1). The blue points at the center of each decision region represent the original data points; the colored, symbolized regions around them represent the classifications that would result if new data arose with those features. (Of course, this technique is applicable to any number of dimensions; in our case we used twenty-four dimensions, as our input feature vector was composed of twenty-four components, one for each frequency.)
Figure 1. Nearest neighbor decision regions for two of the twenty-four frequencies.
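To make the baseline concrete, here is a minimal sketch of a nearest neighbor classifier of this kind in Python; the class centers, labels, and feature vectors are illustrative placeholders, not the actual Honeywell data.

```python
# Minimal sketch of the nearest neighbor classifier described above.
# Class centers, labels, and the 24-component frequency feature vectors
# are illustrative placeholders, not the actual pump data.
import numpy as np

def nearest_neighbor_classify(x, centers, labels):
    """Assign x the label of the training point at minimal Euclidean distance."""
    distances = np.linalg.norm(centers - x, axis=1)
    return labels[int(np.argmin(distances))]

# Example: three hypothetical classes in a 24-dimensional frequency space.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 24))          # one prototype per class
labels = ["normal", "(F,1)", "(F,2)"]       # fault/severity labels
x_new = centers[1] + 0.1 * rng.normal(size=24)
print(nearest_neighbor_classify(x_new, centers, labels))  # -> "(F,1)"
```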
We then incorporated the Honeywell-provided cost matrix into the nearest neighbor implementation. It is summarized in the following table, where 0 represents normal operation and the pair (i, k) represents fault i at severity level k. For example, (2,1) would represent a mildly severe fault 2. Each row represents the actual state, each column represents the classified state, and each entry gives the cost incurred when the actual state is classified as indicated by the column. This expert input allows us to nudge the existing decision boundaries so that, if there is some uncertainty in the measurement, the cost is minimized.
        |  0  | (i,1) | (i,2) | (i,3) | (j,1) | (j,2) | (j,3)
0       |  0  |   3   |   7   |   9   |   3   |   7   |   9
(i,1)   |  3  |   0   |   2   |   4   |   1   |   3   |   5
(i,2)   |  3  |   2   |   0   |   2   |   3   |   1   |   3
(i,3)   |  7  |   4   |   2   |   0   |   5   |   3   |   1
(j,1)   |  3  |   1   |   3   |   5   |   0   |   2   |   4
(j,2)   |  3  |   3   |   1   |   3   |   2   |   0   |   2
(j,3)   |  7  |   5   |   3   |   1   |   4   |   2   |   0
Figure
2 shows that as uncertainty increases, boundaries move by greater amounts
to favor a less consequential decision.
Figure 2. Incorporating the cost matrix into nearest neighbor shows that certain boundaries (normal-medium) vary as uncertainty increases, whereas others (normal-mild) do not. For the faults shown, we see that as uncertainty increases this classifier favors a medium classification over a normal classification.
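The text does not spell out exactly how the cost matrix enters the nearest neighbor rule, but one simple way to reproduce the behavior described above is to convert distances into soft class memberships under a measurement uncertainty sigma and then choose the state with minimal expected cost. The sketch below is an illustration under that assumption, not the project's actual implementation.

```python
# Hedged sketch: cost-sensitive nearest neighbor under measurement uncertainty.
import numpy as np

# Cost matrix from the table: rows = actual state, columns = classified state.
# State order: 0, (i,1), (i,2), (i,3), (j,1), (j,2), (j,3).
COST = np.array([
    [0, 3, 7, 9, 3, 7, 9],
    [3, 0, 2, 4, 1, 3, 5],
    [3, 2, 0, 2, 3, 1, 3],
    [7, 4, 2, 0, 5, 3, 1],
    [3, 1, 3, 5, 0, 2, 4],
    [3, 3, 1, 3, 2, 0, 2],
    [7, 5, 3, 1, 4, 2, 0],
])

def cost_sensitive_nn(x, centers, sigma):
    """Pick the state whose expected cost is minimal given soft memberships."""
    d2 = np.sum((centers - x) ** 2, axis=1)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))   # uncertainty-weighted proximity
    p = weights / weights.sum()                  # soft "probability" of each actual state
    expected_cost = p @ COST                     # expected cost of each possible decision
    return int(np.argmin(expected_cost))

# As sigma -> 0 this reduces to plain nearest neighbor; larger sigma nudges
# boundaries toward less costly decisions (e.g. medium over normal).
```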
Although intuitively reasonable, the nearest neighbor approach is not optimal from a probabilistic standpoint because it does not consider variances and prior probabilities. Thus, we implemented a Bayesian classifier based on Gaussian class-conditional densities. The data provided some estimates for the error bars; we used these in the classes where error bars were not directly available. To further incorporate all available information, we implemented a Bayesian minimal-risk classifier that makes use of the cost matrix.
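Below is a minimal sketch of such a Bayesian minimal-risk classifier. It assumes Gaussian class-conditional densities with diagonal covariances (a simplification) and a cost matrix like the one above; the means, variances (from the error bars), and priors are assumed inputs.

```python
# Sketch of a Bayesian minimal-risk classifier with Gaussian class-conditional
# densities. Diagonal covariances are an illustrative simplification.
import numpy as np

def bayes_min_risk(x, means, variances, priors, cost):
    """Choose the class whose expected cost is minimal under Gaussian densities."""
    # Log of N(x | mean_c, diag(var_c)) for each class c, up to a constant.
    log_lik = -0.5 * np.sum((x - means) ** 2 / variances + np.log(variances), axis=1)
    log_post = log_lik + np.log(priors)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                 # posterior P(class | x)
    risk = post @ cost                 # expected cost of each decision
    return int(np.argmin(risk))
```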
To determine
which components of the data were most relevant to classification, we
performed a principal component analysis. It showed that 90% of the
energy can be summarized in 10 orthogonal directions and 99% can be
summarized in 18 orthogonal directions. These observations will help
in interpreting results.
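As a sketch, such an energy analysis can be computed with a singular value decomposition; the data matrix X (one row per example, one column per frequency) is an assumed input.

```python
# Sketch of the principal component "energy" analysis: how many orthogonal
# directions are needed to capture a given fraction of the total energy.
import numpy as np

def energy_components(X, thresholds=(0.90, 0.99)):
    Xc = X - X.mean(axis=0)                          # center the data
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)      # cumulative explained energy
    return {t: int(np.searchsorted(energy, t) + 1) for t in thresholds}

# For the pump data this analysis gave 10 directions for 90% of the energy
# and 18 directions for 99%.
```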
Having argued that the data contains error, we set out to make it more reliable by perturbing each point in such a way that intuitive requirements about the system's behavior are satisfied (Figure 3). Specifically, these requirements constitute a monotonicity hint, which arises from the following understanding: if the feature vector for the center of a class moves in a certain direction as the pump progresses from, say, normal operation to fault F, mild operation, then if the vector continues to move in the same direction, the pump should eventually reach fault F, medium operation. Incorporating the hint also allows us to estimate values that are missing from the data set. For instance, if we do not have data for fault F, severe operation, but we do have data for fault F, mild and fault F, medium, we can extrapolate and estimate values for fault F, severe. We have implemented several working algorithms for incorporating the hint; each returns the new, monotonic class centers as well as the price paid for movement.
Figure 3. The "monotonicity algorithm" in action. Notice that as the points are perturbed, they become more monotonic. Finally, in iteration 8, the class centers within each fault are monotonic. The ensuing decision regions are also much more "progressive" in that a transition from normal to severe is impossible without first passing through mild and medium severity.
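The project's monotonicity algorithms are not reproduced here, but the following sketch shows one simple way to realize the hint for a single fault: project the severity class centers onto the direction leading away from the normal center, force the projections to be non-decreasing with severity, and report the total movement as the price paid. The function names and the specific projection scheme are illustrative assumptions.

```python
# Hedged sketch of a monotonicity correction for one fault's class centers.
import numpy as np

def isotonic_increasing(values):
    """Pool-adjacent-violators: project a sequence onto non-decreasing sequences."""
    blocks = []                                   # each block holds [mean, count]
    for v in map(float, values):
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, c2 = blocks.pop()
            v1, c1 = blocks[-1]
            blocks[-1] = [(v1 * c1 + v2 * c2) / (c1 + c2), c1 + c2]
    return np.array([v for v, c in blocks for _ in range(c)])

def enforce_monotonic(normal, severities):
    """Perturb severity centers (mild, medium, severe) to progress monotonically."""
    S = np.asarray(severities, dtype=float)
    d = S.mean(axis=0) - normal
    d /= np.linalg.norm(d)
    proj = (S - normal) @ d                  # signed progress along the fault direction
    proj_new = isotonic_increasing(proj)     # make progress non-decreasing with severity
    S_new = S + np.outer(proj_new - proj, d) # move centers only along that direction
    price = float(np.sum(np.linalg.norm(S_new - S, axis=1)))
    return S_new, price                      # new monotonic centers and the price paid
```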
Future Work. We are awaiting an additional, more complete data set from Honeywell; it will enable us to further improve our classifier and test its performance. Ultimately, we hope to build a very strong classifier by first extracting the relevant features using Principal Component Analysis, making the reduced data obey monotonicity, and then feeding this enhanced data to the Bayesian minimal-risk classifier. We expect this combination of methods to give more reliable results than any single one alone.
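As a rough sketch of how the planned pipeline could fit together, the snippet below projects the data onto its leading principal directions; the monotonicity correction and Bayesian minimal-risk steps sketched earlier would then be applied to the reduced data. Everything here is an assumption about a design that has not yet been built.

```python
# Rough sketch of the planned pipeline (assumed design, not yet built):
# PCA feature extraction -> monotonicity hint -> Bayesian minimal-risk decision.
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

# Intended flow (names refer to the earlier sketches):
# X_reduced = pca_project(X, k=10)                          # keep ~90% of the energy
# centers, price = enforce_monotonic(normal, severities)    # apply the monotonicity hint
# decision = bayes_min_risk(x_reduced, means, variances, priors, COST)
```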