WACV paper on multi* fusion submitted last week
Last week we submitted a paper to the 2009 Workshop on Applications of Computer Vision (WACV). The paper presents some recent results on fusing multiple views and multiple modalities together to improve classification performance. The application was zooplankton classification, which has been a focus of my research for the last four years.

In contrast to more typical fusion scenarios, where assumptions about the different modalities are made to yield better performance (e.g. reliability weighting), this method assumes nothing about the different sensors. In fact, it abstracts the sensors into single-view "agents" that communicate only by passing messages through a classic Bayesian network. However, the nodes in the network do not follow a typical message-passing scheme, but rather use confidence weighting to combine their inputs.

The confidence is estimated via the sidelobe ratio, (p1 - p2)/p1, where p1 is the largest probability and p2 is the second largest. In the classification problem, these are the posterior class probabilities, so the sidelobe ratio tells you how strongly one class is favored over the others. When there is a strong favorite, the ratio is close to one; when every class is equally likely, the ratio is zero. We call this a "confidence" following the notion that a classifier which puts roughly equal weight on every class has no confidence that the class with slightly higher probability is the correct one.

Computing the confidence is at the heart of the algorithm and has an important connection with the classifiers used to process each modality. We use one-vs-rest SVMs with a slight twist: rather than thresholding the binary output of each machine to yield a hard label, we take the raw outputs of the machines and convert them into probabilities using the softmax function.
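As a minimal sketch, the two quantities above look like this in NumPy (the function names are mine, not the paper's):

```python
import numpy as np

def softmax(scores):
    """Map raw one-vs-rest SVM decision values to class probabilities."""
    z = np.exp(scores - np.max(scores))  # shift by max for numerical stability
    return z / z.sum()

def sidelobe_confidence(probs):
    """Sidelobe ratio (p1 - p2)/p1: near 1 for a strong favorite,
    exactly 0 when all classes are equally likely."""
    p = np.sort(probs)[::-1]             # sort descending
    return (p[0] - p[1]) / p[0]
```

With equal SVM outputs, softmax yields a uniform distribution and the confidence collapses to zero, which is exactly the behavior the algorithm relies on.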
While it is well known that sparse learning machines can do a poor job of estimating probabilities, that does not seem to matter much in this scenario. The output of the SVM is the distance of the example to the separating hyperplane in feature space. Examples that are very close to the hyperplane cannot be easily classified, and in many cases the SVM is trained such that it makes no effort to correctly classify them. For these examples, the SVM output says nothing about the correct class. The softmax function then distributes probability roughly equally among the classes, and confidence weighting (appropriately) sets the influence of this prediction to zero.

In the monomodal, monoview world, this would have no benefit, because the confidence only says that the system is totally ignorant of the correct class label. In a multiview or multimodal setting, however, additional examples are available, and each one lies at a different distance from the hyperplanes. The confidence ratio can then be used to directly compare the predictions based on each example. If each prediction alone does okay, and each one offers unique information, then the fusion can dramatically boost performance. This is the whole idea. Note that it does not depend on any a priori information, nor does it care about the type of data, so long as a trained classifier exists for it.

That is the key benefit of the algorithm. While it may not perform as well as a fusion processor optimized for a specific operating environment, it can adapt to changes in the reliability of the sensors on the fly, and does not need to relearn a new weighting scheme. The weighting is based only on the relationship of the example to the margin of the classifier.

Based on the promising results of this paper, there are several important issues to address:
- Performance as a function of data variation from the training set.
- Performance using only a single classifier for each modality.
- Adding reliability weighting when appropriate.
- Impact of using different base classifiers (does it matter?).
- Search for any provable characteristics of the fusion performance (these probably don't exist).
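To make the fusion idea concrete: the paper routes messages through a Bayesian network, but the core intuition can be sketched as a confidence-weighted average of the per-view posteriors, where an ignorant view (uniform posterior) contributes nothing. This is my simplified reading, not the paper's exact combination rule:

```python
import numpy as np

def sidelobe(p):
    """Sidelobe-ratio confidence of a single posterior distribution."""
    s = np.sort(p)[::-1]
    return (s[0] - s[1]) / s[0]

def fuse(view_probs):
    """Confidence-weighted average of per-view class posteriors.
    view_probs: array-like of shape (n_views, n_classes)."""
    P = np.asarray(view_probs, dtype=float)
    w = np.array([sidelobe(p) for p in P])   # one confidence per view
    if w.sum() == 0.0:                       # every view totally ignorant:
        w = np.ones(len(P))                  # fall back to a plain average
    return (w[:, None] * P).sum(axis=0) / w.sum()
```

A view sitting on the SVM hyperplane produces a uniform posterior, gets zero weight, and the fused decision is driven entirely by the views that actually have something to say.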