This paper proposes the idea of co-training: using inexpensive unlabeled data to improve initially weak classifiers. The weak classifiers are first trained on a few labeled examples to achieve coarse discrimination. Each classifier then repeatedly selects some instances from the unlabeled data pool, labels them, and re-trains itself on these newly labeled data along with the examples it has labeled before. This approach can dramatically improve classifier performance because the classifiers are effectively learned on a much larger set of "training data".
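To make the loop concrete, here is a minimal sketch of such a co-training procedure, assuming two feature views X1 and X2, scikit-learn-style naive Bayes classifiers over count features, and illustrative parameter names; it is an illustration of the general idea, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1, X2, y, labeled_idx, n_rounds=20, per_round=2):
    """Sketch of co-training over two views X1 and X2.
    y is passed in full only so the unlabeled pool can be simulated;
    the loop never reads the labels of unlabeled points."""
    labeled = set(labeled_idx)
    unlabeled = set(range(len(y))) - labeled
    pseudo = {i: y[i] for i in labeled}          # index -> (pseudo-)label
    f1, f2 = MultinomialNB(), MultinomialNB()

    for _ in range(n_rounds):
        if not unlabeled:
            break
        idx = np.array(sorted(pseudo))
        lab = np.array([pseudo[i] for i in idx])
        f1.fit(X1[idx], lab)                     # retrain view-1 classifier
        f2.fit(X2[idx], lab)                     # retrain view-2 classifier
        # Each classifier labels the unlabeled points it is most confident about,
        # and those points are added to the shared labeled set.
        for f, X in ((f1, X1), (f2, X2)):
            cand = np.array(sorted(unlabeled))
            if len(cand) == 0:
                break
            conf = f.predict_proba(X[cand]).max(axis=1)
            picks = cand[np.argsort(conf)[-per_round:]]
            for i in picks:
                pseudo[i] = f.predict(X[i:i + 1])[0]
                unlabeled.discard(i)
    return f1, f2
```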
For this approach to work, each example must contain (at least) two kinds of independent cues for classification. Furthermore, the authors assume that the two cues are consistent in their decisions, i.e., taking f1 and f2 as the classifiers for the cues x1 and x2, f1(x1) should equal f2(x2) with probability close to 1. These two conditions allow cues of one kind to be connected (and thus strengthened) by "bridging" through the other kind. For example, suppose observation 1 contains features A and B, and observation 2 contains features C and D, in the first and second kind respectively. Given a new observation with features C and B, we may assign all three observations to the same class, since we believe both cues are in agreement. This forms the basis of co-training.
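A toy way to see this "bridging" structure is to treat each cue in either kind as a node and each observation as an edge linking its first-kind cue to its second-kind cue; observations in the same connected component would then share a label under the consistency assumption. The sketch below uses a small union-find for this, with the feature names A, B, C, D mirroring the example above (purely illustrative).

```python
# Observations as (view-1 cue, view-2 cue) pairs; connected components
# group observations that should share a label under the consistency assumption.
observations = [("A", "B"), ("C", "D"), ("C", "B")]

parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for v1, v2 in observations:
    union(("view1", v1), ("view2", v2))  # tag cues by view to keep them distinct

components = {}
for i, (v1, v2) in enumerate(observations):
    components.setdefault(find(("view1", v1)), []).append(i)

print(components)   # all three observations end up in one component
```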
The experiments on real data in the paper do support the idea of co-training. Nevertheless, one question I have is whether we should put all the work on the learning algorithm. If we already have two kinds of features for each example, why not first perform dimensionality reduction, such as LDA, so that examples of the same class cluster together? In this way, the connected-component sub-structure might be discovered before any learning algorithm is run. I would like to know if anyone has done research on this topic; a rough sketch of what I mean is given below.
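As a rough sketch of this suggestion (under my own assumptions, not anything from the paper), one could fit LDA on the concatenated views of the few labeled examples and project all the data into that space before, or instead of, the iterative labeling; whether the resulting clusters really expose the connected-component structure is exactly the open question. The names X1, X2, y_lab, lab_idx are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def lda_projection(X1, X2, y_lab, lab_idx):
    """Project both views into an LDA space fitted only on the labeled subset."""
    X = np.hstack([X1, X2])                  # concatenate the two views
    lda = LinearDiscriminantAnalysis()
    lda.fit(X[lab_idx], y_lab)               # fit on the few labeled examples
    return lda.transform(X)                  # project labeled + unlabeled data

# Clusters in the projected space could then be inspected for the
# connected-component structure before any co-training round is run.
```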