Affiliation of Author(s):College of Computer Science and Technology / College of Artificial Intelligence / College of Software
Journal:Proceedings of the IEEE International Conference on Big Data and Smart Computing (BigComp)
Abstract:Label noise is a common phenomenon when labeling a large-scale dataset for supervised learning. Outlier detection is a recently proposed method to handle this issue by treating the outliers of each class as data points with potential label noise and removing them before training. However, this approach can lead to a high false positive rate and hurt performance. In this paper, we propose a novel and effective method to deal with this issue by combining the strengths of outlier detection and reconstruction error minimization (REM). The main idea is to add a second verification step (i.e., REM) to the outputs of outlier detection, so as to reduce the risk of discarding points that do not fit the underlying data distribution well but are correctly labeled. In particular, we first find the outliers in each class with a robust deep autoencoder-based outlier detector, which yields not only candidate mislabeled data but also a group of well-trained deep autoencoders. A reconstruction error minimization based approach is then applied to these outliers to further filter and relabel the mislabeled data. Experimental results on the MNIST dataset show that the proposed method significantly reduces the false positive rate of outlier detection and improves the performance of both data cleaning and classification in the presence of label noise. © 2019 IEEE.
Translation or Not:no
Date of Publication:2019-04-01
Co-author:Zhang, Weining
Corresponding Author:Tan Xiaoyang
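
The following is a minimal, illustrative sketch of the two-step pipeline described in the abstract, not the authors' implementation. Assumptions not taken from the record: a plain fully connected autoencoder stands in for the paper's robust deep autoencoder, outliers are flagged by a simple per-class percentile threshold on reconstruction error, and all function names, network sizes, and hyper-parameters are arbitrary choices for illustration.

```python
# Hedged sketch: per-class autoencoder outlier detection followed by
# reconstruction-error-minimization (REM) relabeling of the candidates.
import numpy as np
import torch
import torch.nn as nn


class AutoEncoder(nn.Module):
    """Small fully connected autoencoder for flattened 28x28 images."""
    def __init__(self, in_dim=784, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, 256), nn.ReLU(),
                                     nn.Linear(256, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))


def train_autoencoder(x, epochs=20, lr=1e-3):
    """Fit one autoencoder on the samples currently assigned to a class."""
    model = AutoEncoder(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), x)
        loss.backward()
        opt.step()
    return model


def reconstruction_error(model, x):
    """Per-sample squared reconstruction error."""
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1)


def clean_labels(x, y, num_classes=10, outlier_quantile=0.9):
    """Step 1: per-class outlier detection; Step 2: REM-based relabeling."""
    x = torch.as_tensor(x, dtype=torch.float32)
    y = np.asarray(y).copy()

    # Step 1: train one autoencoder per class and flag high-error samples
    # as candidate mislabeled points.
    models, outlier_idx = {}, []
    for c in range(num_classes):
        idx = np.where(y == c)[0]
        models[c] = train_autoencoder(x[idx])
        err = reconstruction_error(models[c], x[idx]).numpy()
        thresh = np.quantile(err, outlier_quantile)
        outlier_idx.extend(idx[err > thresh].tolist())

    # Step 2 (REM verification): assign each candidate the class whose
    # autoencoder reconstructs it best; candidates whose best class equals
    # their current label keep it, reducing false positives.
    for i in outlier_idx:
        errs = [reconstruction_error(models[c], x[i:i + 1]).item()
                for c in range(num_classes)]
        y[i] = int(np.argmin(errs))
    return y
```

A typical use, under these assumptions, would be `y_clean = clean_labels(x_train.reshape(len(x_train), -1) / 255.0, y_noisy)` on flattened MNIST images, after which a classifier is trained on the cleaned labels.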