Volume 5, Issue 3
Using Self-Organizing Maps for Binary Classification with Highly Imbalanced Datasets

Zichen Deng

Int. J. Numer. Anal. Mod. B, 5 (2014), pp. 238-254

  • Abstract

Highly imbalanced datasets occur in domains like fraud detection, fraud prediction, and clinical diagnosis of rare diseases, among others. These datasets are characterized by a prevalent class (e.g., legitimate sellers) and a relatively rare one (e.g., fraudsters). Although small in proportion, the observations belonging to the minority class can be of crucial importance. In this work we extend an unsupervised learning technique, Self-Organizing Maps, to use labeled data for binary classification under a constraint on the proportion of false positives. The resulting technique was applied to two highly imbalanced real datasets, achieving good results while being easier to interpret.

  • History

Published online: 2014-05
