Volume 5, Issue 3
Using Self-Organizing Maps for Binary Classification with Highly Imbalanced Datasets

VINICIUS ALMENDRA AND DENIS ENĂCHESCU

Int. J. Numer. Anal. Mod. B, 5 (2014), pp. 238-254

Published online: 2014-05

  • Abstract
Highly imbalanced datasets occur in domains such as fraud detection, fraud prediction, and the clinical diagnosis of rare diseases. These datasets are characterized by a prevalent class (e.g. legitimate sellers) and a relatively rare one (e.g. fraudsters). Although small in proportion, the observations belonging to the minority class can be of crucial importance. In this work we extend an unsupervised learning technique, Self-Organizing Maps, to use labeled data for binary classification under a constraint on the proportion of false positives. The resulting technique was applied to two highly imbalanced real datasets, achieving good results while being easier to interpret.
  • AMS Subject Headings

62H30

  • Copyright

COPYRIGHT: © Global Science Press

@Article{IJNAMB-5-238,
  author = {VINICIUS ALMENDRA and DENIS ENĂCHESCU},
  title = {Using Self-Organizing Maps for Binary Classification with Highly Imbalanced Datasets},
  journal = {International Journal of Numerical Analysis Modeling Series B},
  year = {2014},
  volume = {5},
  number = {3},
  pages = {238--254},
  abstract = {Highly imbalanced datasets occur in domains such as fraud detection, fraud prediction, and the clinical diagnosis of rare diseases. These datasets are characterized by a prevalent class (e.g. legitimate sellers) and a relatively rare one (e.g. fraudsters). Although small in proportion, the observations belonging to the minority class can be of crucial importance. In this work we extend an unsupervised learning technique, Self-Organizing Maps, to use labeled data for binary classification under a constraint on the proportion of false positives. The resulting technique was applied to two highly imbalanced real datasets, achieving good results while being easier to interpret.},
  issn = {},
  doi = {https://doi.org/},
  url = {http://global-sci.org/intro/article_detail/ijnamb/232.html}
}
TY  - JOUR
T1  - Using Self-Organizing Maps for Binary Classification with Highly Imbalanced Datasets
AU  - Almendra, Vinicius
AU  - Enăchescu, Denis
JO  - International Journal of Numerical Analysis Modeling Series B
VL  - 5
IS  - 3
SP  - 238
EP  - 254
PY  - 2014
DA  - 2014/05
DO  - http://doi.org/
UR  - https://global-sci.org/intro/article_detail/ijnamb/232.html
KW  - unsupervised learning
KW  - self-organizing maps
KW  - imbalanced datasets
KW  - supervised learning
AB  - Highly imbalanced datasets occur in domains such as fraud detection, fraud prediction, and the clinical diagnosis of rare diseases. These datasets are characterized by a prevalent class (e.g. legitimate sellers) and a relatively rare one (e.g. fraudsters). Although small in proportion, the observations belonging to the minority class can be of crucial importance. In this work we extend an unsupervised learning technique, Self-Organizing Maps, to use labeled data for binary classification under a constraint on the proportion of false positives. The resulting technique was applied to two highly imbalanced real datasets, achieving good results while being easier to interpret.
ER  - 
VINICIUS ALMENDRA AND DENIS ENĂCHESCU. (2014). Using Self-Organizing Maps for Binary Classification with Highly Imbalanced Datasets. International Journal of Numerical Analysis Modeling Series B. 5 (3). 238-254. doi: