Using Self-Organizing Maps for Binary Classification with Highly Imbalanced Datasets

Volume 5, Issue 3

VINICIUS ALMENDRA AND DENIS ENĂCHESCU

Int. J. Numer. Anal. Mod. B, 5 (2014), pp. 238-254

Published online: 2014-05

Abstract

Highly imbalanced datasets occur in domains like fraud detection, fraud prediction, and clinical diagnosis of rare diseases, among others. These datasets are characterized by a prevalent class (e.g. legitimate sellers) and a relatively rare one (e.g. fraudsters). Although small in proportion, the observations belonging to the minority class can be of crucial importance. In this work we extend an unsupervised learning technique, Self-Organizing Maps, to use labeled data for binary classification under a constraint on the proportion of false positives. The resulting technique was applied to two highly imbalanced real datasets, achieving good results while being easier to interpret.
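
The abstract does not spell out how the labels and the false-positive constraint enter the map, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a map has already been trained (here the node weights are simply given), assigns each observation to its nearest node, and then greedily flags nodes as "positive" in order of their minority-class fraction, as long as the share of majority-class training observations falling in flagged nodes stays within a cap. The function names, the parameter `max_fpr`, and the toy data are all hypothetical.

```python
import numpy as np


def best_matching_units(X, weights):
    """Return, for each row of X, the index of the nearest node in `weights`."""
    # Squared Euclidean distance between every observation and every node.
    d = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def select_positive_nodes(bmu, y, n_nodes, max_fpr):
    """Greedily pick nodes to flag as positive under a false-positive cap.

    Assumption (not from the paper): nodes are ranked by the fraction of
    minority-class (y == 1) observations they contain, and a node is added
    only if the training false-positive rate stays <= max_fpr.
    """
    pos = np.bincount(bmu, weights=(y == 1).astype(float), minlength=n_nodes)
    neg = np.bincount(bmu, weights=(y == 0).astype(float), minlength=n_nodes)
    counts = pos + neg
    frac = np.divide(pos, counts, out=np.zeros(n_nodes), where=counts > 0)
    total_neg = neg.sum()

    flagged, false_pos = [], 0.0
    for node in np.argsort(-frac, kind="stable"):
        if pos[node] == 0:
            break  # remaining nodes contain no minority observations
        if total_neg > 0 and (false_pos + neg[node]) / total_neg > max_fpr:
            continue  # this node would exceed the false-positive budget
        flagged.append(int(node))
        false_pos += neg[node]
    return flagged


def predict(X, weights, flagged):
    """Classify as positive (1) every observation whose nearest node is flagged."""
    return np.isin(best_matching_units(X, weights), flagged).astype(int)


# Made-up toy data: three nodes, eight observations, label 1 is the rare class.
weights = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0],
              [1.0, 1.0], [1.1, 1.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 0, 0, 1, 0, 1, 1])

bmu = best_matching_units(X, weights)
flagged = select_positive_nodes(bmu, y, n_nodes=len(weights), max_fpr=0.1)
```

On this toy data, a cap of 0.1 flags only the node that holds no majority-class observations; loosening the cap lets the mixed node in as well. Inspecting which nodes are flagged, and where they sit on the map, is one way a classifier of this kind can stay interpretable.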

Keywords

unsupervised learning, self-organizing maps, imbalanced datasets, supervised learning

AMS Subject Headings

62H30

VINICIUS ALMENDRA AND DENIS ENĂCHESCU. (2014). Using Self-Organizing Maps for Binary Classification with Highly Imbalanced Datasets. International Journal of Numerical Analysis Modeling Series B. 5 (3). 238-254. https://global-sci.org/intro/article_detail/ijnamb/232.html
