Data oversampling and feature selection for class imbalanced datasets

V. Krishnakumar; V. Sangeetha

doi:10.56294/sctconf2024935

Original

Published: 2024-06-21

DOI: https://doi.org/10.56294/sctconf2024935

Abstract

Introduction: Data classification (DC) has advanced considerably over the past few decades. However, data that are effectively unbounded in volume and imbalanced across classes remain difficult to classify, and Class Imbalance (CI) is one of the biggest concerns in Data Mining (DM). To address these issues, recent work has proposed MapReduce-based data parallelization for class-imbalanced datasets.
Methods: A novel Over Sampling (OS) technique called Minority Oversampling in Kernel Canonical Correlation Adaptive Subspaces (MOKCCAS) is proposed with the objective of minimizing data loss during Feature Space Projection (FSP). The technique takes advantage of the constant Feature Extraction (FE) capability of a variant of the Adaptive Subspace Self-Organizing Map (ASSOM) that is derived from Kernel Canonical Correlation Analysis (KCCA). Feature Selection (FS) also plays an important role in classification: the acquired dataset may contain a large volume of samples, and using all features of every sample for classification degrades classifier performance. Data parallelization with the MapReduce framework is then used to meet the resulting computational requirements.
Result: A feature selection model based on Mutated Whale Optimization (MWO) is then proposed; it selects features and reduces time consumption. Finally, the proposed class-balancing model is tested using a uniform distribution based enhanced adaptive neuro-fuzzy inference system (UDANFIS). Test outcomes validate the efficiency of the proposed technique in terms of precision, recall, accuracy and Error Rate (ER).
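The four evaluation measures named above can be illustrated with a minimal Python sketch. It assumes a binary problem in which the minority class is the positive class; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, accuracy and error rate for a binary problem.

    `positive` marks the class treated as positive (assumed here to be
    the minority class). Illustrative helper, not the authors' code.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn

    return {
        # Share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the classifier recovered.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of all samples classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of all samples misclassified (1 - accuracy).
        "error_rate": (fp + fn) / total if total else 0.0,
    }


# Toy imbalanced example: 2 minority samples out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

In this toy case accuracy is high while precision and recall on the minority class are only moderate, which is why imbalanced-data studies typically report all four measures rather than accuracy alone.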
Conclusion: The study proposes a novel OS approach, MOKCCAS, to reduce the loss of data during feature space projection.
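For readers unfamiliar with minority oversampling, the general idea can be sketched in Python as follows. This is a generic interpolation-based oversampler working directly in the input space; it is not MOKCCAS, which operates in kernel canonical correlation adaptive subspaces. The function name `oversample_minority`, its parameters and the toy data are assumptions made for illustration only.

```python
import random


def oversample_minority(X, y, minority_label, seed=0):
    """Balance a binary dataset by interpolating between minority samples.

    Generic sketch of minority oversampling, NOT the MOKCCAS method:
    each synthetic point lies on the line segment between two randomly
    chosen minority samples in the original feature space.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    n_needed = n_majority - len(minority)

    X_out = [list(x) for x in X]
    y_out = list(y)
    # Interpolation needs at least two distinct minority samples.
    if len(minority) < 2 or n_needed <= 0:
        return X_out, y_out

    for _ in range(n_needed):
        a, b = rng.sample(minority, 2)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic = [ai + lam * (bi - ai) for ai, bi in zip(a, b)]
        X_out.append(synthetic)
        y_out.append(minority_label)
    return X_out, y_out


# Toy data: 2 minority samples (label 1) versus 6 majority samples (label 0).
X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0],
     [5.0, 6.0], [6.0, 6.0], [7.0, 7.0], [7.0, 6.0]]
y = [1, 1, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = oversample_minority(X, y, minority_label=1)
```

After the call, both classes have six samples each. Interpolating in the raw input space is the simplest choice; projecting into a learned feature space before synthesizing samples, as the study describes, is a more involved alternative that this sketch does not attempt.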

Keywords:

Imbalance data, Data Mining (DM), parallelization, Feature Space Projection (FSP), Minority Oversampling (MO), Kernel Canonical Correlation Adaptive Subspaces (KCCAS), Whale Optimization (WO)

How to Cite

Krishnakumar V, Sangeetha V. Data oversampling and feature selection for class imbalanced datasets. Salud, Ciencia y Tecnología - Serie de Conferencias [Internet]. 2024 Jun. 21 [cited 2024 Jun. 29];3:935. Available from: https://conferencias.saludcyt.ar/index.php/sctconf/article/view/935

Copyright Notice

The article is distributed under the Creative Commons Attribution 4.0 License. Unless otherwise stated, associated published material is distributed under the same licence.


Vol. 3 (2024)
