
Data oversampling and feature selection for class imbalanced datasets

By Krishnakumar V. and Sangeetha V.

Krishnakumar V.
Research scholar, Department of Computer Science, Periyar University, Salem, India

Sangeetha V.
Head & Assistant Professor, Department of Computer Science, Periyar University Arts and Science College, Pappireddipatti, Dharmapuri, India

Abstract

Introduction: Data classification (DC) has advanced considerably over the past few decades. Because data now arrive in very large volumes and are often imbalanced, classifying them has become challenging. Class imbalance (CI) is one of the biggest concerns in data mining (DM). To address these issues, recent work has proposed MapReduce-based data parallelization of class-imbalanced datasets.
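The MapReduce idea can be illustrated on the simplest imbalance statistic, the class distribution. The sketch below is a minimal, single-machine illustration under stated assumptions, not the authors' implementation: labels are assumed to be pre-split into partitions, Python's built-in `map` stands in for a distributed map phase, and the names `map_partition`, `reduce_counts`, `class_distribution`, and `imbalance_ratio` are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List, Sequence


def map_partition(labels: Sequence[int]) -> Counter:
    """Map step (hypothetical helper): count class labels in one partition."""
    return Counter(labels)


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step (hypothetical helper): merge two per-partition counts."""
    return a + b


def class_distribution(partitions: Iterable[Sequence[int]]) -> Counter:
    # Built-in map is a stand-in for a distributed map phase; a cluster
    # framework would run map_partition on each partition in parallel.
    return reduce(reduce_counts, map(map_partition, partitions), Counter())


def imbalance_ratio(counts: Counter) -> float:
    """Majority-class count divided by minority-class count."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    # Hypothetical labels split across three partitions (0 = majority, 1 = minority).
    partitions: List[List[int]] = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 1, 0, 0]]
    counts = class_distribution(partitions)
    print(counts)                   # Counter({0: 10, 1: 2})
    print(imbalance_ratio(counts))  # 5.0
```

Because per-partition counts are merged by an associative sum, each map call is independent of the others, which is what allows the work to be distributed.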
Methods: A novel oversampling (OS) technique, Minority Oversampling in Kernel Canonical Correlation Adaptive Subspaces (MOKCCAS), is proposed to minimize data loss during feature space projection (FSP). The technique exploits the invariant feature extraction (FE) capability of a variant of the Adaptive Subspace Self-Organizing Map (ASSOM) derived from Kernel Canonical Correlation Analysis (KCCA). Feature selection (FS) also plays an important role in classification: the acquired dataset may contain a large volume of samples, and using every feature of those samples for classification degrades classifier performance. Data parallelization is then performed with the MapReduce framework to meet the resulting computational demands.
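MOKCCAS itself synthesizes minority samples inside kernel subspaces learned by a KCCA-derived ASSOM, which is beyond a short example. As a point of reference only, the sketch below shows a swapped-in, simpler technique: SMOTE-style interpolation in the original feature space, where each synthetic sample lies on the segment between a minority sample and one of its nearest minority neighbours. The function name `oversample_minority` and its parameters `k` and `seed` are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def oversample_minority(minority: Sequence[Sequence[float]], n_new: int,
                        k: int = 3, seed: int = 0) -> List[Point]:
    """SMOTE-style synthesis (NOT MOKCCAS): interpolate between a random
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base sample.
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: _dist(base, minority[j]))[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position in [0, 1) along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical 2-D minority samples.
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    for point in oversample_minority(minority, n_new=4):
        print(point)
```

Interpolating only between minority neighbours keeps synthetic points inside the region the minority class already occupies; the projection-based approach described above aims instead to avoid the information lost when such samples are mapped into a learned feature space.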
Result: The study then proposes a feature selection model based on mutated whale optimization (MWO), which selects informative features and reduces computation time. Finally, the proposed class-balancing model is tested with a uniform-distribution-based enhanced adaptive neuro-fuzzy inference system (UDANFIS). Test outcomes, measured by precision, recall, accuracy, and error rate (ER), validate the efficiency of the proposed technique.
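The evaluation metrics named above can be computed from a binary confusion matrix. A minimal sketch, assuming the minority class is the positive label `1` and taking error rate as the fraction of misclassified samples (that is, 1 − accuracy); the paper may define these slightly differently, and `evaluate` is a hypothetical helper.

```python
from typing import Dict, Sequence


def evaluate(y_true: Sequence[int], y_pred: Sequence[int],
             positive: int = 1) -> Dict[str, float]:
    """Precision and recall for the positive (minority) class, plus overall
    accuracy and error rate."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    n = len(pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / n,
        "error_rate": (n - correct) / n,
    }


if __name__ == "__main__":
    # Hypothetical predictions on 6 majority (0) and 4 minority (1) samples.
    y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
    print(evaluate(y_true, y_pred))
    # {'precision': 0.75, 'recall': 0.75, 'accuracy': 0.8, 'error_rate': 0.2}
```

Precision and recall are reported for the minority class because, on imbalanced data, accuracy alone can remain high even when most minority samples are misclassified.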
Conclusion: The study proposes a novel OS approach, MOKCCAS, that reduces data loss during feature space projection.

How to Cite

1.
Krishnakumar V, Sangeetha V. Data oversampling and feature selection for class imbalanced datasets. Salud, Ciencia y Tecnología - Serie de Conferencias [Internet]. 2024 Jun. 21 [cited 2024 Jul. 19];3:935. Available from: https://conferencias.saludcyt.ar/index.php/sctconf/article/view/935

The article is distributed under the Creative Commons Attribution 4.0 License. Unless otherwise stated, associated published material is distributed under the same licence.


The statements, opinions and data contained in the journal are solely those of the individual authors and contributors and not of the publisher and the editor(s). We stay neutral with regard to jurisdictional claims in published maps and institutional affiliations.