Jamshid Pirgazi¹*, Mohammad Mehdi Pourhashem Kallehbasti¹, Ali Ghanbari Sorkhi¹, Ali Kermani¹
¹ Department of Electrical and Computer Engineering, University of Science and Technology of Mazandaran, Behshahr, Iran
Abstract
Introduction: High-dimensional datasets often contain an abundance of features, many of which are irrelevant to the subject of interest. This issue is compounded by the often small number of samples and imbalanced class distributions. These factors can degrade the performance of classification algorithms, making feature selection a necessary step before classification. The primary objective of feature selection algorithms is to identify a minimal subset of features that enables accurate classification.
Methods: In this paper, we propose a two-stage hybrid method for the optimal selection of relevant features. In the first stage, a filter method assigns weights to the features, enabling the removal of redundant and irrelevant features and reducing the computational cost of subsequent classification. A subset of high-weight features is retained for the second stage, in which an enhanced Harris Hawks Optimization (HHO) algorithm and GRASP, augmented with crossover and mutation operators from genetic algorithms, use the weights computed in the first stage to identify the optimal feature set.
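To make the two-stage structure concrete, the following is a minimal sketch, not the authors' implementation: it assumes a mutual-information filter for the first stage and a simplified population search over binary feature masks with GA-style crossover and mutation standing in for the enhanced HHO/GRASP search of the second stage. All dataset sizes, the k-NN evaluator, and parameters such as k, population size, and mutation rate are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for a high-dimensional, small-sample dataset.
X, y = make_classification(n_samples=100, n_features=200,
                           n_informative=10, random_state=0)

# --- Stage 1: filter weighting, retain the k highest-weight features ---
k = 30
weights = mutual_info_classif(X, y, random_state=0)
kept = np.argsort(weights)[-k:]          # indices of retained features
X_kept = X[:, kept]

def fitness(mask):
    """Cross-validated accuracy of a candidate feature subset."""
    if not mask.any():
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=3)
    return cross_val_score(clf, X_kept[:, mask], y, cv=3).mean()

# --- Stage 2: population search with crossover and mutation ---
pop = rng.random((20, k)) < 0.5          # random binary feature masks
for _ in range(30):
    scores = np.array([fitness(m) for m in pop])
    elite = pop[np.argsort(scores)[::-1][:10]]   # keep the better half
    # Uniform crossover between pairs of randomly chosen elite parents.
    parents = elite[rng.integers(0, 10, size=(10, 2))]
    cross = rng.random((10, k)) < 0.5
    children = np.where(cross, parents[:, 0], parents[:, 1])
    # Bit-flip mutation with a small per-bit probability.
    children ^= rng.random((10, k)) < 0.05
    pop = np.vstack([elite, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected feature indices:", kept[best])
```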
Results: Experimental results demonstrate that the proposed algorithm identifies the optimal subset of features.
Conclusion: The two-stage hybrid method effectively selects the optimal subset of features, improving the performance of classification algorithms on high-dimensional datasets. This approach addresses the challenges posed by the abundance of features, low number of samples, and imbalanced class samples, demonstrating its potential for application in various fields.