Design of a Classifier model for Heart Disease Prediction using normalized graph model

Heart disease affects an enormous number of people worldwide. In cardiology in particular, diagnosis and treatment need to happen quickly and precisely. This work proposes a machine learning (ML) based approach for diagnosing cardiac disease that is both effective and accurate. The system is developed using standard feature selection algorithms that remove unnecessary and redundant features, and a novel normalized graph model (n-GM) is used for prediction. To address the feature selection problem, this work adopts a feature selection approach based on mutual information. Feature selection is used to improve classification accuracy and to shorten classification time. Furthermore, hyper-parameter tuning and model evaluation are carried out using cross-validation. Performance is evaluated with various metrics on the features chosen via feature representation. The outcomes demonstrate that the suggested n-GM gives 98 % accuracy for modeling an intelligent system to detect heart disease using a support vector machine classifier.


INTRODUCTION
Heart disease (HD) is among the most serious health problems and has affected many people worldwide (1). The most common symptoms of HD include swollen feet, muscle weakness, and shortness of breath (2). Current methods for diagnosing cardiac disease are not useful for early identification because of a number of issues, such as limited accuracy and long execution time (3). Researchers are therefore developing effective methods for detecting heart disease. Without state-of-the-art tools and trained medical personnel, diagnosing and treating heart disease can be very difficult (4). Numerous lives can be saved by an accurate diagnosis and appropriate care (5). According to the European Society of Cardiology, there are an estimated 26 million HD patients worldwide, and 3.6 million new cases are found each year (6). Heart disease is also highly prevalent in the United States (7). A doctor often diagnoses HD after reviewing the patient's history, physical examination outcomes, and related symptoms. However, these outcomes do not reliably identify HD, and the analysis is computationally complex and challenging (8). Creating a non-invasive system using ML classifiers is essential to address these problems. Expert systems based on ML and artificial neural networks (ANN) have successfully predicted HD, and some studies also predict the death rate (9,10).
Several researchers (11,12) used datasets available online to address the HD identification problem. ML predictive approaches need suitable data during training and testing, and model performance improves if balanced datasets are employed. Furthermore, the model's predictive ability is enhanced by using appropriate and relevant features from the data. Data balancing and feature selection are therefore crucial for increasing model performance. Numerous researchers have suggested different diagnosis methods in the literature, but these methods do not reliably diagnose HD. Data preprocessing is essential for data normalization, which helps machine learning models predict outcomes more accurately (13). The authors propose an ML-based diagnosis approach for identifying HD, in which various prediction models are used. Features are chosen using the existing feature selection algorithms mRMR, Relief, LASSO, and local feature selection. Additionally, a conditional mutual information feature selection approach is used. The optimum hyperparameters are chosen with the 10-fold cross-validation (CV) technique (14). Furthermore, the performance of the classifier is gauged using a range of performance measures. The HD dataset is used to evaluate the strategy, and its effectiveness is compared with that of other methods (15). However, the existing techniques fail to fulfill the research requirements, which the proposed model attempts to meet. The study's contributions are as follows:
• The first contribution is to resolve the feature selection issues using pre-processing approaches and a suitable feature selection approach. The selected features are provided to efficient classifiers to determine which classifier gives superior outcomes regarding accuracy and other evaluation metrics.
• To increase prediction accuracy and shorten computation time, the authors also introduce the novel normalized graph model (n-GM) algorithm for feature selection. The features selected by the suggested algorithm are compared with the features chosen by the standard existing algorithms to see how well the classifiers perform. Any weak dataset properties have an impact on classifier performance.
• Finally, the results suggest that the proposed heart disease identification system can successfully identify HD.
The study is organized as follows: Section 2 offers a thorough study of the advantages and disadvantages of various strategies. The approach is described in Section 3, while Section 4 contains the results. Section 5 concludes with a summary.

Related works
Investigators have recommended various ML approaches for HD prediction. To put the significance of the different approaches in perspective, this section reviews the learning-based approaches now in use. Karthiga et al. (11) developed an HD classification system that reaches a 77 % accuracy rate using machine learning classification algorithms. Evolutionary and feature selection techniques are applied to the online dataset. In a different investigation, Beyence et al. (12) modeled an HD prediction and categorization approach utilizing MLP and SVM algorithms and attained 80.41 % accuracy. The classification system accuracy was 87.4 %. Using Enterprise Miner, a statistical measurement system, Rahman et al. (13) constructed an ANN-based prediction model for HD with sensitivity, accuracy, and specificity values of 89 %, 80 %, and 95 %, respectively. A study (14) developed a method for diagnosing HD based on ML; both the FS algorithm and the ANN-DBP method had good results. Goel et al. (15) developed an expert medical diagnosis system for the identification of HD. During the system's development, Artificial Neural Networks (ANN), Decision Trees (DT), and Naive Bayes (NB) were employed as predictive machine learning models (16,17,18,19,20). ANN obtained an accuracy of 88.12 %, NB obtained 86.12 %, and the DT classifier obtained 80 % accuracy. Jenzi et al. (21) modeled a three-stage design based on ANN and achieved 88 % accuracy.
Salud, Ciencia y Tecnología - Serie de Conferencias. 2024; 3:653
An HD identification approach utilizing feature selection and classification algorithms was proposed in (33,34,35). Here, the Sequential Backward Selection algorithm is employed, and K-NN performance is tested on both the full feature set and the feature subset (36,37,38,39,40,41,42,43,44,45). The suggested procedure produced excellent accuracy. In a different work, Raju et al. (46) modeled a hybrid ML-based HD prediction. Additionally, a superior approach for selecting essential features from the data is designed using ML classifiers. The accuracy rate of their classification was 88.07 %. This model was useful for several studies and for developing other models (47,48,49,50,51,52,53,54,55,56,57,58). Venkatalakshmi et al. (59) created HD detection tools using an improved SVM-based duality optimization technique. All of these technologies now in use take different approaches to identifying HD in its early stages, which helps to put the significance of the suggested strategy in context. Furthermore, the computation time of these approaches is high and their accuracy is low. For better treatment and recovery, HD detection needs to be improved in order to make accurate and efficient early predictions (60,61). The main problems with these earlier methods are thus their poor accuracy and long computation times, which may be caused by the presence of unnecessary features in the dataset. New techniques are required for the precise identification of HD in order to address these issues, and there is a great need for additional study on improving prediction accuracy.

METHOD
The proposed research proceeds in three successive steps: dataset description, feature representation, and the normalized graph model. The proposed model provides expert knowledge to physicians at the crucial time and assists in predicting heart disease earlier. The experiments are executed in MATLAB 2020a, where metrics such as recall, precision, accuracy, and F-measure are assessed and contrasted with alternative methods. The block representation is provided in figure 1.
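The four metrics named above can be computed from the counts of a binary confusion matrix. The following is a minimal Python sketch rather than the MATLAB code used in the experiments; the function name `binary_metrics`, the `BinaryMetrics` container, and the example labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F-measure for binary labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return BinaryMetrics(accuracy, precision, recall, f_measure)


# Hypothetical labels: 1 = heart disease present, 0 = absent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

The F-measure here is the harmonic mean of precision and recall, which is the usual reading of the term; whether the experiments weight precision and recall differently is not stated in the text.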

Dataset
The UCI ML heart disease dataset is used for prediction. The original data set has 303 instances and 75 attributes, but the reported studies use only a subset of 14 of them. Six samples were excluded from the data set owing to missing values after pre-processing. The remaining dataset has 297 samples with 13 features and 1 output label. The two classes of the output label indicate whether HD is present or not. As a result, a 297 × 13 feature matrix is created. Information about the dataset matrix is provided in table 1.
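The exclusion of incomplete samples and the reduction to a two-class label can be sketched as follows. This is an illustrative Python sketch under stated assumptions: the rows in `RAW` are a small illustrative excerpt in a 13-attributes-plus-label layout, not the full dataset; the use of "?" as the missing-value marker and the rule "label greater than 0 means HD present" are assumptions about the file format, and `load_complete_rows` is a hypothetical helper name.

```python
import csv
import io

# Illustrative rows: 13 attributes followed by the label.
# Assumption: "?" marks a missing value, label 0 = no disease, label > 0 = disease.
RAW = """\
63,1,1,145,233,1,2,150,0,2.3,3,0,6,0
67,1,4,160,286,0,2,108,1,1.5,2,3,3,2
53,1,3,128,216,0,2,115,0,0.0,1,0,?,0
52,1,3,138,223,0,0,169,0,0.0,1,?,3,0
41,0,2,130,204,0,2,172,0,1.4,1,0,3,0
"""


def load_complete_rows(text, missing="?"):
    """Return (X, y), keeping only rows without missing values."""
    X, y = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row or missing in row:
            continue  # drop incomplete samples
        *features, label = (float(v) for v in row)
        X.append(features)
        y.append(1 if label > 0 else 0)  # two classes: HD present or not
    return X, y


X, y = load_complete_rows(RAW)
print(len(X), "samples x", len(X[0]), "features")
```

Applied to the full file, the same filtering step would correspond to going from 303 to 297 samples as described above.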

Relief for feature learning
The Relief method iteratively updates a weight for each feature in the data set. High-weight features should be chosen, while low-weight ones should be ignored. The procedure used by Relief to estimate the weight is identical for every feature. The parameter is $m$: the method iterates over $m$ randomly chosen training samples ($R_k$), drawn without replacement. For every $k$, $R_k$ is the "target" sample, and the weight vector $W$ is updated. The algorithm of the Relief model is explained below:
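As an illustration of the weight update, the following Python sketch implements the classic two-class Relief scheme: for every target sample it finds the nearest sample of the same class (nearest hit) and of the other class (nearest miss), then rewards features that differ on the miss and penalizes features that differ on the hit. The range-scaled difference function, the Manhattan distance, and the toy data are assumptions made for this sketch; the text does not specify them. Each class is assumed to contain at least two samples.

```python
import random


def relief(X, y, m, seed=0):
    """Relief feature weights for a two-class problem.

    X: list of samples (lists of numeric features), y: class labels,
    m: number of target samples drawn at random without replacement.
    """
    n, d = len(X), len(X[0])
    # Feature ranges, used to scale every difference into [0, 1].
    lo = [min(row[a] for row in X) for a in range(d)]
    hi = [max(row[a] for row in X) for a in range(d)]
    span = [(hi[a] - lo[a]) or 1.0 for a in range(d)]

    def diff(a, r1, r2):
        return abs(r1[a] - r2[a]) / span[a]

    def distance(r1, r2):
        return sum(diff(a, r1, r2) for a in range(d))

    weights = [0.0] * d
    rng = random.Random(seed)
    for k in rng.sample(range(n), m):  # sampling without replacement
        target = X[k]
        hits = [X[i] for i in range(n) if i != k and y[i] == y[k]]
        misses = [X[i] for i in range(n) if y[i] != y[k]]
        near_hit = min(hits, key=lambda r: distance(target, r))
        near_miss = min(misses, key=lambda r: distance(target, r))
        for a in range(d):
            # Reward separation from the miss, penalise separation from the hit.
            weights[a] += (diff(a, target, near_miss)
                           - diff(a, target, near_hit)) / m
    return weights


# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.9], [0.2, 0.1], [0.15, 0.5], [0.9, 0.8], [0.8, 0.2], [0.85, 0.4]]
y = [0, 0, 0, 1, 1, 1]
w = relief(X, y, m=len(X))
print(w)
```

On the toy data the class-separating feature receives a positive weight and the noise feature a negative one, which is the behaviour that motivates keeping high-weight features.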

Feature representation
This study introduces mutual feature information analysis to address the feature selection problem. It is an efficient feature selection technique built on mutual information. The design of the algorithm involves the following steps. Consider the dataset $D(X, Y)$, which, as in Eq. (1), is composed of instances $X$ and output labels $Y$. As stated in Eq. (3), statistical preprocessing techniques such as Min-Max normalization are applied to the dataset $D(X, Y)$. The mutual information approach is then applied to $D$ to choose the feature subset $(X_i, Y_i)$. The feature selection approach uses mutual information to quantify both the relevance and the redundancy of the features in the dataset. Conditional on the outcome of every feature chosen previously, the proposed algorithm selects the feature that maximizes the mutual information with the target class in $D$. This criterion does not choose a feature that is similar to those already chosen, even if it is individually informative, because such a feature carries no additional information about the output. The resulting trade-off between redundancy and relevance is favourable. A higher mutual information value indicates that the feature $X_n$ is relevant to the output $Y$ and highly compatible with the other features $X_j$, where $j \in D$. The condition is mathematically represented by Eq. (4). The proposed model attempts to balance independence and discriminative power between the candidate features and the features selected already. A feature $X_0$ is considered significant if $I(Y; X_0 \mid X)$ is large for every $X$ chosen already. Evaluating the feature scores during the selection process identifies the features that give more information and reduce redundancy. The model maintains a partial feature score $D_i$ for each feature, which is the minimum found so far (min algorithm). The vector that stores the chosen features is based on $P_i$.
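The selection rule described above can be illustrated with a greedy "max-min" criterion: pick first the feature with the largest $I(Y; X_n)$, then repeatedly pick the feature whose smallest conditional mutual information $I(Y; X_n \mid X_j)$ over the already chosen features $X_j$ is largest. The Python sketch below assumes this reading of the criterion and assumes discrete (already binned) features; the function names and the toy data are hypothetical. Min-Max normalization is included in its usual form, $x' = (x - x_{\min}) / (x_{\max} - x_{\min})$.

```python
from collections import Counter
from math import log2


def min_max(column):
    """Min-Max normalisation of one feature column into [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in column]


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum(c / n * log2(c / n) for c in counts.values())


def mutual_info(y, x):
    """I(Y; X) = H(Y) + H(X) - H(Y, X)."""
    return entropy(y) + entropy(x) - entropy(y, x)


def cond_mutual_info(y, x, z):
    """I(Y; X | Z) = H(Y, Z) + H(X, Z) - H(Y, X, Z) - H(Z)."""
    return entropy(y, z) + entropy(x, z) - entropy(y, x, z) - entropy(z)


def select_features(features, y, k):
    """Greedy selection: maximise the minimum conditional mutual information."""
    remaining = list(range(len(features)))
    # First pick: the feature most informative about y on its own.
    chosen = [max(remaining, key=lambda n: mutual_info(y, features[n]))]
    remaining.remove(chosen[0])
    while len(chosen) < k and remaining:
        def score(n):
            return min(cond_mutual_info(y, features[n], features[j])
                       for j in chosen)
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data: y = a OR b; feature 1 is an exact duplicate of feature 0.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
y = [0, 0, 1, 1, 1, 1, 1, 1]
chosen = select_features([a, list(a), b], y, k=2)
print(chosen)
```

In the toy example the duplicate feature is never chosen together with its original, because conditioning on one copy leaves the other with zero additional information about $y$; this is the redundancy behaviour the text describes.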

Normalized graph model
For real-time analysis, many datasets exist in graph form, normalized to a specific format. This paper proposes the normalized graph model (n-GM), which is constructed on the graph structure. The spectral graph convolution is represented as the product of the input signal $x \in \mathbb{R}^{N}$ with a filter $g_\theta = \mathrm{diag}(\theta)$ in the Fourier domain. Here, $U$ is the matrix of eigenvectors of the normalized graph Laplacian $L = I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^{T}$, $\Lambda$ is the diagonal matrix of the eigenvalues of $L$, and $U^{T} x$ is the graph Fourier transform of $x$. An approximation of the filter is applied in Eq. (6), in which $\tilde{L} = \frac{2}{\lambda_{\max}} L - I_N$. The normalized graph model is built from several convolutional layers of the form in Eq. (6). When each convolutional layer is restricted to $K = 1$, the estimate $\lambda_{\max} \approx 2$ is used.
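As a reading aid, the filtering step can be written out explicitly. The symbols used in this passage match the widely used spectral formulation of graph convolutional networks, so the block below assumes that the n-GM follows that formulation. The Chebyshev polynomials $T_k$ and the coefficients $\theta_k$ are taken from that standard formulation and are assumptions here, not statements from the text.

```latex
\begin{align*}
g_\theta \star x &= U \, g_\theta \, U^{T} x,
  & L &= I_N - D^{-1/2} A D^{-1/2} = U \Lambda U^{T}, \\
g_\theta \star x &\approx \sum_{k=0}^{K} \theta_k \, T_k(\tilde{L}) \, x,
  & \tilde{L} &= \frac{2}{\lambda_{\max}} L - I_N, \\
g_\theta \star x &\approx \theta_0 \, x + \theta_1 (L - I_N) \, x
  = \theta_0 \, x - \theta_1 D^{-1/2} A D^{-1/2} x
  & &(K = 1,\ \lambda_{\max} = 2).
\end{align*}
```

The last line follows from $T_0(\tilde{L}) = I_N$ and $T_1(\tilde{L}) = \tilde{L}$, together with $\tilde{L} = L - I_N = -D^{-1/2} A D^{-1/2}$ when $\lambda_{\max} = 2$.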
Moreover, over-fitting is mitigated by constraining the number of parameters, which also reduces the number of operations at every layer, as in Eq. (8), where $\theta = \theta_0 = -\theta_1$ replaces the two parameters of Eq. (7). However, the eigenvalues of $I_N + D^{-1/2} A D^{-1/2}$ lie in the interval $[0, 2]$. In the n-GM, repeated application of this operator can therefore result in vanishing gradients or numerical instability. The re-normalization idea of Eq. (9) addresses this issue by replacing the operator with $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$, where $\tilde{A} = A + I_N$ and $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. The definition generalizes to an input signal $X \in \mathbb{R}^{N \times C}$ with $C$ input channels (a $C$-dimensional feature vector for each of the $N$ nodes) and $F$ filters or feature maps. Here, $\Theta \in \mathbb{R}^{C \times F}$ is the filter parameter matrix, and $Z \in \mathbb{R}^{N \times F}$ is the signal matrix after the convolution. The filtering operation has complexity $O(|\mathcal{E}| F C)$, because $\tilde{A} X$ can be computed as the product of a sparse matrix with a dense matrix.
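A single propagation step of this kind, $Z = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2} X \Theta$ with $\tilde{A} = A + I_N$, can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the n-GM layer takes this standard re-normalized form; it uses dense nested lists for brevity instead of the sparse product mentioned above, omits the non-linearity, and all function names and the toy graph are hypothetical.

```python
from math import sqrt


def renormalized_adjacency(A):
    """Return D~^{-1/2} (A + I) D~^{-1/2} for a dense adjacency matrix A."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]  # D~_ii = sum_j A~_ij
    return [[A_tilde[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(P, Q):
    """Plain dense matrix product of two nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


def graph_conv(A, X, Theta):
    """One propagation step Z = A_hat X Theta (no non-linearity)."""
    return matmul(matmul(renormalized_adjacency(A), X), Theta)


# Toy path graph with N = 3 nodes, C = 2 input channels, F = 1 filter.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
Theta = [[0.5],
         [-1.0]]
A_hat = renormalized_adjacency(A)
Z = graph_conv(A, X, Theta)
print(Z)
```

The added self-loops guarantee that every degree $\tilde{D}_{ii}$ is at least 1 for a non-negative adjacency matrix, so the normalization never divides by zero.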

CONCLUSION
The results of the experiment show that, in comparison to conventional methods, the suggested feature selection methodology chooses relevant features more successfully and yields higher classification accuracy. From the investigators' perspective, the most significant and appropriate features are exercise-induced angina and Thallium Scan type chest discomfort. According to the results of all algorithms, some features are not reliable indicators of the presence of heart disease. Compared with previously proposed approaches, the accuracy of n-GM using the proposed feature selection model is 98 %, which is quite good. Additionally, the machine learning-based technique performs better than existing mining approaches. Even a slight increase in prediction accuracy can significantly impact the diagnosis of serious diseases. The study's originality lies in the creation of a system for diagnosing cardiac disease: feature selection algorithms are newly developed to pick the features, performance evaluation measures are employed, and the UCI heart disease dataset is used for testing. We believe that a support system built with ML algorithms will make diagnosing heart disease more appropriate. Using feature selection algorithms to choose relevant features that enhance classification accuracy and shorten the diagnosis system's processing time is another novel aspect of this research. In the future, we will apply additional feature selection algorithms and optimization techniques to boost the prediction system's ability to diagnose HD.
Input: Dataset with labels (training data); the number of training samples chosen at random (m);
Output: Weighted features (more feature information);
1: n ← total number of training samples;
2: d ← total number of features;
3: Initialize the feature weights W[A] ← 0;
4: for k ← 1 to m do
5:   Select a target sample R_k at random;
6:   Find the nearest hit H and the nearest miss M of R_k;
7:   for A ← 1 to d do
8:     W[A] ← W[A] - difference(A, R_k, H)/m + difference(A, R_k, M)/m
9:   end for
10: end for
11: Evaluate the weighted feature vector;
12: Compute the superior features

Figure 2. Flow diagram of the model

Figure 7. Training and validation loss

Table 1. Dataset details

Input: The dataset D(X,Y) matrix, significant features, finest feature set, mutual information and partial score;
Output: The finest selected features D(X_i, Y_i)
1: Execute data pre-processing;
2: Set the chosen features to ∅
3: for all features f_i ∈ O do
[…]
11: while p_i > score_k and L_i < k-1 do
12:   Fix L_i ← L_i + 1
13:   Evaluate VU_i among o_k and o_i;
14:   Fix p_i ← min(p_i, mutual information_ik)
15: end while
16: if p_i > score_k, then
17: Fix p_i > partial score_k, then
18: Choose feature subset → significant feature;

Table 2. Proposed vs. existing comparison