Abstract Details
Title
ARE MACHINE LEARNING MODELS BETTER THAN THE CURRENT EQUATIONS USED TO ESTIMATE GLOMERULAR FILTRATION RATE?
Introduction
Accurate estimation of glomerular filtration rate (GFR) is essential for classifying and staging chronic kidney disease. GFR is usually estimated with equations derived from regression models, most commonly CKD-EPI, MDRD, and Schwartz. We aimed to evaluate the performance of machine learning models against these standard equations in predicting measured GFR.
Materials and Methods
This retrospective cross-sectional study included 10,610 participants referred to a hospital in Lyon, France, for GFR measurement because of suspected kidney dysfunction or kidney donation. GFR was measured by urinary inulin clearance. We split the data into derivation (training) and validation (test) datasets. The machine learning models were tree-based models (gradient-boosted decision trees: XGBoost and LightGBM), Lasso regression, and Cubist regression. To compare model accuracy in the test set, we used the root mean square error (RMSE) and bias (the median of the difference between measured GFR and estimated GFR). We also report P30 (the percentage of eGFR values within 30% above or below the measured GFR).
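The three accuracy metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the sample values are hypothetical, bias is taken as the median of measured minus estimated GFR as defined in the text, and P30 is returned as a proportion (0 to 1), matching how it is reported in the results.

```python
import math
from statistics import median

def rmse(measured, estimated):
    """Root mean square error between measured and estimated GFR."""
    squared = [(m - e) ** 2 for m, e in zip(measured, estimated)]
    return math.sqrt(sum(squared) / len(squared))

def median_bias(measured, estimated):
    """Median of (measured - estimated), following the definition in the text."""
    return median(m - e for m, e in zip(measured, estimated))

def p30(measured, estimated):
    """Proportion of estimates within 30% above or below the measured value."""
    within = [abs(e - m) <= 0.30 * m for m, e in zip(measured, estimated)]
    return sum(within) / len(within)

# Hypothetical values in mL/min, for illustration only (not study data).
measured = [100.0, 60.0, 30.0, 90.0]
estimated = [90.0, 80.0, 33.0, 90.0]

print(rmse(measured, estimated))
print(median_bias(measured, estimated))
print(p30(measured, estimated))
```

Using the median rather than the mean for bias makes the summary less sensitive to a few large residuals, which may be why the two metrics can rank models differently.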
Results
In the test set, the conventional equations used to estimate GFR had higher RMSE and bias than the machine learning models. In the overall population, the best conventional equation was Schwartz (RMSE=19.877, median residual difference=-1.000, P30=0.782) and the worst was MDRD (RMSE=45.908, median residual difference=-2.000, P30=0.714). The best model for the overall population was XGBoost (RMSE=15.577, median residual difference=-0.093, P30=0.832) (Figure 1). The machine learning models also performed better than the conventional equations within GFR strata (below 60 and above 60 mL/min) and age strata. Agreement in the Bland-Altman plots was similar across all machine learning models. The SHAP summary plot showed that creatinine was the most important predictor, followed by age and sex (Figure 2). We also observed non-linear relationships of creatinine and age with measured GFR.
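For readers unfamiliar with Bland-Altman agreement, the quantities behind such a plot can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' analysis: it uses the conventional mean difference with 95% limits of agreement (mean plus or minus 1.96 standard deviations), takes differences as measured minus estimated, and runs on hypothetical values. The function name `bland_altman` is ours.

```python
from statistics import mean, stdev

def bland_altman(measured, estimated):
    """Return (mean difference, lower limit, upper limit) of agreement.

    Differences are measured - estimated (an assumed sign convention);
    limits are the conventional mean +/- 1.96 * sample standard deviation.
    """
    diffs = [m - e for m, e in zip(measured, estimated)]
    center = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return center, center - spread, center + spread

# Hypothetical values in mL/min, for illustration only (not study data).
measured = [100.0, 60.0, 30.0, 90.0]
estimated = [98.0, 64.0, 28.0, 94.0]

center, lower, upper = bland_altman(measured, estimated)
print(center, lower, upper)
```

Models with similar agreement would show similar mean differences and similarly wide limits when each model's estimates are passed through the same calculation.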
Discussion and Conclusions
We demonstrated that machine learning models were superior to the conventional equations used to estimate GFR in all subgroups of age and kidney function. As an additional advantage, a single model could be used for both adults and children. We suggest that machine learning models should be considered in future updates of eGFR equations.
Keywords
Glomerular filtration rate; machine learning; chronic kidney disease
Area
Chronic kidney disease
Institutions
UNESP - São Paulo - Brazil
Authors
BRUNA FERRAZ DEORIO, ABNER MACOLA PACHECO BARROS, JULIANA TEREZA CONEGLIAN ALMEIDA, JULIANA MACHADO RUGOLO, LUCAS FREDERICO ARANTES, NAILA CAMIAL DA ROCHA, MARILIA MASTROLA CARDOSO DE ALAMEIDA, MONICA AP DE PAULA DE SORDI, LUIS GUSTAVO MODELLI DE ANDRADE