Prediction of Organic Dyes Absorption Wavelength Using Different Machine Learning Boosting Models
Fluorescent dye molecules have numerous applications in the pharmaceutical industry, research and development, bioimaging, fluorescence imaging, light harvesting, and drug delivery, and many attempts have been made to develop new fluorescent dyes with desirable photophysical properties. The absorption wavelength is one of the most important of these properties. Determining the absorption properties of new fluorescent organic dyes quickly and cheaply is difficult for experimentalists. Machine Learning (ML) boosting regression models can estimate photophysical properties, particularly the absorption wavelength, and may offer an alternative to density functional theory (DFT) or time-dependent density functional theory (TD-DFT) calculations. To predict the absorption wavelength, we applied five ML boosting regression models to a dataset of 3073 organic dyes, holding out 9% of the data as the test set: AdaBoost Regression (ABR), Gradient Boosting Regression (GBR), XGBoost Regression (XGBR), CatBoost Regression (CBR), and LightGBM Regression (LGBMR). Before modeling, the chemical structures were converted into continuous values, namely their molecular weights, using the RDKit library. The models were then evaluated using three metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). The R² values were 0.61, 0.68, 0.75, 0.75, and 0.73 for ABR, GBR, XGBR, CBR, and LGBMR, respectively. XGBR was the best-performing of the five models on all three metrics, with RMSE = 29.83, MAE = 21.26, and R² = 0.75. The proposed boosting models can predict the absorption wavelength in a short time, which can help scientists and industry in drug design and in developing new organic dyes for large-scale manufacturing and material assessment.
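For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how RMSE, MAE, and R² can be computed from paired measured and predicted values. It uses only the Python standard library and is not the authors' code; the function name `regression_metrics` and the sample wavelengths are hypothetical and were made up purely for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # Residual sum of squares, shared by RMSE and R^2
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n

    # Total sum of squares around the mean of the true values
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical absorption wavelengths (illustrative values, not from the dataset)
measured = [400.0, 450.0, 500.0, 550.0]
predicted = [410.0, 440.0, 510.0, 540.0]

rmse, mae, r2 = regression_metrics(measured, predicted)
print(f"RMSE={rmse:.2f}  MAE={mae:.2f}  R2={r2:.3f}")
```

RMSE and MAE are expressed in the same units as the predicted quantity, so lower is better, while R² is unitless and approaches 1 as the predictions explain more of the variance in the measured values.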
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.