ESG Score Prediction
End-to-end ML pipeline for ESG grade classification from publicly available data.
Project Snapshot
Built as part of the OUTTA AI Boot Camp. The project scrapes publicly available ESG data (2020–2023), applies preprocessing, feature engineering, and exploratory analysis, and then benchmarks multiple ML models for ESG grade classification.
Problem
ESG (Environmental, Social, Governance) scores are increasingly important for investment decisions, but grades can be slow to obtain and the grading process is often opaque. Automating ESG classification from publicly available data can make these insights more widely accessible.
Approach
- Scraped publicly available ESG data spanning 2020–2023
- Performed data preprocessing, feature engineering, and exploratory analysis
- Benchmarked KNN, Logistic Regression, SVM, and Random Forest classifiers
- Identified the most influential ESG sub-score through feature importance analysis
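The benchmarking step above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the data is a synthetic stand-in generated with `make_classification`, and the `benchmark` helper, the three-feature layout, the hyperparameters, and the 5-fold cross-validation are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# ASSUMPTION: synthetic stand-in for the scraped ESG table (3 features, 3 grade classes).
X, y = make_classification(
    n_samples=300, n_features=3, n_informative=3, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)


def benchmark(X, y, cv=5):
    """Return mean cross-validated accuracy for each of the four classifiers."""
    models = {
        # Distance- and margin-based models are scaled first; tree ensembles do not need it.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
        for name, model in models.items()
    }


scores = benchmark(X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Wrapping the scaler and the classifier in one pipeline keeps the scaling inside each cross-validation fold, so no statistics leak from the held-out fold into training. Accuracy on this synthetic data says nothing about the figures reported for the real ESG data.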
Tech Stack
Python, Scikit-learn, Pandas, NumPy, BeautifulSoup, Jupyter Notebook
Results
- Random Forest performed best of the four classifiers, with accuracy above 85%
- Governance (G) identified as the most influential feature for ESG grade prediction
- Demonstrated a complete ML pipeline from data collection to model evaluation
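A feature importance analysis of the kind mentioned above might look like the sketch below. It is illustrative only: the column names `E`, `S`, and `G`, the grade labels, and the data itself are hypothetical, and the synthetic grades are deliberately constructed so that `G` dominates, which mirrors the reported finding without reproducing it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# ASSUMPTION: hypothetical sub-score columns on a 0-100 scale.
df = pd.DataFrame(rng.uniform(0, 100, size=(500, 3)), columns=["E", "S", "G"])

# ASSUMPTION: synthetic grades built from a weighted sum in which G carries most weight.
total = 0.1 * df["E"] + 0.1 * df["S"] + 0.8 * df["G"]
grade = pd.qcut(total, q=3, labels=["C", "B", "A"]).astype(str)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(df, grade)

# Impurity-based importances, one per feature, summing to 1.
importances = pd.Series(
    model.feature_importances_, index=df.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can favor features with many distinct values, so `sklearn.inspection.permutation_importance` on held-out data is a reasonable cross-check.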