MIDS Capstone Project Fall 2023

ESG Mapper: Doing well by doing good consistently

Overview

Our mission is to develop an NLP (Natural Language Processing)-powered tool that identifies Environmental, Social, and Governance (ESG) statements in any document and labels them according to the Sustainability Accounting Standards Board (SASB) materiality framework. The product uses fine-tuned BERT models to first identify ESG-relevant text and then assign SASB parent and child labels. Users can upload a PDF directly, retrieve a summary of SASB-labeled text, and view the distribution of SASB labels in a web-based user interface (UI).
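The three-stage flow described above (ESG filter, then parent label, then child label) can be sketched as a simple cascade. This is a minimal illustration under stated assumptions, not the project's implementation: the classifier functions are hypothetical keyword stubs standing in for the fine-tuned BERT models, the label names assume parent labels correspond to SASB dimensions and child labels to SASB general issue categories, and using one child classifier per parent label is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A classifier maps a sentence to (label, confidence). In ESG Mapper these
# stages are fine-tuned BERT models; the stubs further down are hypothetical.
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class Prediction:
    text: str
    is_esg: bool
    parent: Optional[str] = None
    child: Optional[str] = None


def classify_cascade(
    sentences: List[str],
    esg_filter: Classifier,
    parent_clf: Classifier,
    child_clfs: Dict[str, Classifier],
) -> List[Prediction]:
    results = []
    for sentence in sentences:
        # Stage 1: ESG vs. non-ESG; non-ESG sentences stop here.
        label, _ = esg_filter(sentence)
        if label != "ESG":
            results.append(Prediction(sentence, is_esg=False))
            continue
        # Stage 2: parent label.
        parent, _ = parent_clf(sentence)
        # Stage 3: child label from the classifier for that parent
        # (one child model per parent is an assumption of this sketch).
        child, _ = child_clfs[parent](sentence)
        results.append(Prediction(sentence, True, parent, child))
    return results


# Hypothetical keyword stubs so the sketch runs without model weights.
def esg_stub(s: str) -> Tuple[str, float]:
    hit = any(k in s.lower() for k in ("emission", "employee"))
    return ("ESG", 0.9) if hit else ("Non-ESG", 0.9)


def parent_stub(s: str) -> Tuple[str, float]:
    return ("Environment", 0.8) if "emission" in s.lower() else ("Human Capital", 0.7)


child_stubs: Dict[str, Classifier] = {
    "Environment": lambda s: ("GHG Emissions", 0.8),
    "Human Capital": lambda s: ("Employee Health & Safety", 0.6),
}

preds = classify_cascade(
    [
        "We cut Scope 1 emissions by 12% in 2022.",
        "Revenue grew 4% year over year.",
        "Employee injury rates fell for the third year.",
    ],
    esg_stub,
    parent_stub,
    child_stubs,
)
for p in preds:
    print(p.is_esg, p.parent, p.child)
```

A cascade like this only spends the more specific classifiers on sentences that passed the ESG filter, which keeps the parent and child stages focused on relevant text.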

Background

Environmental, social, and governance (ESG) refers to a set of factors that investors weigh when evaluating how companies handle environmental, social, and corporate governance issues. While the number of ESG ratings is growing, they are inconsistent across reporting standards, rating agencies, and governance entities. Recently, after examining six well-known ESG ratings, the Aggregate Confusion Project at MIT found that their scores for the same companies often diverged (Berg et al., 2022). The major contributor to this problem is the Jingle-Jangle Fallacy, a term often used in information science (e.g., Larsen & Bong, 2016; Song et al., 2021) to describe the inconsistent use of terminology and measurements across domains of information and knowledge. Specifically, different ratings adopt significantly different definitions and measures, and how these relate to each other remains unclear.

The Jingle-Jangle Fallacy in the ESG rating industry is problematic for a few reasons.

First, it makes it difficult, if not impossible, to compare ratings across standards. As a consequence, companies have considerable leeway to report selectively: they can cite the measures from a rating that favor their position and omit those where they are at a disadvantage. Second, it makes it costly for analysts to compare and synthesize findings about the impact of ESG dimensions, from trade-offs among stakeholder interests to long-term economic value.

As a solution, we created a Large Language Model (LLM)-based end-to-end platform that streamlines ESG text identification, classification and summary visualization in a smooth workflow.

Motivation and Market Opportunity

The global ESG investment space is showing fractures due to concerns about inconsistent ratings. The global ESG market is estimated to reach $50 trillion by 2025 (McCabe, 2023). However, ESG score providers, which follow inconsistent approaches, are drawing criticism because unreliable ESG ratings can lead to greenwashing.

The project aims to use a systematic natural language processing (NLP) approach to standardize the ESG categorization process, which in turn drives ESG ratings. A standardized and consistent ESG rating system gives investors, customers, NGOs, and the public a transparent basis for making well-informed decisions on ESG considerations. It will also help improve the reliability and integrity of the ESG investment market and attract capital toward companies that focus on society, the environment, and a sustainable future for all stakeholders.

Our target users include anyone who needs an efficient way to compare companies against a standard framework of ESG topics, such as ESG investment analysts and ESG regulators or auditors.

The direct impact of our tool is to replace the labor cost of manual ESG labeling and comparison, which is estimated to account for roughly $170 million per year for a large ESG rating agency.

Key Features

Interactive

ESG Mapper is a user-friendly, web-based tool that takes a single input: any PDF document uploaded by the user.

Real-time Result

Comprehensive results are generated in real time. After the user uploads a document, ESG Mapper takes roughly 3-5 minutes to finish summarization and prediction and present the results.

End-to-end workflow

ESG Mapper kicks off when a user drags and drops a PDF document; the rest happens automatically in the background.
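Because the model was trained on individual sentences, the background step after upload presumably has to turn extracted PDF text into sentence-sized units. The sketch below is a hypothetical illustration of that step using only regular expressions. It assumes the text has already been extracted from the PDF; the function name `split_sentences` and the `min_words` threshold are illustrative choices, not part of ESG Mapper.

```python
import re
from typing import List


def split_sentences(raw_text: str, min_words: int = 4) -> List[str]:
    # Re-join words hyphenated across PDF line breaks ("emis-\nsions").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw_text)
    # Collapse remaining line breaks and repeated whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # Naive split: end punctuation followed by whitespace and an uppercase
    # letter or digit. Abbreviations such as "U.S. Steel" will be over-split.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", text)
    # Drop short fragments (page numbers, running headers) unlikely to be statements.
    return [p for p in parts if len(p.split()) >= min_words]


raw = (
    "Our Scope 1 emis-\nsions fell 12% in 2022.\nWe expanded\n"
    "employee safety training. 14\nRevenue grew."
)
print(split_sentences(raw))
```

A production pipeline would more likely use a trained sentence segmenter; the regex version only shows where PDF layout noise (hyphenation, hard line breaks, page numbers) has to be cleaned up before classification.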

Detail-oriented

Results generated by ESG Mapper are comprehensive and detail-oriented. They range from high-level views, such as a Sankey diagram that shows the distribution of ESG labels at a glance, to specifics such as confidence intervals for predictions that help investment professionals make informed decisions.
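A Sankey diagram is drawn from weighted links between nodes. The sketch below shows one way per-sentence predictions could be aggregated into such links, flowing from all sentences to ESG/Non-ESG, then to parent labels, then to child labels. The `(parent, child)` input format, the use of `None` for non-ESG sentences, and the node names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple


def sankey_links(
    predictions: Iterable[Tuple[Optional[str], Optional[str]]]
) -> List[Dict[str, object]]:
    """Aggregate per-sentence (parent, child) labels into Sankey links.

    A parent of None marks a non-ESG sentence (an assumption of this sketch).
    """
    counts: Counter = Counter()
    for parent, child in predictions:
        if parent is None:
            counts[("All sentences", "Non-ESG")] += 1
            continue
        counts[("All sentences", "ESG")] += 1
        counts[("ESG", parent)] += 1
        if child is not None:
            counts[(parent, child)] += 1
    # One link per (source, target) pair, weighted by the number of sentences.
    return [{"source": s, "target": t, "value": v} for (s, t), v in counts.items()]


links = sankey_links([
    ("Environment", "GHG Emissions"),
    ("Environment", "GHG Emissions"),
    ("Human Capital", "Employee Health & Safety"),
    (None, None),
])
for link in links:
    print(link)
```

The `source`/`target`/`value` records match the three link fields that plotting libraries such as Plotly's `go.Sankey` take, except that Plotly expects node names to be mapped to integer indices first.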

Consistency, Automation and Accuracy

In a highly regulated market such as the financial market, rules and standards are the heart and soul of responsible investing. We aim to standardize ESG ratings so that ESG-driven investment decisions are transparent, consistent, and regulation-compliant.

Data Source, Data Science Approach & Evaluation

We used a multi-stage Large Language Model to classify text as ESG vs. non-ESG and then assign ESG parent and child labels. Using text manually labeled by industry experts, we trained and tested the multi-stage model on a curated sample of 5,828 sentences. The data source, data science approach, and evaluation are detailed in our ESG Mapper Data and Model Specs, linked at the end of this page.

Learning & Impact

Our LLM-backed, real-time, web-based interface is a revolutionary alternative to the manual, repetitive ESG labeling process currently used in the industry. The prediction model outperforms both the rule-based (keyword search) approach and the state-of-the-art ESG-BERT model.
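For context on the comparison above, here is a minimal sketch of what a rule-based (keyword search) baseline and a comparison metric might look like. The keyword lists, the label set, and the use of macro-averaged F1 are all assumptions for illustration; the actual baseline and evaluation metrics are described in the Data and Model Specs, not here.

```python
from typing import Dict, List, Sequence

# Hypothetical keyword lists for a rule-based baseline (illustrative only).
KEYWORDS: Dict[str, List[str]] = {
    "Environment": ["emission", "carbon", "water", "waste"],
    "Human Capital": ["employee", "safety", "diversity", "training"],
    "Leadership & Governance": ["board", "ethics", "compliance", "bribery"],
}


def keyword_classify(sentence: str, default: str = "Non-ESG") -> str:
    text = sentence.lower()
    # Score each label by how many of its keywords occur as substrings.
    scores = {label: sum(k in text for k in kws) for label, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default label when no keyword matches at all.
    return best if scores[best] > 0 else default


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    # Unweighted mean of per-label F1, so rare labels count as much as common ones.
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


sentences = [
    "Scope 1 carbon emissions fell 12%.",
    "The board adopted a new ethics code.",
    "Net revenue grew 4%.",
    "Water use per unit declined.",
    "Our workforce injury rate dropped.",
]
gold = ["Environment", "Leadership & Governance", "Non-ESG", "Environment", "Human Capital"]
predicted = [keyword_classify(s) for s in sentences]
print(predicted)
print(round(macro_f1(gold, predicted), 3))
```

The last example shows a typical failure mode of keyword search: a sentence about workforce injuries is ESG-relevant but contains none of the listed keywords, so it falls through to Non-ESG. A fine-tuned language model can pick up such paraphrases from context, which is the kind of gap the comparison above measures.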

If our product is successfully adopted by MSCI, one of the largest ESG rating providers, and then by other ESG rating agencies such as Bloomberg, FactSet, Morningstar Sustainalytics, and Refinitiv, we are confident of reaching a market size of at least $100 million.

Acknowledgements

We thank Professors Daniel Aranki and Puya Vahabi for their guidance, and our classmates in section 9 for their constructive comments!

References:

  1. Berg, F., Koelbel, J.F. and Rigobon, R., 2022. Aggregate confusion: The divergence of ESG ratings. Review of Finance, 26(6), pp.1315-1344.
  2. Busco, C., Consolandi, C., Eccles, R.G. and Sofra, E., 2020. A preliminary analysis of SASB reporting: Disclosure topics, financial relevance, and the financial intensity of ESG materiality. Journal of Applied Corporate Finance, 32(2), pp.117-125.
  3. Devlin, J., Chang, M.W., Lee, K. and Toutanova, K., 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  4. Mehra, S., Louka, R. and Zhang, Y., 2022. ESGBERT: Language Model to Help with Classification Tasks Related to Companies Environmental, Social, and Governance Practices. arXiv preprint arXiv:2203.16788.
  5. SASB, 2023. Companies reporting with SASB Standards. Available at https://sasb.org/company-use/sasb-reporters/.

Last updated: December 14, 2023