MIDS Capstone Project Fall 2023

SFDRgen: Automating ESG Report Analysis

Problem & Motivation

More and more investors want their money to do 'something good'. They want their asset managers to fund investments with strong Environmental, Social and Governance (ESG) characteristics, such as wind turbine makers or LEED Platinum buildings. Investment portfolios with explicit ESG mandates, often designated with labels like Article 8 or Article 9 funds, have attracted substantial capital over the last few years, especially in Europe. EU regulators, however, want to ensure that Article 8 and Article 9 funds are truly investing in 'good' projects and companies: the money must go to the promised projects without causing harm in other ways. To that end, the EU devised the Sustainable Finance Disclosure Regulation (SFDR), which requires answers to around 30 detailed questions ranging from water policy to energy consumption data. Article 9 fund managers must complete an SFDR report for every investment in the portfolio.

This due diligence is a heavy burden on top of regular financial analysis. Analysts answer SFDR questions by manually searching through ESG reports published by companies, which are often PDF files over 100 pages long. Terminology, formatting, and content vary greatly across companies. An analyst can easily spend more than 8 hours drafting a new SFDR report, and a typical investment portfolio can hold anywhere from 20 to 100 investments. Inadequate analysis can result in regulatory fines and reputational damage. The sizable workload and high execution risk have led many asset managers to abandon Article 9 funds over the last year. Ultimately, this blocks money from flowing into investments that are good for society and the planet.

Data Source & Data Science Approach

Our data source is 50 Sustainability Reports from companies around the world, across industries such as utilities, technology, and industrials. We have ground-truth answers to the 27 SFDR questions for 25 of these companies; of those 25, 22 have Sustainability Reports in English. We used 11 of these companies to train our model and held out the rest for testing.

Our tool parses the uploaded PDF into text chunks, using a combination of text extraction and Optical Character Recognition (OCR) techniques, implemented via langchain, to interpret tables and infographics. These chunks are embedded with OpenAI embeddings and stored in a Pinecone vector database. Each of the 27 SFDR questions was prompt engineered to target its information more precisely and given a one-shot example context, answer, and source. We also tried other prompt-engineering methods, specifically few-shot prompts that guide the model through questions in logical, sequential steps, using either our custom-designed steps or langchain's prompts, but both produced weaker results than the one-shot approach. A Large Language Model (LLM), in our case GPT-3.5, interprets each of the 27 SFDR questions. We then use Retrieval Augmented Generation (RAG): for each question, the most relevant text chunks are retrieved from the vector database by cosine similarity, and the LLM uses those chunks to generate an answer. We also surface the page numbers of the retrieved chunks so users can cross-check answers, reducing the impact of hallucination.
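At its core, the retrieval step scores each stored chunk against the question's embedding by cosine similarity and hands the top-scoring chunks, along with their page numbers, to the LLM. The sketch below illustrates that ranking in pure Python with toy 2-dimensional vectors standing in for real OpenAI embeddings and the Pinecone query; all names here are illustrative, not our production code.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_chunks(question_vec, chunks, k=3):
    # chunks: list of (embedding, text, page_number) tuples,
    # as they might come back from a vector store
    ranked = sorted(chunks, key=lambda c: cosine(question_vec, c[0]), reverse=True)
    return [(text, page) for _, text, page in ranked[:k]]

# Toy store: 2-d "embeddings" instead of high-dimensional OpenAI vectors
store = [
    ([0.9, 0.1], "Water withdrawal fell 12% year on year.", 47),
    ([0.1, 0.9], "Board diversity policy adopted in 2021.", 12),
    ([0.8, 0.3], "Wastewater treatment capacity expanded.", 48),
]
question = [1.0, 0.0]  # e.g. an embedded water-policy question
print(top_k_chunks(question, store, k=2))
# → [('Water withdrawal fell 12% year on year.', 47),
#    ('Wastewater treatment capacity expanded.', 48)]
```

Returning the page number alongside each chunk is what lets the tool cite sources for user cross-checking.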

Evaluation

We compared ROUGE-1 and ROUGE-2 scores across our three model architectures. The one-shot model performed best on these technical metrics. Three human evaluators also read through the model-generated answers and found the one-shot model's answers the most knowledgeable, with the least hallucination.
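ROUGE-1 measures unigram overlap between a generated answer and the ground-truth answer (ROUGE-2 does the same with bigrams). A minimal sketch of the F1 variant, for intuition only; real evaluation would use a library such as rouge-score, which also handles stemming and tokenization details:

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    # Unigram-overlap F1 between a generated answer and a reference answer
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("the policy covers water usage",
                      "the policy covers water and energy usage"), 3))
# → 0.833
```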

Key Learnings & Impact

We learned how difficult it is to parse information from tables and infographics, with infographics the hardest to extract. However, by combining multiple parsing tools and techniques like OCR, we increased the amount of information extracted.

We experimented to find the prompts that generate the most comprehensive responses. After comparing three different prompt styles, we found that one-shot examples customized to our domain of ESG reports produced the best responses.
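The one-shot pattern embeds a single worked example (context, answer, source) ahead of the real question. A hypothetical template in the spirit of our prompts; the example text and field names are illustrative, not our actual prompts:

```python
ONE_SHOT_TEMPLATE = """You are an ESG analyst completing an SFDR report.

Example
Context: "In 2022 we reduced Scope 1 emissions by 8%..." (p. 33)
Question: Does the company report Scope 1 emissions?
Answer: Yes; Scope 1 emissions fell 8% in 2022.
Source: page 33

Now answer using only the context below.
Context: {context}
Question: {question}
Answer:"""

def build_prompt(context: str, question: str) -> str:
    # Fill the template with retrieved chunks and one SFDR question
    return ONE_SHOT_TEMPLATE.format(context=context, question=question)
```

Customizing the worked example to the ESG domain is what distinguished this prompt from the generic few-shot variants we tried.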

We also overcame many difficulties in implementing the model's components and building the tool's API and UI.

Acknowledgements

We would like to thank our external Subject Matter Experts who helped evaluate our model responses, Valeria C. and Kevin K.

More Information

Last updated: December 14, 2023