
Data-Driven ESG Assessment: A Transparent Approach

Benjamin Cheong
21 Feb 2023

Datasets: https://docs.google.com/spreadsheets/d/1z7J0gUNdYNDIFIHuo8LvtLx-XEk8qVCA/edit#gid=687711652

GitHub Program: https://github.com/BCBeast/esgInfoCompiler.git

My project connects to SDGs #13 and #17: Climate Action and Peace, Justice, and Strong Institutions. On climate action: climate change, pollution, and resource depletion are major threats to our planet, and businesses have a crucial role to play in addressing them. By incorporating ESG principles into their operations, companies can reduce their greenhouse gas emissions, conserve resources, and protect natural habitats. On peace, justice, and strong institutions: ESG plays a crucial role in promoting social responsibility and equity. Companies that prioritize ESG factors are often more transparent and accountable, with stronger governance structures and processes in place. This helps prevent corruption and misconduct, and it enhances the reputation and credibility of the company.

While many organizations, such as Sustainalytics and MSCI, provide ESG ratings, none of them publish the underlying data used to produce their scores, because that data is proprietary. Ironically, there is a lack of transparency from the very organizations that are supposed to grade other companies on, among other factors, their transparency. My project does not calculate ESG ratings; instead, it compiles all the necessary data from reputable, publicly available sources so investors can judge a company's ESG for themselves. The reason my project does not calculate ratings is that the weights of the parameters used to compute a score are subjective: given the same data, funds can reach different conclusions about a company's ESG rating depending on how much weight they give each factor, such as whether the company is involved in oil. This project used 18 datasets and 52 parameters to collect data on companies in the S&P 500 index.

Algorithms

1. JavaScript data scraper

Although some of the original datasets were provided in easily accessible formats, in other cases the data needed to be compiled from hundreds of URLs. For those, I wrote a JavaScript scraper and implemented it in Google Apps Script, attached to my Google Sheets. For example, to assemble a dataset on the median age of company executives, I scraped data from the Profile section of Yahoo Finance pages.

(Easiest method) Using Google Sheets, I can build a web scraper that pulls information out of websites whose frontend is plain HTML (not rendered by JavaScript). The key is the XPath, an expression that describes where a given piece of data sits in the page's markup. By inspecting the element on the website, I can find the XPath of the data. After collating my URLs, I can use my scraper to pull out the data stored at those XPaths. Note that this approach only works if the XPath is the same across all the URLs.
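A minimal sketch of this approach, written as a small Apps Script helper that fills a column with IMPORTXML formulas, one per URL (IMPORTXML is the built-in Sheets function that evaluates an XPath against a fetched page). The sheet name, column layout, and XPath below are illustrative assumptions, not the project's actual values.

// Sketch (Apps Script): for a column of URLs, write IMPORTXML formulas that
// pull the value found at a fixed XPath. Sheet name, columns, and the XPath
// are placeholders.
function fillImportXmlFormulas() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Scrape');
  const xpath = "//span[@class='median-age']";        // found via "Inspect element"
  const lastRow = sheet.getLastRow();
  for (let row = 2; row <= lastRow; row++) {          // row 1 holds headers
    const url = sheet.getRange(row, 1).getValue();    // column A: list of URLs
    if (!url) continue;
    // Column B gets the formula; Sheets evaluates it as long as the page is plain HTML.
    sheet.getRange(row, 2).setFormula('=IMPORTXML("' + url + '", "' + xpath + '")');
  }
}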

(Hard method) Using Google Apps Script, I can create a JavaScript web scraper that pulls information out of websites whose frontend is rendered with JavaScript. Instead of an XPath, I use the UrlFetchApp class to fetch a URL and retrieve all of the HTML in its body. This is akin to copying the page source shown when you prepend "view-source:" to a URL. The data will be somewhere in that HTML, but you have to use repeated substring operations to pull it out. This method requires much more trial and error than the one above, but it works with JavaScript-based websites, which the first approach cannot handle.
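A minimal sketch of this method. UrlFetchApp.fetch and getContentText are the real Apps Script calls, but the marker strings are hypothetical and have to be found per site by reading its page source.

// Sketch (Apps Script): fetch a page's raw source with UrlFetchApp and cut a
// value out of it by searching for known surrounding text. The marker strings
// below are placeholders and take trial and error to pin down for a real page.
function scrapeValueFromHtml(url) {
  const html = UrlFetchApp.fetch(url).getContentText(); // same text as "view-source:"
  const startMarker = '"medianAge":';                   // text sitting just before the value
  const start = html.indexOf(startMarker);
  if (start === -1) return null;                        // marker not found on this page
  const valueStart = start + startMarker.length;
  const valueEnd = html.indexOf(',', valueStart);       // value runs until the next comma
  return html.substring(valueStart, valueEnd).trim();
}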

2. Find tickers

Since many companies appear under slightly different names (e.g., Amazon vs. Amazon.com), I could not easily compare companies across datasets in my later analysis. To solve this, I wrote code in my datasets that finds a company's ticker given several variations of its name, as sketched below.
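A minimal sketch of the idea: normalize a company name before looking it up, so several spellings resolve to the same ticker. The tiny lookup table and normalization rules are illustrative only; in the project the table would be built from the S&P 500 constituent list.

// Sketch: map name variants onto one ticker by normalizing before comparing.
const TICKERS = {
  'amazon': 'AMZN',
  'alphabet': 'GOOGL',
  'apple': 'AAPL',
};

function findTicker(companyName) {
  const normalized = companyName
    .toLowerCase()
    .replace(/\.com\b/g, '')                     // "amazon.com" -> "amazon"
    .replace(/\b(inc|corp|corporation)\b/g, '')  // drop common legal suffixes
    .replace(/[^a-z0-9 ]/g, ' ')                 // turn punctuation into spaces
    .replace(/\s+/g, ' ')                        // collapse repeated spaces
    .trim();
  return TICKERS[normalized] || null;            // null when no variant matches
}

// e.g. findTicker('Amazon.com'), findTicker('Amazon'), and findTicker('Amazon.com, Inc.')
// all return 'AMZN'.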
