MIDS Capstone Project Summer 2024

CurrentAI

Problem and Motivations

The rapid growth and abundance of AI-related news and technologies make it challenging for AI professionals and enthusiasts to stay current. These individuals need relevant information in a format that reduces cognitive load. To address this, we created CurrentAI, a platform that helps AI professionals and enthusiasts stay abreast of the latest AI advancements with minimal effort.

Data Sources

We collect data from the engineering blogs of various companies including Amazon, DoorDash, Grammarly, IBM, LinkedIn, Meta, Nvidia, Salesforce, Scale AI, Spotify, Two Sigma, and Uber.

Data Science Approach

Data and pipeline: Articles gathered from the webpages are stored in a locally hosted SQLite database along with summaries generated by our Large Language Model. A sentence transformer also converts these summaries into embeddings for similarity comparisons. The embeddings are stored in Pinecone, a cloud-based vector database that supports scalable vector search.
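The storage step can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and helper functions here are hypothetical, chosen only for illustration; the project's actual schema is not described. The embedding and Pinecone upload steps are omitted because they depend on external services.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open a SQLite database and create a hypothetical articles table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS articles (
            id INTEGER PRIMARY KEY,
            source TEXT NOT NULL,
            url TEXT NOT NULL UNIQUE,
            body TEXT NOT NULL,
            summary TEXT NOT NULL
        )
        """
    )
    return conn


def save_article(conn, source, url, body, summary):
    """Store one article with its summary.

    INSERT OR IGNORE skips rows whose URL is already present, so
    collecting the same blog post twice does not create a duplicate.
    """
    conn.execute(
        "INSERT OR IGNORE INTO articles (source, url, body, summary) "
        "VALUES (?, ?, ?, ?)",
        (source, url, body, summary),
    )
    conn.commit()


def count_articles(conn):
    """Return the number of stored articles."""
    return conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]
```

Making the URL column unique is one simple way to keep repeated collection runs idempotent; other deduplication keys would work equally well.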

Modeling: Our system comprises two main parts: a Large Language Model that generates summaries and keywords and powers the chatbot, and a sentence encoder that supports our recommendation system by computing similarity scores between content pieces. This enables personalized content curation based on user preferences and interaction history.
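As a rough sketch of how similarity scores can drive recommendations, the example below ranks articles by cosine similarity to a user vector. Cosine similarity is an assumption (the similarity measure is not specified above), and the function names, the toy two-dimensional vectors, and the idea of representing a user as a single vector are all hypothetical. In practice the vectors would come from the sentence encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def recommend(user_vector, article_vectors, top_k=3):
    """Return up to top_k (article_id, score) pairs, most similar first.

    article_vectors maps a hypothetical article id to its embedding.
    """
    scored = [
        (article_id, cosine_similarity(user_vector, vector))
        for article_id, vector in article_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

One plausible way to build the user vector is to average the embeddings of articles the user has interacted with, although other representations are possible.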

Evaluation: We used both ChatGPT and human assessment to evaluate our models. We devised a "tag score" to assess the performance of our summarization model. For each article, our model generates five keywords and one summary. These were fed into GPT-4o, which determined whether each keyword could be derived from the summary. We then divided the number of correct keywords by the total number of keywords to get a score. Human evaluators from within the team and data professionals from outside the organization also evaluated our product's response accuracy and helpfulness.
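The tag score arithmetic can be sketched as follows. The function name is hypothetical, and the per-keyword judgments are assumed to be already available as booleans (True when the judging model decided the keyword could be derived from the summary); the call to the judging model itself is not shown.

```python
def tag_score(judgments):
    """Fraction of keywords judged derivable from the summary.

    judgments is a list of booleans, one per generated keyword.
    """
    if not judgments:
        raise ValueError("at least one keyword judgment is required")
    correct = sum(1 for judged_ok in judgments if judged_ok)
    return correct / len(judgments)
```

For an article where four of the five keywords are judged derivable, `tag_score([True, True, True, False, True])` gives 4 / 5 = 0.8.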

Key Learnings and Impacts

We have created a convenient method for data professionals to stay updated with new AI developments. Our product offers concise summaries and interactive discussions to reduce the time required for users to learn about new tools. In addition, we have implemented a recommendation system that provides personalized suggestions to users based on their interests. This allows professionals to effortlessly stay informed about advancements in their field.

Last updated: August 8, 2024