Verif.AI
Problem & Motivation
The ramifications of misinformation and disinformation extend beyond the digital realm and can lead to real and tragic consequences. A recent study found that COVID-19 misinformation (false or misleading claims, whether spread innocently or deliberately) has cost at least 2,800 lives and a staggering $300 million in economic losses in Canada alone.
The World Economic Forum even ranks manipulated information as the world's most pressing short-term risk, highlighting its potential to disrupt elections, erode trust, and deepen societal divisions. With billions set to vote globally in the next two years, misinformation could have a devastating impact on our world.
This problem is exacerbated by the fact that the vast majority of existing fact-checking services are manual, time-consuming, and unscalable. In light of this, it’s clear there’s a need for a tool that’s robust, scalable, accessible, and capable of empowering organizations and individuals to swiftly discern truth from fiction in an era of escalating misinformation.
Verif.AI's mission is to do just that: reduce the spread of misinformation by revolutionizing fact-checking accessibility. Through our innovative AI-powered browser extension, we strive to create a more resilient society.
Data Source, Data Science Approach, and MVP
Our model prioritizes detecting objectively verifiable claims, particularly within the political domain. To achieve this, we leveraged a dataset of approximately 3,000 web-scraped claims and their corresponding assessments from PolitiFact, a reputable source known for its fact-checking expertise. These judgements were made between 2022 and the present by PolitiFact's team of political subject-matter experts.
While PolitiFact utilizes a comprehensive rating scale, for our purposes we focused on the most objective categories: "True" and "False." Classifications like "Half-True" or "Barely True" involve a degree of subjective judgement and were therefore excluded from the model validation process.
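As a rough illustration, reducing the scraped dataset to these two labels might look like the sketch below; the file name and column names are assumptions for illustration, not our actual schema.

```python
# Minimal sketch of the label filtering described above. The file name and
# column names ("claim", "verdict") are assumptions, not our actual schema.
import pandas as pd

claims = pd.read_csv("politifact_claims.csv")

# Keep only the two objective verdict categories.
binary = claims[claims["verdict"].isin(["True", "False"])].copy()
binary["label"] = (binary["verdict"] == "False").astype(int)  # 1 = false claim

print(f"Kept {len(binary)} of {len(claims)} claims for validation")
```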
By sampling this dataset and processing each claim through our own pipeline, we trained our model to identify objectively false information with greater accuracy, giving users a reliable foundation for navigating the political landscape online.
Verif.AI is a browser extension that tackles the growing problem of misinformation head-on. Coupled with our versatile API and powerful LLMs, our MVP empowers users to effortlessly fact-check information wherever they encounter it online. For each claim, the product returns a judgement of whether the claim is true or false, a justification, and links to the sources behind its reasoning.
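To make that concrete, a fact-check result could carry a shape like the following; the field names and values here are illustrative assumptions, not the actual Verif.AI API contract.

```python
# Illustrative only: the field names and values are assumptions, not the
# actual Verif.AI API contract.
example_result = {
    "claim": "The city cut its police budget by 50% last year.",
    "judgement": "False",
    "justification": "Published budget records show a 4% reduction, not 50%.",
    "sources": [
        "https://example.com/city-budget-report",
        "https://example.com/fact-check-coverage",
    ],
}
```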
Our user research emphasized the need for a solution that's intuitive to use, returns results with low latency, and provides contextual information as well as citations. Our product's intuitive interface allows for easy content submission and retrieval, integrating seamlessly into our users' browsing experience so they can quickly verify information without interrupting their workflow.

The browser extension goes beyond simple "true" or "false" labels: our data cross-referencing capabilities leverage a powerful multi-agent system and Retrieval-Augmented Generation (RAG). This multi-pronged approach strengthens the accuracy and reliability of our results, giving users peace of mind when evaluating online content.
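At a high level, the flow is retrieve, judge, then verify. The sketch below is a minimal illustration under stated assumptions: `web_search` and `call_llm` are placeholders for whichever search API and LLM client a deployment uses, and the prompts are illustrative, not our production prompts.

```python
from typing import List

def web_search(query: str, k: int = 5) -> List[dict]:
    """Placeholder: return the top-k results as {"url": ..., "snippet": ...} dicts."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to an LLM and return its reply."""
    raise NotImplementedError

def fact_check(claim: str) -> str:
    # 1. Retrieval (the RAG step): ground the judgement in current web evidence.
    evidence = web_search(claim)
    context = "\n".join(f"- {e['url']}: {e['snippet']}" for e in evidence)

    # 2. Judge agent: classify the claim against the retrieved context.
    verdict = call_llm(
        f"Claim: {claim}\nEvidence:\n{context}\n"
        "Answer True or False, then justify your answer and cite the URLs used."
    )

    # 3. Verifier agent: a second pass that checks the verdict is well-formed
    #    and supported by the cited evidence before it reaches the UI.
    return call_llm(
        f"Review this fact-check for formatting and evidential support:\n{verdict}"
    )
```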
Evaluation
Our evaluation focused on measuring how effectively Verif.AI distinguishes between true and false claims, particularly within the political sphere. We compared our model's performance against baseline LLMs that processed claims through our pipeline but lacked the Retrieval-Augmented Generation (RAG) search functionality.
We utilized three key metrics: F1 score, sensitivity, and specificity.
- F1 Score: This metric balances precision (the share of claims flagged as false that are actually false) and recall (the share of false claims the model catches), giving us a single number to tune the model against.
- Sensitivity: This metric assesses the model's ability to correctly identify false claims (true positives).
- Specificity: This metric assesses the model's ability to correctly identify true claims (true negatives).
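For concreteness, the sketch below shows how these three metrics can be computed with scikit-learn, treating false claims as the positive class; the labels are toy values, not our evaluation data.

```python
# Computing the three metrics, with "claim is false" as the positive class.
from sklearn.metrics import confusion_matrix, f1_score

y_true = [1, 1, 0, 0, 1, 0]  # 1 = claim is false, 0 = claim is true (toy data)
y_pred = [1, 1, 0, 1, 1, 0]  # model verdicts

f1 = f1_score(y_true, y_pred)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # false claims correctly flagged
specificity = tn / (tn + fp)  # true claims correctly passed

print(f"F1={f1:.2f} sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```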
Our model achieved a higher F1 score (0.96) than the baseline LLMs, demonstrating a superior ability to classify false claims from the PolitiFact dataset. Verif.AI also exhibited high sensitivity (0.94), correctly judging false claims as false, and achieved a specificity of 0.93.
Key Learnings & Impact
Verif.AI leverages a streamlined model pipeline that integrates state-of-the-art Large Language Models (LLMs) with a user-friendly interface packaged as a browser extension. The system offers high reliability, exceeding 90% sensitivity in detecting false claims. Furthermore, Verif.AI provides users with contextual information in the form of relevant search results to enhance information comprehension and facilitate further research.
The project's key innovation lies in its comprehensive approach. Verif.AI goes beyond just model development; it delivers a user-facing web browser extension, making the fact-checking capabilities accessible to the public through a readily available, open-source API. Our results further demonstrate Verif.AI's effectiveness through benchmarking against political data, solidifying its position as a competitive solution in the fight against misinformation.
Developing an automated fact-checking system involved overcoming several challenges related to LLM usage. We controlled LLM output quality with techniques such as prompt engineering, multi-agent systems, and few-shot learning. LLM outputs also do not always integrate cleanly with the user interface because of formatting inconsistencies; to address this, our team used built-in tools and multi-agent verification to ensure proper parsing and integration. Moving forward, we would like to focus on curating data sources so the search space consists of reliable sources, and on refining the product by fine-tuning the model's outputs.
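As a concrete illustration of two of these techniques, the sketch below shows a few-shot prompt that pins responses to a fixed JSON shape, plus a parse step that rejects malformed replies before they reach the extension; the prompt text and schema are illustrative assumptions, not our production versions.

```python
import json

# Two worked examples (few-shot) pin the model to a fixed JSON schema.
EXAMPLES = """Claim: Water boils at 100 C at sea level.
{"judgement": "True", "justification": "Standard boiling point at 1 atm."}

Claim: The moon is made of cheese.
{"judgement": "False", "justification": "Apollo samples show the moon is rock."}
"""

def build_prompt(claim: str) -> str:
    # Illustrative prompt, not our production prompt.
    return (
        'Classify each claim as "True" or "False" and reply in JSON.\n\n'
        + EXAMPLES
        + f"\nClaim: {claim}\n"
    )

def parse_verdict(raw: str) -> dict:
    """Validate the LLM reply so malformed output never reaches the UI."""
    verdict = json.loads(raw)  # raises ValueError if the reply is not JSON
    if verdict.get("judgement") not in {"True", "False"}:
        raise ValueError("judgement must be 'True' or 'False'")
    return verdict
```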
Acknowledgments
Our team is grateful to our Capstone professors Kira Wetzel and Puya Vahabi. Their feedback and guidance were instrumental throughout this project.
In addition, we'd like to thank Professor Jared Maslin for providing guidance on the ethical dimensions of our work.