MIDS Capstone Project Fall 2024

WonderScribe: Where Stories Come To Life

Team members

Problem & Motivation

Children are brimming with creativity and imagination, often dreaming up vivid stories and fantastical worlds. However, many struggle to express their ideas due to limited vocabulary, developing writing skills, or difficulty visualizing their narratives. Traditional storytelling methods—like writing by hand or using essential drawing tools—can be time-consuming, lack interactivity, and fail to fully capture the richness of their imagination. These challenges are especially pronounced for digital-native children, accustomed to more dynamic and immersive experiences, and those with diverse learning needs, which may require more engaging or accessible tools.

The need for an inclusive, interactive, and user-friendly storytelling platform has never been greater. A solution must address these challenges while nurturing children’s creativity, improving their literacy skills, and helping them express themselves in a structured yet imaginative way. Such a tool should bridge the gap between traditional and modern storytelling and prepare children for a tech-driven future by introducing them to digital tools in a safe and culturally sensitive environment.

Our Solution

WonderScribe offers an innovative platform that transforms storytelling into a dynamic, interactive, and immersive experience for children. By leveraging advanced AI technologies and cloud-based services, WonderScribe addresses the challenges of traditional storytelling methods by making it accessible, engaging, and inclusive. At the heart of the platform is Anthropic’s Haiku model, integrated with a Retrieval-Augmented Generation (RAG) system, which generates personalized and contextually relevant narratives based on user prompts. These narratives draw from a curated knowledge base of children’s stories to ensure logical flow and creative depth.

To complement the text, WonderScribe incorporates Stability Diffusion to generate high-quality, comic-style images that visually represent key elements of the story, such as characters, settings, and scenes. Additionally, AWS Polly converts the generated text into high-quality audio, providing an auditory dimension that enhances accessibility and engagement. The platform’s user-friendly interface allows children to input prompts, visualize their stories through vivid illustrations, and listen to them in multiple languages, ensuring cultural inclusivity. Together, these features create a seamless and engaging storytelling process that nurtures creativity, literacy, and digital skills, helping children explore their imaginations and share their unique stories with the world.

Data Source & Data Science Approach

WonderScribe utilizes a curated dataset of over 50,000 children’s stories from Kaggle, including Fairy Tales and Grimm’s Fairy Tales. These datasets serve as the foundation for the AWS Knowledge Base that powers the RAG architecture.

Data Cleaning: Refined for child-friendly vocabulary and removed inappropriate or redundant content.
Story Structure: Average of 26 sentences per story, with clear plot progression and character archetypes.
EDA Insights: Identified common themes, patterns, and age-appropriate language, ensuring cultural diversity and relevance.
Themes: Bravery, friendship, exploration, and human-centered narratives.
Model Development: A Retrieval-Augmented Generation (RAG) model combines retrieval-based search with generation capabilities, ensuring coherence and creativity in story generation. Stability Diffusion and AWS Polly enhance image and audio outputs, making stories more immersive.

Architecture

WonderScribe integrates multiple AI and cloud services into a streamlined workflow:

Frontend: Interactive UI built with Streamlit, allowing children to input prompts, view visuals, and listen to audio.
Backend: Amazon Bedrock, AWS Lambda, and API Gateway support scalable, cloud-based processing.
AI Models: Anthropic’s Haiku generates personalized stories, Stability Diffusion creates high-quality images, and AWS Polly produces multilingual, child-friendly audio.
Knowledge Base: A curated story repository enables the RAG system to enhance text coherence and creativity.

RAG Architecture - Retrieval-Augmented Generation (RAG) combines retrieval-based search with generation-based responses. In WonderScribe, RAG is used to ensure that the story generation process remains relevant and coherent while responding dynamically to user inputs. Here’s how RAG supports WonderScribe:

1. Retrieval: The system retrieves relevant story segments or patterns from a predefined dataset, aligning with user prompts or inputs.

2. Generation: Using these retrieved elements, the generation model (Anthropic’s Haiku model) creates new story content by blending retrieved information with fresh, AI-generated text.

3. Integration: This approach enables the model to maintain story consistency, drawing on real-world story data and ensuring logical narrative flow, all while staying contextually relevant to the user’s prompt.

The RAG model framework in WonderScribe bridges the gap between an entirely new generation and curated retrieval, delivering creative yet contextually grounded content.

Technical Approach - WonderScribe's technical approach integrates multiple AI and cloud services, streamlining the story generation process across three main components: text, image, and audio.

Text Generation:

Model: Anthropic Haiku model
Function: Based on the user’s input or prompt, this model generates story text, creating a coherent, engaging storyline suitable for children.
Processing: RAG architecture enhances relevance by combining the Haiku model’s generative abilities with contextual data from existing story segments.

Image Generation:

Model: Stability Diffusion model
Function: Generates comic-style illustrations that align with the story narrative, making each scene visually engaging.
Details: The Stability Diffusion model processes the generated text and creates corresponding visual elements that resonate with the story's tone and style.

Audio Generation

Service: AWS Polly
Function: Converts the generated story text into high-quality, child-friendly audio, enabling an auditory storytelling experience.
Configuration: Configured to use voices that are clear, warm, and well-suited for young listeners.

The backend is supported by Amazon Bedrock, which handles all essential back-end functions and integrates with AWS services, including API Gateway and Lambda functions for streamlined API management and processing.

Application Process Summary

Our story-generation application offers a seamless and interactive experience:

Access: Users can access the platform via a web URL or mobile devices like iPads and iPhones for on-the-go story creation.
Input: Users provide a story theme or query, which is processed using AWS Bedrock and Claude 3 LLM to generate personalized story content.
Story Creation: The story is segmented into parts, with captions extracted for visual representation.
Image Generation: Captions are transformed into visuals using Stable Diffusion XL, creating vibrant illustrations.
Audio Narration: The story text is converted into engaging audio using AWS Polly for a multimodal experience.
Interactive Output: The Streamlit application integrates text, images, and audio to deliver an immersive storytelling experience.

This efficient process empowers users to create and enjoy personalized, visually engaging, and narratively rich stories.

Evaluation

User Feedback and Resolution

Future of WonderScribe

Acknowledgments

We extend our heartfelt gratitude to our team members, professors, classmates, family members, and friends who tested our product and provided invaluable insights. Your support, feedback, and encouragement were instrumental in shaping WonderScribe into the engaging and innovative platform it is today!