if-then.nexus banner
MIDS Capstone Project Fall 2024

if-then.nexus

if-then.nexus is a game prototyping tool that allows users to organize and develop storyline ideas without the need for special software.

It works right in your web browser and provides an intuitive platform for writing interactive fiction. Whether it's a simple, linear story or a complex, branching narrative with character and item-based decisions, our tool adapts to your needs. 

We've built in cutting-edge technologies to streamline the process, allowing for rapid story expansion and visual storyboarding. 

Focus on creative storytelling without the hassle of coding or managing complex storylines -- our tool helps everyone get started creating.


Problem & Motivation

Prototyping and developing complex, non-linear storylines in interactive games is time-intensive and challenging, especially for smaller developers with limited resources. There is a need for an open-source, easy-to-use tool that enables collaborative, fast, and creative ideation for game narrative development, while integrating GenAI ethically to stimulate rather than replace creative processes. 

For fledgling independent developers, the *cost* of large-scale development is insurmountable: big-game timelines and budgets can demand multiple years and hundreds of millions of dollars. There's a market opportunity for small studios of 1-10 people to build novel, sustainable game-based themed entertainment using established, widely available technologies together with contractor artists seeking project work. Small shops are faced with having to do more with less. Ideation and narrative have been at the core of interactive gaming since the earliest text adventure games of the 1970s. Stories, whether linear or nonlinear, are timeless and technology-agnostic.

The global video game market, including online and offline console, mobile, and PC games, generates roughly $200 billion in yearly revenue (international movie box office revenue for 2023 was estimated at $34 billion). Steam, a PC gaming platform popular for independent publishing, generated $9 billion in revenue in 2023.

According to the GDC 2024 Game Industry Report, which surveyed 3,000 game developers, 49% said they are seeing GenAI use in their studios, including for art and narrative writing. There is more appetite for, and acceptance of, GenAI at smaller studios.

In 2024, game development professionals are facing layoffs and uncertainty around ethical AI adoption within the industry, and artists are concerned about a wholesale erosion of their unique voice and agency in their craft.

Data Source & Data Science Approach

Text-To-Text

Initially, we explored the Mosaic models (MPT-7B-StoryWriter and MPT-7B-Chat) for their storytelling capabilities. These models seemed promising because they are fine-tuned for specific tasks such as storytelling and chat-based instruction.

Challenges with Mosaic Models:

  • Loading the full-size, unquantized Mosaic models was nearly impossible under our resource constraints.
  • Even the quantized versions (which reduce model size and memory usage) showed noticeable degradation in generation performance.

Decision to Switch to Llama 3.1:

  • Ultimately, we decided to use the Llama 3.1 model instead.
  • It is a more reliable, high-performing multilingual model that met our requirements more efficiently; even when loaded quantized, its performance did not degrade.
  • Overall, this gave us better generation quality.

The platform UI exports game elements in JSON format, which updates dynamically as the game creator makes changes. This ensures that the prompts generated for the model reflect the most up-to-date game context. We used this to our advantage in prompt engineering and leveraged our synthetic dataset. We approached prompt engineering from the user's perspective, asking: what would a user actually request in a prompt?
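As a sketch of how this can work (the element names and prompt wording below are illustrative, not our exact schema), the exported JSON can be folded directly into the generation prompt so the model always sees the latest game state:

```python
import json

# Illustrative game-state export -- the real schema comes from the platform UI.
game_state = {
    "title": "The Clockwork Garden",
    "characters": [{"name": "Mara", "role": "botanist"}],
    "items": ["brass key", "wilted rose"],
    "current_scene": "greenhouse",
}

def build_prompt(state: dict, request: str) -> str:
    """Embed the latest exported game context in the generation prompt."""
    context = json.dumps(state, indent=2)
    return (
        "You are helping write an interactive fiction game.\n"
        f"Current game state:\n{context}\n\n"
        f"Task: {request}"
    )

prompt = build_prompt(game_state, "Suggest the next branching choice for Mara.")
print(prompt)
```

Because the JSON is regenerated on every change, re-running `build_prompt` is all it takes to keep generations in sync with the creator's edits.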

We came up with three strategies keeping users in mind:

  • Strategy 1: Simple character generation
  • Strategy 2: Complex character generation, with more details and interactions
  • Strategy 3: Complex dialogue between characters, including the use of in-game items
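As a rough illustration (the template wording here is hypothetical, not our production prompts), the three strategies can be thought of as increasingly detailed prompt templates:

```python
# Hypothetical prompt templates for the three strategies.
STRATEGY_PROMPTS = {
    1: "Create a character for a {genre} game. Give a name and a one-line description.",
    2: ("Create a detailed character for a {genre} game: name, backstory, "
        "motivations, and how they interact with the other characters."),
    3: ("Write a dialogue between {char_a} and {char_b} in a {genre} game, "
        "in which they discover and use the item '{item}'."),
}

def render(strategy: int, **fields: str) -> str:
    """Fill a strategy template with user-supplied fields."""
    return STRATEGY_PROMPTS[strategy].format(**fields)

print(render(3, char_a="Mara", char_b="Oren", genre="steampunk", item="brass key"))
```

Keeping the strategies as templates makes it easy to A/B-compare generation quality across complexity levels.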

Text-To-Image

We started off planning to use Adobe Firefly, a generative AI tool that is part of Adobe Creative Cloud. The model is marketed toward creators, so it seemed a sensible choice; however, after over a month of going back and forth with Adobe sales while attempting to gain access to the API, we moved on to open-source models hosted on Hugging Face. We selected the Stable Diffusion XL base model, which serves as the base for many image-generation models and is generally considered easy to use.

As we explored fine-tuning this model to get a rough, pencil-sketch look to the images, we found a storyboard-sketch adaptation among the more than 4,000 adaptations of the base model on Hugging Face. This LoRA adaptation not only saved us the resources of fine-tuning the model ourselves, but also produces results consistent with the original vision of the MVP.
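A minimal sketch of this setup using the Hugging Face diffusers library (the LoRA repository id below is a placeholder, not the actual adaptation we used, and running it requires a GPU and model downloads):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base model, then apply a storyboard-sketch LoRA on top of it.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder repo id -- substitute the actual LoRA adaptation from the Hub.
pipe.load_lora_weights("some-user/storyboard-sketch-lora")

image = pipe(
    "storyboard sketch of a botanist discovering a brass key in a greenhouse",
    num_inference_steps=30,
).images[0]
image.save("scene.png")
```

Because the LoRA only adds small adapter weights on top of the frozen base model, swapping in a different aesthetic is a one-line change.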

Evaluation

We decided to use perplexity as our evaluation metric. Perplexity measures how well the model predicts text. We tried several scenarios and are satisfied with the model's performance and the evaluation metric results. Overall, we find that the higher the complexity of the generation, the higher the perplexity.
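Concretely, perplexity is the exponentiated average negative log-likelihood the model assigns to its tokens; a small self-contained sketch (the token log-probabilities here are made up for illustration):

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(-mean log-probability) over the generated tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Made-up log-probs: a confident (simple) generation vs. a less confident (complex) one.
simple = [-0.2, -0.3, -0.25, -0.1]
complex_ = [-1.5, -2.0, -1.8, -2.2]

print(perplexity(simple))   # lower perplexity
print(perplexity(complex_)) # higher perplexity
```

This matches the trend we observed: the less confidently the model predicts each token of a complex generation, the higher the resulting perplexity.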

Key Learnings & Impact

We've had several technical challenges that we've managed to overcome. Initially, we planned to leverage our AWS credits and use Amazon SageMaker, but we quickly realized the MVP did not need that level of infrastructure; free, open-source notebooks and models suffice. We even found a way to connect to Hugging Face via their inference API using access tokens.
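A minimal sketch of that connection over plain HTTP (the model id is a placeholder, and the access token is read from the `HF_TOKEN` environment variable):

```python
import json
import os
import urllib.request

# Placeholder model id -- substitute the model you want to query on the Hub.
API_URL = "https://api-inference.huggingface.co/models/meta-llama/Llama-3.1-8B-Instruct"

def build_request(prompt: str, token: str) -> urllib.request.Request:
    """Assemble an authenticated Inference API request."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": 200}}
    ).encode()
    return urllib.request.Request(API_URL, data=payload, headers=headers)

def generate(prompt: str) -> str:
    req = build_request(prompt, os.environ["HF_TOKEN"])
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)[0]["generated_text"]

if __name__ == "__main__":
    print(generate("Write the opening line of a branching mystery story."))
```

Using the hosted API this way kept the MVP free of local GPU requirements while staying within free and open tooling.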

To Fine-Tune or Not to Fine-Tune -- we also looked into fine-tuning, but realized it might impede user creativity if we relied on a model fine-tuned on very specific training data.

Adobe Firefly -- as much as we wanted to use the Adobe Firefly API, we struggled to get access despite communicating with several customer service representatives. This worked out to our advantage: we found models that are more suitable for our use case and have already been fine-tuned toward our desired aesthetic of storyboard sketches and simple image generations.

Acknowledgements

We must first thank our capstone instructors, Zona Kostic and Morgan Ames, for their expert guidance and enthusiastic support along each step of this process.  We'd also like to thank those who graciously spent personal time with our team discussing the project and assessing our work in progress:  Brad Taylor, Punn Wiantrakoon, John Romero, Wolff Dobson, Roland Dubois, and Brian Sanchez.  Also thanks to the Themed Entertainment Association for their excellent Round Table Panel Events covering highly pertinent subject areas.

References

Images

Last updated: December 11, 2024