MIDS Capstone Project Fall 2024

AdVizion: Capturing Attention, Transforming Advertising with AI

Revolutionizing Digital Advertising with AdVizion's AI-Driven Saliency Detection

Problem & Motivation

In today's fiercely competitive e-commerce landscape, capturing and retaining customer attention is more critical than ever. With global e-commerce sales projected to reach $7.4 trillion by 2025, businesses face the daunting challenge of converting views into purchases. Traditional saliency prediction models fall short when it comes to e-commerce images and advertisements, which uniquely combine visual elements with textual information like product names and pricing. This gap often leads to suboptimal content that fails to engage customers effectively.

The challenge lies in answering a critical question: How can e-commerce retailers design visuals that effectively capture customer attention and drive purchases? AdVizion was born out of the need to address this challenge by empowering e-commerce businesses to create optimized, high-impact visuals that increase customer engagement and boost sales.

Data Source & Data Science Approach

Our solution leverages cutting-edge AI-driven saliency detection to analyze and optimize images and videos for maximum customer engagement. Here's how we did it:
Data Source:
At the core of our approach is the SalECI dataset, comprising 972 images from platforms like Taobao, Amazon, and Wish, spanning 13 diverse categories such as beauty, electronics, and automotive. This dataset simulates human gaze behavior in online shopping environments, providing annotated text regions and eye-tracking data to enable precise training and evaluation of saliency models.

Model Development

We employ a fine-tuned TranSalNet model, which integrates CNN and transformer architectures to capture multi-scale representations and long-range contextual information. Key steps in our data science approach include:

  1. Preprocessing & Augmentation: Images are resized to 288x384 and augmented with techniques such as random flips, rotations, and brightness adjustments to ensure model robustness.
  2. Model Training:
    1. Backbone: ResNet-50 extracts feature maps, enhanced by transformer encoders for contextual understanding.
    2. Loss Functions: Custom saliency metrics (e.g., KL Divergence, Correlation Coefficient, Similarity) and Mean Squared Error guide training.
    3. Optimization: Adam optimizer with dynamic learning rate adjustments ensures efficient convergence.
    4. Validation & Early Stopping: Validation loss is closely monitored to avoid overfitting, with training halting after 8 epochs of no improvement.
    5. Model Deployment: Trained models are exported to ONNX format for portability and real-world applications.
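The validation and early-stopping step above can be sketched as follows. This is a minimal illustration, not our training code: the class name `EarlyStopping`, the helper `epochs_run`, and the `min_delta` parameter are hypothetical, and only the patience of 8 epochs comes from the description above.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 8, min_delta: float = 0.0):
        self.patience = patience          # 8 epochs, as described in the text
        self.min_delta = min_delta        # assumption: any strict improvement counts
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should halt."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(val_losses, patience=8):
    """Return the number of epochs executed before the stopper halts training."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(val_losses)
```

In a real training loop, `step` would be called once per epoch with the measured validation loss, and the best checkpoint would typically be saved whenever `best_loss` is updated.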
Figure: Schematic overview of the TranSalNet architecture.
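To make the loss terms named in the training step concrete, here is a small NumPy sketch of KL Divergence, Correlation Coefficient, Similarity, and Mean Squared Error, combined into one objective. The function names and the unit weights are illustrative assumptions; the weighting actually used in training is not stated here, and a real training loop would use differentiable tensor operations rather than NumPy.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero and log(0)


def to_distribution(saliency_map):
    """Scale a non-negative map so that its values sum to 1."""
    m = np.asarray(saliency_map, dtype=np.float64)
    return m / (m.sum() + EPS)


def kl_divergence(pred, target):
    """KL divergence of the predicted distribution from the target (lower is better)."""
    p, t = to_distribution(pred), to_distribution(target)
    return float(np.sum(t * np.log(EPS + t / (p + EPS))))


def correlation_coefficient(pred, target):
    """Pearson correlation between the two maps (higher is better)."""
    p = np.asarray(pred, dtype=np.float64).ravel()
    t = np.asarray(target, dtype=np.float64).ravel()
    p, t = p - p.mean(), t - t.mean()
    return float((p * t).sum() / (np.sqrt((p ** 2).sum() * (t ** 2).sum()) + EPS))


def similarity(pred, target):
    """Histogram intersection of the two distributions (higher is better)."""
    return float(np.minimum(to_distribution(pred), to_distribution(target)).sum())


def mean_squared_error(pred, target):
    """MSE between the maps after scaling each to a maximum of 1."""
    p = np.asarray(pred, dtype=np.float64)
    t = np.asarray(target, dtype=np.float64)
    return float(np.mean((p / (p.max() + EPS) - t / (t.max() + EPS)) ** 2))


def combined_loss(pred, target, w_kl=1.0, w_cc=1.0, w_sim=1.0, w_mse=1.0):
    """Weighted sum to minimize; CC and SIM are subtracted because higher is better.

    The default weights are placeholders, not tuned values.
    """
    return (w_kl * kl_divergence(pred, target)
            - w_cc * correlation_coefficient(pred, target)
            - w_sim * similarity(pred, target)
            + w_mse * mean_squared_error(pred, target))
```

A perfect prediction drives KL and MSE toward 0 and CC and SIM toward 1, so under unit weights the combined value approaches -2.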

Extension to Video Analysis

  • Frame Extraction & Preprocessing: Videos are decomposed into frames at regular intervals. Each frame undergoes preprocessing to match the model's input requirements, including resizing and normalization.
  • Saliency Map Generation: The model generates saliency maps for individual frames, highlighting areas likely to attract viewer attention.
  • Temporal Smoothing: To ensure smooth transitions between frames, we implemented interpolation techniques that blend saliency maps across consecutive frames, avoiding abrupt changes and providing a coherent visual representation of attention flow.
  • Audio Synchronization: Using FFmpeg, we integrated the original audio with the processed video frames, resulting in a final output that maintains both visual and auditory elements.
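The video steps above can be sketched roughly as follows. This sketch assumes saliency is predicted only on frames sampled at a fixed interval and that the maps for the frames in between are obtained by linear interpolation; the function names, the sampling step, and the exact FFmpeg options are illustrative assumptions rather than the settings used in our pipeline.

```python
import numpy as np


def sample_frame_indices(total_frames, step):
    """Indices of the frames sent to the saliency model (every `step`-th frame)."""
    return list(range(0, total_frames, step))


def interpolate_saliency(key_maps, key_indices, total_frames):
    """Linearly blend keyframe saliency maps to produce one map per video frame."""
    key_maps = [np.asarray(m, dtype=np.float64) for m in key_maps]
    blended = []
    for frame in range(total_frames):
        # Position of the last keyframe at or before this frame.
        k = int(np.searchsorted(key_indices, frame, side="right")) - 1
        if k < 0:
            blended.append(key_maps[0].copy())    # before the first keyframe
        elif k >= len(key_indices) - 1:
            blended.append(key_maps[-1].copy())   # hold the last keyframe
        else:
            start, end = key_indices[k], key_indices[k + 1]
            w = (frame - start) / (end - start)   # 0 at `start`, 1 at `end`
            blended.append((1.0 - w) * key_maps[k] + w * key_maps[k + 1])
    return blended


def build_mux_command(processed_video, original_video, output_path):
    """Build an FFmpeg argument list that copies the processed video stream and
    takes the audio from the original file. Paths are supplied by the caller."""
    return [
        "ffmpeg", "-y",
        "-i", processed_video,      # input 0: saliency overlay video (no audio)
        "-i", original_video,       # input 1: original upload (source of audio)
        "-map", "0:v:0",            # video from input 0
        "-map", "1:a:0?",           # audio from input 1, if it has any
        "-c:v", "copy", "-c:a", "aac",
        "-shortest",
        output_path,
    ]
```

The argument list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.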

Production-Grade Architecture & Deployment

To elevate our solution from a prototype to a reliable, production-grade platform, we’ve meticulously designed and implemented a scalable, secure, and fully automated architecture on Amazon Web Services (AWS). Our goal was to ensure a seamless user experience—from initial authentication and free trials to subscription-based premium access—backed by a powerful AI inference engine integrated with real-time data flows.

Our production architecture diagram, shown on the right-hand side of this page, illustrates the end-to-end workflow from data ingestion to inference delivery and user interaction.

Key Architectural Highlights

  1. Continuous Model Training and Deployment:
    We leverage Google’s Colab environment (with GCP under the hood) for iterative model training and experimentation. Once a model iteration is refined, it’s packaged and deployed to Amazon SageMaker, where it is served at scale for real-time inference. This continuous loop ensures that our saliency detection models stay at the cutting edge of performance and accuracy.
  2. Containerized Services with AWS ECS & ECR:
    Both our backend and frontend engines are containerized and stored in Amazon Elastic Container Registry (ECR). We use Amazon Elastic Container Service (ECS) on AWS Fargate to orchestrate and run these containers without the need to manage underlying servers. This container-first approach ensures agility, resilience, and the ability to rapidly roll out updates or scale on demand.
  3. Load Balancing & High Availability:
    An Application Load Balancer (ALB) dynamically routes incoming traffic, ensuring balanced load distribution and minimizing downtime. This ensures that as our user base expands, we can maintain swift response times and uninterrupted service quality.
  4. Intelligent User Management & Monetization:
    We’ve integrated Amazon Cognito for secure user authentication and authorization, providing a frictionless login experience. Users can experiment with our platform for up to 10 free attempts. Post-trial, our system seamlessly transitions them to a subscription model managed by Stripe. By combining Cognito and Stripe with our custom APIs, we deliver a smooth upgrade path, turning curious visitors into loyal, paying customers.
  5. Persistent Data & State Management:
    Amazon S3 stores user-uploaded images, previously generated predictions, and processed videos, enabling users to revisit past analyses effortlessly. Amazon DynamoDB provides a high-performance, serverless database to track user activities, preferences, and subscription statuses. This approach ensures rapid data retrieval and a personalized, context-aware experience.
  6. Operational Monitoring & Communication:
    Our stack is fully encapsulated within a Virtual Private Cloud (VPC) for maximum security. Amazon Route 53 manages domain name services, ensuring global accessibility with minimal latency. Operational metrics and logs are continuously monitored using Amazon CloudWatch. In addition, Amazon Simple Email Service (SES) enables us to engage directly with our customers, providing support, gathering feedback, and maintaining an ongoing dialogue to improve our service.
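The free-trial gating described in item 4 can be illustrated with a small sketch. It uses an in-memory record as a stand-in for the per-user item kept in DynamoDB; the `UserRecord` fields and the `authorize_prediction` function are hypothetical names, and only the limit of 10 free attempts comes from the description above.

```python
from dataclasses import dataclass

FREE_ATTEMPT_LIMIT = 10  # free attempts before a subscription is required


@dataclass
class UserRecord:
    """Hypothetical stand-in for the per-user state stored in the database."""
    user_id: str
    attempts_used: int = 0
    has_active_subscription: bool = False


def authorize_prediction(user: UserRecord) -> bool:
    """Return True if the user may run a prediction, counting free attempts."""
    if user.has_active_subscription:
        return True                      # subscribers are not metered
    if user.attempts_used < FREE_ATTEMPT_LIMIT:
        user.attempts_used += 1          # consume one free attempt
        return True
    return False                         # trial exhausted: prompt to subscribe
```

In a deployed service the counter update would need to be atomic, for example a conditional write, so that concurrent requests cannot exceed the limit.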

By seamlessly blending advanced AI modeling, robust AWS infrastructure, secure authentication, and a flexible subscription mechanism, we have created an end-to-end ecosystem that can effortlessly scale alongside your business growth. This production-grade platform not only enhances the user experience but also positions our product as an investment-ready solution engineered for performance, reliability, and future innovation.

AdVizion Platform Capabilities

AdVizion provides a user-friendly platform that allows businesses to seamlessly access advanced saliency detection services.

  • Easy-to-Use Interface:
    • Users can effortlessly upload images and videos to receive detailed saliency maps, enabling them to understand where customers focus their attention.
  • Real-Time Analysis:
    • The platform delivers fast and accurate predictions, providing immediate insights to optimize visual content.
  • Comprehensive Media Support:
    • Supports both images and videos, allowing users to analyze various types of advertising content.
  • User Account Management:
    • Secure login and personalized accounts enable users to manage their analyses, view history, and track usage.
  • Flexible Access and Integration:
    • Offers a simple subscription model with affordable pricing, providing unlimited access to saliency predictions.
    • Subscribers receive API access, allowing integration of AdVizion's capabilities into their own applications and workflows.
  • Data Security and Privacy:
    • Ensures user data is protected through robust security measures and best practices in data handling.

By focusing on delivering actionable insights through an intuitive platform, AdVizion empowers businesses to enhance their visual content and drive customer engagement.

Evaluation

Our model’s performance is rigorously evaluated using metrics designed for saliency prediction:

  • KL Divergence: Measures divergence between predicted and actual saliency distributions.
  • Correlation Coefficient (CC): Indicates the strength of correlation between predictions and ground truth.
  • Similarity Metric (SIM): Reflects similarity between predicted and actual saliency maps.
  • Normalized Scanpath Saliency (NSS): Assesses prediction accuracy for human fixation points.
  • Area Under the Curve (AUC): Evaluates fixation prediction effectiveness.
Evaluation Metrics
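For the two fixation-based metrics listed above, a compact NumPy sketch is given below. NSS follows its usual definition (the mean of the standardized saliency values at fixated pixels). AUC has several variants in the saliency literature, and the pairwise formulation here is a simplified illustration, not necessarily the variant used in our evaluation. Both functions assume a binary fixation map with at least one fixated and one non-fixated pixel; the function names are illustrative.

```python
import numpy as np


def nss(saliency_map, fixation_map):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels."""
    s = np.asarray(saliency_map, dtype=np.float64)
    fixated = np.asarray(fixation_map).astype(bool)
    z = (s - s.mean()) / (s.std() + 1e-8)
    return float(z[fixated].mean())


def auc_pairwise(saliency_map, fixation_map):
    """Probability that a fixated pixel scores higher than a non-fixated one.

    Ties count as half. This compares every fixated pixel with every
    non-fixated pixel, so it is intended for small maps or few fixations.
    """
    s = np.asarray(saliency_map, dtype=np.float64)
    fixated = np.asarray(fixation_map).astype(bool)
    pos, neg = s[fixated], s[~fixated]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)
```

An NSS of 0 or an AUC of 0.5 corresponds to chance-level prediction; higher values mean fixations land on pixels the model rates as more salient.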

Key Learnings & Impact

Our journey has underscored the transformative potential of AI in addressing real-world challenges. Key learnings include:

  • Data Diversity: A diverse dataset is essential for robust, generalizable models.
  • Model Optimization: Combining CNN and transformer architectures improves saliency prediction by leveraging contextual and spatial information.
  • Actionable Insights: Businesses can use saliency maps to refine visual content and optimize layouts, directly impacting customer satisfaction and conversion rates.

Our technology empowers e-commerce businesses to:

  • Enhance Visual Appeal: Create captivating product visuals.
  • Boost Engagement: Increase click-through and conversion rates.
  • Reduce Bounce Rates: Hold customer attention for longer periods.

By enabling retailers to design visuals that convert, AdVizion supports their growth in a competitive digital marketplace.

Acknowledgements

We extend our gratitude to:

  • Research Collaborators: For their contributions to dataset creation and model development. 
  • E-Commerce Professionals: Amir Sharifian, for insights into industry needs and challenges.
  • AI & Data Science Community: For advancing the tools and methodologies that make solutions like AdVizion possible.
  • Berkeley Instructors: Kira Wetzel and Puya Vahabi for their valuable insights and directions.
Last updated: December 13, 2024