MIDS Capstone Project Fall 2024

AdVizion: Capturing Attention, Transforming Advertising with AI

Revolutionizing Digital Advertising with AdVizion's AI-Driven Saliency Detection

Problem & Motivation

In today's fiercely competitive e-commerce landscape, capturing and retaining customer attention is more critical than ever. With global e-commerce sales projected to reach $7.4 trillion by 2025, businesses face the daunting challenge of converting views into purchases. Traditional saliency prediction models fall short when it comes to e-commerce images and advertisements, which uniquely combine visual elements with textual information like product names and pricing. This gap often leads to suboptimal content that fails to engage customers effectively.

The challenge lies in answering a critical question: How can e-commerce retailers design visuals that effectively capture customer attention and drive purchases? AdVizion was born out of the need to address this challenge by empowering e-commerce businesses to create optimized, high-impact visuals that increase customer engagement and boost sales.

Data Source & Data Science Approach

Our solution leverages cutting-edge AI-driven saliency detection to analyze and optimize images and videos for maximum customer engagement. Here's how we did it:

Data Source:

At the core of our approach is the SalECI dataset, comprising 972 images from platforms like Taobao, Amazon, and Wish, spanning 13 diverse categories such as beauty, electronics, and automotive. This dataset simulates human gaze behavior in online shopping environments, providing annotated text regions and eye-tracking data to enable precise training and evaluation of saliency models.

Model Development:

We employ a fine-tuned TranSalNet model, which integrates CNN and transformer architectures to capture multi-scale representations and long-range contextual information. Key steps in our data science approach include:

  1. Preprocessing & Augmentation: Images are resized to 288x384 and augmented with techniques such as random flips, rotations, and brightness adjustments to ensure model robustness.
  2. Model Training:
    1. Backbone: ResNet-50 extracts feature maps, enhanced by transformer encoders for contextual understanding.
    2. Loss Functions: Custom saliency metrics (e.g., KL Divergence, Correlation Coefficient, Similarity) and Mean Squared Error guide training.
    3. Optimization: Adam optimizer with dynamic learning rate adjustments ensures efficient convergence.
    4. Validation & Early Stopping: Validation loss is closely monitored to avoid overfitting, with training halting after 8 epochs of no improvement.
    5. Model Deployment: Trained models are exported to ONNX format for portability and real-world applications.
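
The validation and early-stopping step above can be sketched in plain Python. This is a minimal illustration, not the project's training code: the `EarlyStopping` class, its `min_delta` parameter, the `epochs_run` helper, and the sample loss values are all hypothetical. Only the patience of 8 epochs comes from the description above.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=8, min_delta=0.0):
        self.patience = patience    # 8 matches the text; the class itself is hypothetical
        self.min_delta = min_delta  # assumed: smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def epochs_run(val_losses, patience=8):
    """Return how many epochs run before early stopping triggers (or all of them)."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(val_losses)


# Made-up validation losses: two improving epochs, then a plateau.
simulated_losses = [1.0, 0.9] + [0.95] * 20
stopped_after = epochs_run(simulated_losses)
```

In a real training loop, `step` would be called once per epoch with the measured validation loss, and the weights from the best epoch would typically be saved before stopping.
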
Figure: Schematic overview of the TranSalNet architecture.
 

Extension to Video Analysis:

  • Frame Extraction & Preprocessing: Videos are decomposed into frames at regular intervals. Each frame undergoes preprocessing to match the model's input requirements, including resizing and normalization.
  • Saliency Map Generation: The model generates saliency maps for individual frames, highlighting areas likely to attract viewer attention.
  • Temporal Smoothing: To ensure smooth transitions between frames, we implemented interpolation techniques that blend saliency maps across consecutive frames, avoiding abrupt changes and providing a coherent visual representation of attention flow.
  • Audio Synchronization: Using FFmpeg, we integrated the original audio with the processed video frames, resulting in a final output that maintains both visual and auditory elements.
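
One way to picture the temporal smoothing step is linear interpolation between saliency maps computed at sampled frames. The sketch below is an assumption for illustration: the text does not say which interpolation technique was used, and the function name `interpolate_saliency` and the `step` parameter are hypothetical.

```python
import numpy as np


def interpolate_saliency(key_maps, step):
    """Linearly blend saliency maps that were computed every `step` frames.

    key_maps: array-like of shape (K, H, W), one saliency map per sampled frame.
    Returns an array of shape ((K - 1) * step + 1, H, W) with one map per video frame.
    """
    key_maps = np.asarray(key_maps, dtype=np.float32)
    if key_maps.ndim != 3 or key_maps.shape[0] == 0:
        raise ValueError("key_maps must have shape (K, H, W) with K >= 1")
    if step < 1:
        raise ValueError("step must be a positive integer")

    frames = []
    for current, following in zip(key_maps[:-1], key_maps[1:]):
        for i in range(step):
            alpha = i / step  # 0 at the current key frame, approaching 1 near the next
            frames.append((1.0 - alpha) * current + alpha * following)
    frames.append(key_maps[-1])  # the last key frame is kept unchanged
    return np.stack(frames)


# Two tiny 2x2 key maps sampled 4 frames apart (made-up values).
key_maps = np.array([np.zeros((2, 2)), np.ones((2, 2))])
smoothed = interpolate_saliency(key_maps, step=4)
```

Each in-between frame is a weighted average of its two neighbouring key maps, so attention shifts gradually instead of jumping at every sampled frame.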

Deployment & Scalability:

  • AWS Integration: Our entire application is deployed on Amazon Web Services (AWS), utilizing services like ECS Fargate for container management, SageMaker for model hosting and inference, DynamoDB for database management, and S3 for storage.
  • API Development: We developed RESTful APIs using Flask, enabling seamless integration with the frontend and providing API access to premium users.

AdVizion Platform Capabilities

AdVizion provides a user-friendly platform that allows businesses to seamlessly access advanced saliency detection services.

  • Easy-to-Use Interface:
    • Users can effortlessly upload images and videos to receive detailed saliency maps, enabling them to understand where customers focus their attention.
  • Real-Time Analysis:
    • The platform delivers fast and accurate predictions, providing immediate insights to optimize visual content.
  • Comprehensive Media Support:
    • Supports both images and videos, allowing users to analyze various types of advertising content.
  • User Account Management:
    • Secure login and personalized accounts enable users to manage their analyses, view history, and track usage.
  • Flexible Access and Integration:
    • Offers a simple subscription model with affordable pricing, providing unlimited access to saliency predictions.
    • Subscribers receive API access, allowing integration of AdVizion's capabilities into their own applications and workflows.
  • Data Security and Privacy:
    • Ensures user data is protected through robust security measures and best practices in data handling.

By focusing on delivering actionable insights through an intuitive platform, AdVizion empowers businesses to enhance their visual content and drive customer engagement.

Evaluation

Our model’s performance is rigorously evaluated using metrics designed for saliency prediction:

  • KL Divergence: Measures how far the predicted saliency distribution diverges from the ground-truth distribution (lower is better).
  • Correlation Coefficient (CC): Measures the linear correlation between predicted and ground-truth saliency maps (higher is better).
  • Similarity Metric (SIM): Measures the overlap between predicted and ground-truth maps when both are treated as distributions (higher is better).
  • Normalized Scanpath Saliency (NSS): Averages the normalized predicted saliency at human fixation points (higher is better).
  • Area Under the Curve (AUC): Evaluates how well the saliency map separates fixated from non-fixated locations (higher is better).
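
The first four metrics can be sketched with NumPy using their common definitions from the saliency literature. This is an illustrative sketch, not the project's evaluation code: the function names and the `EPS` constant are hypothetical, maps are assumed to be non-negative 2D arrays, and AUC is left out because it has several variants.

```python
import numpy as np

EPS = 1e-7  # hypothetical small constant guarding against division by zero and log(0)


def _as_distribution(saliency_map):
    """Scale a non-negative map so that its values sum to (almost exactly) 1."""
    m = np.asarray(saliency_map, dtype=np.float64)
    return m / (m.sum() + EPS)


def _standardize(saliency_map):
    """Shift and scale a map to zero mean and unit standard deviation."""
    m = np.asarray(saliency_map, dtype=np.float64)
    return (m - m.mean()) / (m.std() + EPS)


def kl_divergence(pred, target):
    """KL divergence of the prediction from the ground truth (lower is better)."""
    p, t = _as_distribution(pred), _as_distribution(target)
    return float(np.sum(t * np.log(EPS + t / (p + EPS))))


def correlation_coefficient(pred, target):
    """Pearson correlation between the two maps (higher is better)."""
    return float(np.mean(_standardize(pred) * _standardize(target)))


def similarity(pred, target):
    """Histogram intersection of the two distributions (higher is better)."""
    return float(np.sum(np.minimum(_as_distribution(pred), _as_distribution(target))))


def nss(pred, fixations):
    """Mean standardized prediction at fixated pixels (higher is better)."""
    mask = np.asarray(fixations, dtype=bool)
    if not mask.any():
        raise ValueError("fixations must mark at least one pixel")
    return float(_standardize(pred)[mask].mean())


# Made-up 2x2 maps, only to show the call pattern.
prediction = np.array([[0.1, 0.2], [0.3, 0.4]])
ground_truth = np.array([[0.1, 0.2], [0.3, 0.4]])
fixation_mask = np.array([[0, 0], [0, 1]])
scores = {
    "KL": kl_divergence(prediction, ground_truth),
    "CC": correlation_coefficient(prediction, ground_truth),
    "SIM": similarity(prediction, ground_truth),
    "NSS": nss(prediction, fixation_mask),
}
```

KL, CC, and SIM compare the prediction with a continuous ground-truth saliency map, while NSS uses the binary map of fixation locations.
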
Figure: Evaluation Metrics

Key Learnings & Impact

Our journey has underscored the transformative potential of AI in addressing real-world challenges. Key learnings include:

  • Data Diversity: A diverse dataset is essential for robust, generalizable models.
  • Model Optimization: Combining CNN and transformer architectures improves saliency prediction by leveraging contextual and spatial information.
  • Actionable Insights: Businesses can use saliency maps to refine visual content and optimize layouts, directly impacting customer satisfaction and conversion rates.

Our technology empowers e-commerce businesses to:

  • Enhance Visual Appeal: Create captivating product visuals.
  • Boost Engagement: Increase click-through and conversion rates.
  • Reduce Bounce Rates: Hold customer attention for longer periods.

By enabling retailers to design visuals that convert, AdVizion supports their growth in a competitive digital marketplace.

Acknowledgements

We extend our gratitude to:

  • Research Collaborators: For their contributions to dataset creation and model development. 
  • E-Commerce Professionals: Amir Sharifian, for insights into industry needs and challenges.
  • AI & Data Science Community: For advancing the tools and methodologies that make solutions like AdVizion possible.
  • Berkeley Instructors: Kira Wetzel and Puya Vahabi, for their valuable insights and direction.

More Information

Last updated: December 2, 2024