Real-time ad pricing is one of the hardest problems in adtech.
Every single ad request is a decision, and that decision has to happen in milliseconds. The system has to determine the right bid price while balancing three competing goals at once: revenue, win rate, and market competitiveness.
Get it wrong and you lose auctions. Get it slow and you don’t even get to participate. That’s the environment ad pricing systems operate in. Now add another layer of complexity. Most analytics systems work with historical data. Ad pricing does not have that luxury. It runs inside a live marketplace where things change constantly.
Campaigns start and stop. Budgets shift. Users move between devices, regions, and apps, which means the model that worked yesterday might already be outdated today. To solve this at scale, we built a production-grade price optimization engine and integrated it into the existing ad exchange platform.
One that could:
- process massive volumes of ad traffic efficiently
- make pricing decisions in real time
- automate continuous retraining and deployment
- support both supervised and continuous learning
- stay reliable under extreme scale
Sounds simple, right? It wasn’t.
In this article, I will walk you through how our team designed and implemented this system, from large-scale data ingestion and feature generation to model training, reinforcement learning, and the MLOps infrastructure that keeps the platform running in production.
But first, let's look at the constraints imposed by the ad exchange environment.
The three constraints
Massive traffic volumes
Ad exchanges operate at an extraordinary scale. In our case, the platform processed more than 50 billion ad requests every day. It is easy to underestimate what this means for ML workflows. Training machine learning models on the entire dataset was not practical: the compute cost alone would have made daily retraining impossible.
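To put that volume in perspective, spreading 50 billion requests evenly over the 86,400 seconds in a day gives the average sustained load. Real traffic is uneven across the day, so peaks are likely higher:

```latex
\frac{50 \times 10^{9}\ \text{requests}}{86\,400\ \text{s}} \approx 5.8 \times 10^{5}\ \text{requests/s}
```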
At the same time, reducing the dataset too aggressively wasn't an option either. Poor sampling could introduce bias and distort pricing predictions.
The system therefore needed a strategy that could preserve statistical reliability while keeping infrastructure costs manageable.
While scale presented one challenge, latency presented another.
Strict latency requirements
In most machine learning systems, inference latency of a few hundred milliseconds is acceptable.
In adtech, it isn’t.
Each bid decision must be returned within strict latency bounds. Even small delays can reduce auction participation and directly impact revenue. This meant the pricing models could not introduce meaningful overhead. The system therefore required:
- lightweight inference pipelines
- distributed serving infrastructure
- parallel model execution across multiple servers
Designing models was only part of the problem. The entire serving architecture had to support high-speed decisioning at scale.
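One common way to keep a model from ever holding up a bid is to give each pricing call a hard time budget and fall back to a safe default when that budget is exceeded. The sketch below is illustrative only: the 10 ms budget, the fallback price, the `price_with_deadline` helper, and the toy models are all hypothetical, and the article does not describe its actual fallback policy.

```python
"""Sketch: bound a pricing call by a latency budget and fall back to a
safe default when the model does not answer in time.

Hypothetical (not from the original system): the 10 ms budget, the
fallback price, price_with_deadline, and the two toy models.
"""
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# One shared pool, created once at startup, so the hot path never pays
# for thread creation.
_POOL = ThreadPoolExecutor(max_workers=8)


def price_with_deadline(model_fn, features, budget_s=0.010, fallback_price=0.50):
    """Return model_fn(features) if it finishes within budget_s seconds,
    otherwise return fallback_price so a bid is still produced on time."""
    future = _POOL.submit(model_fn, features)
    try:
        return future.result(timeout=budget_s)
    except FutureTimeout:
        # cancel() only stops calls that have not started yet; a call that is
        # already running finishes in the background and its result is dropped.
        future.cancel()
        return fallback_price


def fast_model(features):
    """Toy model: bid 25% above the floor price."""
    return 1.25 * features["floor_price"]


def slow_model(features):
    """Toy model that always exceeds a 10 ms budget."""
    time.sleep(0.05)
    return 9.99
```

A real exchange might answer with a no-bid instead of a fallback price; which is preferable depends on whether a late bid or a missing bid costs more.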
Even if scale and latency are addressed, another challenge remains: data freshness.
Continuous, reliable data
In AdTech, data freshness is everything.
Ad markets change constantly. Campaign launches, seasonal trends, shifting audience behavior, and supply-demand dynamics can all change pricing patterns within hours. If models are trained on stale data, their predictions quickly become irrelevant.
To keep the pricing models aligned with these changes, the system required:
- daily data ingestion
- automated cleaning and validation
- aggregation pipelines capable of handling multi-billion row datasets
- reliable feature consistency between training and serving
Addressing these challenges required building a scalable data foundation before any modeling could begin.
Building a data pipeline that can keep up with ad traffic
The first step in solving the scale challenge was designing a data pipeline capable of capturing and processing representative production traffic without overwhelming infrastructure.
Capturing representative traffic through smart sampling
Instead of storing every request, we introduced a structured sampling strategy.
Approximately 4% of live production traffic was routed into the machine learning data pipeline. Even this small slice amounted to more than 2 billion requests per day: large enough to keep the dataset statistically representative, yet small enough to keep storage and compute requirements manageable.
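The article does not describe the exact sampling mechanism. A minimal sketch of one common approach, assuming each request carries a stable identifier (the `request_id` name is hypothetical), hashes that identifier into buckets and keeps the lowest 4% of them, so every server makes the same keep-or-drop decision without coordination:

```python
"""Sketch: deterministic ~4% sampling of ad requests by hashing an ID.

Assumption: each request carries a stable identifier (hypothetical
`request_id`); the original article does not say which key was hashed.
"""
import hashlib

SAMPLE_RATE = 0.04   # ~4% of live traffic, as described above
_BUCKETS = 10_000    # resolution of the sampling decision


def in_sample(request_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Return True if this request falls into the sampled slice.

    Hashing (instead of random.random()) makes the decision reproducible:
    the same ID is always in or always out, on every server.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % _BUCKETS
    return bucket < rate * _BUCKETS
```

Hashing a user or device ID instead of a request ID would keep a sampled user's whole history together; the choice of key here is an assumption, not something the source specifies.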
Processing massive datasets with distributed infrastructure
Once sampled, the traffic data was processed using PySpark, which allowed large datasets to be distributed across multiple compute nodes.
Every day, these pipelines performed a series of processing steps that included:
- data cleaning (handling missing values, outliers, inconsistencies)
- validation checks (schema validation, distribution drift detection)
- feature aggregation at multiple levels (publisher, geo, device, time window)
All these pipelines ran end-to-end without manual intervention, ensuring that fresh, validated data was continuously available for both supervised and reinforcement learning workflows.
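The two validation checks named above, schema validation and drift detection, can be sketched in a few lines. This is an illustration, not the pipeline's actual code: the schema is hypothetical, and the Population Stability Index (PSI) is one common drift metric chosen here because the article does not name one. The 0.2 alert level is an often-cited rule of thumb, not a value from the source.

```python
"""Sketch: schema validation and distribution drift detection (PSI).

Hypothetical: EXPECTED_SCHEMA, the choice of PSI as the drift metric,
and any alert threshold applied to its result (0.2 is a rule of thumb).
"""
import math

EXPECTED_SCHEMA = {"publisher_id": str, "geo": str, "device": str, "bid_price": float}


def schema_errors(row: dict) -> list:
    """List missing fields and fields whose type does not match the schema."""
    errors = []
    for field, typ in EXPECTED_SCHEMA.items():
        if field not in row:
            errors.append(f"missing {field}")
        elif not isinstance(row[field], typ):
            errors.append(f"{field} is {type(row[field]).__name__}, expected {typ.__name__}")
    return errors


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample (e.g. yesterday's
    bid prices) and today's sample, using equal-width bins over the reference range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def shares(values):
        counts = [0] * bins
        for v in values:
            i = int((v - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In a PySpark job the same comparison would typically run over per-bin counts computed by Spark rather than over Python lists.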
With a reliable data foundation in place, the next challenge was converting this massive volume of raw traffic data into signals that machine learning models could interpret and act upon.
Turning raw traffic data into meaningful signals
Raw ad request logs contain valuable information, but they are rarely structured in a way that models can directly use. Extracting useful signals required a large-scale feature engineering pipeline.
Engineering features that capture market dynamics
The system generated several groups of features, including:
- historical win rates
- bid response patterns
- contextual signals (device, geography, time-of-day)
- user- and publisher-related information
- aggregated temporal statistics
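As an example of the first feature group, historical win rates can be aggregated per segment. The sketch assumes hypothetical log fields (`publisher`, `geo`, `device`, `hour`, `won`) and adds smoothing toward the global win rate, a common technique for sparse segments that the article itself does not mention:

```python
"""Sketch: historical win-rate features per (publisher, geo, device, hour).

Hypothetical: the log field names, the segment key, and the additive
smoothing toward the global win rate (prior_weight), which keeps sparse
segments from producing extreme 0% or 100% win rates.
"""
from collections import defaultdict


def win_rate_features(logs, prior_weight=20.0):
    """Return {segment: smoothed_win_rate} from a list of auction records."""
    total_bids = len(logs)
    total_wins = sum(r["won"] for r in logs)
    global_rate = total_wins / total_bids if total_bids else 0.0

    bids = defaultdict(int)
    wins = defaultdict(int)
    for r in logs:
        key = (r["publisher"], r["geo"], r["device"], r["hour"])
        bids[key] += 1
        wins[key] += int(r["won"])

    # Shrink each segment's raw rate toward the global rate: segments with
    # few bids stay close to global_rate, busy segments follow their own data.
    return {
        key: (wins[key] + prior_weight * global_rate) / (bids[key] + prior_weight)
        for key in bids
    }
```

In PySpark the counting step maps to a `groupBy` on the segment columns followed by an `agg` that counts rows and sums the `won` column.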
All of these feature transformations were implemented using PySpark. This ensured three critical properties:
- distributed scalability
- consistent transformation logic
- reproducibility between training and production
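Consistent transformation logic and reproducibility both come down to defining each transformation once. A minimal sketch, with hypothetical field and feature names, shows a single `to_features` function called from both the offline training path and the online bidding path:

```python
"""Sketch: one feature function shared by the training and serving paths,
so a feature cannot be computed one way offline and another way online.

Hypothetical: the request fields, the feature names, and the win_rates
lookup table (e.g. the output of a daily aggregation job).
"""
from datetime import datetime, timezone


def to_features(request: dict, win_rates: dict, default_rate: float) -> dict:
    """Turn one raw ad request into a flat feature dict."""
    ts = datetime.fromtimestamp(request["timestamp"], tz=timezone.utc)
    segment = (request["publisher"], request["geo"], request["device"], ts.hour)
    return {
        "hour_utc": ts.hour,
        "is_weekend": int(ts.weekday() >= 5),  # Saturday=5, Sunday=6
        "device": request["device"],
        "geo": request["geo"],
        # Unseen segments fall back to a default instead of failing.
        "hist_win_rate": win_rates.get(segment, default_rate),
    }


def build_training_rows(requests, win_rates, default_rate):
    """Offline path: the same function mapped over a batch of logged requests."""
    return [to_features(r, win_rates, default_rate) for r in requests]


def features_for_bid(request, win_rates, default_rate):
    """Online path: the same function called for a single live request."""
    return to_features(request, win_rates, default_rate)
```

If one path ever needs a separate implementation (for example a Spark version for batch jobs), a test that runs both on the same sample of requests and compares the outputs is a cheap guard against training/serving skew.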
We also introduced daily automated jobs to refresh these features, ensuring the models could continuously adapt to live-serving behavior.
With structured features available, the platform could now train predictive models to estimate optimal bid prices.
Training models for bid optimization
With a consistent and scalable feature layer in place, the system could now train supervised learning models that analyze past bidding behavior to predict optimal bid values under different market conditions.
Designing the large-scale training pipeline
To support large-scale training, we built a fully automated training pipeline:
- ingest sampled production data
- generate features through distributed Spark pipelines
- train models on large-scale datasets
- evaluate models against predefined performance metrics
- automatically promote successful models to production
Before deployment, each model was evaluated across several performance indicators, including:
- revenue uplift
- win-rate improvement
- bid efficiency
- stability across traffic segments
Only models that met strict thresholds were promoted to production.
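A promotion gate along these lines turns the evaluation criteria into an automated decision. Every metric definition and threshold below is a hypothetical placeholder: the article names the indicators but not how they were measured or what the thresholds were.

```python
"""Sketch: promotion gate comparing a candidate model with production.

Hypothetical: the EvalReport fields, how each metric is computed, and
every threshold value.
"""
from dataclasses import dataclass


@dataclass
class EvalReport:
    revenue: float          # e.g. offline-estimated revenue on a holdout set
    win_rate: float         # share of auctions won
    bid_efficiency: float   # e.g. revenue per unit of bid spend
    segment_revenue: dict   # revenue per traffic segment


def should_promote(candidate: EvalReport, production: EvalReport,
                   min_revenue_uplift=0.01, min_win_rate_delta=0.0,
                   min_efficiency_ratio=1.0, max_segment_drop=0.05):
    """Return (decision, reasons). Every check must pass to promote."""
    reasons = []
    uplift = candidate.revenue / production.revenue - 1.0
    if uplift < min_revenue_uplift:
        reasons.append(f"revenue uplift {uplift:.2%} below {min_revenue_uplift:.2%}")
    if candidate.win_rate - production.win_rate < min_win_rate_delta:
        reasons.append("win rate regressed")
    if candidate.bid_efficiency < min_efficiency_ratio * production.bid_efficiency:
        reasons.append("bid efficiency regressed")
    # Stability: no single segment may lose more than max_segment_drop of revenue.
    for seg, prod_rev in production.segment_revenue.items():
        cand_rev = candidate.segment_revenue.get(seg, 0.0)
        if prod_rev > 0 and cand_rev < (1.0 - max_segment_drop) * prod_rev:
            reasons.append(f"segment {seg} dropped more than {max_segment_drop:.0%}")
    return (not reasons, reasons)
```

Returning the reasons alongside the decision makes rejected candidates easy to diagnose from pipeline logs.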
Versioning & managing model artifacts
To keep everything reliable and traceable, we introduced a structured model versioning system.
Every trained model artifact was:
- versioned systematically
- stored in Amazon S3
- logged with metadata for full traceability
This approach allowed our team to reproduce experiments, compare model versions, and quickly roll back deployments when necessary.
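A sketch of versioned publishing, assuming a hypothetical `<model_name>/<version>/` key layout and hypothetical metadata fields. To stay runnable without AWS credentials it writes to a local directory standing in for the S3 bucket; in production each write would become an S3 upload (for example `put_object` on a boto3 S3 client).

```python
"""Sketch: store a model artifact under a versioned key with metadata.

Hypothetical: the key layout, the metadata fields, and the use of a local
directory as a stand-in for the S3 bucket.
"""
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def publish_model(root: Path, model_name: str, artifact: bytes,
                  train_data_date: str, metrics: dict) -> str:
    """Store artifact + metadata under <root>/<model_name>/<version>/ and return the version."""
    now = datetime.now(timezone.utc)
    sha = hashlib.sha256(artifact).hexdigest()
    # Timestamp first so versions sort chronologically; the hash suffix
    # distinguishes different artifacts published in the same second.
    version = f"{now:%Y%m%dT%H%M%SZ}-{sha[:8]}"
    target = root / model_name / version
    target.mkdir(parents=True, exist_ok=False)  # never overwrite a version
    (target / "model.bin").write_bytes(artifact)
    metadata = {
        "model_name": model_name,
        "version": version,
        "sha256": sha,
        "created_at": now.isoformat(),
        "train_data_date": train_data_date,
        "metrics": metrics,
    }
    (target / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return version


def list_versions(root: Path, model_name: str) -> list:
    """Versions sorted oldest to newest."""
    return sorted(p.name for p in (root / model_name).iterdir() if p.is_dir())
```

Because versions sort chronologically, rolling back amounts to pointing the servers at an earlier entry from `list_versions`.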
Deploying models automatically
Deployment itself was designed for scale.
When a new model artifact was uploaded to storage, production servers detected the update through file-based triggers and refreshed the model automatically. This eliminated manual deployment steps and enabled seamless model rollouts.
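The article describes file-based triggers without specifying the mechanism. One minimal sketch, assuming a hypothetical `LATEST` pointer file that names the current version directory and a caller-supplied `load_model` function, polls the pointer and swaps the model in place:

```python
"""Sketch: reload the serving model when a new artifact appears.

Hypothetical: the LATEST pointer file, the polling approach, and the
load_model callback. The article only states that servers detected new
artifacts through file-based triggers.
"""
import threading
from pathlib import Path


class ModelWatcher:
    def __init__(self, root: Path, load_model):
        self.root = root
        self.load_model = load_model  # callable: Path -> model object
        self._lock = threading.Lock()
        self._version = None
        self._model = None

    def poll(self) -> bool:
        """Check the LATEST pointer once; reload if it names a new version.
        Returns True if a reload happened."""
        pointer = self.root / "LATEST"
        if not pointer.exists():
            return False
        version = pointer.read_text().strip()
        if version == self._version:
            return False
        # Load outside the lock so requests keep using the old model meanwhile.
        new_model = self.load_model(self.root / version)
        with self._lock:
            self._model, self._version = new_model, version
        return True

    def current(self):
        """(model, version) used by the request path; swapped together by poll()."""
        with self._lock:
            return self._model, self._version
```

Writing the artifact first and updating `LATEST` last ensures servers never load a half-written model; a background thread calling `poll()` every few seconds would complete the loop.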
Introducing reinforcement learning for continuous learning
While supervised models provided strong baseline predictions, real-time ad markets often require more adaptive strategies.
To address this limitation, we developed and deployed a reinforcement learning (RL) system in production.
The exploration-exploitation strategy
At the core of the RL system, we implemented a balanced exploration-exploitation strategy.
This allowed the system to both learn new strategies and maintain strong short-term performance:
- Exploration allowed the system to test new bid prices
- Exploitation maximized revenue using previously learned strategies
By maintaining this balance, we were able to prevent the system from becoming overly dependent on historical bidding behavior.
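The article does not name its RL algorithm. To illustrate the exploration-exploitation trade-off only, here is a minimal epsilon-greedy bandit, a deliberately simpler technique, that chooses a multiplier on the supervised model's baseline price. The multiplier grid, the epsilon value, and the use of revenue as the reward are hypothetical.

```python
"""Sketch: epsilon-greedy choice of a bid multiplier applied to the
supervised model's baseline price.

Hypothetical: the multiplier grid, epsilon, treating each request as a
bandit step, and using realized revenue as the reward. This is not the
article's RL algorithm, which the source does not specify.
"""
import random


class EpsilonGreedyPricer:
    def __init__(self, multipliers=(0.8, 0.9, 1.0, 1.1, 1.2), epsilon=0.1, seed=None):
        self.multipliers = list(multipliers)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.multipliers)
        self.values = [0.0] * len(self.multipliers)  # mean reward per arm

    def choose(self) -> int:
        """Explore with probability epsilon, otherwise exploit the best arm."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.multipliers))
        return max(range(len(self.multipliers)), key=lambda i: self.values[i])

    def bid(self, baseline_price: float):
        """Return (arm index, bid) where bid = multiplier * supervised baseline."""
        arm = self.choose()
        return arm, baseline_price * self.multipliers[arm]

    def update(self, arm: int, reward: float) -> None:
        """Incremental sample-average update of the chosen arm's value."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With epsilon set to 0, this pricer would keep exploiting whichever multiplier looked best first and never discover a better one, which is the over-dependence on historical behavior the balance is meant to prevent.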
Real-time distributed execution
To operate within strict ad auction latency limits, we deployed the RL system across multiple production servers.
This distributed architecture allowed the system to make real-time bidding decisions while maintaining the same strict latency constraints as the supervised models.
Continuous reward updates
The RL system continuously updated its policy based on live feedback signals like bid response rates, bid values, and auction outcomes.
Over time, this feedback loop allowed the platform to adjust to seasonal shifts, campaign changes, and supply-demand volatility.
In practice, the RL system acted as a self-adjusting mechanism built on top of historical intelligence captured by supervised models.
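When the market drifts, an update that averages all past rewards equally reacts slowly. A standard alternative for a per-action value estimate $Q(a)$ (assumed here; the article does not give its update rule) is a constant step size $\alpha$, which weights each reward exponentially less the older it is:

```latex
\begin{aligned}
Q_{t+1}(a) &= Q_t(a) + \alpha\,\bigl(r_t - Q_t(a)\bigr), \qquad 0 < \alpha \le 1 \\
           &= (1-\alpha)^{t}\, Q_1(a) + \sum_{i=1}^{t} \alpha\,(1-\alpha)^{t-i}\, r_i
\end{aligned}
```

Here $t$ counts updates of action $a$ and $r_i$ is the reward from the $i$-th of those updates. With a hypothetical $\alpha = 0.05$, a reward from 50 updates ago carries $0.95^{50} \approx 0.077$ of the weight of the newest one, so the estimate tracks recent conditions.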
Building the MLOps architecture
To support this continuously evolving learning system, we needed an operational backbone capable of managing massive datasets, frequent retraining cycles, and large-scale model deployment.
To achieve this, we designed the entire platform as a production-grade MLOps architecture built around four core principles.
Automation first
Automation was built into every stage of the pipeline.
Daily data processing pipelines operated automatically, handling ingestion, cleaning, validation, and feature generation. Model training and evaluation workflows were also automated, allowing models to be retrained regularly without manual oversight.
Trigger-based deployment mechanisms ensured that validated models could be promoted to production environments seamlessly.
Scalable infrastructure
Given the scale of the data and infrastructure involved, distributed processing was essential.
Spark-based pipelines enabled efficient processing of multi-billion-row datasets. Model inference workloads were distributed across multiple servers to support concurrent real-time bidding decisions.
Model artifacts and metadata were stored in S3-backed storage systems, providing reliable and scalable artifact management.
Reliability and safe model rollouts
Operational stability was critical for a system responsible for real-time pricing decisions.
Validation checks were embedded throughout the data pipelines to detect anomalies before training began. Model artifacts were version-controlled to ensure traceability and safe rollback capabilities.
Deployment workflows incorporated safe rollout mechanisms designed to minimize downtime and reduce operational risk.
Separation of system responsibilities
The platform architecture intentionally separated major system components.
Data pipelines were decoupled from model training processes, allowing each layer to evolve independently. Training infrastructure was also isolated from deployment mechanisms, reducing operational dependencies.
Supervised learning models and reinforcement learning systems operated in parallel, each contributing different forms of intelligence to the pricing platform.
This modular approach allowed teams to evolve each component independently without disrupting production.
From models to a continuously learning system
By combining distributed data pipelines, automated model training, reinforcement learning, and scalable deployment infrastructure, we built a continuously learning ad pricing platform.
The platform now supports:
- processing and storage of 1B+ sampled ad requests
- scalable feature engineering using PySpark
- automated daily training and validation pipelines
- seamless model versioning and deployment across hundreds of servers
- real-time reinforcement learning for adaptive bidding
- high-throughput, low-latency production performance
Most importantly, we transformed ad price optimization from a static modeling problem into a continuous learning system capable of adapting to real-time market dynamics.
If you are building an AI-driven platform or looking to scale machine learning in production, our team of experts can help design and implement the right architecture for your needs.