Performing near-real-time personalized recommendations at scale with Dapr
Challenge
Derivco is a global tech company and invention house, with deep roots in the iGaming sector. Founded over 25 years ago, Derivco now has over 2000 global experts based across Australasia, Southern Africa, Europe, and North America. Derivco’s teams around the world have been the early pioneers behind leading technology infrastructure, payment processing and management platforms for some of the world’s biggest iGaming brands.
Historically, in the iGaming industry, players have all been treated the same and are typically grouped by commonly identifiable fields. With this grouping, it was possible to interact with the players at a group level, but it was never possible to personalize the experience of an individual. Derivco set out to change this by leveraging the power of AI/ML. The goal was to have the AI/ML identify characteristics of the player’s current experience and provide recommendations to the player or intervene on behalf of the player to improve their experience. The fundamental idea is that ML can provide immediate recommendations far more quickly than normal human intervention processes, and can interact or intervene at any point along the player’s lifecycle, not simply at static intervention points as has been done in the past.
Personalized recommendations have been around for a while in the industry, but they typically run in batch jobs, so the speed of the recommendation has not been a priority. In our use case, however, speed is one of the top priorities alongside correctness, while still meeting any applicable legal and regulatory obligations. Ultimately, the challenge is how to do that in near-real-time, because we have a very short window of time in which we can interact or intervene before the player is gone, lost because of a poor experience.
Solution
To achieve our goal, we had to build an event-driven system that can handle huge volume and trigger actions as and when the AI/ML identifies that an action should be taken. To do this, we deployed a distributed system that leveraged Pub/Sub, State Management, and Virtual Actors to interact with our AI/ML resources and act based on the recommendations. We use Dapr as the application runtime to provide us with contact points to the infrastructure we have deployed, and actors to handle concurrent processing of events at a player-level without slowing the system down. ML recommendations come out of the system and are triggered through Pub/Sub using CloudEvents. These recommendations are made at a player-level and are personalized to that player.
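The per-player actor idea described above can be illustrated with a minimal in-process sketch. This is not the Dapr SDK and not Derivco's implementation: `PlayerActor`, `ActorRuntime`, the `com.example.*` event types, and the "three losses in a row" rule (a stand-in for the ML model) are all hypothetical names and assumptions chosen for illustration. The sketch only shows the shape of the pattern: one actor per player, turn-based access to that player's state, and a recommendation published in a CloudEvents 1.0 style envelope.

```python
import asyncio
import uuid
from datetime import datetime, timezone


def make_cloud_event(event_type, source, subject, data):
    """Wrap a payload in a CloudEvents 1.0 style envelope (structured JSON form)."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "subject": subject,
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


class PlayerActor:
    """Hypothetical actor: one instance per player, processing one event at a time."""

    def __init__(self, player_id, publish):
        self.player_id = player_id
        self.publish = publish
        self.state = {"events_seen": 0, "losing_streak": 0}
        self._turn = asyncio.Lock()  # serializes access to this player's state

    async def on_event(self, event):
        async with self._turn:
            self.state["events_seen"] += 1
            if event["data"]["outcome"] == "loss":
                self.state["losing_streak"] += 1
            else:
                self.state["losing_streak"] = 0
            # Assumption: a trivial rule stands in for the AI/ML recommendation.
            if self.state["losing_streak"] == 3:
                await self.publish(make_cloud_event(
                    "com.example.recommendation",
                    "/actors/player",
                    self.player_id,
                    {"action": "offer_break"},
                ))


class ActorRuntime:
    """Hypothetical stand-in for the runtime: activates actors on demand and
    collects published events in a list instead of sending them to a broker."""

    def __init__(self):
        self.actors = {}
        self.published = []

    async def publish(self, event):
        self.published.append(event)

    def actor_for(self, player_id):
        if player_id not in self.actors:
            self.actors[player_id] = PlayerActor(player_id, self.publish)
        return self.actors[player_id]

    async def dispatch(self, event):
        # Route each incoming event to the actor that owns that player.
        await self.actor_for(event["subject"]).on_event(event)


async def run_demo():
    runtime = ActorRuntime()
    outcomes = [("p1", "loss"), ("p2", "win"), ("p1", "loss"),
                ("p2", "loss"), ("p1", "loss")]
    events = [
        make_cloud_event("com.example.game.round", "/games/demo", pid, {"outcome": o})
        for pid, o in outcomes
    ]
    # Events for different players are dispatched concurrently; each actor
    # still handles its own events one at a time.
    await asyncio.gather(*(runtime.dispatch(e) for e in events))
    return runtime


if __name__ == "__main__":
    demo = asyncio.run(run_demo())
    for rec in demo.published:
        print(rec["subject"], rec["data"]["action"])
```

In a real deployment the runtime would take over the parts this sketch fakes: actor placement and activation, durable state, and delivery of the published event to subscribers. The per-player lock here simply mimics the one-message-at-a-time guarantee that makes per-player state safe to update without explicit coordination between players.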
Impact
We were able to build from scratch and deploy a system into production in under 6 months. This system easily and reliably handles up to 320 million events per day with more than 1000 events per second flowing through the actors. At the height of feature delivery, we were deploying up to 80 times per day into production without dropping a single event.
Since go-live, having Dapr at our disposal has given us the opportunity to tackle technical debt within our platform. Dapr has also given us the freedom to rewrite legacy systems that were previously infeasible to rewrite because of the time it would take. This is because Dapr eliminates the need to write boilerplate plumbing code to tie infrastructure elements together.
Dapr has also eliminated all environmental issues from the development-to-test cycle as the infrastructure abstraction allows engineers to run an entire development environment fully contained on their local machine. Even remote workers can spin up development environments in cloud-based workspaces; the efficiency and subsequent financial impact of being totally unblocked and unaffected by environmental issues cannot be overstated.
Dapr has also proven easy to manage from an operational perspective, giving us the ability to augment our own telemetry with the telemetry that Dapr emits to give us a clear picture of the system. The regular patch and minor releases that the open-source community works hard to maintain have brought important stability and performance improvements.
By the numbers
320+ million events per day
60+ million database interactions per day
Immeasurable development time saved