Building risk detection engine with Kafka and stream processing for top global retailer

retail, identity fraud detection
stream processing, cybersecurity, risk management
* 2014 Identity Fraud Report By Al Pascual: Card Data Breaches and Inadequate Consumer Password Habits Fuel Disturbing Fraud Trends
Every two seconds, another American becomes a victim of online identity fraud. Detecting account takeovers, attacks, and malicious user behaviour were the main goals of the Risk Detection Engine *
Motivation for building risk engine

Every organisation exposing online services needs to have security mechanisms that prevent a variety of different attacks and frauds attempted by malicious users. Due to the cloud revolution, access to computing power distributed across the globe is easier than ever. Encouraged by that fact, attackers build massive botnets, imitating real customers, in order to steal value, PII or any other goods from their accounts. Recognizing account takeovers in real-time at scale is a key responsibility of risk engine platform, that we have built.

Responsibilities of the system

Context statistics calculation


Authentication attempt categorization


Taking reactive and proactive targeted actions to prevent different types of attacks

Example features

Recognizing login attempts from unknown and untrusted devices for given user


Recognizing login attempts from new locations for a given user


Recognizing login attempts from botnet agents


Recognizing brute-force attacks

Solution architecture

The final solution offers several different types of analysis. Analyses are performed in parallel, using specific data pipelines. Each pipeline’s execution is triggered by every new login event, which makes the platform event-driven. An event is just a statement of the fact – something that has happened in the real world. Events of certain type are grouped together in an event stream. Building a reliable, scalable and fault-tolerant event streaming platform is a very complex task on its own, so we have decided to use an industry-standard in the area – Apache Kafka

Apache Kafka provides all the necessary abstractions to build a platform based on stream processing paradigm. Moreover, it offers multiple approaches for building stream processor applications. Starting from the most basic services using Customer/Producer API, through applications utilizing Kafka Streams, finishing on SQL interface provided by ksqlDB engine. 

Most of our stream processors are based on Kafka Streams, which is a library that can be used with any application built on a JVM stack. Kafka Streams based applications do not have any specific requirements about the deployment platform, thus our infrastructure is built on top of Kubernetes, which allows scaling up and down according to the traffic volume. Additionally, there are also specific stream processors that use ksqlDB, together with multiple integrations with third-party systems through Kafka Connect.

“The final system has been built based on an event-at-a-time processing model that operates with millisecond latency with Kafka in its core. Kafka Streams framework supplies features that help engineers to focus on delivering real business value. The solution illustrates how the stream processing paradigm could be applied to build software that accommodates a requirement to provide results in real-time at scale.”  – says Daniel Jagielski, TechLead at VirtusLab


The results

The given results are 1-day statistics:

  • ~1000 unique IPs blocked
  • ~30 000 total amount of IP blocked
  • ~500 user accounts locked
Read more

Read full article about reference architecture here.

Get access to a full reference architecture
Secure your business and organization with risk detection engine based on stream processing.

"*" indicates required fields

If you click the “Send” button you agree to the privacy policy. Your personal data given in the contact form above will be processed for purposes of answering your inquiry and for any further correspondence regarding this inquiry. The controller of your personal data is VirtusLab Sp. z o.o. For more information, see our Privacy Policy