Object detection in real-time streaming recordings and images

It’s high time to understand the technology and put it to use: building anti-fraud and security systems or applying facial recognition algorithms. The use cases vary. In retail alone, object detection can pick out shoplifters, prevent customers’ or employees’ errors, and detect misbehaviour in a split second.

ML / Deep learning / PyTorch
image processing, object detection, image recognition
With a global retailer as a client, we set out to find a solution that could change the way the retail industry runs autonomous stores at scale.

The real challenge

Autonomous vehicles, staffless retail stores and personal devices secured by image recognition are just some of the areas in which Computer Vision has let us reach for what was considered impossible until recently. Teaching machines to do what a human can do no longer sounds out of reach.

Whenever you have to dive into fairly complex maths to prove that your approach works, expect the road to be bumpy. Before we could deliver such a solution, we had to think of it as something more than data structures. Technology is the answer, but it can be tricky. What in particular?

Data: Deep Learning models need large amounts of data to pick up the underlying patterns, so our team had to quickly prepare as large a set of images as possible. To improve our chances of obtaining a strong model, we used techniques such as Data Augmentation and Transfer Learning. To train the model to sufficient accuracy, we had to prepare about 1,600 (1.6k) correctly labelled pictures.
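
The Data Augmentation idea mentioned above can be illustrated with a minimal sketch. It assumes labels in a normalized YOLO-style format (class, x centre, y centre, width, height), which the article does not confirm, and it represents an image as a plain list of pixel rows. The function name horizontal_flip is hypothetical. The point it shows is that, for object detection, every image transformation must also be applied to the bounding boxes.

```python
from typing import List, Tuple

# Assumed label format: (class_id, x_center, y_center, width, height),
# with all coordinates normalized to [0, 1] as in YOLO-style annotation files.
Box = Tuple[int, float, float, float, float]


def horizontal_flip(image: List[List[int]], boxes: List[Box]):
    """Mirror an image left-to-right and move its bounding boxes to match."""
    flipped_image = [list(reversed(row)) for row in image]
    # Only the x centre changes; y, width and height are unaffected by a mirror.
    flipped_boxes = [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in boxes]
    return flipped_image, flipped_boxes


# A tiny 2x4 "image" with one labelled object in its left half.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
boxes = [(0, 0.25, 0.5, 0.5, 1.0)]

aug_image, aug_boxes = horizontal_flip(image, boxes)
print(aug_image)  # [[4, 3, 2, 1], [8, 7, 6, 5]]
print(aug_boxes)  # [(0, 0.75, 0.5, 0.5, 1.0)]
```

Each labelled picture yields one extra training sample this way, which is one of the simplest ways to stretch a data set of limited size.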

Computing power: Training Convolutional Neural Networks requires a lot of computing power, so our team ran Virtual Machines on GCP and configured them accordingly. We used cloud machines with NVIDIA Tesla V100 GPUs.

Training: This part requires knowledge of Python as well as a broad understanding of Deep Learning.

Debugging: Some problems were related to image processing, others to errors in the implementation of the libraries we used.

The solution

Given the use case, we chose a network architecture that locates objects in a picture quickly while still ensuring high accuracy. We successfully applied machine learning and object detection techniques to identify many types of products in photos or video in near real time. The first step is to give the model a training data set so that it can learn what the objects it has to match look like. Then the magic happens. How? Let’s see an example.

Here’s an example of one of the most problematic products for a retail store customer – bread rolls.

The trivial act of buying several types of rolls can turn into a time-consuming process if the scanning tool does not recognize the right type of roll or you have to search for it in an extensive catalogue. This is a problem for both buyers (at self-service checkouts) and sellers, and it applies to hundreds of other products.

We trained the object detection model to find and locate bread rolls in images that we had taken beforehand. This makes it possible to identify multiple types of bread rolls within the same photo or a near real-time video feed. In the final result, each roll is detected and shown together with the probability of its recognition.
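
Detectors usually propose several overlapping boxes per object, so a post-processing step is needed before results like the ones described above can be displayed. The sketch below shows one common approach: a confidence threshold followed by greedy non-maximum suppression. It is an illustration only. The box layout, the function names and both threshold values are assumptions, not details taken from the project.

```python
from typing import List, Tuple

# Assumed layout: (x_min, y_min, x_max, y_max, confidence), coordinates in pixels.
Detection = Tuple[float, float, float, float, float]


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets: List[Detection],
                      min_conf: float = 0.5,
                      max_iou: float = 0.45) -> List[Detection]:
    """Drop low-confidence boxes, then apply greedy non-maximum suppression."""
    candidates = sorted((d for d in dets if d[4] >= min_conf),
                        key=lambda d: d[4], reverse=True)
    kept: List[Detection] = []
    for det in candidates:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(det, k) <= max_iou for k in kept):
            kept.append(det)
    return kept


raw = [
    (10, 10, 110, 110, 0.92),    # first roll
    (15, 12, 112, 108, 0.80),    # near-duplicate box around the same roll
    (200, 50, 300, 150, 0.75),   # second roll
    (400, 400, 420, 420, 0.20),  # low-confidence noise
]
for det in filter_detections(raw):
    print(det)
```

With these made-up inputs, the near-duplicate box and the low-confidence box are discarded, leaving one box per roll together with its confidence.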

In the retail example, the solution improves the sales process. It not only speeds up the buying process at self-service checkout points but also makes it possible to catch wrongly selected products (e.g. a cheaper product is chosen when a more expensive one is actually being bought). It can also be extended to other items that do not have a barcode (such as vegetables or fruits).
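
The check for a cheaper product being selected in place of a more expensive one could look roughly like the sketch below. Everything in it is hypothetical: the price list, the product names, the confidence threshold and the function check_selection are made up for illustration and do not describe the deployed system.

```python
from typing import Dict, Optional

# Hypothetical price list; names and prices are invented for this example.
PRICES: Dict[str, float] = {
    "plain roll": 0.30,
    "multigrain roll": 0.55,
    "pretzel roll": 0.70,
}


def check_selection(selected: str, detected: str, confidence: float,
                    min_conf: float = 0.8) -> Optional[str]:
    """Return a warning when the camera sees a pricier product than the one selected."""
    if confidence < min_conf or detected == selected:
        return None  # the detector agrees, or is too unsure to raise an alert
    if PRICES[detected] > PRICES[selected]:
        return f"Check item: selected '{selected}', camera sees '{detected}'"
    return None  # a cheaper product was detected, so there is no loss to flag


print(check_selection("plain roll", "pretzel roll", 0.93))
print(check_selection("plain roll", "plain roll", 0.97))
```

Ignoring low-confidence detections keeps false alarms down, at the cost of missing some genuine mismatches. Where to set that threshold is a business decision rather than a technical one.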

Performance indicators

Customer Friction Factor – Gives you the information you need to start reducing your customers’ frustration and improving their experience.


Shrink reduction – Fraud and theft detection mechanisms reduce the amount of money that you lose every day to dishonest customers.


Operational cost reduction – Many of the activities that were previously performed manually by employees (such as watching CCTV footage to spot thieves) can now be performed by intelligent algorithms.


Valuable marketing information – By using AI to analyze the images from CCTV cameras, you will be able to indicate, for example, which goods are often put back on the shelf, suggesting that the customer considered buying them but eventually decided not to.


Optimization of internal processes – Computer Vision will allow you to react quickly to current problems on the shop floor, for example by detecting that the stock of products on the shelves is low or by opening additional checkouts based on the number of customers currently shopping.
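
As a rough illustration of the last indicator, the sketch below turns detector output into two simple operational signals: products that are running low on a shelf and the number of checkouts to open. The function names, the minimum stock levels and the customers-per-checkout ratio are all assumptions chosen for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def low_stock(detected_labels: Iterable[str], minimum: Dict[str, int]) -> List[str]:
    """Return the products whose visible count on the shelf is below their minimum."""
    counts = Counter(detected_labels)  # missing products count as zero
    return sorted(p for p, needed in minimum.items() if counts[p] < needed)


def checkouts_needed(customers_in_store: int, customers_per_checkout: int = 8) -> int:
    """Round up customers / capacity, keeping at least one checkout open."""
    return max(1, -(-customers_in_store // customers_per_checkout))


# Labels as they might come from one shelf camera frame (made-up data).
labels = ["plain roll"] * 2 + ["multigrain roll"] * 6
minimum = {"plain roll": 5, "multigrain roll": 5, "pretzel roll": 3}

print(low_stock(labels, minimum))  # ['plain roll', 'pretzel roll']
print(checkouts_needed(30))        # 4
```

In practice the counts would be smoothed over several frames, since a single frame can miss products hidden behind a customer.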

Technologies we used

The team used highly accurate object detection algorithms and methods such as the YOLOv3 network architecture. Why YOLO?

YOLO – or You Only Look Once – is a real-time object detection algorithm that was one of the first to balance quality and speed. As an “object detection model”, it can be used not only to indicate which objects are present in a given photo but also where they are and in what quantity. Since 2015, three versions of this algorithm have been created – allowing us to use YOLOv3 – as well as variants designed for mobile devices, such as Tiny YOLO. The precision of the mobile version is limited, but it is also less computationally demanding, so it can run faster.
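
To make the "where they are" part more concrete, here is a minimal sketch of how a YOLOv3-style network output is commonly decoded into a box. The network predicts offsets relative to a grid cell and an anchor box; the centre is obtained with a sigmoid, and the size by scaling the anchor exponentially. The function name, the example cell, the anchor and the stride are illustrative assumptions and not values from the project.

```python
import math
from typing import Tuple


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_box(t: Tuple[float, float, float, float, float],
               cell: Tuple[int, int],
               anchor: Tuple[float, float],
               stride: int) -> Tuple[float, float, float, float, float]:
    """Turn raw outputs (tx, ty, tw, th, to) into a box in pixels.

    cell   -- column and row of the grid cell that made the prediction
    anchor -- prior box width and height in pixels
    stride -- how many input pixels one grid cell covers
    """
    tx, ty, tw, th, to = t
    cx, cy = cell
    pw, ph = anchor
    bx = (sigmoid(tx) + cx) * stride  # centre x stays inside the cell
    by = (sigmoid(ty) + cy) * stride  # centre y stays inside the cell
    bw = pw * math.exp(tw)            # width as a rescaled anchor
    bh = ph * math.exp(th)            # height as a rescaled anchor
    return bx, by, bw, bh, sigmoid(to)  # last value: objectness score


# All-zero outputs place the box at the cell centre with the anchor's size.
print(decode_box((0.0, 0.0, 0.0, 0.0, 0.0), cell=(3, 2), anchor=(116.0, 90.0), stride=32))
# (112.0, 80.0, 116.0, 90.0, 0.5)
```

Because every grid cell makes its predictions in a single forward pass, the whole image is processed at once, which is where the name "You Only Look Once" comes from.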


To solve the problem, our team used PyTorch – a very popular framework for Deep Learning. This powerful Python library has gathered a lot of publicity since such giants as Tesla and OpenAI publicly announced that they build their models on it. We are proud that our Computer Vision team uses tools considered the bleeding edge of technology.
