What is object detection and how can it be used with RPA?

Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection.
By Aphex34 – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=45679374
RPA (robotic process automation) is a form of business process automation technology based on metaphorical software robots (bots) or on artificial intelligence (AI)/digital workers. In this particular case, RPA is used to save the frames in which an object was detected and to send an email with those frames.

Motivation

I started working on an object detection activity out of frustration: most CCTV cameras with motion detection these days are terrible at detecting objects, and most of the time they send false-positive reports, for instance when the wind blows the leaves or a spider web moves in front of the camera. I used the first version of this work to scan the CCTV recordings of a burgled house to find out what happened that night.

Implementation

For the implementation, I used TensorFlow and ML.NET with the following models: SSD MobileNet, YOLO, SSD Shared Box, RCNN Inception, and Faster RCNN. SSD MobileNet is the lightest model in terms of CPU usage, but I would say its accuracy is only acceptable. YOLO offers the best balance between accuracy and resource usage. The screenshot below is from the detection running on the attached test mp4 file:

Sample Description

Camera Configuration

The RTSP stream path for my CCTV camera (Hikvision) is “rtsp://userName:password@ipAddress:port/Streaming/Channels/101”. Most CCTV cameras expose an RTSP stream; you just need to search the web for the camera model to find its RTSP path format.
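
A small helper can assemble a stream path of this shape. This is only a sketch: the function name, the default port 554 (the usual RTSP port), and the sample values are assumptions, not settings taken from the camera or from Rinkt Studio. Percent-encoding the credentials is a precaution for passwords that contain characters such as “@” or “:”.

```python
from urllib.parse import quote


def build_rtsp_url(user, password, host, port=554, channel=101):
    """Build a Hikvision-style RTSP URL, percent-encoding the credentials."""
    # quote(..., safe="") also escapes characters such as "@" and ":" that
    # would otherwise be read as URL delimiters.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"rtsp://{credentials}@{host}:{port}/Streaming/Channels/{channel}"


if __name__ == "__main__":
    # Placeholder values only; replace them with your camera's settings.
    print(build_rtsp_url("userName", "p@ss:word", "192.0.2.10"))
```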

Object detection Parameters

For the attached sample, the most important properties are:

  1. Object Filter => “person” – ignores all other objects.
  2. Minimum Score => 60 – the confidence of the algorithm in the detected object, on a scale from 1 to 100.
  3. Every n seconds => 4 – to reduce the CPU load, the detection checks one frame every 4 seconds and ignores the frames in between.
  4. Interval between (s) => 2 seconds – the interval between the subsequent frames saved after a detection.
  5. Images Count => 2 – how many subsequent frames are saved after a person is detected.
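
To make the role of these properties concrete, here is a minimal Python sketch of how they might be applied. The class, field, and function names are hypothetical and only mirror the property list above. The dictionary shape of a detection is also an assumption. None of this is the actual Rinkt Studio implementation.

```python
from dataclasses import dataclass


@dataclass
class DetectionSettings:
    # Hypothetical names that mirror the properties listed above.
    object_filter: str = "person"
    minimum_score: int = 60          # confidence threshold, 1 to 100
    every_n_seconds: float = 4.0     # gap between frames sent to the detector
    interval_between: float = 2.0    # gap between follow-up frames
    images_count: int = 2            # follow-up frames saved per detection


def filter_detections(detections, settings):
    """Keep only detections of the wanted class at or above the threshold."""
    return [
        d for d in detections
        if d["label"] == settings.object_filter
        and d["score"] >= settings.minimum_score
    ]


def frames_to_check(duration_seconds, settings):
    """Timestamps (in seconds) of the frames the detector would inspect."""
    timestamps = []
    t = 0.0
    while t < duration_seconds:
        timestamps.append(t)
        t += settings.every_n_seconds
    return timestamps
```

With the default settings, a 10-second clip would be inspected at 0, 4, and 8 seconds, and a “car” detection would be dropped regardless of its score.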

To reduce the number of duplicate detections, I set the maximum frame difference to 3%, so that a frame that differs from the previous one by less than that is ignored. I also check “ignore similar frames” to ignore detections at the same position.

In other words, I don’t want to scan a frame if it is similar to the previous frame (within a limit of 3%), and I don’t want to report a detection when a similar detection has already been reported.
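
The frame-difference check can be illustrated with a short sketch. The metric the activity uses internally is not documented here. The mean absolute pixel difference below is an assumption chosen for simplicity, and the function names are hypothetical. Frames are represented as plain lists of grayscale rows to keep the example dependency-free.

```python
def frame_difference(previous, current):
    """Mean absolute pixel difference as a percentage (0 to 100).

    Frames are equally sized grayscale images given as lists of rows,
    with pixel values from 0 to 255.
    """
    total = 0
    count = 0
    for row_prev, row_cur in zip(previous, current):
        for p, c in zip(row_prev, row_cur):
            total += abs(p - c)
            count += 1
    return 100.0 * total / (255 * count)


def should_scan(previous, current, max_difference=3.0):
    """Skip the detector when the frame barely changed."""
    if previous is None:
        return True  # nothing to compare against yet
    return frame_difference(previous, current) > max_difference
```

A frame is passed to the detector only when it differs from the previous one by more than the configured limit, which saves detector calls on static scenes.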

Object Detection Workflow

The workflow below performs the following tasks:

  1. Detects a person in an RTSP stream using SSD MobileNet.
  2. Increments an index used in the file name of the frame.
  3. Saves the frame with the detected person.
  4. Saves the 2 subsequent frames after the detection, 2 seconds apart.
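
The same steps can be sketched in plain Python. This is not the Rinkt Studio workflow itself. The detector and the save function are passed in as hypothetical callables, the file-name pattern is an assumption, and the frames are assumed to arrive as (timestamp, image) pairs in time order.

```python
def run_workflow(frames, detect_person, save_frame,
                 interval_between=2.0, images_count=2):
    """frames: iterable of (timestamp_seconds, image) in time order."""
    index = 0                # used to number the saved files
    follow_up_times = []     # timestamps of follow-up frames still to save
    for timestamp, image in frames:
        # Save a follow-up frame once its scheduled time has been reached.
        if follow_up_times and timestamp >= follow_up_times[0]:
            follow_up_times.pop(0)
            index += 1
            save_frame(f"frame_{index:04d}.jpg", image)
            continue
        # Only look for a new detection when no follow-ups are pending.
        if not follow_up_times and detect_person(image):
            index += 1
            save_frame(f"frame_{index:04d}.jpg", image)
            follow_up_times = [
                timestamp + interval_between * (k + 1)
                for k in range(images_count)
            ]
    return index
```

In a real setup, `detect_person` would wrap the chosen model and `save_frame` would write the image to disk. Here they are left abstract so that the control flow can be followed on its own.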

Properties for object detection activity

For more details on the above properties, please check the documentation.

In this example, I tried to detail the steps of using Rinkt Studio with object detection and RPA.

Alternative Solution

An alternative solution is to combine the camera's own motion detection and email notifications with object detection. The camera sends an email for every motion event, and Rinkt Studio listens to that mailbox and discards invalid notifications by using object detection. This can be summarized as:

  1. Listen to a Gmail mailbox and pick up the attachments of emails coming from the camera.
  2. Scan the image attachments with object detection.
  3. Discard invalid emails.
  4. Send another email, activate the alarm, or perform another action when a person, car,… is detected.
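
The triage logic in steps 2 to 4 can be sketched with Python's standard `email` package. The function names, the `detect_objects` callable, and the list of wanted labels are all hypothetical. Fetching the messages from the mailbox (for example over IMAP) is left out.

```python
from email.message import EmailMessage


def image_attachments(message):
    """Return (filename, bytes) for every image attached to the message."""
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in message.iter_attachments()
        if part.get_content_maintype() == "image"
    ]


def triage(message, camera_address, detect_objects, wanted=("person", "car")):
    """Return "alert", "discard", or "ignore" (not from the camera)."""
    if camera_address not in message.get("From", ""):
        return "ignore"
    for _name, data in image_attachments(message):
        # detect_objects is assumed to return the labels found in the image.
        if any(label in wanted for label in detect_objects(data)):
            return "alert"
    return "discard"
```

An “alert” result would then trigger the follow-up action, such as sending another email or activating the alarm, while “discard” marks a false-positive notification.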

Object detection sample for rtsp streams.

Sample Video.