A brief guide to Post Processing methods for Object Detection

Rushali Grandhe
7 min read · Jul 5, 2021


Object detection is one of the most popular tasks in computer vision, involving recognizing and localizing objects using bounding boxes. Object detectors usually spit out a huge number of overlapping bounding boxes corresponding to the same object. Naturally, the next step is to choose the boxes that best localize the objects, such that each object ends up with exactly one bounding box.

In a typical object detection pipeline, category-independent region proposals are generated first. These proposals are then assigned a score for each class label by a classification network, and their positions are refined slightly by a regression network. Finally, non-maximum suppression is applied.

Post processing methods are usually employed for this purpose. They help suppress superfluous boxes and aid in selecting the best possible boxes. This article provides an overview of some of the popular post processing mechanisms.

Before proceeding with the different post processing steps, it is worth refreshing the definition of IoU (Intersection over Union), shown in the picture below.

Intersection over Union
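
IoU has a direct implementation. A minimal sketch in plain Python (the `(x1, y1, x2, y2)` box format and the function name are my own choices, not from the article):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Width and height of the intersection rectangle (zero if the boxes don't overlap)
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

IoU ranges from 0 (no overlap) to 1 (identical boxes), which is what makes it usable as a suppression criterion in all the methods below.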

Greedy Non-Maximum Suppression (NMS)

Greedy NMS is the most common and widely used post processing step. The algorithm can be summarized as follows -

Let B represent the list of predicted bounding boxes.

  1. Select the bounding box with the maximum confidence score, M, and move it from B to a list D
  2. Compute the IoU between M and the remaining boxes in B
  3. Delete boxes with IoU(M, B) > threshold Nt
  4. Repeat steps 1–3 until B is empty. D then contains the retained bounding boxes
Greedy NMS
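
The steps above can be sketched in plain Python (box format, names, and the default threshold are illustrative, not from the article):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, nt=0.5):
    """Returns indices of the retained boxes (the list D in the steps above)."""
    # B, sorted so the highest-scoring box comes first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)          # step 1: most confident remaining box M
        keep.append(m)
        # steps 2-3: drop boxes overlapping M beyond the threshold Nt
        order = [i for i in order if iou(boxes[m], boxes[i]) <= nt]
    return keep
```

For example, two near-duplicate boxes collapse to the higher-scoring one, while a distant box survives untouched.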

While greedy NMS seems to be a decent post processing method, it suffers from some drawbacks, which are addressed by the variations described below.

Soft-NMS

Greedy NMS completely removes boxes which exceed the IoU threshold. This may not be suitable when there are overlapping/crowded regions in the image. There might be a scenario wherein a box overlapping with the most confident box contains a different object of interest but gets removed because of the threshold. This may result in the object not being detected at all.

Soft-NMS criteria

Soft-NMS proposes a very simple change to the Greedy NMS algorithm to address this problem. Rather than removing the boxes which exceed the threshold, it decays their confidence scores by an amount proportional to the IoU. In this way, boxes with a higher IoU with M are penalized heavily, while boxes farther away with a lower IoU are barely affected, much as they would survive under Greedy NMS.
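
A minimal sketch of the Gaussian variant of this decay (box format, names, and default parameters are my own; the paper also describes a linear decay):

```python
import math

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of deleting boxes.
    Returns (kept indices, final scores)."""
    scores = list(scores)
    idxs = list(range(len(boxes)))
    keep = []
    while idxs:
        m = max(idxs, key=lambda i: scores[i])   # most confident remaining box
        idxs.remove(m)
        keep.append(m)
        for i in idxs:
            ov = iou(boxes[m], boxes[i])
            scores[i] *= math.exp(-(ov * ov) / sigma)  # higher IoU -> stronger decay
        # only discard boxes whose score has decayed to near zero
        idxs = [i for i in idxs if scores[i] > score_thresh]
    return keep, scores
```

Note that an overlapping box is no longer hard-deleted: its score shrinks, so it can still be retained if it plausibly covers a second object.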

Softer-NMS

Bounding boxes with the highest confidence scores are usually chosen during NMS, but a high classification score is not necessarily indicative of better localization. Softer-NMS instead incorporates localization uncertainty, which is estimated as a single-variate Gaussian distribution.

The parameter σ² indicates the localization uncertainty, i.e. the variance of the predicted location. The smaller the value of σ², the more confident the model is about the localization.

Single variate Gaussian distribution
New coordinate obtained using variance weighted voting; σt is the variance threshold

Softer-NMS performs Soft-NMS and then updates the coordinates of the most confident box, M, using variance-weighted voting. In variance voting, the learned localization variances are used to merge neighboring bounding boxes, which further improves the localization of the chosen box. Lower weights are assigned to boxes with high variances and to boxes having a small IoU with the selected box.
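
The voting step can be sketched as follows (a simplification assuming one scalar variance σ² per box rather than one per coordinate; names and defaults are mine):

```python
import math

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def variance_voting(m_box, neighbor_boxes, neighbor_vars, sigma_t=0.02):
    """Refine the selected box M by averaging overlapping neighbors, weighted by
    (a) how close their IoU with M is to 1 and (b) their inverse variance."""
    refined = []
    for c in range(4):  # refine each coordinate independently
        num, den = 0.0, 0.0
        for box, var in zip(neighbor_boxes, neighbor_vars):
            ov = iou(m_box, box)
            if ov <= 0:
                continue  # non-overlapping boxes do not vote
            # low IoU with M or high variance -> low weight
            w = math.exp(-((1.0 - ov) ** 2) / sigma_t) / var
            num += w * box[c]
            den += w
        refined.append(num / den if den > 0 else m_box[c])
    return tuple(refined)
```

A confident (low-variance) neighbor that almost coincides with M pulls the final coordinates strongly toward itself, which is the intended refinement effect.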

The paper also introduces a KL loss for bounding box regression in place of the usual smooth L1 loss.

IoU guided NMS

Similar to Softer-NMS, IoU guided NMS also observes that classification confidence is not always related to localization confidence. In IoU guided NMS, the IoU value is considered a good representation of the localization confidence.

The paper introduces IoU-Net, a separate branch after the RoI layer, which learns to predict the IoU between each detected bounding box and the matched ground truth. The network thereby acquires a localization confidence, which improves the NMS procedure by preserving accurately localized bounding boxes. The algorithm is similar to Greedy NMS, except that in step 1 the box with the highest localization confidence is chosen instead of the box with the highest classification confidence. Hence, it helps to eliminate suppression failures caused by misleading classification confidences. However, this method has a considerable computational overhead, as it requires generating bounding boxes and labels for training the IoU-Net.
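
Assuming the per-box localization confidences are already predicted (by something like IoU-Net), the modified selection step can be sketched as follows; as in the paper, the kept box also inherits the best classification score of its suppressed cluster:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def iou_guided_nms(boxes, cls_scores, loc_scores, nt=0.5):
    """Greedy NMS where M is chosen by localization confidence (loc_scores)
    rather than classification score. Returns (kept index, final cls score) pairs."""
    order = sorted(range(len(boxes)), key=lambda i: loc_scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)  # best-localized remaining box
        cluster = [i for i in order if iou(boxes[m], boxes[i]) > nt]
        # the retained box takes the highest classification score in its cluster
        best_cls = max([cls_scores[m]] + [cls_scores[i] for i in cluster])
        keep.append((m, best_cls))
        order = [i for i in order if i not in cluster]
    return keep
```

In the test below, box 1 wins over a near-duplicate box 0 despite a lower classification score, because its localization confidence is higher, and it inherits box 0's classification score.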

DIoU (Distance-IoU) NMS

DIoU NMS uses the central point distance between two boxes, along with the IoU, in the threshold criterion. The central point distance penalty, RDIoU, between two boxes is shown below, where c is the diagonal length of the smallest enclosing box covering the two boxes and ρ(b, b_gt) is the Euclidean distance between the centers of the two boxes.

Central point distance equation
DIoU NMS criteria

DIoU-NMS has been shown to work better in the presence of occlusions and yields tighter boxes. The paper also introduces a DIoU loss for bounding box regression, which performs better than the usual smooth L1 loss.
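
The criterion can be sketched as follows: a box is suppressed only when IoU minus the center-distance penalty RDIoU = ρ²/c² exceeds the threshold, so overlapping boxes with distant centers (likely separate, occluding objects) are kept (box format and names are illustrative):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def diou(a, b):
    """IoU minus the normalized squared center distance rho^2 / c^2."""
    # squared distance between box centers
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 \
         + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # squared diagonal of the smallest enclosing box
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou(a, b) - (rho2 / c2 if c2 > 0 else 0.0)

def diou_nms(boxes, scores, nt=0.5):
    """Greedy NMS with the DIoU criterion in place of plain IoU."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        order = [i for i in order if diou(boxes[m], boxes[i]) <= nt]
    return keep
```

Two boxes with identical IoU are treated differently depending on how far apart their centers sit, which is what helps in crowded scenes.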

Adaptive NMS

While the previously discussed NMS methods use a single threshold to retain or suppress boxes, this may not be suitable for regions with many overlapping objects. Adaptive NMS was developed mainly for highly crowded/occluded scenes, such as pedestrian detection.

Adaptive NMS, dM = density of the object M

The idea of Adaptive NMS is very simple: the NMS threshold changes based on the neighborhood of an object. Depending on the density (crowdedness) around the object, a higher or lower threshold is used. In this way, wrongly suppressing a large number of boxes in crowded regions is prevented. Adaptive NMS thus applies a dynamic suppression strategy, where the threshold rises as instances occlude each other and decays when instances appear separately. A separate subnetwork is introduced to learn the density prediction.
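
With the per-box densities d_M assumed given (the paper predicts them with the density subnetwork), the dynamic threshold amounts to one extra line over Greedy NMS; names and defaults here are my own:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def adaptive_nms(boxes, scores, densities, nt=0.5):
    """Greedy NMS whose suppression threshold for each M is max(nt, d_M):
    crowded regions get a higher threshold, so fewer boxes are suppressed."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        n_m = max(nt, densities[m])   # dynamic threshold for this box
        order = [i for i in order if iou(boxes[m], boxes[i]) <= n_m]
    return keep
```

The test below shows the same pair of overlapping boxes being merged in a sparse region but both kept in a crowded one.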

Learning NMS

The NMS methods above are usually performed independently of the training process and amount to greedy clustering with a fixed distance threshold, forcing a trade-off between recall and precision. This paper instead introduces a convolutional neural network which learns to perform NMS.

The network deals with 2 components -

a) A matching loss to penalize superfluous detections ===>encouraging only 1 detection per object

b) Joint processing of neighbors (GossipNet) ===> to know if better detection can be obtained

NMS is reformulated as a rescoring task that seeks to decrease the scores of detections covering objects that have already been detected. After rescoring, simple thresholding reduces the set of detections. During inference, the full set of rescored detections is passed directly to the evaluation script.

The network has been shown to be a close replacement for Greedy NMS. However, the disadvantage is that it requires a lot of training data.

Conclusion

Post processing schemes undoubtedly form an integral part of the object detection pipeline. They help in getting rid of false positives and retaining the best bounding boxes for objects. This article provides a brief summary of some of the popular post processing schemes used for object detection.

Thanks for reading, and I hope that this article was useful! For a more detailed explanation, please check out the papers mentioned under the references.

References
