A brief guide to Post Processing methods for Object Detection

Rushali Grandhe
7 min read · Jul 5, 2021


Object detection is one of the most popular tasks in computer vision, involving recognizing and localizing objects using bounding boxes. Object detectors usually spit out a huge number of overlapping bounding boxes corresponding to the same object. Naturally, the next step is to choose the boxes that best localize the objects, such that each object ends up with exactly one bounding box.

In a typical object detection pipeline, category-independent region proposals are generated first. These proposals are then assigned a score for each class label by a classification network, and their positions are refined slightly by a regression network. Finally, non-maximum suppression is applied.

Post processing methods are usually employed for this purpose. They help suppress superfluous boxes and aid in selecting the best possible boxes. This article provides an overview of some of the popular post processing mechanisms.

Before proceeding with the different post processing steps, it is worth refreshing the definition of IoU (Intersection over Union), shown in the picture below.

Intersection over Union
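
IoU has a direct implementation. A minimal sketch in plain Python (the `(x1, y1, x2, y2)` box format and the function name are my own choices, not from the article):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Width and height of the intersection rectangle (zero if the boxes don't overlap)
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

IoU ranges from 0 (no overlap) to 1 (identical boxes), which is what makes it usable as a suppression criterion in all the methods below.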

Greedy Non-Maximum Suppression (NMS)

Greedy NMS is the most common and widely used post processing step. The algorithm can be summarized as follows -

Let B represent the list of predicted bounding boxes.

  1. Select the bounding box with the maximum confidence score, M, and move it from B to a list D
  2. Compute the IoU between M and the remaining boxes in B
  3. Delete boxes with IoU(M, B) > threshold Nt
  4. Repeat steps 1–3 until B is empty. D then contains the retained bounding boxes
Greedy NMS
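
The steps above can be sketched in plain Python (box format, names, and the default threshold are illustrative, not from the article):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, nt=0.5):
    """Returns indices of the retained boxes (the list D in the steps above)."""
    # B, sorted so the highest-scoring box comes first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)          # step 1: most confident remaining box M
        keep.append(m)
        # steps 2-3: drop boxes overlapping M beyond the threshold Nt
        order = [i for i in order if iou(boxes[m], boxes[i]) <= nt]
    return keep
```

For example, two near-duplicate boxes collapse to the higher-scoring one, while a distant box survives untouched.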

While greedy NMS seems to be a decent post processing method, it suffers from some drawbacks, which are addressed by the variations described below.

Soft-NMS

Greedy NMS completely removes boxes which exceed the IoU threshold. This may not be suitable when there are overlapping/crowded regions in the image. There might be a scenario wherein a box overlapping with the most confident box contains a different object of interest but gets removed because of the threshold. This may result in the object not being detected at all.

Soft-NMS criteria

Soft-NMS proposes a very simple change to the Greedy NMS algorithm to address this problem. Rather than removing the boxes which exceed the threshold, it decays their confidence scores by an amount proportional to the IoU. In this way, boxes with a higher IoU with M are penalized heavily, while boxes farther away with a lower IoU are barely affected, much as they would survive under Greedy NMS.
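
A minimal sketch of the Gaussian variant of this decay (box format, names, and default parameters are my own; the paper also describes a linear decay):

```python
import math

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores instead of deleting boxes.
    Returns (kept indices, final scores)."""
    scores = list(scores)
    idxs = list(range(len(boxes)))
    keep = []
    while idxs:
        m = max(idxs, key=lambda i: scores[i])   # most confident remaining box
        idxs.remove(m)
        keep.append(m)
        for i in idxs:
            ov = iou(boxes[m], boxes[i])
            scores[i] *= math.exp(-(ov * ov) / sigma)  # higher IoU -> stronger decay
        # only discard boxes whose score has decayed to near zero
        idxs = [i for i in idxs if scores[i] > score_thresh]
    return keep, scores
```

Note that an overlapping box is no longer hard-deleted: its score shrinks, so it can still be retained if it plausibly covers a second object.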

Softer-NMS

Bounding boxes with the highest confidence scores are usually chosen during NMS, but a high classification score is not necessarily indicative of better localization. Softer-NMS instead incorporates localization uncertainty, which is estimated as a single-variate Gaussian distribution.

The parameter σ² indicates the localization uncertainty, i.e. the variance of the predicted location. The smaller the value of σ², the more confident the model is about the localization.

Single variate Gaussian distribution
New coordinate obtained using variance weighted voting; σt is the variance threshold

Softer-NMS performs Soft-NMS and then updates the coordinates of the most confident box, M, using variance-weighted voting. In variance voting, the learned localization variances are used to merge neighboring bounding boxes, which further improves the localization of the chosen box. Lower weights are assigned to boxes with high variances and to boxes having a small IoU with the selected box.
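
The voting step can be sketched as follows (a simplification assuming one scalar variance σ² per box rather than one per coordinate; names and defaults are mine):

```python
import math

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def variance_voting(m_box, neighbor_boxes, neighbor_vars, sigma_t=0.02):
    """Refine the selected box M by averaging overlapping neighbors, weighted by
    (a) how close their IoU with M is to 1 and (b) their inverse variance."""
    refined = []
    for c in range(4):  # refine each coordinate independently
        num, den = 0.0, 0.0
        for box, var in zip(neighbor_boxes, neighbor_vars):
            ov = iou(m_box, box)
            if ov <= 0:
                continue  # non-overlapping boxes do not vote
            # low IoU with M or high variance -> low weight
            w = math.exp(-((1.0 - ov) ** 2) / sigma_t) / var
            num += w * box[c]
            den += w
        refined.append(num / den if den > 0 else m_box[c])
    return tuple(refined)
```

A confident (low-variance) neighbor that almost coincides with M pulls the final coordinates strongly toward itself, which is the intended refinement effect.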

The paper also introduces a KL loss for bounding box regression in place of the usual smooth L1 loss.

IoU guided NMS

Similar to Softer-NMS, IoU guided NMS also observes that classification confidence is not always related to localization confidence. In IoU guided NMS, the IoU value is considered a good representation of the localization confidence.

The paper introduces IoU-Net, a separate branch after the RoI layer, which learns to predict the IoU between each detected bounding box and the matched ground truth. The network thereby acquires a localization confidence, which improves the NMS procedure by preserving accurately localized bounding boxes. The algorithm is similar to Greedy NMS, except that in step 1 the box with the highest localization confidence is chosen instead of the box with the highest classification confidence. Hence, it helps to eliminate suppression failures caused by misleading classification confidences. However, this method has a considerable computational overhead, as it requires generating bounding boxes and labels for training the IoU-Net.
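
Assuming the per-box localization confidences are already predicted (by something like IoU-Net), the modified selection step can be sketched as follows; as in the paper, the kept box also inherits the best classification score of its suppressed cluster:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def iou_guided_nms(boxes, cls_scores, loc_scores, nt=0.5):
    """Greedy NMS where M is chosen by localization confidence (loc_scores)
    rather than classification score. Returns (kept index, final cls score) pairs."""
    order = sorted(range(len(boxes)), key=lambda i: loc_scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)  # best-localized remaining box
        cluster = [i for i in order if iou(boxes[m], boxes[i]) > nt]
        # the retained box takes the highest classification score in its cluster
        best_cls = max([cls_scores[m]] + [cls_scores[i] for i in cluster])
        keep.append((m, best_cls))
        order = [i for i in order if i not in cluster]
    return keep
```

In the test below, box 1 wins over a near-duplicate box 0 despite a lower classification score, because its localization confidence is higher, and it inherits box 0's classification score.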

DIoU (Distance-IoU) NMS

DIoU NMS uses the central point distance between two boxes, along with the IoU, in the threshold criterion. The central point distance penalty, RDIoU, between two boxes is shown below, where c is the diagonal length of the smallest enclosing box covering the two boxes and ρ(b, b_gt) is the Euclidean distance between the centers of the two boxes.

Central point distance equation
DIoU NMS criteria

DIoU-NMS has been shown to work better in the presence of occlusions and yields tighter boxes. The paper also introduces a DIoU loss for bounding box regression, which performs better than the usual smooth L1 loss.
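
The criterion can be sketched as follows: a box is suppressed only when IoU minus the center-distance penalty RDIoU = ρ²/c² exceeds the threshold, so overlapping boxes with distant centers (likely separate, occluding objects) are kept (box format and names are illustrative):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def diou(a, b):
    """IoU minus the normalized squared center distance rho^2 / c^2."""
    # squared distance between box centers
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) / 2) ** 2 \
         + (((a[1] + a[3]) - (b[1] + b[3])) / 2) ** 2
    # squared diagonal of the smallest enclosing box
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 \
       + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou(a, b) - (rho2 / c2 if c2 > 0 else 0.0)

def diou_nms(boxes, scores, nt=0.5):
    """Greedy NMS with the DIoU criterion in place of plain IoU."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        order = [i for i in order if diou(boxes[m], boxes[i]) <= nt]
    return keep
```

Two boxes with identical IoU are treated differently depending on how far apart their centers sit, which is what helps in crowded scenes.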

Adaptive NMS

While the previously discussed NMS methods use a single threshold to retain or suppress boxes, this may not be suitable for regions with many overlapping objects. Adaptive NMS was developed mainly for highly crowded/occluded scenes, such as pedestrian detection.

Adaptive NMS, dM = density of the object M

The idea of Adaptive NMS is very simple: the NMS threshold changes based on the neighborhood of an object. Depending on the density (crowdedness) around the object, a higher or lower threshold is used. In this way, wrongly suppressing a large number of boxes in crowded regions is prevented. Adaptive NMS thus applies a dynamic suppression strategy, where the threshold rises as instances occlude each other and decays when instances appear separately. A separate subnetwork is introduced to learn the density prediction.
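
With the per-box densities d_M assumed given (the paper predicts them with the density subnetwork), the dynamic threshold amounts to one extra line over Greedy NMS; names and defaults here are my own:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def adaptive_nms(boxes, scores, densities, nt=0.5):
    """Greedy NMS whose suppression threshold for each M is max(nt, d_M):
    crowded regions get a higher threshold, so fewer boxes are suppressed."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m = order.pop(0)
        keep.append(m)
        n_m = max(nt, densities[m])   # dynamic threshold for this box
        order = [i for i in order if iou(boxes[m], boxes[i]) <= n_m]
    return keep
```

The test below shows the same pair of overlapping boxes being merged in a sparse region but both kept in a crowded one.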

Learning NMS

The NMS methods above are usually performed independently of the training process and amount to greedy clustering with a fixed distance threshold, forcing a trade-off between recall and precision. This paper instead introduces a convolutional neural network which learns to perform NMS.

The network deals with 2 components -

a) A matching loss to penalize superfluous detections ===>encouraging only 1 detection per object

b) Joint processing of neighbors (GossipNet) ===> to know if better detection can be obtained

NMS is reformulated as a rescoring task that seeks to decrease the scores of detections covering objects that have already been detected. After rescoring, simple thresholding reduces the set of detections. During inference, the full set of rescored detections is passed directly to the evaluation script.

The network has been shown to be a close replacement for Greedy NMS. However, the disadvantage is that it requires a lot of training data.

Conclusion

Post processing schemes undoubtedly form an integral part of the object detection pipeline. They help in getting rid of false positives and retaining the best bounding boxes for objects. This article provides a brief summary of some of the popular post processing schemes used for object detection.

Thanks for reading, and I hope that this article was useful! For a more detailed explanation, please check out the papers mentioned under the references.

References
