Center-Directions for counting and localization

CeDiRNet

We introduce CeDiRNet, a novel point-supervised learning approach for object counting and localization that addresses the imbalance between annotated and unannotated pixels that is common in point-based methods. Instead of focusing only on pixels near point annotations, CeDiRNet performs dense regression of center-direction vectors: each pixel predicts a direction pointing to the nearest object center, so many surrounding pixels contribute to the supervision signal. This formulation allows the method to be decomposed into two stages: a domain-specific dense regression network that predicts the center-directions using a convolutional neural network (CNN) with a Feature Pyramid Network (FPN), and a lightweight, domain-agnostic localization network that processes the dense direction maps to localize object centers. Importantly, the localization network can be trained on synthetic data independent of the target domain, which removes the need to retrain it for each new domain and lowers annotation effort without compromising accuracy.

The CeDiRNet framework is built on two key components:

Domain-specific dense regression: This module predicts dense center-direction vectors for each pixel in the image. These vectors point towards the nearest object center, effectively encoding spatial relationships and object locations. The dense regression network is trained using point annotations, ensuring that the supervision remains lightweight while still capturing detailed spatial information.
Lightweight, domain-agnostic localization network: This component processes the dense center-direction outputs to identify object centers. A key advantage of this network is that it is trained once on synthetic data and does not require retraining for new datasets. Additionally, for scenarios requiring lower computational cost, a hand-crafted CNN can be used instead, completely eliminating the need for training while still delivering efficient performance.
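To make the regression target more concrete, the following is a minimal sketch of how a dense center-direction map could be derived from point annotations. It is an illustration under stated assumptions, not the released implementation: the function name `center_direction_field`, the `(x, y)` point convention, and the encoding of each direction as a unit vector `(dx, dy)` are all choices made here for clarity.

```python
import numpy as np

def center_direction_field(points, height, width):
    """Hypothetical helper: build a (2, H, W) map of unit vectors (dx, dy),
    where each pixel points toward its nearest annotated center.

    points: iterable of (x, y) center coordinates in pixels.
    Pixels that coincide with a center get the zero vector.
    """
    pts = np.asarray(points, dtype=np.float64).reshape(-1, 2)
    if len(pts) == 0:
        # No annotations: there is no direction to regress.
        return np.zeros((2, height, width))

    ys, xs = np.mgrid[0:height, 0:width]

    # Offsets from every pixel to every center, shape (N, H, W).
    dx = pts[:, 0, None, None] - xs[None]
    dy = pts[:, 1, None, None] - ys[None]

    # Index of the nearest center for each pixel, shape (H, W).
    nearest = np.hypot(dx, dy).argmin(axis=0)

    # Keep only the offset to the nearest center.
    ndx = np.take_along_axis(dx, nearest[None], axis=0)[0]
    ndy = np.take_along_axis(dy, nearest[None], axis=0)[0]

    # Normalize to unit length, avoiding division by zero at the centers.
    norm = np.hypot(ndx, ndy)
    safe = np.where(norm > 0, norm, 1.0)
    return np.stack([ndx / safe, ndy / safe])

# Two annotated centers on a small 5 x 10 image.
field = center_direction_field([(2, 2), (7, 2)], height=5, width=10)
```

Every pixel in the image receives a target in this sketch, which is the property the dense formulation relies on: supervision is not limited to the few pixels that carry a point annotation.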

The architecture of CeDiRNet is modular, which makes it easy to adapt to different datasets and tasks. The dense regression network is typically built on a convolutional neural network backbone (ResNet, ConvNeXt, etc.) optimized for extracting spatial features, while the localization network uses a lightweight design to aggregate and interpret the regression outputs. Together, these components enable CeDiRNet to achieve state-of-the-art results in object counting and localization tasks while relying on minimal supervision.
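As a rough intuition for how a dense direction map can be interpreted to recover centers, the sketch below uses a simple stand-in technique: it computes the divergence of the direction field and keeps strong local minima, since directions converge at an object center. This is neither the learned localization network nor the hand-crafted CNN described above. The function name `localize_centers`, the `threshold` value, and the toy input field are assumptions made for illustration, and the example assumes centers that fall on integer pixel coordinates.

```python
import numpy as np

def localize_centers(field, threshold=-1.5):
    """Hypothetical stand-in for the localization stage.

    field: (2, H, W) array of unit vectors (dx, dy) pointing to the nearest center.
    Returns a list of (x, y) pixels where the divergence is a 3x3 local minimum
    and lies below `threshold` (directions converge strongly at that pixel).
    """
    # Divergence: d(dx)/dx + d(dy)/dy, using finite differences.
    div = np.gradient(field[0], axis=1) + np.gradient(field[1], axis=0)

    # Compare each pixel with its 3x3 neighbourhood (padded with +inf).
    h, w = div.shape
    padded = np.pad(div, 1, mode="constant", constant_values=np.inf)
    neigh = np.stack([padded[i:i + h, j:j + w]
                      for i in range(3) for j in range(3)])
    is_local_min = div <= neigh.min(axis=0)

    ys, xs = np.nonzero(is_local_min & (div < threshold))
    return [(int(x), int(y)) for x, y in zip(xs, ys)]

# Toy direction field on a 5 x 10 grid with centers at (2, 2) and (7, 2).
# Both centers share the same row, so the nearest center depends only on x.
ys, xs = np.mgrid[0:5, 0:10]
dx = np.where(xs <= 4, 2.0, 7.0) - xs
dy = 2.0 - ys
norm = np.hypot(dx, dy)
safe = np.where(norm > 0, norm, 1.0)
field = np.stack([dx / safe, dy / safe])

centers = localize_centers(field)
```

Because this step operates only on the direction map and never sees image appearance, it hints at why the localization stage can be domain-agnostic. A learned network would be expected to cope with noisy predictions far better than this fixed threshold.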

Code and citation

The implementation of CeDiRNet is open-source and available on GitHub under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can find the code and additional resources at the CeDiRNet GitHub Repository.

Please cite our paper published in Pattern Recognition when using this model and code:

@article{Tabernik2024PR,
    author = {Tabernik, Domen and Muhovi{\v{c}}, Jon and Sko{\v{c}}aj, Danijel},
    doi = {10.1016/j.patcog.2024.110540},
    issn = {00313203},
    journal = {Pattern Recognition},
    number = {April},
    pages = {110540},
    publisher = {Elsevier Ltd},
    title = {{Dense center-direction regression for object counting and localization with point supervision}},
    url = {https://doi.org/10.1016/j.patcog.2024.110540},
    volume = {153},
    year = {2024}
}