Center-Directions for counting and localization

Subtopic of Object counting

Researchers

Domen Tabernik, PhD
Danijel Skočaj, PhD
Jon Muhovič, MSc

CeDiRNet

We introduce CeDiRNet, a novel point-supervised learning approach for object counting and localization that addresses the imbalance between annotated and unannotated pixels common to point-based methods. Instead of focusing only on pixels near the point annotations, CeDiRNet performs dense regression of center-direction vectors, where each pixel predicts a direction pointing to the nearest object center, thereby leveraging information from many surrounding pixels to provide stronger supervision. This formulation allows the method to be decomposed into two stages: a domain-specific dense regression network that predicts the center-directions using a convolutional neural network (CNN) with a Feature Pyramid Network (FPN), and a lightweight, domain-agnostic localization network that efficiently processes the dense direction maps to accurately localize object centers. Importantly, the localization network can be trained on synthetic data independent of the target domain, reducing the need for extensive retraining and lowering annotation effort without compromising accuracy.
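
To make the center-direction formulation concrete, the sketch below builds the dense regression target from point annotations in NumPy. Encoding the target as a per-pixel unit vector toward the nearest annotated center is one natural choice; the helper name and array shapes are illustrative and not taken from the released code.

import numpy as np

def center_direction_targets(points, height, width):
    """points: (N, 2) array of annotated (x, y) object centers."""
    ys, xs = np.mgrid[0:height, 0:width]                   # pixel grid
    dx = points[:, 0] - xs[..., None]                      # x offsets to every center
    dy = points[:, 1] - ys[..., None]                      # y offsets to every center
    dist = np.sqrt(dx ** 2 + dy ** 2)                      # (H, W, N) distances
    nearest = dist.argmin(axis=-1)[..., None]              # index of the nearest center
    ndx = np.take_along_axis(dx, nearest, axis=-1)[..., 0]
    ndy = np.take_along_axis(dy, nearest, axis=-1)[..., 0]
    norm = np.maximum(np.sqrt(ndx ** 2 + ndy ** 2), 1e-6)
    # two regression channels per pixel: unit direction toward the nearest center
    return np.stack([ndx / norm, ndy / norm], axis=0)      # (2, H, W)

# example: three annotated centers in a 128x128 image
targets = center_direction_targets(np.array([[20, 30], [64, 64], [100, 90]]), 128, 128)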

The CeDiRNet framework is built on two key components:

  • Domain-specific dense regression: This module predicts dense center-direction vectors for each pixel in the image. These vectors point towards the nearest object center, effectively encoding spatial relationships and object locations. The dense regression network is trained using point annotations, ensuring that the supervision remains lightweight while still capturing detailed spatial information.

  • Lightweight, domain-agnostic localization network: This component processes the dense center-direction outputs to identify object centers. A key advantage of this network is that it is trained once on synthetic data and does not require retraining for new datasets. Additionally, for scenarios requiring lower computational cost, a hand-crafted CNN can be used instead, completely eliminating the need for training while still delivering efficient performance.
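
As a rough illustration of the domain-agnostic training idea from the second component above, the following sketch generates one synthetic training pair for the localization network: a random point layout yields both the center-direction input and a simple Gaussian-peak target. The heatmap encoding, the sigma value, and the function name are assumptions made for illustration only.

import numpy as np

def synthetic_localization_sample(height=128, width=128, max_points=10, sigma=2.0, rng=None):
    """Random point layout -> (direction-map input, center-heatmap target)."""
    rng = rng or np.random.default_rng()
    n = int(rng.integers(1, max_points + 1))
    points = np.stack([rng.uniform(0, width, n), rng.uniform(0, height, n)], axis=1)
    ys, xs = np.mgrid[0:height, 0:width]
    dx = points[:, 0] - xs[..., None]                      # offsets to every center
    dy = points[:, 1] - ys[..., None]
    dist = np.sqrt(dx ** 2 + dy ** 2)
    nearest = dist.argmin(axis=-1)[..., None]
    ndx = np.take_along_axis(dx, nearest, axis=-1)[..., 0]
    ndy = np.take_along_axis(dy, nearest, axis=-1)[..., 0]
    norm = np.maximum(np.sqrt(ndx ** 2 + ndy ** 2), 1e-6)
    directions = np.stack([ndx / norm, ndy / norm], axis=0)          # network input
    heatmap = np.exp(-dist.min(axis=-1) ** 2 / (2 * sigma ** 2))     # training target
    return directions, heatmap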

The architecture of CeDiRNet is modular, allowing it to adapt to various datasets and tasks. The dense regression network is typically built on a convolutional backbone (e.g., ResNet or ConvNeXt) optimized for extracting spatial features, while the localization network employs a lightweight design to aggregate and interpret the regression outputs. Together, these components enable CeDiRNet to achieve state-of-the-art results in object counting and localization while relying on minimal supervision.
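
The condensed PyTorch sketch below illustrates how the two stages fit together at inference time. The small convolutional stacks are placeholders standing in for the actual ResNet/FPN backbone and the released localization network, and the peak-extraction threshold is an assumption; only the overall structure (direction regression, localization, peak read-out, counting) follows the description above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DirectionRegressor(nn.Module):          # stand-in for the backbone + FPN
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),   # 2 channels: center-direction components
        )
    def forward(self, image):
        return self.body(image)

class CenterLocalizer(nn.Module):             # lightweight, domain-agnostic head
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )
    def forward(self, directions):
        return self.body(directions)

def extract_centers(heatmap, threshold=0.5):
    """Local maxima via max-pooling; returns (y, x) peak coordinates."""
    pooled = F.max_pool2d(heatmap, 3, stride=1, padding=1)
    peaks = (heatmap == pooled) & (heatmap > threshold)
    return peaks.squeeze(0).squeeze(0).nonzero()

regressor, localizer = DirectionRegressor(), CenterLocalizer()
image = torch.rand(1, 3, 128, 128)
centers = extract_centers(torch.sigmoid(localizer(regressor(image))))
count = centers.shape[0]                      # object count = number of detected centers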

Code and citation

The implementation of CeDiRNet is open-source and available on GitHub under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. You can find the code and additional resources in the CeDiRNet GitHub repository.

Please cite our paper published in Pattern Recognition when using this model and code:

@article{Tabernik2024PR,
    author = {Tabernik, Domen and Muhovi{\v{c}}, Jon and Sko{\v{c}}aj, Danijel},
    doi = {10.1016/j.patcog.2024.110540},
    issn = {0031-3203},
    journal = {Pattern Recognition},
    pages = {110540},
    publisher = {Elsevier Ltd},
    title = {{Dense center-direction regression for object counting and localization with point supervision}},
    url = {https://doi.org/10.1016/j.patcog.2024.110540},
    volume = {153},
    year = {2024}
}

Publications

Projects

MV4.0 - Data-driven framework for development of machine vision solutions

October 2021 - September 2024
The functional objective of the project is to shift the paradigm in the development of machine vision solutions from hand-engineered, problem-specific solutions to data-driven, learning-based design and development, enabling more general, efficient, flexible, and economical development, deployment, and maintenance of machine vision systems. The main research goal of the project is to develop novel deep learning methods for iterative, active, robust, weakly supervised, self-supervised, unsupervised, and few-shot learning that reduce the amount of annotated data needed.

DIVID - Detection of inconsistencies in complex visual data using deep learning

July 2018 - December 2021
The objective of the project is to develop novel deep learning methods for modelling complex consistency and detecting inconsistencies in visual data using training images annotated with different levels of accuracy. The main project goal is to go beyond traditional supervised learning, where all anomalies in all training images have to be adequately labelled.