Deep Compositional Networks
We have developed a novel form of deep neural network that combines the advantages of compositional hierarchies and deep convolutional networks (ConvNets or CNNs). Each approach has its benefits and drawbacks. For instance, compositional hierarchies have an explicit structure that directly explains what a feature or a category is. This makes the approach partially generative, as it can generate new instances of specific parts. However, learning a compositional hierarchy has always presented its own challenges: finding a good optimization function for learning compositions that suit discriminative tasks, such as image classification, has been difficult. Deep convolutional networks, on the other hand, have always had a well-defined cost function and a well-defined learning scheme through back-propagation. Their main drawback has always been understanding what is happening inside. The network structure is quite opaque, and complex visualization techniques are needed to gain any insight into the features being learned.
We have now introduced a novel network architecture that combines the advantages of both approaches and can act as a bridge between deep networks and compositional hierarchies. The proposed network combines the explicit structure of compositions with the powerful discriminative training of deep networks. We call this network a Deep Compositional Network. This architecture introduces several intriguing properties into deep networks:
- fully adjustable receptive fields through spatially-adjustable filter units
- new visualization capabilities by explicitly following the compositions
- reduced parameters for spatial coverage
- efficient inference
We achieved this by replacing 3x3 filter units with a novel compositional unit that is implemented with a Gaussian distribution. We call this unit the Displaced Aggregation Unit, or DAU for short. A DAU has three parameters, i.e., an importance weight, an offset (mu) and a spatial aggregation extent (variance), all of which are learned through back-propagation in a deep learning framework. The ability to learn the importance weights and offsets allows us to achieve fully adjustable receptive fields that can replace existing 3x3 convolutions in deep networks.
The proposed network with DAUs is fully compatible with other deep learning models and can be used as a direct replacement for convolutional layers in ConvNets. DAU layers can be arbitrarily combined with standard convolutional layers to form any kind of network. DAUs can be implemented for any kind of deep architecture, such as AlexNet, VGG16 or ResNet. In fact, we provide a pre-trained AlexNet variant of the deep compositional network for download.
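As a rough illustration of how such a unit shapes a filter, the sketch below renders a kernel as a weighted sum of displaced Gaussians in plain Python. The function name `dau_kernel`, the tuple layout `(weight, mu_x, mu_y, sigma)`, and the normalisation of each Gaussian to unit sum are assumptions made for this example; they are not the API of the DAU-ConvNet library. Offsets are assumed to fall inside the rendered grid.

```python
import math

def dau_kernel(units, size):
    """Render a size x size kernel from a list of units (illustrative only).

    Each unit is a tuple (weight, mu_x, mu_y, sigma): the importance weight,
    the offset from the kernel centre in pixels, and the standard deviation
    of the Gaussian. These names are hypothetical.
    """
    centre = (size - 1) / 2.0
    kernel = [[0.0] * size for _ in range(size)]
    for weight, mu_x, mu_y, sigma in units:
        # Evaluate an unnormalised Gaussian centred at the displaced position.
        gauss = [
            [
                math.exp(
                    -((x - centre - mu_x) ** 2 + (y - centre - mu_y) ** 2)
                    / (2.0 * sigma ** 2)
                )
                for x in range(size)
            ]
            for y in range(size)
        ]
        # Assumption: normalise so each unit contributes exactly `weight` in total.
        norm = sum(map(sum, gauss))
        for y in range(size):
            for x in range(size):
                kernel[y][x] += weight * gauss[y][x] / norm
    return kernel

# Two hypothetical units: a positive one displaced to the right and a
# negative one displaced down and to the left (rows grow downwards).
kernel = dau_kernel([(1.0, 2.0, 0.0, 0.5), (-0.5, -3.0, 1.0, 0.5)], size=9)
```

Every step here is differentiable with respect to the weight, the offset, and the variance, which is what allows all three to be updated by back-propagation.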
Adjustable receptive fields
The ability of DAUs to learn arbitrary offsets allows a deep compositional network to adjust its receptive fields to the problem at hand. The network is flexible enough to give some features large receptive fields and others smaller ones. DAUs can therefore be used as a replacement for dilated convolutions, which is particularly useful for semantic segmentation, where context information is important.
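One way to picture the relation to dilated convolution: a 3x3 convolution with dilation d samples nine taps on a fixed grid, whereas DAU offsets are free to move. The sketch below compares the area spanned by both. The helpers `dilated_offsets` and `extent` and the example `learned` offsets are hypothetical and chosen only for illustration; the extent is measured between unit centres and ignores the Gaussian spread around each unit.

```python
def dilated_offsets(dilation):
    """Tap positions (dx, dy) of a 3x3 convolution with the given dilation."""
    return [(i * dilation, j * dilation) for j in (-1, 0, 1) for i in (-1, 0, 1)]

def extent(offsets):
    """Width and height, in pixels, of the bounding box around the unit centres."""
    xs = [dx for dx, _ in offsets]
    ys = [dy for _, dy in offsets]
    return max(xs) - min(xs) + 1, max(ys) - min(ys) + 1

# Nine taps on a fixed grid: the span is tied to the dilation factor.
fixed = dilated_offsets(4)

# Three units with made-up, non-integer offsets: the span follows the offsets,
# so a wide but flat receptive field is possible with fewer units.
learned = [(-7.5, 0.0), (0.0, 0.0), (3.0, -2.0)]

print(extent(fixed), extent(learned))
```

With a dilated convolution the span is a hyperparameter fixed per layer; with learned offsets it can differ from one feature to the next.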
Reduced number of parameters
Our analysis revealed that the 9 units (parameters) of a 3x3 filter are more than is needed for spatial coverage. With DAUs it is possible to significantly reduce the number of units and parameters, down to only a few units per filter kernel.
Compared to classic ConvNets, our deep compositional network with DAUs can achieve:
- 70% fewer parameters
- 90% fewer units
- only a 0.5% difference in performance
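The mechanism behind such savings can be sketched with simple counting. The example below assumes that every input/output channel pair has its own units and that each unit carries `params_per_unit` scalars; the default of 3 mirrors the three parameters named in the DAU description, but whether the offset counts as one or two scalars, and whether any parameters are shared across channels, is not specified here. The function names and the layer size are illustrative only.

```python
def conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution layer (bias ignored)."""
    return c_in * c_out * k * k

def dau_params(c_in, c_out, units, params_per_unit=3):
    """Parameters in a DAU layer under the assumptions stated above."""
    return c_in * c_out * units * params_per_unit

def reduction(baseline, compact):
    """Fraction of the baseline count that is saved."""
    return 1.0 - compact / baseline

c_in, c_out = 256, 384  # illustrative layer size, not taken from the models here
for units in (1, 2):
    saved = reduction(conv_params(c_in, c_out), dau_params(c_in, c_out, units))
    print(f"{units} unit(s) per kernel: {saved:.0%} fewer parameters")
```

These counts only illustrate why fewer units per kernel translate into fewer parameters; they are not how the figures listed above were obtained.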
Code for all our models is publicly available on our GitHub repositories.
- DAU-ConvNet: Self-contained DAU layer implementation (C++ and CUDA). Use this library to implement DAU layers in any deep learning framework.
- DAU-ConvNet TensorFlow: DAU-ConvNet also contains a TensorFlow wrapper (build with BUILD_TENSORFLOW_PLUGIN=on).
- DAU-ConvNet-caffe: Caffe implementation of DAU-ConvNet built on the library above. See the DAUConvolution layer for details on how to use the DAU-ConvNet library.
- caffe: Older ICPR2016 version of the Deep Compositional Network (GaussianConvLayers), without constraints on unit variance but with a significantly slower implementation.
Please feel free to use our code in your research projects or to integrate it into any deep learning framework using the DAU-ConvNet library. Please cite our CVPR2018 paper when using our code.
We provide ImageNet pre-trained models for the Caffe framework. The models are based on the AlexNet architecture, with conv3, conv4 and conv5 implemented as DAU convolutions. Models compatible with our DAU-ConvNet-caffe implementation are available for download:
- AlexNet-DAU-ConvNet (default) (56.9% top-1 accuracy, 0.7 million DAU units)
- AlexNet-DAU-ConvNet-small (56.4% top-1 accuracy, 0.3 million DAU units)
- AlexNet-DAU-ConvNet-large (57.3% top-1 accuracy, 1.5 million DAU units)