Shallow and deep convolutional neural networks for predicting visual saliency in images using a data-driven approach.
Saliency-2016-cvpr is a research project that provides two convolutional neural network models for predicting visual saliency in images. It solves the problem of identifying which parts of an image attract human attention by using a purely data-driven, end-to-end learning approach instead of relying on hand-crafted features. The project includes both a shallow network trained from scratch and a deeper network with transferred features from classification tasks.
The project is aimed at computer vision researchers and practitioners working on visual attention modeling, saliency prediction, or applications that need to know which image regions capture human gaze. It is particularly relevant for those interested in early deep learning approaches to this problem.
Developers choose this project because it provides well-documented, pre-trained models from a peer-reviewed CVPR publication, offering both lightweight and deeper architectural options. It represents one of the first end-to-end CNN implementations specifically for saliency prediction, serving as a foundational reference in the field.
Shallow and Deep Convolutional Networks for Saliency Prediction
Provides both a shallow network trained from scratch and a deeper network with transferred features, following the paper's two-model approach and allowing flexibility for different computational and accuracy needs.
Published at CVPR 2016 with open access, includes full paper, citations, and detailed setup instructions, making it a reliable reference for research in visual saliency.
Offers downloadable model files for both architectures (2.5 GB for the shallow model and 99 MB for the deep model), enabling easy experimentation without training from scratch.
Pioneered end-to-end CNN training for saliency prediction, demonstrating a shift from hand-crafted features to data-driven methods on large datasets like SALICON.
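End-to-end training here means the network maps an image directly to a saliency map and is optimized against recorded human attention data. The description above does not state which objective the project uses, so the following is only a minimal sketch that assumes a pixel-wise mean squared error between the predicted and ground-truth maps. The function name `mse_saliency_loss` and the tiny 2x2 maps are hypothetical and exist purely for illustration.

```python
def mse_saliency_loss(predicted, target):
    """Mean squared error between two saliency maps.

    Both maps are equal-sized 2D lists of floats. This loss is an
    assumption made for illustration, not the project's confirmed objective.
    """
    if len(predicted) != len(target):
        raise ValueError("maps must have the same number of rows")

    total = 0.0
    count = 0
    for pred_row, target_row in zip(predicted, target):
        if len(pred_row) != len(target_row):
            raise ValueError("rows must have the same number of columns")
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2  # squared error for one pixel
            count += 1

    if count == 0:
        raise ValueError("maps must not be empty")
    return total / count


# Hypothetical 2x2 maps with values in [0, 1].
predicted = [[0.0, 0.5], [1.0, 0.5]]
target = [[0.0, 1.0], [1.0, 0.0]]
loss = mse_saliency_loss(predicted, target)
print(loss)
```

In a real training loop this quantity would be computed by the deep learning framework over batches of full-resolution maps and minimized by gradient descent; the pure-Python version only makes the per-pixel comparison explicit.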
Relies on Lasagne/Theano and Caffe, which are no longer actively maintained, leading to compatibility issues and difficulty in integration with modern systems.
The shallow network model file is 2.5 GB, which is bulky compared to contemporary models and can be prohibitive for storage-constrained deployments or quick prototyping.
The authors acknowledge that their later work, SalGAN, offers better performance, making this model less suitable for applications requiring cutting-edge accuracy.
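To put the model file sizes mentioned above in perspective, a rough back-of-the-envelope estimate can convert them into parameter counts. This is only a sketch under stated assumptions: that the files hold little besides weights, that each weight is stored as a 4-byte float32, and that the sizes are decimal gigabytes and megabytes. None of these is confirmed by the description, and the helper `estimate_parameter_count` is hypothetical.

```python
def estimate_parameter_count(file_size_bytes, bytes_per_parameter=4):
    """Rough parameter count, assuming the file is almost entirely weights.

    bytes_per_parameter=4 assumes float32 storage (an assumption).
    """
    return file_size_bytes // bytes_per_parameter


# Sizes quoted in the description, read as decimal units (an assumption).
SHALLOW_MODEL_BYTES = 2_500_000_000  # 2.5 GB
DEEP_MODEL_BYTES = 99_000_000        # 99 MB

shallow_params = estimate_parameter_count(SHALLOW_MODEL_BYTES)
deep_params = estimate_parameter_count(DEEP_MODEL_BYTES)
size_ratio = SHALLOW_MODEL_BYTES / DEEP_MODEL_BYTES

print(f"shallow: ~{shallow_params:,} parameters")
print(f"deep:    ~{deep_params:,} parameters")
print(f"shallow file is ~{size_ratio:.1f}x larger than the deep file")
```

Under these assumptions the shallow model would hold on the order of hundreds of millions of parameters, roughly 25 times more than the deep model, which is why the "shallow" option is the heavier one to store and download.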