This is a bottom-up computational visual attention model built on a simple but intriguing hypothesis: our eyes usually rest on areas with abrupt changes in color, which I call corners.
So, I converted this idea into a visual saliency prediction model.
Here is a sample result:
The field of computational visual attention tries to find algorithms that predict where we look. The general model takes an image as input and returns a grayscale map indicating how much attention people will pay to each pixel. The field is highly multidisciplinary: proposed approaches draw on information theory, computer science, biology, psychology, and more.
There are two types of visual attention models: bottom-up and top-down.
Bottom-up models deal with low-level features such as luminosity, color, and orientation. Just imagine a single red dot on white paper. Your eyes are attracted to that area immediately. The dot "pops out". You can't help but look at it. It seems our eyes and brains are wired this way. Another example is a paper filled with a pattern of vertical bars, except one bar is horizontal. Your eyes are drawn to the horizontal one.
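As a toy illustration of this kind of low-level "pop-out" (not part of CORS itself): one simple way to score color contrast is to measure how far each pixel's color lies from the image's mean color, so a single red dot on white paper lights up. The function name and the mean-color heuristic are my own illustrative choices, not something from a particular paper.

```python
import numpy as np

def color_popout(rgb):
    """Toy pop-out map: per-pixel distance from the mean color, scaled to [0, 1]."""
    rgb = rgb.astype(np.float64)
    mean_color = rgb.reshape(-1, 3).mean(axis=0)
    # Pixels whose color differs most from the average score highest.
    dist = np.linalg.norm(rgb - mean_color, axis=-1)
    return dist / dist.max() if dist.max() > 0 else dist

# A white page with one red dot: the dot gets the top score.
page = np.full((20, 20, 3), 255.0)
page[5, 5] = [255.0, 0.0, 0.0]
popout = color_popout(page)
```

This captures only the crudest form of pop-out; real bottom-up models also handle orientation and intensity channels.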
Top-down models deal with high-level features, ones that differ from person to person, such as internal goals. They are much more complicated than bottom-up models. An example is a paper filled with dots of various colors where you are told to count the blue ones. In this case, your brain is "primed" to detect a single color, blue.
A unified model would address both low-level and high-level features. However, finding such an algorithm is still too hard today. There is little foundation in psychology, biology, and related fields on which to base the model, so most papers devise experiments hoping to crack the secret sauce behind the mechanism.
I propose a bottom-up model called the corner-based saliency model, or CORS. It is based on the observation that corners often attract our eyes.
Try looking at a line. Do you spend the same amount of time on every segment of it? No. Your eyes jump to the starting and ending points; the intermediate section matters less to you. I call these areas corners. In the case of a rectangle, they are its four corners. Or look around wherever you are right now: you will notice that your eyes usually rest on areas with abrupt changes in color or intensity.
So, CORS detects corners in an image and simply treats them as areas that receive more visual attention. According to the MIT Saliency Benchmark, this simple idea turns out to be competitive.
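A minimal sketch of the idea, in NumPy. The post does not specify CORS's actual corner detector or parameters, so a basic Harris corner response plus Gaussian smoothing stands in for them here; `cors_saliency`, its parameters, and the blur sigmas are all illustrative assumptions.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur built from two 1-D convolutions."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    out = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, out)
    return out

def cors_saliency(gray, spread_sigma=5.0, k=0.04):
    """Grayscale saliency map that is bright where corner responses are strong."""
    gray = gray.astype(np.float64)
    gy, gx = np.gradient(gray)
    # Structure-tensor components, averaged over a small local window.
    ixx = gaussian_blur(gx * gx, 1.0)
    iyy = gaussian_blur(gy * gy, 1.0)
    ixy = gaussian_blur(gx * gy, 1.0)
    # Harris response: large at corners, negative along straight edges.
    det = ixx * iyy - ixy**2
    trace = ixx + iyy
    response = np.maximum(det - k * trace**2, 0)
    # Spread each detected corner into a soft attention blob, then normalize.
    saliency = gaussian_blur(response, spread_sigma)
    if saliency.max() > 0:
        saliency /= saliency.max()
    return saliency

# A white square on a black background: the map should peak near the
# square's four corners, not along its edges or in its interior.
square = np.zeros((40, 40))
square[10:30, 10:30] = 1.0
saliency_map = cors_saliency(square)
```

The final blur reflects the intuition that attention around a corner is a soft region, not a single pixel; without it the map would be a few isolated bright points.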