Different Frameworks for Understanding Natural Vision

Introduction
Past decades of work on vision have focused on artificial stimuli in the hope of isolating and characterizing specific circuit mechanisms. However, the statistics of such stimuli are very different from those of natural stimuli, which have complex characteristics such as a large dynamic range and strong temporal and spatial correlations. There have been two main approaches to the challenge of understanding how natural images are processed: a stimulus-oriented approach and a goal-oriented approach.

The Stimulus-Oriented Approach
The stimulus-oriented approach relies heavily on the “efficient coding hypothesis”[1]. It searches for transformations of the input signals that optimize some metric, e.g. maximize the reduction of redundancy. This approach therefore cares mostly about the statistics of the input.
For example, many of these studies have focused on second-order spatial correlations in natural inputs, where low spatial frequencies are commonly found to carry more power than high spatial frequencies (in fact, the power spectrum follows a power-law scaling). This causes redundancy in the responses of neighboring cells, and therefore some neurons suppress their response to low spatial frequencies.
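The link between a power-law spectrum, correlated neighbors, and suppression of low spatial frequencies can be sketched numerically. The example below is only an illustration, not the method of the original paper: it assumes a synthetic image whose power falls as 1/f^2 (the exponent `alpha`, the image size, and all function names are choices made here), and it uses an idealized whitening filter whose gain grows with frequency as a stand-in for a neuron that suppresses low spatial frequencies.

```python
import numpy as np

def radial_frequency(size):
    """Radial spatial frequency (cycles/pixel) of each 2D FFT coefficient."""
    fx = np.fft.fftfreq(size)
    return np.sqrt(fx[None, :] ** 2 + fx[:, None] ** 2)

def power_law_image(size=128, alpha=2.0, seed=0):
    """Filter white noise so that its power spectrum falls as 1/f**alpha.

    This is an assumed stand-in for a natural image, not real image data.
    """
    rng = np.random.default_rng(seed)
    f = radial_frequency(size)
    gain = np.zeros_like(f)
    gain[f > 0] = f[f > 0] ** (-alpha / 2.0)  # amplitude ~ f^(-alpha/2), mean removed
    noise = rng.standard_normal((size, size))
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * gain))

def whiten(image, alpha=2.0):
    """Attenuate low spatial frequencies so that the power spectrum becomes flat."""
    f = radial_frequency(image.shape[0])
    return np.real(np.fft.ifft2(np.fft.fft2(image) * f ** (alpha / 2.0)))

def neighbor_correlation(image):
    """Pearson correlation between horizontally adjacent pixels."""
    return np.corrcoef(image[:, :-1].ravel(), image[:, 1:].ravel())[0, 1]

image = power_law_image()
whitened = whiten(image)
corr_before = neighbor_correlation(image)
corr_after = neighbor_correlation(whitened)
print(f"neighbor correlation before whitening: {corr_before:.3f}")
print(f"neighbor correlation after whitening:  {corr_after:.3f}")
```

In this sketch, neighboring pixels of the power-law image are strongly correlated, and the correlation largely disappears once low frequencies are attenuated. Real neurons are noisy, so a filter that boosts high frequencies without limit would not be a realistic model; the sketch ignores noise entirely.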

The Goal-Oriented Approach
This approach emphasizes the extraction of specific features. That is, depending on the desired behavioral output, the underlying circuitry must extract certain features to accomplish the goal. Examples of this approach include direction selectivity, orientation selectivity, the omitted stimulus response, image recurrence sensitivity, etc.

Merging of Perspectives
Historically, stimulus-oriented frameworks were applied to early visual areas and goal-oriented frameworks to later cortical areas. However, this line is beginning to blur. One way the brain may satisfy two very different frameworks is this: retinal neurons that support a variety of behavioral goals are more likely to obey the efficient coding hypothesis (stimulus-oriented), and those that project to brain areas that deal with specialized behaviors may not (goal-oriented).
More and more methods try to blend the two approaches, for example: 1) the information bottleneck, which captures relevant information rather than all information[2]; 2) using only the stimulus information that is predictive of the future.

[1]The hypothesis states that neurons operate in the most efficient way, i.e. they use the fewest spikes to encode a signal. It predicts that each possible response of a neuron occurs with equal frequency (for a noiseless neuron). If this is not the case, the neuron is not making full use of its response range. Adding noise complicates the matter, as the neuron would also have to balance efficiency against robustness.
[2]The information bottleneck measures how well Y is predicted from a compressed representation T compared to its direct prediction from X. (from Wikipedia)
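The equal-frequency prediction in note [1] can be illustrated with a small numerical sketch. It rests on assumptions made here for illustration only: a noiseless model neuron with 10 discrete response levels, a Gaussian stimulus distribution, and hypothetical function names. It compares a linear encoder with one that maps each stimulus to its rank in the stimulus distribution (histogram equalization), and measures how much of the response range each one uses through the entropy of the responses.

```python
import numpy as np

def linear_encoder(stimulus, n_levels):
    """Divide the stimulus range into equally wide bins, one per response level."""
    lo, hi = stimulus.min(), stimulus.max()
    scaled = (stimulus - lo) / (hi - lo)
    return np.clip((scaled * n_levels).astype(int), 0, n_levels - 1)

def equalizing_encoder(stimulus, n_levels):
    """Map each stimulus to a level via its rank, i.e. its empirical cumulative distribution."""
    ranks = np.searchsorted(np.sort(stimulus), stimulus, side="left")
    return ranks * n_levels // len(stimulus)

def entropy_bits(levels, n_levels):
    """Shannon entropy (in bits) of the distribution of response levels."""
    p = np.bincount(levels, minlength=n_levels) / len(levels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
n_levels = 10                                  # assumed number of response levels
stimulus = rng.normal(0.0, 1.0, 100_000)       # assumed stimulus distribution

entropy_linear = entropy_bits(linear_encoder(stimulus, n_levels), n_levels)
entropy_equalized = entropy_bits(equalizing_encoder(stimulus, n_levels), n_levels)
print(f"linear encoder:     {entropy_linear:.2f} bits")
print(f"equalizing encoder: {entropy_equalized:.2f} bits")
print(f"maximum possible:   {np.log2(n_levels):.2f} bits")
```

With the equalizing encoder every response level is used about equally often, so the response entropy approaches the maximum of log2(10) bits, while the linear encoder rarely uses the extreme levels and conveys less.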


Written by: Pei-Hsien Liu


Original paper: Turner, Maxwell H., et al. “Stimulus- and Goal-Oriented Frameworks for Understanding Natural Vision.” Nature Neuroscience, vol. 22, no. 1, Oct. 2018, pp. 15–24., doi:10.1038/s41593-018-0284-0
