Unifying efficient coding, predictive coding and sparse coding (part I)


While many studies have been done on efficient coding, they generally operate under the assumption that neurons encode information about the past stimulus. On the other hand, there is a view that sensory information is useful for guiding behavior only if it carries information about the future (i.e. predictive coding). This study attempts to reconcile these two different but important branches of sensory processing, and shows that under different encoding schemes and sparse coding constraints, the optimal spatiotemporal filters for neurons differ, which may explain the various forms of receptive fields (RFs) observed in area V1.

Three main parameters are discussed throughout the paper: the decoding lag ∆ (if positive, the neuron encodes the future; if negative, the past), the code length τ (the number of past responses that downstream neurons integrate over to extract information), and the coding capacity C (the mutual information between the instantaneous response and the past stimulus, which can be thought of as a signal-to-noise ratio). Typical efficient coding studies operate in the regime of high C, ∆<0 and τ≫0, shown as the blue sphere in Fig 1. Predictive coding, on the other hand, investigates the relation between ∆ and C, shown as the red rectangle in Fig 1. The grey sphere marked with a question mark is the region of interest in this paper.
Fig 1. Efficient coding and predictive coding in terms of ∆,τ,C.
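To make these quantities concrete, here is a minimal numerical sketch (my own illustration, not from the paper) for a Gaussian AR(1) stimulus and a noisy instantaneous response. For jointly Gaussian variables the mutual information follows directly from the correlation coefficient, so the coding capacity C and the information about the stimulus at different lags ∆ can be read off in closed form. All parameter values here are arbitrary.

```python
import numpy as np

a = 0.9            # AR(1) coefficient: x_{t+1} = a*x_t + noise (a "Markov" stimulus)
sigma_x = 1.0      # stationary stimulus std
sigma_n = 0.5      # response noise std; smaller noise -> higher capacity C

# Response model: r_t = x_t + noise. Because the noise is independent of the past,
# the information about the entire past equals the information about x_t alone.
rho_rx = sigma_x / np.sqrt(sigma_x**2 + sigma_n**2)   # corr(r_t, x_t)

def mi_gaussian(rho):
    """Mutual information (in bits) between two jointly Gaussian variables with correlation rho."""
    return -0.5 * np.log2(1.0 - rho**2)

C = mi_gaussian(rho_rx)                  # coding capacity (information at Delta = 0)
for delta in [0, 1, 5, 10]:
    rho = rho_rx * a**delta              # corr(r_t, x_{t+delta}): correlations factor through the Markov chain
    print(f"Delta={delta:2d}  I(R_t; X_t+Delta) = {mi_gaussian(rho):.3f} bits  (C = {C:.3f} bits)")
```

Running this shows the information carried about the future decaying with ∆ even when the capacity C is fixed, which is exactly the trade-off the paper studies.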


Under the information bottleneck framework, the objective function for finding an optimal filter, one that maximizes the information encoded about $X_{t+\Delta }$ while being constrained by the coding capacity $C=I(R_{t};X_{-\infty :t})$, is
$L_{p(r_{t}\mid x_{-\infty :t})}=I(R_{t-\tau :t};X_{t+\Delta })-\gamma I(R_{t};X_{-\infty :t})$
where $R_{t-\tau :t}\equiv (r_{t-\tau },\dots ,r_{t-1},r_{t})$, and $R_{t}$ is the instantaneous response $(r_{t})$. $\gamma $ is a Lagrange multiplier that controls the amount of compression in the encoding. The object being optimized is a linear temporal filter, which weights the past inputs $(\dots ,x_{t-1},x_{t})$ and sums them linearly to produce the response $r_{t}$. The results are shown in Fig 2.
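As a concrete sketch of this encoding model (a hypothetical toy of mine, not the paper's code), the response is just a causally filtered stimulus plus output noise; the filter weights w are what the objective above would select, but here they are simply chosen by hand:

```python
import numpy as np

rng = np.random.default_rng(0)

T, K = 1000, 20                       # number of time steps, filter length
x = rng.standard_normal(T)            # placeholder stimulus (white noise)
w = np.exp(-np.arange(K) / 5.0)       # an arbitrary low-pass filter; w[0] weights x_t
noise_std = 0.5                       # larger noise -> lower coding capacity C

# r_t = sum_k w[k] * x[t-k] + noise  (only present and past inputs are used)
r = np.convolve(x, w, mode="full")[:T] + noise_std * rng.standard_normal(T)
```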

At this point I should mention that three stimuli with different statistical properties are used (Fig 2B). The first is a Markov process, where the future depends only on the immediate past. The second is the sum of two Markov processes with different timescales. The third is a hidden Markov process, where the observed stimulus is driven by a latent variable. In Fig 2C–F, two regimes are investigated using these three stimuli, as shown in Fig 2A: one is an instantaneous code (τ=0, corresponding to Fig 2C–D), and the other is a code with finite length τ>0 (corresponding to Fig 2E–F).
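One plausible way to generate toy versions of these three stimulus classes (my own construction; the timescales and coefficients are arbitrary, not the paper's values) is sketched below:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000

def ar1(tau, T, rng):
    """Discrete-time AR(1) process with correlation time tau (a Markov process)."""
    a = np.exp(-1.0 / tau)
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = a * x[t - 1] + np.sqrt(1 - a**2) * rng.standard_normal()
    return x

# 1. Markov stimulus: a single AR(1) process.
x_markov = ar1(tau=10, T=T, rng=rng)

# 2. Two-timescale stimulus: sum of a fast and a slow Markov process.
x_two_timescale = ar1(tau=2, T=T, rng=rng) + ar1(tau=40, T=T, rng=rng)

# 3. Hidden Markov stimulus: the observed value x is driven by a latent AR(1)
#    variable v (e.g. a hidden "velocity"), so x alone is not Markov --
#    estimating v from the recent past helps prediction.
v = ar1(tau=20, T=T, rng=rng)                 # latent variable
x_hidden = np.zeros(T)
for t in range(1, T):
    x_hidden[t] = 0.95 * x_hidden[t - 1] + v[t] + 0.1 * rng.standard_normal()
```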


For the Markov stimulus, there is no point in encoding any past information when τ=0, so the optimal filter remains unchanged regardless of ∆. For the two-timescale stimulus, low-pass filtering extracts the slowly varying component, which is more useful for prediction. For the hidden Markov stimulus, a biphasic filter extracts velocity information, which aids prediction. When τ>0, the optimal filters for all three stimuli become more biphasic.
Fig 2. Result for optimized filters.
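To build intuition for these two filter shapes, here is a rough illustration (again my own construction, not the paper's optimized filters): a low-pass kernel averages out fast fluctuations and keeps the slow component, while a biphasic kernel, a fast positive lobe minus a slower negative lobe, acts like a smoothed temporal derivative and therefore picks up velocity-like latent variables.

```python
import numpy as np

K = 30
t = np.arange(K)

# Low-pass kernel: a decaying exponential, normalized to unit sum.
w_lowpass = np.exp(-t / 10.0)
w_lowpass /= w_lowpass.sum()

# Biphasic kernel: fast positive lobe minus slower negative lobe,
# roughly a smoothed temporal-derivative operator.
fast = np.exp(-t / 3.0)
slow = np.exp(-t / 10.0)
w_biphasic = fast / fast.sum() - slow / slow.sum()

def filt(w, x):
    """Causal filtering: r_t = sum_k w[k] * x[t-k]."""
    return np.convolve(x, w, mode="full")[:len(x)]

# For example, applying filt(w_biphasic, x_hidden) to the hidden-Markov toy
# stimulus from the previous snippet and correlating the output with the latent
# variable v gives a quick check of how well a biphasic filter tracks "velocity".
```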


This shows how the encoding strategy of neurons changes as the stimulus statistics and the objective change. However, all of the stimuli mentioned above are one-dimensional and Gaussian. How can these results be generalized to natural stimuli, which are known to be high-dimensional and non-Gaussian? Stay tuned to find out!


author: Pei-Hsien Liu


Disclaimer: All the figures (and results, of course) shown here are from the original paper. 
Original paper: Chalk, M., Marre, O., & Tkačik, G. (2018). Toward a unified theory of efficient, predictive, and sparse coding. Proceedings of the National Academy of Sciences, 115(1), 186–191. doi:10.1073/pnas.1711114115
