**Associative memory models** are different; they can perform
**one-shot learning**. So far we have seen two examples of
one-shot learning devices:

- the linear associative memory models (both hetero- and auto-associative); a minimal sketch of one-shot storage follows this list
- nonlinear associative models. So far we have seen one example, the Hopfield network. Below, we will study another example of a nonlinear associative memory model: Anderson's BSB model.
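
As a concrete illustration of one-shot learning (a minimal sketch, not from the text; the pattern size and random ±1 codes are arbitrary choices), a single outer product suffices to store a hetero-associative pair:

```python
import numpy as np

rng = np.random.default_rng(0)

# One pair (f, g) of +/-1 patterns; a single outer product stores it.
f = rng.choice([-1.0, 1.0], size=8)   # cue pattern
g = rng.choice([-1.0, 1.0], size=8)   # target pattern
W = np.outer(g, f)                    # Hebbian outer-product rule: W = g f^T

# Retrieval: the cue reproduces the target, scaled by f . f = 8.
retrieved = W @ f
print(np.array_equal(np.sign(retrieved), g))  # True
```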

These models are attractive for several reasons:

- They may be good models of human episodic memory because they exhibit one-shot learning.
- They can model dynamical aspects of memory retrieval because they have recurrent connections, and so their activations change over time. This permits associative computation and pattern completion.
- We can analyze their formal properties using tools such as
**eigenvalue analyses**: If the linear associator updates its state vector *f* by incrementing the state in proportion to the weighted summed input to each unit, *f(t+1) = f(t) + A f(t)*, where *A* is the weight matrix, it can be shown that *f* converges to the *direction* of the eigenvector of *A* with the largest eigenvalue; call this the *dominant eigenvector*. This is a well-known technique for finding eigenvectors, known as the "power method", and it is guaranteed to work provided the initial state is not orthogonal to the dominant eigenvector. Note that we said the final state converges to the *direction* of the dominant eigenvector; the state itself continues to grow in magnitude at an exponential rate if the eigenvalue is greater than one. Thus, a linear recurrent network is a dangerous thing: if the states are repeatedly updated, they may grow without bound. During training, the matrix *A* is created by the outer product rule; it is a sum of outer products of each training pattern with itself (in the case of auto-association) or with its target (in the case of hetero-association). Thus, we would hope that after learning we have a weight matrix whose eigenvectors are very close to the stored patterns, and whose eigenvalues are roughly equal. If one eigenvector has a much larger eigenvalue than the others, the state update rule will almost always retrieve that same vector regardless of the initial state (except when the initial state is orthogonal to the dominant eigenvector). A short numerical sketch of the power method appears below.
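
To make the power-method claim concrete, here is a minimal NumPy sketch (an illustration, not from the text): we build *A* by the outer-product rule from a few random ±1 patterns, then iterate the update rule, renormalizing the state at each step so its direction is visible while its magnitude stays bounded. The dimensionality, pattern count, and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16

# Auto-associative Hebbian training: A is a sum of outer products of
# each stored pattern with itself.
patterns = rng.choice([-1.0, 1.0], size=(3, n))
A = sum(np.outer(p, p) for p in patterns)

# Power method: f(t+1) = f(t) + A f(t), renormalized each step so the
# direction is visible while the magnitude is held at 1 (without the
# renormalization, |f| grows exponentially).
f = rng.standard_normal(n)
for _ in range(100):
    f = f + A @ f
    f = f / np.linalg.norm(f)

# f now points along the dominant eigenvector of A (up to sign).
eigvals, eigvecs = np.linalg.eigh(A)
dominant = eigvecs[:, np.argmax(eigvals)]
print(abs(f @ dominant))  # close to 1.0
```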

**The BSB model** is very similar to a Hopfield network,
in that it is a nonlinear associative memory model with recurrent connections.
Like the Hopfield model, it can be trained with a simple Hebbian
learning rule. (It can also be trained using a Widrow-Hoff or delta rule, but
we didn't discuss that in class.) However, Anderson generalizes the simple
binary Hopfield network in an important way: the units have
**continuous real-valued states**. The units' states are computed
according to a "LIMIT function" (see text, pp. 504-506) of the total
weighted summed input. This function is roughly sigmoidal in shape. States
are updated gradually, so that for each unit, the old state is gradually
replaced by the influence of the incoming activations. As in the case of the
linear associator, the state dynamics tend to cause the network to converge to
eigenvectors of the weight matrix. So if the initial state is close to an
eigenvector with a large eigenvalue, it will converge rapidly to something
close to that eigenvector. However, because of the LIMIT function, the final
state will actually consist of all 1's and -1's. If there are only 3 units,
these final states would correspond to corners of a cube in state-space. In
higher dimensions, the network state is confined to a high-dimensional
hypercube (the "box"), and the stable final states lie at its corners. That is
why this model is called brain-state-in-a-box.
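
To make the dynamics concrete, here is a minimal BSB sketch in NumPy (my illustration, not Anderson's code). The LIMIT function is implemented as a hard clip to [-1, 1], one common reading of its saturating shape, and the decay and feedback coefficients (0.9 and 0.5) are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8

# Hebbian (outer-product) training on a couple of random +/-1 patterns.
patterns = rng.choice([-1.0, 1.0], size=(2, n))
A = sum(np.outer(p, p) for p in patterns) / n

def limit(x):
    """LIMIT function: clip each unit's state to the box [-1, 1]."""
    return np.clip(x, -1.0, 1.0)

# BSB dynamics: the old state decays gradually while the weighted summed
# input is folded in; LIMIT keeps the state inside the box. The decay
# (0.9) and feedback (0.5) coefficients are arbitrary choices.
x = 0.3 * patterns[0] + 0.1 * rng.standard_normal(n)  # degraded cue
for _ in range(50):
    x = limit(0.9 * x + 0.5 * (A @ x))

print(x)                               # all entries saturate at +1 or -1
print(np.array_equal(x, patterns[0]))  # typically True: pattern completed
```

Starting from a degraded cue, the state grows along the stored direction until every unit saturates at ±1, i.e. the state lands on a corner of the box.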

**Applications of the BSB model:**

- **Modelling multistable perception**, e.g. the Necker cube.
- **Modelling rule-based inferences:** for semantic knowledge, and knowledge of drugs, organisms, and diseases.
- **Modelling arithmetic associations**, e.g. multiplication. We discussed evidence for localist ("sensory") versus distributed/abstract representations of numbers in children, and cross-cultural evidence of different numerical representations. The BSB simulations used a combination of a localist and an abstract code.