- **Symmetrical connections**: if there is a connection going from unit *j* to unit *i* with connection weight *W_ij*, then there is also a connection going from unit *i* to unit *j* with an equal weight.
- **Linear threshold activation**: if the total weighted summed input (the dot product of the input and weights) to a unit is greater than or equal to zero, its state is set to 1; otherwise it is -1. Normally, the threshold is zero. Note that the Hopfield network for the travelling salesman problem (assignment 3) behaved slightly differently from this.
- **Asynchronous state updates**: units are visited in random order and updated according to the above linear threshold rule.
- **Energy function**: it can be shown that the above state dynamics minimizes an energy function.
- **Hebbian learning**: the weights are set from the training patterns using the Hebb rule.
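The pieces above fit together in a few lines of code. The course demos are in Matlab, but here is a minimal Python sketch (function names are my own) of Hebbian weight setting, the energy function, and asynchronous linear-threshold settling:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_hebbian(patterns):
    """Hebb rule: W_ij = (1/N) * sum over patterns of x_i * x_j.
    The outer-product sum is symmetric by construction; self-connections
    are removed by zeroing the diagonal."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def energy(W, s):
    """Hopfield energy E = -1/2 * s^T W s; state updates never increase it."""
    return -0.5 * s @ W @ s

def settle(W, s, sweeps=10):
    """Asynchronous updates: visit units in random order and apply the
    linear threshold rule (state = +1 if net input >= 0, else -1)."""
    s = s.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s
```

Storing a +/-1 pattern with `train_hebbian` and then settling from a slightly corrupted copy of it recovers the original pattern, illustrating the stable-attractor property (provided capacity is not exceeded).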

The most important features of the Hopfield network are:

- *Energy minimization during state updates* guarantees that the network will converge to a stable attractor.
- The *learning* (weight updates) also minimizes energy; therefore, the training patterns will become stable attractors (provided the capacity has not been exceeded).

However, there are some serious drawbacks to Hopfield networks:

- **Capacity** is only about 0.15 N, where N is the number of units.
- **Local energy minima** may occur, and the network may therefore get stuck in very poor (high energy) states which do not satisfy the "constraints" imposed by the weights very well at all. These local minima are referred to as **spurious attractors** if they are stable attractors which are not part of the training set. Often, they are blends of two or more training patterns.

The Boltzmann machine, described below, was designed to overcome these limitations.

The states of the units in the network are initialized as follows: a central square of units is turned on, the rest are turned off, and then every unit's state is randomly flipped with probability 0.1.
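This initialization (which initHopfield.m performs in the Matlab demo) can be sketched in Python; the function name and grid dimensions here are illustrative, not taken from the course code:

```python
import numpy as np

rng = np.random.default_rng(1)

def init_states(side=10, square=4, p_flip=0.1):
    """Initialize a side x side grid of +/-1 units: a central square of
    units on (+1), the rest off (-1), then flip each unit independently
    with probability p_flip to add noise."""
    s = -np.ones((side, side))
    lo = (side - square) // 2
    s[lo:lo + square, lo:lo + square] = 1.0
    flips = rng.random((side, side)) < p_flip
    s[flips] *= -1.0
    return s
```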

Here is an example of the initial state of the network.

For this network, there are only two global energy minima, one with all units on and one with all units off. After settling, however, the network usually ends up in a "blend state" which is a blend of the two global minima.

Here is an example of the final state of the
network after settling. This is a local minimum, sometimes called a
*spurious attractor*.

**Matlab code:**

- initHopfield.m (initialization module)
- forwardHopfield.m (state updates)
- scramble.m (function)
- plotHopActivations.m
- init2Dweights.m (function)

You can run the above demo by running initHopfield and then repeatedly running forwardHopfield. Each time forwardHopfield is run, call plotHopActivations to display the states of all units in the network in a 2-dimensional layout.

The binary Boltzmann machine is very similar to the binary Hopfield network, with the addition of three features:

- **Stochastic activation function:** the state a unit is in is probabilistically related to its *energy gap*. The bigger the energy gap between its current state and the opposite state, the more likely the unit will flip states.
- **Temperature and simulated annealing:** the probability that a unit is on is computed according to a sigmoid function of its total weighted summed input divided by T. If T is large, the network behaves very randomly. T is gradually reduced, and at each value of T all the units' states are updated. Eventually, at the lowest T, units behave less randomly and more like binary threshold units.
- **Contrastive Hebbian Learning:** a Boltzmann machine is trained in two phases, "clamped" and "unclamped". It can be trained in either supervised or unsupervised mode. Only the supervised mode was discussed in class; this type of training proceeds as follows, for each training pattern:
    - *Clamped phase:* the input units' states are clamped to (set to, and not permitted to change from) the training pattern, and the output units' states are clamped to the target vector. All other units' states are initialized randomly, and are then permitted to update until they reach "equilibrium" (simulated annealing). Then Hebbian learning is applied.
    - *Unclamped phase:* the input units' states are clamped to the training pattern. All other units' states (both hidden and output) are initialized randomly, and are then permitted to update until they reach "equilibrium". Then *anti-Hebbian learning* (Hebbian learning with a negative sign) is applied.
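The three features above can be sketched in a few lines of Python (the annealing schedule and learning rate below are arbitrary choices for illustration, not values from the course):

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def anneal(W, s, clamped, schedule=(10.0, 5.0, 2.0, 1.0, 0.5)):
    """Simulated annealing: at each temperature T, visit the unclamped
    units in random order and turn each on (+1) with probability
    sigmoid(net_input / T).  High T -> nearly random behaviour; low T ->
    nearly deterministic threshold behaviour."""
    s = s.copy()
    for T in schedule:
        for i in rng.permutation(len(s)):
            if clamped[i]:
                continue
            p_on = sigmoid((W[i] @ s) / T)
            s[i] = 1.0 if rng.random() < p_on else -1.0
    return s

def contrastive_hebbian(W, s_clamped, s_free, lr=0.05):
    """Hebbian update at the clamped equilibrium, anti-Hebbian (negative
    sign) at the unclamped equilibrium."""
    dW = lr * (np.outer(s_clamped, s_clamped) - np.outer(s_free, s_free))
    np.fill_diagonal(dW, 0.0)
    return W + dW
```

Because the contrastive update is a difference of two symmetric outer products, the weight matrix stays symmetric, as the Hopfield/Boltzmann energy function requires.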

The stochasticity enables the Boltzmann machine to overcome the problem of getting stuck in local energy minima, while the contrastive Hebb rule allows the network to be trained with hidden features and thus overcomes the capacity limitations of the Hopfield network. However, in practice, learning in the Boltzmann machine is hopelessly slow.

In class, we saw a demonstration on overhead transparencies of the
Boltzmann machine performing **figure-ground segregation**. This
network was hard-wired (i.e. hand-selected weights, no learning). Some units
were designated as "edge units" which had a particular direction and
orientation,
while others were designated as "figure-ground units". At each image location
there was a full set of edge units in every possible orientation (horizontal
or vertical) and direction (left, right, up or down), and a full set of
figure/ground units (one of each). The weights could be
excitatory or inhibitory, and represented particular constraints amongst the
figure/ground and edge units. For example, an edge unit at one location would
inhibit the edge unit of the same orientation but opposite direction at the
same location. Another example: a vertical, rightward-pointing edge
unit would excite a figure unit at the next image location to the right, and
inhibit a ground unit at that location, and would inhibit the figure unit to
the left and excite the ground unit to the left. The entire network was
initialized with the appropriate edge units turned on, and all other units
off, and then all units were randomly flipped with some small probability so
that the input was noisy. Units' states were then updated using simulated
annealing. The network was shown to be able to fill in
continuous regions and label them as either figure or ground. The region could
be non-convex (e.g. the letter C). The network could also fill in
non-continuous edges, exhibiting "illusory contours".
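To make the hand-wired constraints concrete, here is a hypothetical one-dimensional sketch (the real demo was two-dimensional, with four edge directions and two orientations per location; the index layout and weight magnitudes below are my own choices):

```python
import numpy as np

# One-dimensional row of n image locations, each with a figure unit, a
# ground unit, and right/left-pointing vertical edge units, all packed
# into one flat state vector.
n = 5

def FIG(i):    return i              # figure unit at location i
def GND(i):    return n + i          # ground unit at location i
def EDGE_R(i): return 2 * n + i      # vertical edge pointing right
def EDGE_L(i): return 3 * n + i      # vertical edge pointing left

W = np.zeros((4 * n, 4 * n))

def connect(i, j, w):
    """Add a symmetric connection of weight w between units i and j."""
    W[i, j] += w
    W[j, i] += w

for i in range(n):
    # Same orientation but opposite direction at one location: inhibitory.
    connect(EDGE_R(i), EDGE_L(i), -2.0)
    if i + 1 < n:
        # A rightward edge excites figure and inhibits ground to its right...
        connect(EDGE_R(i), FIG(i + 1), +1.0)
        connect(EDGE_R(i), GND(i + 1), -1.0)
    if i - 1 >= 0:
        # ...and inhibits figure and excites ground to its left.
        connect(EDGE_R(i), FIG(i - 1), -1.0)
        connect(EDGE_R(i), GND(i - 1), +1.0)
```

Settling this network with simulated annealing (as in the Boltzmann machine sketch above) would push figure units on one side of each active edge and ground units on the other, which is the constraint-satisfaction behaviour the demo illustrated.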