6.1.1 A demonstration of a 2-layer back-propagation network

Below is a demonstration of a two-layer back-propagation network that learns to classify a set of hand-drawn digits.

The network consists of an input layer with 20 units laid out in a 4 by 5 grid, a variable-sized hidden layer, and 4 output units. The default network has 6 hidden layer units; this value may be changed by editing the parameter # hidden units. Each hidden unit receives connections from the whole input grid, so the weights on the connections from the input layer to each hidden unit are also displayed in a 4 by 5 grid. Each output unit receives connections from the whole hidden layer; the weights on those connections are displayed in a single row of squares immediately below the output unit. Each hidden and output unit also has a "bias weight", which determines its resting level of activation.

Activation display: The states of the input units are displayed in grey levels on a white background; black means fully activated. The activity levels of the hidden and output units are displayed both numerically and via a color code ranging from black to red, where bright red means fully on. For the output layer, a unit whose activation level is 0.8 or greater is treated as fully on for the purpose of the display. This feature makes it easier to see when the output units are producing nearly correct responses.

Weight display: The connection weights are displayed in two colors: red squares for positive weights and blue squares for negative weights. The size of the square indicates the magnitude of the weight. For example, a large blue square represents a very large negative connection weight. The bias weight of each output unit is displayed in a box just to the left of the unit.

Error display: The total error, summed over the 40 training patterns (see back-propagation description), is displayed during training once the number of learning iterations (displayed as # back-props) exceeds 40.
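
The back-propagation description mentioned above is not reproduced here, so the exact error measure used by the demo is not stated. As a rough sketch, assuming the common sum-of-squared-differences error, the total over a set of patterns could be computed as follows. The function names and the example values are hypothetical and serve only as illustration.

```python
def pattern_error(outputs, targets):
    """Squared error for one pattern (an assumed measure, not confirmed by the demo)."""
    return sum((t - o) ** 2 for o, t in zip(outputs, targets))


def total_error(all_outputs, all_targets):
    """Sum the per-pattern error over every pattern in the set."""
    return sum(pattern_error(o, t) for o, t in zip(all_outputs, all_targets))


# Hypothetical example: two patterns, four output units each.
example_outputs = [[0.9, 0.1, 0.2, 0.1], [0.2, 0.7, 0.1, 0.0]]
example_targets = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
example_total = total_error(example_outputs, example_targets)
print(round(example_total, 2))
```

In the demo the same kind of sum would run over all 40 training patterns rather than the two made-up ones used here.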

Training patterns: The model is trained on a set of hand-drawn digits. There are 10 examples of each of the four digits, shown at the bottom left region of the display, below the network.

Testing the network: Any one of the training patterns may be presented to the network for testing, by simply clicking on the pattern. Alternatively, you can enter your own test pattern by clicking on Clear and then drawing a digit by hand onto the display of the input unit grid by clicking down and dragging the mouse.

Activation rule: When the network is presented with a training or testing pattern, the states of the input layer units are set to the corresponding values in the pattern. Next, the hidden layer units are activated, and finally the output layer units are activated. Each hidden and output layer unit is activated in two steps. First, it computes the sum of the activations arriving on its incoming connections, each weighted by the strength of that connection. Second, it passes this weighted sum through a nonlinear squashing function (the sigmoid function) to obtain an activation value that lies in the range 0 to 1.
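
The activation rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the demo's own code: the function names, the random initial weights, and the random input pattern are all hypothetical, and adding the bias weight to the weighted sum before squashing is the usual convention, assumed here.

```python
import math
import random

# Layer sizes from the description (6 hidden units is the default).
N_INPUT, N_HIDDEN, N_OUTPUT = 20, 6, 4


def sigmoid(x):
    """Logistic squashing function; maps a real input into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def activate_layer(inputs, weights, biases):
    """Activate one layer: weighted sum of inputs plus bias (assumed), then sigmoid."""
    return [
        sigmoid(sum(w * a for w, a in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def forward(pattern, w_hidden, b_hidden, w_output, b_output):
    """Set the input states, then activate the hidden layer, then the output layer."""
    hidden = activate_layer(pattern, w_hidden, b_hidden)
    output = activate_layer(hidden, w_output, b_output)
    return hidden, output


def random_weights(n_units, n_inputs, rng, scale=0.5):
    """Hypothetical initialisation: small uniform random weights."""
    return [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
            for _ in range(n_units)]


rng = random.Random(0)
w_hidden = random_weights(N_HIDDEN, N_INPUT, rng)
b_hidden = [0.0] * N_HIDDEN
w_output = random_weights(N_OUTPUT, N_HIDDEN, rng)
b_output = [0.0] * N_OUTPUT

# A made-up 4 by 5 binary pattern, flattened to 20 input states.
pattern = [rng.choice([0.0, 1.0]) for _ in range(N_INPUT)]
hidden, output = forward(pattern, w_hidden, b_hidden, w_output, b_output)
print(hidden)
print(output)
```

Each row of `w_hidden` holds the 20 weights into one hidden unit, which corresponds to one of the 4 by 5 weight grids in the display.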

Learning: The network can be trained by clicking on animate; the label for the animate button will then change to stop. Learning will continue until you click on stop. Even with a small hidden layer, this network takes a long time to train. A (nearly) fully trained set of weights can be loaded by clicking on load trained weights. The learning rate parameter may also be changed; however, a value larger than about 0.5 tends to make the learning unstable. As long as the learning rate is not too large, the error should always decrease.

The network updates its weights by a supervised learning procedure called back-propagation. The desired states of the output units are indicated by the little boxes above the output units. When an output unit should be on, its corresponding box is filled in black.
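
For readers who want to see the shape of a back-propagation update, here is a minimal sketch of one weight update for a single pattern. It rests on several assumptions that the text does not confirm: a squared-error measure, sigmoid units, bias weights added to the weighted sum, and weights updated after every pattern. All names, the initial weights, the input pattern, and the learning rate of 0.25 are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W1, b1, W2, b2):
    """Hidden then output activations: weighted sum plus bias, then sigmoid."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b) for row, b in zip(W2, b2)]
    return h, y


def backprop_step(x, target, W1, b1, W2, b2, lr):
    """One gradient step on E = 0.5 * sum((y - t)^2) (assumed error measure).

    Updates the weights in place and returns the error measured before the update.
    """
    h, y = forward(x, W1, b1, W2, b2)
    # Error signal at each output unit: dE/d(net input), using sigmoid' = y * (1 - y).
    delta_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
    # Propagate the output error signals backwards through W2 (before changing W2).
    delta_hid = [
        hj * (1.0 - hj) * sum(delta_out[k] * W2[k][j] for k in range(len(W2)))
        for j, hj in enumerate(h)
    ]
    # Move every weight and bias a small step against its error gradient.
    for k, dk in enumerate(delta_out):
        for j, hj in enumerate(h):
            W2[k][j] -= lr * dk * hj
        b2[k] -= lr * dk
    for j, dj in enumerate(delta_hid):
        for i, xi in enumerate(x):
            W1[j][i] -= lr * dj * xi
        b1[j] -= lr * dj
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, target))


rng = random.Random(0)
n_in, n_hid, n_out = 20, 6, 4  # sizes from the description
W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [0.0] * n_hid
W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_out)]
b2 = [0.0] * n_out

x = [rng.choice([0.0, 1.0]) for _ in range(n_in)]  # made-up input pattern
target = [1.0, 0.0, 0.0, 0.0]                      # first output unit should be on
errors = [backprop_step(x, target, W1, b1, W2, b2, lr=0.25) for _ in range(200)]
print(errors[0], errors[-1])
```

Repeating the step on one pattern should drive its error down. Training on the full set would cycle through all 40 patterns in the same way.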

McMaster students: Please report any difficulties with this software to your instructor.