Artificial Neural Networks


Anna Liebowitz ’09

SPE Summer 2006

Advisor: Jordan Boyd-Graber

Introduction

Artificial Neural Networks (ANNs) are computer programs designed to mimic what we know of how our brains actually learn and function.  Just as our nervous systems are built out of individual neurons connected to each other in an intricate web, ANNs consist of graphs of nodes systematically connected to one another by edges of varying weights.  A network generally consists of a layer of input nodes, with each input node connected to every node in the next layer (usually a hidden layer) and each hidden node connected in turn to every output node.

[Figure: a three-layer network, with every input node connected to every hidden node and every hidden node connected to every output node]

In theory, nodes can either “fire” or not, depending on their input.  Each node takes the sum of its inputs, and if the sum exceeds some threshold, the node fires and passes its signal on to the nodes in the next layer.  In our computer model, if the node fires, its output value is 1; otherwise the value is 0.  (In practice, the nodes in my model can output any value between 0 and 1, and in other models the nodes can output wider ranges of values, if desired.)  The network as a whole functions because the input to each node is determined both by the signals from the incoming nodes and by the weights on the edges from those nodes to the receiving node.  Each incoming signal is multiplied by the weight on its edge: if input node 1 has a value of 1 and the weight between input 1 and hidden node 2 is 3, then hidden node 2 will receive a signal of 1 × 3 = 3 from input node 1.
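To make the arithmetic concrete, here is one way to sketch a single node’s computation in Python.  This is only an illustration, not code from my program; the sigmoid function is a common way to get outputs between 0 and 1, but other squashing functions work too.

import math

def sigmoid(x):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights):
    # Each incoming signal is multiplied by the weight on its edge,
    # and the node fires based on the weighted sum.
    total = sum(value * weight for value, weight in zip(inputs, weights))
    return sigmoid(total)

# Input node 1 sends a 1 across an edge of weight 3; input node 2 sends a 0.
print(node_output([1, 0], [3, -2]))   # sigmoid(1*3 + 0*-2) = sigmoid(3), about 0.95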


“This is all very nice, but what can it do?” I hear you cry.  ANNs seem like an interesting system (perhaps) but not an especially useful one, until we build a network that can train itself: one that can adjust its own weights until it recognizes desired patterns and gives the correct output.  A simple network like the one shown above can act as a logic gate, computing such functions as AND, OR, and XOR (depending on the weights).  More importantly, if we initialize the weights randomly in the beginning, we can train the network to recognize specific patterns.  We give the network a set of inputs and expected outputs, such as the following XOR set:


Input 1    Input 2    Output
   0          0          0
   0          1          1
   1          0          1
   1          1          0


We are asking the network to output a 1 only when exactly one of the two inputs is 1.  Because the output of the network depends on the weights between the network’s nodes, the randomly initialized network will at first produce an essentially arbitrary answer (anywhere between 0 and 1).  But the network will compute its error by subtracting its actual output from the expected output that we give it, and then it will change its weights accordingly, to make the error smaller.  After several thousand rounds of training itself on the same set of data, the network will “learn” to recognize the XOR pattern.
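The standard procedure for that “change its weights accordingly” step is the backpropagation algorithm, which nudges each weight in proportion to its share of the error.  The sketch below shows the whole training loop in Python; it is an illustration of the technique rather than my actual program, and the learning rate, the bias weights, and the three hidden nodes are choices made just for this example.

import math
import random

def sigmoid(x):
    # Squash the weighted sum into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)

# The XOR training set: each example is (inputs, expected output).
examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

n_in, n_hid = 2, 3
# Random initial weights; the extra weight on each node acts as a bias.
w_hid = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
w_out = [random.uniform(-1, 1) for _ in range(n_hid + 1)]
rate = 0.5

for _ in range(10000):  # several thousand rounds of training
    for inputs, expected in examples:
        # Forward pass: each node outputs the squashed weighted sum of its inputs.
        x = inputs + [1]  # the constant 1 feeds the bias weight
        hidden = [sigmoid(sum(w * v for w, v in zip(ws, x))) for ws in w_hid]
        h = hidden + [1]
        output = sigmoid(sum(w * v for w, v in zip(w_out, h)))

        # Error: expected output minus actual output, as described above.
        err = expected - output
        delta_out = err * output * (1 - output)
        delta_hid = [w_out[j] * delta_out * hidden[j] * (1 - hidden[j])
                     for j in range(n_hid)]

        # Nudge every weight in the direction that shrinks the error.
        for j in range(n_hid + 1):
            w_out[j] += rate * delta_out * h[j]
        for j in range(n_hid):
            for i in range(n_in + 1):
                w_hid[j][i] += rate * delta_hid[j] * x[i]

for inputs, expected in examples:
    x = inputs + [1]
    h = [sigmoid(sum(w * v for w, v in zip(ws, x))) for ws in w_hid] + [1]
    output = sigmoid(sum(w * v for w, v in zip(w_out, h)))
    print(inputs, "->", round(output, 2), "(expected", str(expected) + ")")

After ten thousand passes over the four examples, the printed outputs should sit close to 0 or 1 in the XOR pattern.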


Using My Program


My network requires three programs to run, which can be found here: Program Files.


The program creates a default network with three layers, though you can specify the number of layers on the command line if you prefer.  You must then pass the program the number of nodes in each layer (input, hidden layers [if there are any], and output) and the number of training sets, followed by each example input and its expected output.  Examples can be found here: Examples.
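For instance, training the XOR set above on a network with two input nodes, two hidden nodes, and one output node would mean giving the program data of roughly the following shape.  (The exact formatting my programs expect is shown in the Examples files; this is only an illustration of the order just described.)

2 2 1      number of nodes in the input, hidden, and output layers
4          number of training sets
0 0   0    each example's inputs, followed by its expected output
0 1   1
1 0   1
1 1   0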


Applications of Artificial Neural Networks


More complex networks can do more than recognize simple patterns: they can pick patterns out of huge amounts of data, which allows them to learn such complex tasks as handwriting recognition (in which the network must learn to correlate patterns of pixels with specific letters) and language processing (in which the network must learn to predict the next word in a sequence of words).


The latter application requires that the network have some sort of memory, because the desired output cannot be discerned from the current input alone.  The network cannot predict a word from the single previous word; it must have access to a string of previous words.  An Elman network, a type of ANN that attempts to solve this problem, provides memory by adding an extra layer of nodes, the context layer.  The hidden layer connects to the output layer, as usual, but each hidden node also connects to a node in the context layer.  The context layer, in turn, passes its information back to the hidden nodes in the next round of training.  That way, the input to the hidden nodes depends not only on the current input, but also, indirectly, on all of the previous inputs.
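Here is a rough sketch, in Python, of the forward step of an Elman network.  The layer sizes and the sigmoid are illustrative choices; the thing to notice is that the context values mixed into the hidden layer are simply the hidden values saved from the previous step.

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
n_in, n_hid, n_out = 4, 3, 4

# Hidden nodes see the current input plus the context layer, which holds
# the hidden layer's values from the previous step.
w_hid = [[random.uniform(-1, 1) for _ in range(n_in + n_hid)] for _ in range(n_hid)]
w_out = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_out)]

def step(inputs, context):
    x = inputs + context
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x))) for ws in w_hid]
    output = [sigmoid(sum(w * v for w, v in zip(ws, hidden))) for ws in w_out]
    return output, hidden   # the new hidden values become the next context

# Feed a short sequence (each word coded as a 1 in one position); each step's
# output depends, through the context layer, on everything seen so far.
context = [0.0] * n_hid
for inputs in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]):
    output, context = step(inputs, context)
    print([round(o, 2) for o in output])

Training proceeds with the same kind of weight adjustment as before; the context layer simply gives the network a trace of its own recent history to learn from.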