=============================================================================
        README file for the example files letseq.xxx
=============================================================================

Description:    Elman network (partially recurrent network)
============    for the task of predicting a letter sequence

The task of this partially recurrent network is to predict a letter
sequence composed of the letters b, d, g, a, i, u. The problem is
described in detail in

        J.L. Elman: Finding Structure in Time.
        Cognitive Science, 14:179-211, 1990.

See the user manual for a detailed description of Elman networks and
their usage.


Pattern-Files:  letseq_train.pat
==============  letseq_test.pat

The six input units code the input letters as 6-bit binary vectors.
(Note that in SNNS all inputs and outputs are treated as real values.)
The coding is as follows:

        letter  Consonant  Vowel  Interrupted  High  Back  Voiced
        b           1        0         1        0     0      1
        d           1        0         1        1     0      1
        g           1        0         1        0     1      1
        a           0        1         0        0     1      1
        i           0        1         0        1     0      1
        u           0        1         0        1     1      1

A random letter sequence of length 1000 was generated from the
consonants of this set. From this sequence a new sequence was generated
by replacing every consonant of the original sequence according to the
following rules:

        b -> ba
        d -> dii
        g -> guuu

In the resulting sequence the consonants are still random, but the type
and number of the vowels are determined by the preceding consonant.
Both pattern files may be used for the standard Elman network
letseq_elman.net and the hierarchical Elman network letseq_h_elm.net.
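The generation of the training sequence can be reproduced with a short
script. Below is a minimal sketch in Python; the seed is an arbitrary
assumption for reproducibility, and the pairs are printed as plain
Python lists rather than in the SNNS .pat file format:

        import random

        # 6-bit feature coding from the table above:
        # [Consonant, Vowel, Interrupted, High, Back, Voiced]
        CODING = {
            "b": [1, 0, 1, 0, 0, 1],
            "d": [1, 0, 1, 1, 0, 1],
            "g": [1, 0, 1, 0, 1, 1],
            "a": [0, 1, 0, 0, 1, 1],
            "i": [0, 1, 0, 1, 0, 1],
            "u": [0, 1, 0, 1, 1, 1],
        }

        # Replacement rules: each consonant expands to a fixed
        # consonant-vowel group.
        RULES = {"b": "ba", "d": "dii", "g": "guuu"}

        random.seed(42)  # hypothetical seed, reproducibility only

        # 1. Random consonant sequence of length 1000.
        consonants = random.choices("bdg", k=1000)

        # 2. Expand every consonant according to the rules.
        sequence = "".join(RULES[c] for c in consonants)

        # 3. Build (input, target) pairs: the target is the next letter.
        patterns = [(CODING[x], CODING[y])
                    for x, y in zip(sequence, sequence[1:])]

        print(sequence[:20])   # first 20 letters of the sequence
        print(patterns[0])     # (input vector, target vector) of pair 1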
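The network files described in the next section use context units:
additional units that store a copy of the hidden activations of the
previous time step and feed them back as extra input, which is what
lets the network exploit the sequence structure. The following is a
minimal sketch of one forward step in Python/NumPy, using the
dimensions of letseq_elman.net; the random weights and the logistic
activation function are illustrative assumptions, not the trained
SNNS network:

        import numpy as np

        def logistic(x):
            return 1.0 / (1.0 + np.exp(-x))

        n_in, n_hidden, n_out = 6, 24, 6  # as in letseq_elman.net

        rng = np.random.default_rng(0)    # hypothetical random weights
        W_in  = rng.normal(scale=0.5, size=(n_hidden, n_in))
        W_ctx = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
        W_out = rng.normal(scale=0.5, size=(n_out, n_hidden))

        def step(x, context):
            # The hidden layer sees the current input plus the stored
            # copy of the previous hidden activations (context units).
            hidden = logistic(W_in @ x + W_ctx @ context)
            output = logistic(W_out @ hidden)
            return output, hidden         # hidden becomes next context

        context = np.zeros(n_hidden)      # context units start at zero
        b = np.array([1, 0, 1, 0, 0, 1], dtype=float)  # coding of 'b'
        a = np.array([0, 1, 0, 0, 1, 1], dtype=float)  # coding of 'a'
        for x in (b, a):
            output, context = step(x, context)
            print(np.round(output, 2))

The hierarchical variant described below stacks two such hidden/context
layer pairs of 8 units each, which is why it gets by with far fewer
weights.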
Network-Files:  letseq_elman.net
==============  letseq_h_elm.net

The file letseq_elman.net contains a trained Elman network for the task
of predicting the semi-random letter sequence described above. This
network has the following dimensions:

         6 input units
        24 hidden units in one hidden layer
        24 context units
         6 output units

The file letseq_h_elm.net contains a trained hierarchical Elman network
for the same task. This network has the following dimensions:

         6 input units
         8 hidden units in the first hidden layer
         8 context units in the first context layer
         8 hidden units in the second hidden layer
         8 context units in the second context layer
         6 output units

The second network has predictive power similar to the first, but far
fewer weights.

The standard configuration files for these network files are
letseq_elman.cfg and letseq_h_elm.cfg (one 2D display only).


Hints:
======

The easiest way to create Elman networks is with the BIGNET panel,
called from the manager panel. All network parameters can then be
specified in a special Elman network creation panel, which is opened
with the respective button in the BIGNET panel.

If you want to train your own Elman network from scratch, remember to
set the proper initialization function and initialization parameters.
Also remember to set the update function to JE_Order or JE_Special,
depending on your task (see the SNNS user manual for more details).

You may choose between four different learning functions: JE_BP
(backpropagation), JE_BP_Momentum, JE_Quickprop, and JE_Rprop. The
example was trained with a combination of JE_BP and JE_Rprop: 10 cycles
of JE_BP with learning rate 0.5 (1st parameter), followed by 10 cycles
of JE_Rprop with parameters 0.1 (1st) and 50.0 (2nd).

The behaviour of this network can be visualized very nicely with the
network analyzer tool, which is opened from the manager panel with the
ANALYZER button. Then proceed as follows:

Press ON and LINE among the buttons at the right (so that both buttons
are highlighted). Press SETUP and choose the T-E graph in the network
analyzer setup panel. Choose the following values for the axes:

        axis    min     max     unit    grid
        x       0.0     50.0    -       10
        y       0.0     1.0     -       10

This specifies the display area to be a time series of length 50 with
error range [0, 1], in which the sum of the squared errors is displayed
(choose the middle error button). Choose m-test: 10 in the network
analyzer setup panel to test 10 patterns in a multiple-input test
sequence (you may also choose to test more or fewer input patterns).
Finally, press the button M-TEST to test the trained network on the
specified number of input patterns.

You can see that the prediction error is almost zero for all predicted
vowels, because the network can predict them from the preceding
consonant. The prediction error for the consonants, which still appear
randomly, produces the sharp peaks of the error curve.

=============================================================================
        End of README file
=============================================================================