=============================================================================
        README file for the example files letseq.xxx
=============================================================================

Description:    Elman network (partially recurrent network)
============    for the task of predicting a letter sequence

The task of this partially recurrent network is to predict a letter
sequence composed of the letters b, d, g, a, i, u. The problem is
described in detail in

        J.L. Elman: Finding Structure in Time.
        Cognitive Science, 14:179-211, 1990.

See the user manual for a detailed description of Elman networks and
their usage.


Pattern-Files:  letseq_train.pat
==============  letseq_test.pat

The six input units code the input letters as 6-bit binary vectors.
(Note that in SNNS all inputs and outputs are treated as real values.)
The coding is as follows:

        letter  Consonant  Vowel  Interrupted  High  Back  Voiced
        b           1        0         1        0     0      1
        d           1        0         1        1     0      1
        g           1        0         1        0     1      1
        a           0        1         0        0     1      1
        i           0        1         0        1     0      1
        u           0        1         0        1     1      1

A random letter sequence of length 1000 was generated from the
consonants of this set. From this sequence a new sequence was generated
by replacing every consonant of the original sequence according to the
following rules:

        b -> ba
        d -> dii
        g -> guuu

In the resulting sequence the consonants are still random, but the type
and number of the vowels are determined by the preceding consonant.
Both pattern files may be used for the standard Elman network
letseq_elman.net and the hierarchical Elman network letseq_h_elm.net.
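The generation of the training sequence can be reproduced with a short
script. Below is a minimal sketch in Python; the seed is an arbitrary
assumption for reproducibility, and the pairs are printed as plain
Python lists rather than in the SNNS .pat file format:

        import random

        # 6-bit feature coding from the table above:
        # [Consonant, Vowel, Interrupted, High, Back, Voiced]
        CODING = {
            "b": [1, 0, 1, 0, 0, 1],
            "d": [1, 0, 1, 1, 0, 1],
            "g": [1, 0, 1, 0, 1, 1],
            "a": [0, 1, 0, 0, 1, 1],
            "i": [0, 1, 0, 1, 0, 1],
            "u": [0, 1, 0, 1, 1, 1],
        }

        # Replacement rules: each consonant expands to a fixed
        # consonant-vowel group.
        RULES = {"b": "ba", "d": "dii", "g": "guuu"}

        random.seed(42)  # hypothetical seed, reproducibility only

        # 1. Random consonant sequence of length 1000.
        consonants = random.choices("bdg", k=1000)

        # 2. Expand every consonant according to the rules.
        sequence = "".join(RULES[c] for c in consonants)

        # 3. Build (input, target) pairs: the target is the next letter.
        patterns = [(CODING[x], CODING[y])
                    for x, y in zip(sequence, sequence[1:])]

        print(sequence[:20])   # first 20 letters of the sequence
        print(patterns[0])     # (input vector, target vector) of pair 1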
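The network files described in the next section use context units:
additional units that store a copy of the hidden activations of the
previous time step and feed them back as extra input, which is what
lets the network exploit the sequence structure. The following is a
minimal sketch of one forward step in Python/NumPy, using the
dimensions of letseq_elman.net; the random weights and the logistic
activation function are illustrative assumptions, not the trained
SNNS network:

        import numpy as np

        def logistic(x):
            return 1.0 / (1.0 + np.exp(-x))

        n_in, n_hidden, n_out = 6, 24, 6  # as in letseq_elman.net

        rng = np.random.default_rng(0)    # hypothetical random weights
        W_in  = rng.normal(scale=0.5, size=(n_hidden, n_in))
        W_ctx = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
        W_out = rng.normal(scale=0.5, size=(n_out, n_hidden))

        def step(x, context):
            # The hidden layer sees the current input plus the stored
            # copy of the previous hidden activations (context units).
            hidden = logistic(W_in @ x + W_ctx @ context)
            output = logistic(W_out @ hidden)
            return output, hidden         # hidden becomes next context

        context = np.zeros(n_hidden)      # context units start at zero
        b = np.array([1, 0, 1, 0, 0, 1], dtype=float)  # coding of 'b'
        a = np.array([0, 1, 0, 0, 1, 1], dtype=float)  # coding of 'a'
        for x in (b, a):
            output, context = step(x, context)
            print(np.round(output, 2))

The hierarchical variant described below stacks two such hidden/context
layer pairs of 8 units each, which is why it gets by with far fewer
weights.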
Network-Files:  letseq_elman.net
==============  letseq_h_elm.net

The file letseq_elman.net contains a trained Elman network for the task
of predicting the semi-random letter sequence described above. This
network has the following dimensions:

         6 input units
        24 hidden units in one hidden layer
        24 context units
         6 output units

The file letseq_h_elm.net contains a trained hierarchical Elman network
for the same task. This network has the following dimensions:

         6 input units
         8 hidden units in the first hidden layer
         8 context units in the first context layer
         8 hidden units in the second hidden layer
         8 context units in the second context layer
         6 output units

The second network has predictive power similar to the first, but far
fewer weights.

The standard configuration files for these network files are
letseq_elman.cfg and letseq_h_elm.cfg (one 2D display only).


Hints:
======

The easiest way to create Elman networks is with the BIGNET panel,
called from the manager panel. All network parameters can then be
specified in a special Elman network creation panel, which is opened
with the respective button in the BIGNET panel.

If you want to train your own Elman network from scratch, remember to
set the proper initialization function and initialization parameters.
Also remember to set the update function to JE_Order or JE_Special,
depending on your task (see the SNNS user manual for more details).

You may choose between four different learning functions: JE_BP
(backpropagation), JE_BP_Momentum, JE_Quickprop, and JE_Rprop. The
example was trained with a combination of JE_BP and JE_Rprop: 10 cycles
of JE_BP with learning rate 0.5 (1st parameter), followed by 10 cycles
of JE_Rprop with parameters 0.1 (1st) and 50.0 (2nd).

The behaviour of this network can be visualized very nicely with the
network analyzer tool, which is opened from the manager panel with the
ANALYZER button. Then proceed as follows:

Press ON and LINE among the buttons at the right (so that both buttons
are highlighted). Press SETUP and choose the T-E graph in the network
analyzer setup panel. Choose the following values for the axes:

        axis    min     max     unit    grid
        x       0.0     50.0    -       10
        y       0.0     1.0     -       10

This specifies the display area to be a time series of length 50 with
error range [0, 1], in which the sum of the squared errors is displayed
(choose the middle error button). Choose m-test: 10 in the network
analyzer setup panel to test 10 patterns in a multiple-input test
sequence (you may also choose to test more or fewer input patterns).
Finally, press the button M-TEST to test the trained network on the
specified number of input patterns.

You can see that the prediction error is almost zero for all predicted
vowels, because the network can predict them from the preceding
consonant. The prediction error for the consonants, which still appear
randomly, produces the sharp peaks of the error curve.

=============================================================================
        End of README file
=============================================================================