Peter Tino

University of Birmingham

"Markovian Architectural Bias of Recurrent Neural Networks "

Abstract:

Recurrent neural networks (RNNs) are neural networks with feedback connections that endow them with a form of 'neural memory'. RNNs have been widely studied, among other things, as a continuous state-space metaphor for learning and representing symbolic input streams, for example grammatical strings of a language.

It has been quite common to interpret clusters of activations in recurrent neurons as 'abstract states' discovered by the RNN during training. Critics of this heuristic argue that clustering in the recurrent layer may not reflect any meaningful information-processing states, since well-separated clusters already appear in the state space of RNNs initialized with 'small weights' (a common and well-motivated RNN initialization strategy) even PRIOR to any training.
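As a concrete illustration of this observation, here is a minimal sketch (a vanilla tanh RNN with one-hot inputs; the network sizes and the small-weight scale are illustrative assumptions, not values from the papers) that drives an untrained small-weight network with a random symbol stream and shows the recurrent activations already separating according to the most recent input symbol:

```python
import numpy as np

rng = np.random.default_rng(0)

# Untrained vanilla RNN with small random weights (all values assumed).
n_hidden, n_symbols, sigma = 2, 2, 0.5
W_in  = rng.normal(0, sigma, (n_hidden, n_symbols))
W_rec = rng.normal(0, sigma, (n_hidden, n_hidden))

def run(symbols):
    """Collect recurrent activations while feeding one-hot symbols."""
    h, states = np.zeros(n_hidden), []
    for s in symbols:
        h = np.tanh(W_in @ np.eye(n_symbols)[s] + W_rec @ h)
        states.append(h.copy())
    return np.array(states)

stream = rng.integers(0, n_symbols, 2000)
H = run(stream)

# Mean activation conditioned on the last input symbol: the conditional
# means are well separated, i.e. the states cluster by recent input
# history even though no training has taken place.
for s in range(n_symbols):
    print(s, H[stream == s].mean(axis=0))
```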

Perhaps surprisingly, I will show that the clusters of recurrent activations emerging prior to training are actually meaningful and correspond to Markovian prediction contexts. By applying knowledge-extraction methods traditionally used on trained RNNs to UNTRAINED networks, one can construct predictive models corresponding to a class of Markov models called variable memory length Markov models.
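A minimal sketch of this extraction idea (assumptions: the same style of untrained tanh RNN as above, a toy first-order Markov source, and a crude sign-based quantization of the recurrent state standing in for the usual clustering step):

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)

# Untrained RNN with small weights (all parameter values assumed).
n_hidden, n_symbols = 4, 2
W_in  = rng.normal(0, 0.5, (n_hidden, n_symbols))
W_rec = rng.normal(0, 0.5, (n_hidden, n_hidden))

# Toy first-order Markov source over two symbols.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
stream = [0]
for _ in range(5000):
    stream.append(rng.choice(n_symbols, p=P[stream[-1]]))

# Quantized recurrent states serve as prediction contexts:
# count the next symbol observed in each context.
h = np.zeros(n_hidden)
counts = defaultdict(lambda: np.zeros(n_symbols))
context = None
for s in stream:
    if context is not None:
        counts[context][s] += 1
    h = np.tanh(W_in @ np.eye(n_symbols)[s] + W_rec @ h)
    context = tuple(h > 0)

# The untrained network's contexts approximately recover the source
# statistics, acting like the prediction contexts of a variable
# memory length Markov model.
for ctx, c in sorted(counts.items()):
    print(ctx, np.round(c / c.sum(), 2))
```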

I will sketch two different perspectives on the phenomenon of the architectural bias of RNNs towards Markov models (definite memory machines):

1. claims, provable in the framework of statistical learning theory, showing the advantage of starting learning with 'simple' RNNs having small weights (a form of Ockham's razor)

and

2. fractal analysis of recurrent activation patterns, showing a direct relation between the complexity of the input stream (its topological entropy) and the geometric complexity of the activation clusters (their fractal dimension); a box-counting sketch follows below.
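A minimal sketch of the second point (assumptions: the same untrained small-weight RNN as above, driven here by an unconstrained random stream; box counting is one standard estimator of fractal dimension and stands in for the papers' more careful analysis):

```python
import numpy as np

rng = np.random.default_rng(2)

# Untrained small-weight RNN (all parameter values assumed).
n_hidden, n_symbols = 2, 2
W_in  = rng.normal(0, 0.5, (n_hidden, n_symbols))
W_rec = rng.normal(0, 0.5, (n_hidden, n_hidden))

# Collect a long trajectory of recurrent activations.
h, pts = np.zeros(n_hidden), []
for s in rng.integers(0, n_symbols, 20000):
    h = np.tanh(W_in @ np.eye(n_symbols)[s] + W_rec @ h)
    pts.append(h.copy())
pts = np.array(pts)

def box_dim(points, epsilons):
    """Box-counting dimension: slope of log N(eps) vs log(1/eps)."""
    ns = [len({tuple(np.floor(p / e).astype(int)) for p in points})
          for e in epsilons]
    return np.polyfit(np.log(1.0 / np.array(epsilons)), np.log(ns), 1)[0]

print(box_dim(pts, [0.2, 0.1, 0.05, 0.02, 0.01]))
```

Restricting the driving stream (lowering its topological entropy) shrinks the set of reachable activation patterns, and the estimated dimension drops accordingly.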

Papers related to the talk are on-line at: http://www.cs.bham.ac.uk/~pxt/my.publ.html

* P. Tino, B. Hammer: Architectural Bias in Recurrent Neural Networks - Fractal Analysis. Neural Computation, in press, 2003.

* B. Hammer, P. Tino: Recurrent Neural Networks with Small Weights Implement Definite Memory Machines. Neural Computation, in press, 2003.

* P. Tino, M. Cernansky, L. Benuskova: Markovian Architectural Bias of Recurrent Neural Networks. In: Intelligent Technologies - Theory and Applications (Frontiers in AI and Applications, vol. 76), P. Sincak, J. Vascak, V. Kvasnicka and J. Pospichal (eds), pp. 17-23, IOS Press, Amsterdam, 2002.
