In our implementation and examples we have restricted ourselves to feed-forward neural networks (FFN). They are the neural networks most often used for regression and classification tasks, but in daily use they are too often treated as black boxes: the advantage of their flexibility is largely offset by the non-transparency of the training process and of the final model. However, the idea of visualizing the inner geometry of the network can be applied to nearly any kind of neural network.
To understand how a neural network works and what it learns, it is necessary to understand the topological behaviour of the weights of the network during and after the training process. In statistical modelling it is often desirable to interpret the model, that is, to find out which variables contribute to one or more response variables and of what kind the contributions are (e.g., linear, nonlinear).
To see the topological behaviour of the weights, we visualize them using a statistical technique called "multidimensional scaling" (MDS). We interpret the weights of the FFN as distances between the locations of the units. For this purpose we propose an intuitive nonlinear transformation of the weights and a "better"-behaving linear transformation. Thus nearby units are connected by large weights.
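The text does not spell out the exact formulas of the two transformations, so the following is only an illustrative sketch of what such weight-to-distance mappings could look like: a nonlinear transform where the distance decays exponentially with the weight's magnitude, and a linear alternative rescaled against the largest absolute weight. Both satisfy the stated requirement that large weights yield small distances.

```python
import math

# Two candidate weight-to-distance transforms (illustrative sketches;
# the text does not give the exact formulas used).  Both map a large
# absolute weight to a small distance, so strongly connected units
# are placed near each other in the MDS embedding.

def nonlinear_distance(w):
    # Intuitive nonlinear transform: distance decays exponentially
    # with the magnitude of the weight.
    return math.exp(-abs(w))

def linear_distance(w, w_max):
    # A linear alternative: rescale so the largest absolute weight
    # maps to distance 0 and a zero weight to distance 1.
    return 1.0 - abs(w) / w_max
```

Either transform produces a symmetric distance matrix from the (absolute) weights, which is what MDS needs as input.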
Based on these distances, we then select the locations of the units through an optimization algorithm. The drawback of MDS is that the true structure is high-dimensional: if we use two or three dimensions for visualization, we obtain only the best two- or three-dimensional approximation of the true structure. Nevertheless, we should be able to grasp some important properties of the network structure.
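The optimization step can be sketched in a few lines. The example below is a minimal, self-contained illustration (not the paper's implementation): a hypothetical weight matrix for a tiny FFN with three input units and one output unit is turned into distances via an assumed exp(-|w|) transform, and a 2-D layout is found by plain gradient descent on the raw stress, i.e. the squared mismatch between embedded and target distances.

```python
import math
import random

# Hypothetical weight matrix of a tiny FFN: units 0-2 are inputs,
# unit 3 is the output.  Values and the exp(-|w|) transform are
# illustrative assumptions, not the paper's exact formulas.
W = [
    [0.0, 0.0, 0.0, 2.5],
    [0.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 0.0, 0.1],
    [2.5, 2.0, 0.1, 0.0],
]
n = len(W)

# Large weights -> small distances, so strongly connected units
# should end up close together in the embedding.
D = [[0.0 if i == j else math.exp(-abs(W[i][j])) for j in range(n)]
     for i in range(n)]

def stress(X):
    """Raw stress: squared mismatch of embedded vs. target distances."""
    return sum((math.dist(X[i], X[j]) - D[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

# Random 2-D starting positions, improved by gradient descent.
random.seed(0)
X = [[random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
     for _ in range(n)]
initial_stress = stress(X)

lr = 0.05
for _ in range(500):
    grad = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(X[i], X[j]) or 1e-9
            c = 2.0 * (d - D[i][j]) / d   # d/dX[i] of (d - D[i][j])**2
            grad[i][0] += c * (X[i][0] - X[j][0])
            grad[i][1] += c * (X[i][1] - X[j][1])
    for i in range(n):
        X[i][0] -= lr * grad[i][0]
        X[i][1] -= lr * grad[i][1]
```

In the resulting layout, input unit 0 (weight 2.5 to the output) should land much closer to the output unit than input unit 2 (weight 0.1), mirroring the reading of Figure 1 that important variables sit near the output unit.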
Implementation in XploRe 3.2.
The implementation consists of four commands:
NNINIT, which checks whether a connection and weight matrix (CWM) describes an FFN; NNFUNC, which computes the output for a specific input and CWM; NNVISU, which visualizes the geometry of the network; and NNANAL, which allows the analysis of the input and output of a single unit. The macro NN finally allows one to generate a multi-layer FFN, to train it (via a test set or cross-validation and early stopping), and to analyze it for classification or regression.
proc()=main()
  func("nn")                                  ; load the NN-macro
  x=read("kredit")                            ; load the credit data
  t=read("tkredit")                           ; load training, test and validation set
  y=x[,1]                                     ; create y
  x=x[,2:21]                                  ; create x
  x=(x-mean(x)´)./sqrt(var(x)´)~matrix(1000)  ; standardize the data
  nn(x y t)                                   ; run the NN-macro
endp
1. We apply our technique to the credit-scoring data of Fahrmeir and Hammerle (1981) with different numbers of units in the hidden layer (see Figure 1 for the network with no hidden units). The aim is to predict from some variables whether we have a "good" or a "bad" client, i.e., whether repayment of the credit will be a problem.
Figure 1: The network with the best generalization for the logistic regression with one output unit. We can easily see that there are 3 important variables (near the output unit o) and 5 less important variables (far away from the output unit).
2. The second application comes from a very popular field of molecular biology: protein structure prediction. First we consider only one of the simplest cases. As input variables we choose the relative amino acid frequencies within the protein. Secondary structural elements (e.g., alpha-helix, beta-strand, coil) are used for a rather rough definition of four supersecondary structural classes. Thus we have four output units here.