create_class_mlp ( : : NumInput, NumHidden, NumOutput, OutputFunction, Preprocessing, NumComponents, RandSeed : MLPHandle )

Create a multilayer perceptron for classification or regression.

create_class_mlp creates a neural net in the form of a multilayer perceptron (MLP), which can be used for classification or regression (function approximation), depending on how OutputFunction is set. The MLP consists of three layers: an input layer with NumInput input variables (units, neurons), a hidden layer with NumHidden units, and an output layer with NumOutput output variables. The MLP performs the following steps to calculate the activations z_j of the hidden units from the input data x_i (the so-called feature vector):

\[ a_j^{(1)} = \sum_{i=1}^{n_i} w_{ji}^{(1)} x_i + b_j^{(1)}, \qquad j = 1, \ldots, n_h \]

\[ z_j = \tanh\left( a_j^{(1)} \right) \]
Here, the matrix w_ji^(1) and the vector b_j^(1) are the weights of the input layer (first layer) of the MLP. In the hidden layer (second layer), the activations z_j are transformed in a first step by using linear combinations of the variables in an analogous manner as above:
\[ a_k^{(2)} = \sum_{j=1}^{n_h} w_{kj}^{(2)} z_j + b_k^{(2)}, \qquad k = 1, \ldots, n_o \]
Here, the matrix w_kj^(2) and the vector b_k^(2) are the weights of the second layer of the MLP.
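
The two linear stages above can be traced by hand with plain tuple arithmetic. The following HDevelop lines are only an illustrative sketch: the sizes, the weight values, and the row-major storage of the weight matrices in the tuples W1 and W2 are assumptions made for this example and do not reflect how create_class_mlp stores its weights internally.

```hdevelop
* Hypothetical sizes and values, chosen only for illustration
NIn := 2
NHidden := 3
NOut := 2
X := [0.5,-1.0]
* Assumed row-major storage: W1[J*NIn + I] plays the role of w_ji^(1)
W1 := [0.1,0.2,-0.3,0.4,0.5,-0.6]
B1 := [0.0,0.1,-0.1]
* Assumed row-major storage: W2[K*NHidden + J] plays the role of w_kj^(2)
W2 := [0.3,-0.2,0.1,-0.4,0.6,0.2]
B2 := [0.05,-0.05]
* Hidden units: z_j = tanh(a_j^(1))
Z := []
for J := 0 to NHidden-1 by 1
    A1 := sum(W1[J*NIn:(J+1)*NIn-1] * X) + B1[J]
    Z := [Z,tanh(A1)]
endfor
* Linear combinations a_k^(2), before the output activation function
A2 := []
for K := 0 to NOut-1 by 1
    A2 := [A2,sum(W2[K*NHidden:(K+1)*NHidden-1] * Z) + B2[K]]
endfor
```

After the second loop, A2 holds one value per output variable; the output activation function is then applied to these values.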

The activation function used in the output layer can be determined by setting OutputFunction. For OutputFunction = 'linear', the data are simply copied:

\[ y_k = a_k^{(2)}, \qquad k = 1, \ldots, n_o \]
This type of activation function should be used for regression problems (function approximation). This activation function is not suited for classification problems.

For OutputFunction = 'logistic', the activations are computed as follows:

\[ y_k = \frac{1}{1 + \exp\left( -a_k^{(2)} \right)}, \qquad k = 1, \ldots, n_o \]
This type of activation function should be used for classification problems with multiple (NumOutput) independent logical attributes as output. This kind of classification problem is relatively rare in practice.

For OutputFunction = 'softmax', the activations are computed as follows:


\[ y_k = \frac{\exp\left( a_k^{(2)} \right)}{\sum_{l=1}^{n_o} \exp\left( a_l^{(2)} \right)}, \qquad k = 1, \ldots, n_o \]

This type of activation function should be used for common classification problems with multiple (NumOutput) mutually exclusive classes as output. In particular, OutputFunction = 'softmax' must be used for the classification of pixel data with classify_image_class_mlp.
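
To compare the three output functions, the formulas above can be applied to the same values of a_k^(2). The following HDevelop lines are a minimal sketch with hypothetical input values; they only reproduce the formulas with tuple arithmetic and are not part of the operator interface.

```hdevelop
* Hypothetical values a_k^(2) for NumOutput = 3
A2 := [1.0,2.0,0.5]
* 'linear': the values are copied unchanged
YLinear := A2
* 'logistic': each output is mapped independently into the interval (0,1)
YLogistic := 1.0 / (1.0 + exp(-A2))
* 'softmax': the outputs are positive and sum to 1
ExpA2 := exp(A2)
YSoftmax := ExpA2 / sum(ExpA2)
```

Because the softmax outputs are coupled through the common denominator, they can be read as a distribution over mutually exclusive classes, whereas the logistic outputs are computed independently of each other.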

The parameters Preprocessing and NumComponents can be used to specify a preprocessing of the data (i.e., the feature vectors). For Preprocessing = 'none', the data are passed unaltered to the MLP. NumComponents is ignored in this case.

For Preprocessing = 'normalization', the data are normalized by subtracting the mean of each component and dividing the result by the standard deviation of that component. Hence, each component of the transformed feature vectors has a mean of 0 and a standard deviation of 1. The normalization does not change the length of the feature vector. NumComponents is ignored in this case. This transformation can be used if the mean and standard deviation of the data differ substantially from 0 and 1, respectively, or if the components of the data are measured in different units (e.g., if some of the data are gray value features and some are region features, or if region features are mixed, e.g., 'circularity' (unit: scalar) and 'area' (unit: pixel squared)). In these cases, the training of the net will typically require fewer iterations than without normalization.
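
The effect of the normalization on features with very different scales can be sketched with tuple arithmetic. The feature values below are hypothetical, and the use of the HDevelop functions mean() and deviation() is an assumption made for illustration; the exact estimator that the operator uses internally is not specified here.

```hdevelop
* Hypothetical values of the feature 'area' (pixel squared) over five samples
Area := [1200.0,1500.0,900.0,1800.0,1100.0]
* Hypothetical values of the feature 'circularity' (scalar) over the same samples
Circularity := [0.81,0.64,0.92,0.55,0.77]
* Subtract the mean of each component and divide by its standard deviation
AreaNorm := (Area - mean(Area)) / deviation(Area)
CircularityNorm := (Circularity - mean(Circularity)) / deviation(Circularity)
```

After this step both components vary on a comparable scale, although the raw values differ by several orders of magnitude.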

For Preprocessing = 'principal_components', a principal component analysis is performed. First, the data are normalized (see above). Then, an orthogonal transformation (a rotation in the feature space) that decorrelates the data is computed. After the transformation, the mean of the data is 0 and the covariance matrix of the data is a diagonal matrix. The transformation is chosen such that the first components of the transformed feature vector contain the most variation. With this, it is possible to omit the last components of the feature vector, which typically are mainly influenced by noise, without losing a large amount of information. The parameter NumComponents can be used to determine how many of the transformed feature vector components should be used. Up to NumInput components can be selected. The operator get_prep_info_class_mlp can be used to determine how much information each transformed component contains. Hence, it aids the selection of NumComponents. Like data normalization, this transformation can be used if the mean and standard deviation of the data differ substantially from 0 and 1, respectively, or if the components of the data are measured in different units. In addition, this transformation is useful if the data are expected to be highly correlated.
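
One possible way to select NumComponents with get_prep_info_class_mlp is sketched below. The sizes, the synthetic samples, and the rule of keeping the smallest number of components that covers 90% of the information are assumptions made for this example; in practice, the threshold should be chosen by inspecting InformationCont and CumInformationCont for the actual data.

```hdevelop
* Hypothetical sizes; the synthetic samples only make the sketch runnable
NIn := 4
NHidden := 3
NOut := 2
* Create an initial MLP that keeps all components
create_class_mlp (NIn, NHidden, NOut, 'softmax', 'principal_components', NIn, 42, MLPHandle)
for J := 0 to 199 by 1
    Class := J % 2
    V := rand(1) + Class
    * Components 0 and 1 are strongly correlated, 2 and 3 are independent
    Data := [V,2.0 * V + 0.05 * rand(1),0.1 * rand(1),0.1 * rand(1)]
    add_sample_class_mlp (MLPHandle, Data, Class)
endfor
* Relative and cumulative information content of the transformed components
get_prep_info_class_mlp (MLPHandle, 'principal_components', InformationCont, CumInformationCont)
* Assumed rule: smallest number of components that covers 90% of the information
NComp := 1
while (NComp < NIn and CumInformationCont[NComp-1] < 0.9)
    NComp := NComp + 1
endwhile
clear_class_mlp (MLPHandle)
* Create the actual MLP with NComp input units
create_class_mlp (NIn, NHidden, NOut, 'softmax', 'principal_components', NComp, 42, MLPHandle)
* The training samples must be added again before this MLP is trained
clear_class_mlp (MLPHandle)
```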

In contrast to the above three transformations, which can be used for all MLP types, the transformation specified by Preprocessing = 'canonical_variates' can only be used if the MLP is used as a classifier with OutputFunction = 'softmax'. The computation of the canonical variates is also called linear discriminant analysis. In this case, a transformation that first normalizes the data and then decorrelates the data on average over all classes is computed. At the same time, the transformation maximally separates the mean values of the individual classes. As for Preprocessing = 'principal_components', the transformed components are sorted by information content, and hence transformed components with little information content can be omitted. For canonical variates, up to min(NumOutput - 1, NumInput) components can be selected. Also in this case, the information content of the transformed components can be determined with get_prep_info_class_mlp. Like principal component analysis, canonical variates can be used to reduce the amount of data without losing a large amount of information, while additionally optimizing the separability of the classes after the data reduction.
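
The upper bound on NumComponents for canonical variates can be computed directly from the other parameters. The following lines are a small sketch with hypothetical sizes; they simply request the largest number of components that the bound above permits.

```hdevelop
* Hypothetical sizes, chosen only for illustration
NIn := 10
NHidden := 5
NOut := 4
* Upper bound for canonical variates: min(NumOutput - 1, NumInput)
MaxComp := min2(NOut - 1,NIn)
create_class_mlp (NIn, NHidden, NOut, 'softmax', 'canonical_variates', MaxComp, 42, MLPHandle)
clear_class_mlp (MLPHandle)
```

With four classes, at most three transformed components are available, regardless of the ten input features.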

For the last two types of transformations ('principal_components' and 'canonical_variates'), the actual number of input units of the MLP is determined by NumComponents, whereas NumInput determines the dimensionality of the input data (i.e., the length of the untransformed feature vector). Hence, by using one of these two transformations, the number of input variables, and thus usually also the number of hidden units, can be reduced. This typically reduces the time needed to train the MLP and to evaluate or classify a feature vector.

Usually, NumHidden should be of the same order of magnitude as NumInput and NumOutput. In many cases, much smaller values of NumHidden already lead to very good classification results. If NumHidden is chosen too large, the MLP may overfit the training data, which typically leads to bad generalization properties, i.e., the MLP learns the training data very well, but does not return very good results on unknown data.

create_class_mlp initializes the weights described above with random numbers. To ensure that the results of training the classifier with train_class_mlp are reproducible, the seed value of the random number generator is passed in RandSeed. If the training results in a relatively large error, it may sometimes be possible to achieve a smaller error by selecting a different value for RandSeed and retraining the MLP.
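
Retraining with several seed values can be sketched as a simple loop. The sizes, the synthetic training data, the list of seeds, and the training parameters below are hypothetical and serve only to make the sketch runnable; the loop keeps the seed with the smallest training error returned by train_class_mlp.

```hdevelop
* Hypothetical sizes and synthetic training data
NIn := 2
NHidden := 3
NOut := 2
NData := 100
* Sample J occupies Features[J*NIn:(J+1)*NIn-1]
Features := []
Classes := []
for J := 0 to NData-1 by 1
    Class := J % 2
    Features := [Features,rand(NIn) + Class]
    Classes := [Classes,Class]
endfor
* Try several seeds and remember the one with the smallest training error
Seeds := [42,43,44]
BestSeed := -1
BestError := -1
for S := 0 to |Seeds|-1 by 1
    create_class_mlp (NIn, NHidden, NOut, 'softmax', 'normalization', NIn, Seeds[S], MLPHandle)
    for J := 0 to NData-1 by 1
        add_sample_class_mlp (MLPHandle, Features[J*NIn:(J+1)*NIn-1], Classes[J])
    endfor
    train_class_mlp (MLPHandle, 200, 1, 0.01, Error, ErrorLog)
    if (BestSeed < 0 or Error < BestError)
        BestError := Error
        BestSeed := Seeds[S]
    endif
    clear_class_mlp (MLPHandle)
endfor
```

Note that a small training error alone does not guarantee good results on unknown data, so the selected MLP should still be checked on independent test data.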

After the MLP has been created, training samples are typically added to the MLP by repeatedly calling add_sample_class_mlp. After this, the MLP is typically trained using train_class_mlp. Afterwards, the MLP can be saved using write_class_mlp. Alternatively, the MLP can be used immediately after training to evaluate data using evaluate_class_mlp or, if the MLP is used as a classifier (i.e., for OutputFunction = 'softmax'), to classify data using classify_class_mlp.


Parameters

NumInput (input_control)
integer -> integer
Number of input variables (features) of the MLP.
Default value: 20
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100
Restriction: NumInput >= 1

NumHidden (input_control)
integer -> integer
Number of hidden units of the MLP.
Default value: 10
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150
Restriction: NumHidden >= 1

NumOutput (input_control)
integer -> integer
Number of output variables (classes) of the MLP.
Default value: 5
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 150
Restriction: NumOutput >= 1

OutputFunction (input_control)
string -> string
Type of the activation function in the output layer of the MLP.
Default value: 'softmax'
List of values: 'linear', 'logistic', 'softmax'

Preprocessing (input_control)
string -> string
Type of preprocessing used to transform the feature vectors.
Default value: 'normalization'
List of values: 'none', 'normalization', 'principal_components', 'canonical_variates'

NumComponents (input_control)
integer -> integer
Preprocessing parameter: Number of transformed features (ignored for Preprocessing = 'none' and Preprocessing = 'normalization').
Default value: 10
Suggested values: 1, 2, 3, 4, 5, 8, 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100
Restriction: NumComponents >= 1

RandSeed (input_control)
integer -> integer
Seed value of the random number generator that is used to initialize the MLP with random values.
Default value: 42

MLPHandle (output_control)
class_mlp -> integer
MLP handle.


Example
* Use the MLP for regression (function approximation)
create_class_mlp (1, NHidden, 1, 'linear', 'none', 1, 42, MLPHandle)
* Generate the training data
* D = [...]
* T = [...]
* Add the training data
for J := 0 to NData-1 by 1
    add_sample_class_mlp (MLPHandle, D[J], T[J])
endfor
* Train the MLP
train_class_mlp (MLPHandle, 200, 0.001, 0.001, Error, ErrorLog)
* Generate test data
* X = [...]
* Compute the output of the MLP on the test data
for J := 0 to N-1 by 1
    evaluate_class_mlp (MLPHandle, X[J], Y)
endfor
clear_class_mlp (MLPHandle)


* Use the MLP for classification
create_class_mlp (NIn, NHidden, NOut, 'softmax', 'normalization', NIn,
                  42, MLPHandle)
* Generate and add the training data
for J := 0 to NData-1 by 1
    * Generate training features and classes
    * Data = [...]
    * Class = [...]
    add_sample_class_mlp (MLPHandle, Data, Class)
endfor
* Train the MLP
train_class_mlp (MLPHandle, 100, 1, 0.01, Error, ErrorLog)
* Use the MLP to classify unknown data
for J := 0 to N-1 by 1
    * Extract features
    * Features = [...]
    classify_class_mlp (MLPHandle, Features, 1, Class, Confidence)
endfor
clear_class_mlp (MLPHandle)

Result

If the parameters are valid, the operator create_class_mlp returns the value 2 (H_MSG_TRUE). If necessary, an exception is raised.


Parallelization Information

create_class_mlp is processed completely exclusively without parallelization.


Possible Successors

add_sample_class_mlp


Alternatives

create_class_box


See also

clear_class_mlp, train_class_mlp, classify_class_mlp, evaluate_class_mlp


References

Christopher M. Bishop: "Neural Networks for Pattern Recognition"; Oxford University Press, Oxford; 1995.

Andrew Webb: "Statistical Pattern Recognition"; Arnold, London; 1999.


Module

Foundation



Copyright © 1996-2008 MVTec Software GmbH