This memorandum presents a theorem which illustrates why a general adaptive feed-forward layered network with linear output units can perform well as a pattern classification device. The central result is that minimizing the error at the output of the network is equivalent to maximizing a particular norm, the Network Cost Function, at the output of the hidden units. If the total covariance matrix is of full rank and the targets are appropriately chosen, then this cost function relates the inverse of the total covariance matrix to the weighted between-class covariance matrix of the hidden-unit patterns. For a linear network, we show how the theorem reproduces, as a special case, a result recently obtained by Gallinari et al. We present numerical simulations to illustrate the theorem and to show that alternative choices of cost function at the hidden layer are not, in general, maximized in the nonlinear case.
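The criterion the abstract describes can be illustrated numerically. The sketch below is a minimal, hedged example, not the memorandum's exact Network Cost Function: it computes a discriminant-style quantity of the form trace(S_T⁺ S_B) from hidden-unit activations, where S_T is the total covariance matrix and S_B a between-class covariance matrix. The choice of class-proportion weights for S_B, and the function and variable names, are assumptions for the purpose of illustration.

```python
import numpy as np

def discriminant_criterion(H, y):
    """Illustrative criterion trace(S_T^+ S_B) on hidden-unit outputs.

    H: (n_samples, n_hidden) array of hidden-unit activations.
    y: integer class labels, one per sample.
    Weighting of the between-class term by class proportions is an
    assumption; the memorandum's weighting may differ.
    """
    H = np.asarray(H, dtype=float)
    y = np.asarray(y)
    mean = H.mean(axis=0)
    Hc = H - mean
    S_T = Hc.T @ Hc / len(H)                 # total covariance matrix
    S_B = np.zeros_like(S_T)                 # weighted between-class covariance
    for c in np.unique(y):
        Hk = H[y == c]
        d = (Hk.mean(axis=0) - mean)[:, None]
        S_B += (len(Hk) / len(H)) * (d @ d.T)
    # Pseudo-inverse covers the rank-deficient case the abstract excludes.
    return np.trace(np.linalg.pinv(S_T) @ S_B)

# Two well-separated synthetic classes: the criterion approaches its
# upper bound of (number of classes - 1), here 1.
rng = np.random.default_rng(0)
H = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
               rng.normal(5.0, 0.1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(discriminant_criterion(H, y))
```

For the two-class data above, nearly all of the total covariance is between-class, so the criterion is close to 1; overlapping classes drive it toward 0.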