A New Link to an Old Model Could Crack the Mystery of Deep Learning

A number of researchers are showing that idealized versions of deep neural networks are mathematically equivalent to older, simpler machine learning models called kernel machines. If this equivalence can be extended beyond idealized neural networks, it may explain how practical artificial neural networks (ANNs) achieve their astonishing results.
A little intro and summary for those who might appreciate it:
- Parametric and non-parametric models are two ways of describing data.
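- Parametric models describe the distribution of data with a fixed set of numbers (parameters), where each number tends to correspond to some feature of the data (e.g. its average or spread). These numbers are adjusted to best match the data using numerical procedures like gradient descent.
- Non-parametric models do not fix the number of parameters in advance; the model grows with the data. Kernel machines, the non-parametric models in the article, separate data into classes by implicitly blowing it up into a very high-dimensional space and dividing it with a hyperplane. Finding that hyperplane is typically a convex problem with a single best solution, and in some cases (e.g. kernel ridge regression) it has a closed-form solution, so it avoids the non-convex iterative optimization that neural networks need.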
- Although neural networks are parametric (defined by tons of numbers), and require sophisticated numerical methods to optimize, it turns out they may have more in common with non-parametric methods.
- If the equivalence holds up for practical networks, it may help explain why the parameters of neural networks are so hard to interpret: they were never really parameters in the probabilistic sense, but rather just a means to approximate a non-parametric object.
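
To make the "blow it up into a high-dimensional space and divide it with a hyperplane" picture concrete, here is a minimal sketch of one simple kernel machine: a kernel perceptron on XOR-style data. It is an illustration only, not the construction used in the research the article describes. The function names, the toy data and `gamma=1.0` are all assumptions chosen for the example, and the perceptron's training loop is iterative and was picked for brevity rather than as a typical kernel-machine solver.

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel: acts as an inner product in an implicit,
    infinite-dimensional feature space. gamma=1.0 is an arbitrary choice."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def linear_kernel(a, b):
    """Plain dot product: a hyperplane through the origin in the original space."""
    return sum(ai * bi for ai, bi in zip(a, b))

def decision_score(x, points, labels, alphas, kernel):
    """Signed score of x: a weighted sum of kernel similarities to training points."""
    return sum(a * y * kernel(p, x) for a, p, y in zip(alphas, points, labels))

def train_kernel_perceptron(points, labels, kernel, epochs=20):
    """Return one coefficient per training point (hypothetical helper).

    Each time a point is misclassified, its own coefficient is bumped by 1.
    """
    alphas = [0.0] * len(points)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            if y * decision_score(x, points, labels, alphas, kernel) <= 0:
                alphas[i] += 1.0
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return alphas

def predict(x, points, labels, alphas, kernel):
    """Classify x as +1 or -1 from the sign of its score."""
    return 1 if decision_score(x, points, labels, alphas, kernel) > 0 else -1

# XOR-style toy data: no straight line in 2-D separates the two classes.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
labels = [-1, 1, 1, -1]

rbf_alphas = train_kernel_perceptron(points, labels, rbf_kernel)
rbf_predictions = [predict(p, points, labels, rbf_alphas, rbf_kernel) for p in points]

linear_alphas = train_kernel_perceptron(points, labels, linear_kernel)
linear_predictions = [predict(p, points, labels, linear_alphas, linear_kernel) for p in points]

print("RBF kernel fits XOR:   ", rbf_predictions == labels)
print("Linear kernel fits XOR:", linear_predictions == labels)
```

Note what the trained model consists of: the list `rbf_alphas` together with the training points themselves, one coefficient per data point, rather than a fixed-size parameter vector. That is the sense in which a kernel machine is non-parametric, and the same code with a plain dot product in place of the RBF kernel cannot separate the two classes.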