| Section | Page |
|---|---|
| Preface | p. ix |
| Notation | p. xii |
| 1 Introduction and Examples | p. 1 |
| 1.1 How do neural methods differ? | p. 4 |
| 1.2 The pattern recognition task | p. 5 |
| 1.3 Overview of the remaining chapters | p. 9 |
| 1.4 Examples | p. 10 |
| 1.5 Literature | p. 15 |
| 2 Statistical Decision Theory | p. 17 |
| 2.1 Bayes rules for known distributions | p. 18 |
| 2.2 Parametric models | p. 26 |
| 2.3 Logistic discrimination | p. 43 |
| 2.4 Predictive classification | p. 45 |
| 2.5 Alternative estimation procedures | p. 55 |
| 2.6 How complex a model do we need? | p. 59 |
| 2.7 Performance assessment | p. 66 |
| 2.8 Computational learning approaches | p. 77 |
| 3 Linear Discriminant Analysis | p. 91 |
| 3.1 Classical linear discrimination | p. 92 |
| 3.2 Linear discriminants via regression | p. 101 |
| 3.3 Robustness | p. 105 |
| 3.4 Shrinkage methods | p. 106 |
| 3.5 Logistic discrimination | p. 109 |
| 3.6 Linear separation and perceptrons | p. 116 |
| 4 Flexible Discriminants | p. 121 |
| 4.1 Fitting smooth parametric functions | p. 122 |
| 4.2 Radial basis functions | p. 131 |
| 4.3 Regularization | p. 136 |
| 5 Feed-forward Neural Networks | p. 143 |
| 5.1 Biological motivation | p. 145 |
| 5.2 Theory | p. 147 |
| 5.3 Learning algorithms | p. 148 |
| 5.4 Examples | p. 160 |
| 5.5 Bayesian perspectives | p. 163 |
| 5.6 Network complexity | p. 168 |
| 5.7 Approximation results | p. 173 |
| 6 Non-parametric Methods | p. 181 |
| 6.1 Non-parametric estimation of class densities | p. 181 |
| 6.2 Nearest neighbour methods | p. 191 |
| 6.3 Learning vector quantization | p. 201 |
| 6.4 Mixture representations | p. 207 |
| 7 Tree-structured Classifiers | p. 213 |
| 7.1 Splitting rules | p. 216 |
| 7.2 Pruning rules | p. 221 |
| 7.3 Missing values | p. 231 |
| 7.4 Earlier approaches | p. 235 |
| 7.5 Refinements | p. 237 |
| 7.6 Relationships to neural networks | p. 240 |
| 7.7 Bayesian trees | p. 241 |
| 8 Belief Networks | p. 243 |
| 8.1 Graphical models and networks | p. 246 |
| 8.2 Causal networks | p. 262 |
| 8.3 Learning the network structure | p. 275 |
| 8.4 Boltzmann machines | p. 279 |
| 8.5 Hierarchical mixtures of experts | p. 283 |
| 9 Unsupervised Methods | p. 287 |
| 9.1 Projection methods | p. 288 |
| 9.2 Multidimensional scaling | p. 305 |
| 9.3 Clustering algorithms | p. 311 |
| 9.4 Self-organizing maps | p. 322 |
| 10 Finding Good Pattern Features | p. 327 |
| 10.1 Bounds for the Bayes error | p. 328 |
| 10.2 Normal class distributions | p. 329 |
| 10.3 Branch-and-bound techniques | p. 330 |
| 10.4 Feature extraction | p. 331 |
| A Statistical Sidelines | p. 333 |
| A.1 Maximum likelihood and MAP estimation | p. 333 |
| A.2 The EM algorithm | p. 334 |
| A.3 Markov chain Monte Carlo | p. 337 |
| A.4 Axioms for conditional independence | p. 339 |
| A.5 Optimization | p. 342 |
| Glossary | p. 347 |
| References | p. 355 |
| Author Index | p. 391 |
| Subject Index | p. 399 |