Oldies but goodies: MLE
Maximum-Likelihood Estimation (MLE) is ubiquitous in modern (supervised) machine learning, to the point that we sometimes forget it is the very principle behind the criteria we use to train fancy models. Justifying the MLE’s current popularity on statistical grounds alone would be a stretch; nonetheless, it is useful to remember that it enjoys some comfortable (asymptotic) statistical properties. This post proves two of them: consistency and asymptotic normality.
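Before any proof, a quick numerical sanity check can make the two properties concrete. The sketch below is only an illustration under assumptions that are not part of this post: a Bernoulli model with a hypothetical true parameter `p_true = 0.3`, for which the MLE is simply the sample mean, and arbitrary sample sizes. It checks that the estimation error becomes small as the sample grows (consistency) and that the rescaled error has variance close to the inverse Fisher information `p(1 - p)` (asymptotic normality).

```python
import numpy as np

rng = np.random.default_rng(0)
p_true = 0.3  # hypothetical true parameter, chosen for illustration only


def bernoulli_mle(samples):
    """MLE of the Bernoulli parameter: the sample mean along the last axis."""
    return samples.mean(axis=-1)


# Consistency: the absolute error should become small as n grows.
errors = {}
for n in (10, 1_000, 100_000):
    x = rng.binomial(1, p_true, size=n)
    errors[n] = abs(bernoulli_mle(x) - p_true)
    print(f"n={n:>6}  |p_hat - p_true| = {errors[n]:.4f}")

# Asymptotic normality: sqrt(n) * (p_hat - p_true) should be roughly
# N(0, p(1 - p)), where p(1 - p) is the inverse Fisher information.
n, n_rep = 1_000, 4_000
x = rng.binomial(1, p_true, size=(n_rep, n))  # n_rep independent datasets
z = np.sqrt(n) * (bernoulli_mle(x) - p_true)
emp_mean, emp_var = z.mean(), z.var()
inv_fisher = p_true * (1 - p_true)
print(f"empirical mean = {emp_mean:.4f}, empirical variance = {emp_var:.4f}")
print(f"inverse Fisher information = {inv_fisher:.4f}")
```

A simulation like this proves nothing, of course; it only shows what the statements below are claiming on one simple model.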