① The pipeline — features engineered chronologically, no future leakage
For every match we walk the dataset in date order and, using only what was known before kickoff, build four features: the Elo difference (from a running strength rating updated after each game, with home advantage and a margin-of-victory adjustment), home advantage (0 on neutral ground), the recent-form goal difference (over each team's last N games), and the pairwise head-to-head rate.
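The sketch below shows one way such a chronological walk can be written. It is not the page's code. The `Match` record, the constants (K = 20, a 100-point home bonus, N = 5, a 1500 starting rating), the eloratings.net-style margin-of-victory multiplier, the "average goal difference" form definition, and the head-to-head rate (home side's points share in earlier meetings, 0.5 when none) are all illustrative assumptions. The important part is the order of operations: every feature is read from the running state first, and only then is the match's result folded in, so nothing from the future leaks into a row.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import date

@dataclass
class Match:  # hypothetical record type
    day: date
    home: str
    away: str
    home_goals: int
    away_goals: int
    neutral: bool = False

# Illustrative constants (assumptions, not the page's settings).
K = 20.0          # Elo update step
HOME_ELO = 100.0  # Elo bonus for the home side on non-neutral ground
FORM_N = 5        # recent-form window ("last N games")

def mov_multiplier(margin: int) -> float:
    """Margin-of-victory multiplier, eloratings.net style (assumed)."""
    m = abs(margin)
    if m <= 1:
        return 1.0
    if m == 2:
        return 1.5
    return (11 + m) / 8

def _mean(values) -> float:
    return sum(values) / len(values) if values else 0.0

def build_features(matches):
    elo = defaultdict(lambda: 1500.0)
    form = defaultdict(lambda: deque(maxlen=FORM_N))  # team -> recent goal diffs
    h2h = defaultdict(lambda: [0.0, 0])  # (a, b) with a < b -> [points for a, games]
    rows = []
    for m in sorted(matches, key=lambda x: x.day):
        home_flag = 0.0 if m.neutral else 1.0
        gd = m.home_goals - m.away_goals
        # 1) Features: read only state accumulated from earlier matches.
        key = tuple(sorted((m.home, m.away)))
        pts_a, games = h2h[key]
        if games == 0:
            h2h_rate = 0.5
        else:
            pts_home = pts_a if m.home == key[0] else games - pts_a
            h2h_rate = pts_home / games
        rows.append({
            "elo_diff": elo[m.home] - elo[m.away],
            "home": home_flag,
            "form_diff": _mean(form[m.home]) - _mean(form[m.away]),
            "h2h": h2h_rate,
            "label": 0 if gd > 0 else (1 if gd == 0 else 2),  # home win / draw / away win
        })
        # 2) Only now update the running state with this match's result.
        expected = 1 / (1 + 10 ** (-(elo[m.home] + HOME_ELO * home_flag - elo[m.away]) / 400))
        actual = 1.0 if gd > 0 else (0.5 if gd == 0 else 0.0)
        delta = K * mov_multiplier(gd) * (actual - expected)
        elo[m.home] += delta
        elo[m.away] -= delta
        form[m.home].append(gd)
        form[m.away].append(-gd)
        h2h[key][0] += actual if m.home == key[0] else 1 - actual
        h2h[key][1] += 1
    return rows
```

Because the input is sorted inside `build_features`, shuffling the match list does not change the output, and the first meeting between two teams always starts from neutral values.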
Models are trained on the earlier years and scored on a held-out slice of recent matches they never saw, so the reported accuracy is a genuine out-of-sample estimate rather than memorisation.
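A time-ordered split is the piece that makes the held-out score honest: unlike a random shuffle, it never lets a training row come from after a test row. A minimal sketch, assuming the held-out slice is the most recent 20% of matches (the page does not state the exact fraction or cutoff date); `temporal_split` is a hypothetical helper.

```python
from datetime import date, timedelta

def temporal_split(items, key, test_fraction=0.2):
    """Sort by date, train on the earliest items, hold out the most recent ones.

    test_fraction=0.2 is an assumed value, not the page's setting.
    """
    ordered = sorted(items, key=key)
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]
```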
② Model leaderboard
All scores are out-of-sample, computed on the held-out recent matches. Lower log-loss and Brier score mean better probability estimates; higher accuracy means more matches called correctly.
Columns: Model · Accuracy · Log-loss · Brier · vs naïve (relative to the majority-class baseline)
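The leaderboard metrics can be computed as sketched below for three-way (home win / draw / away win) probability vectors. Two details are assumptions because the page does not specify them: the Brier score here sums the squared error over the three classes (some sources divide by the number of classes instead), and log-loss clips probabilities at `eps` to avoid `log(0)`. The majority-class baseline takes its class from the training labels.

```python
import math
from collections import Counter

def accuracy(probs, labels):
    # A prediction counts as correct when its most probable class is the actual one.
    hits = sum(max(range(len(p)), key=p.__getitem__) == y for p, y in zip(probs, labels))
    return hits / len(labels)

def log_loss(probs, labels, eps=1e-15):
    # Mean negative log-probability assigned to the actual outcome.
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)

def brier(probs, labels):
    # Multiclass Brier (assumed convention): squared error summed over classes,
    # then averaged over matches.
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)

def majority_baseline_accuracy(train_labels, test_labels):
    # Naïve baseline: always predict the most common training outcome.
    majority = Counter(train_labels).most_common(1)[0][0]
    return sum(y == majority for y in test_labels) / len(test_labels)
```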
③ What the model learned — logistic weights (standardised)
A positive weight for the home-win class means that raising the feature by one standard deviation pushes the prediction toward a home win. The weights shown are those of the full logistic model.
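To make "standardised weights" concrete, here is a minimal pure-Python sketch of a softmax (multinomial logistic) model fitted by full-batch gradient descent on z-scored features. The learning rate, epoch count, zero initialisation and absence of regularisation are assumptions, and the toy data in the usage test is invented for illustration only.

```python
import math

def standardise(X):
    # z-score each column; a zero-variance column is left unscaled (std treated as 1).
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - mu) ** 2 for v in c) / len(c)) or 1.0
            for c, mu in zip(cols, means)]
    return [[(v - mu) / sd for v, mu, sd in zip(row, means, stds)] for row in X], means, stds

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fit_softmax(X, y, n_classes=3, lr=0.5, epochs=500):
    """Full-batch gradient descent on mean cross-entropy (assumed settings)."""
    d, n = len(X[0]), len(X)
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, t in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(W[k], x)) + b[k] for k in range(n_classes)])
            for k in range(n_classes):
                err = p[k] - (1.0 if k == t else 0.0)  # d(loss)/d(logit_k)
                gb[k] += err / n
                for j in range(d):
                    gW[k][j] += err * x[j] / n
        for k in range(n_classes):
            b[k] -= lr * gb[k]
            for j in range(d):
                W[k][j] -= lr * gW[k][j]
    return W, b
```

Since every class starts at zero and the per-class gradients for a feature sum to zero, each feature's three weights keep summing to zero, so a positive home-win weight is always read relative to the draw and away-win weights.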
④ Calibration — do 70%-confidence predictions win 70% of the time?
Predicted home-win probability (binned) against the actual home-win rate in each bin. Points on the diagonal indicate perfect calibration.
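The binning behind such a reliability chart can be sketched as follows, assuming ten equal-width bins (the page does not say how many it uses). For each non-empty bin it returns the mean predicted probability, the observed home-win rate and the count; a well-calibrated model has the first two roughly equal.

```python
def calibration_table(p_home, home_won, n_bins=10):
    """Return (mean predicted, observed rate, count) for each non-empty bin."""
    bins = [[0.0, 0, 0] for _ in range(n_bins)]  # [sum of predictions, home wins, count]
    for p, won in zip(p_home, home_won):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the top bin
        bins[i][0] += p
        bins[i][1] += int(won)
        bins[i][2] += 1
    return [(s / c, w / c, c) for s, w, c in bins if c]
```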
⑤ Predict a match
Models (all implemented from scratch and fitted by gradient descent): Elo-logistic baseline (Elo strength and home advantage only) · full logistic model, a softmax over all four features predicting win/draw/loss · Poisson goals model (two Poisson regressions → a score matrix → outcome probabilities and the most likely scoreline). Naïve baseline: always predict the majority class. Data: Kaggle · martj42 international results, 1872→present.
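The step from two expected-goal rates to a score matrix and outcome probabilities can be sketched as follows. It assumes the home and away goal counts are independent Poisson variables, truncates the matrix at 10 goals per side and renormalises the small lost mass. `expected_goals` is a hypothetical stand-in for each fitted Poisson regression (log link: expected goals = exp(bias + weights · features)).

```python
import math

def expected_goals(weights, bias, features):
    # Hypothetical Poisson-regression prediction with a log link.
    return math.exp(bias + sum(w * x for w, x in zip(weights, features)))

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def score_matrix(lam_home, lam_away, max_goals=10):
    """M[i][j] = P(home scores i, away scores j) under independent Poissons."""
    return [[poisson_pmf(i, lam_home) * poisson_pmf(j, lam_away)
             for j in range(max_goals + 1)] for i in range(max_goals + 1)]

def outcome_probs(M):
    n = len(M)
    total = sum(map(sum, M))  # slightly below 1 because of truncation
    home = sum(M[i][j] for i in range(n) for j in range(n) if i > j) / total
    draw = sum(M[i][i] for i in range(n)) / total
    return home, draw, 1.0 - home - draw  # (home win, draw, away win)

def most_likely_score(M):
    cells = ((i, j) for i in range(len(M)) for j in range(len(M[0])))
    return max(cells, key=lambda ij: M[ij[0]][ij[1]])
```

For example, expected goals of 1.5 (home) and 0.8 (away) give a most likely scoreline of 1-0 while the home-win probability exceeds the away-win probability.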