Glicko-2 Rating System¶
in the context of Go ○● (Baduk, Weiqi)¶
A Smarter Method for Player Ranking in Zero-Sum Competitive Games 📊¶
by Victor Zommers for PyData London
📢 Talk Outline:¶
👉 Explain where Glicko-2 improves over conventional Rating Systems (Elo)¶
👉 Basic Intuition behind the rules of Go¶
👉 Compare Player Data using Glicko-2 between Chess and Go Online Platforms¶
🔢 What is Glicko-2?¶
👉 Extension of Elo, introduced by Chess Player & Statistician Mark Glickman in a 2001 Paper¶
🛠️ Main Feature: quantifying Uncertainty & Confidence 🔮 in the new Player Ratings ➡️ Handling Outliers¶
🌍 Used in: Chess.com | Lichess | Dota 2 | Counter-Strike | Online-Go (OGS) | & more¶
🌱 Application Use-Cases:¶
👉 Anywhere Elo is used (Leaderboards, Reputation Ranking); esp. Anomaly Detection & Handling Outliers!¶
👉 User Matching Engine (Social Networks, Trading Venues, Peer-2-Peer, Online Poker)¶
🚩 Where Platform Performance depends on the “quality” of matched Counter-Parties of “similar” abilities or overall “balance” | "fairness" of user lobbies¶
🚩 Where measuring Precision & Accuracy 🎯 of player scores and bets is important! (Prediction Markets)¶
👉 Crypto Tokenomics (pre-TGE points, staking, DAO voting)¶
👉 LLM Evals (Chatbot Arena, Huggingface)¶
| Similarities | Chess ♞ | Go ○● |
|---|---|---|
| 2-Player | ✓ | ✓ |
| Turn-Based | ✓ | ✓ |
| Deterministic (no randomness; rules-based outcomes) | ✓ | ✓ |
| Perfect Information (no information asymmetry / hidden pieces) | ✓ | ✓ |
| Zero-Sum (your loss is my gain) | ✓ | ✓ |
| Differences | Chess ♞ | Go ○● |
|---|---|---|
| Piece Types | 6 figures | One type (stone) |
| Board Size | 8 × 8 | 19 × 19 |
| Board Progression | Starts full, empties via captures | Starts empty, fills via stone placements |
| Game Objective | Checkmate (capturing helps) | Control more territory (capturing is not the direct goal) |
| Differences | Chess ♞ | Go ○● |
|---|---|---|
| Opening Theory Depth | 15-20 moves | 50+ moves |
| Branching Factor | ~35 legal moves | ~250-300 legal moves |
| Game Tree Complexity | ~10^120 (Shannon number) | ~10^360 (more than the number of atoms in the observable universe) |
| Strategy | Tactical, positional | Strategic, spatial |
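The complexity figures above follow from the rough estimate complexity ≈ (branching factor)^(game length). A minimal sketch of that back-of-the-envelope check (the typical game lengths, ~80 plies for Chess and ~150 moves for Go, are assumptions):

```python
import math

def game_tree_magnitude(branching: float, length: int) -> float:
    """Base-10 exponent of the rough game-tree size branching**length."""
    return length * math.log10(branching)

# assumed typical game lengths: ~80 plies (Chess), ~150 moves (Go)
chess_exp = game_tree_magnitude(35, 80)
go_exp = game_tree_magnitude(250, 150)
print(f"Chess ≈ 10^{chess_exp:.0f}, Go ≈ 10^{go_exp:.0f}")
```

Both land in the same ballpark as the table's 10^120 and 10^360; Shannon's original 10^120 estimate uses slightly different assumptions.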
⚖️ Player Balancing in Go ○●¶
Unlike Chess, Go is designed to support fair play between opponents of different strengths¶
👉 KOMI - Black plays first; White receives automatic compensation (+6.5 points) to offset the loss of initiative (sente)¶
👉 HANDICAP - Skill balancing, weaker player gets more stones/moves advantage in the starting position¶
🎯 Knowing Accurate Player Ratings is Essential for calculating Komi & Handicap (a Parametric Necessity for fair play!) - it directly impacts the starting probabilities!¶
⚛️ 2 AIs of Equal Strength & Style, but an Extra 1-Stone Handicap/Move was given to Black ⚠️¶
On average, a 1-stone Handicap (≈50-100 Elo) results in a >75% win rate between otherwise equal opponents, assuming optimal moves¶
(for the 2-stone handicap game below: win rate @ 80-95%)

Estimating the Handicap Effect in the Go Game: An RDD Approach. Kota Mori. 2016. arXiv:1606.05778
Fig: Y-axis: Handicap Stones. X-axis: Predicted probability of a White win. The drop in probability at the cutoff points can be interpreted as the corresponding handicap effect. The gray bands represent the 95% CI; the dots indicate local averages.
Running (Simplified) Glicko-2 System:¶
- Rating ($r$), theoretically no finite limit, empirically $r \in [100, 3500]$
- Rating Deviation ($RD$), $RD =$ Standard Deviation, suggested limit $RD \in [30, 500]$
- Volatility ($\sigma$), $\sigma$ is high when player has erratic performances (Player had exceptional results after a period of stability), $\sigma$ is low when player performs at a consistent level
Example: $Rating = 1500,\ RD = 100\ \therefore\ 95\% \text{ Confidence Interval } [1300, 1700]$
- System Constant ($\tau$) constrains the change in Volatility $\sigma$ over time, $\tau \in [0.3, 1.2]$
Smaller $\tau$ prevents $\sigma$ from changing by large amounts, which in turn prevents enormous changes in ratings based on very improbable results
- Rating Period: a batch of games (with opponents' ratings + RDs) that is processed together to output a new Player Rating, RD & $\sigma$. Can also be an event-driven implementation: the RD increase (uncertainty growth due to Player inactivity) can be back-calculated when the Player re-enters the matching pool for their first game after the pause
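The confidence interval in the example above is just rating ± 2·RD. A minimal helper (the z = 2 rule of thumb matches the example; z = 1.96 would give the exact 95% normal interval):

```python
def rating_interval(rating: float, RD: float, z: float = 2.0) -> tuple[float, float]:
    """Approximate 95% confidence interval for a player's true strength:
    rating +/- z * RD (z=2 matches the slide's rule of thumb)."""
    return rating - z * RD, rating + z * RD

low, high = rating_interval(1500, 100)
print(f"95% CI: [{low:.0f}, {high:.0f}]")  # 95% CI: [1300, 1700]
```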
🐍 Python Implementation (But constant $\sigma^{\prime}$, skipping Step 5)¶
import math

def update_glicko2(rating=1500, RD=350, sigma=0.06,
                   games: list[tuple[int, int, float]] | None = None):
    games = games or []  # avoid a mutable default argument
    # Step 1: Convert rating and RD to the Glicko-2 scale (mu, phi)
    Glicko_Scale = 173.7178  # constant ≈ 400/ln(10)
    mu = (rating - 1500) / Glicko_Scale
    phi = RD / Glicko_Scale
    # If no games, just age the RD (phi) and return.
    if not games:
        # RD increases (uncertainty grows) when no new games are played
        phi_star = math.sqrt(phi**2 + sigma**2)  # pre-period RD (aging)
        # Convert back to the original scale
        new_rating = mu * Glicko_Scale + 1500
        new_RD = phi_star * Glicko_Scale
        return new_rating, new_RD
    # Step 2: Convert each opponent's rating and RD to the Glicko-2 scale
    mus, phis, results = [], [], []
    for (r_opp, RD_opp, score) in games:
        mus.append((r_opp - 1500) / Glicko_Scale)
        phis.append(RD_opp / Glicko_Scale)
        results.append(score)
    # Step 3: Compute g(phi) and the expected score E against each opponent
    g_list, E_list = [], []
    for mu_opp, phi_opp in zip(mus, phis):
        g = 1.0 / math.sqrt(1 + 3 * phi_opp**2 / (math.pi**2))
        E = 1.0 / (1 + math.exp(-g * (mu - mu_opp)))
        g_list.append(g)
        E_list.append(E)
    # Step 4: Compute the estimated variance v and rating improvement delta
    v_inv = 0.0
    delta_sum = 0.0
    for g, E, score in zip(g_list, E_list, results):
        v_inv += (g**2) * E * (1 - E)
        delta_sum += g * (score - E)
    v = 1.0 / v_inv if v_inv != 0 else float('inf')
    delta = v * delta_sum  # not needed below, since volatility is held constant
    # Step 5: Compute new phi (phi') and mu (mu')
    # First, the "pre-period" phi: BUT with a !constant! volatility (sigma)
    phi_star = math.sqrt(phi**2 + sigma**2)
    new_phi = 1.0 / math.sqrt((1.0 / (phi_star**2)) + (1.0 / v))
    new_mu = mu + (new_phi**2) * delta_sum
    # Step 6: Convert new_mu and new_phi back to the rating scale
    new_rating = new_mu * Glicko_Scale + 1500
    new_RD = new_phi * Glicko_Scale
    return new_rating, new_RD
initial_rating = 1500
initial_RD = 350
# 2 games played: a win vs a 1400/30 player, a loss vs a 1550/100 player
# score is a float in [0, 1]: 1.0 = win, 0.5 = draw, 0.0 = loss
games = [(1400, 30, 1.0), (1550, 100, 0.0)]
new_rating, new_RD = update_glicko2(initial_rating, initial_RD, games=games)
print(f"New rating: {new_rating:.2f}, New RD: {new_RD:.2f}")
New rating: 1486.87, New RD: 208.00
# If no games were played, only RD increases:
rating, RD, sigma = 1500, 350, 0.6  # note: sigma = 0.6 here (vs the typical 0.06), so RD growth is visible
for i in range(10): # Simulate 10 periods of inactivity
rating, RD = update_glicko2(games=[], rating=rating, RD=RD, sigma=sigma)
print(f"After period {i+1}:\nrating stays same {rating:.2f}, RD Expanded to {RD:.2f}")
After period 10: rating stays same 1500.00, RD Expanded to 480.77
In Glicko-2, the Rating Changes of the two Opponents in a game don't sum to zero!¶
(Unlike in Elo, where points gained = points lost and uncertainty is constant for all players)
# 2 players played against each other
start_rating = 1500 # same rating, only RD is different
player1_rating, player1_RD = update_glicko2(start_rating, RD=350, games=[(start_rating, 100, 0.0)])
player2_rating, player2_RD = update_glicko2(start_rating, RD=100, games=[(start_rating, 350, 1.0)])
print(f"Player 1: New rating {player1_rating:.2f}, New RD {player1_RD:.2f}"
f"\nPlayer 2: New rating {player2_rating:.2f}, New RD {player2_RD:.2f}"
f"\nΔ Player 1: {player1_rating - start_rating:.2f}, Δ Player 2: {player2_rating - start_rating:.2f}")
Player 1: New rating 1325.06, New RD 252.52
Player 2: New rating 1518.76, New RD 98.71
Δ Player 1: -174.94, Δ Player 2: 18.76
👆 Player deltas don't sum to zero; it's harder for established (low-RD) players to gain points, and easy for new (high-RD) players to suffer big losses;
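For contrast, a minimal classic Elo update (K = 32 is an assumed, commonly used K-factor), where the two deltas always cancel when both players share the same K:

```python
def elo_update(r_a: float, r_b: float, score_a: float, K: float = 32.0) -> tuple[float, float]:
    """Classic Elo: a single shared K-factor makes updates zero-sum."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))
    delta = K * (score_a - expected_a)
    return r_a + delta, r_b - delta

a, b = elo_update(1500, 1500, 1.0)  # equal ratings, player A wins
print(a - 1500, b - 1500)  # 16.0 -16.0 -> deltas sum to zero
```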
What is Online-Go.com (OGS)? - It's like Lichess, but for Go¶
🌐 Biggest Online Go Platform (1.75+ Mil Players) + Forever Free!¶
🆓 Open-Source Codebase, maintained by community devs¶
🔍 Django Rest API for data mining, AI training runs!¶
🤖 (Probably) Most Advanced AI in Prod for Scoring, Reviews & Live Game Outcomes! (AI as Umpire 👨🏻⚖️)¶
🌏 Analysing Histogram of Glicko-2 Ratings on OGS
Back-End Binning of Player Scores into JSON
Click "▶ 📊 Compare to Global Distribution" under Ratings table:
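The back-end binning step above might look roughly like this (a hypothetical sketch, not OGS's actual code; the `bin_ratings` name and the 100-point bin width are assumptions):

```python
import json
from collections import Counter

def bin_ratings(ratings: list[float], width: int = 100) -> str:
    """Bucket ratings into fixed-width bins and serialise the counts
    as JSON for a front-end histogram chart."""
    counts = Counter((int(r) // width) * width for r in ratings)
    return json.dumps(dict(sorted(counts.items())))

print(bin_ratings([1490, 1505, 1510, 1623]))  # {"1400": 1, "1500": 2, "1600": 1}
```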
♞ Lichess Distribution Chart
- Bell curve central tendency @ 1500 ➡️ not Right-skewed
- Rating Range [400,2800] ➡️ lower Kurtosis than in Go?
FOOTNOTE: In Zero-Sum games, the theoretical central tendency should sit at the starting (default) rating ➡️ "your loss is my gain"; however, Rating Changes in Glicko-2 are not zero-sum...
🔗 Useful Resources:¶
🌐 Learn Go Online¶
📍 Play Go In-Person¶
- London City Go Club (Monday 6PM @ Old Red Cow Pub EC1A)
- British Go Association (BGA)
📊 Glicko-2 Paper¶
🛠️ GitHub Repos¶
- Online-Go Server - Contribute Code! 👨🏻💻
- KataGo (Open-Source Go AI)