Glicko-2 Rating System¶

in the context of Go ○● (Baduk, Weiqi)¶


A Smarter Method for Player Ranking in Zero-Sum Competitive Games 📊¶

by Victor Zommers for PyData London

📢 Talk Outcome:¶

👉 Explain where Glicko-2 improves over conventional Rating Systems (Elo)¶

👉 Basic Intuition behind the rules of Go¶

👉 Compare Player Data using Glicko-2 between Chess and Go Online Platforms¶

🔢 What is Glicko-2?¶

👉 Extension of Elo, introduced by Chess Player / Statistician Mark Glickman (paper: "Example of the Glicko-2 System")¶

🛠️ Main Feature: quantifying Uncertainty / Confidence 🔮 in the new Player Ratings ➡️ Handling Outliers¶

🌍 Used in: Chess.com | Lichess | Dota 2 | Counter-Strike | Online-Go (OGS) | & more¶

🌱 Application Use-Cases:¶

👉 Anywhere Elo is used (Leaderboards, Reputation ranking); esp. Anomaly Detection & Handling Outliers!¶

👉 User Matching Engine (Social Networks, Trading Venues, Peer-2-Peer, Online Poker)¶

🚩 Where Platform Performance depends on the "quality" of matched Counterparties of "similar" abilities, or on the overall "balance" | "fairness" of user lobbies¶

🚩 Where measuring Precision & Accuracy 🎯 of player scores and bets is important! (Prediction Markets)¶

👉 Crypto Tokenomics (pre-TGE points, staking, DAO voting)¶

👉 LLM Evals (Chatbot Arena, Huggingface)¶

⛩️ Let's talk GO!

Similarities:

| | Chess ♞ | Go ○● |
|---|:---:|:---:|
| 2-Player | ✓ | ✓ |
| Turn-Based | ✓ | ✓ |
| Deterministic (no randomness; rules-based outcomes) | ✓ | ✓ |
| Perfect Information (no information asymmetry / hidden pieces) | ✓ | ✓ |
| Zero-Sum (your loss is my gain) | ✓ | ✓ |

Differences:

| | Chess ♞ | Go ○● |
|---|---|---|
| Piece Types | 6 Figures | One Type (stone) |
| Board Size | 8 × 8 | 19 × 19 |
| Board Progression | Starts full, empties via captures | Starts empty, fills via stone placements |
| Game Objective | Checkmate (capturing helps) | Control more territory (not capturing directly) |

Differences:

| | Chess ♞ | Go ○● |
|---|---|---|
| Opening Theory Depth | 15-20 Moves | 50+ Moves |
| Branching Factor | ~35 Legal Moves | ~250-300 Legal Moves |
| Game Tree Complexity | 10^120 (Shannon number) | 10^360 (> number of atoms in the universe) |
| Strategy | Tactical, positional | Strategic, spatial |
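The game-tree figures follow from branching factor raised to typical game length ($b^d$); a quick back-of-the-envelope check, with average game lengths of ~80 and ~150 plies assumed for illustration:

```python
import math

def tree_magnitude(branching: float, depth: int) -> int:
    # Order of magnitude of b**d, i.e. floor(d * log10(b))
    return int(depth * math.log10(branching))

print(tree_magnitude(35, 80))    # ~10^123 for Chess (near Shannon's 10^120)
print(tree_magnitude(250, 150))  # ~10^359 for Go
```

Exponential growth in both branching factor and game length is why Go resisted brute-force search for so long.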

⚖️ Player Balancing in Go ○●¶

Unlike Chess, Go is designed to support fair play between opponents of different strengths¶

👉 KOMI - Black plays first; White receives automatic compensation (+6.5 points) to offset the loss of initiative (sente)¶

👉 HANDICAP - Skill balancing, weaker player gets more stones/moves advantage in the starting position¶

🎯 Knowing Accurate Player Ratings is Essential for calculating Komi & Handicap (a Parametric Necessity for fair play!) - directly impacts starting probabilities!¶
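How a rating gap maps to a starting win probability can be sketched with the logistic expected-score curve shared by Elo and Glicko (the 400-point scale below is the standard Elo convention, used purely for illustration):

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Probability that player A beats player B under the logistic rating model
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

print(expected_score(1500, 1500))            # equal ratings -> 0.5
print(round(expected_score(1600, 1500), 2))  # 100-point edge -> ~0.64
```

Komi and handicap are chosen so that this starting probability is pushed back toward 0.5 for the given pairing.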

⚛️ 2 AIs with Equal Strength & Style: But Extra 1-Stone Handicap/Move was given to Black ⚠️¶


On average, a 1-stone Handicap (worth 50-100 Elo) results in a >75% win rate among otherwise equal opponents, assuming optimal moves¶

(for 2-stone handicap game below: win rate @ 80-95%)

(GIF: AI self-play with a 2-stone handicap imbalance)

Elo Points per Stone in Go. François Labelle. 2016

(Figure: handicap-stone value in Elo points, from François Labelle)

Estimating the Handicap Effect in the Go Game: A RDD Approach. Kota Mori. 2016. arXiv:1606.05778

(Figure: win rates by extra handicap stones, from Kota Mori, arXiv:1606.05778)

Fig: The Y-axis shows Handicap Stones; the X-axis shows the Predicted probability of a White win. The drop in probability at each cutoff point can be interpreted as the corresponding handicap effect. The gray bands represent the 95% CI; the dots indicate local averages.

Running (Simplified) Glicko-2 System:¶

  • Rating ($r$), theoretically no finite limit, empirically $r \in [100, 3500]$
  • Rating Deviation ($RD$), $RD =$ Standard Deviation, suggested limit $RD \in [30, 500]$
  • Volatility ($\sigma$), $\sigma$ is high when player has erratic performances (Player had exceptional results after a period of stability), $\sigma$ is low when player performs at a consistent level

Example: $Rating = 1500,\ RD = 100\ \therefore\ 95\% \text{ Confidence Interval } [1300, 1700]$

  • System Constant ($\tau$) constrains the change in Volatility $\sigma$ over time, $\tau \in [0.3, 1.2]$
    Smaller $\tau$ prevents $\sigma$ from changing by large amounts, which in turn prevents enormous changes in ratings based on very improbable results
  • Rating Period: a collection of games & opponent ratings + RDs that are batch-processed to output the new Player Rating, RD, and $\sigma$. An event-driven implementation is also possible: the change in RD (uncertainty growth due to Player inactivity) can be back-calculated when the player enters the matching pool for their first game after the pause
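The confidence-interval example above is just rating ± 2·RD (two standard deviations, an approximate 95% band):

```python
def confidence_interval(rating: float, RD: float) -> tuple[float, float]:
    # Approximate 95% band: rating plus/minus two rating deviations
    return rating - 2 * RD, rating + 2 * RD

print(confidence_interval(1500, 100))  # -> (1300, 1700)
```

A freshly registered player (RD = 350) has a band of ±700 points: the system genuinely does not know their strength yet.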

🐍 Python Implementation (But constant $\sigma^{\prime}$, skipping Step 5)¶

In [1]:
import math
def update_glicko2(rating=1500, RD=350, sigma=0.06,
                   games: list[tuple[int, int, float]]=[]):
    # Step 1: Convert rating and RD to Glicko-2 scale (mu, phi)
    Glicko_Scale = 173.7178 # constant ≈ 400/ln(10)
    mu = (rating - 1500) / Glicko_Scale
    phi = RD / Glicko_Scale

    # If no games, just age the RD (phi) and return.
    if not games:
        # RD increases (uncertainty grows) when no new games played
        phi_star = math.sqrt(phi**2 + sigma**2)  # pre-period RD (aging)
        new_mu = mu
        new_phi = phi_star
        # Convert back to original scale
        new_rating = new_mu * Glicko_Scale + 1500
        new_RD = new_phi * Glicko_Scale
        return new_rating, new_RD
    
    # Step 2: For each game, convert opponent rating and RD to Glicko-2 scale
    mus = []
    phis = []
    results = []
    for (r_opp, RD_opp, score) in games:
        mu_opp = (r_opp - 1500) / Glicko_Scale
        phi_opp = RD_opp / Glicko_Scale
        mus.append(mu_opp)
        phis.append(phi_opp)
        results.append(score)

    # Step 3: Compute g(phi) and expected score E for each opponent
    g_list = []
    E_list = []
    for mu_opp, phi_opp in zip(mus, phis):
        g = 1.0 / math.sqrt(1 + 3 * phi_opp**2 / (math.pi**2))
        E = 1.0 / (1 + math.exp(-g * (mu - mu_opp)))
        g_list.append(g)
        E_list.append(E)

    # Step 4: Compute variance v and delta Δ
    v_inv = 0.0
    delta_sum = 0.0
    for g, E, score in zip(g_list, E_list, results):
        v_inv += (g**2) * E * (1 - E)
        delta_sum += g * (score - E)
    v = 1.0 / v_inv if v_inv != 0 else float('inf')
    delta = v * delta_sum  # Δ is only needed by the volatility update (Step 5), which is skipped here

    # Step 5: Compute new φ (phi') and μ (mu')
    # First, the "pre-period" phi: BUT !Constant! volatility (sigma).
    phi_star = math.sqrt(phi**2 + sigma**2)
    new_phi = 1.0 / math.sqrt((1.0 / (phi_star**2)) + (1.0 / v))
    new_mu = mu + (new_phi**2) * delta_sum

    # Step 6: Convert new_mu and new_phi back to rating scale
    new_rating = new_mu * Glicko_Scale + 1500
    new_RD = new_phi * Glicko_Scale
    return new_rating, new_RD
In [2]:
initial_rating = 1500
initial_RD = 350
# 2 games played: a win vs a 1400/30 player, a loss vs a 1550/100 player
# game outcome is float between 0-1
games = [(1400, 30, 1.0), (1550, 100, 0.0)]
new_rating, new_RD = update_glicko2(initial_rating, initial_RD, games=games)
print(f"New rating: {new_rating:.2f}, New RD: {new_RD:.2f}")
New rating: 1486.87, New RD: 208.00
In [3]:
# If no games were played, only RD increases:
rating, RD, sigma = 1500, 350, 0.6
for i in range(10):  # Simulate 10 periods of inactivity
    rating, RD = update_glicko2(games=[], rating=rating, RD=RD, sigma=sigma)
print(f"After period {i+1}:\nrating stays same {rating:.2f}, RD Expanded to {RD:.2f}")
After period 10:
rating stays same 1500.00, RD Expanded to 480.77

In Glicko-2, Rating Changes of Opponents in a game don't sum to zero!¶

(Unlike in Elo, where points gained = points lost and uncertainty is constant for all players)
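For contrast, a minimal Elo update (a K-factor of 32 is assumed here for illustration), where one player's gain is exactly the other's loss:

```python
def elo_update(r_a: float, r_b: float, score_a: float, K: float = 32):
    # Expected score for A; B's expected score is the complement
    e_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    delta = K * (score_a - e_a)
    return r_a + delta, r_b - delta  # deltas are equal and opposite by construction

a, b = elo_update(1500, 1500, 1.0)
print(a, b)  # 1516.0 1484.0 -- the changes cancel out
```

Glicko-2 breaks this symmetry because each player's update is scaled by their own RD, as the next cell demonstrates.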

In [4]:
# 2 players played against each other
start_rating = 1500 # same rating, only RD is different
player1_rating, player1_RD = update_glicko2(start_rating, RD=350, games=[(start_rating, 100, 0.0)])
player2_rating, player2_RD = update_glicko2(start_rating, RD=100, games=[(start_rating, 350, 1.0)])
print(f"Player 1: New rating {player1_rating:.2f}, New RD {player1_RD:.2f}"
      f"\nPlayer 2: New rating {player2_rating:.2f}, New RD {player2_RD:.2f}"
      f"\nΔ Player 1: {player1_rating - start_rating:.2f}, Δ Player 2: {player2_rating - start_rating:.2f}")
Player 1: New rating 1325.06, New RD 252.52
Player 2: New rating 1518.76, New RD 98.71
Δ Player 1: -174.94, Δ Player 2: 18.76

👆 Players' deltas don't sum to zero; established (low-RD) players find it harder to gain points, while new (high-RD) players can take big losses;

What is Online-Go.com (OGS)? - It's like Lichess¶

🌐 Biggest Online Go Platform (1.75+ Mil Players) + Forever Free!¶

🆓 Open-Source Codebase, maintained by community devs¶

🔍 Django REST API for data mining, AI training runs!¶

🤖 (Probably) Most Advanced AI in Prod for Scoring, Reviews & Live Game Outcomes! (AI as Umpire 👨🏻‍⚖️)¶

🌏 Analysing Histogram of Glicko-2 Ratings on OGS

Back-End Binning of Player Scores into JSON
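A sketch of what such server-side binning might look like (the bin width of 100 and the output field shape are assumptions for illustration, not OGS's actual code):

```python
import json
from collections import Counter

def bin_ratings(ratings: list[float], width: int = 100) -> str:
    # Bucket each rating into a fixed-width bin keyed by its lower edge
    counts = Counter(int(r) // width * width for r in ratings)
    return json.dumps({str(edge): counts[edge] for edge in sorted(counts)})

print(bin_ratings([1450.2, 1499.9, 1520.0]))  # {"1400": 2, "1500": 1}
```

Pre-binning on the back end keeps the payload small: the client only receives bin edges and counts, not 1.75M individual ratings.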

Click "▶ 📊 Compare to Global Distribution" under Ratings table:

♞ Lichess Distribution Chart

  • Bell curve central tendency @ 1500 ➡️ not Right-skewed
  • Rating Range [400,2800] ➡️ lower Kurtosis than in Go?

FOOTNOTE: In Zero-Sum games, the theoretical central tendency should sit around the starting (default) rating ➡️ "your loss is my gain"; however, Rating Changes in Glicko-2 are not zero-sum...

🔗 Useful Resources:¶

🌐 Learn Go Online¶

  • online-go.com/learn-to-play-go

📍 Play Go In-Person¶

  • London City Go Club (Monday 6PM @ Old Red Cow Pub EC1A)
  • British Go Association (BGA)

📊 Glicko-2 Paper¶

  • glicko.net/glicko/glicko2.pdf ("Example of the Glicko-2 System", Mark Glickman)

🛠️ GitHub Repos¶

  • Online-Go Server - Contribute Code! 👨🏻‍💻
  • KataGo (Open-Source Go AI)