Adapting Chess PGN to Track Game State and Other Variables

Problem


Portable Game Notation (PGN) is the most common format for recording chess games. It uses algebraic chess notation, which makes individual moves quick and easy to record but omits valuable context such as the current game state or the positional and material advantage. It also doesn’t distinguish between different pieces of the same type (e.g. the a-pawn vs. the b-pawn).

Can we take a repository of PGN files and create a database of chess moves that includes this key context? What insights could we draw if such a resource were available?

Collecting Data


Chess.com allows you to download PGN files containing multiple games from your match history.

Example of a chess game stored in a PGN file.

Using this feature, I was able to pull a sample dataset of 605 chess games with which to work.
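
As a rough illustration of what working with such an export involves, here is a minimal sketch that splits a multi-game PGN text into individual games and pulls out the bare move list, using only the standard library. The function names `split_games` and `extract_moves` are hypothetical, the sample games and player names are made up, and this is not the parser from the project; a dedicated PGN library would handle variations and edge cases more robustly.

```python
import re

# Made-up sample: two short games in one PGN text.
SAMPLE_PGN = """[Event "Live Chess"]
[White "player_a"]
[Black "player_b"]
[Result "1-0"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1-0

[Event "Live Chess"]
[White "player_c"]
[Black "player_d"]
[Result "0-1"]

1. d4 d5 2. c4 e6 0-1
"""

TAG_RE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')
RESULTS = {"1-0", "0-1", "1/2-1/2", "*"}


def split_games(pgn_text):
    """Split PGN text into a list of (tags, movetext) pairs."""
    games = []
    tags, move_lines = {}, []
    for line in pgn_text.splitlines():
        line = line.strip()
        match = TAG_RE.match(line)
        if match:
            # A tag line that follows movetext starts a new game.
            if move_lines:
                games.append((tags, " ".join(move_lines)))
                tags, move_lines = {}, []
            tags[match.group(1)] = match.group(2)
        elif line:
            move_lines.append(line)
    if tags or move_lines:
        games.append((tags, " ".join(move_lines)))
    return games


def extract_moves(movetext):
    """Return the moves with comments, move numbers, and result removed."""
    movetext = re.sub(r"\{[^}]*\}", " ", movetext)  # strip {comments}, e.g. clocks
    movetext = re.sub(r"\d+\.+", " ", movetext)     # strip "1." and "1..."
    return [tok for tok in movetext.split() if tok not in RESULTS]


games = split_games(SAMPLE_PGN)
first_tags, first_movetext = games[0]
first_moves = extract_moves(first_movetext)
```

Each game comes back as a dictionary of header tags plus a flat list of moves, which is the input the rest of the pipeline needs.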

Building a Solution


Python Script – Chess PGN Parser – GitHub

Writing a Python script that tracked everything I wanted to track turned out to be much more complicated than I anticipated, because I was essentially implementing the entire chess rule set and its logic.

Piece properties.

To summarize what the script does for each game/move list:

  1. Create a dictionary to store each piece and its properties.
  2. Calculate all legal moves and legal captures and store these as piece properties.
    1. This includes calculating possible pins and checks, double pawn moves, castles, en passant, etc.
  3. Update the ‘square’ (location) property of the piece that was moved.
  4. Recalculate all legal moves and captures, and update the dictionary of pieces accordingly.
  5. Create a record whose columns store the current move and the properties of each piece on the board, effectively recording the game state after each move.
  6. Add the record to our table.
  7. Repeat for each move.
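
The overall shape of that loop can be sketched in a few lines. This is a heavily simplified illustration, not the actual script: it assumes each move has already been resolved to a (from-square, to-square) pair, it tracks only two properties per piece (`square` and `captured`), and it skips legal-move calculation, castling, en passant, and promotion entirely. All function and key names here are hypothetical.

```python
def initial_pieces():
    """Build a dict of uniquely named pieces, e.g. 'w_pawn_a' or 'b_rook_h'."""
    back_rank = ["rook", "knight", "bishop", "queen",
                 "king", "bishop", "knight", "rook"]
    pieces = {}
    for color, back, pawn_rank in (("w", "1", "2"), ("b", "8", "7")):
        for file, kind in zip("abcdefgh", back_rank):
            pieces[f"{color}_{kind}_{file}"] = {"square": file + back, "captured": False}
            pieces[f"{color}_pawn_{file}"] = {"square": file + pawn_rank, "captured": False}
    return pieces


def apply_move(pieces, from_sq, to_sq):
    """Move the piece on from_sq to to_sq; mark any piece on to_sq as captured."""
    by_square = {p["square"]: name for name, p in pieces.items() if not p["captured"]}
    mover = by_square[from_sq]
    target = by_square.get(to_sq)
    if target is not None:
        pieces[target]["captured"] = True
        pieces[target]["square"] = None
    pieces[mover]["square"] = to_sq
    return mover


def snapshot(move_number, mover, pieces):
    """Flatten the piece dictionary into one record, one column per property."""
    record = {"move_number": move_number, "piece_moved": mover}
    for name, props in pieces.items():
        for key, value in props.items():
            record[f"{name}_{key}"] = value
    return record


def build_table(moves):
    """Replay the moves and return one game-state record per move."""
    pieces = initial_pieces()
    table = []
    for number, (from_sq, to_sq) in enumerate(moves, start=1):
        mover = apply_move(pieces, from_sq, to_sq)
        table.append(snapshot(number, mover, pieces))
    return table


# 1. e4 d5 2. exd5, already resolved to origin and destination squares.
table = build_table([("e2", "e4"), ("d7", "d5"), ("e4", "d5")])
```

After the capture, the last record still identifies the pawn on d5 as `w_pawn_e`, which is exactly the kind of piece identity that plain algebraic notation does not preserve.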

We end up with a table that requires 226 columns to store the continually updated properties of every chess piece, but in exchange we have far more data available to us. Each record in the table represents an entire chess game state!

Column definitions.

Data Visualizations


Chess – Distribution of Opening Lines – Jupyter Notebook – Python
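
One simple way to compute such a distribution, shown here only as a sketch and not as the notebook's actual code, is to count how often each opening sequence occurs across games. The function name `opening_distribution`, the choice of four half-moves as the cutoff, and the sample games are all assumptions made for illustration.

```python
from collections import Counter


def opening_distribution(games, plies=4):
    """Count how often each sequence of the first `plies` half-moves occurs."""
    counts = Counter()
    for moves in games:
        counts[" ".join(moves[:plies])] += 1
    return counts


# Made-up move lists standing in for parsed games.
sample_games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],
    ["e4", "e5", "Nf3", "Nc6", "Bc4"],
    ["d4", "d5", "c4", "e6"],
]
distribution = opening_distribution(sample_games)
```

The resulting `Counter` can be passed to `most_common()` to rank opening lines before plotting them.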

Chess Piece Placement – Heatmap – Jupyter Notebook – Python
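
A placement heatmap boils down to counting how often a piece occupies each square. The sketch below illustrates that counting step under the assumption that the squares have already been pulled from the game-state table as strings such as "e4"; `placement_grid` is a hypothetical name, and this is not taken from the notebook.

```python
def placement_grid(squares):
    """Count square occurrences in an 8x8 grid; row 0 is rank 8, column 0 is file a."""
    grid = [[0] * 8 for _ in range(8)]
    for square in squares:
        col = "abcdefgh".index(square[0])
        row = 8 - int(square[1])
        grid[row][col] += 1
    return grid


# Made-up occupancy data for a single piece across several game states.
grid = placement_grid(["e4", "e4", "a1", "h8"])
```

The nested list is oriented like a board viewed from White's side, so it can be handed directly to a plotting call such as matplotlib's `imshow` to draw the heatmap.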