World of Warcraft TBC Arena
Problem
The World of Warcraft ArenaStats add-on lets players automatically track the results of their Arena matches (2v2, 3v3, and 5v5 player-vs-player). The tool is fan-made rather than natively supported by the game, so it occasionally records results incorrectly. Can we clean the exported data to get a usable dataset?
data:image/s3,"s3://crabby-images/88594/88594dd24119492792edc26224152a69ed75c342" alt="WoW1"
ArenaStats In-Game Display.
Data Collection & Data Cleansing
- TBC Arena Stats - Data Cleaning - Jupyter Notebook – Python
The most common ArenaStats errors are duplicated matches and ghost players. A ghost player is a player from the previous match who is recorded again in the following match, so three players are listed where there were only two. Duplicate matches are easy to filter out. Ghost players are harder to identify, because detection relies on the race-class-faction combinations being distinct between the two matches. If consecutive matches feature characters with matching race, class, and/or faction, the ghost player may be impossible to identify and the match has to be dropped from the dataset.
data:image/s3,"s3://crabby-images/acedb/acedbb53e3c9acff7c3c9e391de5bf0f1d5ad1d2" alt="WOW-TBC2"
Ghost Player Example.
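The ghost player check can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the `Player` tuple, the `remove_ghost_players` function, and the idea of comparing one side's roster against the same side's roster from the previous match are all assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class Player(NamedTuple):
    """Hypothetical record: the three attributes used to tell players apart."""
    race: str
    cls: str
    faction: str


def remove_ghost_players(
    roster: List[Player],
    previous_roster: List[Player],
    players_per_team: int,
) -> Optional[List[Player]]:
    """Return the roster without ghost players, or None if they cannot be determined.

    A ghost is assumed to be an entry whose race-class-faction combination
    also appears in the previous match's roster for the same side.
    """
    excess = len(roster) - players_per_team
    if excess <= 0:
        # Nothing extra was recorded, so there is no ghost to remove.
        return list(roster)

    previous = set(previous_roster)
    candidates = [p for p in roster if p in previous]

    # The ghosts are only identifiable when exactly `excess` entries
    # match the previous roster; otherwise the record is ambiguous.
    if len(candidates) != excess:
        return None

    return [p for p in roster if p not in previous]
```

When the function returns `None`, the match would be dropped from the dataset, which mirrors the case where the two matches share race-class-faction combinations.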
Data Cleaning Steps
- Add columns for queueType (2v2, 3v3, 5v5) and playersPerTeam (2, 3, 5).
- Drop duplicated records (matches with matchDuration = 0 and/or a blank teamName field are duplicates).
- Check for ghost players.
- Attempt to determine the ghost player. Update the record by dropping the erroneous recorded player.
- If the ghost player could not be determined, drop the record from the dataset entirely.
- Update the zoneId column to use map names.
- Standardize the format of the date column.
- Replace endTime column with a single matchDuration column (seconds).
- Add columns for teamComp and enemyTeamComp (e.g. “Rogue-Priest”) and a binary winLoss column.
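Several of the steps above can be sketched with pandas. This is a rough illustration under stated assumptions rather than the notebook's implementation: the raw column names `startTime`, `teamColor`, `winnerColor`, `teamClass1..N`, and `enemyClass1..N`, the Unix-second timestamps, and the `ZONE_NAMES` lookup values are all assumed for the example and should be checked against the real export. The ghost player check and the date formatting are left out.

```python
import pandas as pd

# Assumed zoneId -> map name lookup; verify the IDs against the real export.
ZONE_NAMES = {
    559: "Nagrand Arena",
    562: "Blade's Edge Arena",
    572: "Ruins of Lordaeron",
}
QUEUE_TYPES = {2: "2v2", 3: "3v3", 5: "5v5"}


def clean_matches(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply a subset of the cleaning steps to a raw export (hypothetical layout)."""
    df = raw.copy()

    # Match duration in seconds, assuming both timestamps are Unix seconds.
    df["matchDuration"] = df["endTime"] - df["startTime"]

    # Duplicates: zero duration and/or a blank team name.
    blank_team = df["teamName"].fillna("").str.strip() == ""
    df = df[(df["matchDuration"] != 0) & ~blank_team].copy()

    # Team size is inferred from how many class columns are filled in.
    team_cols = [c for c in df.columns if c.startswith("teamClass")]
    enemy_cols = [c for c in df.columns if c.startswith("enemyClass")]
    df["playersPerTeam"] = df[team_cols].notna().sum(axis=1)
    df["queueType"] = df["playersPerTeam"].map(QUEUE_TYPES)

    # Replace numeric zone IDs with map names.
    df["zoneId"] = df["zoneId"].map(ZONE_NAMES)

    # Compositions such as "Rogue-Priest", in recorded column order.
    df["teamComp"] = df[team_cols].apply(lambda r: "-".join(r.dropna()), axis=1)
    df["enemyTeamComp"] = df[enemy_cols].apply(lambda r: "-".join(r.dropna()), axis=1)

    # 1 for a win, 0 for a loss.
    df["winLoss"] = (df["teamColor"] == df["winnerColor"]).astype(int)

    return df.drop(columns=["startTime", "endTime"]).reset_index(drop=True)
```

In this sketch matchDuration is computed before the duplicate filter so that the filter can use it. Keeping the class order as recorded, rather than sorting it, means "Rogue-Priest" and "Priest-Rogue" would count as different compositions, so a real pipeline may want to normalize the order first.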
data:image/s3,"s3://crabby-images/45ea0/45ea097982392c6887b6ca1816ad28e476a4bedf" alt="WoW3"
Cleaned ArenaStats Dataset.
Data Visualizations
data:image/s3,"s3://crabby-images/0807f/0807fac7df6826160398f9eae63a832f3f65bb47" alt="WoW-Viz1"
data:image/s3,"s3://crabby-images/57ea8/57ea8ef60c43d3ea8a2382a93ed78797252c41e9" alt="WoW-Viz2"
data:image/s3,"s3://crabby-images/decd3/decd38795e6fc75bbb4eb157ff54081abe3b26ad" alt="WoW-Viz3"
data:image/s3,"s3://crabby-images/373e7/373e7f0f8ac11f6b5ecd703861434211540a4631" alt="WoW-Viz4"
data:image/s3,"s3://crabby-images/0a6b7/0a6b77c256b8e2f62eefc12b3751d4afbdd4fb49" alt="WoW-Viz5"
data:image/s3,"s3://crabby-images/8a92b/8a92baf07a4f710ba6243971301fef3006498c7e" alt="WoW-Viz6"
data:image/s3,"s3://crabby-images/3dc15/3dc156e849ca4e7fe9e937b7721f64ebdfcc9041" alt="WoW-Viz7"
- TBC Arena Matchups, Season 3, Rogue-Priest - Jupyter Notebook – R
- TBC Arena Season 3, Rogue-Priest, Distribution of Match Durations - Jupyter Notebook – Python