The idea stemmed from an evening of banter about who is the bigger bully (obviously it's Flashnub). Having no data to prove our case, we started looking into the possibility of aggregating match history.
First steps taken
Our first step was to find out whether similar work had already been done, and what assets we had to work with.
After a brief period of research, we came to the conclusion that no one had really looked into aggregating match history, so we scrapped the idea of borrowing someone's code.
Next, we checked what we had to work with locally. Within the Guilty Gear steamapp folder, there was a folder called Replay. Upon inspection, it held 100 replxxx.dat files and a single replay.dat file.
Opening a .dat file (typically a freeform file type in which every developer can use their own encoding) with Notepad++ revealed... a mess (go try it on your own). We had no clue what encoding Arksys used to store the replay data.
After a couple of hours of trial and error trying to find which encoding it used, we gave up on the idea of receiving clean data.
Doing it oldschool
Our next step was to analyse the .dat files through a hex editor.
replxxx.dat looked like a hot mess with no consistency, and it didn't seem to have any relevance to the match history. (We found out later that the replay itself (inputs, yada yada) is saved in replxxx.dat.)
Next we looked at replay.dat and noticed that a string of a certain length was repeating itself 100 times. Eureka: the 100-match history displayed in the replay was all stored in this one file.
The next step was to reverse engineer the hex code.
Every match had a string length of 504 bits, starting from the 28th. After splitting them all into a spreadsheet, we started changing inputs and analysing outputs.
The results indicated the following:
- Winner Boolean
- - 0 = Player 1 ; 1 = Player 2
- Player 1 and Player 2 Character
- - A two-digit hex value (each) that determines the player's character (00 = Ky, 08 = Axl)
- Player 1 and Player 2 Hash
- - A 10-bit value (each) that was unique for each player.
- Date and time
- - Year, month, day, hour, minutes, seconds. Each takes 2 bits. Uses local time, not UTC.
Parsing the replay
Next we had to automate the parsing of the data.
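As a minimal sketch of what the first part of that automation could look like in Python, the snippet below reads replay.dat as a hex string and cuts it into fixed-length records. The function names are made up for illustration. It assumes the length of 504 and the starting position of 28 are counted in characters of the hex dump and that the starting position is zero-based; if they turn out to be counted differently, only the default arguments need to change.

```python
def load_hex(path):
    """Read a binary file and return its content as a lowercase hex string."""
    with open(path, "rb") as handle:
        return handle.read().hex()


def split_records(hex_dump, start=28, record_len=504, count=100):
    """Cut a hex dump into fixed-length match records.

    start and record_len are ASSUMED to be offsets into the hex string
    (zero-based); adjust them if the real layout is counted differently.
    """
    records = []
    for index in range(count):
        begin = start + index * record_len
        chunk = hex_dump[begin:begin + record_len]
        if len(chunk) < record_len:
            break  # file ended before a full record: stop instead of guessing
        records.append(chunk)
    return records
```

Calling `split_records(load_hex("replay.dat"))` would then give one string per match, ready to be sliced further into the fields listed above.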
Attributing a player name to a hash
Since we couldn't decode the player hash at the time, we manually created and maintained a player name/player hash reference table.
This step was extremely inefficient.
Avoiding duplicates and inaccuracies
To avoid players uploading duplicates, we needed to create a "Primary Key".
We knew that using the hour wasn't a viable solution because the time is saved locally. That would cause too many duplicates whenever two players in different time zones both uploaded the replays of their fights together.
So we decided to use the player names, the player characters, the year/month/day/minutes and the winner boolean to create our "UniqueHash".
It's not foolproof. If two players in different time zones play while their local dates differ and both upload the replay, the same matches will carry two different dates and will be duplicated.
Also, if the same two players play twice on the same day, with the same minute value, the same characters and the same result, the second match will be detected as a duplicate and ignored.
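A minimal sketch of how such a key could be built and enforced, in Python. The field names, the SHA-1 digest, the table layout and the use of SQLite are all assumptions made for the example; the only part taken from the description above is which fields go into the key (and that the hour is left out).

```python
import hashlib
import sqlite3


def unique_hash(match):
    """Build a dedup key from names, characters, date + minute and winner.

    The hour is deliberately excluded, as described above.
    Field names are hypothetical.
    """
    parts = [
        match["p1_name"], match["p2_name"],
        match["p1_char"], match["p2_char"],
        "{:02d}-{:02d}-{:02d}-{:02d}".format(
            match["year"], match["month"], match["day"], match["minute"]),
        str(match["winner"]),
    ]
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


def create_table(conn):
    # The key doubles as the primary key, so the database rejects repeats.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS matches ("
        "unique_hash TEXT PRIMARY KEY, winner INTEGER)")


def insert_match(conn, match):
    # INSERT OR IGNORE silently skips rows whose primary key already exists.
    conn.execute(
        "INSERT OR IGNORE INTO matches (unique_hash, winner) VALUES (?, ?)",
        (unique_hash(match), match["winner"]))
```

With this shape, the same match uploaded from two time zones on the same local date collapses into one row, while the two failure cases described above behave exactly as stated.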
Analysing the data
To get a test analysis up and running quickly, we used Tableau to query and summarise the SQL database.
While functional, Tableau is mainly aimed at business use and cannot be hosted publicly and linked directly to the SQL database (unless you pay big $$).
We tinkered around a bit with the player hash and discovered that it was in fact the decimal-to-hex conversion of the player's Steam ID, ordered in an odd way.
From there we worked on having the parser convert each hash into a steamID and then build a separate array holding all the unique steamIDs within the replay file.
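A rough sketch of that conversion step, in Python. The actual reordering is not spelled out here, so the example uses reversed (little-endian) byte order purely as a placeholder assumption; the function names are hypothetical as well. Swap in the real ordering once it is known.

```python
def hash_to_steam_id(hash_hex, byte_order="little"):
    """Turn a hex player hash into an integer steamID.

    ASSUMPTION: the "odd" ordering is modelled as a plain byte order
    ("little" or "big"). The real layout may differ.
    """
    raw = bytes.fromhex(hash_hex)
    return int.from_bytes(raw, byteorder=byte_order)


def unique_steam_ids(hashes, byte_order="little"):
    """Convert every hash and keep each steamID once, in first-seen order."""
    ids = (hash_to_steam_id(h, byte_order) for h in hashes)
    return list(dict.fromkeys(ids))
```

Using `dict.fromkeys` keeps the first-seen order while dropping repeats, which is handy when the same opponent shows up in many of the 100 records.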
The next step was to call Steam's API to convert all the steamIDs into the players' current aliases.
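One way to do this lookup is the Steam Web API's `ISteamUser/GetPlayerSummaries` method, which accepts a comma-separated list of up to 100 64-bit steamIDs per call and returns a `personaname` for each. Whether this is the exact call the parser ends up using is an assumption; the sketch below, in Python, only shows the general shape, and its function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.steampowered.com/ISteamUser/GetPlayerSummaries/v0002/"


def build_urls(steam_ids, api_key, batch_size=100):
    """Build one request URL per batch of steamIDs (the API caps a call at 100)."""
    ids = [str(steam_id) for steam_id in steam_ids]
    urls = []
    for start in range(0, len(ids), batch_size):
        batch = ",".join(ids[start:start + batch_size])
        query = urlencode({"key": api_key, "steamids": batch}, safe=",")
        urls.append(API_URL + "?" + query)
    return urls


def parse_aliases(payload):
    """Map steamID -> current alias from a GetPlayerSummaries JSON response."""
    players = json.loads(payload)["response"]["players"]
    return {player["steamid"]: player["personaname"] for player in players}


def fetch_aliases(steam_ids, api_key):
    """Call the API for every batch and merge the results (needs network access)."""
    aliases = {}
    for url in build_urls(steam_ids, api_key):
        with urlopen(url) as response:
            aliases.update(parse_aliases(response.read().decode("utf-8")))
    return aliases
```

Keeping the URL building and response parsing separate from the network call makes both easy to check offline. Players with private or missing profiles may simply be absent from the response, so the caller should not expect one alias per steamID.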
Why not fetch everything?
While building the call to the Steam API, we considered also fetching all the replay data stored behind Steam's API.
We didn't really find any concrete answers. There was a glimmer of hope that with a publisher key we could fetch said data, but without enough proof we couldn't bring ourselves to spend the money.
We emailed Steamworks but have yet to get an answer from them.
What's to come?
The biggest thing for us to work on at this point is the frontend and backend of the analysis. We want to move away from Tableau, but we have a lot of learning to do when it comes to Node.js and displaying SQL queries graphically.
We'd really like to automate the replay fetching so players don't have to upload their replays. We're still waiting on an answer from Steamworks on this.
As part of this project, we may do the same for other fighting games on Steam, but the way their replay data is saved may be completely different and impossible for us to interpret.