Steam Info

Intro

The gist of this project is to gather a bunch of data from Steam so we can do some trend analysis on what people are playing. Most of the early work is just collecting that data; the analysis comes after it's done.

Materials to have

You'll need some things before starting.

Steam API Key - https://steamcommunity.com/dev/apikey
Go - https://go.dev/
MongoDB - https://www.mongodb.com/docs/manual/tutorial/install-mongodb-on-windows/
An IDE like VS Code, which you probably still have installed.

Background information

Good links:

https://developer.valvesoftware.com/wiki/Steam_Web_API

The Steam Web API has a lot of the information we'll need, but getting it out won't be trivial. In theory we could iterate through every Steam ID, since the 64-bit IDs are just a fixed base number plus a sequential account number. But since I'd like to collect friend data anyway, we might as well use the friends lists to discover the IDs too.
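As a sketch of what "sequential" means here: the SteamID64 of an ordinary individual account is a fixed base (76561197960265728, which is 0x0110000100000000) plus a 32-bit account number, so you can convert back and forth with simple arithmetic. The function names below are just for illustration, and this base only applies to individual accounts in the public universe.

```go
package main

import "fmt"

// steamID64Base is the SteamID64 of account number 0 for an individual
// account in the public universe (0x0110000100000000).
const steamID64Base uint64 = 76561197960265728

// toSteamID64 turns a 32-bit account number into a SteamID64.
func toSteamID64(accountID uint32) uint64 {
	return steamID64Base + uint64(accountID)
}

// toAccountID strips the base back off a SteamID64.
func toAccountID(steamID64 uint64) uint32 {
	return uint32(steamID64 - steamID64Base)
}

func main() {
	id := uint64(76561198064176948) // the example ID from the assignment URL
	acct := toAccountID(id)
	fmt.Println(acct)                    // 103911220
	fmt.Println(toSteamID64(acct) == id) // true

	// The next few IDs "in sequence" after that one.
	for i := uint32(1); i <= 3; i++ {
		fmt.Println(toSteamID64(acct + i))
	}
}
```

Iterating this way would mean hitting every account number, active or not, which is part of why crawling friends lists is the more practical route.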

Also, chatbots are your friend, but they're the friend that's always trying to get you to do coke with them behind the gym after school. They are cool and all, but don't rely on them too much. They can help you up, but depending on them will just lead to bad things.

Assignment???

First, test out the API on your own. Put the query into the URL bar of your browser and look at what comes back. Put your key where the X's are: http://api.steampowered.com/ISteamUser/GetFriendList/v0001/?key=XXXXXXXXXXXXXXXXXXXXX&steamid=76561198064176948

Start with a "hello world" program in Go, then make the HTTP request and print the response body with the same fmt.Println call.
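Here's a minimal sketch of that first program using only the standard library's net/http package. The fetch helper is a name I made up, and the status check is an extra I'd recommend: a bad key or a private profile typically comes back as an HTTP error status rather than a friends list.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

// fetch performs a GET request and returns the response body as a string.
func fetch(url string) (string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	// Treat anything other than 200 OK as an error instead of printing it.
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	fmt.Println("hello world")

	// Paste your own key in place of the X's.
	body, err := fetch("http://api.steampowered.com/ISteamUser/GetFriendList/v0001/?key=XXXXXXXXXXXXXXXXXXXXX&steamid=76561198064176948")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(body)
}
```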

Then, variablize the URL. You can literally just "add up" strings in Go: keep the parts of the URL that never change as string literals, put the parts that will change (your key and the Steam ID) in variables, and join everything with '+' to get a normal URL. Just print out the result.
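A sketch of the "add up the strings" approach; friendListURL is a hypothetical helper name. fmt.Sprintf would work just as well if you prefer it.

```go
package main

import "fmt"

// friendListURL glues the fixed parts of the URL to the parts that change.
func friendListURL(key, steamID string) string {
	return "http://api.steampowered.com/ISteamUser/GetFriendList/v0001/?key=" + key + "&steamid=" + steamID
}

func main() {
	key := "XXXXXXXXXXXXXXXXXXXXX" // your API key
	steamID := "76561198064176948"
	fmt.Println(friendListURL(key, steamID))
}
```

Putting it in a function now pays off later, when the Steam ID comes from the database instead of being typed in.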

Unmarshal the result into a struct using struct tags (the json:"..." annotations on each field).
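A sketch of the structs, assuming the response has the shape you saw in the browser: a friendslist object holding a friends array, where each entry has steamid, relationship, and friend_since. Check it against your own output; the type and function names are mine, and the sample JSON is made up. The IDs arrive as JSON strings, so they're typed as string here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// FriendListResponse mirrors the JSON that GetFriendList returns.
// The json:"..." struct tags map JSON keys to Go field names.
type FriendListResponse struct {
	FriendsList struct {
		Friends []Friend `json:"friends"`
	} `json:"friendslist"`
}

// Friend is one entry in the friends array.
type Friend struct {
	SteamID      string `json:"steamid"`
	Relationship string `json:"relationship"`
	FriendSince  int64  `json:"friend_since"` // Unix timestamp
}

func parseFriendList(body []byte) (FriendListResponse, error) {
	var r FriendListResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// A made-up response with the same shape as the real one.
	body := []byte(`{"friendslist":{"friends":[{"steamid":"76561197960265729","relationship":"friend","friend_since":1300000000}]}}`)

	r, err := parseFriendList(body)
	if err != nil {
		log.Fatal(err)
	}
	for _, f := range r.FriendsList.Friends {
		fmt.Println(f.SteamID, f.Relationship, f.FriendSince)
	}
}
```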

Then you'll need to output that result to the database in order to save it.

get the database running locally
use MongoDB Compass to create the database
connect to the database in Go
write the one object into a collection
view the object in MongoDB Compass and make sure it looks like the data you got in your browser

Now we need to scan more than one user. We need to know what users to scan.

Let me help you get your MongoDB setup indexed correctly. This isn't strictly necessary, but an index on the Steam ID field lets MongoDB look users up directly instead of scanning the whole collection every time, which dramatically reduces the CPU it takes. I can explain what's going on here if you're curious.
Then we'll need to make another collection in your database for the "unscanned users," which just holds a list of the users that aren't in your main friends list collection yet.
To put information into this collection, we need to process the scanned users a little. For each user whose friends list we collect, iterate through the friends list, and for each UID, check whether it belongs to someone whose friends list we've already scanned. If that lookup returns no document, we haven't scanned the user yet, so add that UID to the unscanned users collection (skipping it if it's already queued there, so the same user isn't scanned twice).
Finally, use the FindOneAndDelete() function on the unscanned users collection to pop the next UID. Put that UID in as the variable in the URL, and have the program call the API again.
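The loop described in these steps can be sketched without a database at all: below, a map stands in for the scanned collection, a slice stands in for the unscanned collection, and popping the front of the slice plays the role of FindOneAndDelete(). In the real program, the "already scanned?" check would be a FindOne() on the friends list collection that fails with mongo.ErrNoDocuments, queueing would be an InsertOne(), and a unique index on the steamid field (options.Index().SetUnique(true)) would both speed up those lookups and reject duplicate queue entries. The names crawl and fetchFriends, and the maxCalls cap, are hypothetical.

```go
package main

import "fmt"

// fetchFriends is a hypothetical stand-in for "build the URL, call
// GetFriendList, unmarshal the JSON". Injecting it keeps the loop testable.
type fetchFriends func(steamID string) ([]string, error)

// crawl pops one ID at a time from the queue, fetches that user's friends,
// records the user as scanned, and queues any IDs not seen before.
// maxCalls caps the number of API calls so the loop ends.
func crawl(seed string, fetch fetchFriends, maxCalls int) map[string][]string {
	scanned := map[string][]string{} // stands in for the friends list collection
	queued := map[string]bool{seed: true}
	queue := []string{seed} // stands in for the unscanned users collection

	for calls := 0; calls < maxCalls && len(queue) > 0; calls++ {
		id := queue[0]    // "find one..."
		queue = queue[1:] // "...and delete it"
		delete(queued, id)

		friends, err := fetch(id)
		if err != nil {
			// For example, a private profile: remember it so it isn't queued again.
			scanned[id] = nil
			continue
		}
		scanned[id] = friends

		for _, f := range friends {
			if _, done := scanned[f]; done || queued[f] {
				continue // already scanned or already waiting in the queue
			}
			queued[f] = true
			queue = append(queue, f)
		}
	}
	return scanned
}

func main() {
	// A tiny made-up friend graph.
	graph := map[string][]string{
		"A": {"B", "C"},
		"B": {"A", "D"},
		"C": {"A"},
		"D": {"B"},
	}
	fetch := func(id string) ([]string, error) { return graph[id], nil }

	scanned := crawl("A", fetch, 10)
	fmt.Println(len(scanned)) // 4
}
```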

At this point the program can run on its own and slowly build up the list of users on Steam (it will practically never finish). Obviously this doesn't tell us much about games yet; it just gathers the users we can take a look at later.

However, the Steam API, like most APIs, has a rate limit. Valve doesn't want people using more server time than they can handle, so they limit how many times you can call the API within a given window. They aren't super clear about it, but it looks like the limit is 100,000 calls per day, and we need to respect that and take it into account.

There are several ways to do this. The "braindead" way works and needs only one line: add a 'sleep' of 1/100,000 of a day (86,400 seconds / 100,000 = 0.864 seconds) in the loop. The slightly more complicated but more accurate way involves a little rework and a little concurrency. We can totally tackle that one if you want, and it might be interesting for you, but I wouldn't say it's necessary right now.

Now that we have this going, we can start working on a new program or expanding the existing program to also get the games a user has played. It may be easier to create a new one for now, and merge the two together later, just to make your development easier for the time being.

For the users whose friends lists we have scanned, call a new API endpoint (IPlayerService/GetOwnedGames) to get their game data. This includes how much time they've spent playing each game and what's in their library in general. We will want to gather a lot of these too, and the process will be very similar to the previous one.

Once we gather a decent chunk of this data (it should take less than a day to give us a very good sample size), we can start pumping it into various tools like Elasticsearch and graph databases like Neo4j. I'll need a bit more time to figure out what the following steps for that will be.