The goal this time was a little different from the previous crawler, which focused on files and repositories. Discord and Telegram invites appeared in code, repository details, wiki pages, issues and even user profiles. I used Puppeteer to run Google Chrome in headless mode. Headless mode means that no browser window pops up, so the script can run from an SSH shell on a Linux machine without any desktop. For each run I would sign into GitHub and save the session cookies off for reuse. The script then searched GitHub just as you would from a normal browser, saving data as it paged through the search results. Every run parsed results for each type of search object under several different sort orders.
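A minimal sketch of the paging step, written in Python rather than the Puppeteer/JavaScript the crawler actually used. It only builds the search-result URLs a headless browser would visit for every type and sort combination; it does not sign in or handle cookies. The `q`, `type`, `s`, `o` and `p` parameters mirror what GitHub’s web search shows in the address bar and are an assumption, not a documented API, and `search_urls` and `plan_run` are hypothetical helper names.

```python
from itertools import product
from urllib.parse import urlencode

GITHUB_SEARCH = "https://github.com/search"

def search_urls(query, search_type, sort=None, pages=1):
    """Yield one results-page URL per page for a single search type and sort order."""
    for page in range(1, pages + 1):
        params = {"q": query, "type": search_type}
        if sort:  # None keeps GitHub's default "best match" ordering
            params["s"] = sort
            params["o"] = "desc"
        params["p"] = page
        yield f"{GITHUB_SEARCH}?{urlencode(params)}"

def plan_run(query, types, sorts, pages):
    """Every URL one crawl run would visit: each type x each sort x each page."""
    return [
        url
        for search_type, sort in product(types, sorts)
        for url in search_urls(query, search_type, sort, pages)
    ]

# Hypothetical run: the object types and sort key are illustrative assumptions.
urls = plan_run(
    "discordapp invite NOT oauth2",
    types=["code", "repositories", "issues", "wikis", "users"],
    sorts=[None, "updated"],
    pages=3,
)
```

Generating the full URL list up front makes it easy to resume a run that was interrupted, since the browser only needs to skip URLs it has already saved.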
When searching for Discord invites I would use queries like "discordapp invite NOT oauth2", which seemed to cover both invite link styles fairly well. After all the searches had completed, any invite codes I already had were filtered out. Each remaining code was then sent to Discord’s invite API to ask for information about that invite, and all of that data, along with where the invite was found, was saved to the database. Discord does ban IP addresses for excessive API usage, but the ban is temporary. When I emailed them, I could never find out what the right requests-per-hour limit was. Also, fun fact: free accounts can only join a limited number of Discord servers before hitting a cap. I can’t remember the exact number now, but I think it was 100. The bummer is that there isn’t an error for it: invites just don’t work, and information about new Discord invites won’t show up either.
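The filter-then-lookup step could look roughly like the sketch below. It is not the original crawler’s code: the regex for the invite link styles, the `https://discord.com/api/v10/invites/{code}` endpoint with its `with_counts` query parameter, and the capped exponential backoff are choices made for illustration. Discord’s 429 responses carry a `retry_after` value in the body, which the sketch prefers when present; since the real request limit was never published, the fallback delays are guesses. `new_invite_codes`, `invite_api_url` and `backoff_delay` are hypothetical names.

```python
import re

# Matches discord.gg/CODE, discord.com/invite/CODE and discordapp.com/invite/CODE.
# Codes are captured as written; vanity codes may contain hyphens.
INVITE_RE = re.compile(
    r"(?:https?://)?(?:www\.)?(?:discord\.gg|discord(?:app)?\.com/invite)/([A-Za-z0-9-]+)"
)

def new_invite_codes(texts, known_codes):
    """Return invite codes from scraped texts that are not already stored, in first-seen order."""
    seen = set(known_codes)
    fresh = []
    for text in texts:
        for code in INVITE_RE.findall(text):
            if code not in seen:
                seen.add(code)
                fresh.append(code)
    return fresh

def invite_api_url(code):
    """Discord's invite lookup URL; with_counts asks for approximate member counts."""
    return f"https://discord.com/api/v10/invites/{code}?with_counts=true"

def backoff_delay(attempt, retry_after=None, base=1.0, cap=300.0):
    """Seconds to sleep before retrying after an HTTP 429.

    Prefer the retry_after value from Discord's 429 body when present;
    otherwise fall back to capped exponential backoff (1, 2, 4, ... seconds).
    """
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * 2 ** attempt)
```

Deduplicating before any network call keeps the number of API requests as small as possible, which matters when the ban threshold is unknown.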
I spent less time on Telegram and so didn’t go far in collecting Telegram invites. I didn’t bother trying to get any information about the Telegram invites; I just saved them off as I found them. The search strings I used included telegram.me. I bet they paid a handsome sum for a single-letter domain like t.me, even if the TLD was .me.
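Because the Telegram invites were only saved and never looked up, collecting them comes down to pattern matching. A rough sketch follows; the three link shapes (`joinchat/<hash>`, `+<hash>` and a public username of five or more characters) and treating `telegram.me` and `t.me` as the same host are assumptions, and `telegram_links` is a hypothetical helper.

```python
import re

# Assumed link shapes: t.me/joinchat/<hash> and t.me/+<hash> (private invites),
# plus t.me/<username> (public channels/groups, 5+ characters).
# The leading \b stops matches inside words such as "at.me".
TELEGRAM_RE = re.compile(
    r"\b(?:https?://)?(?:www\.)?(?:t|telegram)\.me/"
    r"((?:joinchat/|\+)[A-Za-z0-9_-]+|[A-Za-z0-9_]{5,})"
)

def telegram_links(text):
    """Canonicalise every t.me / telegram.me link in text to https://t.me/<path>."""
    return ["https://t.me/" + path for path in TELEGRAM_RE.findall(text)]

found = telegram_links(
    "see t.me/joinchat/AbC-12_x, https://telegram.me/+XyZ123 "
    "or t.me/some_channel; not at.me/foobar"
)
```

Canonicalising to one host means the same invite posted under both domains is stored only once.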
After collecting that information I could combine these invites and their sources with the repositories I had found for Cryptonight coins. Some coins were better handled and launched than others, and these searches helped more than once.
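One way to do that join is sketched below, assuming each saved invite kept the GitHub URL it was found on and that coin repositories are identified by `owner/name`. Both functions are hypothetical. GitHub owner and repository names are case-insensitive, so the sketch lowercases them, and invites found on user profiles drop out because a profile URL does not map to a repository.

```python
from collections import defaultdict
from urllib.parse import urlparse

def repo_from_url(url):
    """'https://github.com/Owner/Repo/issues/3' -> 'owner/repo'; None for pages like user profiles."""
    parts = urlparse(url).path.strip("/").split("/")
    return "/".join(parts[:2]).lower() if len(parts) >= 2 else None

def invites_by_coin_repo(invite_sources, coin_repos):
    """Group invite links under the coin repositories they were found in.

    invite_sources: iterable of (invite_url, source_url) pairs saved by the crawler.
    coin_repos: "owner/name" strings for repositories already identified as Cryptonight coins.
    """
    wanted = {repo.lower() for repo in coin_repos}
    grouped = defaultdict(set)
    for invite, source in invite_sources:
        repo = repo_from_url(source)
        if repo in wanted:
            grouped[repo].add(invite)
    # Sorted lists give a stable, deduplicated result per repository.
    return {repo: sorted(invites) for repo, invites in grouped.items()}
```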
This was part of a series: Ephemeral Projects as Performance Art