Crawling GitHub for Discord & Telegram Invites

My other crawling effort, Crawling GitHub for New Cryptonight Coins, used the GitHub API and Python. When I started looking for new Discord or Telegram invites that might be cryptocurrency related, I chose to walk a different path. This set of scripts used JavaScript and skipped the GitHub API, mostly out of curiosity and a desire to learn new things along the way.

The goal this time was a little different from the previous crawler, which focused on files and repositories. Discord and Telegram invites appeared in code, repository details, wiki pages, issues and even user profiles. I used Puppeteer to run Google Chrome in headless mode. Headless mode means that a browser window won’t pop up, so the script can run from an SSH shell on Linux without any desktop. For each run I would sign into GitHub and save the session cookies for reuse. The script would then search GitHub just as you would from any normal browser, saving data and paging through the search results. Every run covered each type of search object under several different sort orders.
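To make the "each type of search object and different sorts" idea concrete, here is a minimal sketch in Python rather than the Puppeteer/JavaScript the scripts actually used. It only builds the list of search page URLs a headless browser would visit. The query parameter names (q, type, s, o, p), the list of types, and the sort options are assumptions about GitHub's web search, not details taken from the original scripts.

```python
from itertools import product
from urllib.parse import urlencode

# Assumed values: the post does not list the exact types or sorts it used.
SEARCH_TYPES = ["repositories", "code", "issues", "wikis", "users"]
SORTS = [("", ""), ("updated", "desc"), ("indexed", "desc")]  # ("", "") = best match


def build_search_urls(query, pages=2):
    """Return one search URL per (type, sort, page) combination."""
    urls = []
    for search_type, (sort, order), page in product(
        SEARCH_TYPES, SORTS, range(1, pages + 1)
    ):
        params = {"q": query, "type": search_type, "p": page}
        if sort:
            params["s"] = sort
            params["o"] = order
        urls.append("https://github.com/search?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in build_search_urls("discord.gg", pages=1):
        print(url)
```

Each URL would then be loaded in the signed-in browser session and the result page parsed for invite links.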

When searching for Discord invites I would use either discord.gg or discordapp invite NOT oauth2, which together seemed to cover both invite link styles fairly well. After all the searches had completed, any invite codes I had already collected were filtered out. For each remaining invite code the script called Discord’s invite API to ask for information about that invite. All of that data and the source information would be saved in the database. Discord does ban IP addresses over excessive API usage, but the ban is temporary. I emailed them but could never find out what the right requests-per-hour limit was. Also, fun fact: you can only join a limited number of Discord servers on a free account, I think 100, before you hit the limit. The bummer is that there isn’t an error for it; invites just stop working, and information about new Discord invites won’t show up either.
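The extract, filter, and look-up steps could be sketched roughly as follows. This is Python for illustration, not the original JavaScript. The regular expression, the function names, and the API base URL (https://discord.com/api/v10) are assumptions; the sketch builds the lookup URL but deliberately makes no network calls, since the safe request rate is unknown.

```python
import re

# Matches both link styles: discord.gg/CODE and discordapp.com/invite/CODE
# (discord.com/invite/CODE is also accepted). Only the domain is matched
# case-insensitively; the captured code keeps its original case.
INVITE_RE = re.compile(
    r"(?:discord\.gg|discord(?:app)?\.com/invite)/([A-Za-z0-9-]+)",
    re.IGNORECASE,
)


def extract_invite_codes(text):
    """Return the set of invite codes found in a blob of page text."""
    return {match.group(1) for match in INVITE_RE.finditer(text)}


def new_invite_codes(text, known_codes):
    """Drop codes that were already collected on earlier runs."""
    return extract_invite_codes(text) - set(known_codes)


def invite_lookup_url(code):
    """URL for Discord's invite lookup endpoint (assumed API version)."""
    return f"https://discord.com/api/v10/invites/{code}?with_counts=true"


if __name__ == "__main__":
    page = "Chat: https://discord.gg/abc123 (old link: discord.gg/seen01)"
    for code in sorted(new_invite_codes(page, {"seen01"})):
        print(invite_lookup_url(code))
```

Given the temporary IP bans mentioned above, a real run would want a generous pause between lookups.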

I spent less time on Telegram and thus didn’t go far in collecting Telegram invites. I didn’t bother trying to get any information about a Telegram invite; I just saved them as I found them. The search strings were t.me and telegram.me. I bet they paid a handsome sum for a single-letter domain, even if the TLD was me.

After collecting that information I could combine these invites and their sources with the repositories I had found for Cryptonight coins. Some coins were better handled and launched than others, and these searches helped more than once.

This was part of a series: Ephemeral Projects as Performance Art

Crawling GitHub for New Cryptonight Coins

This is the most important set of scripts I came up with. All the new coins I was interested in were forked from existing coins that host their source code on GitHub. Turtlecoin is the most developer-friendly Cryptonight coin. It has 559 forks as of this writing, and the developers are very friendly about helping people who fork their code. All of these new forks and coins share similar code. For Cryptonight coins, the configuration files can be put into one of two categories: C or C++ in nature.

Using Python and connecting to the GitHub API allows you to perform multiple searches. const+char+CRYPTONOTE_NAME and #define+CRYPTONOTE_NAME were the only two searches I needed to find new GitHub repositories. Once I found a new GitHub repository with a file that matched one of those two searches, I would do my best to parse the configuration file and extract what data I could about the coin. I searched for the name, money supply, emission speed factor, difficulty target in seconds, minimum fee, genesis hex, block reward unlock window, dust threshold, ports and block reward.
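A best-effort parser for the two configuration styles might look something like the sketch below. Only CRYPTONOTE_NAME comes from the searches above; the other constant names in WANTED, the sample values, and the regular expressions are assumptions about what a typical fork's config contains, and real files will need more care (comments, macros, multi-line values).

```python
import re

# Hypothetical list of constants worth keeping; only CRYPTONOTE_NAME is
# named in the post, the rest are assumed names for the fields it lists.
WANTED = {
    "CRYPTONOTE_NAME", "MONEY_SUPPLY", "EMISSION_SPEED_FACTOR",
    "DIFFICULTY_TARGET", "MINIMUM_FEE", "GENESIS_COINBASE_TX_HEX",
    "P2P_DEFAULT_PORT", "RPC_DEFAULT_PORT",
}

# C style:   #define NAME value
DEFINE_RE = re.compile(r"^[ \t]*#define[ \t]+(\w+)[ \t]+(.+)$", re.MULTILINE)
# C++ style: const type NAME = value;   or   const char NAME[] = "value";
CONST_RE = re.compile(
    r"\bconst\s+(?:[\w:]+\s+)+?(\w+)\s*(?:\[\s*\])?\s*=\s*([^;]+);"
)


def parse_config(source):
    """Return {constant name: raw value string} for the wanted constants."""
    fields = {}
    for pattern in (DEFINE_RE, CONST_RE):
        for name, raw in pattern.findall(source):
            if name in WANTED:
                # Values stay as strings; quotes around string literals go.
                fields[name] = raw.strip().strip('"')
    return fields


if __name__ == "__main__":
    sample = 'const char CRYPTONOTE_NAME[] = "examplecoin";\n'
    sample += "const uint64_t DIFFICULTY_TARGET = 30;\n"
    print(parse_config(sample))
```

Keeping values as raw strings avoids guessing at macros such as UINT64_C(...); converting them to numbers can happen later, per field.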

Another Python script then iterated over those records to search each repository for forks, sorting by newest forks first. I also performed some one-time searches sorting by oldest forks instead, since there is a limit to the number of results GitHub will provide.
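As a rough illustration of the newest-first and oldest-first passes, the sketch below builds request URLs for GitHub's REST "list forks" endpoint and merges two result lists without duplicates. The function names and the example owner and repository are hypothetical; the sketch assumes the decoded JSON results are lists of repository objects carrying a full_name field, and it leaves the HTTP requests and authentication out.

```python
from urllib.parse import urlencode

VALID_SORTS = {"newest", "oldest", "stargazers", "watchers"}


def forks_url(owner, repo, sort="newest", page=1, per_page=100):
    """Build the URL for one page of a repository's forks."""
    if sort not in VALID_SORTS:
        raise ValueError(f"unsupported sort: {sort}")
    query = urlencode({"sort": sort, "per_page": per_page, "page": page})
    return f"https://api.github.com/repos/{owner}/{repo}/forks?{query}"


def merge_forks(*result_lists):
    """Combine several fork listings, keeping the first copy of each repo."""
    merged = {}
    for results in result_lists:
        for repo in results:
            merged.setdefault(repo["full_name"], repo)
    return list(merged.values())


if __name__ == "__main__":
    print(forks_url("example-owner", "example-coin", sort="newest"))
    print(forks_url("example-owner", "example-coin", sort="oldest"))
```

Walking the list from both ends is one way to reach forks that a single sort order would cut off at the result limit.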

The scripts would also look for the most recently updated repositories and parse the configuration file again, comparing it with the previous values to alert me if any coins had their blockchain reset.
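The comparison step could be as simple as diffing two dictionaries of parsed values. In this sketch the field names and the choice of which changes signal a reset are assumptions: it treats a changed genesis hex as the reset indicator, which may not be the rule the original scripts used.

```python
# Assumed: a new genesis transaction hex means the chain was restarted.
RESET_FIELDS = ("GENESIS_COINBASE_TX_HEX",)


def changed_fields(previous, current):
    """Return {field: (old value, new value)} for every field that differs."""
    names = sorted(set(previous) | set(current))
    return {
        name: (previous.get(name), current.get(name))
        for name in names
        if previous.get(name) != current.get(name)
    }


def looks_like_reset(previous, current, reset_fields=RESET_FIELDS):
    """True when any reset-indicating field changed between two parses."""
    changes = changed_fields(previous, current)
    return any(name in changes for name in reset_fields)


if __name__ == "__main__":
    before = {"CRYPTONOTE_NAME": "examplecoin", "GENESIS_COINBASE_TX_HEX": "aaaa"}
    after = {"CRYPTONOTE_NAME": "examplecoin", "GENESIS_COINBASE_TX_HEX": "bbbb"}
    if looks_like_reset(before, after):
        print("possible reset:", changed_fields(before, after))
```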

This was part of a series: Ephemeral Projects as Performance Art