Python script to identify "Bad Faith" Social Media usage


I am new to this forum but am very interested in many of the ideas here, and wanted to introduce an existing (and stalled) project of mine that might interest some of you.

The purpose of the project is to identify accounts on social media that disguise themselves as members of a community and then try to steer the conversation toward a hidden agenda. The most obvious examples would be promoting a product or influencing political opinions. To achieve this, I wrote a script that pulls statistics from an entire subreddit: the average account age, average karma, number of interactions within the subreddit, and number of interactions with other users who are also active in the subreddit. By default this is restricted to the first 5 interactions on a given thread.
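The aggregation step above can be sketched as follows. This is not the project's actual code; it uses plain dictionaries in place of real Reddit API objects (in practice the records would come from a library such as PRAW), and the field names are my assumptions:

```python
from collections import defaultdict
from statistics import mean

def account_stats(comments, early_cutoff=5):
    """Aggregate per-account statistics from a list of comment records.

    Each record is a dict with: author, account_age_days, karma,
    and position_in_thread (0-based comment order within its thread).
    """
    stats = defaultdict(lambda: {"ages": [], "karmas": [], "count": 0, "early": 0})
    for c in comments:
        s = stats[c["author"]]
        s["ages"].append(c["account_age_days"])
        s["karmas"].append(c["karma"])
        s["count"] += 1
        # Count comments landing among the first N in a thread,
        # where early commenters have outsized steering power.
        if c["position_in_thread"] < early_cutoff:
            s["early"] += 1
    return {
        author: {
            "account_age_days": mean(s["ages"]),
            "karma": mean(s["karmas"]),
            "interactions": s["count"],
            "early_interactions": s["early"],
        }
        for author, s in stats.items()
    }
```

The same per-account summary then feeds whatever anomaly checks come later; only the data source changes when moving from test fixtures to live subreddit data.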

The pattern I am looking for is that manipulators purchase old accounts to appear legitimate, then comment very early in a thread’s lifetime to steer the conversation. Other accounts may then be used to build false consensus and make the conversation more engaging. So if two users interact more often than average, if a poster has an abnormally high or low interaction rate in the subreddit, or if anything else looks highly unusual, it may be a sign that the activity is not genuine.
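One way to operationalize "two users interact above average" is a z-score over pairwise reply counts. A minimal sketch, where the threshold and the input shape (a list of `(author, parent_author)` tuples) are my assumptions rather than anything from the original script:

```python
from collections import Counter
from statistics import mean, stdev

def flag_pairs(replies, z_threshold=2.0):
    """Flag user pairs whose reply count is unusually high.

    `replies` is a list of (author, parent_author) tuples. Pairs are
    treated as unordered, so a->b and b->a count together; self-replies
    are ignored.
    """
    counts = Counter(frozenset(pair) for pair in replies if pair[0] != pair[1])
    values = list(counts.values())
    if len(values) < 2:
        return []  # not enough pairs to estimate a baseline
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # every pair interacts equally often; nothing stands out
    return sorted(
        tuple(sorted(pair))
        for pair, n in counts.items()
        if (n - mu) / sigma > z_threshold
    )
```

The same z-score idea extends to the other signals mentioned above (early-comment rate, subreddit interaction rate): compute a baseline over all accounts and flag outliers.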

Unfortunately I got rather stuck once I realized I had no concrete examples of how these social media manipulators actually operate, so I was unable to confirm a positive match. I asked the data science subreddit for suggestions, and also asked Reddit itself for a history of accounts confirmed to be guilty of this. When I learned about this community, I thought I would post here, as many of you may have backgrounds conducive to spotting suspicious patterns in social media usage. I am happy to write code that generates statistics and connections between accounts, but I simply lack the knowledge of what to look for.

If you are interested in trying it out or contributing, here is the GitHub page, along with a blog write-up that gives some more background.

Even if you are not a subject matter expert, I’d love to hear some wild guesses at what characteristics might suggest a social media account is acting against the best interests of a community.



This is a cool idea. Your last commit was a year ago; I think that nowadays (after the FB/CA scandals et al.) there is much more interest in these topics and much more ongoing work. I sporadically see topics like yours pass by on Hacker News. Most of the work here is done in academic circles.

It would be a good subject for the CHT: finding out more about the methods that are used, and developing strategies and theory to fight this phenomenon.

If your project evolves, I will surely add it to the List of awesome open-source tech projects on Github. It could go in a new, separate category with similar software.

Very cool, thanks aschrijver. I would call this a prototype at best, but hopefully I can get the ball rolling again.

My biggest challenge is finding a way to confirm that the script works, which would require either a dataset or more in-depth domain knowledge of how these manipulators actually operate. That is something academic circles may have access to; otherwise this knowledge is likely restricted to the small groups within Facebook/Reddit/Twitter who deal with the problem directly.
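If a labeled dataset of confirmed manipulator accounts ever materializes, validating the script reduces to standard precision/recall against those labels. A sketch (the labeled set itself is hypothetical; only the arithmetic is standard):

```python
def evaluate(flagged, known_bad):
    """Score a set of flagged accounts against a ground-truth labeled set.

    Precision: of the accounts we flagged, how many were truly bad?
    Recall: of the truly bad accounts, how many did we flag?
    """
    flagged, known_bad = set(flagged), set(known_bad)
    true_positives = len(flagged & known_bad)
    return {
        "precision": true_positives / len(flagged) if flagged else 0.0,
        "recall": true_positives / len(known_bad) if known_bad else 0.0,
        "false_positives": sorted(flagged - known_bad),
    }
```

The false-positive list matters as much as the scores here: flagging genuine community members as manipulators would be worse for the project than missing a few bad actors.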

Another option would be to focus on Twitter, which occasionally bans large groups of accounts at the same time. If I can pull those accounts’ tweets from before they were banned, that might be enough to build up a dataset.