A blog about defending the social web against abuse. Brought to you by Impermium.
development
Building a Classification Framework with Hive and Python

Impermium aspires to make the web a secure place. To do this, we have developed products that analyze and classify content, as well as products that analyze event streams and pick out anomalous behaviour. The key products we have developed provide real-time feedback on event streams. As one would expect, the real-time feedback process mostly involves model evaluations and...
Built to Scale: How does Impermium process data?

In 2010, Impermium launched with a vision to handle abuse across the internet in a smart and scalable way. The architecture was designed from the ground up to ingest and analyze large amounts of data from many different social networks on an ongoing basis. Cofounders Naveen and Vish designed the data warehouse to utilize Pig,...
Internationalization in Python 2

Why does a security company like Impermium care so much about internationalization? We care about User Generated Content (UGC). A lot. At Impermium, we employ patented machine learning algorithms to stop the bad guys from spreading spam, taking over accounts and exploiting the vulnerable. When discussing adversarial machine learning, the temptation is to focus on...
The Spell Caster – A Case Study in Adversarial Machine Learning

In a recent talk at the 2013 Strata Conference, I presented a few insights into adversarial machine learning and how it challenges traditional machine learning. I received a lot of positive feedback from attendees, and was subsequently flooded with requests for my slides and additional materials. Here, I will present an abridged version of my...
Program Management at Impermium: An Unexpected Journey

Last week, I talked about some of the challenges I’ve seen while trying to mold teams to fit rigid, defined processes that supposedly work for everyone. Here, I’d like to talk about how we’ve evolved our process to fit the individual team at Impermium. Getting to this point was difficult for me. I had to...
Breaking Free From the Cult: 6 Reasons Why Agile Doesn’t Work

For the last six years, I have been trying to create a real, true Agile environment. I’ve gotten teams together and tried to follow the rules, doing everything “right” to achieve true Agile productivity. I’ve tried. My coworkers, colleagues, and fellow program managers have tried too. At this point, I’m ready to call it. Agile...
Productizing Web-Scale Machine Learning Systems

With the (re-)emergence of machine learning as a fundamental component of the current social web and the upcoming semantic web, making learning algorithms more accessible (and more widely applicable) is becoming important. In the engineering team here at Impermium, we often need to quickly stand up internally available classification and regression services to help advance our...
Kaggle Competition Helps Impermium Detect Insults in Social Commentary

In early August, Impermium launched a $10,000 competition on Kaggle, a platform for big data analytics. Because the field of social spam is so new, and attackers are continuously evolving their techniques, Impermium needed to reach the world’s preeminent computer scientists for this novel problem. The contest’s goal was to identify new ways to defend...
Developers: Salt Thy Passwords

With groups like LulzSec leaking passwords left and right, it seemed like a good time to remind developers of the importance of never, ever storing passwords in the clear, and of always using a cryptographic salt when storing them. For those not familiar with the terms, or who think “my site isn’t a target so it’s not a big...