With the (re-)emergence of machine learning as a fundamental component of the current social web and the upcoming semantic web, increasing the accessibility (and applicability) of learning algorithms is becoming important. In the engineering team here at Impermium, we often need to quickly stand up internally available classification and regression services to help advance our grand plan. These services have to be reliable, maintainable, and scalable to hundreds of millions of events per day. They must also support (down to the algorithm level) the same best practices found elsewhere in the software world (logging, exception handling, etc.). For this week’s blog post I thought we’d provide a glimpse of some of the software architecture we’ve built up to meet these needs.

The standard ML tools for quickly developing learning models (Weka, scikit-learn, R) are essentially prototyping workbenches, where most of the emphasis is on interactive feature selection, normalization, model selection, and model tuning. Once a successful model has been generated in these prototyping tools, it is usually an entirely separate effort to turn that model into a production-quality system whose predictions can be consumed. In some environments this means collecting data from various feeds and running day-end batch prediction scripts that use the workbench-based libraries to reproduce the steps developed in the prototyping phase. In other cases, the predictors must operate in-line as a web service and must be rewritten to meet performance or interface requirements.

Over several iterations, we have developed a Python framework (named Lego) specifically designed to simplify productization and shorten iteration cycles for web-scale predictors. It focuses on meeting the demands of the entire prediction tool-chain, from a REST API speaking unstructured JSON, through a formalized feature-extraction process, to the response structure, complete with debug information to support explainability. Working within the constraints of a framework buys us the ability to build really great tools around the whole process without worrying about over-engineering the infrastructure on any single project. It also ensures that domain-specific modules developed for our business case can be reused across all projects. The framework has allowed our small team of developers to rapidly stand up and iterate on many logistically and algorithmically challenging engineering tasks.
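To make the tool-chain concrete, here is a minimal sketch of the kind of request-to-response flow it supports. The field names and the toy scoring logic below are purely illustrative and not taken from Lego itself:

```python
import json

# Hypothetical illustration of a prediction endpoint's request/response shape;
# none of these field names come from Lego itself.

def handle_predict(raw_request: str) -> str:
    """Accept unstructured JSON, extract features, and return a scored
    response carrying debug information for explainability."""
    event = json.loads(raw_request)

    # Formalized feature extraction: each named extractor contributes values.
    features = {
        "text_length": len(event.get("text", "")),
        "has_url": "http" in event.get("text", ""),
    }

    # Stand-in for the real model: score the event from its features.
    score = 0.9 if features["has_url"] else 0.1

    response = {
        "prediction": "spam" if score > 0.5 else "ham",
        "score": score,
        # Debug block: which features fired, so the prediction can be explained.
        "debug": {"features": features},
    }
    return json.dumps(response)

if __name__ == "__main__":
    print(handle_predict('{"text": "Buy cheap pills http://spam.example"}'))
```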

One of the keys was componentizing the basic machine-learning process in a way that is directly amenable to serving predictions as part of a web service. Feature extractors, normalizers, classifiers, and the server glue code all communicate through well-defined interfaces and behave appropriately in an online system. The learning and testing processes use the same methods and libraries as the live production system. A unified wrapper exposes a variety of key-value stores for stateful featurization, and a lightweight dpkg-based infrastructure handles packaging and deployment, with Fabric for distribution and management.
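A rough sketch of what that componentization might look like in code follows; the class and method names here are illustrative stand-ins rather than Lego’s actual API:

```python
from abc import ABC, abstractmethod

# Illustrative interfaces only; Lego's real classes and method names may differ.

class FeatureExtractor(ABC):
    @abstractmethod
    def extract(self, event: dict) -> dict:
        """Map a raw event to a dict of named feature values."""

class Normalizer(ABC):
    @abstractmethod
    def normalize(self, features: dict) -> dict:
        """Rescale or clean feature values before classification."""

class Classifier(ABC):
    @abstractmethod
    def train(self, feature_dicts: list, labels: list) -> None:
        """Fit the model on featurized training examples."""

    @abstractmethod
    def predict(self, features: dict) -> float:
        """Return a score for a single feature dict."""

class Pipeline:
    """Glue used identically in offline training/testing and the live service."""

    def __init__(self, extractors, normalizer, classifier):
        self.extractors = extractors
        self.normalizer = normalizer
        self.classifier = classifier

    def featurize(self, event: dict) -> dict:
        # Every extractor contributes named features; the normalizer cleans them up.
        features = {}
        for extractor in self.extractors:
            features.update(extractor.extract(event))
        return self.normalizer.normalize(features)

    def train(self, events, labels):
        self.classifier.train([self.featurize(e) for e in events], labels)

    def predict(self, event: dict) -> float:
        return self.classifier.predict(self.featurize(event))
```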

Our second key was simplicity. As part of a high-velocity startup, we need to be able to adjust quickly as business conditions and customer appetites change. For example, using a skeleton script in the framework, a developer can create a sandbox prediction project with a single command. The script creates a default classifier (an SVM), a dummy feature extractor, and a few canonical training examples, along with all of the logging, reporting, and monitoring configuration files the new project needs during packaging. The skeleton code then runs a train/test cycle (reporting precision/recall numbers) and performs a full production training run and packaging. Work on the project can then proceed as a series of small changes, with the server pushed to a “test environment” after each change. Adding another feature extractor takes a few seconds and requires no interaction with other components of the system. This focus on simplicity in the design has made it easy to prototype new ideas and respond to customer requests in a very short time.
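As an illustration of how small that change can be, here is a hypothetical feature extractor written against a base interface like the one sketched above (again, not the real skeleton output):

```python
# Illustrative only: a drop-in feature extractor, assuming a framework-provided
# base class along the lines of the FeatureExtractor sketched earlier.

class FeatureExtractor:  # stand-in for the framework's base class
    def extract(self, event: dict) -> dict:
        raise NotImplementedError

class ExclamationCountExtractor(FeatureExtractor):
    """Counts exclamation marks in the event text as a single feature."""

    def extract(self, event: dict) -> dict:
        return {"exclamation_count": event.get("text", "").count("!")}

# Adding the new extractor to the project's list is the only change required;
# the normalizer, classifier, and server glue are untouched.
extractors = [ExclamationCountExtractor()]
print(extractors[0].extract({"text": "FREE!!! Click now!"}))  # {'exclamation_count': 4}
```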

Our current work on Lego is focused on expanding the available classification modules and the options for iterative feature selection.

Cory O’Connor

Cory O’Connor is a Lead Engineer at Impermium, where he does away with several gallons of coffee a day, and as a side effect creates really killer software systems.
