This is the first installment of a multi-part series drilling deep into Linear Labs’ new and exciting technology, BLOX. Throughout this series we’ll show how to implement this new framework, how to create and test prototypes, and how to go from wild and crazy ideas to actual products in less than an hour. Let’s jump right into it!
Wait, I lied…
Before we get too far into the weeds, let’s first discuss what BLOX is and what it can do for you. BLOX is an open-source library dedicated to automating the design and deployment of machine learning models.
The real question is why we created BLOX, and why we devoted so much time to open-sourcing it. It’s really pretty simple: at Linear Labs, we found that our ability to innovate was limited mostly by implementation details. Once we came up with an innovative idea for a machine learning model, we spent most of our time testing, verifying and versioning. And that was just the beginning; after this incredibly important but time-consuming step, we still had to deploy. As a startup, it’s imperative that we execute as quickly as possible.
Now, the best bet would have been to adopt something similar to Google’s Tensor2Tensor. Unfortunately, since our backend learning framework is built on PyTorch, we can’t rely on TensorFlow-based frameworks to support us.
After looking for months, we couldn’t quite find what we needed, which led us to build our own solution: BLOX.
In good faith, we decided to open-source this technology to help other startups and engineers who are facing the same problem. Below is the first of many tutorials on how to build with BLOX.
Building with BLOX
In this tutorial, we demonstrate how we can go from idea to product in a single config file. For this tutorial we require a few prerequisites.
- Understanding JSON syntax
- Basic HTML & RESTful API understanding
- A Debian/Ubuntu computer
So, what’s our end goal? What will you get out of this tutorial? What are we going to do?
We’re going to build a sandwich classifier.
The “sandwich classifier” will give us the ability to classify whether an image is a hot dog or not.
Yes, that’s right, I said it.
A hot dog is a sandwich! A SANDWICH!
Below are examples of how EVERYONE gets this wrong.
- A Hot Dog Is Not a Sandwich
- A Hot Dog Is Not a Sandwich and We Don't Care What Anyone Says
So if you haven’t guessed it yet, we’re building the “hot dog, not hot dog” classifier from HBO’s hit series Silicon Valley.
What makes BLOX such a great tool for this is its ability to consume data, train a model and deploy a classifier in a few steps, or, in BLOX terms, by defining a Process.
STEP 1. Get The Data
Before we can deploy any type of machine learning model, we need data! For this tutorial we’re going to reference a really great tool to download images from Google.
For this, we’re going to use Hardik Vasa’s Google Images Download. We’ll let you refer to its instructions on downloading data. What we expect for this tutorial is that you can download data into two folders named “hotdog” and “nothotdog”. For example, first download a mix of non-hot-dog images:
$ googleimagesdownload --keywords "Pasta,balloons,Beaches" --limit 2000
Then move all the non-hot-dog images into a single folder named “nothotdog”, download the hot dog images, and move “nothotdog” back into the downloads folder.
$ mkdir nothotdog && mv downloads/*/* nothotdog && rm -rf downloads
$ googleimagesdownload --keywords "hotdog" --limit 3000
$ mv nothotdog downloads
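After these commands, the downloads folder should contain one “hotdog” and one “nothotdog” subfolder. A minimal sketch for sanity-checking that layout before converting it to a dataset is shown below. It uses only the Python standard library; the function name count_images and the list of image extensions are our own illustrative choices, not part of BLOX or the downloader.

```python
from pathlib import Path

# Extensions treated as images (an assumption; adjust to what was downloaded).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def count_images(root, classes=("hotdog", "nothotdog")):
    """Return {class_name: number_of_image_files} for each class folder under root."""
    counts = {}
    for name in classes:
        folder = Path(root) / name
        if not folder.is_dir():
            # A missing class folder is reported as zero images.
            counts[name] = 0
            continue
        counts[name] = sum(
            1
            for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    for name, n in count_images("downloads").items():
        print(f"{name}: {n} images")
```

If either count comes back as zero, the folder names or the mv commands above are worth a second look before moving on.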
Step 2-?? idk, use the config
BLOX uses a config file to set everything up, which is super easy. We’ll break it up into its main parts and show how to use each one, but really, it’s as simple as running…
$ blox -c <your_cfg_file>
Wait, what config?
OK, so you want to create, train and deploy, but how? Let’s walk through developing a config file. If you’d like to jump ahead and just get it going, check out our GitHub repo here and navigate to the NotHotdog directory in the Examples section.
So you downloaded your data, dope. Now let’s convert that to a BLOX dataset. In our config file, let’s add a section titled “Data”. This way, BLOX will know what data we’re working with and may want to use in downstream tasks. So our config should look something like this…
But let’s also tell it where to look for the data and where to save the dataset once it’s converted. So the updated config should look something like this…
Creating model definitions is fairly easy. To do this, we can either create the models via Python like so:
Or, we could do so via the BLOX syntax, like the net.json defined in the repo example. That essentially looks something like:
We just created our first BLOX dataset and model — noice! Now all we need to do is train and deploy. Let’s work on the next step, training our classifier.
To do this, we need to add the “Train” section to our config. The Train section describes the Optimizer, Loss Function, Logging and Model(s).
We also support TensorboardX, so you can log various metrics and store the model graph. Be sure the model(s) listed in the “Params” section are the one(s) you wish to optimize.
To start training we just need to run a simple command:
$ blox -c <your_config> --train
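Before kicking off a long training run, it can help to confirm that the config parses and contains the sections introduced so far. The sketch below is a hedged pre-flight check, not part of BLOX: it assumes the config is a JSON object whose top-level keys are the section names (“Data” and “Train” here), and missing_sections is a hypothetical helper name.

```python
import json


def missing_sections(config_text, required=("Data", "Train")):
    """Parse config_text as JSON and return the required top-level keys that are absent.

    Assumption: sections are top-level keys of a JSON object.
    """
    config = json.loads(config_text)  # raises json.JSONDecodeError on bad syntax
    if not isinstance(config, dict):
        raise ValueError("expected a JSON object at the top level")
    return [name for name in required if name not in config]


# Illustrative config text; a real one would be read from your config file.
example = '{"Data": {}, "Train": {}}'
print(missing_sections(example))  # prints []
```

Once the “Deploy” and “Pipeline” sections described later are added, the same check can be reused by passing their names in required.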
Once we start training our model, we should get a terminal printout like the one below.
During training you’ll be able to see the train and dev loss in the Summary section.
It’s time to start putting everything together. In the Deploy section of your config, you’ll define the endpoint used to access the model, along with how the backend will index it.
This allows us to create multiple endpoints. For this tutorial, we only want one endpoint, which is defined by the url keyword. We also want to make sure that when we access the data on the backend, we use the arg keyword. Lastly, we use RabbitMQ as the message broker, so a specific IP address and queue are set with the client arguments.
We’re almost there! Now we need to set up our compute pipeline, which is how we want to consume and serve our model as a service. We do this by adding a new section in our config named “Pipeline”.
This defines how many compute daemons to spawn and which RabbitMQ queue they listen to.
Note: pipelines are sequential and parsed top-down. That means if you wish to have your data processed by multiple models, list them in the order you need the data processed.
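To see why the ordering matters, here is a small, generic illustration of sequential processing. It is not BLOX code: run_pipeline and the two stand-in stages are hypothetical names, and real stages would be trained models rather than arithmetic functions.

```python
def run_pipeline(stages, data):
    """Apply each stage to the output of the previous one, in list order."""
    for stage in stages:
        data = stage(data)
    return data


# Two stand-in "models" used only to make the ordering visible.
def double(x):
    return x * 2


def add_one(x):
    return x + 1


print(run_pipeline([double, add_one], 3))  # (3 * 2) + 1 = 7
print(run_pipeline([add_one, double], 3))  # (3 + 1) * 2 = 8
```

The same stages in a different order give a different result, which is exactly why the order of entries in the Pipeline section matters.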
Interfacing your model
We’re in the homestretch. We’ve defined the entire backend in a single config; now we just need something to interface with it. Let’s put together a simple web UI using Flask and Bootstrap. For implementation details, see our GitHub repo’s example under Examples > NotHotdog.
Once you spin up your instances you’ll only need to run a few programs.
The server for the UI (assuming you’ve downloaded the repo and are in the NotHotdog directory)
$ python server.py
The client to serve the evaluations
$ blox -c <your_config> --serve
Then, spin up the compute pipeline
$ blox -c <your_config> --pipeline
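With all three processes running, you can also talk to the endpoint from a script instead of the web UI. The sketch below only builds the HTTP request, using the Python standard library. Everything specific in it is an assumption for illustration: the URL, the “image” argument name, and the idea that the endpoint accepts a JSON body with a base64-encoded image. Check the NotHotdog example in the repo for the actual request format your Deploy section expects.

```python
import base64
import json
import urllib.request


def build_request(image_bytes, url="http://localhost:5000/classify", arg="image"):
    """Build (but do not send) a POST request carrying a base64-encoded image.

    The default url and arg values are hypothetical placeholders; use the
    values from the url and arg keywords in your own Deploy section.
    """
    payload = json.dumps({arg: base64.b64encode(image_bytes).decode("ascii")})
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send it (requires the services above to be running):
#   with open("some_image.jpg", "rb") as f:
#       response = urllib.request.urlopen(build_request(f.read()))
#   print(response.read())
```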
Now you’re ready to solve all the world’s problems using BLOX!