…after a few weeks, we weren’t even sure what we had actually tried, so we needed to rerun pretty much everything”
Sound familiar?
In this article, I will show you how to keep track of your machine learning experiments and organize your model development efforts so that stories like this never happen to you.
You will learn about:
What is experiment management?
- code versions
- data versions
- hyperparameters
- environment
- metrics
Tracking ML experiments
- share your results and insights with the team (and your future self),
- reproduce the results of machine learning experiments,
- keep results that take a long time to generate safe.
Code version control for data science
Problem 1: Jupyter notebook version control
- nbconvert (.ipynb -> .py conversion)
- nbdime (diffing)
- jupytext (conversion+versioning)
- neptune-notebooks (versioning+diffing+sharing)
jupyter nbconvert --to script train_model.ipynb
python train_model.py
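If you would rather keep working in the notebook itself, jupytext can pair the .ipynb file with a plain .py script that is easy to diff and version, for example:

jupytext --set-formats ipynb,py train_model.ipynb
jupytext --sync train_model.ipynb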
Problem 2: Experiments on dirty commits
“But what about tracking code in between commits? What if someone runs an experiment without committing the code?”
One option is to explicitly forbid running code on dirty commits. Another is to give users an additional safety net and snapshot the code whenever they run an experiment. Each approach has its pros and cons, and it is up to you to decide.
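For example, if you go with the first option, a small helper that inspects git status before the run starts is enough to enforce clean commits. This is just a sketch of the idea; the function names are mine, not part of any library:

import subprocess

def assert_clean_repo():
    """Refuse to start an experiment if there are uncommitted changes."""
    status = subprocess.run(['git', 'status', '--porcelain'],
                            capture_output=True, text=True, check=True)
    if status.stdout.strip():
        raise RuntimeError('Uncommitted changes detected: commit your code before running an experiment.')

def current_commit():
    """Return the commit SHA the experiment is about to run on."""
    return subprocess.run(['git', 'rev-parse', 'HEAD'],
                          capture_output=True, text=True, check=True).stdout.strip()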
Tracking hyperparameters
Config files
data:
  train_path: '/path/to/my/train.csv'
  valid_path: '/path/to/my/valid.csv'
model:
  objective: 'binary'
  metric: 'auc'
  learning_rate: 0.1
  num_boost_round: 200
  num_leaves: 60
  feature_fraction: 0.2
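A config file like this can then be loaded at the top of the training script, for example with PyYAML (the config.yaml file name is an assumption):

import yaml

# Load data paths and model hyperparameters from the config file
with open('config.yaml') as f:
    config = yaml.safe_load(f)

train_path = config['data']['train_path']
model_params = config['model']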
Command line + argparse
python train_evaluate.py \
    --train_path '/path/to/my/train.csv' \
    --valid_path '/path/to/my/valid.csv' \
    --objective 'binary' \
    --metric 'auc' \
    --learning_rate 0.1 \
    --num_boost_round 200 \
    --num_leaves 60 \
    --feature_fraction 0.2
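Inside train_evaluate.py those flags would typically be declared with argparse; a minimal sketch (the defaults are illustrative, not taken from any real script):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--train_path')
parser.add_argument('--valid_path')
parser.add_argument('--objective', default='binary')
parser.add_argument('--metric', default='auc')
parser.add_argument('--learning_rate', type=float, default=0.1)
parser.add_argument('--num_boost_round', type=int, default=200)
parser.add_argument('--num_leaves', type=int, default=60)
parser.add_argument('--feature_fraction', type=float, default=0.2)
args = parser.parse_args()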
Parameters dictionary in main.py
TRAIN_PATH = '/path/to/my/train.csv'
VALID_PATH = '/path/to/my/valid.csv'

PARAMS = {'objective': 'binary',
          'metric': 'auc',
          'learning_rate': 0.1,
          'num_boost_round': 200,
          'num_leaves': 60,
          'feature_fraction': 0.2}
Magic numbers all over the place
...
train = pd.read_csv('/path/to/my/train.csv')

model = Model(objective='binary',
              metric='auc',
              learning_rate=0.1,
              num_boost_round=200,
              num_leaves=60,
              feature_fraction=0.2)
model.fit(train)

valid = pd.read_csv('/path/to/my/valid.csv')
model.evaluate(valid)
parser = argparse.ArgumentParser()
parser.add_argument('--number_trees')
parser.add_argument('--learning_rate')
args = parser.parse_args()

experiment_manager.create_experiment(params=vars(args))
...
# experiment logic
...
That means you can use readily available libraries and run hyperparameter optimization algorithms with virtually no additional work! If you are interested in the subject, please check out my blog post series about hyperparameter optimization libraries in Python.
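For example, once your training code accepts a plain parameter dictionary, a library such as Optuna can drive the search. The sketch below is only an illustration: train_evaluate is an assumed function that trains a model with the given params and returns the validation AUC, and the search ranges are made up.

import optuna

def objective(trial):
    params = {
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.3, log=True),
        'num_leaves': trial.suggest_int('num_leaves', 16, 128),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.1, 1.0),
    }
    # train_evaluate is assumed to train on the training set and return valid_auc
    return train_evaluate(params)

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print(study.best_params)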
Data versioning
- new images are added,
- labels are improved,
- mislabeled/wrong data is removed,
- new data tables are discovered,
- new features are engineered and processed,
- validation and testing datasets change to reflect the production environment.
Having almost everything versioned and getting different results can be extremely frustrating, and can mean a lot of time (and money) in wasted effort. The sad part is that you can do little about it afterward. So again, keep your experiment data versioned.
For the vast majority of use cases, whenever new data comes in you can save it in a new location and log that location together with a hash of the data. Even if the data is very large, for example when dealing with images, you can create a smaller metadata file with image paths and labels and track changes to that file.
A wise man once told me:
“Storage is cheap, training a model for 2 weeks on an 8-GPU node is not.”
And if you think about it, logging this information doesn’t have to be rocket science.
exp.set_property('data_path', 'DATASET_PATH')
exp.set_property('data_version', md5_hash('DATASET_PATH'))

from neptunecontrib.versioning.data import log_image_dir_snapshots
log_image_dir_snapshots('path/to/my/image_dir/')
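Note that md5_hash in the snippet above is not a library function; a minimal version based on Python's hashlib, reading the file in chunks so that large datasets don't have to fit in memory, could look like this:

import hashlib

def md5_hash(path, chunk_size=1024 * 1024):
    """Compute an MD5 fingerprint of a file without loading it into memory at once."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()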
Tracking machine learning metrics
“Log metrics, log them all”
Typically, metrics are as simple as a single number, but I like to think of them as something a bit broader. To understand whether your model has improved, you may want to look at a chart, a confusion matrix, or the distribution of predictions. Those, in my view, are still metrics, because they help you measure the performance of your experiment.
exp.send_metric('train_auc', train_auc)
exp.send_metric('valid_auc', valid_auc)
exp.send_image('diagnostics', 'confusion_matrix.png')
exp.send_image('diagnostics', 'roc_auc.png')
exp.send_image('diagnostics', 'prediction_dist.png')
Note: Tracking metrics on both the training and validation datasets helps you assess the risk of the model not performing well in production. The smaller the gap, the lower the risk. A great resource is this Kaggle Days talk by Jean-François Puget.
Versioning data science environment
“I don’t understand, it worked on my machine.”
One approach that helps solve this issue is “environment as code”, where the environment can be recreated by executing instructions (bash/yaml/docker) step by step. By embracing this approach, you switch from versioning the environment itself to versioning the environment set-up code, which we already know how to do.
There are a few options that I know are used in practice (this is by no means a full list of approaches).
Docker images
# Use miniconda3 as the base image
FROM continuumio/miniconda3

# Install JupyterLab
RUN pip install jupyterlab==0.35.6 && \
    pip install jupyterlab-server==0.2.0 && \
    conda install -c conda-forge nodejs

# Install Neptune and enable the neptune-notebooks extension
RUN pip install neptune-client && \
    pip install neptune-notebooks && \
    jupyter labextension install neptune-notebooks

# Set the Neptune API token as an environment variable
ARG NEPTUNE_API_TOKEN
ENV NEPTUNE_API_TOKEN=$NEPTUNE_API_TOKEN

# Add the current directory to the container
ADD . /mnt/workdir
WORKDIR /mnt/workdir
docker build -t jupyterlab \
    --build-arg NEPTUNE_API_TOKEN=$NEPTUNE_API_TOKEN .

docker run \
    -p 8888:8888 \
    jupyterlab:latest \
    /opt/conda/bin/jupyter lab \
    --allow-root \
    --ip=0.0.0.0 \
    --port=8888
Conda Environments
name: salt

dependencies:
  - pip=19.1.1
  - python=3.6.8
  - psutil
  - matplotlib
  - scikit-image
  - pip:
    - neptune-client==0.3.0
    - neptune-contrib==0.9.2
    - imgaug==0.2.5
    - opencv_python==3.4.0.12
    - torch==0.3.1
    - torchvision==0.2.0
    - pretrainedmodels==0.7.0
    - pandas==0.24.2
    - numpy==1.16.4
    - cython==0.28.2
    - pycocotools==2.0.0
conda env create -f environment.yaml
conda env export > environment.yaml
Makefile
git clone git@github.com:neptune-ml/open-solution-mapping-challenge.git
cd open-solution-mapping-challenge
pip install -r requirements.txt
mkdir data
cd data
curl -O https://www.kaggle.com/c/imagenet-object-localization-challenge/data/LOC_synset_mapping.txt
source Makefile
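If you want these steps to live in an actual Makefile, they could be wrapped in a single target and reproduced with one make setup call. This is only a sketch; the target name and layout are my assumptions (recipe lines must be indented with tabs):

setup:
	git clone git@github.com:neptune-ml/open-solution-mapping-challenge.git
	cd open-solution-mapping-challenge && pip install -r requirements.txt
	mkdir -p open-solution-mapping-challenge/data
	cd open-solution-mapping-challenge/data && curl -O https://www.kaggle.com/c/imagenet-object-localization-challenge/data/LOC_synset_mapping.txt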
experiment_manager.create_experiment(upload_source_files=['environment.yml'])
...
# machine learning magic
...
How to organize your model development process?
- how to search through and visualize all of those experiments,
- how to organize them into something that you and your colleagues can digest,
- how to make this data shareable and accessible inside your team/organization?
- filter/sort/tag/group experiments,
- visualize/compare experiment runs,
- share (app and programmatic query API) experiment results and metadata.
Working in creative iterations
time, budget, business_goal = business_specification()

creative_idea = initial_research(business_goal)

best_metrics, best_solution = None, None
while time and budget and not business_goal:
    solution = develop(creative_idea)
    metrics = evaluate(solution, validation_data)
    if best_metrics is None or metrics > best_metrics:
        best_metrics = metrics
        best_solution = solution
    # exploring the results of the best solution feeds the next creative idea
    creative_idea = explore_results(best_solution)
    time.update()
    budget.update()
- your first solution is good enough to satisfy business needs,
- you can reasonably expect that there is no way to reach business goals within the previously assumed time and budget,
- you discover that there is a low-hanging fruit problem somewhere close and your team should focus their efforts there.
If none of the above apply, you list all the underperforming parts of your solution and figure out which ones could be improved and which creative_ideas can get you there. Once you have that list, you need to prioritize it based on expected goal improvements and budget. If you are wondering how you can estimate those improvements, the answer is simple: results exploration.
You have probably noticed that results exploration comes up a lot. That’s because it is so important that it deserves its own section.
Model results exploration
- it leads to business problem understanding,
- it leads to focusing on the problems that matter and saves a lot of time and effort for the team and organization,
- it leads to discovering new business insights and project ideas.
- “Understanding and diagnosing your machine-learning models” PyData talk by Gaël Varoquaux
- “Creating correct and capable classifiers” PyData talk by Ian Ozsvald
- “Using the ‘What-If Tool’ to investigate Machine Learning models” article by Parul Pandey
Diving deeply into results exploration is a story for another day and another blog post, but the key takeaway is that investing your time in understanding your current solution can be extremely beneficial for your business.
Final thoughts
- what experiment management is,
- how organizing your model development process improves your workflow.