Applying Valuable Lessons from the DIUx xView 2018 Detection Challenge

October 14, 2018

In my last post, I shared some of the lessons learned by the Wovenware Data Science team this summer as a contestant in the DIUx xView 2018 Detection Challenge. This contest asked participants to create innovative solutions for national security and disaster response issues using one of the largest publicly available datasets of overhead imagery, xView.

As mentioned previously, as part of the challenge we trained object detection models, packaged them inside Docker containers, and submitted the containers to a private validation cluster provided by DIUx.

Some of the first lessons we learned revolved around the fundamentals of machine learning preparation: the need to choose tools and techniques wisely for effective dataset exploration, cleansing and preparation, and the fact that working with dense, unbalanced datasets demands strong time management and planning.

In addition, below are other key lessons we learned from the DIUx xView 2018 Detection Challenge.

Semi-Automated Model Training Requires Hard Work

Model training was based on a Keras implementation of the Single Shot Detector (SSD) and FAIR’s RetinaNet. We developed custom scripts that automated model training, using YAML config files to specify the hyperparameters for each experiment. To automate training we used default and custom Keras callbacks to monitor loss plateaus, adjust learning rates, and detect overfitting. For training we used Wovenware’s Octoputer server, powered by a cluster of NVIDIA Titan Xp GPUs running CUDA 8.
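Our competition scripts aren’t public, but the plateau logic behind callbacks like Keras’s EarlyStopping and ReduceLROnPlateau can be sketched in plain Python. This is a hypothetical simplification, not the actual competition code: a monitor that reduces the learning rate when validation loss plateaus and stops training once no further reductions help.

```python
class PlateauMonitor:
    """Minimal sketch of plateau-based LR reduction and early stopping,
    mimicking the behavior of Keras's ReduceLROnPlateau / EarlyStopping
    callbacks. Hypothetical illustration, not the competition code."""

    def __init__(self, patience=3, min_delta=1e-4, lr=1e-3,
                 lr_factor=0.5, max_reductions=2):
        self.patience = patience          # epochs to wait before acting
        self.min_delta = min_delta        # minimum change counted as improvement
        self.lr = lr                      # current learning rate
        self.lr_factor = lr_factor        # multiplier applied on plateau
        self.max_reductions = max_reductions
        self.reductions = 0
        self.best = float("inf")
        self.wait = 0
        self.stop = False

    def update(self, val_loss):
        """Call once per epoch; returns True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss          # real improvement: reset the counter
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                if self.reductions < self.max_reductions:
                    self.lr *= self.lr_factor   # reduce LR on plateau
                    self.reductions += 1
                    self.wait = 0
                else:
                    self.stop = True      # plateaued again: stop early
        return self.stop
```

Feeding this monitor the validation loss after each epoch is enough to imitate the basic stop/adjust decisions a data scientist would otherwise make by watching the loss curves.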

With this strategy in place we ran over 200 experiments in the first two weeks; subsequent training sessions pushed the total to roughly 300. The volume of experiments awaiting validation, and of results awaiting interpretation, grew almost unmanageable. This pushed us to brainstorm approaches to results interpretation and model selection that could help data scientists facing similarly large volumes of results.

In the end we learned that pushing the boundaries of automated deep learning model training is extremely difficult. Although we found ways to imitate basic decisions a data scientist would make in the field (such as stopping training once overfitting is detected), we still relied on a long list of manually designed experiments that evolved as the results of previous experiments were explored. The bottom line: when going after AI pipeline automation, don’t expect it to be a breeze.

Validate Early and Often

The main performance metric in the xView competition was DIUx’s custom implementation of mean average precision (mAP). We validated our models locally using 10% of the dataset as a hold-out set, but only after DIUx published a hotfix for a bug that caused classes to score NaN when no bounding boxes were assigned to them. We also frequently submitted containerized inference scripts to their validation cluster for constant feedback from the private hold-out set. That feedback was essential for gauging our models’ performance in the wild and for knowing where we stood against the single-CPU / 8 GB RAM / 72-hour constraints. Relying on local scoring code alone was not enough; without validation against an appropriate hold-out set, our results would have been little more than a bluff.
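The NaN issue is worth illustrating: if a scorer averages per-class AP values and any class with no assigned boxes comes back as NaN, that NaN poisons the overall mean. Below is a minimal sketch of a guarded mAP aggregation; it is hypothetical and not DIUx’s actual scoring code, which may handle empty classes differently (here we simply score them as 0.0).

```python
import math

def mean_average_precision(ap_per_class):
    """Average per-class AP values into a single mAP score.

    ap_per_class maps class id -> AP, where a class with no assigned
    bounding boxes may come back as float('nan'). Averaging NaNs
    naively would make the whole score NaN, so we substitute 0.0
    (one reasonable fix; the real scorer may differ).
    """
    cleaned = [0.0 if math.isnan(ap) else ap for ap in ap_per_class.values()]
    return sum(cleaned) / len(cleaned) if cleaned else 0.0
```

Guards like this are exactly why validating early matters: a single rare class can silently invalidate every local score until the edge case is found.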

It Pays to Continuously Document Results

One of the most important lessons we learned is that consistent documentation of results is essential; otherwise much of your work is in vain. Given the volume of results we generated, proper documentation could make or break our progress. We recommend keeping a well-documented log of what you tried and how, what worked and what didn’t, where to find model files later, and how to replicate something you already tried.
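One lightweight way to keep such a log is to append every experiment’s outcome to a shared CSV, keyed back to its config file and model artifact. The file name, fields, and helper below are hypothetical, not our actual tooling:

```python
import csv
import os
from datetime import datetime, timezone

LOG_PATH = "experiments.csv"  # hypothetical shared log location
FIELDS = ["timestamp", "experiment_id", "config_file",
          "val_map", "model_path", "notes"]

def log_experiment(row, path=LOG_PATH):
    """Append one experiment's outcome to a CSV log so any result can
    later be traced back to its config and model file."""
    new_file = not os.path.exists(path)
    row = {"timestamp": datetime.now(timezone.utc).isoformat(), **row}
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()   # write the header only once
        writer.writerow(row)
```

Called at the end of every training run, a helper like this costs seconds per experiment and makes questions like “which config produced our best mAP?” answerable months later.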

Above all, avoid the progress paralysis that comes from lacking structured data to analyze.

Conclusions

As you might have noticed, all of our lessons revolve around the familiar ML life cycle: from data exploration to hypothesis formulation, from model training and validation back to data exploration. To us data scientists at Wovenware, one thing is clear: we are ready for the next challenge, and ready to apply the lessons learned.

 
