When working on artificial intelligence models, you may feel like you are standing on the shoulders of giants. Improving machine learning models is one of the most challenging tasks a data scientist faces, and it requires a lot of experimentation and testing. Different optimization techniques apply to different types of models. For example, a neural network might require additional hidden layers, depth, and complexity, while a logistic regression may require trying different hyper-parameter combinations. If these techniques do not yield any results and the model is not learning, where do you go from there? What should you look for, and how do you find the culprits behind less-than-ideal performance?
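To make the logistic regression case concrete, here is a minimal sketch of a hyper-parameter search using scikit-learn; the synthetic dataset and the parameter grid are illustrative assumptions, not a prescription.

```python
# A minimal sketch of hyper-parameter search for a logistic regression;
# the synthetic data and the grid values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Try several regularization strengths and penalty types.
param_grid = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],  # liblinear supports both l1 and l2 penalties
}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```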
There are many resources on the web that outline techniques for modifying parts of your model’s architecture to improve accuracy. However, you must be careful when taking this approach, because changing a model’s architecture and its underlying mathematical algorithms can drastically change how the model learns from the information it is given and the interpretations it makes about the world. Experimenting, however, can lead to novel and ground-breaking solutions. It is one of the most exciting parts of a data scientist’s job, but it is also a time-consuming endeavor that should not be rushed. Resist the temptation to dive straight into research; instead, take a step back and go over the fundamentals.
There are three key elements that determine the potential of your model. First and foremost is the data: you have probably heard it a dozen times, but it cannot be stressed enough that data is key. Second, you should review your model’s assumptions, as most models can only excel under certain circumstances. Finally, you should take a deep dive into the training and testing metrics, searching for answers beyond accuracy numbers.
Evaluate the Data
Every single time I think about improving a model’s performance, the first thing that pops into my mind is the 2009 article, The Unreasonable Effectiveness of Data, by Peter Norvig et al., which suggests that often the most effective way to improve a model is to increase the amount and the quality of its data. Today, many widely used datasets are high quality and have been extensively curated, so working with the data alone may not significantly improve your model’s performance. With the advent of layers and techniques that help models process data in more effective ways, the model itself can handle the task of extracting more information from the available examples. Nonetheless, analysis of the relationships within the data, a large enough dataset, and sufficient data quality are still essential to achieving good performance. In fact, the article was revisited in 2017 by Abhinav Gupta and colleagues in Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, which explains that although model capabilities and hardware have advanced, a sufficiently large dataset remains paramount for improving model performance. Gather as much data as possible and then scrub it as best you can to keep it clean and organized.
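As a starting point for that scrubbing, here is a minimal first-pass data audit with pandas; the file name and the column names are hypothetical placeholders for whatever your dataset actually contains.

```python
# A minimal sketch of a first-pass data audit; "cars.csv" and the
# "label" column are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("cars.csv")

# Missing values per column: gaps here often matter more than model tweaks.
print(df.isna().sum())

# Exact duplicate rows can silently inflate apparent performance.
print("Duplicate rows:", df.duplicated().sum())

# Class balance: a heavily skewed target changes which metrics you can trust.
print(df["label"].value_counts(normalize=True))

# Basic distribution check for numeric features (outliers, impossible values).
print(df.describe())
```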
Verify Your Assumptions
If you are confident that you have a quality dataset that is sufficiently large for the task, but the model is still not performing well enough, what then? You should evaluate the underlying assumptions that your model is making about the data and the problem you’re trying to solve. Whether the model is a recurrent neural network designed to identify recurring patterns, or a convolutional neural network learning to extract and identify a dataset’s most important features, these assumptions should align with the task you’re trying to perform and the information provided in the dataset. You should also ensure that the cost and loss functions match what you’re trying to optimize. For example, suppose you are training a model to classify two groups of cars, performance and non-performance, based on attributes such as weight, horsepower, and 0-60 time. If you use categorical cross entropy as your loss function, the approach will work, but you will not get as good a performance from the model as you would with binary cross entropy. Categorical cross entropy assumes mutual exclusion between classes while binary cross entropy doesn’t, and some cars blur the line between the two groups. Categorical cross entropy is also designed for classification across multiple classes, usually more than two, while binary cross entropy is tailored to two-class problems and handles them particularly well. The better the fit between the assumptions being made and the problem being solved, the more likely you are to get good results.
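To make the distinction concrete, here is a minimal Keras sketch of the two formulations on the car example; the tiny architecture and the synthetic data are illustrative assumptions, not a recommended setup.

```python
# A minimal sketch contrasting the two losses on the car example;
# the architecture and the random data are illustrative assumptions.
import numpy as np
from tensorflow import keras

# Toy data: 3 features (weight, horsepower, 0-60 time), binary label.
X = np.random.rand(500, 3)
y = np.random.randint(0, 2, size=(500,))

# Binary formulation: one sigmoid output with binary cross entropy.
binary_model = keras.Sequential([
    keras.layers.Input(shape=(3,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
binary_model.compile(optimizer="adam", loss="binary_crossentropy",
                     metrics=["accuracy"])

# Categorical formulation: two softmax outputs with categorical cross
# entropy, which forces the classes to be treated as mutually exclusive.
categorical_model = keras.Sequential([
    keras.layers.Input(shape=(3,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])
categorical_model.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])

binary_model.fit(X, y, epochs=5, verbose=0)
categorical_model.fit(X, y, epochs=5, verbose=0)
```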
Understand the Results
If the model is still not performing after both the data and the model assumptions are tuned and aligned with the objective, it’s time to take a deep dive into the results generated by the model, whether it is a black box (an artificial neural network, etc.) or a white box (a decision tree, etc.), and use a magnifying glass to get a better understanding of what is going on during the learning process. Studying the metrics is a great start. Consistent patterns in specific metrics may reveal where the model is failing to perform as expected, but often there is more information behind the numbers. If the algorithm’s decision process is explainable (white-box models) this task might be straightforward, but if it isn’t (black-box models), all hope is not lost. By obtaining visual representations of the common features across the results, or of the inner structures the model used to make a decision, you might be able to make some educated guesses as to why you are getting the results you see. Look for signs that the model is paying special attention to an element that may be confounding the results. You should also take the time to analyze whether there is any hard constraint that intrinsically limits the model’s ability to learn and improve, such as irreducible random noise. Such noise may be part of the phenomenon you are trying to study, and it will not go away no matter how much more data you add.
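As a starting point for that deep dive, here is a minimal scikit-learn sketch that looks past the single accuracy number; the random-forest model, the synthetic data, and the printed feature importances are illustrative assumptions.

```python
# A minimal sketch of looking past the accuracy number; the model and
# the synthetic data are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
preds = model.predict(X_test)

# Per-class precision and recall often reveal failures that a single
# accuracy figure hides.
print(classification_report(y_test, preds))
print(confusion_matrix(y_test, preds))

# Feature importances hint at what the model is "paying attention" to;
# one dominant, suspicious feature can signal a confounder.
for i, imp in enumerate(model.feature_importances_):
    print(f"feature_{i}: {imp:.3f}")
```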
In the best-case scenario, after going through these steps you will have optimized, or at least improved, your model’s performance; failing that, you should have a pretty good idea of why your model isn’t performing as expected.
Maintain Discipline and Focus
There are many excellent online resources that describe mechanical approaches you can use to improve a machine learning model, but, as a data scientist, you need to think more critically about the business problems you are solving. Evaluating the data, aligning underlying assumptions, and analyzing results are fundamental to optimizing artificial intelligence models. Going back to the basics may not be as thrilling as undertaking new research, but maintaining discipline and focus is important to achieving your goals. You can achieve whatever you put your mind to, so go out there and get those results.