Optimizing Machine Learning Models: Helpful Tips for Data Scientists

When working on artificial intelligence models, you may feel like you are standing on the shoulders of giants. Optimizing the performance of machine learning models is one of the most challenging tasks for a data scientist, and it requires a lot of experimentation and testing. Different optimization techniques apply to different types of models. For example, a neural network might require additional hidden layers, depth, and complexity, while a logistic regression may require trying different hyper-parameter combinations. If these techniques do not yield any results and the model is not learning, where do you go from there? What should you look for, and how do you find the culprits behind less-than-ideal performance?

There are many resources on the web that outline techniques for modifying parts of your model’s architecture to improve accuracy. However, you must be careful when taking this approach, because changing a model’s architecture and its underlying mathematical algorithms can drastically change how the model learns from the information it is given and the interpretations it makes about the world. Experimenting, however, can lead to novel and ground-breaking solutions. It is one of the most exciting parts of a data scientist’s job, but it is also a time-consuming endeavor that should not be rushed. Resist the temptation to dive straight into research; instead, take a step back and go over the fundamentals.

There are three key elements that determine the potential of your model. First and foremost is the data; you have probably heard it a dozen times, but it can’t be stressed enough: data is key. Second, you should review your model’s assumptions, as most models can only excel under certain circumstances. Finally, you should take a deep dive when studying the results of the training and testing metrics, searching for answers beyond accuracy numbers.

Evaluate the Data

Every single time I think about improving a model’s performance, the first thing that pops into my mind is the 2009 article, The Unreasonable Effectiveness of Data, by Peter Norvig et al., which suggests that often the most effective way to improve a model is to increase the amount and the quality of the data. Today, many widely used datasets are high quality and have been properly and extensively curated, so it is less likely that working with the data alone will significantly improve your model’s performance. With the advent of layers and techniques that help models process data in more effective ways, the model itself can handle the task of extracting more information from the available examples. Nonetheless, analysis of the relationships within the data, a large enough dataset, and sufficient data quality are still essential to achieving good performance. In fact, the article was revisited in 2017 by Abhinav Gupta and colleagues in a paper titled Revisiting Unreasonable Effectiveness of Data in Deep Learning Era, which explains that although model capabilities and hardware have improved, a sufficiently large dataset continues to be paramount for improving model performance. Gather as much data as possible, then scrub it as best you can to keep it clean and organized.

Verify Your Assumptions

If you are confident that you have a quality dataset that is sufficiently large for the task, but the model is still not performing well enough, what then? You should then evaluate the underlying assumptions that your model is making about the data and the problem you’re trying to solve. Whether the model is a recurrent neural network designed to identify recurring patterns, or a convolutional neural network learning to extract and identify a dataset’s most important features, these assumptions should fall in line with the task you’re trying to perform and the information provided in the dataset. You should also ensure that the cost and loss functions reflect what you’re trying to optimize. For example, suppose you are training a model to classify two groups of cars, performance and non-performance, based on attributes such as weight, horsepower, and 0-60 time. If you used categorical cross entropy as your loss function, the approach would work, but you would not get as good a performance from the model as you would have with binary cross entropy. Categorical cross entropy assumes mutual exclusion between classes while binary cross entropy doesn’t, and some cars blur the line between performance and non-performance. Another factor is that categorical cross entropy performs best when classifying between multiple classes, usually more than two, while binary cross entropy is designed for problems with exactly two outcomes. The better the fit between the assumptions being made and the problem being solved, the more likely you are to get good results.
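To make the distinction concrete, here is a minimal pure-Python sketch of the two losses (not the Keras implementation; the labels and probabilities are invented for illustration). Binary cross entropy treats each prediction as an independent probability of the positive class, while categorical cross entropy expects each prediction to be a distribution over mutually exclusive classes:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Each prediction is an independent probability of the positive class.
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for t, p in zip(y_true, y_pred)
    ) / len(y_true)

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # Each row of y_pred is a probability distribution over mutually
    # exclusive classes (it must sum to 1, as a softmax output would).
    return -sum(
        sum(t * math.log(p + eps) for t, p in zip(row_t, row_p))
        for row_t, row_p in zip(y_true, y_pred)
    ) / len(y_true)

# Binary labels: 1 = performance car, 0 = non-performance.
print(binary_cross_entropy([1, 0], [0.9, 0.2]))
```

For a two-class, one-hot problem the two values coincide numerically; the practical difference lies in the output layer each pairs with (a single sigmoid versus a normalized softmax) and in the mutual-exclusion assumption described above.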

Understand the Results

If the model is still underperforming after both the data and the model assumptions are tuned and aligned with the objective, it’s time to take a deep dive into the results generated from the black-box (ANNs, etc.) or white-box (decision trees, etc.) model and use a magnifying glass to get a better understanding of what is going on during the learning process. Studying the metrics is a great start. Consistent patterns in specific metrics may reveal where the model is failing to perform as expected, but often there is more information behind the numbers. If the algorithm’s decision process is explainable (white-box models) this task might be straightforward, but if it isn’t (black-box models), all hope is not lost. By obtaining visual representations of the common features across the results, or of the inner structures the model used to make a decision, you might be able to make some educated guesses as to why you are getting those results. Look for signs that the model is paying special attention to an element that may be confounding the results. You should also take the time to analyze whether there is any hard constraint that intrinsically limits the model’s ability to learn and improve, such as irreducible random noise. This may be part of the phenomenon you are trying to study, and it might not go away no matter how much more data you add.
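As a quick sketch of why there is "more information behind the numbers" (the counts here are invented for illustration), precision and recall can expose a class the model is quietly failing on even when accuracy looks healthy:

```python
# Derive accuracy, precision, and recall from confusion-matrix counts.
def metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # how many positive calls were right
    recall = tp / (tp + fn)      # how many positives were found
    return accuracy, precision, recall

# 90% accuracy looks healthy, yet the model finds only half of the
# positive examples and half of its positive calls are wrong.
acc, prec, rec = metrics(tp=5, fp=5, fn=5, tn=85)
print(acc, prec, rec)
```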

In the best-case scenario, after going through these steps you will have optimized, or at least improved, your model’s performance; failing that, you should have a pretty good idea of why your model isn’t performing as expected.

Maintain Discipline and Focus

There are many excellent online resources that describe mechanical approaches that you can use for improving a machine learning model, but, as a data scientist, you need to think more critically about the business problems you are solving. Evaluating the data, aligning underlying assumptions and analyzing results are fundamental to optimizing artificial intelligence models. Going back to the basics may not be as thrilling as undertaking new research, but maintaining discipline and focus is important to achieving your goals. You can achieve whatever you put your mind to, so go out there and get those results.

Businesses Are At A Fork in the Road When it Comes to Digital Transformation

Recently I shared some thoughts about digital transformation in an article for the Forbes Technology Council. I noted what it is and what it is not, as well as how to get ready for the journey.

We’re at a pivotal time when it comes to taking the digital transformation plunge. Businesses that take the wrong road and delay the move to a digital-first organization could be left behind, without the ability to compete in the future.

As I mentioned in my Forbes article, a key reason for the delay is confusion about what digital transformation is; after all, the term is tossed around quite loosely. It’s often confused with technology modernization, which usually has a very different goal.

Technology modernization means upgrading your current technology to improve its capabilities, while digital transformation means a complete change in your overall business model and finding new forms of revenue through digital value streams.

Digital transformation is no easy feat even for the boldest of companies, but there are some ways to make the journey a little easier:

  • Lead by design. Organizations can take a design thinking approach to make sure their initiatives are being driven by the actual people who will be using or benefitting from them.
  • Restructure to get the best results. Digital transformation not only requires top-down support from management, but many times it’s important to create a new business unit, with its own staff, budget and other resources to give it the priority it needs.
  • Take an incremental approach. It’s not necessary to dive head first into digital transformation without first putting your toes in the water. Some companies are turning to innovation sprints to determine if a full-fledged tech project is warranted.

When it comes to remaining relevant, digital transformation and adopting a digital mindset will no longer be an approach only for the boldest risk-takers, but a requirement for everyone, across the board. By taking some basic first steps, the journey can begin today.

NVIDIA GTC Conference Sheds Light on Power of GPUs

In October I had the privilege of speaking at the NVIDIA GPU Technology Conference (GTC), an annual event for developers, researchers, engineers, and innovators in the NVIDIA community. The conference helps them enhance their skills by sharing ideas, best practices and inspiration for new ways to approach AI projects. Noted speakers at this year’s event hailed from leading academic institutions, such as MIT and Johns Hopkins; major companies, such as Facebook and Amazon; and of course senior-level experts at NVIDIA.

As a member of the NVIDIA Inception Program, Wovenware has been fortunate to have access to training from NVIDIA’s Deep Learning Institute, as well as the ability to participate in its developer forums, along with other key benefits. So, we were honored when Wovenware was invited to submit a topic that would be considered for presentation at GTC.

I submitted an abstract on how to port Keras Lambda layers to TensorFlow.js, and was thrilled when it was accepted. The challenge was boiling down a complex subject into a five-minute discussion.

I shared a specific project we worked on involving the development of a deep learning solution on the edge for satellite imagery. Our deep learning model was trained using the xView dataset.

I shared how, by using custom Lambda layers to build Keras models, users can specify an arbitrary Python function to be wrapped as a Keras Layer object. These layers are typically used when constructing Sequential and functional API models.

The challenge, however, is that although Keras Lambda layers give users the flexibility to go beyond what is achievable with stock layers, the Python-based implementation of such layers is not portable to TensorFlow.js at runtime without some hand-tailored intervention. I shared how we solved this complex problem, porting trained Keras models with custom Lambda layers to TensorFlow.js for object detection.
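A toy illustration of why that is hard (this is a simplified sketch with invented class names, not the actual Keras or TensorFlow.js API): a stock layer is fully described by its configuration data and can be rebuilt in another runtime, while a Lambda-style layer carries an opaque Python function that cannot be exported as data.

```python
import json

class DenseLike:
    """A 'stock' layer: fully described by its config data."""
    def __init__(self, units):
        self.units = units
    def get_config(self):
        return {"type": "DenseLike", "units": self.units}

class LambdaLike:
    """A Lambda-style layer wrapping an arbitrary Python function."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, x):
        return self.fn(x)
    def get_config(self):
        # The function itself is opaque Python code; it cannot be
        # serialized as data, which is why a JS runtime can't rebuild it
        # without a hand-written equivalent.
        return {"type": "LambdaLike", "function": repr(self.fn)}

double = LambdaLike(lambda x: [v * 2 for v in x])
print(double([1, 2, 3]))                        # runs fine in Python
print(json.dumps(DenseLike(64).get_config()))   # stock layer round-trips as data
```

The stock layer's config survives a JSON round trip; the Lambda layer's "config" is only a string representation of a live function, which is the gap that must be bridged by hand when porting.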

Our goal was to convert deep learning models with a public tool to run in a browser. When running in a browser, we could avoid the use of an external resource, or a remote server that would need to be hosted and paid for. By pushing the models out to a browser, they would live on a local computer and execute on that device with no external cost. In this way, we would bring our models to the edge, as close to the end device hardware as possible.

The NVIDIA GTC inspired me with new ways of leveraging the power of NVIDIA’s GPUs and advanced AI tools and technologies to create new and innovative solutions to support the challenges of business, government and the world at large. I hope my presentation inspired others as well to find new ways to solve complex AI challenges.

Wovenware Named a Strong Performer Among Computer Vision Consultancies by Independent Research Firm

SAN JUAN, Puerto Rico — Nov. 16, 2020 — Wovenware, a nearshore provider of AI and digital transformation solutions, today announced that it has been named a Strong Performer in The Forrester New Wave™: Computer Vision Consultancies, Q4 2020, report on the 13 top computer vision (CV) providers. Wovenware joins large firms, such as Accenture, Capgemini, Deloitte and PwC, in receiving this designation.

Forrester’s evaluation noted that while Wovenware focuses on other computer vision use cases, “…its specialty is developing CV models for satellite and aerial imagery.” It also noted that “Wovenware is one of the few consultancies with a large, in-house, U.S.-based (in Puerto Rico) data labeling team and has a history of clients with stringent security requirements.” Based on customer interviews, the report stated that “Wovenware had glowing recommendations that praised its support for the entire CV lifecycle, technical expertise, and professional execution.”

“We’re honored to be recognized by Forrester as a strong performer and included alongside some of the largest consultancies in the technology industry,” said Christian González, CEO, Wovenware. “Computer vision is fast becoming one of the key areas in AI automation, driven by advances in deep learning, analytics and the Internet of Things, and we look forward to working with customers to innovate new CV applications that enable them to leverage data-driven insights to do everything from automating business processes and conducting predictive maintenance, to improving the planet and keeping citizens safe.”

Wovenware helps companies in a variety of industries and applications, such as satellite imagery, collect, clean and optimize image data assets. Its proprietary deep learning algorithms can automatically identify and classify objects, as well as detect patterns in image data sets to extract actionable insights. The company’s private crowd is comprised of a large team of trained U.S.-based data specialists under NDA who undergo extensive training to consistently identify and label data, including objects and images, with extreme accuracy and precision.

The Forrester New Wave™: Computer Vision Consultancies, Q4 2020 is an evaluation of the emerging market for computer vision (CV) consultancies. Forrester identified the 13 most significant providers in the category — Accenture, Brillio, Capgemini, Deloitte, EPAM, Fractal Analytics, Grid Dynamics, Infosys, Insight Enterprises, Perficient, PwC, Quantiphi, and Wovenware — and scored them against 10 criteria to determine where they stand in relation to each other. The companies included had to have deployed more than five CV projects, have more than five enterprise or government customers, and have provided CV consulting services to customers for more than two years.

To download the full report visit: https://www.wovenware.com/forrester-new-wave-computer-vision-2020/.

Designing Solutions with a Serverless Architecture Requires a Change in Mindset

Over the past decade, significant advances in cloud services have helped businesses build scalable applications with increasing agility. When modernizing legacy applications or designing new cloud-native applications with a serverless architecture, everyone on the team must be aligned with the specific design goals and the challenges they entail.

The first thing to consider is that serverless architectures are a new paradigm compared to the traditional data center. In the past, any task that involved infrastructure would most likely be delegated to a systems administrator; apart from cases such as configuring a virtual machine or server, developers rarely interacted with the infrastructure at all. With cloud services, the on-premise data center ceases to exist and is replaced by lines of code. These lines of code provide the same functionality as an on-premise data center, but in the form of cloud services, hence Infrastructure as a Service (IaaS).

One of the benefits of running your infrastructure in the cloud is the ability to scale any service on the fly. To accomplish this in a traditional data center, you would have had to grab a vendor catalog, look for the upgrade needed to accomplish the scaling, order the upgrade, wait a couple of days or weeks for it to arrive, install and configure it, and test it; only then would the service be scaled. In the cloud, all you need to scale a service is a couple of mouse clicks, and in a few minutes (or less) the service is scaled. And this applies both ways, for scaling in and scaling out.

Operations: On-Premise vs Cloud

An on-premise data center requires dedicated resources to provide maintenance and ensure the availability of the services. Maintenance tasks can include upgrades, monitoring, and inspections. In a cloud environment, most of these tasks are outsourced to the cloud provider and its automated platform. The combination of these automated processes and services is what allows the cloud providers to offer service level agreements with high availability.

However, that is not to say that a cloud environment is free of considerations. The availability put forth by the provider refers to individual resources. The overall availability of a solution is the combination, effectively the product, of the availabilities of its individual resources, since each resource can fail independently. This is important to take into consideration when designing an application with a serverless architecture, because an outage of any one resource can mean a loss of data, revenue, or quality of service.
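A back-of-the-envelope calculation makes this concrete (the 99.9% figures are illustrative, not any provider's actual SLA): chaining three resources that are each 99.9% available yields a noticeably lower combined availability.

```python
# Combined availability of resources used in series is the product of
# their individual availabilities (assuming independent failures).
resource_availabilities = [0.999, 0.999, 0.999]  # illustrative SLA figures

combined = 1.0
for availability in resource_availabilities:
    combined *= availability

hours_per_year = 24 * 365
single_downtime = (1 - 0.999) * hours_per_year       # ~8.8 hours/year
combined_downtime = (1 - combined) * hours_per_year  # ~26.3 hours/year
print(round(combined, 6), round(combined_downtime, 1))
```

Three nines on each resource individually works out to roughly a day of expected downtime per year for the chain as a whole, three times worse than any single resource suggests.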

Designing for cloud architectures requires that you factor new considerations into your design. The important thing to remember is that the traditional data center is mostly represented as code in the cloud. With that in mind, there will be additional effort to consider when representing the infrastructure as code. To get started, ask the following questions:

  • Does infrastructure code need to be part of a CI/CD pipeline?
  • What type of scaling should be applied: fixed or automatic?
  • What additional code or resource should be tested before moving to production?
  • What additional logging efforts should be put in place?

Stay tuned for my next blog post, where I will dig a little deeper into identifying these considerations.

The Case For Unit And Integration Testing In Software Development

Long ago, Leonardo da Vinci had many ideas. Some of them were great, while others not so much. Among them was an early prototype of what we know today as a helicopter. To his misfortune, he never realized the idea himself — at least not successfully. Although his idea of having several men on a platform spin a screw-like machine and propel it into the sky was interesting in theory, the low power-to-weight ratio of the men and the machine itself meant it would fail. Nevertheless, there was only one way he could know for sure: by testing his theory, either by doing mathematical calculations to determine the possibilities or by building the machine in question and trying to fly it. No matter how, he had to test, because testing is the only way to separate ideas from actual results.

This also happens every day to software developers, especially because of human factors. Even when a protocol or model has worked for many years, errors may occur in its implementation or development. A badly positioned curly brace ({}) can completely alter the flow of an algorithm. A document in the wrong encoding (UTF-8, ISO-8859-1 …) can transform a special character into a wildcard character, making the search for records much more complex. A user’s input may be completely different than expected and cause a garbage-in, garbage-out (GIGO) failure. Many factors can go wrong (ask a college student what happens when you have been doing calculations for hours and, after all that effort, realize that the calculator was in radians instead of degrees). All of these issues (and others), however, can be partially (or totally) mitigated with automated software testing.

In this blog I will show you what automated software testing is and why it is so important. Since showing is much more powerful than telling, I will also present different real-life stories that support my case in favor of unit and integration testing.

First, let’s define what automated testing is. In software development, automated testing is a technique to compare the actual outcome of a component or process against its expected outcome without human intervention. There are several types of automated tests, ranging from smoke testing to functional or regression testing. However, for the sake of brevity, I will focus this article on the two most commonly used types for developers: unit tests and integration tests.

A perfect example of why an integration test is very important is the case of the Mars Climate Orbiter.

Mars Climate Orbiter

In 1998, NASA sent an expensive ($327.6 million) space probe into space to orbit and, among other things, collect climate information from the Red Planet. Each of its components had been tested many times. However, the mission was a failure. What happened? Although all the components worked perfectly on their own, the different teams that developed them never agreed to use the same system of measurement. Some components performed calculations with English units and others with metric units. This resulted in the space probe approaching Mars at the wrong angle and losing all communication and tracking. It is not even known whether the probe crashed or remained wandering through space.

An integration test, as you can see, can identify issues that are most relevant when two or more components are working together, especially during communication. For other, more component-specific issues, a unit test may be best.
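As a hedged sketch of what such a test looks like (the function names, thrust figures, and masses here are invented for illustration, not NASA's actual code), an integration test exercises the boundary where one team's output feeds another team's input, which is exactly where a unit mismatch hides:

```python
LBF_TO_NEWTONS = 4.44822  # pound-force to newtons

def thruster_impulse_lbf_s(burn_seconds):
    """Team A reports impulse in pound-force seconds (10 lbf thruster)."""
    return 10.0 * burn_seconds

def trajectory_delta_v(impulse_newton_s, mass_kg):
    """Team B expects impulse in newton-seconds."""
    return impulse_newton_s / mass_kg

def test_integration():
    # Convert at the boundary, then check the combined result against a
    # known-good value; forgetting the conversion makes this test fail.
    impulse = thruster_impulse_lbf_s(2.0) * LBF_TO_NEWTONS
    assert abs(trajectory_delta_v(impulse, 100.0) - 0.8896) < 1e-3

test_integration()
```

Each function would pass its own unit tests in either system of units; only a test that runs them together can catch the disagreement about what the numbers mean.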

Unit Testing

A unit test, as the name implies, tests a part (unit) of a set. It allows you to detect specific failures in individual components early in the process. Yet it is common for software developers and clients to assume that only integration tests are necessary. The thinking is that if the set works, the units must work, but that simply is not true. Remember Murphy’s Law: “Anything that can go wrong will go wrong.” Pessimistic, yes, but it has saved many lives. The story of the infamous Therac-25 shows why.
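A minimal sketch of what a unit test looks like (the function, names, and safety limit are hypothetical, invented for illustration): one component, checked in isolation, including the boundary values where bugs like to hide.

```python
def dose_is_safe(dose_mgy, limit_mgy=50.0):
    """Return True when a dose is at or below the safety limit."""
    return dose_mgy <= limit_mgy

def test_dose_is_safe():
    assert dose_is_safe(10.0)
    assert dose_is_safe(50.0)       # boundary value is still safe
    assert not dose_is_safe(50.1)   # just over the limit must fail

test_dose_is_safe()
```

A test this small runs in microseconds and can be executed on every change, which is exactly how it catches a regression in a single component before any integration work begins.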


The Therac-25 was a radiation therapy machine developed by Atomic Energy of Canada Limited (AECL) in the 1980s. Unlike its predecessor, the Therac-20, which used a software system along with hardware (physical) safety mechanisms, the Therac-25 would be the first radiation machine to use a completely software-controlled system. Since they were deemed no longer necessary, AECL removed all the physical safeguards. It took for granted that the Therac-20’s long history of safe operation, whose components and software the Therac-25 inherited, was sufficient proof that its components would work perfectly. The result of this thinking was the overexposure to radiation of at least six people (three of whom died), a Class I recall from the Food and Drug Administration (FDA), and several lawsuits. The cause? Several software errors that, according to the tests, had been present since the Therac-20.

Among the many mistakes, two were the most notable:

  1. If a physician mistakenly pressed X, tried to proceed, but then pressed E (the correct code to apply the maximum safe amount of radiation), the machine could still apply the radiation beam to the patient in a lethal dose.
  2. The safety method to prevent a lethal amount of radiation from being applied used an integer variable instead of a Boolean. The value 0 meant that it was safe to continue (TRUE, safe); any value other than 0 meant the opposite (FALSE, not safe). In theory, although bad practice, this should have worked. However, the software was coded so that each time a security validation passed, the code assigned 0 to the variable, but when it failed, instead of assigning a specific sentinel value such as -1, the code increased the value of the variable by 1. This was a lethal error. The programmers forgot that their validation code ran hundreds of times per minute and that data types cannot hold an infinite range of values. Therefore, from time to time, in both the Therac-20 and the Therac-25, an integer overflow occurred and the variable was reset to 0.
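That second failure mode is easy to reproduce in a few lines. Here is a hedged sketch (Python integers do not overflow, so an 8-bit wrap-around is simulated with a bit mask; this is not the actual Therac code) showing why incrementing a "not safe" counter instead of setting a flag is dangerous:

```python
MASK_8BIT = 0xFF  # simulate an 8-bit unsigned counter

def record_unsafe_increment(flag):
    # Buggy pattern: "not safe" is signaled by incrementing a counter.
    return (flag + 1) & MASK_8BIT

flag = 0
for _ in range(256):  # the check runs hundreds of times per minute
    flag = record_unsafe_increment(flag)
# After 256 increments the counter wraps back to 0, which reads as "safe".
print(flag)  # 0

# Safe pattern: assign a fixed sentinel instead of incrementing.
UNSAFE = 1
flag = UNSAFE  # stays nonzero no matter how many times it is set
```

A unit test that drove the counter through enough iterations to wrap would have exposed this bug on a developer's desk, long before a patient was in the room.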

The reason fatalities never occurred with the Therac-20 was its hardware safety controls: when the error occurred in the software and the machine was about to kill someone, the hardware triggered an internal error that restarted the process and forced the physician to start over.

As you can see, a unit test can show you component-specific errors that may be overshadowed during an integration test. In software development, this happens frequently and it’s our responsibility to prevent those things from happening.

Both unit and integration tests are very important. Skipping one or the other, for whatever reason, always carries high risks. When those risks are counted in human lives, no reset button is worth it.