Summary: Vincent Tompkins, a senior data scientist at Wovenware, shares insights into his background and involvement in the EY Open Science Data Challenge, emphasizing the importance of diverse model exploration in machine learning solutions and addressing prevalent themes like data labeling.
We recently sat down with Vincent Tompkins, a senior data scientist with Wovenware, who participated in an Ask the Experts panel for registrants of the EY Open Science Data Challenge.
This challenge, hosted by Ernst & Young (EY) and supported by Maxar Intelligence, invites students and early professionals to use Maxar’s high-res satellite imagery, collected in Puerto Rico from Hurricane Maria, to build machine learning models that help coastal communities become more resilient to the effects of climate change. The top three finalists will receive cash prizes of up to $10,000 and a trip to the International Geoscience and Remote Sensing Symposium (IGARSS) in Athens, Greece, in July.
Vincent joined other Maxar and EY experts in February to answer participants’ questions about data science, AI and satellite imagery, and to address any issues they were encountering in the challenge. His thoughtful responses were well received and clearly demonstrated his mastery of AI and data science. We posed key questions to Vincent to learn more about his impressions of the session, and about him as a key contributor to the Wovenware team.
First, tell us a little about yourself. Where did you go to school, and what got you interested in data science?
So, I’m originally from North Carolina. I obtained a B.S. degree in both computer engineering and electrical engineering from North Carolina State University. After that, I enrolled in the MS program in electrical engineering at the Polytechnic University of Puerto Rico’s San Juan Campus, concentrating on digital signal processing. For my MS thesis, I helped develop some mathematical models of an experimental plasma diagnostic probe, which is what started me on my journey to discovering data science and artificial intelligence.
How long have you been with Wovenware?
I started working for Wovenware 3 years and 7 months ago as a data scientist. At the time, it was one of the few places on the island providing data science solutions, which is what brought me here.
What is one project that you’re most proud of?
There really is no one project that I’m most proud of since I try to apply the same work ethic to each project. However, I will say that the project I will always be fond of is the very first project I worked on when I joined Wovenware because I learned so much about industry practices and data science workflow from a practical standpoint, as opposed to the academic process I had been exposed to before this.
Why were you interested in supporting the EY Open Science Data Challenge Ask-the-Experts session?
Mentoring and science advocacy are an important part of my living a fulfilling life and seizing enriching experiences, so I try not to overlook these opportunities whenever they present themselves. When offered the opportunity to assist participants by answering their challenge questions, I leapt at the chance to give back and assist others in their learning endeavors. Sharing and collaborating with others is how we grow as a community and a society.
As an expert for the EY Open Science Data Challenge, what was the key tip or advice you shared with the more than 91 participants who joined the call from around the world?
In my opinion, the most important advice I gave, and I’m paraphrasing, would be to try various models and combinations of models when building your solution. No single architecture is an end-all solution, and in a competition, where the stakes are mostly low, participants should take the opportunity to really flex their creative muscles. Some of AI’s most novel and innovative contributions come from these competitions, so I say, if you have the time, go for it.
What seemed to be the dominant theme of questions presented during the session?
As I look back on the Q&A, one theme that stood out was data labeling. There were many questions about what data to label and how to label it. There were also many questions about which tools are best to use and how much data should be labeled. Those kinds of questions made up the bulk of the participants’ concerns, which is not very surprising. A model is only as good as the data it is trained on, so it appeared to me that participants really wanted to know how to build the best datasets possible, as fast as possible, to start training the machine learning solution.
What are some examples of how predictive solutions could help address climate change on coastal zones?
There is a myriad of answers to this question, from understanding how changes in weather patterns affect food sources and how rising sea levels impact flood zones, to correlations between climate patterns and storm damage. It’s a growing issue and it will continue to become a key topic of interest.
What is it about Maxar’s satellite imagery that makes it optimum training data for predictive solutions?
Maxar provides some of the highest quality imagery at a variety of resolutions and, depending on the area, a variety of time points. Maxar even offers analysis-ready data, which expedites time to training, so you can find crisp imagery ready for labeling for an array of tasks. Whether you’re investigating land-use or land coverage problems, or doing some small object detection using its native 30 cm or derived 15 cm imagery, you can make out details at the street level. It is that level of definition and consistency that allows companies to build the highest quality datasets. As I have alluded to before, a quality model needs quality data.
Before labeling any data or developing the ML solution, what is the very first thing a participant should do?
It is important to understand the problem at hand. Often, we’re given data before we know what it is we want to learn from it. Once you understand the nature of the problem, let the insights from the data drive your questions and shape your hypothesis with exploratory data analysis and data governance. Once you’ve developed your hypothesis, determine its validity and justify your experiment. Why should we care about this problem, what makes it a problem that needs to be solved, and who will benefit from resolving it? Then you need to identify if (and how) others have tried to answer the same problem, or one that resembles it. After this, ask yourself how you could improve their method or provide a better solution with the data you have. At any point in time you may find yourself returning to a previous point in the process, and that’s okay. It’s a part of the discovery process, and it means you are truly engaged in the project. Just remember you will actually have to complete the experiment to win!
What do you like to do in your spare time, when you’re not training models or building ML solutions?
Hands down, spending time with my son. Seeing him learn and grow has shifted my perspective on life. I also enjoy movies and reading, mostly sci-fi or non-fiction. I recently completed the fourth book of the “Murderbot Diaries” novella series, which blends comedy, sci-fi, action and philosophy seamlessly, and I’m a fan of Philip K. Dick’s works. I also enjoy a decent RPG video game and board games, especially chess. I’m always up for a game of chess or studying tactics. If you’re on Chess.com feel free to add IllicitPhoenix as a friend.
We give a big shout-out to Vincent for his contribution to the EY Open Science Data Challenge, as well as for his ongoing contribution to making Wovenware a center of AI excellence and for serving as an inspiration to rising data scientists everywhere.