I spent the weekend playing the best sport in the whole world: my all-time favourite sport, a sport I started playing over 30 years ago.
That sport is underwater hockey.
Yes, it’s a real sport.
Here in NZ, we’re actually really good at it. The men’s and women’s teams are both current world champions.
And this year is a Worlds year, which means all the trialists are training hard for selection, and the Elite teams will be picked over the next couple of weeks.
That meant there were many really fit and fast players in the water every game, and the standard of play was exceptionally high.
It was an incredible weekend.
The tournament this past weekend was ‘Regionals’, the NZ InterZone Championships. This is the best comp: it’s super competitive, and the best players in each region duke it out for the top spot.
And they invite a Masters Invitational team for both the men’s and women’s grades.
Our Underwater Hockey Club was strongly represented across six teams: the Northern Region Men’s A and Men’s B teams, the Northern Region Women’s A and Women’s B teams, and the Men’s and Women’s Masters teams.
The younger players in our club are the same age as I was when I first represented Western Province (South Africa) in the equivalent competition back in 1990.
And it is phenomenal to watch these youngsters – they have so much potential, they’re all so eager to learn, and so many of them are already so skilled, having started playing when they were barely teenagers.
As a club, we train together every Monday, and during these training sessions, we ensure that everyone, regardless of level, has the opportunity to learn and develop. And we leverage all the skills and experience that we have available. Fresh eyes and minds see things that might otherwise be missed; the old guard has tips, tricks and skills that are still effective.
Bringing the old and the new together in creative ways can mean the difference between Gold and Silver.
This is one of the areas where sport and L&D at work have strong overlaps.
We can, and should, all learn from one another.
Over the years, in the world of data science, I have noticed that there is a certain segment of up-and-coming data scientists who tend to be quite dismissive of the old guard. They seem to think we have nothing to offer, that we’re past our sell-by date and should be put out to pasture.
After all, back in the day, computing power wasn’t what it is now. We couldn’t simply spin up servers and scale systems to handle bigger datasets. We didn’t have the luxury of simply taking all our data and throwing it at a number of algorithms and seeing what popped out.
For a while, there seemed to be a bit of a Goldilocks zone: data wasn’t so huge that it created problems AND computing power was big enough that, as analysts, we could simply throw everything in the pot and see what happened. Key model metrics looked good, and there was a golden moment where the new-school data scientists seemed to walk on water.
And then reality started biting.
You see, in order for models to really be effective, they need to be robust. This means that all the “old-fashioned” things, like cleaning data and ensuring that variables are not strongly correlated, need to be done, and need to be done well. No amount of computing power can make up for that.
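As a rough illustration of one of those “old-fashioned” checks, here is a minimal Python sketch (assuming pandas and NumPy are available) that flags pairs of numeric variables whose Pearson correlation exceeds a threshold. The function name `highly_correlated_pairs`, the 0.9 cut-off and the toy columns are all hypothetical choices for illustration, not a standard.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (feature_a, feature_b, r) for numeric column pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    pairs = []
    # Only scan the upper triangle so each pair is reported once and the diagonal is skipped
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Hypothetical toy data: height_cm and height_in carry exactly the same information
rng = np.random.default_rng(0)
height_cm = rng.normal(170, 10, size=200)
toy = pd.DataFrame({
    "height_cm": height_cm,
    "height_in": height_cm / 2.54,
    "shoe_size": rng.normal(42, 2, size=200),  # generated independently of height
})
print(highly_correlated_pairs(toy))
```

Pairwise correlations won't catch every kind of redundancy (a variable can be almost fully explained by a combination of several others), so treat this as a first pass rather than the whole job.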
Speaking of computing power…
Spinning servers up and throwing enormous quantities of mostly raw and unwrangled data at various algorithms can become quite expensive, depending on how these things are set up.
Throwing raw data at algorithms can also mean that models decay much faster than expected, so they need to be rebuilt sooner. Costs then end up higher than anticipated, because models not only take longer to build but also need to be rebuilt more frequently.
Simply put, this is less than ideal.
The old guard can help with that.
Learn to love data wrangling
Back in the day, we called it data cleaning and data prep. It doesn’t really matter what you call it; what matters is that it is done, and done well. Datasets need to be sourced and cleaned, and records may need to be merged. Then the data need to be evaluated: are they complete, what proportion is missing, and what ought to be done about that?
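A minimal sketch of those first wrangling steps in pandas: merging two sources, checking how complete the result is, and deciding what to do about the gaps. The tables, column names and the median-imputation choice are hypothetical; the right treatment for missing values depends on why they’re missing.

```python
import numpy as np
import pandas as pd

# Hypothetical customer and transaction tables; column names are illustrative only
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "age": [34, np.nan, 51, 29],
    "region": ["North", "South", None, "North"],
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 5],
    "amount": [120.0, 80.0, np.nan, 45.0],
})

# Merge records; indicator=True adds a _merge column so unmatched rows are visible, not silently lost
merged = customers.merge(transactions, on="customer_id", how="outer", indicator=True)
print(merged["_merge"].value_counts())

# Completeness check: proportion of missing values in each column
missing_share = merged.drop(columns="_merge").isna().mean().sort_values(ascending=False)
print(missing_share)

# One possible treatment (an assumption, not a rule): flag missing ages, then median-impute them
merged["age_was_missing"] = merged["age"].isna()
merged["age"] = merged["age"].fillna(merged["age"].median())
```

Keeping the `age_was_missing` flag means a model can still learn from the fact that a value was absent, even after it has been filled in.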
Feature evaluation, feature reduction and feature engineering
All the analysts I know say that they want more data. I’ve said it myself many times. What we actually mean is that we want more data so that we can extract more information, and this is where feature engineering comes in. We have to understand the value, the predictive power, associated with each feature. This may involve extracting information from a feature: for example, a unique product ID could be too finely classed, and by extracting certain features from that single feature, we end up with more information. Perhaps we have variables that will deliver more information once transformed or scaled. Often we have several features that are strongly correlated, and by selecting only one of them, we lose too much information. A better option could be to reduce several features to a single feature, thus retaining as much information as possible.
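Here’s a rough sketch of the three ideas in that paragraph, using pandas and NumPy: extracting coarser features from an over-granular ID, transforming a skewed variable, and collapsing several correlated measures into one feature. The `DEPT-CATEGORY-SERIAL` ID format and the product columns are made up for illustration, and I’ve used principal component analysis (computed with NumPy’s SVD) as one common way of reducing correlated features to a single feature; it isn’t the only option.

```python
import numpy as np
import pandas as pd

# Hypothetical products; IDs are assumed to follow a DEPT-CATEGORY-SERIAL format
products = pd.DataFrame({
    "product_id": ["ELEC-TV-0012", "ELEC-AUD-0450", "HOME-KIT-0031", "HOME-BED-0007", "ELEC-TV-0913"],
    "length_cm": [120.0, 15.0, 40.0, 200.0, 140.0],
    "width_cm": [70.0, 10.0, 30.0, 160.0, 80.0],
    "weight_kg": [18.0, 0.5, 4.0, 35.0, 22.0],
})

# 1. Extract coarser, more informative features from an over-granular ID
parts = products["product_id"].str.split("-", expand=True)
products["department"] = parts[0]
products["category"] = parts[0] + "-" + parts[1]

# 2. Transform a right-skewed variable so a few large values don't dominate
products["log_weight"] = np.log1p(products["weight_kg"])

# 3. Collapse three correlated size measures into one feature via PCA:
#    standardise the columns, then project onto the first principal component from the SVD
size_cols = ["length_cm", "width_cm", "weight_kg"]
X = products[size_cols].to_numpy()
Z = (X - X.mean(axis=0)) / X.std(axis=0)
_, singular_values, vt = np.linalg.svd(Z, full_matrices=False)
products["size_component"] = Z @ vt[0]

# Share of the total (standardised) variance captured by that single component
variance_retained = singular_values[0] ** 2 / np.sum(singular_values ** 2)

print(products[["product_id", "department", "category", "size_component"]])
print(f"variance retained by one component: {variance_retained:.1%}")
```

The variance retained is the useful number here: if the first component keeps most of it, one feature is doing the work of three, which is exactly the “retain as much information as possible” trade-off described above.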
Algorithm Selection
There are many algorithms to choose from; the ‘best’ one depends on what the goal is. Some of the go-to techniques were developed hundreds of years ago. They’re still popular because they work AND are readily interpretable.
There are some excellent newer techniques, and these techniques rely on computing power. To determine which one is best, you need a good understanding not only of the technique but also of how it will be used and what its strengths and weaknesses are.
Data Sampling
Back in the day, sampling was essential – primarily because we just did not have the required computing capacity. There are plenty of reasons why sampling is still a great idea; chief amongst them:
It reduces model-training times, which means you can build and evaluate more models in the same amount of time.
It reduces modelling costs; if it takes less time to build models, your model-building costs will fall.
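A minimal sketch of stratified sampling in pandas, assuming a hypothetical dataset with a rare binary target. Sampling within each target class keeps the positive rate in the sample the same as in the full data, which matters when the outcome you’re modelling is rare. The 10% fraction and the column names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical "large" dataset with a rare binary target (roughly 5% positives)
rng = np.random.default_rng(7)
n_rows = 100_000
data = pd.DataFrame({
    "feature": rng.normal(size=n_rows),
    "target": (rng.random(n_rows) < 0.05).astype(int),
})

# Stratified 10% sample: draw 10% from each target class separately,
# so the positive rate in the sample matches the full data
sample = data.groupby("target").sample(frac=0.10, random_state=7)

print(f"rows: {len(data):,} -> {len(sample):,}")
print(f"positive rate, full data: {data['target'].mean():.4f}")
print(f"positive rate, sample:    {sample['target'].mean():.4f}")
```

One common pattern is to compare candidate models on the sample and then refit only the chosen one on the full dataset.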
Imagine the impact we could have if the old guard and the up-and-coming talent worked together and leveraged their combined knowledge. What fantastic results, insights and benefits could we deliver if we could just get past the ‘intergenerational’ conflict?