Data Science: From a Data Scientist’s Perspective
What I’ve Learned Doing Data Science and Analytics at 8 Different Companies and 4 Jobs in 6 Years
Over the past 6 years, I’ve done Data Science and Analytics projects at companies like Adobe, USAA Bank, Nu Skin, Purple Mattress, Franklin Sports, and others. I’ve also held 4 different jobs in analytics: one with a mid-size IT consulting firm, one with a major corporation, one with a startup, and one with an e-commerce company.
I started my Data career right around the time ‘Data Scientist’ was named “The Sexiest Job of the 21st Century.” Since then, I’ve learned how different companies structure, engage in, and execute data projects. I’ve also interviewed with 9 different companies for Data Scientist and other analytical roles, which gave me insight into how companies structure their data teams and how and whom they hire to fill their positions. Additionally, I’ve gained my analytical expertise entirely through mentorship, self-study, MOOCs, and on-the-job experience.
If the above isn’t unique enough, my formal degrees are in Latin American (BA) and International Studies (MA), with little to no formal technical training.
Given my unique perspective, I’d like to share the most important things I’ve learned about Data Science and Analytics:
Corporate Data Science and Analytics teams only exist to solve business problems.
This seems like it should be self-explanatory, but somehow, it’s not. I can’t tell you how often I’ve seen data projects fail because somewhere along the way the data team lost sight of the rationale for its own existence. Like it or not, data teams are a support function designed to tackle legitimate business problems — i.e. problems that will either generate revenue for the company or save the company money, full stop.
I once had a Data Scientist tell me that he spent 3 full days working on a new feature for a predictive model, only to have the business he was supporting tell him that the feature was unnecessary and the existing model was more than sufficient for their needs. Technical DS and DE types love to tinker and go heads-down into code. It’s satisfying to perfect a predictive model and eke out that last 2% or 5% of accuracy. Unfortunately, going from an 80% AUC to 85% probably takes as much time as it took to get to 80% in the first place. Your value as a data professional is predicated on the dollars and cents that your models, pipelines, or data products save or generate, and nothing more. How many dollars were lost so that the Data Scientist could spend 3 days tinkering with a new feature? Now, I’m not saying it’s not important for a Data Scientist to experiment (in fact, it’s crucial to what good Data Scientists do), but keeping your eye on providing ROI is critical. Develop the ability to sacrifice complexity and unneeded optimization for the sake of productivity and utility. You’ll find you get more done and provide more value.
There are several different kinds of ‘Data Scientists.’
Data Scientist is both the sexiest job of the 21st century and the most convoluted. Even though they think they do, no two companies want to hire the same Data Scientist. As I explained in an earlier article, Data Science is a broad field, not a job title — with a tri-dimensional set of skills. I’m tired of the argument over what is or isn’t a ‘real’ Data Scientist. This argument is an HR problem and doesn’t apply to what companies actually need. The combination of skillsets to execute against all dimensions of the Data Science continuum is not only incredibly rare but arguably unnecessary in a single person. The truth is what most organizations really need is someone who can pull together an array of data sources, create some simple models, and implement an automation. This set of skills doesn’t require a Ph.D. or an advanced technical degree, but can still provide incredible value to many companies. That being said, there’s certainly an important place for highly specialized, highly educated statisticians or researchers — but this need is created by the unique challenges of each company, not as a blanket requirement for the role of ‘Data Scientist.’
Data Engineering is more important than Data Science.
There’s a much greater need for the ability to stitch together and organize disparate datasets from sources that weren’t designed to talk to each other than there is for the ability to develop and tune predictive models. Unless the company has incredibly well-defined challenges, with finite rulesets and business scenarios, the need for complex predictive models is going to be limited. Just starting out in Data Science and want to get a leg up on the competition? Learn the skills of a Data Engineer first, then figure out modeling and prediction. Not only will you be more valuable to almost any company that would hire you, but you’ll also create better models than your colleagues if and when you decide to go down the prediction path. Over the long haul, advanced SQL, web scraping, API development, and data cleaning skills will net you better gains than predictive modeling and tuning.
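To make the “stitching” work a little more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is hypothetical and invented for illustration: the two sources (a CRM export and an order system), their column names, the sample rows, and the helper names normalize_email and total_spend_by_customer. The only point is that the keys have to be reconciled before a join is even possible.

```python
import sqlite3

# Hypothetical source 1: a CRM export keyed by a mixed-case, padded email.
crm_rows = [
    ("  Ana@Example.com ", "Ana"),
    ("bo@example.com", "Bo"),
]

# Hypothetical source 2: an order system keyed by a lowercase email.
order_rows = [
    ("ana@example.com", 120.0),
    ("ana@example.com", 30.0),
    ("cy@example.com", 75.0),  # no matching CRM record
]


def normalize_email(raw: str) -> str:
    """Trim whitespace and lowercase so both sources share one key format."""
    return raw.strip().lower()


def total_spend_by_customer(crm, orders):
    """Join the two sources on the cleaned email and sum order amounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crm (email TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (email TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO crm VALUES (?, ?)",
        [(normalize_email(e), n) for e, n in crm],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(normalize_email(e), a) for e, a in orders],
    )
    # LEFT JOIN keeps CRM customers with no orders; COALESCE turns NULL into 0.0.
    rows = conn.execute(
        """
        SELECT crm.name, COALESCE(SUM(orders.amount), 0.0) AS total
        FROM crm
        LEFT JOIN orders ON orders.email = crm.email
        GROUP BY crm.email, crm.name
        ORDER BY crm.name
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, total in total_spend_by_customer(crm_rows, order_rows):
        print(name, total)
```

Without the normalization step, Ana’s orders would silently fail to match her CRM record. Quiet data problems like this are the kind that tend to undermine a model long before tuning matters.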
Data Science leaders tend to hire people like themselves.
Many Data Science leaders (and leaders in general) hold fast to the idea that to solve complex challenges, they should hire the most specialized individuals they can (in many cases, hires whose experience is as close to their own as possible, without being more accomplished). In the case of Data Science, the idea usually goes: the more highly credentialed Data Scientists I hire, the more complex data challenges I’ll be able to solve. Unfortunately, nothing could be further from the truth.
One iteration of this idea is called ‘Local Search,’ that is, using as many specialists as possible from a single domain and trying solutions that have worked before to solve your problem. While this idea feels correct, it’s missing critical ‘outside-in’ thinking: the ability to connect experiences and ideas from far outside one’s focused training and apply them to the problem at hand. The book Range, by David Epstein, provides several examples of ‘outside-in’ or ‘lateral’ thinking. In one example, Alph Bingham, VP of research at Eli Lilly, argued to executives for posting to the public twenty-one company research challenges that had stumped Eli Lilly scientists. At first, executives declined the proposal, noting that “if the most highly educated, highly specialized, well-resourced chemists in the world were stuck on technical problems, why would anyone else be able to help?” (Epstein, Range, p.173). Eventually, they agreed on the basis that it couldn’t hurt. The results were astounding: more than 1/3 of the challenges were completely solved, including one by an attorney with absolutely no scientific experience, whose knowledge came from working on chemical patents.
To build a team that solves truly complex, important problems, Data Science leaders need to hire an array of individuals from diverse backgrounds and expertise. They should resist the urge to build their teams from the same backgrounds or even the same technical abilities. The number of PhDs and credentials shouldn’t outweigh the diversity of experiences and accomplishments of your team.