What is Data Science
So you want to be a “data scientist”?
There is no widely accepted definition of who a data scientist is.
- Several books now attempt to define what data science is and who a data scientist is.
- A data scientist is likely to be an individual with multi-disciplinary training in computer science, business, economics, and statistics, armed with the domain knowledge relevant to the question at hand. The potential of the field is enormous: just a few well-trained data scientists armed with big data can transform organizations and societies. In the narrower domain of business life, the role of the data scientist is to generate applicable business intelligence.
Data science is transforming business. Companies are using medical data and claims data to offer incentivized health programs to employees. Caesars Entertainment Corp. analyzed data for 65,000 employees and found substantial cost savings. Zynga Inc., famous for its game Farmville, accumulates 25 terabytes of data every day and analyzes it to make choices about new game features. UPS installed sensors to collect data on the speed and location of its vans, which, combined with GPS information, reduced fuel usage in 2011 by 8.4 million gallons and shaved 85 million miles off its routes. [5] McKinsey argues that a successful data analytics plan contains three elements: interlinked data inputs, analytics models, and decision-support tools. [6] In a seminal paper, Halevy, Norvig and Pereira (2009) argue that even simple theories and models, with big data, have the potential to do better than complex models with less data.
In a recent talk, [7] well-regarded data scientist Hilary Mason emphasized that the creation of “data products” requires three components: data (of course), technical expertise (machine learning), and people and process (talent). Google Maps is a great example of a data product that epitomizes all three of these components. She mentioned three skills that good data scientists need to cultivate: (a) math and stats, (b) coding, and (c) communication. I would add that preceding all these is the ability to ask relevant questions, the answers to which unlock value for companies, consumers, and society. Everything in data analytics begins with a clear problem statement and needs to be judged with clear metrics.
Being a data scientist is inherently interdisciplinary. Good questions come from many disciplines, and the best answers are likely to come from people who are interested in multiple fields, or at least from teams that combine varied skill sets. Josh Wills of Cloudera stated it well: “A data scientist is a person who is better at statistics than any software engineer and better at software engineering than any statistician.” Complementing data scientists are business analytics people, who are more familiar with business models and paradigms and can ask good questions of the data.
References: