Data Science Ingredients Explained with Drew Conway's Diagram
By: Nunung Nurul Qomariyah, Ph.D
there is a point where there are just NOT enough brain cells on the planet to even look or even glance at the data (Yann LeCun, Director of AI Research at Facebook)
Nowadays, data is everywhere. Look around: websites, e-commerce, financial transactions, IoT devices, online trading, social networks, smart watches, and so on. Every single day, people generate data. Yann LeCun, Director of AI Research at Facebook, said that there will come a day when we, as humans, can no longer process all of that data with our brains. That is why we need help from computers.
This huge amount of data can no longer be managed by a standard relational database system; it needs something bigger and more powerful. This is where BIG DATA technology comes in: it was invented to handle huge volumes of data that come in different formats, structured or unstructured. This is the good news. Now that we have the technology to handle this, then what?
Given a large mass of data, we can by judicious selection construct perfectly plausible unassailable theories—all of which, some of which, or none of which may be right. (Paul Srere)
We actually have another problem. Anyone can construct a theory from that data: it can be true and valuable, or it can be false and worthless. So how can people take advantage of this situation, especially business people who are willing to spend money to get valuable information?
Here comes DATA SCIENCE, a multidisciplinary field of study whose goal is to address the challenges of big data. According to [1], data science is a concept that unifies statistics, data analysis, machine learning, domain knowledge and their related methods in order to “understand and analyze actual phenomena” with data. What are the ingredients of DATA SCIENCE according to Drew Conway? Please see the image below:
Data science skills are composed of three big areas: Computer Science, Math and Statistics, and Business/Domain Expertise. In Computer Science, you need coding skills; a little bit of hacking skill also helps in finding new data sources. You also need machine learning skills. In Math and Statistics, you need analytical skills: dealing with probability, analyzing data, diagnosing problems and selecting a proper procedure for prediction. In Business/Domain Expertise, you need to understand the problem domain in order to achieve the goal. This also ensures that the result will be impactful and well implemented.
References:
[1] Hayashi, Chikio (1 January 1998). “What is Data Science? Fundamental Concepts and a Heuristic Example”. In Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer Japan. pp. 40–51.