My First Blog Post

Another Blog on Data Scientist vs. Statistician

There are countless articles on this topic, so it’s time for me to add my thoughts to the conversation. A Data Scientist is just an extension of a Statistician and an Operations Research Analyst. The latter two titles have been around for decades. The data science position has evolved because of technology: faster computers, cheaper and more efficient data warehousing, and the IoT (Internet of Things). Society in general is more willing to share its data, and organizations can store larger amounts of it. As a result, data scientists are in high demand. The main function of a Data Scientist is to use an organization’s data to help senior leadership decide on the organization’s next steps.

To become a successful Data Scientist, you must have these skills:

  • Understanding of Statistics and Probability
  • Knowledge of a programming language such as Python (including PySpark, its interface to Apache Spark) or R
  • Knowledge of SQL (Structured Query Language)
  • Understanding of Data Mining (Classification, Clustering, etc.) and Machine Learning techniques
  • Ability to create visualizations to “tell a story”
  • Ability to communicate and use soft skills
  • Being a lifelong learner

The duties of a Data Scientist include, but are not limited to, the following:

  • Understanding the problem/question the organization wishes to address
  • Performing ETL (Extract, Transform, and Load): pulling data from various sources to create the needed dataset
  • Working with big-data platforms such as Hadoop and data warehouses built on them, such as Hive
  • Cleaning the data: deciding how to handle missing values, outliers, etc.
  • Exploring the data to gain insight
  • Creating models, then training and evaluating them on the dataset
  • Working with people inside and outside of the organization
  • Communicating the results in a manner that people at all levels can understand
  • As a lead or senior data scientist, reviewing work and providing feedback, support, and training to less experienced team members
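
To make the middle of that list concrete, here is a minimal sketch of the cleaning, modeling, and evaluation steps using only the Python standard library. Everything in it is an assumption made for illustration: the toy records, the column meanings, the 1.5 × IQR outlier rule, and the straight-line model are my own picks, not a prescribed workflow. A real project would more likely use libraries such as pandas and scikit-learn.

```python
import statistics

# Hypothetical toy records: (hours_of_usage, monthly_spend).
# None marks a missing value; 400.0 is a deliberately planted outlier.
raw = [(1.0, 12.0), (2.0, 19.5), (3.0, None), (4.0, 41.0), (5.0, 50.5),
       (6.0, 61.0), (7.0, 400.0), (8.0, 79.0), (9.0, 91.5), (10.0, 99.0)]


def clean(rows):
    """Drop rows with a missing target, then drop targets outside 1.5 * IQR."""
    complete = [(x, y) for x, y in rows if y is not None]
    q1, _, q3 = statistics.quantiles([y for _, y in complete], n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [(x, y) for x, y in complete if low <= y <= high]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def mean_abs_error(rows, slope, intercept):
    """Average absolute gap between predictions and actual values."""
    return statistics.fmean([abs(y - (slope * x + intercept)) for x, y in rows])


cleaned = clean(raw)
# Simple deterministic split: hold out every fourth cleaned row for testing.
train = [row for i, row in enumerate(cleaned) if i % 4 != 3]
test = [row for i, row in enumerate(cleaned) if i % 4 == 3]

slope, intercept = fit_line(train)
print(f"rows kept after cleaning: {len(cleaned)} of {len(raw)}")
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
print(f"mean absolute error on held-out rows: {mean_abs_error(test, slope, intercept):.2f}")
```

Dropping the row with the missing value is only one option; filling it in with a median or a model-based estimate is another, and which one is right depends on why the value is missing.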

The duties of a statistician are very similar to the ones listed above for the data scientist. A statistician also needs to clean the data and explore it to draw inferences about the relationship between the sample and the population. Of course, understanding the problem/question, communicating at all levels, and sharing knowledge are skills everyone in either role needs.
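
As a small illustration of what inference from a sample to a population can look like, the sketch below builds a confidence interval for a population mean. It assumes a simulated population of made-up heights, a simple random sample of 100, and the normal approximation, which is reasonable only for fairly large samples. The function name `mean_confidence_interval` is hypothetical.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err


rng = random.Random(42)  # fixed seed so the run is repeatable
# Simulated population of made-up heights in cm (an assumption, not real data).
population = [rng.gauss(170.0, 10.0) for _ in range(10_000)]
sample = rng.sample(population, 100)

low, high = mean_confidence_interval(sample)
print(f"sample mean: {statistics.fmean(sample):.1f}")
print(f"95% interval for the population mean: ({low:.1f}, {high:.1f})")
print(f"actual population mean: {statistics.fmean(population):.1f}")
```

In practice the population mean is unknown, which is the whole point: the interval is the statistician's statement about it, based only on the sample.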

As for differences, the statistician may need to design an experiment to obtain the required data, whereas data scientists tend to pull existing in-house data rather than use results from experiments. The programming demands on a statistician are lighter: R and SQL are usually enough, and Python is useful but not necessary. The data scientist, on the other hand, needs to stay aware of the latest tools for extracting and working with large volumes of data. Lastly, the statistician is expected to have the deeper knowledge of statistics and probability.

I view myself as an aspiring data scientist who will have a strong background in statistics. Without statistics, there would be no data scientist or operations research analyst. Honestly, even once data extraction, cleaning, and modeling are automated, you will still need a human, a statistician, to help with the final decision.

