Meet data heroes all in one place
Posted by InterVenture on June 28, 2021
There is nothing worse than asking a data engineer to build you some fancy report or dashboard. If you are lucky, he will just ignore you; otherwise, he might slap you right in the face. So, be careful. Instead of living in fear around data heroes, make some effort to learn the key differences between the positions born and raised thanks to the sexiest job of the century. Yes, we are talking about Data Science.
Nowadays, there are plenty of positions and roles that are somehow connected to data. With data engineers, business analysts, data specialists, research scientists and so on, it is hard to tell where one role's scope ends and the next begins. There is a high degree of overlap, but at their core all data positions derive from three major ones:
- Data Engineer
- Data Analyst
- Data Scientist
The purpose of this blog is to introduce these three professions to our audience. If you have a lot of data and don't know how to use it to have some fun or even make money, read carefully till the end and we assure you that you will at least learn who is the right person to ask for a little help.
But wait… We are missing something at the beginning of this journey… YES, here we go! First, let's understand what Data Science really is!
According to Wikipedia, Data Science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data and apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
(Source: https://en.wikipedia.org/wiki/Data_science)
Hmm, what a convenient explanation, with so many unnecessary terms and words. Put simply, data science is a broad field of study pertaining to data systems and processes, aimed at maintaining data sets and deriving meaning out of them.
Let's translate the earlier definition into an image:
To become a part of the famous data science world, you must master at least one of those skills and be familiar and comfortable with the others. First, machine learning (ML) and data science (DS) are fascinating fields, mostly because they sit at the crossroads of computer science, mathematics and business understanding.
This means that there is far more room for personal growth. Another important factor is that the field is moving at lightning speed. Not a day goes by without hearing about the latest breakthrough, the newest shiny deep learning architecture, the great new book that every DS practitioner should read, etc. Then there are all the other reasons: you can make good money, you can make a big impact in your company, AI is the future, and countless more. It is important to emphasize that the skills required for the positions related to Data Science overlap a lot.
Don't get confused when reading blogs where the terms Data Science, Artificial Intelligence (AI) and Machine Learning are lumped into the same domain. They are connected to each other, but each has its own specific applications and meaning.
Describing this relationship requires a whole new approach, so we will save it as a topic for a future blog post.
Finally, it’s time for our heroes to shine.
The first star of our story is the Data Engineer.
He is supposed to have the following responsibilities:
- Development, construction, and maintenance of data architectures.
- Conducting testing on large scale data platforms.
- Handling error logs and building robust data pipelines.
- Handling raw and unstructured data.
- Providing recommendations for improving data quality and efficiency.
- Ensuring and supporting the data architecture utilised by data scientists and analysts.
- Development of data processes for data modelling, mining, and data production.
Let’s imagine that you have a couple of devices, each with thousands of notes, images, files etc.
And you want to make your personal scrapbook. You have to collect all your data and put it together on the same device, then find a common tool to integrate all the data in one place (let’s say you choose the famous Word, where you can add text, links and images all in one place) and start making your lifetime scrapbook. Congratulations, you have just become a Data Engineer.
Look, it doesn’t seem so hard: just find all the data sources, integrate them in the same place and then let others use the result wisely. And, a friendly piece of advice, try to automate it somehow; you don’t want to spend a couple of hours every day on the same repetitive task. So, make some effort and build a good infrastructure for it. Make sure that you have the necessary programming skills and a good grasp of critical thinking. Yes, you need to think outside the box.
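To make the scrapbook idea a bit more concrete, here is a minimal sketch of an extract, transform and load (ETL) pipeline in Python. Everything in it is an assumption made up for illustration: the two sample sources (`PHONE_NOTES_CSV`, `LAPTOP_NOTES_JSON`), the `scrapbook` table and the function names. A real pipeline would read from actual files, APIs or databases and would be scheduled to run automatically.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for two "devices" with different formats.
PHONE_NOTES_CSV = "title,body\nGroceries,Milk and eggs\nIdeas,Start a scrapbook\n"
LAPTOP_NOTES_JSON = '[{"name": "Trip", "text": "Book tickets"}]'


def extract_csv(raw):
    """Extract: read rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def extract_json(raw):
    """Extract: read records from a JSON source."""
    return json.loads(raw)


def transform(phone_rows, laptop_rows):
    """Transform: map both sources onto one common (source, title, body) shape."""
    unified = [("phone", r["title"], r["body"]) for r in phone_rows]
    unified += [("laptop", r["name"], r["text"]) for r in laptop_rows]
    return unified


def load(rows, conn):
    """Load: store the unified rows in a single table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scrapbook (source TEXT, title TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO scrapbook VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run all three steps and report how many rows were loaded."""
    rows = transform(extract_csv(PHONE_NOTES_CSV), extract_json(LAPTOP_NOTES_JSON))
    load(rows, conn)
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
loaded = run_pipeline(conn)
print(loaded, "notes loaded")
```

Keeping extract, transform and load as separate functions is one common way to make each step easy to test and to swap out when a new data source appears.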
The following are the key skills required to become a data engineer:
- Knowledge of programming languages like Python and Java.
- Solid understanding of operating systems.
- Ability to develop scalable ETL packages.
- Proficiency in SQL as well as NoSQL technologies like Cassandra and MongoDB.
- Knowledge of data warehouse and big data technologies like Hadoop, Hive, Pig, and Spark.
- Creative, out-of-the-box thinking.
Great, we’ve introduced the Data Engineer, who took care of making data available for further manipulation. So what is next?
The logical next step would be a studious data analysis. Why?
Because it’s not rare for data engineers to lose some data in the process of extracting, transforming and loading it. Besides that, data is often of low quality, and a data engineer isn’t always skilled at exploring it, because that requires some statistical and analytical skills, and very often the data engineer just doesn’t have enough time for it.
Now, it’s time for our second star to shine: the Data Analyst.
Okay, Data Analyst, here we go!
The following are the main responsibilities of a Data Analyst:
- Analysing the data through descriptive statistics.
- Using database query languages to retrieve and manipulate information.
- Performing data filtering, cleaning and early-stage transformation.
- Communicating results with the team using data visualization.
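As a rough illustration of the cleaning and descriptive statistics steps listed above, here is a small Python sketch that uses only the standard library. The order values are invented for the example, and the rule "drop missing and negative amounts" is just one assumed cleaning rule; real rules depend on the business context.

```python
import statistics

# Hypothetical raw order values; None and negative entries simulate bad records.
raw_order_values = [120.0, 80.0, None, 95.5, -1.0, 110.0, 80.0]

# Cleaning and filtering: drop missing values and impossible (negative) amounts.
clean = [v for v in raw_order_values if v is not None and v >= 0]

# Descriptive statistics on the cleaned data.
summary = {
    "count": len(clean),
    "mean": statistics.mean(clean),
    "median": statistics.median(clean),
    "stdev": statistics.stdev(clean),  # sample standard deviation
    "dropped": len(raw_order_values) - len(clean),
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

Reporting how many records were dropped next to the statistics is a simple way to keep data quality visible to whoever reads the results.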
A Data Analyst is needed to understand data quality and to propose ways and procedures to increase it in the engineering process. Analysts are also key players in decision making thanks to their skill at translating data tables into common language, represented by all those fancy reports and dashboards. Hardly anyone understands data until it is transformed into a clear visual (graphical) representation. To be an efficient and effective data analyst, you must have a good grasp of the business and strong communication skills, because you have to communicate data clearly to third parties, usually stakeholders, and make sure they get all the insights necessary for further business decisions. You don’t want to let down people who give you such freedom and their sincere trust. Be aware of how much responsibility that involves, and respect it.
So here comes the hero, the analyst, to filter, direct, and translate your data into actionable insight. To become a Data Analyst, you must possess the following skills:
- Strong mathematical aptitude.
- Proficiency with Excel, SQL and at least one visualization tool such as Google Data Studio.
- A problem-solving attitude.
- Proficiency in communicating results to the team.
- A strong set of analytical skills.
Ladies and Gentlemen, there is still our last star, and maybe the brightest one, which is why this person is widely known as a unicorn in the world of data science. With great pleasure, we introduce the Data Scientist role.
He does everything a data analyst does but has a little bit more to offer on his plate. He is (sorry analysts, don’t be offended) a kind of upgraded version of the Data Analyst.
A data scientist is supposed to know a little of everything, and this guy is often jokingly called a Jack of all trades, master of none. But we have to disagree with this statement. We truly believe that a data scientist is someone who can predict the future quite accurately. In business matters, of course, and for the top management he is a wizard. Can you imagine having that superb title? Fancy, isn’t it?
Long story short, this fellow is supposed to combine statistical, mathematical, programming and communication skills. Critical thinking and a strong sense of the business domain are also strongly required. Don’t get confused: all of these skills are must-haves for data analysts too, but they are not as imperative as for scientists.
A Data Scientist is required to perform the following responsibilities:
- Performing data pre-processing that involves data transformation as well as data cleaning.
- Using various machine learning tools to forecast and classify patterns in the data.
- Increasing the performance and accuracy of machine learning algorithms through adjustment and further performance optimization.
- Understanding the requirements of the company and formulating questions that need to be addressed.
- Using robust storytelling tools to communicate results with the team members.
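To give a feeling for what "classifying patterns" and "adjusting an algorithm to increase accuracy" can look like, here is a toy sketch in plain Python. It uses a k-nearest-neighbours classifier, chosen here only because it fits in a few lines; the customer data, the labels and the candidate values of `k` are all made up for illustration, and real projects would typically rely on a dedicated machine learning library and far more data.

```python
import math
from collections import Counter

# Hypothetical toy data: (monthly_visits, avg_basket_value) -> customer label.
# The last training point is a deliberately mislabeled, noisy record.
train = [
    ((1.0, 1.0), "occasional"), ((1.5, 2.0), "occasional"), ((2.0, 1.5), "occasional"),
    ((6.0, 6.5), "loyal"), ((7.0, 6.0), "loyal"), ((6.5, 7.5), "loyal"),
    ((2.1, 2.1), "loyal"),
]
validation = [
    ((1.2, 1.8), "occasional"),
    ((6.8, 6.9), "loyal"),
    ((2.2, 2.0), "occasional"),
]


def predict(point, k):
    """Classify a point by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(k):
    """Share of validation points classified correctly for a given k."""
    hits = sum(predict(x, k) == y for x, y in validation)
    return hits / len(validation)


# Simple tuning: try a few values of k and keep the most accurate one.
scores = {k: accuracy(k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

With a single neighbour the noisy record misleads the model on one validation point, while a larger `k` smooths it out. That is the kind of adjustment the tuning responsibility above refers to, just on a very small scale.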
To become this lovely fellow from the picture above, you only need to master these key skills:
- Proficiency in math and statistics.
- Ability to handle structured and unstructured information.
- In-depth knowledge of tools like R, Python and SAS.
- Competence in various machine learning algorithms.
- Knowledge of SQL and NoSQL.
- Familiarity with Big Data tools such as Spark, Hadoop, Kafka etc.
Yes, after all those introductions, we come to the same conclusion you probably have: you could hire just a data scientist and he will do every job around data. But remember to give him at least three salaries instead of one, and be sure that your scientist will never become the magic wizard of the company, because he won’t have time to shine with all the tasks that should be handled by other data positions.
Finally, let’s put it all together as an overview of the skill sets and responsibilities required for each position described in this blog:
We hope you’ve enjoyed reading this blog and finally understand the difference in scope between the roles of the data geeks (heroes) of our story.
This blog post is written by our colleague Tamara Ćirić, Data Analyst at Flaschenpost.