From Intern to Data Scientist: Getting Started in a Data-Driven Career
The OptimalBI team have been in the trenches for over 10 years developing our own approach to data analysis. We've spent years nurturing interns and graduates, transforming them into skilled data analysts, reporting analysts, data engineers, and data scientists. This blog post repurposes our getting-started resources, which are designed to guide someone completely new to data roles as they prepare for a new career.
Data can be a complex beast, with its own language of databases, queries and acronyms such as SQL (Structured Query Language). To navigate this terminology jungle, we equip our team with a personal glossary. As they encounter new concepts, they can add them to their glossary and revisit anything unclear for deeper understanding.
This blog post takes the same approach. Here I’ll break down the essentials of data, making it accessible for anyone who wants to embark on a data-driven journey.
This is just part one. Future posts will cover specific areas of specialisation such as ethics, governance, machine learning and visualisation.
Data Literacy Reading
Building a strong foundation in data literacy is essential before diving into any data-related career path. It equips you with the core skills to understand, work with, and communicate data effectively.
Here's an initial reading list to start your journey:
Wikipedia - Data Analysis - https://en.wikipedia.org/wiki/Data_analysis
Understanding Data Roles - https://www.datacaptains.com/blog/guide-to-data-roles
Data Terms you should learn more about - https://www.osmos.io/blog/data-terms-glossary-list
TED Talks - https://www.ted.com/topics/data
The Data Literacy Project - https://thedataliteracyproject.org/
Data Literacy: Getting Started
In short, data literacy empowers you to understand data, transform it into insights, and communicate those findings clearly. This article provides a good overview for further reading on data literacy.
For any data role you will need to understand all of these concepts:
Reading Data - Reading data involves comprehending its meaning and how it reflects real-world aspects. This includes recognising whether data is raw, processed, or visualised.
Working with Data - Working with data encompasses creating, collecting, cleaning, and managing it. Different tools are used depending on the desired outcome and organisational needs.
Analysing Data - Data analysis involves using technical, mathematical, and statistical skills to extract insights from raw data. This analysis is crucial for making informed decisions and driving progress through data-driven insights and potentially augmented intelligence.
Using Data to Communicate - Communicating effectively with data means using data and statistics to inform and persuade an audience. This builds on Business Intelligence and potentially leverages Augmented Intelligence, but the focus is on clear communication rather than on winning an argument.
Critical Thinking with Data - Developing a critical eye for data allows you to assess its quality, identify potential biases, and draw well-supported conclusions.
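To make a few of these ideas concrete, here is a minimal Python sketch that cleans a handful of raw values, analyses them, and shows why a critical eye matters. The order values and the to_number helper are made up for illustration; they are not taken from any real dataset or tool.

```python
from statistics import mean, median

# Hypothetical raw order values, as they might arrive from a spreadsheet export.
raw_order_values = ["120.50", " 80 ", "", "95.25", "n/a", "4000"]


def to_number(text):
    """Return the value as a float, or None if it is not a usable number."""
    try:
        return float(text.strip())
    except ValueError:
        return None


# Working with data: drop the entries that cannot be read as numbers.
cleaned = [value for value in map(to_number, raw_order_values) if value is not None]

# Analysing data: two different ways to describe a "typical" order.
average = mean(cleaned)
middle = median(cleaned)

print("Cleaned values:", cleaned)
print("Mean:", average)
print("Median:", middle)
```

The single very large order pulls the mean far above the median. Asking which of the two better describes a typical order is exactly the kind of question critical thinking with data is about.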
In our experience, the best free place to start is Qlik’s Data Literacy Programme, which steps you through lessons from Understanding Data to Data Informed Decision Making to Correlation to Causation.
There are other data literacy resources to work through, for example DataCamp's introduction to data literacy course or the variety of courses offered by Dataquest.
Databases
Once you have a handle on the concepts of data and data literacy, it’s a good time to learn about databases - the engines of the data world. Data can be stored in many types of database. This resource from Prisma provides a great overview of the types of databases (relational, NoSQL, multi-model and so on), so read this first.
Relational database management systems (RDBMS) from vendors like Oracle and Microsoft are still the prominent data stores in the market, but this is changing rapidly. Even so, for your career in data it’s a good idea to learn the basic concepts such as:
Data Organisation:
Databases: Think of databases as digital filing cabinets that store information in a structured and organised way. They allow you to efficiently store, manage, and retrieve large amounts of data.
Tables: Databases are made up of tables, similar to spreadsheets. Each table represents a specific category of information, like "Customers" or "Orders."
Rows & Columns: Tables contain rows and columns. Each row represents a single record, like a customer or order. Each column represents a specific attribute of that record, like "Customer Name" or "Order Date."
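If you would like to see tables, rows and columns in action, the sketch below uses SQLite, a small database engine that ships with Python. The customers table and the sample names are assumptions made up for this example.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A table is a named set of columns.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, customer_name TEXT, city TEXT)"
)

# Each tuple becomes one row (one record) in the table.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Aroha", "Wellington"), (2, "Ben", "Auckland")],
)

cursor = conn.execute("SELECT * FROM customers ORDER BY customer_id")
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)  # the columns (attributes)
for row in rows:     # the rows (records)
    print(row)

conn.close()
```

Each printed tuple is one record, and the position of a value in the tuple matches the position of its column name.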
Data Types:
Databases can store different kinds of data, and each data type has specific formatting rules. Here are some common ones:
Text: Names, addresses, descriptions.
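Numbers: Quantities, prices.
Dates and Times: Order dates, timestamps. Most databases treat these as their own data type rather than as plain numbers.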
Boolean: True or False values.
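As a small illustration of data types, this Python sketch declares a type for each column and then asks SQLite how each value was actually stored. The orders table and its values are hypothetical. Type handling also differs between database products: SQLite, used here because it ships with Python, has no separate boolean type and stores True as the integer 1, while other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column is declared with the kind of data it is expected to hold.
conn.execute(
    "CREATE TABLE orders ("
    "product TEXT, quantity INTEGER, price REAL, is_paid BOOLEAN)"
)
conn.execute(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    ("Notebook", 3, 4.5, True),
)

# typeof() is SQLite's built-in function for reporting how a value is stored.
storage_types = conn.execute(
    "SELECT typeof(product), typeof(quantity), typeof(price), typeof(is_paid) "
    "FROM orders"
).fetchone()

# Because quantity and price are stored as numbers, the database can do maths on them.
order_total = conn.execute("SELECT quantity * price FROM orders").fetchone()[0]

print(storage_types)
print(order_total)

conn.close()
```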
Data Manipulation:
SQL: Structured Query Language is the most common language used to interact with databases. You can use SQL to insert, update, delete, and retrieve data from tables.
CRUD Operations: These represent the fundamental actions you can perform on data: Create, Read, Update, and Delete.
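The four CRUD operations map directly onto four SQL statements: INSERT, SELECT, UPDATE and DELETE. Here is a minimal sketch that runs them from Python against an in-memory SQLite database; the table and customer names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT)"
)

read_all = "SELECT customer_id, customer_name FROM customers ORDER BY customer_id"

# Create: INSERT adds new rows.
conn.execute("INSERT INTO customers VALUES (?, ?)", (1, "Aroha"))
conn.execute("INSERT INTO customers VALUES (?, ?)", (2, "Ben"))

# Read: SELECT retrieves rows.
after_create = conn.execute(read_all).fetchall()

# Update: UPDATE changes values in existing rows.
conn.execute(
    "UPDATE customers SET customer_name = ? WHERE customer_id = ?",
    ("Benjamin", 2),
)

# Delete: DELETE removes rows.
conn.execute("DELETE FROM customers WHERE customer_id = ?", (1,))

after_changes = conn.execute(read_all).fetchall()

print(after_create)
print(after_changes)

conn.close()
```

The question marks are placeholders. Passing values separately from the SQL text, rather than pasting them into the string, is a good habit to pick up early.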
Relationships between Tables:
Often, data in different tables needs to be linked together. Databases allow you to define relationships between tables based on shared fields, like connecting "Customers" to "Orders" using a customer ID.
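To sketch what linking tables looks like in practice, the example below connects "Customers" to "Orders" on a shared customer ID using a SQL JOIN. The table layouts, names and dates are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)"
)

conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Aroha"), (2, "Ben")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(101, 1, "2024-03-01"), (102, 1, "2024-03-09"), (103, 2, "2024-03-10")],
)

# JOIN matches each order to the customer with the same customer_id.
orders_with_names = conn.execute(
    """
    SELECT o.order_id, c.customer_name, o.order_date
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
    """
).fetchall()

for order in orders_with_names:
    print(order)

conn.close()
```

The customer name is stored once in the customers table, yet it appears next to every matching order in the result.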
Keys:
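Primary Key: Each table should have a column (or combination of columns) that uniquely identifies each record, called a primary key. Because no two records can share the same primary key value, the same record cannot be entered twice.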
Foreign Keys: These are used to link tables together. A foreign key in one table references the primary key of another, creating a relationship.
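Here is a rough sketch of how keys protect your data, again using SQLite from Python. The tables and the try_insert helper are hypothetical. One SQLite-specific detail: foreign key checks are switched off by default and have to be enabled with a PRAGMA statement, which is not something every database requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite only enforces foreign keys once this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers (customer_id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Aroha')")


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as error:
        return f"rejected: {error}"


# Primary key: a second customer with ID 1 is refused.
duplicate_customer = try_insert("INSERT INTO customers VALUES (1, 'Ben')")

# Foreign key: an order must point at a customer that exists.
valid_order = try_insert("INSERT INTO orders VALUES (101, 1)")
orphan_order = try_insert("INSERT INTO orders VALUES (102, 99)")

print(duplicate_customer)
print(valid_order)
print(orphan_order)

conn.close()
```

The database itself refuses the bad rows, so these rules hold no matter which program or person is inserting the data.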
Database Management System (DBMS):
This is the software that allows you to create, manage, and interact with a database. Popular examples include MySQL, Oracle, and Microsoft SQL Server.
You can learn more about Databases with some of these resources:
freeCodeCamp have an introductory course
You can learn SQL online with resources like SQLBolt, SQL Fiddle or the SQL Murder Mystery game. I found all of these on Reddit and haven’t tried them myself.
By understanding these core concepts, you'll have a good foundation for exploring databases and working with data effectively. Remember, these are just the building blocks. As you learn more, you can delve into more advanced topics like database design, optimization, and security.
What next?
If you master all of this and are looking for topics to tackle next, my suggestion would be to read up on these concepts so you understand the wide range of roles and functions that data careers can lead to:
Data Analysis
“Modern” databases like NoSQL, Graph, Columnar, In Memory
Visualisation
Data Warehousing and Business Intelligence
Data Science and Machine Learning
Big Data
Data Engineering
This may seem like I’m suggesting you start with very basic information, but data literacy is an essential skill in today's information-driven world. By understanding basic data concepts, databases, and the power of SQL, you can unlock new opportunities to analyse information, solve problems, and make data-driven decisions. Whether you're a student, professional, or simply curious about the world around you, this journey into data can empower you to extract meaning and insight from the vast ocean of information at your fingertips. So, keep exploring, keep learning, and keep leveraging the power of data!