Recommended Reads to Level Up Your Data Career
I’ve put together a categorized list of recommended articles to read, experiment with, and use to level up your career in data. Note that many of these are hosted on Medium.com, which limits the number of free stories you can view. To get around this, tweet the link and then open it from Twitter, or search for the title to see whether someone else has already tweeted it. Enjoy!
API Development
Big Data::Apache Spark
- 5 Spark Best Practices For Data Science Projects
- A modern guide to Spark RDDs
- Building a Big Data Pipeline With Airflow, Spark and Zeppelin
- Data Pipelines With Apache Airflow - Munish Goyal, The Startup (Jul 2020)
- Faster extract and load of ETL jobs in Apache Spark
- Five Ways to Perform Aggregation in Apache Spark
- Flattening Nested Data (JSON/XML) Using Apache-Spark
- Guide to Selection of Number of Partitions while reading Data Files in Apache Spark - The Startup
- How to process a DataFrame with billions of rows in seconds
- Mastering Query Plans in Spark 3.0 - David Vrba (Jul 2020)
- Ultimate PySpark Cheat Sheet
- What is Apache Spark? - Data Driven Investor
Business Intelligence
- 4 Ways to Deliver Analytics That Aren’t Dashboards or PowerPoint Decks
- Best Business Intelligence Tools 2020: Round One, Fight!
- Business Intelligence meets Data Engineering with Emerging Technologies
- You’ve Decided on a BI Tool, Now What?
Career Development / Data Strategy
- 25 Hot New Data Tools and What They DON’T Do
- 4 Trends That Will Disrupt Your Data & Analytics Strategy in 2020–2021
- 5 Tips for Kickstarting Your Data Career
- 5 Trends In Big Data And SQL To Be Excited About In 2020 - Better Programming
- 8+ Great Websites to Learn New Tech Skills During the Covid-19 Pandemic
- Data mesh (not a service mesh) - Jacek Chmiel
- How to be a Successful Chief Data Officer
- How To Think About Data
- The ROI of a Modern Data Strategy - Future Vision
- Wish your team paid more attention to data?
Cloud::AWS
Containers::Docker
Database Architecture
- 5 Database Scaling Solutions You Need to Know - Chris Staudinger, The Startup (Jul 2020)
- A Beginner’s Guide to Database Sharding - Level Up Coding
Data Engineering
- 150+ Concepts Heard in Data Engineering - Dardan Xhymshiti (Jul 2020)
- 5 Great Data Engineering Online Courses - Better Programming
- A Beginner’s Guide to Data Engineering — The Series Finale
- A Data Engineer’s Perspective On Data Democratization
- A Quick Overview of Outliers in Data Engineering - Rohan Gupta
- Build Your First Data Pipeline in just Ten Minutes
- Complete Data Engineer’s Vocabulary
- Data Engineer VS Data Scientist
- Data Engineer, Patterns & Architecture The future
- Data Engineering 101: Writing Your First Pipeline - Better Programming
- Data Engineering and Data Science collaboration processes
- Data Wrangling Is Bad - Pete Aven
- Data Wrangling: A Beginners Cheat Guide! - Python In Plain English
- Dream of Becoming a Big Data Engineer? Discover What Sets Us Apart From Software Engineers
- Introduction to Data Engineering
- Junior Data Engineer packer
- On the evolution of Data Engineering - Hacking Analytics
- Pre-process like a Pro — A Must-do list for Data-Engineers
- The brave new world of data engineering
- The Downfall of the Data Engineer - Maxime Beauchemin
- When Data Science Meets Data Engineering
Data Formatting / Structures
- Data Masking - Hugh Gallagher, Analytics Vidhya
- Use Binary Encoding Instead of JSON - Shilpi Gupta, Better Programming (Jun 2020)
Data / Delta Lakes
- Data Lake Analytics - macxima
- Data partitioning: good practices in the design of Data Lakes.
- Data Warehouse vs Data Lake | ETL vs ELT - sspaeti.com
- Do you really need a data lake?
- The 5 Data Consolidation Patterns — Data Lakes, Data Hubs, Data Virtualization/Data Federation, Data Warehouse, and Operational Data Stores
- What is and Why Delta Lake? How Change Data Capture (CDC) gets benefits from Delta Lake
DataOps
- Are you still not using Version Control for Data?
- DataOps: Building Trust in Data through Automated Testing
- How to do data quality with DataOps - Ryan Gross
- Top 5 database documentation tools for any team in 2020
Data Quality
- 7 Cs Fundamental Principles of Data Quality? - Pupuweb
- Data Quality Management and Tools
- The Six Dimensions of Data Quality — and how to deal with them
Data Warehousing
- Building a Data Warehouse: Basic Architectural principles
- Implementing a Data Lake or Data Warehouse Architecture for Business Intelligence?
- Use these open-source tools for Data Warehousing - sspaeti.com
ETL::Apache Airflow
- 3 Steps to Advanced Alerting on Airflow with Databand
- A Data Scientist’s Guide to Data Architecture
- Airflow : Zero to One - Analytics Vidhya
- Airflow, the easy way - Hacking Analytics
- Airflow: how and when to use it (Advanced)
- Apache Airflow and the Future of Data Engineering: A Q&A
- Reliably Upgrading Apache Airflow at Slack’s Scale - Several People Are Coding
- Understanding Apache Airflow’s key concepts - Dustin Stansbury
ETL / ELT
- AWS Glue: Amazon’s New ETL Tool. What is AWS Glue and do you need it? - Sean Knight (Jul 2020)
- ETL Versus ELT | Explained With Examples (Part 1) - Gary Cheung, developVenture (Jun 2020)
- Introducing Observable, self-documenting ELT - Saurabh Bhatnagar
- Python ETL vs. ETL Tools
IDEs / Code Editors
- How to Configure VS Code Like a Pro
- PyCharm vs VSCode. Is it time to change your IDE? - Sohaib Ahmad (Jun 2020)
Python
- 12 Python Tips and Tricks For Writing Better Code
- 3 techniques to make your Python code faster - Dhanesh Budhrani
- 5 Scenarios Where Beginners Usually Misuse Python - Dardan Xhymshiti (Jul 2020)
- 9 Skills That Separate Beginners From Intermediate Python Programmers
- Can you solve these 3 (seemingly) easy Python problems?
- Concurrency in Python
- Do You Know Python Has A Built-In Database?
- How to analyse 100s of GBs of data on your laptop with Python
- How to Manage Big Data With 5 Python Libraries
- Introduction to metaclasses in Python - Bakthavatchalam Gopalswamy, Analytics Vidhya
- Ten Python development skills.
- The Most Elegant Python Object-Oriented Programming - Christopher Tao (Jul 2020)
- Top 15 Python Packages You Must Try - Tech Explained
- Try TextHero: The Absolute Simplest way to Clean and Analyze Text in Pandas
- Understanding Data Structures in Python
- Why doesn’t Python have a main function?
SQL
- How to Build Advanced SQL - SeattleDataGuy, Better Programming (Jul 2020)
- Mastering SQL Queries - Analytics Vidhya
- The Many Faces of SQL
- The Many Flavours Of SQL
About Me
I'm a data leader working to advance data-driven cultures by wrangling disparate data sources and empowering end users to uncover key insights that tell a bigger story.