How to Build a Strong Data Science Portfolio

kanger.dev | Published in Dev Genius | Jul 19, 2022

Building a portfolio is a great learning opportunity and quite effective for showcasing your technical expertise. [Updated]

Data Science Portfolio
Image by Alex Padurariu on Unsplash

One of the most important steps to take when planning how to become a Data Scientist is deciding how you will show your skills, accomplishments and knowledge.

A professional portfolio is an important way for data scientists to build connections. To get started, assess the skill sets you have mastered (or are learning). Then build a portfolio around those skills that you can use to pursue contributions, internship opportunities, and jobs.

The requirements in data science job postings make it hard for any applicant to stand out. A portfolio, however, will strengthen your data science job hunt. Think of your public portfolio as the skeleton of your skills: by showcasing your data skills through it, you can impress employers and apply for a wide variety of entry-level data science jobs.

In this guide, we explain why building a portfolio through projects matters and share practical tips that will open doors for you.

The Importance of a Data Science Portfolio

Looking for an entry-level data science job can be a defeating experience: you need experience to land a job, but you also need a job to get experience. This can be confusing for beginners. Most data science roles ask for years of prior experience, making it tough to break into the field.

How do you get your foot in the door without any data science experience? Will a data science certification help if you don’t have the required experience? No.

There are many routes to an entry-level data science job, such as internships, advanced bootcamps, and master’s degrees, but one thing that has helped many people is creating a portfolio. To get a job in data science, you need to show expertise through real-world projects.

When you are building data science skills, go the extra mile and work on projects with public datasets to stand out. You can publish your projects on GitHub and write articles that explain your findings.

Data scientists are always curious to look at what other data scientists have done.

In simple words, the easier it is for people to find these projects, the easier it becomes for hiring managers to evaluate your skills.

Data Science Projects

One of the best ways to get started in data science is by working on projects, and there are many free resources online to help.

Many data science projects can seem hard at first, but as you learn the basic statistics for data science, you will be able to tackle a lot of tasks that improve your data science skills.

Data Science projects push you to spend more time learning about programming, performing statistical analysis, deploying solutions, and creating data visualizations to communicate results meaningfully.

Here are some key reasons why working on data science projects is worth your time, and why creating a portfolio will boost your career prospects.

  • Hands-on experience: Working through a data science project will cement your knowledge and boost your confidence to talk about it.
  • Data Community: You can connect with people dedicated to data science and machine learning on platforms like Kaggle, Reddit, and Stack Overflow to receive free and expert guidance.
  • Contributions: Data scientists who find your projects interesting will also look through your portfolio to gauge your skills, experience, and interests, and may even recommend you for open-source contributions.
  • Internships: Showcasing projects in your portfolio is often a key tool for finding internship opportunities.
  • Jobs: Finding job opportunities is the main reason for building a portfolio, and showing the work you have done on projects increases your chances.

In our data scientist skills article, we talked thoroughly about why the basics should never be discounted when building the right foundation. Now, in this guide, we circle back to specialization through projects.

Working on data science projects and public datasets will build up the intellectual curiosity you need to specialize in one specific field.

To become a specialist in any profession, you first need to learn everything essential to being a generalist in that profession.

It takes time, diligence, and hours of research and working with data.

While you are still learning, there are many ways to showcase your work that make life easier and help you build a strong data science portfolio.

Let’s take a closer look at what they are and how you can employ them.

Creating Projects

As a beginner, you can start with easy projects and observe how your peers create well-documented projects and communicate their analyses.

It matters which projects you create and how well you use the resources at your disposal, such as scientific libraries, packages, and tools. You are essentially learning concepts and developing logical reasoning skills while making the best use of your time by identifying a purpose.

Without a purpose, your efforts are in vain, but you can find one by answering questions like:

  • What problem do I solve here?
  • How would I benefit from my analysis?
  • What skills will I gain?

Projects are not substitutes for your work experience, but if you dedicate time to improve your skills, you can show the expertise that most people gain through work experience.

As you learn through projects, document your work thoroughly on platforms like GitHub and Deepnote.

Project Portfolio and Documentation

Portfolio projects which capture the most attention are those that are well documented. Documentation will make or break the success of your projects and your portfolio overall.

Code quality is of paramount importance for relevance and clarity. If your work is not simple, it’s not exceptional.

Here’s an example of elegant Python code.

from github import Github

# First create a Github instance:

# using an access token
g = Github("access_token")

# Github Enterprise with custom hostname
g = Github(base_url="https://{hostname}/api/v3", login_or_token="access_token")

# Then play with your Github objects:
for repo in g.get_user().get_repos():
    print(repo.name)

This snippet from PyGithub is human-readable, with brief comments explaining each step.
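The same habits carry over to your own analysis code. Here is a minimal sketch of a small, hypothetical portfolio helper, `summarize_column`, written with a descriptive name, type hints, a docstring, and a clear error message; it is an illustration of readable style, not code from PyGithub or any other library.

```python
from __future__ import annotations

from statistics import mean, median
from typing import Optional, Sequence


def summarize_column(values: Sequence[Optional[float]]) -> dict[str, float]:
    """Return basic summary statistics for one numeric column.

    Missing values (None) are skipped, so the summary reflects observed data only.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    return {
        "count": len(observed),
        "mean": mean(observed),
        "median": median(observed),
        "min": min(observed),
        "max": max(observed),
    }


print(summarize_column([3.0, None, 1.0, 2.0]))
# → {'count': 3, 'mean': 2.0, 'median': 2.0, 'min': 1.0, 'max': 3.0}
```

A reader can tell what the function does from its name and docstring alone, which is exactly what a reviewer skimming your repository needs.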

A good portfolio project shows both your technical and soft skills. Expanding your reach through writing and showcasing contributions will improve your chances of getting noticed by potential employers, since the purpose of your portfolio is to give a quick tour of your skills.

If you’ve spent hours scraping a public dataset for a specific task, you could also create a project repository to make your scraping tool accessible, and document the entire process in an article that demonstrates your technical skill set.
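When you package a scraper as a reusable tool, keeping the parsing step separate from the downloading step makes it easier to test and document. The sketch below is one possible approach, assuming the data lives in an HTML table: it uses only Python’s standard `html.parser` module, and the names `TableRowParser` and `parse_table_rows` are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: list[str] | None = None   # cells of the row being read
        self._cell: list[str] | None = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table_rows(html: str) -> list[list[str]]:
    """Return every table row in `html` as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


page = "<table><tr><th>Name</th><th>Score</th></tr><tr><td>Ada</td><td> 95 </td></tr></table>"
print(parse_table_rows(page))
```

Because `parse_table_rows` takes a string, you can unit-test it on saved HTML and fetch pages however you like (for example with `urllib.request`), which also makes the tool easier to explain in your write-up.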

Here’s an example of a great portfolio:

Data Science Portfolio Example by kanger.dev

All the relevant information is on the homepage: “I am Chris Tran. Machine Learning Engineer in Deep Learning, NLP and Computer Vision.” What more is there to know? It is punchy and straight to the point. Chris has an educational background in statistical programming and machine learning.

The main takeaway from Chris Tran’s approach is simplicity and organization. The portfolio section clearly shows that Chris has put careful thought into showcasing his skills by writing in-depth tutorials that explain every important detail of each project.

He drives visitors to his site from his project repositories: each repository has a clear, intuitive README file with links to a topic-specific article explaining the concepts involved in building the project. This is a brilliant approach to maintaining a healthy portfolio.

It’s worth noting from this brief clip how Chris goes into detailed case studies on his website, which also show his personality and communication skills.

Tip: Learn how to document projects from GitHub’s guide to READMEs.
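To follow the README pattern described above (a clear summary plus a link to a companion article), you could script a starter template for each new repository. This is a minimal sketch: the section names are one reasonable layout rather than GitHub’s official recommendation, and the function `render_readme`, the project title, and the example.com URL are hypothetical placeholders.

```python
from __future__ import annotations


def render_readme(title: str, summary: str, article_url: str, steps: list[str]) -> str:
    """Build a Markdown README with a summary, an article link, and run steps."""
    lines = [
        f"# {title}",
        "",
        summary,
        "",
        "## Read the write-up",
        "",
        f"A step-by-step explanation of this project is available in [this article]({article_url}).",
        "",
        "## How to run",
        "",
    ]
    # Number the steps so readers can follow them in order.
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join(lines) + "\n"


print(render_readme(
    title="Customer Churn Analysis",
    summary="Exploratory analysis and a baseline model for predicting churn.",
    article_url="https://example.com/churn-article",
    steps=["Install the requirements.", "Open the notebook and run all cells."],
))
```

Saving the returned text as `README.md` at the root of a repository is enough for GitHub to render it on the repository page.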

Publishing

Again, the most important aspect of publishing is code quality. Learn the best practices for writing your programs more effectively. They will teach you what to include, what to avoid, how to strike the right balance, and why a given choice is the best one.

You might benefit from the book Effective Python to learn specific ways to write better Python code. It’s highly recommended in the developer community.

Your work will not go unnoticed. Honing your coding skills and learning from others will help you become a better researcher.

You could publish your projects by pushing notebooks from a local Jupyter environment to GitHub, or by using a hosted notebook platform such as Deepnote. The single-document approach of Jupyter Notebook makes it easy to develop code, visualize results, and add explanations and formulas, which makes your work more understandable, repeatable, and shareable.

This is what working data scientists do: it is common practice to demonstrate both your technical skills and your ability to explain complex topics in an understandable way.
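A notebook file is plain JSON, which is part of what makes it easy to share and version on GitHub. As an illustration, the sketch below builds a minimal starter notebook following the nbformat 4.4 structure using only the standard library; the helper names and cell contents are hypothetical, and for real tooling the `nbformat` package is the more robust choice.

```python
from __future__ import annotations

import json


def markdown_cell(text: str) -> dict:
    """A Markdown cell for explanations and formulas."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code: str) -> dict:
    """An unexecuted code cell (no outputs, no execution count yet)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


def build_notebook(cells: list[dict]) -> dict:
    """Wrap cells in the top-level structure of an nbformat 4.4 notebook."""
    return {
        "cells": cells,
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


def save_notebook(notebook: dict, path: str) -> None:
    """Write the notebook to disk as JSON (use an .ipynb extension)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)


notebook = build_notebook([
    markdown_cell("# Project title\nWhat question does this analysis answer?"),
    code_cell("import statistics\nstatistics.mean([1, 2, 3])"),
])
```

Calling `save_notebook(notebook, "starter.ipynb")` writes a file that Jupyter should open with a title cell and one code cell ready to run.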

3 Tips for Building a Strong Data Science Portfolio

When building a professional portfolio, your goal should be to stand out and be one of a kind, not one of many.

These tips will help you persuade potential employers that you are uniquely qualified for a position.

Join Kaggle

Kaggle is the largest, most trusted online community for data scientists and machine learning enthusiasts. You can collaborate with other users, find and publish datasets, use GPU-enabled notebooks, and take part in competitions to solve data science challenges.

Many employers pay attention to Kaggle profiles. A strong profile can give you a lot of exposure, which will help you land an entry-level job.

It’s a good place to learn machine learning, and it’s completely free: all datasets, competition participation, and discussions. You can also connect with recruiters through the Jobs Board.

Datasets

It’s a great platform for learning how to think about and solve real-world problems. With over 160k real-world datasets to generate project ideas from, you can keep your motivation high throughout your learning.
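Once you have downloaded a dataset (for example, a CSV file from Kaggle), a quick first pass over its columns helps you judge whether it can support a project idea. This minimal sketch, built around a hypothetical `profile_csv` helper, uses Python’s `csv` module to count rows and empty values per column; the sample data is made up, and in practice many people use pandas for this step.

```python
from __future__ import annotations

import csv
import io


def profile_csv(text: str) -> dict[str, dict[str, int]]:
    """Count rows and empty cells for each column of CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    profile = {name: {"rows": 0, "missing": 0} for name in reader.fieldnames or []}
    for row in reader:
        for name in profile:
            profile[name]["rows"] += 1
            value = row.get(name)
            # Short rows yield None; blank cells yield "" (or whitespace).
            if value is None or value.strip() == "":
                profile[name]["missing"] += 1
    return profile


sample = "age,city,income\n34,Lagos,52000\n,Pune,\n29,,61000\n"
for column, stats in profile_csv(sample).items():
    print(f"{column}: {stats['missing']} of {stats['rows']} values missing")
```

For a file on disk, read it with `open(path, newline="", encoding="utf-8")` and pass the contents to `profile_csv`.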

Competitions

Companies like Google and American Express host Kaggle competitions. Your performance in them is a powerful way to stand out from the crowd and show your ability to solve complex problems.

Competitions usually last about three months and offer anywhere from $10,000 to $150,000 in prize money. There are only 94 grandmasters in the world, and most of them have been using Kaggle for over two years.

Be open to sharp criticism: Kaggle offers aspiring data scientists the best chance to learn from qualified people for free.
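Competition entries are typically scored from a submission file, commonly a CSV with an ID column and a prediction column; each competition describes its exact format in its evaluation section. The sketch below assumes hypothetical column names `id` and `target` and a hypothetical `write_submission` helper, and writes predictions with the standard `csv` module.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def write_submission(ids: list[int], predictions: list[float], out: TextIO) -> None:
    """Write one row per test example under a header row.

    The column names "id" and "target" are placeholders; use the ones
    your competition specifies.
    """
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "target"])
    writer.writerows(zip(ids, predictions))


buffer = io.StringIO()
write_submission([101, 102, 103], [0.91, 0.08, 0.55], buffer)
print(buffer.getvalue())
```

To produce the actual file, pass `open("submission.csv", "w", newline="")` instead of the in-memory buffer.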

The expertise you gain on Kaggle will be invaluable.

Always use GitHub

GitHub keeps track of your daily contributions. Your work is publicly visible, so people can see your working knowledge and commitment to data science.

You should make the most of GitHub. Data scientists widely use it because it hosts a vast number of data science repositories, powerful libraries and packages, and tons of other programming resources.

One of the best ways to highlight your skills is to have an active presence on GitHub. Having an active GitHub profile could open up tremendous collaboration or internship opportunities that you can also showcase in your portfolio.

You can host both code-based and content-based projects on GitHub.

Project Example 👇🏾👇

Project Example by kanger.dev

It’s clear at a glance where Chris Tran’s skills lie: Python, machine learning, and building AI systems.

It’s good practice to push the code you write to GitHub regularly. Like Chris Tran, you can create a static website with GitHub Pages to host your blog and portfolio for free.

You can easily customize your GitHub profile page, add links to your articles, and showcase your projects. It’s best to link your GitHub, LinkedIn, and Kaggle profiles.

It’s easy to gain familiarity with Git and GitHub terminology, such as repository, branch, commit, and pull request. You can learn from the official guide or the recommended resources below.
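GitHub Pages can serve a plain `index.html` committed to a repository, so a portfolio page can start life as a single generated file. This minimal sketch builds such a page from a list of projects and profile links using only the standard library; `render_portfolio`, the names, and the example.com URLs are hypothetical placeholders, and many people later move to a static site generator such as Jekyll, which GitHub Pages supports.

```python
from __future__ import annotations

from html import escape


def render_portfolio(
    name: str,
    tagline: str,
    projects: list[dict[str, str]],
    links: dict[str, str],
) -> str:
    """Return a minimal HTML portfolio page; all text is escaped before insertion."""
    project_items = "\n".join(
        f'    <li><a href="{escape(p["url"])}">{escape(p["title"])}</a>: {escape(p["summary"])}</li>'
        for p in projects
    )
    # One line of profile links, e.g. GitHub | LinkedIn | Kaggle.
    profile_links = " | ".join(
        f'<a href="{escape(url)}">{escape(label)}</a>' for label, url in links.items()
    )
    return f"""<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>{escape(name)}</title></head>
<body>
  <h1>{escape(name)}</h1>
  <p>{escape(tagline)}</p>
  <h2>Projects</h2>
  <ul>
{project_items}
  </ul>
  <p>{profile_links}</p>
</body>
</html>
"""


page = render_portfolio(
    "Jane Doe",
    "Data scientist focused on NLP.",
    [{"title": "Churn model", "url": "https://example.com/churn", "summary": "Baseline churn prediction."}],
    {"GitHub": "https://example.com/github", "Kaggle": "https://example.com/kaggle"},
)
print(page)
```

Saving the returned string as `index.html` in a repository with GitHub Pages enabled should be enough to publish it.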

Write as you learn

Data science blogs can be a fantastic way to hone your communication skills, present your analysis, uncover unique insights, and publish data visualizations.

While it’s true that you show your expertise through projects, you should also start writing tutorials as you grow. If you write high-quality tutorials, you’ll build a readership.

Marketing Tip: We recommend publishing articles on your own blog first, then republishing them with a canonical link on platforms like Medium, Dev.to, and KDnuggets.
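A canonical link tells search engines which copy of a republished article is the original. On a page whose HTML you control, it is a `<link rel="canonical">` element in the `<head>`; platforms such as Medium and Dev.to generally offer a setting for it instead. The hypothetical helper below is a minimal sketch that builds the element for a given original URL and rejects relative URLs, since canonical URLs are best given in absolute form.

```python
from html import escape
from urllib.parse import urlparse


def canonical_link_tag(original_url: str) -> str:
    """Return a <link rel="canonical"> element pointing at the original article."""
    parsed = urlparse(original_url)
    # Require an absolute http(s) URL so the tag points at one unambiguous page.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {original_url!r}")
    return f'<link rel="canonical" href="{escape(original_url)}">'


print(canonical_link_tag("https://example.com/blog/portfolio-tips"))
# → <link rel="canonical" href="https://example.com/blog/portfolio-tips">
```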

TL;DR

There is no single portfolio format that works best. The common denominator, though, is that you should focus on your specialty, skills, and notable accomplishments.

Your portfolio should have an intriguing description that drives people to check your projects, tutorials, articles, etc.

Thanks for making it to the end…

If you liked this article, we’ve got a few practical data science resources for you.


kanger.dev provides contextual stacks for job roles, such as Data Scientists, AI/ML Engineers, Product Managers, etc. Newsletter: https://t.ly/aqNC7