When it comes to becoming a professional data scientist and landing your dream job, it is important to build a strong data science portfolio and personal brand. Recruiters usually take a close look at your online presence to identify your track record and the projects you have been working on. Your online portfolio matters even more when you have not graduated from a top-tier university, and it is one of the few ways to judge your skills when you have taught yourself data science, e.g., solely through online courses. But what actually makes a good data science portfolio? Let us cover a few puzzle pieces that I think make a solid portfolio.
Personal Website
The best way to tie your portfolio together is to have a personal website. This site does not need to be fancy in any way, but it should provide compact information about yourself, your achievements, and your skills, and it should link to all other parts of your portfolio that we will discuss next. For building such a personal website, I personally prefer to implement a simple one-page HTML site using, e.g., Bootstrap. For my personal website at philippsinger.com, I adapted an existing Bootstrap template. There are many technical alternatives, and in the end you should choose the solution that you are most comfortable with. It might also make sense to use a platform solution like WordPress, specifically if you also plan on self-hosting your blog. By and large, your personal website should cover at least the following points:
- Who are you?
- What are your main skills and competences?
- What are your main achievements (e.g., publications, projects, etc.)?
- What else can I find about you on the Web?
- How can I contact you?
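To make this concrete, here is a minimal sketch of how these five points could map onto a one-page site. It is only an illustration: the names `PROFILE`, `bullet_list`, and `render_page`, all placeholder values, and the Bootstrap CDN URL are assumptions and are not taken from any real site. A ready-made Bootstrap template will usually look far better; the point is just that very little is needed to get started.

```python
from html import escape

# Hypothetical placeholder content; replace with your own details.
PROFILE = {
    "name": "Jane Doe",
    "about": "Data scientist interested in machine learning and open source.",
    "skills": ["Python", "Statistics", "Data visualization"],
    "achievements": ["Example project A", "Example publication B"],
    "links": {
        "GitHub": "https://github.com/example",
        "Blog": "https://example.com/blog",
    },
    "email": "jane@example.com",
}

# Assumed stylesheet URL on a public CDN; pin whichever version you prefer.
BOOTSTRAP_CSS = "https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css"


def bullet_list(items):
    """Render an iterable of strings as an HTML unordered list."""
    rows = "".join(f"<li>{escape(item)}</li>" for item in items)
    return f"<ul>{rows}</ul>"


def render_page(profile):
    """Return a one-page HTML document covering the five points above."""
    name = escape(profile["name"])
    about = escape(profile["about"])
    email = escape(profile["email"])
    skills = bullet_list(profile["skills"])
    achievements = bullet_list(profile["achievements"])
    # Links to the other parts of the portfolio (blog, code, CV, ...).
    links = "".join(
        f'<li><a href="{escape(url)}">{escape(label)}</a></li>'
        for label, url in profile["links"].items()
    )
    return f"""<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>{name}</title>
  <link rel="stylesheet" href="{BOOTSTRAP_CSS}">
</head>
<body class="container">
  <h1>{name}</h1>
  <p>{about}</p>
  <h2>Skills</h2>
  {skills}
  <h2>Achievements</h2>
  {achievements}
  <h2>Elsewhere on the Web</h2>
  <ul>{links}</ul>
  <h2>Contact</h2>
  <p><a href="mailto:{email}">{email}</a></p>
</body>
</html>
"""


if __name__ == "__main__":
    # Write the page to disk; the resulting file can be hosted anywhere.
    with open("index.html", "w", encoding="utf-8") as fh:
        fh.write(render_page(PROFILE))
```

Running the script writes an `index.html` file that you can open in a browser or upload to any static host.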
Blogging
I am a huge fan of blogging (well, I am writing this blog, right?). A blog lets you share your progress, ideas, accomplishments, or projects, and lets you connect with your audience by either communicating with them or teaching them new things. By and large, a blog can be an important addition to your data science portfolio and personal brand. Many people are afraid of starting a blog and try to over-engineer it, but in the end, the best way to go is to just start doing it. You have learned a small trick in Python that saved you several hours of work? Go and write about it. You have just finished an awesome data science project? Go and blog about it and let your audience know. You have come up with a new idea to tackle some pressing issue? Go write about it. You get the idea: just collect all your ideas, accomplishments, and concepts, and communicate them to the outside world. This is what open knowledge is all about in today’s information society, and as a side effect it might also give colleagues or recruiters a good idea of what you have been working on and thinking about. And don’t worry too much about your writing skills; they will not only improve over time, but they also often do not matter that much. What matters most is content.
Probably the easiest way to start a blog nowadays is to use Medium. It lets you write blog posts natively and is also pretty popular in the CS community. However, there are many other options. Most prominently, WordPress has emerged as the go-to platform for self-hosting a blog (this blog is running on it). It can be especially useful for combining your personal website, as described above, with blogging functionality. Another popular solution is Jekyll, which can also be hosted easily and for free on GitHub Pages.
GitHub
If you only want to do one thing out of the list described here, I would probably recommend starting with a GitHub profile. GitHub itself describes the platform as:
GitHub is a development platform inspired by the way you work. From open source to business, you can host and review code, manage projects, and build software alongside millions of other developers.
At its core, GitHub uses Git, an open source version control system. The platform allows users to share their projects in so-called repositories with the outside world. Other users can then see the code, re-use it, tinker with it themselves, post issues, ask for advice, and so on. GitHub has also emerged as the largest hub for collaborative open science, meaning that everyone in the world has the chance to collaborate on popular libraries and software projects.
So what does that mean for your data science portfolio? It means that, ideally, you should make most of your code and projects available to the outside world. However, that does not only mean posting the code somewhere after finishing, but also sharing the progress leading to the end goal. GitHub is well suited to that task thanks to its built-in version control functionality. As with blogging, I can only encourage everyone to use GitHub as frequently as possible when doing data science projects. Again, it not only allows you to share your knowledge with the outside world, but it also gives you the opportunity to show your data science portfolio to, e.g., recruiters.
Social Media
Today’s social media landscape is vast and always changing. Choosing which social media platforms to be present on is a highly individual choice and also depends on your discipline. For data science, I feel that Twitter is the only really valuable platform. It gives you an easy tool to further publish your content (e.g., tweeting about the new blog post you wrote) and to communicate with your peers. Twitter is also a great way to stay up-to-date, as a multitude of influential researchers, data scientists, and open source developers are active on the platform, and it helps you to be visible. The only other social media tool I actively use is LinkedIn, which is especially popular in the corporate field and is the prime source for recruiters looking for potential candidates. Apart from those, the choices are plentiful and it doesn’t hurt to try one or the other out.
Curriculum Vitae
At the same time, it is important to keep your Curriculum Vitae up-to-date, as it can be the go-to stop for others to quickly check your track record. It is best to link to it directly on your personal website. It is hard to give clear recommendations on how to write a CV, as it always depends on the people looking at it or the job you are applying for. Personally, I like to have one longer version and one shorter one-page version at hand. I usually link to the longer version on my website and profiles. On the Web, you can find a large array of tutorials on how to write a CV. I like to include the following sections:
- Personal information
- Positions
- Education
- Competences & Personal Skills
Optional information (longer version):
- Awards & Honors
- Projects
- Teaching Experience
- Publications
- Talks, community service, tutorials, etc.
As an example, you can find my personal CV online. However, as said, a CV is a very personal document, and what works depends on both the individual and the person looking at it.
Summary
In this blog post, I have tried to outline a few puzzle pieces that I feel make a good data science portfolio and that also help you to build your personal brand. However, in the end, what makes sense depends on you personally, your field, and your expectations. Of course, there are also many more things not covered in this article that might be worth following up on.
References and readings
- https://www.dataquest.io/blog/build-a-data-science-portfolio
- https://www.datascienceweekly.org/articles/how-you-should-create-a-data-science-portfolio-that-will-get-you-hired
- https://www.youtube.com/watch?v=xrhPjE7wHas
- https://medium.com/one-datum-at-a-time/how-to-construct-a-data-science-portfolio-from-scratch-de0b70e58bc1
- https://smartblogger.com/how-to-start-a-blog/
Also published on Medium.