Early Career Data Scientist

Description: A collection of videos, tutorials, training materials, and exercises targeted towards any entry-level, early-career trainee interested in learning basic skills in data science.

Preparation: No advance preparation is required.

1. Data Science Ethics

Videos

6 videos available here

2. Overview: What is Data Science

Videos

  1. IBM OpenDS4All What is Data Science? with Yucen Wang - Part I
  2. IBM OpenDS4All What is Data Science? with Yucen Wang - Part II

3. Understand and Appreciate Open and FAIR Data

Article to read

  1. The FAIR Guiding Principles for scientific data management and stewardship

Exercises

  1. Create an ORCID
  2. Create a Wikidata entry about yourself and link it to other projects, if applicable
  3. Share past work on Figshare, Zenodo, etc.

4. Learn GitHub

Getting started

  1. Create a GitHub account (see https://docs.github.com/en/get-started/signing-up-for-github/signing-up-for-a-new-github-account)
  2. Download and install GitHub Desktop

Tutorials

Introduction to GitHub

  1. GitHub getting started guide
  2. Git 101: Git and GitHub for Beginners
  3. GitHub fundamentals

GitHub Issues

  1. Learn Markdown syntax
  2. GitHub issues
  3. About issues
  4. Intro to managing and tracking issues in GitHub

Exercises

  1. Help improve this pathway! Make edits to this OBO Academy page and make a pull request. (For example, find typos to fix, add or revise content to this document, etc.)
  2. Create a GitHub website by forking this repository: https://github.com/laderast/academic_site_workshop

5. Learn command line

Tutorials

Note: For the tutorials below, PC users need to install the Ontology Development Kit (ODK); instructions are linked from the tutorial.
Alternatively, PC users can download Git Bash.

  1. Tutorial: Very (!) short introduction to the command line for ontology curators and semantic engineers: Part 1
  2. Tutorial: Very (!) short introduction to the command line for ontology curators and semantic engineers: Part 2

6. Introduction to Ontologies

Articles to read

  1. Ontology 101 by D. McGuinness
  2. Ontological Annotation of Data

Videos

  1. An Introduction to Ontologies by Mark Musen, Stanford University (~15 min)
  2. Introduction to Biomedical Ontologies #1: What is an Ontology?, by Jennifer Smith, Rat Genome Database (~15 min)
  3. Using ontologies to standardize rare disease data collection, by Nicole Vasilevsky, C-Path (1 hr)

Tutorials

  1. Introduction to ontologies
  2. Ontology fundamentals
  3. Contributing to ontologies

7. Basic Data Management

Videos

  1. Data Preparation and Planning
  2. https://dmice.ohsu.edu/bd2k/demo/BDK12-2/presentation_html5.html
  3. https://dmice.ohsu.edu/bd2k/demo/BDK12-3/presentation_html5.html
  4. Data sharing snafu: Data Sharing and Management Snafu in 3 Short Acts

Articles to read

  1. 10 Simple Rules for the Care and Feeding of Scientific Data
  2. Big Data: The Future of Biocuration
  3. A primer on data sharing
  4. Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data
  5. Reproducible and reusable research: Are journal data sharing policies meeting the mark?

Exercise

Data Management 101

8. Preparing your CV and Tracking Your Contributions

Video

Workshop from Biocuration: Workshop - Careers In Biocuration

Articles

Is authorship sufficient for today’s collaborative research? A call for contributor roles

9. Effective Communication in Data Science

Tutorials

Survival strategies for team communication