
Getting started with GitHub Codespaces for machine learning

Learn about working on machine learning projects with GitHub Codespaces and its out-of-the-box tools.

GitHub Codespaces is available for organizations using GitHub Team or GitHub Enterprise Cloud. GitHub Codespaces is also available as a limited beta release for individual users on GitHub Free and GitHub Pro plans. For more information, see "GitHub's products."

Introduction

This guide introduces you to machine learning with GitHub Codespaces. You’ll build a simple image classifier, learn about some of the tools that come preinstalled in GitHub Codespaces, configure your development environment for NVIDIA CUDA, and use GitHub CLI to open your codespace in JupyterLab.

Prerequisite

You have access to GitHub Codespaces. For more information, see "Creating a codespace."

Build a simple image classifier

We'll use a Jupyter notebook to build a simple image classifier.

Jupyter notebooks are sets of cells that you can execute one after another. The notebook we'll use includes a number of cells that build an image classifier using PyTorch. Each cell handles a different phase of that process: downloading a dataset, setting up a neural network, training a model, and then testing that model.

We'll run all of the cells, in sequence, to perform all phases of building the image classifier. When we do this, Jupyter saves the output back into the notebook so that you can examine the results.
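To make the four phases concrete, here is a minimal PyTorch sketch of the same sequence. It is not the notebook's code: random 8x8 tensors stand in for the downloaded dataset so that the example runs offline, and the function names, network shape, and hyperparameters are all illustrative assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_dataset(n_samples, seed):
    # Phase 1 stand-in: random 8x8 grayscale "images" instead of a download.
    # The label is 1 when the mean pixel value exceeds 0.5, otherwise 0.
    gen = torch.Generator().manual_seed(seed)
    images = torch.rand(n_samples, 1, 8, 8, generator=gen)
    labels = (images.mean(dim=(1, 2, 3)) > 0.5).long()
    return TensorDataset(images, labels)


def build_model():
    # Phase 2: a small fully connected network with two output classes.
    return nn.Sequential(
        nn.Flatten(), nn.Linear(64, 16), nn.ReLU(), nn.Linear(16, 2)
    )


def train(model, loader, epochs=5):
    # Phase 3: a standard supervised training loop.
    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()


def evaluate(model, loader):
    # Phase 4: fraction of held-out samples classified correctly.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()
    return correct / total


torch.manual_seed(0)
train_loader = DataLoader(make_dataset(512, seed=1), batch_size=32, shuffle=True)
test_loader = DataLoader(make_dataset(128, seed=2), batch_size=32)
model = build_model()
train(model, train_loader)
accuracy = evaluate(model, test_loader)
print(f"Test accuracy: {accuracy:.2%}")
```

In a notebook, each of these functions and the final driver lines would typically live in separate cells, so that you can rerun a single phase without repeating the others.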

Creating a repository and a codespace

  1. Go to the github/codespaces-getting-started-ml template repository and click Use this template.

  2. Select an owner for the new repository, enter a repository name, select your preferred privacy setting, and click Create repository from template.

  3. On the main page of the newly created repository, click the Code button and select the Codespaces tab.

    Screenshot of the New codespace button

    If you don’t see this tab, GitHub Codespaces isn't available for you. For more information about access to GitHub Codespaces, see "Creating a codespace."

  4. On the Codespaces tab, click Create codespace on main.

    By default, a codespace for this repository opens in a web-based version of Visual Studio Code.

Open the image classifier notebook

The default container image that's used by GitHub Codespaces includes a set of machine learning libraries that are preinstalled in your codespace, for example NumPy, pandas, SciPy, Matplotlib, seaborn, scikit-learn, TensorFlow, Keras, PyTorch, Requests, and Plotly. For more information about the default image, see "Introduction to dev containers" and the devcontainers/images repository.

  1. In the VS Code editor, close any "Get Started" tabs that are displayed.
  2. Open the image-classifier.ipynb notebook file.
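If you want to confirm which of the libraries listed above are actually present in your codespace, a short check like the following can be run in a notebook cell or the terminal. This is a sketch that uses only the Python standard library; the `find_installed` helper is a hypothetical name, and the mapping from display names to import names is an assumption based on each project's usual module name.

```python
import importlib.util

# Display name -> import name (assumed; e.g. scikit-learn imports as "sklearn").
LIBRARIES = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "PyTorch": "torch",
    "Requests": "requests",
    "Plotly": "plotly",
}


def find_installed(modules):
    # find_spec locates a top-level module without importing it,
    # so the check stays fast even for large frameworks.
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in modules.items()
    }


for name, installed in find_installed(LIBRARIES).items():
    print(f"{name:<14}{'installed' if installed else 'missing'}")
```

Any library reported as missing may indicate that the codespace uses a custom image rather than the default one.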

Build the image classifier

The image classifier notebook contains all the code you need to download a dataset, train a neural network, and evaluate its performance.

  1. Click Run All to execute all of the notebook’s cells.

    Screenshot of the Run All button

  2. Scroll down to view the output of each cell.

    Screenshot of Step 3 in the editor

Configure NVIDIA CUDA for your codespace

Some software, such as TensorFlow, requires NVIDIA CUDA to be installed before it can use your codespace’s GPU. In that case, you can create a custom configuration in a devcontainer.json file and specify that CUDA should be installed. For more information on creating a custom configuration, see "Introduction to dev containers."

Note: For full details of the script that's run when you add the nvidia-cuda feature, see the devcontainers/features repository.

  1. Within a codespace, open the .devcontainer/devcontainer.json file in the editor.

  2. Add a top-level features object with the following contents:

      "features": {
        "ghcr.io/devcontainers/features/nvidia-cuda:1": { 
          "installCudnn": true
        }
      }

    For more information about the features object, see the development containers specification.

    If you are using the devcontainer.json file from the image classifier repository you created for this tutorial, your devcontainer.json file will now look like this:

    {
      "customizations": {
        "vscode": {
          "extensions": [
            "ms-python.python",
            "ms-toolsai.jupyter"
          ]
        }
      },
      "features": {
        "ghcr.io/devcontainers/features/nvidia-cuda:1": { 
          "installCudnn": true
        }
      }
    }
    
  3. Save the change.

  4. Access the VS Code Command Palette (Shift+Command+P (Mac) / Ctrl+Shift+P (Windows/Linux)), then start typing "rebuild". Select Codespaces: Rebuild Container.

    Screenshot of the Rebuild Container option

    The codespace container will be rebuilt. This will take several minutes. When the rebuild is complete, the codespace is automatically reopened.

  5. Commit the change to the repository so that CUDA will be installed in any new codespaces you create from this repository in the future.
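After the rebuild, you may want to check whether a GPU is visible from inside the codespace. The sketch below is one possible check, not part of the official procedure. It assumes that the codespace runs on a machine type with a GPU, that the `nvidia-smi` tool may or may not be on the `PATH`, and that TensorFlow may or may not be installed. The `cuda_status` helper is a hypothetical name.

```python
import shutil
import subprocess


def cuda_status():
    # Collect what can be observed without assuming any tool is present.
    status = {
        "nvidia_smi_found": False,
        "gpu_names": [],
        "tensorflow_sees_gpu": None,  # None means TensorFlow is not importable
    }

    # nvidia-smi ships with the NVIDIA driver; it may be absent in a container.
    smi = shutil.which("nvidia-smi")
    if smi:
        status["nvidia_smi_found"] = True
        result = subprocess.run(
            [smi, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        if result.returncode == 0:
            status["gpu_names"] = [
                line.strip() for line in result.stdout.splitlines() if line.strip()
            ]

    # Ask TensorFlow directly, since it is the library that needs CUDA here.
    try:
        import tensorflow as tf
    except ImportError:
        pass
    else:
        status["tensorflow_sees_gpu"] = len(tf.config.list_physical_devices("GPU")) > 0

    return status


print(cuda_status())
```

If `tensorflow_sees_gpu` is `False` on a GPU machine type, the CUDA feature may not have been applied; rebuilding the container again or checking the creation log is a reasonable next step.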

Open your codespace in JupyterLab

The default container image that's used by GitHub Codespaces includes JupyterLab, the web-based Jupyter IDE. You can use GitHub CLI to open your codespace in JupyterLab without having to install anything else on your codespace.

  1. In the terminal, enter the GitHub CLI command gh cs jupyter.

  2. Choose the codespace you want to open.

    Screenshot of opening a codespace from the terminal
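If you open JupyterLab often, the command can be wrapped in a small script. The following is a sketch under stated assumptions: `gh cs` is the short alias for `gh codespace`, the `--codespace` flag is assumed to select a codespace by name and skip the interactive prompt (confirm with `gh codespace jupyter --help`), and the helper names and the codespace name in the usage note are hypothetical.

```python
import shutil
import subprocess


def jupyter_command(codespace_name=None):
    # Build the argument list; without a name, gh prompts for a codespace.
    command = ["gh", "codespace", "jupyter"]
    if codespace_name:
        # Assumed flag: selects the codespace by name instead of prompting.
        command += ["--codespace", codespace_name]
    return command


def open_in_jupyterlab(codespace_name=None):
    # Fail early with a clear message if GitHub CLI is not installed.
    if shutil.which("gh") is None:
        raise RuntimeError("GitHub CLI (gh) was not found on PATH")
    return subprocess.run(jupyter_command(codespace_name), check=False).returncode


print(" ".join(jupyter_command()))
```

Calling `open_in_jupyterlab("my-codespace-name")` would then run the command for that codespace, where `my-codespace-name` is a placeholder for a name shown by `gh codespace list`.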