GPT-Code-Clippy (GPT-CC)

An open-source version of GitHub Copilot, an AI-driven language model

What is GPT-Code-Clippy (GPT-CC)?

GPT-Code-Clippy (GPT-CC) is an open-source version of GitHub Copilot. Copilot is powered by GPT-Codex, a deep learning language model based on GPT-3 and trained specifically on publicly available code from GitHub.

The dataset used to train GPT-CC was collected from SEART GitHub Search based on the following criteria:

  • 10+ GitHub stars
  • 2+ commits
  • Must have a licence
  • Exclude forks
  • Size < 70708 bytes

The repositories from The Pile are included as well.
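
The selection criteria listed above can be read as a simple predicate over repository metadata. The following is a minimal sketch, not the project's actual collection code: the `RepoInfo` record and its field names are hypothetical, "10+" and "2+" are assumed to mean "at least 10" and "at least 2", and the size limit is assumed to apply to the repository size in bytes.

```python
from dataclasses import dataclass
from typing import Optional

# Size limit taken from the criteria list (strictly less than this value).
MAX_SIZE_BYTES = 70708


@dataclass
class RepoInfo:
    """Hypothetical metadata record for one repository (illustration only)."""
    name: str
    stars: int
    commits: int
    license: Optional[str]  # None or "" when no licence is detected
    is_fork: bool
    size_bytes: int


def meets_criteria(repo: RepoInfo) -> bool:
    """Return True if the repository passes every listed criterion."""
    return (
        repo.stars >= 10            # assumed reading of "10+ GitHub stars"
        and repo.commits >= 2       # assumed reading of "2+ commits"
        and bool(repo.license)      # must have a licence
        and not repo.is_fork        # exclude forks
        and repo.size_bytes < MAX_SIZE_BYTES
    )


# Example: keep only the repositories that satisfy all criteria.
candidates = [
    RepoInfo("example/kept", 25, 40, "MIT", False, 50_000),
    RepoInfo("example/forked", 25, 40, "MIT", True, 50_000),
    RepoInfo("example/unlicensed", 25, 40, None, False, 50_000),
]
selected = [repo.name for repo in candidates if meets_criteria(repo)]
```

Keeping each criterion as a separate condition makes it easy to adjust a threshold if the list is meant more strictly (for example, more than 10 stars rather than at least 10).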

