GitHub promised not to train Copilot on your private code. But a class-action lawsuit alleges that Copilot reproduces verbatim chunks of open-source code, including copyright notices and licence text, without attribution. If the AI memorised and regurgitated GPL-licensed code, it violated the licence. If it did the same with your private code, you'd never know. The training data went from GitHub to OpenAI to Copilot. GitHub controls what goes in. OpenAI controls what comes out. You control nothing in between.
You write code in VS Code. Copilot sends it to OpenAI for completion. OpenAI runs on Microsoft Azure and now AWS. Microsoft owns GitHub and 27% of OpenAI. Your proprietary code passes through three companies with $185 billion in mutual financial obligations. A developer writing trade secrets has their code context flowing through a code host, an AI company, and a cloud provider, all connected by ownership and investment. They promised the platforms are separate. The money says they aren't.
What they claim: GitHub states "we do not train GitHub Copilot on private repository code" in its privacy documentation.
What we found: In November 2022, a class-action lawsuit (Doe v. GitHub) was filed against GitHub, Microsoft, and OpenAI. It alleges that Copilot reproduced substantial portions of licensed open-source code without the attribution that licences such as the GPL, MIT, and Apache licences require. Researchers demonstrated Copilot generating verbatim code snippets from its training data, including copyright notices and licence headers. GitHub says private repositories are excluded, but critics have questioned the boundary between public and private training data. GitHub has access to all repository data on its platform, and OpenAI trained the Codex model behind the original Copilot on code collected from GitHub.
What they claim: GitHub positions itself as a neutral platform for all developers.
What we found: In 2019, GitHub restricted access for developers in Iran, Syria, Crimea, Cuba, and North Korea, citing US trade sanctions. Developers lost access to private repositories containing years of work. In 2024, GitHub suspended accounts of developers contributing to Palestinian open-source projects, citing unspecified "terms of service violations." It later reversed the suspensions after backlash. GitHub also complies with DMCA takedowns and removes security research under its own acceptable-use policies. In March 2021 it took down a proof-of-concept exploit for the ProxyLogon vulnerabilities in Microsoft Exchange, a product of its own parent company. A code platform under US jurisdiction means US sanctions, US politics, and US copyright law determine who can use it and which code stays online.
What they claim: GitHub markets Copilot as a productivity tool that helps developers "code faster."
What we found: Stanford researchers found that participants with access to an AI code assistant wrote less secure code than participants without one. The assistant was built on an OpenAI Codex model, the model family behind the original Copilot. Assisted participants were more likely to introduce vulnerabilities such as SQL injection, because the AI generated plausible-looking but insecure patterns. The same study found that developers using the assistant were more confident in their code's security while actually producing less secure output. Copilot optimises for speed, not safety.
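To show what a "plausible-looking but insecure" SQL pattern looks like, here is a minimal Python sketch using the standard-library sqlite3 module. The `students` table and both function names are hypothetical illustrations. They are not code from the study or actual Copilot output. The first function builds the query by string interpolation, so crafted input can rewrite the SQL. The second passes the value as a bound parameter, so the driver treats it as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical students table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO students (name, age) VALUES (?, ?)",
        [("alice", 20), ("bob", 22)],
    )
    return conn


def find_student_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Plausible-looking but insecure: user input is pasted into the SQL text,
    # so a crafted name can change the query's logic (SQL injection).
    query = f"SELECT name, age FROM students WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_student_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised query: sqlite3 binds `name` as a value, never as SQL.
    return conn.execute(
        "SELECT name, age FROM students WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(find_student_unsafe(conn, payload))  # every row in the table
    print(find_student_safe(conn, payload))    # no rows: treated as a literal name
```

Both functions return the same result for an ordinary name such as "bob". Nothing in normal use signals that the first one is exploitable.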
What they claim: GitHub's privacy statement says it collects data "to provide, improve, and develop" its services.
What we found: GitHub sends telemetry to Microsoft. GitHub Copilot sends code context to OpenAI's API for completion. This means your code passes through: GitHub (Microsoft) servers, OpenAI's inference infrastructure, and potentially AWS (where OpenAI now deploys via Bedrock). A developer writing proprietary code in VS Code with Copilot enabled has their code context flowing through three companies — all part of the Microsoft-OpenAI-AWS convergence. Microsoft owns GitHub and 27% of OpenAI. The code hosting platform, the AI model, and the cloud infrastructure are financially entangled.