Your code, their training data: an AI privacy primer
What every major AI coding tool does with the code you paste in — and the settings that actually protect proprietary work.
If your code is your company's most valuable asset, you should know exactly what your AI tools do with it. The answer varies dramatically by vendor, plan, and setting. Here is a clear, vendor-neutral primer.
The three things vendors might do
- Use your code to improve their general models (training).
- Use your code transiently to produce a response, then discard it (zero-retention).
- Store your code for a defined retention window for abuse monitoring (typical: 30 days).
Defaults, in plain language
Most vendors' free consumer plans permit training on your inputs unless you opt out. Paid business and enterprise plans generally do not train on your data by default, but the setting can still be changed, so verify it in your admin panel.
Zero-retention is usually available only on enterprise contracts. If you cannot accept any retention window, that is the tier you need.
Settings to check this week
- Training opt-out: confirm it is enabled at the org level, not just per user.
- Data residency: if your customers or contracts require data to stay in a particular region, such as the EU or US, select that region explicitly.
- Logging: many vendors offer audit logs of every prompt. Turn them on.
- Public-code filtering: turn it on to keep responses from echoing licensed code verbatim.
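One way to keep this checklist honest is to write down what you actually saw in each admin panel and let a small script flag the gaps. The sketch below is illustrative only: the ToolSettings fields, the find_gaps helper, and the "example-assistant" name are hypothetical, and the values are entered by hand rather than fetched from any vendor API.

```python
from dataclasses import dataclass


@dataclass
class ToolSettings:
    """Settings copied by hand from one vendor's admin panel (hypothetical fields)."""
    tool: str
    training_opt_out_org_wide: bool
    data_region: str
    audit_logging: bool
    public_code_filter: bool


def find_gaps(settings: ToolSettings, allowed_regions: set[str]) -> list[str]:
    """Return a human-readable list of checklist items that are not satisfied."""
    gaps = []
    if not settings.training_opt_out_org_wide:
        gaps.append("training opt-out is not enforced at the org level")
    if settings.data_region not in allowed_regions:
        gaps.append(f"data region {settings.data_region!r} is not in the allowed set")
    if not settings.audit_logging:
        gaps.append("audit logging is off")
    if not settings.public_code_filter:
        gaps.append("public-code filtering is off")
    return gaps


if __name__ == "__main__":
    # Assumed example values; replace them with what your own admin panel shows.
    recorded = ToolSettings(
        tool="example-assistant",
        training_opt_out_org_wide=True,
        data_region="US",
        audit_logging=False,
        public_code_filter=True,
    )
    for gap in find_gaps(recorded, allowed_regions={"EU"}):
        print(f"{recorded.tool}: {gap}")
```

Keeping one record per tool makes it easy to repeat the review whenever a vendor changes its plans or defaults.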
Practical guardrails
If you handle truly sensitive material, pair the policy controls with operational habits: keep secrets in environment variables (not in prompts), mask production data before pasting it, and set a clear rule that no customer PII is ever shared with any AI tool, paid or otherwise.
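The masking habit can be partly automated. Below is a minimal sketch, assuming a short list of regular expressions is enough for your codebase. The pattern list and the redact function are hypothetical names, and the patterns catch only quoted credential assignments and email addresses, so treat the output as a first pass that still needs a human check.

```python
import re

# Quoted values assigned to names that look like credentials,
# e.g. API_KEY = "..." or db_password: '...'. Illustrative, not exhaustive.
SECRET_ASSIGNMENT = re.compile(
    r"""\b(\w*(?:api_?key|secret|token|password)\w*\s*[:=]\s*)(["'])[^"'\n]+\2""",
    re.IGNORECASE,
)

# Email addresses, a common form of customer PII.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Mask hard-coded secrets and email addresses before text goes into a prompt."""
    text = SECRET_ASSIGNMENT.sub(r"\1\2<REDACTED>\2", text)
    text = EMAIL.sub("<EMAIL>", text)
    return text


if __name__ == "__main__":
    snippet = (
        'API_KEY = "sk-12345"\n'
        'notify("alice@example.com")\n'
        'key = os.environ["API_KEY"]\n'
    )
    print(redact(snippet))
```

Code that already reads its secret from an environment variable, such as the os.environ lookup on the last line of the sample, passes through unchanged because there is no literal value to leak.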
When in doubt, ask for the DPA
Every serious vendor has a Data Processing Agreement. Read it. The marketing page is not the contract; the DPA is. If the vendor will not share one, that is your answer.