AI Transparency and Data Protection

How our AI system handles your data

When you use our AI services, you want to know what happens to your data. Here we explain the entire data flow, name every service involved, and link to the official sources. No marketing promises, just verifiable facts.

Data flow step by step

1. Your input

You ask a question or enter text via the web interface, the Teams bot, or the API.

2. Detection of personal data

Before the text leaves our system, it passes through several parallel detection methods:

Pattern matching for email addresses, phone numbers, IBANs, credit-card numbers, and ID numbers from DE/CH/NL/FR/BE/ES/AT
PII-specialised name and organisation recognition (GLiNER, multilingual)
Address and street-name detection for 30+ countries
Detection of company and department abbreviations
Matching against a project-specific blacklist (maintained by you) and an auto-blacklist (learned from your own data)

All detection methods run in parallel.
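To illustrate how independent detectors can run side by side, here is a minimal sketch. It is not our production code: the function names, the two simplified regular expressions, and the BLACKLIST label are assumptions made for this example, and the model-based recognisers (GLiNER, address detection) are left out.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Simplified illustrative patterns (assumptions); real patterns are stricter
# and also validate checksums.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){3,7}(?: ?[A-Z0-9]{1,3})?\b")


def detect_email(text):
    """Return (start, end, label) spans for email addresses."""
    return [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]


def detect_iban(text):
    """Return (start, end, label) spans for IBAN-shaped strings."""
    return [(m.start(), m.end(), "IBAN") for m in IBAN_RE.finditer(text)]


def detect_blacklist(text, terms):
    """Return spans for every case-insensitive occurrence of a blacklisted term."""
    spans = []
    for term in terms:
        for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), "BLACKLIST"))
    return spans


def detect_all(text, blacklist=()):
    """Run all detectors concurrently and return their spans sorted by position."""
    detectors = [detect_email, detect_iban, lambda t: detect_blacklist(t, blacklist)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda detector: detector(text), detectors)
    return sorted(span for spans in results for span in spans)
```

Each detector only reads the input text and returns its own list of spans, so the detectors do not need to coordinate. Merging overlapping spans from different detectors is a separate step not shown here.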

3. Pseudonymization

Detected personal data is replaced before it is handed to the AI. We use two output formats depending on the target system: tagged markers (e.g. [PERSON:a1b2c3d4]) for parsable APIs and MCP consumers, and plausible pseudonyms (e.g. "Markus Weber") for natural-language LLM inputs such as chat, voice, and automated email, because language models handle realistic-looking names better than bracket markers. Phone numbers and IP addresses are replaced with values from officially reserved drama and documentation ranges (BNetzA reserved mobile blocks, FCC 555-0100, Ofcom 07700 900, RFC 5737/3849). These are bijective pseudonyms that cannot reach real subscribers or hosts. Highly sensitive financial identifiers such as IBANs, credit-card numbers, BICs, and German tax/social IDs are replaced by the sentinel <suppressed> in both modes, never by plausible fakes that a model could mistake for real accounts.

Input: Michael Berg called on March 15
Tagged mode: [PERSON:a1b2c3d4] called on [DATE:e5f6g7h8]
Faker mode: Markus Weber called on March 15
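The tagged mode from the example above could be sketched roughly as follows. This is an illustration under assumptions, not our actual implementation: the function name, the label names in SUPPRESSED_LABELS, and the use of random 8-character hex IDs are hypothetical. The sketch also assumes that suppressed values are simply not recorded in the mapping.

```python
import secrets

# Hypothetical label names for identifiers that are never given a marker.
SUPPRESSED_LABELS = {"IBAN", "CREDIT_CARD", "BIC", "TAX_ID"}


def pseudonymize_tagged(text, spans):
    """Replace detected spans with tagged markers.

    spans: non-overlapping (start, end, label) tuples.
    Returns the pseudonymized text and a marker -> original mapping.
    """
    mapping = {}  # marker -> original value
    seen = {}     # (label, original) -> marker, so repeats get the same marker
    parts, pos = [], 0
    for start, end, label in sorted(spans):
        original = text[start:end]
        parts.append(text[pos:start])
        if label in SUPPRESSED_LABELS:
            # Assumption: suppressed values are not stored for restoration.
            parts.append("<suppressed>")
        else:
            marker = seen.get((label, original))
            if marker is None:
                marker = f"[{label}:{secrets.token_hex(4)}]"
                seen[(label, original)] = marker
                mapping[marker] = original
            parts.append(marker)
        pos = end
    parts.append(text[pos:])
    return "".join(parts), mapping
```

Reusing one marker per distinct value keeps the text coherent for the model: two mentions of the same person stay recognisable as the same entity.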

The mapping is stored temporarily in Redis (rolling 24-hour TTL, in-memory only on our server in Frankfurt). No permanent storage. Atomically deletable on Article 17 GDPR request. In both modes, original data never leaves our server as plaintext.
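The following sketch shows one way to read "rolling TTL": every successful read pushes the expiry out by another full period. It uses an in-memory stand-in rather than a real Redis connection so that it runs anywhere. The class name, the key prefix scheme, and the refresh-on-read interpretation are assumptions. In Redis itself this behaviour would typically be built from SET with an EX option plus EXPIRE on access.

```python
import time

TTL_SECONDS = 24 * 60 * 60


class RollingTTLStore:
    """In-memory stand-in for the Redis mapping store (illustration only)."""

    def __init__(self, ttl=TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry, nothing is kept permanently.
            del self._data[key]
            return None
        # Rolling TTL: each successful read restarts the expiry window.
        self._data[key] = (value, self._clock() + self._ttl)
        return value

    def delete_all(self, prefix):
        """Remove every key with the given prefix, e.g. for an erasure request."""
        doomed = [key for key in self._data if key.startswith(prefix)]
        for key in doomed:
            del self._data[key]
        return len(doomed)
```

The injectable clock exists only to make the expiry behaviour testable without waiting 24 hours.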

4. AI processing

Only the pseudonymized text is sent to the AI service. The service sees no real names, no real addresses, no real contact details — depending on the mode it sees either markers or pseudonyms.

5. Response

The AI response comes back with the same markers or pseudonyms. Our system restores the original data before you see the answer. You always see your real values, never the pseudonymized intermediate.
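For tagged mode, the restore step can be pictured as a single substitution pass over the response. This is a minimal sketch. The marker pattern and the function name are assumptions, and markers that are not in the mapping are deliberately left untouched rather than guessed at.

```python
import re

# Assumed marker shape: [LABEL:8 lowercase alphanumeric characters]
MARKER_RE = re.compile(r"\[[A-Z_]+:[0-9a-z]{8}\]")


def restore(text, mapping):
    """Replace every known marker with its original value."""
    return MARKER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Usage, with a mapping as it might look for the example above:

```python
mapping = {"[PERSON:a1b2c3d4]": "Michael Berg"}
restore("Please call [PERSON:a1b2c3d4] back.", mapping)
```

In faker mode the same idea applies, except that the lookup keys are the generated pseudonyms instead of bracket markers.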

6. Logging with audit trail

Every AI call produces an entry in a tamper-evident audit log: who asked what, when, which provider answered, how many tokens were used. The plaintext never lands in the log — only SHA-256 hashes of inputs and outputs. A Postgres trigger blocks UPDATE and DELETE. A hash chain across all entries is automatically verified every night at 3 AM.
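The hash-chain idea can be sketched in a few lines. This illustrates the general technique only, not our database schema: the field names, the all-zero genesis value, and the JSON serialisation are assumptions. Each entry stores SHA-256 hashes of the input and output instead of the plaintext, plus the hash of the previous entry, so a later change to any entry breaks verification.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def append_entry(log, user, provider, tokens, prompt, response, timestamp):
    """Append an audit entry that contains hashes only, never plaintext."""
    entry = {
        "user": user,
        "provider": provider,
        "tokens": tokens,
        "timestamp": timestamp,
        "input_hash": sha256_hex(prompt),
        "output_hash": sha256_hex(response),
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS,
    }
    # The entry hash covers all fields above, including the link to its predecessor.
    entry["entry_hash"] = sha256_hex(json.dumps(entry, sort_keys=True))
    log.append(entry)
    return entry


def verify_chain(log):
    """Return None if the chain is intact, else the index of the first bad entry."""
    prev_hash = GENESIS
    for index, entry in enumerate(log):
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        recomputed = sha256_hex(json.dumps(body, sort_keys=True))
        if entry["prev_hash"] != prev_hash or recomputed != entry["entry_hash"]:
            return index
        prev_hash = entry["entry_hash"]
    return None
```

A nightly job in this style would call verify_chain over all entries and raise an alert if it returns an index.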

Which AI services we use

For each service we say what the contract states, where the evidence is, and what it means in practice.

Google Gemini API

Paid access

Google does not use inputs and outputs from the paid API for product improvement or model training.

For users in the EEA, Switzerland, and the UK, these protections apply even on the free tier.

Logs are retained for up to 55 days and include the request, the response, and metadata. Because our project has billing enabled, Google does not use these logs for product improvement or model training.

The Data Processing Agreement is automatically included with paid services.

Anthropic Claude API

Commercial access

API data is never used for model training. This is a blanket policy, no opt-out required.

Inputs and outputs are deleted within 30 days by default. Exception: if safety classifiers flag content, data may be retained for up to 2 years.

The Data Processing Agreement is automatically included with the commercial terms. It includes EU Standard Contractual Clauses (SCCs), Modules 2 and 3.

Mistral AI

European provider (France)

Per Mistral's Privacy Policy (Section 3): "we do not use your Input and Output to train our artificial intelligence models when you use Le Chat Enterprise or the paid version of our APIs." Mistral is a French company based in Paris, with processing primarily within the EU.

Standard API per the Privacy Policy (Section 5): inputs and outputs are kept for the time needed to generate the output and then for 30 rolling days for abuse monitoring, unless Zero Data Retention is activated. Special cases: the Agents API retains data until account termination; the Fine-Tuning API retains data until explicit deletion in Mistral AI Studio.

The Data Processing Agreement (DPA) is part of Mistral's commercial terms. Governing law per the DPA (Clause 17): French law, French courts.

Mistral is headquartered in France and is directly subject to the GDPR. Standard Contractual Clauses (SCCs) are therefore not required. For the specific server region see Mistral's Privacy Policy.

OpenAI API

Commercial access

Per OpenAI: "data sent to the OpenAI API is not used to train or improve OpenAI models (unless you explicitly opt in to share data with us)." Training is opt-in, not opt-out.

Default per OpenAI platform docs: abuse-monitoring logs are retained for up to 30 days, unless longer retention is legally required. Modified Abuse Monitoring and Zero Data Retention are available after approval and exclude customer content from those logs.

Current constraint (as of May 2026): a court order in the NYT proceedings (Magistrate Judge Wang, May 13, 2025) requires OpenAI to preserve all output logs indefinitely. ChatGPT Enterprise, ChatGPT Edu, and API customers with a Zero Data Retention contract are exempt. OpenAI has appealed. Consequence for our usage: we currently send OpenAI calls with pseudonymized content only, never plaintext. This protects the identity of data subjects even under the court order.

The Data Processing Agreement (DPA) is available via OpenAI's terms and includes EU Standard Contractual Clauses.

What we do not do

We do not store conversation content in the AI audit log. Only SHA-256 hashes of inputs and outputs are kept for the evidentiary trail.

We do not share customer data with third parties, except with the AI services listed above, and only in pseudonymized form.

We use a paid API account with every provider, where the contract excludes training on our data. We do not use free tiers where providers may use submissions for model improvement.

We do not promise 100% pseudonymization. No detection system is perfect. That is why we run multiple layers of protection in parallel: PII detection, blacklists, contractual guarantees with the AI providers, and processing via pseudonymized content. We can reduce residual risk in the detection path; we cannot eliminate it.

Our infrastructure

Servers at dataforest GmbH in Frankfurt am Main
Encrypted transmission (TLS 1.2+)
Every AI call is logged in a tamper-evident audit table. A Postgres WORM trigger blocks UPDATE and DELETE, and a per-source hash chain is verified automatically every night at 3 AM. If the chain is broken anywhere, the system immediately alerts the administrators.
PII mappings: only temporary in Redis (rolling 24-hour TTL), no permanent storage
Right to be forgotten (Article 17 GDPR): on request we atomically delete your PII mappings and document the operation with a multi-step evidence audit (start, one step per source, completion). We retain these deletion records for three years so we can still prove, years later, that your data is gone.
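The multi-step evidence trail for an erasure request (start, one step per source, completion) might look like the following sketch. The function name, the record fields, and the source names in the usage example are hypothetical. The sketch records a SHA-256 hash of the subject identifier rather than the identifier itself, which is an assumption about how such records could avoid retaining personal data. It does not show how atomicity is achieved.

```python
import hashlib
from datetime import datetime, timezone


def erase_subject(subject_id, sources, now=lambda: datetime.now(timezone.utc)):
    """Delete a subject's mappings from every source and return evidence records.

    sources: dict mapping a source name to a callable(subject_id) that
    performs the deletion and returns the number of deleted mappings.
    """
    subject_hash = hashlib.sha256(subject_id.encode("utf-8")).hexdigest()
    evidence = [{"event": "start", "subject_hash": subject_hash, "at": now().isoformat()}]
    for name, delete in sources.items():
        deleted = delete(subject_id)
        evidence.append(
            {"event": "step", "source": name, "deleted": deleted, "at": now().isoformat()}
        )
    evidence.append({"event": "completion", "subject_hash": subject_hash, "at": now().isoformat()})
    return evidence
```

Usage with two stand-in sources:

```python
sources = {"redis_mappings": lambda sid: 3, "auto_blacklist": lambda sid: 0}
records = erase_subject("customer-42", sources)
```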

Last updated: May 16, 2026

Questions?

If you have questions about data protection in our AI services, write to the Data Protection Officer at SCHILLER - Organisation. Digital.:

datenschutz@schiller-partners.de