European companies building with LLMs face a recurring architecture question: can customer prompts and retrieved documents stay in the EU for the entire inference path? For many regulated buyers — banks, insurers, law firms, healthtech, public-sector suppliers — the answer must be yes, with evidence. That rules out default US-hosted OpenAI and Anthropic endpoints for primary inference, even when the providers offer DPAs.
This post walks through practical deployment patterns on European infrastructure: OVHcloud, Scaleway, Hetzner, and Mistral — when each fits, how to combine them with retrieval, and what GDPR and EU AI Act implications follow. It complements our GDPR + LLMs guide and LLM selection framework.
Three deployment tiers
Tier 1 — EU-hosted managed API
Lowest engineering overhead. You call an API; the provider runs inference on EU hardware.
- Mistral La Plateforme — Mistral Large, Small, Codestral, Pixtral; French provider, EU data processing commitments
- Azure OpenAI Service — EU regions (e.g. France Central, West Europe) with data residency options
- AWS Bedrock —
eu-west-3(Paris),eu-central-1(Frankfurt) for Claude and other models - OVHcloud AI Endpoints — managed inference endpoints on OVH infrastructure
- Scaleway Generative APIs — hosted models on Scaleway Paris / Amsterdam regions
Best for: teams validating product-market fit, moderate volume, no requirement to eliminate third-party API dependency entirely.
Tier 2 — Self-hosted open-weight models on EU cloud GPUs
You run inference yourself on rented GPUs. Data stays in your tenancy; you manage scaling, updates, and monitoring.
- OVHcloud GPU instances — H100, L40S in Gravelines, Roubaix, Strasbourg
- Scaleway GPU — H100 clusters in Paris
- Hetzner — cost-effective EU compute; growing GPU options
- Stack: vLLM or TGI for serving; Llama 3.x, Mistral open-weight, Mixtral, Qwen depending on task
Best for: sensitive corpora, predictable high volume, buyers who require "no third-party LLM API" in procurement questionnaires.
Tier 3 — On-prem or dedicated sovereign environment
Full control: air-gapped or private cloud, often for defence, critical infrastructure, or national-health deployments. Highest cost and operational burden. Same inference stacks as Tier 2, with stricter network boundaries and key management.
Reference architecture: RAG on EU infrastructure
A production pattern that holds up under GDPR and enterprise procurement review:
- Ingestion — documents land in EU object storage (OVHcloud Object Storage, Scaleway Object Storage, or S3-compatible in EU region)
- Indexing — embeddings via EU-hosted model (Mistral embed, open-source embedder self-hosted, or Bedrock EU); vectors in pgvector, Qdrant, or Weaviate in same region
- Retrieval — application server in EU VPC retrieves chunks; access control applied before context reaches the prompt builder
- Inference — Mistral API EU, or self-hosted vLLM on OVHcloud / Scaleway GPU; prompt and completion never leave EU network path
- Audit log — per-query log: user, retrieved chunk IDs, model version, output hash, timestamp — motivates Article 12 EU AI Act readiness; see CARAG architecture
- Observability — token cost, latency p95, refusal rate, retrieval hit rate
The CARAG enterprise case study implements a compliance-aware variant of this pattern over a 1.2M-document corpus — eligibility checks inside the retriever, not bolted on after generation.
Mistral vs self-hosted Llama — when to use which
- Mistral API (Tier 1): fastest path to quality; frontier-class for European languages; vendor dependency acceptable
- Mistral open-weight on your GPUs (Tier 2): same model family, no per-token API bill; you operate vLLM
- Llama 3.x (Tier 2): strong open ecosystem; good for cost-sensitive batch; you own safety tuning
- Hybrid: Mistral Large for complex queries; smaller self-hosted model for classification and routing — common cost optimisation
Model selection detail: OpenAI vs Claude vs Mistral vs Llama.
OVHcloud vs Scaleway — practical differences
Both are credible for EU sovereign AI. Choice often comes down to existing relationships, certification needs, and GPU availability:
- OVHcloud: large EU footprint, HDS (health data hosting) options, strong with French public-sector and enterprise buyers
- Scaleway: Paris-rooted, competitive GPU pricing, Generative APIs for quick starts, popular with startups
- Hetzner: lower cost for non-HDS workloads; good for dev/staging and cost-sensitive inference
- Multi-cloud: some enterprises require two EU providers for resilience — design for portable Kubernetes and object storage
Migration path from US APIs
Teams already on OpenAI often migrate in phases rather than big-bang:
- Inventory — list every call site, data sensitivity, and quality requirement per use case
- Eval harness — same test set run against OpenAI baseline and Mistral / self-hosted candidate
- Router layer — abstract model provider behind an internal API; swap without rewriting product code
- Shadow mode — run EU model in parallel, compare outputs, do not serve to users yet
- Feature flag cutover — per-tenant or per-use-case switch; keep US fallback only where TIA allows
- Decommission — remove US endpoints from primary path; document remaining exceptions
GDPR transfer mechanics: GDPR + LLMs guide, international transfers section.
Cost reality check
Self-hosting is not automatically cheaper. Below ~10M tokens/day, EU managed APIs usually win on total cost once engineering time is included. Above ~50-100M tokens/day, self-hosted Llama or Mixtral on reserved GPUs often beats API pricing. Run the math for your volume before choosing Tier 2 for cost alone.
Operational checklist
- DPAs signed with every processor (Mistral, cloud provider, vector DB vendor)
- Region pinned in infrastructure-as-code — no accidental US failover
- Secrets in EU-managed vault; not in repo or frontend
- Model version pinned; upgrade process documented
- GPU monitoring and autoscaling tested under load
- Erasure workflow covers vector index and conversation history
- Pen test or security review before enterprise procurement
Bottom line
Deploying LLMs on EU infrastructure is an architecture choice, not a checkbox. Managed EU APIs (Mistral, Azure EU, Bedrock EU) are the fastest path; self-hosted open-weight models on OVHcloud or Scaleway GPUs are the answer when procurement or regulation rules out third-party inference. Either way, design retrieval, logging, and access control together — not as an afterthought. Insightrix Sovereign AI builds and migrates EU-resident LLM systems; the CARAG paper shows what compliance-aware retrieval looks like at scale. Submit a project brief for a scoped assessment of your stack.