Back to all posts

Deploying LLMs on EU infrastructure: OVHcloud, Scaleway, and Mistral

A practical architecture guide for running LLM inference in the EU — when to use sovereign cloud APIs, when to self-host, and how to wire Mistral, Llama, and retrieval without sending prompts to US hyperscalers.

Published 2026-06-06·Updated 2026-06-06·9 min read
Sovereign AIMistralOVHcloudScalewayLLMsGDPR

European companies building with LLMs face a recurring architecture question: can customer prompts and retrieved documents stay in the EU for the entire inference path? For many regulated buyers — banks, insurers, law firms, healthtech, public-sector suppliers — the answer must be yes, with evidence. That rules out default US-hosted OpenAI and Anthropic endpoints for primary inference, even when the providers offer DPAs.

This post walks through practical deployment patterns on European infrastructure: OVHcloud, Scaleway, Hetzner, and Mistral — when each fits, how to combine them with retrieval, and what GDPR and EU AI Act implications follow. It complements our GDPR + LLMs guide and LLM selection framework.

Three deployment tiers

Tier 1 — EU-hosted managed API

Lowest engineering overhead. You call an API; the provider runs inference on EU hardware.

  • Mistral La Plateforme — Mistral Large, Small, Codestral, Pixtral; French provider, EU data processing commitments
  • Azure OpenAI Service — EU regions (e.g. France Central, West Europe) with data residency options
  • AWS Bedrockeu-west-3 (Paris), eu-central-1 (Frankfurt) for Claude and other models
  • OVHcloud AI Endpoints — managed inference endpoints on OVH infrastructure
  • Scaleway Generative APIs — hosted models on Scaleway Paris / Amsterdam regions

Best for: teams validating product-market fit, moderate volume, no requirement to eliminate third-party API dependency entirely.

Tier 2 — Self-hosted open-weight models on EU cloud GPUs

You run inference yourself on rented GPUs. Data stays in your tenancy; you manage scaling, updates, and monitoring.

  • OVHcloud GPU instances — H100, L40S in Gravelines, Roubaix, Strasbourg
  • Scaleway GPU — H100 clusters in Paris
  • Hetzner — cost-effective EU compute; growing GPU options
  • Stack: vLLM or TGI for serving; Llama 3.x, Mistral open-weight, Mixtral, Qwen depending on task

Best for: sensitive corpora, predictable high volume, buyers who require "no third-party LLM API" in procurement questionnaires.

Tier 3 — On-prem or dedicated sovereign environment

Full control: air-gapped or private cloud, often for defence, critical infrastructure, or national-health deployments. Highest cost and operational burden. Same inference stacks as Tier 2, with stricter network boundaries and key management.

Reference architecture: RAG on EU infrastructure

A production pattern that holds up under GDPR and enterprise procurement review:

  1. Ingestion — documents land in EU object storage (OVHcloud Object Storage, Scaleway Object Storage, or S3-compatible in EU region)
  2. Indexing — embeddings via EU-hosted model (Mistral embed, open-source embedder self-hosted, or Bedrock EU); vectors in pgvector, Qdrant, or Weaviate in same region
  3. Retrieval — application server in EU VPC retrieves chunks; access control applied before context reaches the prompt builder
  4. Inference — Mistral API EU, or self-hosted vLLM on OVHcloud / Scaleway GPU; prompt and completion never leave EU network path
  5. Audit log — per-query log: user, retrieved chunk IDs, model version, output hash, timestamp — motivates Article 12 EU AI Act readiness; see CARAG architecture
  6. Observability — token cost, latency p95, refusal rate, retrieval hit rate

The CARAG enterprise case study implements a compliance-aware variant of this pattern over a 1.2M-document corpus — eligibility checks inside the retriever, not bolted on after generation.

Mistral vs self-hosted Llama — when to use which

  • Mistral API (Tier 1): fastest path to quality; frontier-class for European languages; vendor dependency acceptable
  • Mistral open-weight on your GPUs (Tier 2): same model family, no per-token API bill; you operate vLLM
  • Llama 3.x (Tier 2): strong open ecosystem; good for cost-sensitive batch; you own safety tuning
  • Hybrid: Mistral Large for complex queries; smaller self-hosted model for classification and routing — common cost optimisation

Model selection detail: OpenAI vs Claude vs Mistral vs Llama.

OVHcloud vs Scaleway — practical differences

Both are credible for EU sovereign AI. Choice often comes down to existing relationships, certification needs, and GPU availability:

  • OVHcloud: large EU footprint, HDS (health data hosting) options, strong with French public-sector and enterprise buyers
  • Scaleway: Paris-rooted, competitive GPU pricing, Generative APIs for quick starts, popular with startups
  • Hetzner: lower cost for non-HDS workloads; good for dev/staging and cost-sensitive inference
  • Multi-cloud: some enterprises require two EU providers for resilience — design for portable Kubernetes and object storage

Migration path from US APIs

Teams already on OpenAI often migrate in phases rather than big-bang:

  1. Inventory — list every call site, data sensitivity, and quality requirement per use case
  2. Eval harness — same test set run against OpenAI baseline and Mistral / self-hosted candidate
  3. Router layer — abstract model provider behind an internal API; swap without rewriting product code
  4. Shadow mode — run EU model in parallel, compare outputs, do not serve to users yet
  5. Feature flag cutover — per-tenant or per-use-case switch; keep US fallback only where TIA allows
  6. Decommission — remove US endpoints from primary path; document remaining exceptions

GDPR transfer mechanics: GDPR + LLMs guide, international transfers section.

Cost reality check

Self-hosting is not automatically cheaper. Below ~10M tokens/day, EU managed APIs usually win on total cost once engineering time is included. Above ~50-100M tokens/day, self-hosted Llama or Mixtral on reserved GPUs often beats API pricing. Run the math for your volume before choosing Tier 2 for cost alone.

Operational checklist

  • DPAs signed with every processor (Mistral, cloud provider, vector DB vendor)
  • Region pinned in infrastructure-as-code — no accidental US failover
  • Secrets in EU-managed vault; not in repo or frontend
  • Model version pinned; upgrade process documented
  • GPU monitoring and autoscaling tested under load
  • Erasure workflow covers vector index and conversation history
  • Pen test or security review before enterprise procurement

Bottom line

Deploying LLMs on EU infrastructure is an architecture choice, not a checkbox. Managed EU APIs (Mistral, Azure EU, Bedrock EU) are the fastest path; self-hosted open-weight models on OVHcloud or Scaleway GPUs are the answer when procurement or regulation rules out third-party inference. Either way, design retrieval, logging, and access control together — not as an afterthought. Insightrix Sovereign AI builds and migrates EU-resident LLM systems; the CARAG paper shows what compliance-aware retrieval looks like at scale. Submit a project brief for a scoped assessment of your stack.

Editorial content. Informational only — not legal, financial, or professional advice.

Get the playbook

Short, practical AI essays for founders, CTOs, and Heads of AI. One email a month. Unsubscribe anytime.

Want a similar conversation about your stack?

Most engagements start with a 60-minute scoping call.

More reading

Aru Bhardwaj

Fractional CTO architecting sovereign AI systems for startups and scale-ups across Europe. Custom ML, agentic RAG, and secure LLM infrastructure. 7+ years turning complex data into production intelligence.

Malt
Upwork

Contact

Services

  • Fractional CTO & AI Strategy
  • MVP Development & Rapid Prototyping
  • Sovereign LLM Deployment (OVHcloud, Scaleway)
  • Multi-Cloud AI (AWS Bedrock, Vertex AI, Azure)
  • RAG Pipelines & Autonomous Agents
  • GDPR & EU AI Act Compliance
  • Generative AI & Prompt Engineering
  • Machine Learning & Predictive Analytics

Monthly playbook

Practical AI essays for founders and tech leaders. One email a month.

Tactical AI essays, monthly.

© 2026 Insightrix SASU. All rights reserved.Aru Bhardwaj, Fractional CTO & AI Strategist

60 Rue François Ier, 75008 Paris, France · SIRET 989 236 856 00013 · TVA FR42989236856