DE
Back to overview

ERP & Automation

AI Document Processing On-Premise: Sovereign, GDPR-Compliant, No Cloud

By Robin Maier June 12, 2026 9 min read
AI Document Processing On-Premise: Sovereign, GDPR-Compliant, No Cloud

The requirement sounds like a contradiction: business documents — purchase orders, invoices, delivery notes — are supposed to be captured automatically with modern AI. At the same time, those very documents must not leave the company network, because they contain the concentrated essence of your business relationships: purchasing terms, customer prices, volume structures, margins.

Until a few years ago, this really was a contradiction. Capable AI existed only as a cloud API, and “GDPR-compliant” meant, at best, a US vendor’s EU data center plus a stack of data processing agreements. That era is over. Open-weights models on your own hardware now process documents entirely locally — at a quality that matches cloud services for structured extraction tasks, and at costs below a mid-range SaaS subscription. This article explains what on-premise AI document processing means in concrete terms today: legally, technically, and economically.

What “on-premise” actually means here

The term is used loosely in the market, so a precise definition is in order. What we mean here: models, processing, and data run on hardware the company controls — in its own server room or in dedicated, self-managed colocation. No document, no extracted field, and no token stream leaves your own network.

This must be distinguished from three variants that are often marketed as “privacy-friendly” but carry structurally different risk profiles: SaaS from an EU data center (a third party still processes the data), “private cloud” hosted by the vendor (same thing, with better isolation), and EU API inference (fine for many use cases — but the documents still flow outside).

The legal consequence of genuinely local processing is remarkably simple: where no third party processes data, there is no need for a data processing agreement, no third-country assessment, no CLOUD Act deliberation, and no debate about zero-data-retention promises. For German companies, that means GDPR compliance by design; for Swiss companies, the same holds under the revised Swiss Data Protection Act (revDSG). And beyond data protection law, there’s the often more important point: document data is trade secrets. The most robust confidentiality guarantee is the physical one — data that never leaves the building cannot leak externally, cannot be used for training, and cannot be compelled by a foreign government order.

Open-weights models: the foundation

What makes this possible is a development that many IT departments still underestimate in their day-to-day: open-weights models — language and vision models whose weights are freely available and may be run locally, many of them under the Apache 2.0 license with no commercial restrictions.

For document extraction, one insight is central: the task doesn’t need a frontier model. Transferring document data into a schema is focused transcription, not creative reasoning. Compact models between roughly 4 and 30 billion parameters handle it quickly and precisely on your own hardware — as of June 2026, for example Google’s Gemma family as compact generalists, or models specialized in structured extraction like NuExtract, fine-tuned for exactly this class of task and beating larger generalists at it. For edge cases (scanned documents, damaged text layers), a local vision model rounds out the pipeline.

Three properties make open weights strategically interesting, beyond data protection:

  • No variable costs. Local inference costs electricity, not API fees. At a continuous document volume, that’s the difference between a fixed cost line and a growing one.
  • No vendor lock-in. The model is an interchangeable component in a pipeline, not a contract. When a better model appears — which happens every six months in this market — it’s evaluated locally and swapped in if it wins, with no migration project.
  • Reproducibility. A frozen model version behaves exactly the same tomorrow as it does today. Cloud APIs change without notice — a serious problem for auditable business processes; locally, the problem simply doesn’t exist.

The hardware question: smaller than you think

The most persistent prejudice against local AI goes: too expensive, too complex, needs a GPU cluster. For document pipelines in the mid-market, the opposite is true — here are the orders of magnitude:

StageHardwareBallparkHandles
Pilot / developmentexisting workstation, Apple Silicon included€0 extrapipeline logic, small models (4B quantized)
Production, standardserver with one 24 GB GPU~€2,000–4,000mapping model + validation, hundreds of documents/day
Production, comfortablecompact AI appliance (e.g. DGX Spark class, 128 GB unified memory)~€4,000mapping and vision model resident simultaneously, headroom for further AI workloads

To put the compute load in perspective: a typical purchase order amounts to a few thousand tokens. Even on conservative math, a single modern GPU processes a mid-sized company’s daily inbound volume in minutes — after which the hardware sits idle, or takes on additional work (internal assistants, search, classification). The investment equals a few monthly installments of a mid-sized Intelligent Document Processing SaaS; after that, it works with no per-document costs.

Operations have become standard IT: the models run in inference servers like vLLM or Ollama, packaged as Docker containers, with GPU passthrough via the NVIDIA Container Toolkit and monitoring like any other service. No dedicated ML team is required — but you do need someone who builds the pipeline properly the first time.

The architecture: the model is the smallest part

If you understand “AI document processing” as “send the PDF to a model,” you build a system whose errors nobody notices. Local extraction becomes production-grade through a staged architecture in which the language model has exactly one narrowly scoped job — and is bracketed by deterministic code:

  1. Parsing (CPU, no AI): A layout parser such as the open-source project Docling converts PDFs, table structure included, into Markdown. The raw text is stored — as the provenance anchor for everything that follows.
  2. Extraction (local model, constrained): The model transfers values from the raw text into a strict JSON schema. Constrained decoding enforces the schema syntax at the token level; the model cannot produce invalid JSON. It transcribes, it doesn’t calculate.
  3. Validation (deterministic Python): Arithmetic invariants (quantity × price = line total, line totals = document total), master data matching, and a grounding check: every extracted value must be traceable to the stored raw text — hallucinations are caught systematically, not ignored hopefully.
  4. Escalation and review: Whatever fails the checks escalates in stages — first to a vision model, then to a review interface for humans. The goal is not 100% raw accuracy (which nobody can credibly promise on free-form layouts), but zero undetected errors among the documents transferred automatically.

Every component in this chain is open source or your own code. There is no proprietary core, no black box, and no contract the architecture depends on — the system belongs to the company that runs it. (The complete process, including rollout phases, is described in the guide: Getting PDF documents into your ERP automatically)

Limits and an honest assessment

On-premise is not an end in itself, and it isn’t the right answer for every scenario. Three caveats belong in any serious evaluation:

  • It’s a project, not a subscription. A local pipeline gets built and operated — initially with external expertise, afterwards with manageable but real maintenance effort (updates, model refreshes, monitoring). If you need some solution within two weeks and can accept data leaving the building, SaaS is faster.
  • Model upkeep is part of operations. The open-weights market moves fast; roughly every six months it’s worth evaluating new models against your own test set. Thanks to interchangeable components, that’s an afternoon of benchmarking, not a project — but it should be planned for.
  • Generative tasks beyond extraction (long-form writing, complex reasoning) still benefit from larger models. For such workloads, a hybrid setup can make sense — structured document data processed locally, non-critical tasks via EU inference. What matters is that the sensitive documents never leave the defined perimeter.

Frequently asked questions

Is local AI extraction worse than cloud AI? For structured extraction: no. The task is transcription into a schema — compact, sometimes specialized open-weights models are sufficient for that, and combined with deterministic validation they are production-grade. Frontier models play out their advantages on open-ended, creative tasks, not on transferring order line items.

What hardware does it take to get started? For the pilot and logic validation: existing hardware, even a laptop. For production: a single 24 GB GPU or a compact AI appliance with large unified memory (ballpark €2,000–4,000). GPU clusters are not required for document pipelines in the mid-market.

How does the system stay current when models evolve so quickly? Through architecture: the model is an interchangeable component behind a stable interface. New models are evaluated against your own gold set (real documents with known target results) and only swapped in when they deliver a measurable improvement. The pipeline itself — parsing, validation, review — is unaffected.

Do we need our own AI team? No. Operations match those of any standard containerized service (Docker, monitoring, backups). The pipeline should be built by someone experienced in both LLM systems and ERP integration; your existing IT can run it.

What do we concretely gain under GDPR? Processing without third parties: no data processing agreement for the extraction, no third-country transfers, no reliance on a vendor’s assurances. The legal-basis assessment for the processing itself remains — but the entire class of transfer and vendor risks is eliminated structurally.

Conclusion

AI document processing and data sovereignty are no longer mutually exclusive — they have become combinable, affordable, and operable with standard IT. Open-weights models on your own hardware extract document data at production quality, deterministic validation makes the results trustworthy, and the cost curve is flat once the system is built. For companies whose documents are trade secrets — which is to say, most of them — on-premise is not the cautious architecture decision. It’s the rational one.


kitun builds AI document pipelines fully on-premise — open-weights models, deterministic validation, direct ERP integration, code handover included. Digital sovereignty isn’t an upcharge; it’s the default. Get a first assessment in a 20-minute call.

The solution at a glance: the kitun document pipeline

Conversation

Let's talk.

If you are thinking about custom business software — an ERP replacement, a new customer portal, process digitalisation — drop us a line. No sales rep, no funnel. Reply within 24 hours.