How we train AI on real orders without storing the data

Improving order AI requires real customer orders. But storing that data creates privacy risks. Here is the approach we chose and why it works.

Every order contains sensitive information. Storing it creates GDPR risks. So we built an architecture that learns from structure and context, not from stored documents.

The three roads to AI-powered order processing

Building AI for order automation requires training data. The real question is where that data comes from and what happens to it.

In practice, we see three common approaches. Most solutions choose one of the first two. We deliberately chose the third.

Approach 1: OCR combined with machine learning

The first approach starts with OCR. A document is scanned, text is extracted and a model learns patterns over time.

It sounds logical. In reality it breaks quickly.

OCR struggles with handwriting, unusual layouts and low-quality scans. The machine learning layer behind it requires large volumes of labelled data before it becomes reliable.

For B2B distributors dealing with hundreds of different order formats, from PDFs to spreadsheets to photos, this quickly becomes a maintenance challenge. Every new edge case requires retraining.

Approach 2: fine-tuning a large language model

Another approach is to train or fine-tune a large language model on internal order data.

The idea is that the model will eventually learn the company-specific context.

But this comes with trade-offs. You need to store large volumes of raw order data, allocate significant compute resources and iterate for months before the model stabilises.

You are also tied to a specific model architecture. When a provider releases a new version, the fine-tuning process may have to start again.

What we actually do: context modelling

We chose a third path called context modelling.

Instead of feeding raw orders into a training pipeline, we structure the context around the order.

For each customer we define what a valid order looks like. That includes product catalogues, pricing agreements, delivery rules, packaging conventions and the typical exceptions handled by the team.

This structured context is what the AI works with. Not the raw document itself.

When an order arrives, whether it is a PDF, Excel file, email body or even a voice note, the process works like this:

1. The AI reads the document and extracts the relevant fields.
2. The system validates those fields against the customer-specific rules.
3. A human reviews the exceptions.
4. Feedback from that review improves the system over time.
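
To make the validation and review steps concrete, here is a minimal Python sketch. It assumes the extraction step has already produced order lines, and every name in it (CustomerContext, OrderLine, validate, process, the product codes) is hypothetical and chosen for illustration. It is not the production implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CustomerContext:
    """Hypothetical per-customer context: a catalogue plus simple rules."""
    catalogue: set[str]                                   # known product codes
    max_quantity: dict[str, int] = field(default_factory=dict)


@dataclass
class OrderLine:
    """One line as extracted from the incoming document (assumed shape)."""
    product_code: str
    quantity: int


def validate(lines: list[OrderLine], ctx: CustomerContext) -> list[str]:
    """Return human-readable exceptions; an empty list means the order is clean."""
    exceptions = []
    for line in lines:
        if line.product_code not in ctx.catalogue:
            exceptions.append(f"unknown product: {line.product_code}")
        limit = ctx.max_quantity.get(line.product_code)
        if limit is not None and line.quantity > limit:
            exceptions.append(
                f"quantity {line.quantity} exceeds limit {limit} "
                f"for {line.product_code}"
            )
    return exceptions


def process(lines: list[OrderLine], ctx: CustomerContext) -> tuple[str, list[str]]:
    """Accept a clean order, or route it to human review with its exceptions."""
    exceptions = validate(lines, ctx)
    status = "needs_review" if exceptions else "accepted"
    return status, exceptions
```

In this sketch, only orders with at least one exception reach a human. Everything else passes straight through.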

What we do not retain are raw orders, identifiable customer data, pricing details or delivery locations.

What we keep is the structure, the validation rules and the edge cases that help improve accuracy.

In short, we learn from patterns, not from stored documents.
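
One way to picture the difference between what is kept and what is dropped is the sketch below. It assumes a reviewer's correction can be reduced to a structural record (which field, which rule, what kind of change) before anything is persisted. The class and function names are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewCorrection:
    """What the reviewer sees in memory. Contains sensitive values."""
    field_name: str        # e.g. "unit"
    rule_id: str           # the validation rule that flagged the field
    original_value: str    # sensitive: taken from the raw order
    corrected_value: str   # sensitive: entered by the reviewer


@dataclass(frozen=True)
class StructuralFeedback:
    """What would be retained. Deliberately has no value fields."""
    field_name: str
    rule_id: str
    correction_kind: str


def to_structural_feedback(c: ReviewCorrection) -> StructuralFeedback:
    """Keep the shape of the correction and discard the values themselves."""
    if not c.original_value.strip():
        kind = "filled_missing"
    elif c.original_value == c.corrected_value:
        kind = "confirmed"
    else:
        kind = "changed"
    return StructuralFeedback(c.field_name, c.rule_id, kind)
```

The retained record says that a rule needed a correction on a given field, which is enough to refine the rule, without carrying the order content along.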

Why this matters in practice

This approach creates several advantages that reinforce each other over time.

Fast onboarding

Because we do not train a new model for every customer, implementation takes days rather than months.

The system learns from structure rather than volume. Once the customer context layer is configured with rules, mappings and exceptions, the AI can start processing orders immediately.

Model independence

Context modelling means we are not tied to a single model provider.

If a better model becomes available tomorrow, we can switch without rebuilding everything. The intelligence lives in the structured context layer.
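
As an illustration of that separation, the sketch below puts extraction behind a small interface and keeps the required fields outside it. The two extractor classes are toy stand-ins for different model providers, and all names are assumptions made for this example.

```python
from __future__ import annotations

from typing import Protocol


class Extractor(Protocol):
    """Anything that turns document text into named fields."""
    def extract(self, document_text: str) -> dict[str, str]: ...


class ColonExtractor:
    """Toy stand-in for one provider: reads 'key: value' lines."""
    def extract(self, document_text: str) -> dict[str, str]:
        fields = {}
        for line in document_text.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip().lower()] = value.strip()
        return fields


class EqualsExtractor:
    """Toy stand-in for another provider: reads 'key=value; key=value'."""
    def extract(self, document_text: str) -> dict[str, str]:
        fields = {}
        for part in document_text.split(";"):
            key, sep, value = part.partition("=")
            if sep:
                fields[key.strip().lower()] = value.strip()
        return fields


# The context layer lives outside the extractor, so it survives a swap.
REQUIRED_FIELDS = {"product", "quantity", "delivery_day"}


def missing_fields(extractor: Extractor, document_text: str,
                   required: set[str] = REQUIRED_FIELDS) -> list[str]:
    """Check extracted fields against the same rules, whichever extractor ran."""
    extracted = extractor.extract(document_text)
    return sorted(required - set(extracted))
```

Swapping ColonExtractor for EqualsExtractor changes nothing in missing_fields or REQUIRED_FIELDS, which is the property the section describes.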

Privacy as a design principle

We do not need to store sensitive order data to improve accuracy.

For some customers, this approach has pushed accuracy above 95 percent over time. Not by compromising privacy, but by designing the system around it.

This philosophy also powers our Codex feature.

Codex is a structured knowledge layer where customer-specific interpretation rules become explicit and machine-readable. The business logic that normally lives in someone's head becomes part of the system.

That is how a line like "4x mayonnaise" can automatically translate to pallets for one customer and buckets for another.
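
A rule like this could be written down roughly as follows. This is a hedged sketch, not the actual Codex format: the rule table, the customer identifiers and the interpret function are all hypothetical.

```python
import re

# Hypothetical interpretation rules: per customer, product name -> default unit.
INTERPRETATION_RULES = {
    "customer_a": {"mayonnaise": "pallet"},
    "customer_b": {"mayonnaise": "bucket"},
}

# Matches lines such as "4x mayonnaise" or "12 x ketchup".
LINE_PATTERN = re.compile(r"^\s*(\d+)\s*x\s+(.+?)\s*$", re.IGNORECASE)


def interpret(line: str, customer_id: str):
    """Resolve an ambiguous order line using the customer's explicit rules.

    Returns None when the line cannot be parsed or no rule applies, so the
    caller can send it to human review instead of guessing a unit.
    """
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    quantity = int(match.group(1))
    product = match.group(2).lower()
    unit = INTERPRETATION_RULES.get(customer_id, {}).get(product)
    if unit is None:
        return None
    return {"product": product, "quantity": quantity, "unit": unit}
```

The same input line produces a different unit per customer because the rule, not the model, carries the customer-specific meaning.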

The real difference

Many vendors in the order automation space claim AI-powered processing.

In practice this often means OCR with a cleaner interface, or models that need months of data collection before they deliver value.

We believe the better question is not how smart the AI is.

The real question is how well the system understands your business.

That understanding does not require storing customer orders. It requires structuring the rules, the context and the exceptions, and validating every order before it reaches your ERP.

That is what context modelling enables.

If you want to see how this works with your order types, book a demo with our team. We will gladly walk you through a live example.