Mock vs Synthetic vs Real Data: What Your Simulation Is Really Based On

Most people hear “data-driven simulation” and assume it’s all grounded in reality. That every moving dot, every conversion rate, every outcome comes straight from real data.

That’s usually not the case.

In simulations, data is often not the starting point—it’s the result.

The simulations don’t begin with a dataset. First, the system is built: the environment, the agents, and the rules. And once that system runs, it produces outputs.

That output is interpreted as mock data or synthetic data.
It is not real data; it is just realistic data.

Mock data vs Synthetic data vs Real data

When we build simulations, we’re usually working with three kinds of data: mock data, synthetic data, and real data. They sound similar, but they behave very differently.

Mock data meaning

Mock data means test data is the simplest form. It’s not trying to reflect reality—it just needs to look plausible. Often, some basic flows, random behaviors, and drop-offs or delays are defined, so a funnel or process could be easily generated from the mock data.

It’s useful for fast communication. You can explain an idea without needing access to real systems.

But it’s purely illustrative. Mock data shows how something could behave, not how it actually does.

However, if we talk about synthetic data and real data, the analysis becomes more meaningful.

Synthetic data vs real data

The main difference between synthetic data vs real data is that synthetic data is generated data that behaves like real data, whilst real data is the actual data. 

First, we suggest reading our blog post on the meaning of synthetic data to understand the basics.

Synthetic data isn’t just random. The system is built to behave like reality, not just look like it, as we explained in the case of mock data. For quality synthetic data, the rules that reflect real-world dynamics need to be defined—conversion rates, delays, constraints, and interactions.

The data is still generated, but it’s generated from a system that mimics how things actually work.

So, synthetic data is artificial data grounded in modeled logic.

This is the core of the most useful simulations. Because now the questions get meaningful:

  • What happens if conversion improves by 10%?
  • Where do bottlenecks appear if demand doubles
  • How does a small delay early in the process cascade downstream?

The past data is not just analyzed—the possible realities are generated.

Below, you can see an example of our simulation where synthetic data is generated as an output.


 

Real data in simulations

Real data is the actual data; it does not come out of the simulation as an outcome.

The real data comes from actual systems—analytics tools, logs, user behavior, operations data. It reflects what has already happened.

The main value in using real data for simulations is to use the data for building the model itself.

The real data could be used to:

  • Calibrate parameters (e.g., real conversion rates, timings)
  • Validate whether the simulation behaves realistically
  • Reverse-engineer system dynamics

So instead of feeding real data into a simulation as “input data,” it could be used to design a model that behaves in a data-informed way.

Data usage in simulations

In practice, good simulations combine all three data types:

  • Mock logic → to move fast and explain concepts
  • Synthetic logic → to explore behavior and scenarios
  • Real data → to ground and validate the model

However, there has to be an understanding that the simulations are not real. 

Simulations are always generated—either from general assumptions (mock data) or structured logic (synthetic data).

That’s the core idea of the simulations – to be generated and adjustable. 

The goal isn’t to replay reality.
The goal of simulations is to construct a system that helps to understand reality.

The data used in simulations is created by the model, while real data is used to make that model credible.