Realism and Deviations
One of DataGen’s key current design choices is to treat three concerns separately:
- enterprise richness
- realism deviations
- hard invariants that may not be violated
This means you can generate a rich world without injecting the same level of flaws, omissions, and inconsistencies every time.
Deviation profiles
The current top-level scenario control is DeviationProfile.
Available values:
- Clean
- Realistic
- Aggressive
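As a mental model, the three values can be pictured as an enum ordered by realism intensity. The sketch below is illustrative Python, not DataGen's own syntax or API: the `DeviationProfile` class, its numeric ordering, and the `parse_profile` helper are all assumptions made for this example.

```python
from enum import IntEnum


class DeviationProfile(IntEnum):
    """Hypothetical model of the three profile values, ordered by intensity."""
    CLEAN = 0
    REALISTIC = 1
    AGGRESSIVE = 2


def parse_profile(name: str) -> DeviationProfile:
    """Parse a profile name case-insensitively and reject unknown values."""
    try:
        return DeviationProfile[name.strip().upper()]
    except KeyError:
        valid = ", ".join(p.name.title() for p in DeviationProfile)
        raise ValueError(
            f"Unknown deviation profile {name!r}; expected one of: {valid}"
        ) from None
```

Modelling the values as an ordered enum makes comparisons such as "at least Realistic" straightforward, and rejecting unknown names early avoids silently falling back to a default.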
How to think about them
Clean
Use Clean when you want:
- a baseline environment
- deterministic demos with fewer distractions
- a control world for comparison
Realistic
Use Realistic when you want:
- believable enterprise messiness
- moderate drift and omissions
- the best general-purpose default
Aggressive
Use Aggressive when you want:
- intentionally flawed service-management or identity views
- harder security or discovery labs
- more drift-heavy validation datasets
Why this matters
Before this separation, teams often had to choose between:
- a rich but messy environment
- a sparse but easy-to-control one
Now the goal is to keep richness high while giving you a cleaner lever over realism intensity.
Hard invariants vs soft deviations
Deviation profiles control the soft side of realism:
- missing owners
- stale CMDB views
- conflicting policy settings
- incomplete observed data
They do not permit hard correctness failures such as:
- duplicate user principal names
- structurally invalid identity records
- impossible reference relationships
If a generated world crosses one of those hard boundaries, generation should fail instead of returning an invalid environment.
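The difference between a tolerated soft deviation and a hard failure can be sketched as a post-generation check. This is a minimal Python illustration under stated assumptions, not DataGen's implementation: the record shape (`id`, `upn`, `manager_id`, `owner`), the `InvariantViolation` exception, and the case-insensitive comparison of user principal names are all hypothetical.

```python
from collections import Counter


class InvariantViolation(Exception):
    """Raised when a generated world breaks a hard invariant (hypothetical)."""


def check_hard_invariants(users):
    """Validate a list of user dicts; raise instead of returning a bad world.

    Soft deviations, such as a missing owner, are deliberately not checked.
    """
    # Hard invariant: user principal names must be unique.
    # Comparing case-insensitively is an assumption of this sketch.
    upn_counts = Counter(u["upn"].lower() for u in users)
    duplicates = sorted(upn for upn, n in upn_counts.items() if n > 1)
    if duplicates:
        raise InvariantViolation(f"duplicate user principal names: {duplicates}")

    # Hard invariant: every manager reference must point at an existing user.
    known_ids = {u["id"] for u in users}
    for u in users:
        manager_id = u.get("manager_id")
        if manager_id is not None and manager_id not in known_ids:
            raise InvariantViolation(
                f"user {u['id']} references missing manager {manager_id}"
            )
```

With this shape, a record whose `owner` is `None` passes validation because a missing owner is a soft deviation, while two records sharing a UPN cause the check to raise.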
CMDB-specific override
The CMDB profile can still carry its own override when you need the broader world at one realism level and the CMDB layer at another.
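One way to picture the override is as a fallback: the CMDB layer uses its own setting when one is present and otherwise inherits the world-level value. The Python sketch below only illustrates that resolution rule; the `ScenarioSettings` class and its field names are hypothetical and do not reflect DataGen's actual configuration schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScenarioSettings:
    """Hypothetical settings holder; field names are assumptions."""
    deviation_profile: str                          # world-level profile
    cmdb_deviation_profile: Optional[str] = None    # optional CMDB override


def effective_cmdb_profile(settings: ScenarioSettings) -> str:
    """Return the CMDB override if set, else the world-level profile."""
    if settings.cmdb_deviation_profile is not None:
        return settings.cmdb_deviation_profile
    return settings.deviation_profile


# A clean world whose CMDB view is intentionally drift-heavy.
lab = ScenarioSettings(deviation_profile="Clean",
                       cmdb_deviation_profile="Aggressive")
# A world with no override: the CMDB layer inherits the world-level value.
baseline = ScenarioSettings(deviation_profile="Realistic")
```

Here `effective_cmdb_profile(lab)` resolves to `"Aggressive"` while the rest of the world stays at `"Clean"`, and `effective_cmdb_profile(baseline)` resolves to `"Realistic"`.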