What's New
This page tracks the most recent product-level changes that matter to users of DataGen.
v0.8.1
This release focuses on repository realism cleanup for collaboration-heavy enterprises, especially around what should and should not count as a true file share.
Highlights
- Realistic hidden-root share modeling
User homes and profile paths are no longer exported as one top-level share per employee. DataGen now models a small number of real hidden roots such as users$ and profiles$, which is much closer to how modern environments are actually run.
- Limited owner-specific exceptions
The generator still emits a small number of direct-owner restricted shares for realistic edge cases like executive, legal, HR, and finance working files, but those now appear as exceptions instead of dominating the repository surface.
- Cleaner downstream repository counts
Flagship datasets such as Duckburg now present a far more believable file-share footprint for SharePoint- and Teams-heavy organizations, which improves downstream CMDB, coverage, and repository analytics.
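To make the hidden-root idea concrete, here is a minimal sketch of the layout it describes: two hidden root shares, with per-user folders living underneath them rather than being shares of their own. Every name here (FileShare, unc, build_home_layout, the server name) is hypothetical and chosen for illustration; none of it reflects DataGen's actual export model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FileShare:
    """Hypothetical share record used only for this illustration."""
    unc_path: str
    purpose: str


def unc(server: str, *parts: str) -> str:
    """Join a server name and path parts into a UNC path."""
    return "\\\\" + "\\".join((server, *parts))


def build_home_layout(
    server: str, usernames: list[str]
) -> tuple[list[FileShare], dict[str, str]]:
    """Return the hidden root shares and the per-user folders beneath them."""
    # Only the two hidden roots count as shares, no matter how many users exist.
    roots = [
        FileShare(unc(server, "users$"), "user home directories"),
        FileShare(unc(server, "profiles$"), "roaming profiles"),
    ]
    # Each user gets a folder under the users$ root, not a share of their own.
    home_folders = {name: unc(server, "users$", name) for name in usernames}
    return roots, home_folders
```

With this shape the share count stays at two whether the scenario has three employees or thirty thousand, which is the effect on repository counts described above.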
v0.8.0
This release focuses on realism expansion beyond the flagship Duckburg bundle, with particular emphasis on topology completeness, CMDB quality, and more believable access/resource surfaces.
Highlights
- AD sites and subnet realism
Generated environments now include Active Directory sites, site links, site memberships, internal network subnets, and realistic IP allocation for workstations, servers, telephony, and network assets.
- Stronger CMDB operating shape
Configuration items now carry a more believable criticality spread across platform, application, infrastructure, collaboration, software, and data surfaces, reducing the flat analytical profile that broke downstream realism.
- Cleaner resource and access realism
Shared resources and major enterprise applications now tell a clearer group-centric access story, with fewer synthetic naming artifacts in repositories, collaboration workspaces, and supporting ACL surfaces.
- Duckburg refresh
The DTED demo package was regenerated again from the improved source contract so topology, CMDB, repository, and plugin-record realism all start from the newer baseline.
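As a rough illustration of per-site subnets and host allocation, the sketch below carves a private supernet into one subnet per site and hands out addresses in order, using Python's standard ipaddress module. The function names, the 10.20.0.0/16 range, and the /24 site size are assumptions made for the example, not DataGen's actual allocation logic.

```python
from __future__ import annotations

import ipaddress


def allocate_site_subnets(
    supernet: str, site_names: list[str], new_prefix: int = 24
) -> dict[str, ipaddress.IPv4Network]:
    """Assign one subnet of the supernet to each site, in order."""
    network = ipaddress.ip_network(supernet)
    subnets = network.subnets(new_prefix=new_prefix)
    # zip stops at the shorter input, so extra sites would get no subnet.
    return dict(zip(site_names, subnets))


def allocate_hosts(
    subnet: ipaddress.IPv4Network, hostnames: list[str]
) -> dict[str, str]:
    """Give each host the next usable address in the subnet."""
    # hosts() skips the network and broadcast addresses.
    return {name: str(ip) for name, ip in zip(hostnames, subnet.hosts())}
```

A real generator would likely reserve ranges per asset class (servers, telephony, network gear) instead of allocating strictly in sequence; that refinement is left out here to keep the sketch short.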
v0.7.0
This release focuses on realism hardening in the flagship Duckburg dataset, especially around identity, access, endpoint coverage, and release-quality naming.
Highlights
- Group-centric access realism
Major enterprise applications, shared mailboxes, and file-share resources now surface clearer governing groups instead of leaning on direct assignment patterns or synthetic-looking ACL labels.
- Identity and endpoint cleanup
Device-account display names, sAMAccountName uniqueness, account OU repair behavior, and endpoint/security agent coverage are all tighter and more believable across hybrid and cloud-facing identity surfaces.
- Stronger organization/output polish
Team and resource naming was refined further, late-stage synthetic suffix artifacts were removed, and Duckburg now reads more like an enterprise environment and less like a procedurally stitched demo bundle.
- Broader realism validation
The quality sweep now catches more release-stage issues earlier, including duplicate directory account names, weak access-label patterns, and OU-reference cleanup noise.
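One way to picture the sAMAccountName uniqueness concern is the sketch below, which de-duplicates candidate names case-insensitively while respecting the 20-character limit that Active Directory applies to user sAMAccountName values. The function name and the numeric-suffix strategy are assumptions for illustration; the source does not describe how DataGen resolves collisions.

```python
from __future__ import annotations


def unique_sam_account_names(candidates: list[str], max_len: int = 20) -> list[str]:
    """Return names that are unique ignoring case and at most max_len long."""
    seen: set[str] = set()
    result: list[str] = []
    for candidate in candidates:
        base = candidate[:max_len]
        name = base
        suffix = 2
        # Append 2, 3, ... until the name is free, trimming the base so the
        # suffixed name still fits within max_len.
        while name.lower() in seen:
            tail = str(suffix)
            name = base[: max_len - len(tail)] + tail
            suffix += 1
        seen.add(name.lower())
        result.append(name)
    return result
```

For example, the inputs jsmith, JSmith, and jsmith come back as jsmith, JSmith2, and jsmith3.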
v0.6.0
This release focuses on realism hardening and stronger downstream export evidence, especially for DTED-oriented validation and demo datasets.
Highlights
- Organization and reporting realism
Large scenarios now generate cleaner department and team structures, more believable reporting lines, and tighter person-to-team alignment instead of fragmented manager spray or breadcrumb team names.
- Richer DTED-facing evidence
Normalized exports now carry stronger policy-setting provenance and CMDB evidence, including typed source and behavior on policy settings plus fqdn, unc_path, rto_hours, and rpo_hours on configuration items.
- Identity and access alignment
Account evidence remains consumer-agnostic, but the generated data is now better shaped for downstream bridges that need lifecycle, state, and non-AD identity-store association cues.
- Duckburg refresh
The flagship Duckburg package was regenerated with the newer realism, CMDB, policy, organizational, and plugin-record surfaces so downstream testing starts from a cleaner baseline.
v0.5.1
This patch release focuses on docs toolchain security hygiene.
Highlights
- Patched website transitive dependency
The docs stack no longer pulls the vulnerable uuid path through sockjs; the site now uses a vendored sockjs patch backed by Node's built-in crypto.randomUUID().
- Clean audit surface
The website lockfile was refreshed so npm audit is clean again without waiting on an upstream Docusaurus or webpack-dev-server release.
- Verified docs runtime
Both the static site build and the local Docusaurus development server were validated against the patched dependency tree.
v0.5.0
This release focuses on source realism and evidence quality for downstream consumers such as DTED, without turning DataGen itself into a consumer-specific adapter.
Highlights
- Identity realism cleanup
Large flagship scenarios no longer emit dense clusters of duplicate person display names, and account/device identity evidence is more coherent across people, devices, and machine accounts.
- Stronger export evidence
Normalized exports now carry richer lifecycle and classification signals, including account creation/modification timestamps, application type/deployment type, and improved policy-setting path data.
- Identity store and application realism
AD, Entra, and Okta naming/domain surfaces are cleaner and more believable, and application metadata is stronger for downstream typing and relationship construction.
- Richer policy corpus
Policy generation now produces a broader, more enterprise-like management surface with better path realism and stronger scope evidence across Group Policy, Intune, and Conditional Access.
- Duckburg acquisition scenario
The flagship Duckburg scenarios now include an acquired-company path so downstream discovery and migration tooling can exercise integration-oriented company-to-company relationships.
v0.4.4
This patch release corrects the release-tag lineage so GitHub Actions runs the intended fixed revision.
Highlights
- Clean release tag
The v0.4.4 tag points at the corrected release commit, so the release workflow uses the fixed flagship acceptance test and portability guardrail changes.
- No functional regression
This release carries forward the same runtime fixes from v0.4.3; the primary change is publishing them under an unambiguous release tag.
v0.4.3
This patch release focuses on release reliability and test portability.
Highlights
- Portable flagship acceptance tests
The flagship realism acceptance suite now uses only repo-stable example scenarios, so release builds no longer depend on local generated artifacts.
- Hardened portability validation
The repo portability validator no longer self-matches on its own detection pattern, which keeps CI and release workflows from failing on the guardrail itself.
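The self-match problem is easy to reproduce in any scanner whose source file contains the very text it searches for. One common way around it, sketched below, is to assemble the detection pattern from fragments so the literal prefix never appears verbatim in the validator's own source. The pattern, the names, and the choice of this technique are all assumptions; the source does not say how DataGen's validator was actually fixed.

```python
import re

# Hypothetical detection prefixes, built from fragments so that the literal
# home-directory prefix never appears verbatim on this line.
_PREFIXES = ["/" + "Users" + "/", "/" + "home" + "/"]
ABSOLUTE_HOME_PATH = re.compile("|".join(re.escape(p) for p in _PREFIXES))

# A copy of the definition line above, kept as data so the check below can
# confirm the validator would not flag its own source.
VALIDATOR_SOURCE_LINE = '_PREFIXES = ["/" + "Users" + "/", "/" + "home" + "/"]'


def flags_line(line: str) -> bool:
    """Return True if the line contains a machine-specific home path."""
    return ABSOLUTE_HOME_PATH.search(line) is not None
```

An alternative with the same effect is to exclude the validator's own file from the scan; the fragment approach has the advantage that it keeps working if the file is renamed or moved.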
v0.4.2
This patch release focuses on release portability and contributor guardrails.
Highlights
- Repo portability validation
DataGen now includes a validator that checks tracked files for machine-specific absolute paths before they break CI or release workflows.
- Optional pre-push hook
Contributors can enable a repo-managed pre-push hook to run the portability check automatically before publishing changes.
- Stable realism review defaults
The realism review script now defaults only to repo-stable scenario inputs instead of depending on local generated artifacts.
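For readers unfamiliar with this kind of guardrail, the sketch below shows the general shape of a portability check: scan text line by line and report anything that looks like a machine-specific absolute path. The regular expression and the function name are illustrative assumptions and are not the validator that ships with DataGen.

```python
from __future__ import annotations

import re

# Hypothetical pattern: Windows per-user directories plus macOS and Linux
# home directories, followed by the rest of the path up to whitespace.
MACHINE_PATH = re.compile(r"(?:[A-Za-z]:\\Users\\|/Users/|/home/)\S+")


def find_machine_paths(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched path) pairs for suspicious paths."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MACHINE_PATH.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

A pre-push hook would typically run a check like this over tracked files and exit non-zero when the list of hits is not empty.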
v0.4.1
This patch release focuses on security and release automation hygiene.
Highlights
- Secure machine-account password generation
Machine-account passwords now use cryptographically secure randomness instead of the general generator random source.
- Explicit CI workflow permissions
The CI workflow now declares explicit read-only permissions to satisfy current GitHub Actions security policy and keep release automation unblocked.
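The distinction behind the password change can be shown with Python's standard library, where the secrets module draws from the operating system's cryptographically secure source while the random module is documented as unsuitable for security purposes. This is only an analogy in Python: the function name, alphabet, and default length are assumptions, and DataGen's own implementation may differ.

```python
import secrets
import string

# Assumed alphabet for illustration; a real generator may use a different set.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_machine_password(length: int = 32) -> str:
    """Build a password from a cryptographically secure random source."""
    # secrets.choice uses the OS CSPRNG, unlike random.choice, whose output
    # can be reproduced by anyone who learns the generator's seed or state.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

The practical consequence is that reproducible generator seeds, which are useful for regenerating the same dataset, no longer determine password values.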
v0.4.0
This release is the point where DataGen moved from a stronger enterprise generator into a broader synthetic operating environment platform.
Highlights
- Bundled domain packs
DataGen now includes first-party packs for ITSM, SecOps, and BusinessOps through the native scenario packs model.
- Temporal simulation foundations
Generated worlds can now include temporal events, drift, and snapshot-oriented export artifacts.
- Productized scenario authoring
Archetypes, persona presets, richer overlays, and an archetype-first wizard now shape the preferred authoring workflow.
- Major realism hardening
Recent work improved organization structure, geography, groups, policies, repositories, CMDB artifacts, applications, infrastructure, and external ecosystem modeling.
- Built-in quality validation
Generation results now include structured quality reports, and the realism review tooling can emit both markdown and JSON for CI and repeatable review loops.
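As a sketch of why emitting both formats is useful, the example below renders one set of findings as JSON for CI to parse and as a markdown table for human review. The QualityFinding fields and both function names are hypothetical; the actual report schema is not described in these notes.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class QualityFinding:
    """Hypothetical finding record; field names are assumptions."""
    check: str
    severity: str
    message: str


def to_json(findings: list[QualityFinding]) -> str:
    """Machine-readable form, suitable for a CI step to parse."""
    return json.dumps([asdict(f) for f in findings], indent=2)


def to_markdown(findings: list[QualityFinding]) -> str:
    """Human-readable table for review notes or pull request comments."""
    lines = ["| Check | Severity | Message |", "| --- | --- | --- |"]
    lines += [f"| {f.check} | {f.severity} | {f.message} |" for f in findings]
    return "\n".join(lines)
```

Keeping a single in-memory structure and rendering it twice means the CI gate and the human reviewer are always looking at the same findings.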
Practical impact
For most users, this means:
- easier scenario setup
- more believable default output
- better operating-domain coverage
- stronger validation and demo datasets
- clearer automation and CI checks around generation quality
Recommended entry points
If you are starting fresh with the current platform, begin with:
If you are extending the platform, also read:
Notes on release scope
DataGen continues to treat downstream import shaping as an external concern.
The work in this release deepens the generated source environment itself:
- richer structure
- more realistic operating domains
- temporal behavior
- stronger quality and realism diagnostics
It does not turn the core product into a collection of consumer-specific adapters.