Reproducible Quantum Experiments When Hardware Access Is Limited: Strategies & Notebooks


2026-03-10
10 min read

Make quantum experiments reproducible offline: checklist, sample notebooks and record‑and‑replay strategies that combine deterministic simulators, noise models and hardware traces.

When QPUs are scarce: reproducible quantum experiments that don't rely on live hardware

Your team needs reliable benchmarks and repeatable algorithm development, but QPU queues, regional quotas and rising costs make live hardware runs intermittent. The result: experiments that can't be reproduced, wasted developer time and stalled project momentum. This article offers a practical playbook of checklists, patterns and sample notebook workflows that combine deterministic simulators, parametric noise models and recorded hardware traces, so experiments remain reproducible whether or not you can reach a live QPU.

Why this matters in 2026

By late 2025 and into 2026, teams face two converging realities:

  • Cloud QPU capacity remains constrained for many organizations; enterprise customers and hyperscalers still get priority for the newest devices.
  • Tooling for exporting calibration snapshots, measurement error matrices and shot-level traces has matured across multiple vendors, enabling record-and-replay approaches that weren't practical a few years ago.

Together, these trends make reproducible offline experimentation a first-class workflow: you can iterate locally against faithful emulations and later validate on hardware when access permits.

Core idea: deterministic simulator + noise model + hardware traces

At the center of the approach is a three-layer pattern:

  1. Deterministic simulator — a seeded statevector or density-matrix simulator, with every source of randomness fixed, for algorithmic determinism and bit-for-bit comparability across runs.
  2. Parametric noise model — gate errors, T1/T2, readout confusion matrices that are versioned and applied to the simulator to emulate device behavior.
  3. Recorded hardware traces (record-and-replay) — shot-level measurement traces or conditional probability matrices captured from the QPU that you can replay against ideal outcomes to reproduce device-specific stochastic behavior.
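As a toy illustration of how the three layers compose (pure NumPy, one qubit; the state and confusion-matrix numbers here are made up for the sketch, not taken from any device):

```python
import numpy as np

# Layer 1: deterministic simulator output — ideal outcome probabilities
# for a 1-qubit circuit (a hand-written example state, standing in for a sim).
state = np.array([np.sqrt(0.8), np.sqrt(0.2)])  # |psi> = a|0> + b|1>
p_ideal = np.abs(state) ** 2                    # [0.8, 0.2]

# Layer 2: parametric noise — a readout confusion matrix C with
# C[m, i] = P(measured=m | ideal=i); each column sums to 1.
confusion = np.array([[0.97, 0.05],
                      [0.03, 0.95]])
p_meas = confusion @ p_ideal

# Layer 3: record-and-replay — sample shots from the device-like
# distribution with a fixed seed so the replay itself is reproducible.
rng = np.random.default_rng(12345)
shots = rng.choice(2, size=1024, p=p_meas)
counts = {b: int((shots == b).sum()) for b in (0, 1)}
```

The same composition scales to n qubits; only the vector and matrix dimensions change.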

Reproducibility checklist (copy into each experiment)

Before you run an experiment — or when you archive results — ensure the following are present and versioned. Treat this as your canonical reproducibility checklist.

  • Code & Notebooks: The exact notebook(s) and plaintext scripts used. Pin versions (git commit hash).
  • Environment: requirements.txt / pip‑freeze, conda-lock.yml or a Dockerfile. CPU/GPU/Aer simulator versions and seed settings.
  • Simulator settings: simulator backend type (statevector/density matrix), seeds for simulator and transpiler, shot count, and deterministic flags.
  • Noise model artifact: JSON or serialized noise-model export (with creation timestamp and source backend id + calibration snapshot id).
  • Hardware trace archive: shot-level traces or confusion matrices exported from the QPU; include provenance (backend name, job ids, timestamps, firmware/OS), and a checksum.
  • Transpiler and compilation: exact transpiler passes and pass manager configs, target coupling map, basis gates and optimization level.
  • Input data & seeds: random seeds used for any parameter initialization, dataset versions and preprocessing steps.
  • Results and metrics: raw outputs (shots), aggregated metrics, and scripts to reproduce plots and tables.
  • Experiment manifest: a single JSON manifest that ties everything together (paths, commit hashes, artifact checksums, experiment id).

Use a lightweight layout that can be pushed to a Git + LFS or an experiment storage system like DVC, OSF, or an S3 bucket with immutable versions.

experiment-2026-01-18/
├─ notebook.ipynb
├─ scripts/
│  ├─ run.py
│  └─ utils.py
├─ env/
│  ├─ requirements.txt
│  └─ Dockerfile
├─ artifacts/
│  ├─ noise_model-v1.json
│  ├─ hardware_traces-2026-01-10.json
│  └─ results/raw_shots.json
└─ manifest.json  # Contains git hash, seeds, backend ids, checksums
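A minimal sketch of generating such a manifest with the standard library (the field names follow the checklist above but are otherwise illustrative; extend them to match your schema):

```python
import hashlib
import json
import pathlib

def sha256_of(path: str) -> str:
    """Checksum an artifact file so the manifest pins its exact bytes."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(experiment_id: str, git_hash: str, seeds: dict,
                   artifact_paths: list, out: str = 'manifest.json') -> dict:
    """Tie code version, seeds and artifact checksums into one JSON file."""
    manifest = {
        'experiment_id': experiment_id,
        'git_hash': git_hash,
        'seeds': seeds,
        'artifacts': {p: sha256_of(p) for p in artifact_paths},
    }
    pathlib.Path(out).write_text(json.dumps(manifest, indent=2))
    return manifest
```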

Sample notebook workflows

Below are three complementary notebooks you should supply in each experiment archive. Each is short, focused, and easily automated in CI.

1) Deterministic baseline notebook

Purpose: verify algorithmic correctness and deterministic behavior without noise.

Key actions:

  • Load circuits and set deterministic seeds for transpiler and simulator.
  • Run statevector or density-matrix simulation and store deterministic expectation values.
# Example (Qiskit-like pseudocode; API names may vary across versions)
from qiskit import transpile
from qiskit_aer import AerSimulator

sim = AerSimulator(method='statevector', seed_simulator=12345)
qc = build_your_circuit(params)
qc.save_statevector()  # instruct the simulator to save the final state
# Note: seed_transpiler is an argument of transpile(), not of the simulator
qc_t = transpile(qc, backend=sim, optimization_level=1, seed_transpiler=12345)
result = sim.run(qc_t).result()
state = result.get_statevector(qc_t)
# Save 'state' and deterministic metrics

2) Noise-model notebook

Purpose: apply a parametric, versioned noise model derived from device calibration data to reproduce average device behavior.

Key actions:

  • Export device calibration (T1/T2/gate errors/readout error) to a JSON artifact at the time of the experiment.
  • Construct a NoiseModel object in your simulator from that JSON and run noisy simulations.
# Example: construct noise model from exported JSON (Qiskit-like)
import json

from qiskit_aer import AerSimulator
from qiskit_aer.noise import NoiseModel

with open('artifacts/noise_model-v1.json') as f:
    noise_json = json.load(f)
noise_model = NoiseModel.from_dict(noise_json)
noisy_sim = AerSimulator(noise_model=noise_model, seed_simulator=54321)
noisy_result = noisy_sim.run(qc_t, shots=1024).result()
counts = noisy_result.get_counts()
# Save counts alongside the noise-model artifact id

3) Record-and-replay notebook (shot-level fidelity)

Purpose: reproduce the stochastic shot-level behavior of a particular device job by replaying recorded traces or conditional distributions.

Pattern A — Using confusion matrices:

  1. From a calibration job, compute the readout confusion matrix C where C[m][i] = P(measured=m | ideal=i).
  2. From a deterministic or noisy simulator, produce ideal outcome probabilities P_ideal(i).
  3. Generate measured distribution P_meas = C * P_ideal and sample shots from P_meas.
# Pseudocode for confusion-matrix replay
import numpy as np

rng = np.random.default_rng(20260110)  # seed the sampler so replays are reproducible

# confusion: shape (2^n, 2^n), exported from the device; columns sum to 1
confusion = np.load('artifacts/confusion_matrix.npy')
# p_ideal: vector of ideal outcome probabilities from the deterministic sim
p_ideal = compute_ideal_probabilities(state)
p_meas = confusion @ p_ideal
# Sample 1024 shots from p_meas
samples = rng.choice(len(p_meas), size=1024, p=p_meas)
# Aggregate and save

Pattern B — Replaying shot-level traces:

  • If you have shot-level traces recorded from a particular hardware job (for instance, a list of measured bitstrings with job id and timestamps), you can replay the exact empirical distribution by sampling from that trace file or by conditioning replay on ideal outcomes.
# Simple replay of recorded shots
import json
import random

rng = random.Random(20260110)  # fixed seed so the replay itself is reproducible

with open('artifacts/hardware_traces-2026-01-10.json') as f:
    traces = json.load(f)  # list of measured bitstrings
# To reproduce the empirical distribution, sample from traces with replacement;
# to reproduce a specific run exactly, replay the shots in their recorded order.
replayed_samples = rng.choices(traces, k=1024)
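If the trace format also records which ideal outcome each shot corresponds to (an assumption about the export format — adapt the keys to whatever your provider actually emits), you can condition the replay on ideal outcomes instead of sampling the pooled distribution:

```python
import random
from collections import defaultdict

rng = random.Random(99)  # fixed seed for reproducible conditional replays

# Hypothetical per-shot records: [{'ideal': '00', 'measured': '01'}, ...]
def build_conditional_replay(records):
    """Group measured bitstrings by the ideal outcome they came from."""
    by_ideal = defaultdict(list)
    for rec in records:
        by_ideal[rec['ideal']].append(rec['measured'])
    return by_ideal

def replay_conditioned(by_ideal, ideal_shots):
    """For each new ideal outcome, sample a measured outcome from the
    empirical conditional distribution recorded on hardware."""
    return [rng.choice(by_ideal[i]) for i in ideal_shots]
```

This reproduces outcome-dependent readout errors that a pooled sample would average away.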

How to export and version noise artifacts

Make the noise model an explicit first-class artifact. Recommended fields:

  • backend_id, backend_revision or firmware_version
  • calibration_timestamp
  • gate_errors and gate_fidelities (per gate, per qubit)
  • T1/T2 per qubit
  • readout_confusion_matrix
  • calibration_job_id and measurement_job_ids
  • generator script version (git hash) that produced the JSON

Store the JSON in Git-LFS or an artifact store and record its checksum in your manifest.
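A sketch of wrapping a raw noise-model export with these provenance fields before archiving (field names follow the list above; `package_noise_artifact` is an illustrative helper, not a provider API):

```python
import hashlib
import json
from datetime import datetime, timezone

def package_noise_artifact(noise_dict, backend_id, calibration_job_id,
                           generator_git_hash, out_path):
    """Wrap a raw noise-model dict with provenance metadata, write it out,
    and return a checksum to record in the experiment manifest."""
    artifact = {
        'backend_id': backend_id,
        'calibration_timestamp': datetime.now(timezone.utc).isoformat(),
        'calibration_job_id': calibration_job_id,
        'generator_git_hash': generator_git_hash,
        'noise_model': noise_dict,
    }
    blob = json.dumps(artifact, sort_keys=True).encode()
    with open(out_path, 'wb') as f:
        f.write(blob)
    return hashlib.sha256(blob).hexdigest()
```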

Case study: VQE reproducibility with intermittent QPU access

Scenario: your team develops a VQE for a molecular Hamiltonian. You want to iterate the ansatz and optimizer on a local deterministic baseline, validate noise robustness with a parametric noise model, and finally report device-validated results even though QPU access is sparse.

Workflow summary:

  1. Run the deterministic notebook to ensure the ansatz produces the expected energy minima and that the optimizer converges deterministically (seeded).
  2. Apply the noise-model notebook to evaluate average bias introduced by device-like gate/readout errors. Archive the noise_model JSON.
  3. If a hardware slot is available, run a short calibration job to collect readout confusion matrices and a small number of shots for critical circuits. Export those traces to hardware_traces.json and link to the original job id.
  4. Use the record-and-replay notebook to re-run the full VQE experiment at shot-level fidelity without needing the hardware. Aggregate metrics and compare: deterministic vs noise-model vs replayed-hardware results.

Result: every experiment result can be reproduced offline by anyone who has the archive; when the team later gains longer hardware access, you can validate the offline predictions against new live runs.
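One concrete way to compare the three result sets (the helper below is an illustrative sketch; counts dicts map bitstrings to shot totals) is total variation distance between the normalized count distributions:

```python
def total_variation_distance(counts_a, counts_b):
    """TVD between two counts dicts (bitstring -> shots), in [0, 1].
    0 means identical distributions; 1 means disjoint support."""
    na, nb = sum(counts_a.values()), sum(counts_b.values())
    keys = set(counts_a) | set(counts_b)
    return 0.5 * sum(abs(counts_a.get(k, 0) / na - counts_b.get(k, 0) / nb)
                     for k in keys)

# e.g. tvd = total_variation_distance(deterministic_counts, replayed_counts)
```

Tracking this metric across deterministic, noise-model and replayed runs gives a single number per comparison that CI can threshold on.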

Advanced strategies for teams and CI

  • Notebook testing and CI: run notebooks in CI with deterministic seeds and nbval or Papermill. Keep a set of regression fixtures (small circuits and expected outputs) so changes in compilers or simulator versions surface quickly.
  • Containerized runtimes: publish a container image (Docker) used for the experiment. Tag with git hash and push to a registry. This prevents drift across developer machines.
  • Experiment diffs: store aggregated metrics and make diffs between runs part of the CI report. Track not just means but distributions and credible intervals.
  • Access control & provenance: log who exported noise artifacts and when; include job ids so hardware operators can reconcile with backend logs if needed.
  • Shared artifact registry: maintain a team registry of noise models and trace snapshots (with metadata and permissions) so colleagues can reproduce experiments without re-downloading large files repeatedly.

Common pitfalls and how to avoid them

  • Missing seeds: Not setting simulator/transpiler seeds makes deterministic comparisons impossible. Always set seed_simulator and seed_transpiler.
  • Unversioned noise models: Device calibrations change daily. Archive the exact calibration used and include timestamps.
  • Transpiler drift: Changes to passes or basis gates change circuits. Include transpiler config or serialized compiled circuits in the archive.
  • Partial artifacts: Incomplete exports (e.g., confusion matrix but no gate error info) reduce fidelity. Export a full calibration snapshot or document missing fields explicitly in the manifest.
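A small validator along these lines catches partial archives before they are shared (the required field names are illustrative; match them to your manifest schema):

```python
REQUIRED_FIELDS = ['experiment_id', 'git_hash', 'seeds', 'artifacts']

def missing_manifest_fields(manifest: dict) -> list:
    """Return the required fields absent from a manifest, so incomplete
    archives fail loudly instead of silently losing provenance."""
    return [f for f in REQUIRED_FIELDS if f not in manifest]
```

Run it as the last cell of each notebook, or as a CI gate before artifacts are pushed to the registry.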

Practical tips & code hygiene

  • Prefer small, focused notebooks for each stage (deterministic, noise, replay) so CI can run them fast.
  • Use canonical experiment IDs and human-readable manifests (JSON) so artifacts are discoverable.
  • Compress large trace files and store checksums in the manifest to ensure integrity when shared.
  • Annotate notebooks with the exact backend names and calibration job ids (this is crucial for auditability).

Emerging standards and 2026 outlook

In 2025 the community made tangible progress toward standardizing experiment artifacts: exporters for calibration snapshots and shot-level traces became a common feature in provider SDKs, and open-source tooling surfaced to read/write confusion matrices and noise-model JSONs. Expect the following in 2026:

  • More provider support for artifact export: better APIs for capturing calibration snapshots and job-level traces with explicit provenance.
  • Standardized experiment manifests: community schemas (JSON-based) that include commit hashes, artifact checksums and calibration metadata.
  • Improved emulator fidelity: simulators that accept vendor-supplied noise models and traces directly, enabling higher-fidelity offline validation.

Adopting the reproducibility patterns outlined here positions your team to benefit immediately from these improvements and to scale your workflows as provider APIs evolve.

"Reproducibility is not a feature — it's a discipline. Archive the full story: code, environment, seeds, noise and traces."

Actionable takeaways

  • Start every experiment with a manifest and three short notebooks: deterministic, noise-model, record-and-replay.
  • Export and version noise artifacts and hardware traces; store them with checksums.
  • Use deterministic simulator seeds and record transpiler configuration to prevent drift.
  • Integrate notebook runs into CI with deterministic fixtures so regressions fail fast.
  • Archive everything in a compact experiment layout that colleagues can clone and re-run locally.

Where to get the sample notebooks

We published a reference repository with the three notebook templates, a manifest example and small utilities for converting backend calibration exports into noise-model JSONs. Clone, adapt and integrate the notebooks into your CI pipeline. If you run into provider-specific quirks, capture the calibration job ids and file an issue in the repo so the community recipes can be improved.

Final note & call-to-action

If reproducibility is strategic for your team — for benchmarking, auditing or for collaborative R&D — adopt the record-and-replay pattern now. It lets you iterate fast locally, produce shareable, auditable experiment archives and still validate on hardware when QPU time is available.

Next step: Download the sample notebooks from our repository, run the deterministic baseline with your circuits today, and add the manifest to your project. Need help adopting this for your team’s pipeline? Contact us for a technical review and a tailored experiment-archiving template.
