
From Figure-Tour to Claim-Driven: A Workflow for Method, Results, and the Data That Feeds Them

Date: 2026-04-18
Status: Working notes, captured after a conversation about why my method and results sections are weaker than my introductions.

The diagnosis: my introductions are strong because they follow a skeleton I trust (context → gap → objective → scope → outline) and because the ENHANCED_GAP_CLAIMS.md work gave each paper an offensive gap — the indictment layer, not just the "nobody has done X" layer. My method and results are weaker because I build them bottom-up from figures. I generate a figure, write two paragraphs about it, repeat, stitch. The product is a figure tour: each paragraph is really a caption pretending to be a paragraph. The argument is a byproduct, not the driver.

This note is the fix, in two halves. The first half is the writing workflow — the inversion that makes method and result sections argue instead of describe. The second half is the data organization that makes the workflow executable without re-doing the heavy lifting inside every figure script.


Part 1 — The Claim-Driven Workflow

The inversion

Figures serve claims. Claims serve a thesis. The thesis answers the research question. Break any link in that chain and the section falls apart. A figure-first workflow builds the bottom of the chain first; the top (the argument) exists only as a late patch. The excellent version inverts this: the argument drives the figures, not the other way around.

One sentence that keeps me honest: if I cannot state the section's thesis in one sentence before the first figure is rendered, the section is not ready to write.

What an excellent method section actually contains

The canonical "3–4 sections with 2–3 subsections each" skeleton is a container, not a structure. Within that container the subsections that differentiate excellent from competent are:

  1. Framework / information flow. A block diagram is decoration unless every arrow is explained. The reader should finish this subsection knowing what enters the pipeline, what leaves it, and what information is transformed at each stage.
  2. Assumptions on the table. Every assumption that could break the argument, stated in one paragraph. Most papers bury assumptions in footnotes or derive-them-as-needed. Excellent papers dedicate a paragraph and make the assumptions testable — "if the loading rate drops below t₉₅, the undrained assumption fails and the result X no longer holds."
  3. Setup / configuration. The reproducibility core. Software versions, random seeds, boundary conditions, mesh parameters, sensor sampling rates. One paragraph that a sceptical reviewer could use to replicate the result without contacting the author.
  4. Verification and validation. Load-bearing, 20–30% of the method section. Amateur papers give this one sentence ("validated against [prior study]"). Excellent papers devote a subsection with its own figures and numbers.
  5. Uncertainty / sensitivity. How the result moves under perturbation of inputs. Often skipped. Shouldn't be — in a probabilistic dissertation, it's non-negotiable.
  6. Limitations of the method itself. What this method cannot tell you, stated before the results are shown. Limitations that appear only in the discussion section read like defensive afterthoughts. Limitations stated up front read like rigour.

The separator between competent and excellent is whether each subsection carries a claim. Amateur: "We used OpenSees." Expert: "We used OpenSees because Rayleigh-damping convergence requires explicit eigenvalue update, which Abaqus's default solver skips; this matters because X." Every method choice is argued, not merely described.

What an excellent result section actually contains

  1. Baseline / sanity check. The case where the answer is known — a degenerate limit, a published benchmark, a deterministic reference. If the method passes this, the subsequent results are credible; if it doesn't, nothing else matters.
  2. Headline finding. One figure, one sentence. This is the paper's thesis, visualised.
  3. Sensitivity / robustness. How the headline moves under perturbation. If the headline survives, it is genuinely a finding; if it doesn't, it was a coincidence.
  4. Comparison against prior work. Quantitative, same axes, same metrics. Not "consistent with" — numerically compared.
  5. Edge cases. Where the result does not hold. This subsection is almost always missing and almost always what separates a respectable paper from a great one.

Meta-rules that separate excellent from competent

  • Captions are claims, not descriptions. Not "Figure 5 shows frequency vs. scour depth" but "Figure 5: tripod load redistribution produces a 20% larger frequency drop at 0.5D than monopile theory predicts." The caption is the subsection's thesis statement, shrunk.
  • Three baselines per result, always. Theoretical prediction, prior published result, null / deterministic case. Miss any of the three and the result is underdefended.
  • Effect size with CI, not p-values. "18.9% lower σ (95% CI: 14–23%)" dominates "p < 0.05" in every way that matters to an engineering reader. A sketch of the computation follows this list.
  • Dry-to-wet gradient within a paragraph. First sentence: what was measured. Last sentence: what it means. Don't interleave measurement and interpretation — it reads like hedging.
  • Error bars always. Bare point estimates are a tell of amateur work even if the underlying science is solid.
  • Captions written before figures are final. If the caption is flabby, the figure will be flabby. This is the single cheapest discipline available.
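One way to produce that kind of interval, assuming two samples of the quantity of interest are available, is a percentile bootstrap on the relative reduction in standard deviation. The function below is a minimal sketch; the variable names and the 10,000-replicate default are illustrative, not a prescription.

    import numpy as np

    def sigma_reduction_ci(baseline, treated, n_boot=10_000, alpha=0.05, seed=42):
        """Percentile-bootstrap CI for the relative reduction in standard deviation.

        Returns (point, lower, upper) as fractions, e.g. 0.189 means 18.9% lower sigma.
        """
        rng = np.random.default_rng(seed)
        baseline = np.asarray(baseline, dtype=float)
        treated = np.asarray(treated, dtype=float)
        point = 1.0 - treated.std(ddof=1) / baseline.std(ddof=1)
        reps = np.empty(n_boot)
        for i in range(n_boot):
            b = rng.choice(baseline, size=baseline.size, replace=True)
            t = rng.choice(treated, size=treated.size, replace=True)
            reps[i] = 1.0 - t.std(ddof=1) / b.std(ddof=1)
        lower, upper = np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return point, lower, upper

Reported as "X% lower σ (95% CI: Y–Z%)", this is exactly the form the bullet above asks for, with no p-value in sight.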

The seven-phase workflow

Top-down with bottom-up iteration. Each phase has a termination check before the next begins.

  1. Thesis statement. One sentence per section. The ONE claim the section must prove. If I can't write it, the section is not ready.
  2. Claim list. Three to five bullets. The sub-claims the thesis decomposes into. Each must be provable from the data I have.
  3. Claim → evidence map. For each claim, what figure / table / statistic proves it? This is now my figure shopping list — derived from argument, not from data I happen to have. Visually:

    flowchart TD
        T["<b>Thesis statement</b><br/>One sentence the section must prove"]
        T --> C1["Claim 1"]
        T --> C2["Claim 2"]
        T --> C3["Claim 3"]
        T --> C4["Claim 4"]
        C1 --> E1["Evidence<br/>Figure A · Statistic S1"]
        C2 --> E2["Evidence<br/>Table T1 · Statistic S2"]
        C3 --> E3["Evidence<br/>Figure B · Figure C"]
        C4 --> E4["Evidence<br/>Figure D"]
        E1 --> S1["figure_generator spec<br/>thesis · claim · data_path · style"]
        E2 --> S2["figure_generator spec<br/>thesis · claim · data_path · style"]
        E3 --> S3["figure_generator spec<br/>thesis · claim · data_path · style"]
        E4 --> S4["figure_generator spec<br/>thesis · claim · data_path · style"]
        S1 --> O1["Session output<br/>figure · caption draft · paragraph stub"]
        S2 --> O2["Session output<br/>figure · caption draft · paragraph stub"]
        S3 --> O3["Session output<br/>figure · caption draft · paragraph stub"]
        S4 --> O4["Session output<br/>figure · caption draft · paragraph stub"]

    The top-to-bottom read is the argument. The bottom-to-top read is the audit: every output traces to a spec, every spec to an evidence item, every evidence item to a claim, every claim to the thesis. If any link is missing, the section is not ready.

  4. Figure specs. Two sentences per figure. Not "make a figure about scour." Instead: "Figure must show tripod frequency drop exceeding monopile prediction at L/D < 2 by plotting both on the same axes, with the deviation region shaded." The figure_generator engine takes this spec, not a free-form prompt.
  5. Paragraph skeletons before any figure runs. Two to three paragraph stubs referencing [Figure X pending]. The argument must be coherent before a single image is rendered. If I cannot write the stub, the figure is not ready.
  6. Generate figures, iterate against the paragraph. If the rendered figure does not match the paragraph's claim, one of them is wrong — and the mismatch is now visible, not buried in prose a reviewer will find first.
  7. Two coherence passes. (a) Read all paragraphs ignoring the figures — does the argument flow end-to-end? (b) Read all captions ignoring the paragraphs — does the visual story flow? Both must stand alone. Any gap or redundancy surfaced here would otherwise have been surfaced by a reviewer.

Hooking into the figure_generator engine

The engine at github.com/ksk5429/figure_generator already enforces the discipline I want — one figure per session, journal-specific widths, deterministic reproducibility (git hash + data MD5 + UTC timestamp embedded in every PNG / SVG / PDF), style sheet forbidding jet / rainbow / raw C0, C1, ... cycles, and a make new-figure FIG=<id> scaffold that drops a folder with script, config, and CAPTION.md. Its CLAUDE.md defines a nine-step per-figure session protocol: read data → validate → confirm with user → scaffold → edit script+config → build → write caption → regenerate gallery → propose commit.

The seven-phase claim-driven workflow wraps that engine rather than replaces it. The mapping:

| Claim-driven phase | Where it lives | What figure_generator does |
| --- | --- | --- |
| 1. Thesis statement | paperX/planning/methodology_claims.md | |
| 2. Claim list | same file | |
| 3. Claim → evidence map | same file (mermaid diagram above) | |
| 4. Figure specs | same file, one spec block per planned figure | supplies the config.yaml contents (journal, required_columns, data_sources, width); spec + data path is enough to scaffold the session |
| 5. Paragraph skeletons | paperX/manuscript.qmd (draft subsection) | |
| 6. Generate figure, iterate | one figure_generator session per figure | runs its full nine-step protocol |
| 7. Coherence passes | back in methodology_claims.md | |
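For concreteness, here is what one phase-4 spec block could carry, written as a small Python mapping that a wrapper script dumps into the figure's config.yaml. Only journal, required_columns, data_sources, width, and the proposed claim_id correspond to fields named above; every other key, value, and path is a hypothetical placeholder, and the engine's real schema is whatever its repo defines.

    import yaml  # PyYAML, assumed available

    fig05_spec = {
        "figure_id": "fig05_tripod_vs_monopile",    # hypothetical identifier
        "thesis": "J2-methodology-thesis",
        "claim_id": "C2",                            # proposed optional field
        "spec": ("Show the tripod frequency drop exceeding the monopile prediction "
                 "at L/D < 2, both on the same axes, with the deviation region shaded."),
        "journal": "ocean_engineering",
        "width": "double_column",                    # placeholder value
        "required_columns": ["L_over_D", "freq_drop_tripod", "freq_drop_monopile"],
        "data_sources": ["paperJ2/figure_inputs/fig05_tripod_vs_monopile.parquet"],
    }

    with open("figures/fig05_tripod_vs_monopile/config.yaml", "w") as fh:
        yaml.safe_dump(fig05_spec, fh, sort_keys=False)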

Concrete session pattern. For each figure in the claim-to-evidence map, a single Claude Code session is invoked inside the figure_generator repo with a prompt that pastes the corresponding spec block from methodology_claims.md. The session runs steps 1–9 of the engine's protocol, produces the PNG / SVG / PDF plus CAPTION.md and a figXX_provenance.json with the claim ID it supports. The session's commit message cross-references the claim:

figure(methodology-j2-fig05): tripod vs. monopile fixity ratio

Claim: C2 (load redistribution collapses fixity for L/D ≤ 2)
Thesis: J2-methodology-thesis
Data: paperJ2/figure_inputs/fig05_tripod_vs_monopile.parquet
Journal: ocean_engineering

The commit message is a load-bearing audit artefact. Walking git log in figure_generator for any paper's figures reads back as the argument itself, each commit tied to a claim tied to a thesis. Defects are locatable: a reviewer objection becomes git log --grep "Claim: C2" and the single figure session responsible is found in seconds.

What changes in figure_generator. Nothing, necessarily. The repo already accepts a config.yaml that's a superset of what my spec produces. The only durable addition is a claim_id field in config.yaml (optional, ignored by the build, surfaced in the gallery and commit message) so that every figure self-identifies against the methodology or results thesis it is supporting.
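A thin wrapper outside the engine could then read claim_id back when committing, so the footer format above is generated rather than hand-typed. This is only a sketch, under the assumption that config.yaml is plain YAML carrying the fields used in the spec sketch earlier; nothing here is part of figure_generator itself.

    import subprocess
    import yaml

    def commit_figure_with_claim(fig_dir: str, summary: str) -> None:
        """Stage a finished figure folder and commit it with a claim/thesis footer."""
        with open(f"{fig_dir}/config.yaml") as fh:
            cfg = yaml.safe_load(fh)
        message = (
            f"figure({cfg['figure_id']}): {summary}\n\n"
            f"Claim: {cfg.get('claim_id', 'unassigned')}\n"
            f"Thesis: {cfg.get('thesis', 'unassigned')}\n"
            f"Data: {cfg['data_sources'][0]}\n"
            f"Journal: {cfg['journal']}"
        )
        subprocess.run(["git", "add", fig_dir], check=True)
        subprocess.run(["git", "commit", "-m", message], check=True)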

Section-level second- and third-order effects

The second-order effect is more important than the tool itself: every paper's planning/ folder gets two new files — methodology_claims.md and results_claims.md — containing thesis, claim list, and claim-to-evidence map. Those files become the source of truth. Figures and paragraphs are both generated from them. When a reviewer challenges a result, I don't patch the paragraph — I walk the chain: thesis → claim → evidence → figure. Defects become locatable instead of diffuse.

The third-order effect is the cheapest insurance I have ever been offered: if I cannot construct a claim list whose evidence actually proves the thesis, the thesis is wrong or the research question is mis-scoped, and I know this before spending three weeks making figures for a section that was never going to land. An argument pre-mortem, run in hours rather than months.


Part 2 — The Data Organization That Makes the Workflow Executable

The source-organized trap

My current structure is source-organized: centrifuge_data/, field_data/, numerical_data/, numerical_model/. That's how the data was collected. The claim-driven workflow, however, needs data organized by the claim it supports, not by the sensor that produced it. Paper B consumes centrifuge and field. Paper A consumes all three. Every figure script that reaches back to raw does the heavy lifting again — load, clean, align, normalise, filter — and pre-processing leaks into visualisation. The same cleaning gets silently re-invented in three places with subtle drift, and two months later V1 and V2 disagree by 3% on a number that should be identical.

I have already started a processed layer — centrifuge_data/processed/, field_data/processed/, window_index.parquet, integrated_database_1794_canonical.csv. Good. The missing tier is the one above processed: claim-aligned figure inputs. Without it, the boundary between cleaning and evidence preparation stays fuzzy, and a reviewer's "can you redo this with a different baseline?" becomes a one-week task instead of a one-hour task.

Three tiers, not two

Tier 0  raw             (immutable, source-organized)           — what I have
Tier 1  processed       (domain-cleaned, schema-locked)         — partially in place
Tier 2  figure_inputs   (claim-aligned, per-paper)              — missing

Figure code reads only from Tier 2. Never raw, never processed directly. The invariant that enforces discipline: if a figure script has a pd.read_csv pointing at centrifuge_data/raw_link.txt, something is wrong.
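One cheap way to enforce that invariant is to route every load in a figure script through a guard that refuses paths outside a figure_inputs/ folder. A minimal sketch (the helper name is mine, and the same check could instead live in a pre-commit hook):

    from pathlib import Path
    import pandas as pd

    def read_figure_input(path) -> pd.DataFrame:
        """Load a Tier 2 table; refuse anything that is not under figure_inputs/."""
        path = Path(path).resolve()
        if "figure_inputs" not in path.parts:
            raise ValueError(f"Figure scripts may only read Tier 2 data, got: {path}")
        if path.suffix == ".parquet":
            return pd.read_parquet(path)
        return pd.read_csv(path)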

Principles across the tiers

  1. Raw is immutable. Never edit in place. If a sensor calibration changes, write a transform; don't mutate the file. The existing calibration_map.csv is the right pattern — extend it to centrifuge and numerical.
  2. Data contracts at every tier boundary. Every parquet ships with a schema file alongside it: column names, units, dtypes, valid ranges, primary keys. Missing units is the most common silent error in structural engineering. A schema.yml per processed table fixes it.
  3. Canonical join keys across domains. Pick one vocabulary — test_id, window_id, realisation_id, timestamp_utc — and use it everywhere. If centrifuge says trial_name and field says record_id, cross-domain figures become a scripting nightmare.
  4. Provenance recorded, not hoped for. Every processed or figure-input file writes a sidecar {filename}_provenance.json with input hashes, git commit, timestamp, and the script that produced it. One line of discipline buys full auditability (a sketch follows this list).
  5. One-to-many, not many-to-many. Each raw source → one canonical cleaned version → many figure-specific views. Parallel cleaning pipelines that each re-derive the same quantity are how papers end up numerically inconsistent with each other.
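Principle 4 in practice, as a minimal sketch: a helper that any Tier 1 or Tier 2 build script calls once after writing its output. The function name and the sidecar naming are illustrative; the recorded fields match the ones listed above.

    import hashlib
    import json
    import subprocess
    import sys
    from datetime import datetime, timezone
    from pathlib import Path

    def write_provenance(output_path: str, input_paths: list) -> None:
        """Write a {filename}_provenance.json sidecar next to a freshly written file."""
        def md5(p):
            return hashlib.md5(Path(p).read_bytes()).hexdigest()

        record = {
            "output": output_path,
            "inputs": {p: md5(p) for p in input_paths},
            "git_commit": subprocess.run(
                ["git", "rev-parse", "HEAD"],
                capture_output=True, text=True, check=True,
            ).stdout.strip(),
            "timestamp_utc": datetime.now(timezone.utc).isoformat(),
            "script": sys.argv[0],
        }
        sidecar = Path(output_path).with_suffix("").as_posix() + "_provenance.json"
        Path(sidecar).write_text(json.dumps(record, indent=2))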

Target structure

papers/
├── centrifuge_data/processed/        # Tier 1, shared across papers
├── field_data/processed/             # Tier 1, shared
├── numerical_data/                   # Tier 1, already canonical
│   └── integrated_database_1794_canonical.csv
└── paperJ2_oe00984/
    └── figure_inputs/                # Tier 2, paper-scoped
        ├── fig05_tripod_vs_monopile.parquet
        ├── fig05_schema.yml
        ├── fig05_provenance.json
        ├── fig08_pl1_fit_vs_fe.parquet
        └── build_fig05_tripod_vs_monopile.py

The filename is the claim reference. fig05_tripod_vs_monopile names the argument, not the data source. The figure_inputs/ listing for a paper, read end-to-end, tells me every claim the paper makes, materialised as data.

Minimum viable version

Start here; defer the rest:

  1. Add paperX/figure_inputs/ to each paper folder.
  2. For each figure, write a small build_figXX.py script that joins from Tier 1 and writes the parquet. Commit the script alongside the data.
  3. Write a one-line figXX_provenance.json sidecar.
  4. Ensure figure code reads only from figure_inputs/.

That is roughly 80% of the value in an afternoon. DVC, orchestration frameworks, automated pipeline runners — the other 20% — can wait until the payoff is visible.
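To make step 2 concrete, a Tier 2 build script might look like the sketch below. The Tier 1 file names, column names, and the window_id join key are hypothetical stand-ins for whatever the real processed tables contain; write_provenance is the sidecar helper sketched under the tier principles above.

    import pandas as pd
    # from provenance import write_provenance  # sidecar helper sketched above

    CENTRIFUGE = "centrifuge_data/processed/frequency_windows.parquet"   # hypothetical
    NUMERICAL = "numerical_data/integrated_database_1794_canonical.csv"
    OUTPUT = "paperJ2_oe00984/figure_inputs/fig05_tripod_vs_monopile.parquet"

    def main() -> None:
        # Join Tier 1 tables on the canonical key; keep only what the claim needs.
        cent = pd.read_parquet(CENTRIFUGE)
        num = pd.read_csv(NUMERICAL)
        merged = cent.merge(num, on="window_id", how="inner")
        view = merged[["L_over_D", "freq_drop_tripod", "freq_drop_monopile"]]
        view.to_parquet(OUTPUT, index=False)
        # write_provenance(OUTPUT, [CENTRIFUGE, NUMERICAL])

    if __name__ == "__main__":
        main()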


Synthesis: the two halves snap together

The writing workflow and the data organization are the same discipline at two levels:

  • Phase 3 of the workflow (claim → evidence map) resolves to a Tier 2 file path. If the path exists, the claim is testable; if it doesn't, the next task is to build the Tier 2 parquet, not to render the figure.
  • The figure_generator engine's input set becomes {thesis, claim, figure_spec, figure_input_path}. Four fields, all traceable; each independently checkable.
  • A figure can be wrong because the Tier 2 input was wrong (data problem), or because the visual encoding was wrong (engine problem), or because the claim itself was mis-specified (argument problem). The tier structure tells me which — defect localisation by construction.

The second-order benefit is internal consistency across the dissertation. V1, V2, B, and A all consume Gunsan 32-month field data. Their Tier 2 views diverge — V1 needs raw frequency, V2 needs all channels in parked state, B needs coherence features, A needs multi-channel fused — but all four start from the same Tier 1 cleaned table. The four papers cannot disagree on the underlying numbers, because they share the same base truth. That is the property a dissertation built from separate publications must have, and the file system, not vigilance, guarantees it.

The third-order benefit is an argument pre-mortem earlier than I have ever had one. Open figure_inputs/ for a paper. Read the filenames end-to-end. If the list reads like a coherent argument, the paper is structurally sound. If it reads like a grab-bag, the structure is off — and the check takes five minutes, not five months.


What I am doing next

For each paper, roughly in this order:

  1. Write planning/methodology_claims.md and planning/results_claims.md using the thesis / claim list / evidence map schema. J2 first, because R2 is due 2026-04-29.
  2. For each claim, create the figure_inputs/ parquet with its schema and provenance sidecar. Existing figures get reverse-engineered into this structure; new figures are born into it.
  3. Refactor figure_generator to accept structured specs rather than free-form prompts, and to emit figure + caption draft + paragraph skeleton as a triple.
  4. Apply the seven-phase workflow to the J2 methodology section end-to-end as the prototype. Revise the workflow wherever it breaks in practice.

The next note in this series will be the J2 prototype walkthrough — one section, every phase visible, every decision justified, so the workflow can be critiqued on a real case rather than in the abstract.