V — Seven-Phase Claim-Driven Prototype: Methodology + Results (Prospective)¶
Date: 2026-04-18
Status: Third prototype of the seven-phase claim-driven workflow, applied prospectively to Paper V — the consolidated state-function + field-demonstration paper replacing the scrapped V1/V2 split. Two section passes in one note: methodology (§2–§5) and results (§6–§8).
The earlier two prototypes (J2, J3) ran the workflow retrospectively on existing manuscript drafts — useful for surfacing gaps before R2/R1 submission. V is different: the manuscript does not yet exist. The workflow is run prospectively to shape what gets written, so the pay-off comes upstream of the first draft rather than as patches to a later one. This is the mode the workflow is designed for.
The note is structured in two parts. Part 1 runs the seven phases on V's methodology sections (§2 theory, §3 equivalence proof, §4 practical pipeline, §5 datasets). Part 2 runs them on V's results sections (§6 four-method benchmark, §7 field demonstration, §8 discussion). A short synthesis at the end highlights what the two passes surfaced together.
Part 1 — Methodology (§2–§5)¶
Phase 1 — Thesis statement¶
One sentence the methodology sections must prove collectively:
A physics-constrained state-function decomposition of reversible environmental-operational variability can be derived from conservation of modal mass and modal stiffness, proved algebraically equivalent to Johansen cointegration under Gaussian-linear regression conditions, and implemented as a regime-split RANSAC estimator paired with an EWMA persistence filter — producing a reproducible compensation pipeline whose inputs are an operational modal-analysis feature stream and whose outputs are a detection-ready residual.
Phase 2 — Claim list¶
- C1 — Theoretical foundation. The state-function decomposition follows from modal-balance conservation arguments, not from curve-fitting; the reversible-versus-irreversible partition is a physical invariant rather than a statistical convenience.
- C2 — Cointegration equivalence proof. Under Gaussian-linear regression conditions (linear mapping, jointly Gaussian environmental drivers, full-rank environmental matrix, single regime), the state-function residual is asymptotically identical to the Johansen error-correction residual. The proof is explicit, and the conditions required are not hidden.
- C3 — Counterexamples where the equivalence fails. Three classes of counterexample (regime-shift boundary, non-Gaussian noise, rank-deficient environmental vector) show that state-function strictly outperforms Johansen outside Gaussian-linear conditions — the equivalence marks Johansen's best case, not a limit on state-function.
- C4 — Practical estimator is robust and reproducible. Regime-split RANSAC, EWMA persistence (span 48 at 10-minute cadence), SSI-COV with documented block-Hankel and model-order choices, MAC-based modal tracking, and Butterworth-filtered displacement estimation are specified at the level a sceptical reviewer can replicate.
- C5 — Datasets are fit-for-purpose. The Gunsan 32-month field record and the public SHM benchmark (Z-24 Bridge candidate) are both appropriate for the EOV compensation problem; sensor topology, sampling rates, and damage-injection protocols are stated per dataset.
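A minimal sketch of the regime-split compensation that C4 commits to, with plain per-regime least squares standing in for the RANSAC inner fit (regime labels, slopes, and noise levels below are illustrative, not the Gunsan values):

```python
import numpy as np

def fit_regime_split(x, y, regime):
    """Fit one straight line per operational regime; return per-regime (slope, intercept)."""
    coeffs = {}
    for r in np.unique(regime):
        m = regime == r
        # OLS stand-in for the paper's RANSAC inner fit
        slope, intercept = np.polyfit(x[m], y[m], 1)
        coeffs[r] = (slope, intercept)
    return coeffs

def residual(x, y, regime, coeffs):
    """Compensated residual: subtract the per-regime environmental fit."""
    out = np.empty_like(y)
    for r, (a, b) in coeffs.items():
        m = regime == r
        out[m] = y[m] - (a * x[m] + b)
    return out

rng = np.random.default_rng(0)
x = rng.uniform(0, 25, 4000)                       # e.g. wind speed
regime = (x > 12).astype(int)                      # toy parked / power split
y = np.where(regime == 0, 0.02 * x, 0.05 * x - 0.36) + rng.normal(0, 0.01, x.size)

coeffs = fit_regime_split(x, y, regime)
r_split = residual(x, y, regime, coeffs)
r_global = y - np.polyval(np.polyfit(x, y, 1), x)  # single global fit for contrast
```

On this toy data the per-regime residual sits at the injected noise floor, while the single global fit leaves the regime-boundary structure that the C3a counterexample exploits.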
Phase 3 — Claim → evidence map¶
flowchart TD
T["<b>Thesis (methodology)</b><br/>State-function derivation · equivalence proof ·<br/>reproducible pipeline on two datasets"]
T --> C1["C1 · Conservation-based derivation"]
T --> C2["C2 · Gaussian-linear equivalence to Johansen"]
T --> C3["C3 · Three counterexamples"]
T --> C4["C4 · Robust reproducible estimator"]
T --> C5["C5 · Datasets fit-for-purpose"]
C1 --> E1["Fig · modal-balance schematic<br/>Text · derivation §2.1–§2.3"]
C2 --> E2["Theorem · explicit statement<br/>Proof · §3.2<br/>Table · conditions checklist"]
C3 --> E3["Fig · regime-shift counterexample<br/>Fig · heavy-tail noise counterexample<br/>Fig · collinearity counterexample"]
C4 --> E4["Fig · pipeline flowchart<br/>Table · hyperparameter settings<br/>Algo · regime-split RANSAC pseudocode"]
C5 --> E5["Fig · Gunsan sensor layout<br/>Fig · Z-24 sensor layout<br/>Table · dataset comparison"]
E1 --> S1["spec · fig-modal-balance"]
E2 --> S2["spec · table-equivalence-conditions"]
E3 --> S3["spec · fig-counterexample-regime<br/>spec · fig-counterexample-heavytail<br/>spec · fig-counterexample-rank"]
E4 --> S4["spec · fig-pipeline-flowchart<br/>spec · table-hyperparameters"]
E5 --> S5["spec · fig-sensors-gunsan<br/>spec · fig-sensors-z24<br/>spec · table-datasets"]
Seven figures, three tables. Note that C2 (the equivalence) has only a single evidence slot beyond prose — that's because the proof is the evidence, and there is no figure that can substitute for the algebra.
Phase 4 — Figure specs¶
fig-modal-balance (supports C1). Must show modal mass, modal stiffness, and modal damping as partitioned quantities, with the reversible (environmental-driven) and irreversible (damage-driven) components visualized as two separate evolution paths. Single panel; annotations identify which conservation argument supplies each term.
table-equivalence-conditions (supports C2). Four-column table: (1) condition name, (2) formal statement, (3) typical diagnostic for testing it in practice, (4) consequence when violated. Rows include "linearity of environmental mapping", "joint Gaussianity of drivers", "full-rank environmental matrix", "single regime during analysis window".
fig-counterexample-regime (supports C3a). Must show feature versus wind speed coloured by operational regime (parked, power-production, startup, shutdown), with a single global regression line and separate per-regime regression lines overlaid. Single regression leaves systematic residual at regime boundaries; per-regime fits eliminate it.
fig-counterexample-heavytail (supports C3b). Must show a simulated dataset with heavy-tailed noise (e.g., Student-t with df = 3) and overlay the OLS/Gaussian-MLE fit against the RANSAC-robust fit, with residual histograms below each. The Gaussian-MLE is inefficient; RANSAC recovers the true slope.
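The heavy-tail protocol can be pinned down before the figure_generator session with a few lines of synthetic data; this is a hand-rolled RANSAC sketch (sample count, inlier threshold, and iteration budget are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, true_slope = 500, 2.0
x = rng.uniform(0, 10, n)
y = true_slope * x + rng.standard_t(df=3, size=n)   # heavy-tailed noise, df = 3

# OLS / Gaussian-MLE fit
ols_slope = np.polyfit(x, y, 1)[0]

# Minimal RANSAC: sample point pairs, score by inlier count, refit on best inlier set
best_inliers = None
for _ in range(200):
    i, j = rng.choice(n, 2, replace=False)
    if x[i] == x[j]:
        continue
    a = (y[j] - y[i]) / (x[j] - x[i])
    b = y[i] - a * x[i]
    inliers = np.abs(y - (a * x + b)) < 1.5          # inlier threshold (illustrative)
    if best_inliers is None or inliers.sum() > best_inliers.sum():
        best_inliers = inliers
ransac_slope = np.polyfit(x[best_inliers], y[best_inliers], 1)[0]
```

With Student-t (df = 3) noise the OLS fit stays consistent but loses efficiency; the residual histograms the figure spec calls for come straight from this data generation.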
fig-counterexample-rank (supports C3c). Must show the condition number of the Johansen Π matrix as a function of the correlation between two environmental drivers. When the drivers become collinear (correlation → 1), Π becomes singular and the estimator explodes; state-function's regime split plus regularisation stays bounded.
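The collinearity mechanism is cheap to verify numerically: the condition number of the two-driver Gram matrix (used here as a stand-in for the Johansen \(\Pi\) computation) follows \((1+\rho)/(1-\rho)\), while a ridge term keeps it bounded. A sketch (the ridge strength is illustrative):

```python
import numpy as np

def gram_condition(rho, n=5000, ridge=0.0, seed=0):
    """Condition number of X'X / n for two drivers with correlation rho."""
    rng = np.random.default_rng(seed)
    z1 = rng.normal(size=n)
    z2 = rho * z1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    X = np.column_stack([z1, z2])
    G = X.T @ X / n + ridge * np.eye(2)   # ridge term = the regularisation in the pipeline
    return np.linalg.cond(G)

conds = {rho: gram_condition(rho) for rho in (0.0, 0.9, 0.99, 0.999)}
ridged = gram_condition(0.999, ridge=0.1)
```

The unregularised condition number climbs toward \(2/(1-\rho)\) as the drivers become collinear; the ridged version stays orders of magnitude smaller, which is the bounded behaviour the figure must show.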
fig-pipeline-flowchart (supports C4). Must show the end-to-end compensation pipeline as a block diagram: raw acceleration + strain → OMA (SSI-COV) → feature extraction → regime classification → regime-split RANSAC → EWMA persistence filter → residual stream. Every arrow labelled by data type and sampling cadence.
table-hyperparameters (supports C4). Hyperparameter specifications for the pipeline: block Hankel row count, model-order range (SSI-COV), MAC threshold (mode tracking), Butterworth order and corner frequency (displacement estimation), RANSAC inlier threshold, EWMA span. One row per hyperparameter with its value, justification, and sensitivity flag.
fig-sensors-gunsan (supports C5). Must show the 4.2 MW Gunsan tripod with every sensor labelled: triaxial accelerometers, strain gauges, PPTs, displacement transducers. Cartesian basis indicated (N arrow, \(x/y/z\) triads). Sampling rates per instrument noted in caption.
fig-sensors-z24 (supports C5). Must show the Z-24 Bridge benchmark sensor topology (or the chosen public benchmark). Analogous to Gunsan layout. Caption states licensing and preprocessing link.
table-datasets (supports C5). Side-by-side comparison of the Gunsan record and the public benchmark: duration, number of windows, sensor count, environmental covariates, known damage events, damage-injection protocol (if synthetic).
Phase 5 — Paragraph skeletons¶
§2 Theoretical framework (proves C1).
¶1 (motivation). Environmental variability on an OWT drives frequency shifts of order 1–3 %, while scour-induced shifts are order 0.1–1 %; separating the two is the core problem. ¶2 (modal balance). Modal mass and modal stiffness admit a natural partition: reversible (soil stiffness changing with groundwater / temperature) versus irreversible (soil removed by scour). ¶3 (state-function definition) [→ fig-modal-balance]. The reversible partition is parameterised by an environmental state vector; the irreversible partition is a monotone damage coordinate. ¶4 (regime split). Operational states change the kinematic boundary conditions and therefore the partition coefficients; the regime-split form handles this by learning per-regime coefficients.
§3 Equivalence to cointegration (proves C2, C3).
¶1 (scope). Under Gaussian-linear regression conditions, state-function and Johansen cointegration produce asymptotically identical residuals [→ theorem statement]. ¶2 (conditions) [→ table-equivalence-conditions]. The four conditions (linearity, Gaussianity, full rank, single regime) are listed with diagnostics. ¶3 (proof sketch). Both methods reduce to residualising the feature against a linear combination of environmental drivers; OLS and Johansen's reduced-rank regression give the same estimator asymptotically under the stated conditions. Full derivation in Appendix A. ¶4 (counterexamples: regime shift) [→ fig-counterexample-regime]. Per-regime coefficients matter; a single global fit leaves unexplained residual at transitions. ¶5 (counterexamples: heavy tails) [→ fig-counterexample-heavytail]. Gaussian MLE is inefficient under heavy tails; RANSAC is robust. ¶6 (counterexamples: rank deficiency) [→ fig-counterexample-rank]. Collinear drivers singularise Π; state-function's regime-plus-regularisation handles it.
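One step of the ¶3 proof sketch can be checked numerically before the appendix is written: when the rank constraint is not binding, reduced-rank regression and plain OLS residualisation coincide. A synthetic check (this exercises only the full-rank limb, not the asymptotic argument):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 3
X = rng.normal(size=(n, p))                  # environmental drivers
B = rng.normal(size=(p, 2))
Y = X @ B + 0.1 * rng.normal(size=(n, 2))    # two modal features

# OLS residualisation
B_ols = np.linalg.lstsq(X, Y, rcond=None)[0]
R_ols = Y - X @ B_ols

# Reduced-rank regression: the rank-r solution truncates the SVD of the fitted values.
# At full rank r = 2 the truncation is the identity, so the residuals must agree.
fitted = X @ B_ols
U, s, Vt = np.linalg.svd(fitted, full_matrices=False)
r = 2
fitted_rrr = (U[:, :r] * s[:r]) @ Vt[:r]
R_rrr = Y - fitted_rrr
```

The rank-truncated SVD of the OLS fitted values is the classical reduced-rank solution; at full rank it reproduces the OLS residual to machine precision, which is exactly the collapse the proof sketch asserts.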
§4 Practical estimator and signal processing (proves C4).
¶1 (pipeline overview) [→ fig-pipeline-flowchart]. Raw sensor streams through OMA to a feature matrix; feature matrix through regime-split RANSAC to a residual stream; residual through EWMA persistence to a detection-ready signal. ¶2 (SSI-COV details). Block Hankel matrix construction with row count motivated by minimum modal period; model-order selection via stability diagram [Peeters & De Roeck 1999; Reynders 2012]. ¶3 (MAC-based modal tracking). MAC against an FE reference mode shape confirms modal identity across the 32-month window; threshold 0.95 applied window-by-window. ¶4 (displacement estimation). Double integration with Butterworth high-pass at 0.05 Hz; residual drift validated against 10-minute window [→ noise-floor figure, deferred to §7]. ¶5 (RANSAC and EWMA) [→ table-hyperparameters]. Inlier threshold motivated by environmental-driver noise floor; EWMA span 48 (8 h) separates scour drift from diurnal cycles.
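The ¶5 persistence filter is a single recursion; a sketch using the span convention \(\alpha = 2/(\mathrm{span}+1)\) (the pandas-style span mapping is an assumption here):

```python
import numpy as np

def ewma(x, span=48):
    """EWMA with span s -> alpha = 2/(s+1); span 48 at 10-min cadence is an 8 h memory."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty_like(x, dtype=float)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = alpha * x[t] + (1 - alpha) * out[t - 1]
    return out
```

A one-window spike is attenuated to roughly \(\alpha \approx 0.04\) of its height, while a sustained shift passes through within about one span — the persistence behaviour that separates slow scour drift from transients.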
§5 Datasets (proves C5).
¶1 (Gunsan) [→ fig-sensors-gunsan]. 4.2 MW tripod foundation, 32 months of continuous recording, 22,617 parked-state windows. ¶2 (public benchmark) [→ fig-sensors-z24]. Z-24 Bridge or equivalent; licensing and preprocessing documented. ¶3 (comparison) [→ table-datasets]. Same compensation protocol applied to both; damage events defined per dataset.
Phase 6 — Generate figures (figure_generator integration)¶
Seven figure sessions + three table artefacts. The three counterexample figures (§3) are the highest novelty and the ones most likely to absorb two sessions each because the synthetic data protocols must be reproducible and tied to the equivalence proof. The pipeline-flowchart and sensor-layout figures are the lowest risk — they are largely diagrammatic rather than data-bound.
Example session prompt for the equivalence-proof anchor figure:
Figure: fig-counterexample-regime
Paper: V methodology
Thesis: V-methodology-thesis
Claim: C3a (regime-shift boundary breaks single-β assumption)
Spec: Must show feature y versus environmental driver x coloured by
operational regime (parked · power · startup · shutdown), with a
single global regression line (dashed) and separate per-regime fits
(solid, one colour per regime). Residual panel below shows that the
single-fit residuals have systematic bias at regime boundaries, while
the per-regime fits have zero-mean residuals.
Data: paperV/figure_inputs/fig_counterexample_regime.parquet
Schema: paperV/figure_inputs/fig_counterexample_regime_schema.yml
Journal: engineering_structures
Width: single (90 mm)
The commit-message pattern "figure(v-methodology-fig03): regime-shift counterexample (Claim C3a)" links every figure to the claim structure.
Phase 7 — Coherence passes¶
Pass A — paragraphs only.
- §2 and §3 share motivation but treat different layers (definition vs. equivalence); the §2 → §3 transition must not re-argue §2's premise. Fix in advance: §3 opens with "Given the state-function from §2, the following equivalence holds" — no restatement.
- §4 currently reads as a procedures list in the skeleton. Fix in advance: frame every procedure statement against a specific reviewer-anticipated objection (e.g. SSI-COV details = anticipates MSSP R1.h). The skeleton gets one sentence per paragraph naming the objection it preempts.
- §5 describes both datasets symmetrically, which is fine, but the asymmetry between them (Gunsan is an OWT, Z-24 is a bridge) is load-bearing for the cross-dataset claim. Add: one sentence explicitly framing the asymmetry as a feature (domain-transfer evidence) rather than a bug.
Pass B — figures only.
- Eight figures span from schematic (fig-modal-balance) through data-bound (fig-counterexample-regime) to diagrammatic (fig-pipeline-flowchart). The visual register shifts across the section — deliberate and fine.
- The three counterexample figures should share a visual template (same axis layout, same regression-line colour convention, same residual-panel style) so they read as a triptych. Commit to: a shared counterexample_template.py that the three figure_generator sessions all import.
- fig-modal-balance risks being decorative if the conservation partition is not annotated with the specific quantities used later in §2.3. Fix in advance: the annotation must use the same symbols as the equations.
Part 2 — Results (§6–§8)¶
Phase 1 — Thesis statement¶
One sentence the results sections must prove collectively:
The state-function compensator beats cointegration, PCA, and Gaussian-process regression on both datasets across ROC-AUC, F1, and false-alarm metrics under an identical synthetic damage-injection protocol, and delivers zero false alarms over the 32-month Gunsan record with 95 % detection probability at a 0.2 % injected frequency shift — a result that cointegration cannot match because its \(I(1)\) assumption is violated by parked-state OWT features.
Phase 2 — Claim list¶
- R1 — Four-method benchmark superiority. State-function is strictly dominant over Johansen cointegration, PCA (grid-searched), and GP regression on both datasets across ROC-AUC, F1, and false-alarm at a fixed detection threshold.
- R2 — Cointegration failure diagnosed quantitatively. Full-rank Johansen trace test (p-value near 1) on parked-state OWT features, with an explicit false-alarm-rate breakdown showing the collapse to near-baseline.
- R3 — 32-month field demonstration. Zero false alarms over 22,617 parked-state windows; 95 % detection probability at 0.2 % injected frequency shift; 18.9 % lower residual σ than PCA.
- R4 — Regime-split attribution. Ablating the regime split from the pipeline collapses the Gunsan false-alarm performance; this proves regime separation is a load-bearing design choice, not a cosmetic one.
- R5 — Cross-dataset generalisation. Four-method ranking is preserved on the Z-24 bridge benchmark; the state-function advantage is not Gunsan-specific.
Phase 3 — Claim → evidence map¶
flowchart TD
T["<b>Thesis (results)</b><br/>State-function beats 3 baselines on 2 datasets ·<br/>zero false alarms · 95% detection at 0.2%"]
T --> R1["R1 · Four-method benchmark superiority"]
T --> R2["R2 · Cointegration failure diagnosed"]
T --> R3["R3 · 32-month field demonstration"]
T --> R4["R4 · Regime-split attribution"]
T --> R5["R5 · Cross-dataset generalisation"]
R1 --> Q1["Fig · ROC curves per method per dataset<br/>Table · ROC-AUC · F1 · FAR grid"]
R2 --> Q2["Table · Johansen trace test results<br/>Fig · residual autocorrelation diagnosing I(1) failure"]
R3 --> Q3["Fig · 32-month residual time series<br/>Fig · detection probability vs injected shift"]
R4 --> Q4["Fig · ablation with and without regime split"]
R5 --> Q5["Fig · four-method bar chart on Z-24<br/>Comparison with Gunsan ranking"]
Q1 --> P1["spec · fig-roc-per-dataset<br/>spec · table-benchmark-grid"]
Q2 --> P2["spec · table-johansen-trace<br/>spec · fig-residual-autocorr"]
Q3 --> P3["spec · fig-residual-timeseries<br/>spec · fig-detection-probability"]
Q4 --> P4["spec · fig-ablation-regime"]
Q5 --> P5["spec · fig-cross-dataset-bar"]
Six figures, two tables.
Phase 4 — Figure specs (condensed)¶
fig-roc-per-dataset. Two panels (Gunsan, Z-24), each showing four ROC curves (one per method) under the identical damage-injection protocol. AUC annotated per curve.
table-benchmark-grid. Four methods × two datasets × three metrics (ROC-AUC, F1 at operating threshold, false-alarm rate). 24-cell grid; best per metric per dataset bolded.
table-johansen-trace. Johansen cointegration test on the Gunsan parked-state feature matrix: null hypotheses \(r = 0, 1, 2, \ldots\), trace statistic, 95 % critical value, decision. Full rank rejects cointegration.
fig-residual-autocorr. Autocorrelation of the Johansen residual on the Gunsan record — non-zero autocorrelation at lag ≥ 1 violates the residual-is-white assumption and is the diagnostic that the \(I(1)\) model is mis-specified.
fig-residual-timeseries. 32-month state-function residual time series with detection threshold marked and threshold exceedances counted (zero over the record). Subpanel: histogram of residual values confirms Gaussian-like behaviour after compensation.
fig-detection-probability. Detection probability (from synthetic injection) versus injected frequency-shift magnitude. Threshold bracket: 95 % detection at 0.2 % shift. Overlaid curves for state-function and the three baselines.
fig-ablation-regime. State-function performance with and without regime split on identical Gunsan data. Two panels: one comparing ROC curves, the other a false-alarm time series with regime-boundary transient spikes in the "no regime split" variant.
fig-cross-dataset-bar. Bar chart: four methods × three metrics on Z-24. Same ranking as Gunsan confirms cross-dataset generalisation.
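For table-benchmark-grid, the ROC-AUC in each cell reduces to the Mann-Whitney rank statistic over damaged versus healthy residual scores, with no threshold sweep needed. A sketch (the score distributions are illustrative):

```python
import numpy as np

def roc_auc(healthy, damaged):
    """AUC = P(damaged score > healthy score) + 0.5 * P(tie), via pairwise comparison."""
    h = np.asarray(healthy, dtype=float)[:, None]
    d = np.asarray(damaged, dtype=float)[None, :]
    return (d > h).mean() + 0.5 * (d == h).mean()

rng = np.random.default_rng(3)
healthy = rng.normal(0.0, 1.0, 1000)   # residual scores, undamaged windows
damaged = rng.normal(2.0, 1.0, 1000)   # injected shift pushes scores up
auc = roc_auc(healthy, damaged)
```

For two unit-variance Gaussians separated by 2σ the true AUC is about 0.92; the same function fills every method-dataset cell of the grid from the corresponding score pair.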
Phase 5 — Paragraph skeletons (condensed)¶
§6 Four-method benchmark experiments (proves R1, R2).
¶1 (protocol). Identical damage-injection across methods; seed-pinned reproducibility. ¶2 (results headline) [→ fig-roc-per-dataset, table-benchmark-grid]. State-function dominant across all six cells. ¶3 (cointegration diagnosis) [→ table-johansen-trace, fig-residual-autocorr]. Full-rank trace test + residual autocorrelation: the \(I(1)\) model is mis-specified on parked-state OWT features. ¶4 (why state-function wins). It picks up regime-boundary variance that the other three methods lump into residual.
§7 Field demonstration (proves R3, R4).
¶1 (32-month result) [→ fig-residual-timeseries]. Zero false alarms over 22,617 windows. ¶2 (synthetic injection) [→ fig-detection-probability]. 95 % detection at 0.2 % shift; σ reduction 18.9 % vs PCA baseline. ¶3 (regime-split attribution) [→ fig-ablation-regime]. Removing the regime split worsens false-alarm performance by roughly a factor of 10 — regime split is load-bearing. ¶4 (cross-dataset generalisation) [→ fig-cross-dataset-bar]. Z-24 ranking confirms Gunsan ranking; state-function advantage is not site-specific.
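The ¶2 synthetic-injection protocol behind fig-detection-probability is a short Monte Carlo: add a candidate frequency-shift offset to healthy residual windows and count threshold exceedances (noise floor, threshold, and shift grid below are illustrative, not the Gunsan calibration):

```python
import numpy as np

rng = np.random.default_rng(11)
sigma = 0.04            # residual noise floor in % frequency shift (illustrative)
threshold = 3 * sigma   # detection threshold set for near-zero false alarms
n_trials = 20000

def detection_probability(shift):
    """P(residual + injected shift exceeds the detection threshold)."""
    residuals = rng.normal(0.0, sigma, n_trials)
    return np.mean(residuals + shift > threshold)

shifts = np.array([0.0, 0.05, 0.1, 0.15, 0.2, 0.3])
probs = np.array([detection_probability(s) for s in shifts])
```

With these illustrative settings the zero-shift cell doubles as a false-alarm estimate, and detection rises monotonically with injected shift magnitude — the curve shape the figure spec calls for.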
§8 Discussion and limitations. ¶1 (method-selection guidance). State-function for parked-state OWT; cointegration for genuinely \(I(1)\) infrastructure where its assumptions hold. ¶2 (limitations). Gaussian-linear proof scope; two-dataset validation; synthetic injection. ¶3 (future work). Power-production regime compensation; field-validated damage events (bathymetric ground truth). ¶4 (implications). Compensation pipeline feeds Paper A's Bayesian fusion as the statistical-detection channel.
Phase 6 — Generate figures¶
Six result-section sessions + two tables. fig-detection-probability is the single figure a Paper A reviewer will want to see immediately (because it sets the minimum detectable scour depth that Paper A's decision tree prices). Produce it first; the headline claim hinges on it.
Phase 7 — Coherence passes¶
Pass A — paragraphs only.
- §6 currently assumes the reader has absorbed the equivalence-proof conditions from §3. Fix in advance: open §6 with a one-sentence recap of the Gaussian-linear conditions and why the Gunsan record violates them.
- §7's synthetic-injection result (95 % detection at 0.2 %) will attract reviewer scepticism because the injection is linear. Fix in advance: the §8 limitations paragraph must not be the first place this limitation is disclosed — §7 must state it at the point of reporting the number.
Pass B — figures only.
- fig-roc-per-dataset is the hero figure but conveys the least specific information (four curves). The quantitative headline is in table-benchmark-grid. Consider: promote the table to in-text rather than end-of-section.
- fig-residual-autocorr is the diagnostic that explains cointegration's failure — it is the mechanism-exposing figure. Elevate: make sure this figure is prominent, not buried; it is what readers will remember as the reason cointegration loses.
What the two passes together surfaced¶
- Methodology and results are bound by the equivalence proof. Methodology's C2 sets up the ceiling; results' R2 shows the Gunsan record lives above that ceiling (cointegration fails). The pairing is tight: if the proof is revised, both §3 and §6 must update in lockstep. This is why the equivalence proof (V&V commission 6) is the first engineering task — it is upstream of half the manuscript.
- The counterexamples are load-bearing both theoretically and empirically. Methodology's C3 lists them abstractly; results' R1 and R4 demonstrate them concretely (regime-split ablation maps to the regime-shift counterexample; Gunsan's parked-state stationarity maps to the \(I(1)\) counterexample). A reviewer who accepts the counterexamples accepts the paper.
- Regime split is the most load-bearing design choice in the pipeline. It appears in C4 (methodology), R4 (results ablation), and C3a (counterexample) — three independent argumentative slots. If V fails reviewer scrutiny, this is the first place it will fail; if V succeeds, this is the first place it succeeds. The workflow makes the centrality visible.
- Engineering Structures target aligns with the argument shape. The paper reads as a mixed methodology+application work (theory proof + benchmark + field demonstration), which Engineering Structures accepts; it would be a worse fit for a pure-methodology journal (SHM SAGE, Smart Materials and Structures) where the 32-month field record becomes less central.
- Total figure count is 13 (seven methodology + six results), plus five tables. That is a substantial but defensible figure budget for a 14,000-word paper. Every figure traces to exactly one claim; no orphans.
Next step¶
The single highest-value action before drafting begins is V&V commission 6 — the equivalence-proof derivation. If the derivation goes through cleanly, the paper has its full intended scope. If it fails, the paper reframes to "four-method empirical benchmark" and the theoretical headline downgrades. That binary determines the remaining figure count, paragraph count, and roughly 30 % of the writing scope — so it should be answered first.