Synthesis Methodology¶
How 1,952 papers were compressed into one decision-ready knowledge map in a single session.
The problem¶
Two facts made a conventional linear reading impossible:
- Scale. The dissertation spans five research communities and covers ~2,000 papers from 1902 through 2026. Reading each paper takes 15–60 minutes. Linear reading would cost 6–12 months.
- Coupling. Every paper's contribution only makes sense against the rest of the field. Writing one paper's introduction requires knowing what the field already believes, debates, and leaves open — which is itself a property of the whole corpus, not of any one paper.
Conventional lit-review workflows (snowball sampling from seed papers, read-as-you-write) produce introductions that are defensive rather than offensive — they argue against what the student happens to have read, not against what the field actually is.
The approach: hierarchical divide-and-conquer¶
Parallel agents compress the corpus in four rounds. Each round's output becomes the next round's input. Each round reduces information by roughly an order of magnitude while preserving the claims that survive cross-referencing.
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":30,"rankSpacing":50,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
CORPUS[("<b>1,952 papers</b><br/><span style='font-size:14px'>literature_review/</span>")]:::corpus
subgraph R1 [" <b>ROUND 1 · parallel reading</b> "]
direction LR
A1[Agent 1<br/>~40 papers]
A2[Agent 2<br/>~40 papers]
A3[…45 agents…]
A44[Agent 44<br/>~40 papers]
A45[Agent 45<br/>~40 papers]
end
subgraph R2 [" <b>ROUND 2 · domain compression</b> "]
direction LR
D1[D1<br/>Geotechnical]
D2[D2<br/>SHM]
D3[D3<br/>Centrifuge]
D4[D4<br/>ML · decision]
D5[D5<br/>OWT scour]
end
subgraph R3 [" <b>ROUND 3 · master synthesis</b> "]
MM["<b>Master<br/>Knowledge<br/>Map</b>"]
end
subgraph R4 [" <b>ROUND 4 · per-paper gap claims</b> "]
direction LR
P1[J1] --- P2[J2] --- P3[J3] --- P4[J5] --- P5[J11]
P6[V1] --- P7[V2] --- P8[E] --- P9[A] --- P10[B] --- P11[Op3]
end
CORPUS ==> R1
R1 ==> R2
R2 ==> R3
R3 ==> R4
classDef corpus fill:#f4f4f4,stroke:#444,stroke-width:2px,color:#222
style R1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
style R2 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
style R3 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
style R4 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
The principle: most agents read ~40 items and none reads more than ~70, yet every claim in the final map is backed by at least three domain maps, and every domain map draws on all 45 batch summaries. The information is verified by redundancy across independent agents, not by any single agent's thoroughness.
Round 1 — parallel reading (45 agents)¶
- Scope. 1,952 papers sorted alphabetically, split into 9 batches (200 papers each for batches 01–08; 352 for batch 09), each batch further split into 5 agent slots (~40 papers per agent; ~70 for batch 09).
- Prompt pattern. Read all papers in your slot, extract core finding, tag by domain (geotechnical / SHM / ML / centrifuge / offshore wind / reliability / general-mechanics), note cross-references, produce one-page summary.
- Output. 45 Markdown files named `batch{N}_agent{M}.md` in `_shared/literature_summaries/`.
- Browse: Batch summaries index.
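The partitioning is mechanical enough to sketch. The helper below is illustrative only (`partition`, `BATCH_SIZE`, and `AGENTS_PER_BATCH` are not names from the actual pipeline); it assumes the corpus is a flat list of file names:

```python
BATCH_SIZE = 200        # planned batch size; a small trailing bucket is folded in
AGENTS_PER_BATCH = 5

def partition(papers: list[str]) -> list[list[list[str]]]:
    """Split an alphabetically sorted corpus into batches of agent slots."""
    papers = sorted(papers)
    batches = [papers[i:i + BATCH_SIZE] for i in range(0, len(papers), BATCH_SIZE)]
    # Fold a too-small trailing bucket into the previous batch -- this is
    # why the run produced 9 batches rather than the planned 10.
    if len(batches) > 1 and len(batches[-1]) < BATCH_SIZE:
        batches[-2].extend(batches.pop())
    slots = []
    for batch in batches:
        per_agent = -(-len(batch) // AGENTS_PER_BATCH)  # ceiling division
        slots.append([batch[i:i + per_agent]
                      for i in range(0, len(batch), per_agent)])
    return slots
```

Run on a 1,952-item list, this yields 9 batches of 5 slots each: ~40 papers per slot for batches 01–08 and ~70 for batch 09, matching the counts above.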
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
Corpus[("<b>1,952 papers</b><br/><span style='font-size:14px'>alphabetical sort</span>")]:::c
subgraph Part [" <b>9 partitions</b> "]
direction LR
B1[Batch 01<br/>files 1–200]:::b
B2[Batch 02<br/>files 201–400]:::b
B3[Batch 03<br/>files 401–600]:::b
B4[Batch 04<br/>files 601–800]:::b
B5[Batch 05<br/>files 801–1000]:::b
B6[Batch 06<br/>files 1001–1200]:::b
B7[Batch 07<br/>files 1201–1400]:::b
B8[Batch 08<br/>files 1401–1600]:::b
B9[Batch 09<br/>files 1601–1952]:::b
end
subgraph Reading [" <b>Each batch → 5 agents → 5 summaries</b> "]
direction LR
A1["5 agents<br/>~40 papers each"]:::a
A2["5 agents<br/>~70 papers each<br/>(batch 09)"]:::a
S1[5 summaries]:::s
S2[5 summaries]:::s
A1 --> S1
A2 --> S2
end
Corpus ==> Part
B1 & B2 & B3 & B4 & B5 & B6 & B7 & B8 --> A1
B9 --> A2
classDef c fill:#f4f4f4,stroke:#444,stroke-width:2px
classDef b fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
classDef a fill:#fff3e0,stroke:#e65100,color:#e65100
classDef s fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
style Part fill:#f0f7ff,stroke:#90caf9,stroke-dasharray:5 5
style Reading fill:#fff8ef,stroke:#ffcc80,stroke-dasharray:5 5
What each batch summary contains¶
- Paper inventory table — author, year, core finding, tags.
- Cross-references — which papers cite or build on which.
- Emergent themes for the batch.
- Gaps or contradictions noticed during reading.
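The inventory rows above can be modelled as a small record; the sketch below is a hypothetical schema (field names are illustrative, not the pipeline's actual format):

```python
from dataclasses import dataclass, field

@dataclass
class PaperRecord:
    """One row of a batch summary's paper inventory table (illustrative)."""
    author: str
    year: int
    core_finding: str
    tags: list[str] = field(default_factory=list)        # geotechnical / SHM / ML / ...
    cross_refs: list[str] = field(default_factory=list)  # papers it cites or builds on
    flags: list[str] = field(default_factory=list)       # e.g. "stub", "duplicate"
```

Keeping tags and flags as explicit fields is what lets Round 2 select by domain, and lets stub or duplicate flags propagate upward without re-reading.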
What went right¶
- Parallel execution completed the full corpus in roughly one session's worth of agent time rather than months of linear reading.
- Tag-based categorisation allowed Round 2 to select batches by domain rather than re-reading.
- Redundancy across agents caught contradictions (different summaries flagged the same paper differently — these flags propagated upward).
What went differently from plan¶
- Plan said 10 batches of ~200 files each; actual was 9 batches because the final file bucket was too small to warrant its own batch.
- Some "stub" files (Gibbs 1902, Feynman 1942 thesis) had no extractable content; these were flagged as `-- stub --` rather than summarised.
- Several Korean-translated duplicates (ISO 19901-4, etc.) were noted but not double-counted.
Round 2 — domain compression (5 agents)¶
- Scope. Each of 5 domain agents consumed all 45 batch summaries but wrote only the slice in its domain.
- Output. 5 files named `DOMAIN_{N}_{name}.md` in `_shared/literature_summaries/`.
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
subgraph Inputs [" <b>Input</b> "]
B["<b>45 batch summaries</b><br/><span style='font-size:14px'>all 45 read by every<br/>domain agent</span>"]:::in
end
subgraph Agents [" <b>5 domain agents (parallel)</b> "]
direction LR
A1[D1 agent<br/>Geotech]:::ag
A2[D2 agent<br/>SHM]:::ag
A3[D3 agent<br/>Centrifuge]:::ag
A4[D4 agent<br/>ML · decision]:::ag
A5[D5 agent<br/>OWT scour]:::ag
end
subgraph Outputs [" <b>5 domain maps</b> "]
direction LR
O1["<b>D1 · Geotechnical</b><br/>170 lines"]:::out
O2["<b>D2 · SHM</b><br/>194 lines"]:::out
O3["<b>D3 · Centrifuge</b><br/>199 lines"]:::out
O4["<b>D4 · ML · decision</b><br/>181 lines"]:::out
O5["<b>D5 · OWT scour</b><br/>204 lines"]:::out
end
B ==> A1 & A2 & A3 & A4 & A5
A1 ==> O1
A2 ==> O2
A3 ==> O3
A4 ==> O4
A5 ==> O5
classDef in fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
classDef ag fill:#fff,stroke:#555,stroke-width:1.5px
classDef out fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
style Inputs fill:#f0f7ff,stroke:#90caf9,stroke-dasharray:5 5
style Agents fill:#fafafa,stroke:#bdbdbd,stroke-dasharray:5 5
style Outputs fill:#f1f8e9,stroke:#aed581,stroke-dasharray:5 5
Structure of each domain map¶
- Established knowledge — consensus claims within the domain, with anchor citations.
- Active frontiers — what was advanced in 2023–2025.
- Open questions and debates — where the domain has not yet converged.
- Methods and tools — dominant techniques.
- Dissertation relevance — which of the 11 papers live in this domain and what each contributes.
Browse: D1 Geotechnical · D2 SHM · D3 Centrifuge · D4 ML and Decision · D5 Offshore Wind Scour.
Round 3 — master synthesis (1 agent)¶
- Scope. Single agent consumed all 5 domain maps and produced one integrated view.
- Output. `_shared/MASTER_KNOWLEDGE_MAP.md` — published here as Master Knowledge Map.
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
subgraph In [" <b>Input · 5 domain maps</b> "]
direction LR
D1[D1 · Geotech]:::d
D2[D2 · SHM]:::d
D3[D3 · Centrifuge]:::d
D4[D4 · ML]:::d
D5[D5 · OWT scour]:::d
end
MM["<b>Master Knowledge Map</b><br/><span style='font-size:14px'>single-agent synthesis</span>"]:::mm
subgraph Out [" <b>Six output sections</b> "]
direction LR
S1["<b>Field consensus</b><br/>10 claims"]:::s
S2["<b>Open debates</b><br/>5 controversies"]:::s
S3["<b>Verified gaps</b><br/>10 ranked"]:::s
S4["<b>Coverage matrix</b><br/>11 × 10"]:::s
S5["<b>Offensive framings</b><br/>11 sentences"]:::s
S6["<b>Ultimate questions</b><br/>11 RQs"]:::s
end
D1 & D2 & D3 & D4 & D5 ==> MM
MM ==> S1 & S2 & S3 & S4 & S5 & S6
classDef d fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
classDef mm fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#e65100
classDef s fill:#f3e5f5,stroke:#7b1fa2,stroke-width:1.5px,color:#4a148c
style In fill:#f1f8e9,stroke:#aed581,stroke-dasharray:5 5
style Out fill:#faf5ff,stroke:#e1bee7,stroke-dasharray:5 5
What the master map produces¶
- Paper index — dissertation portfolio table.
- Field consensus — 10 claims appearing in ≥3 of 5 domain maps.
- Field debates — 5 unresolved controversies.
- Verified gaps — 10 gaps confirmed across multiple domains, severity-ranked.
- PhD coverage map — 11-paper × 10-gap matrix.
- Offensive gap framings — one-sentence indictment per paper.
- Ultimate research questions — one unanswerable-by-current-literature question per paper.
- Cross-cutting synthesis — dissertation architecture (mechanics → modelling → monitoring → decision) and the single largest remaining vulnerability.
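The consensus filter above ("10 claims appearing in ≥3 of 5 domain maps") reduces to a count over claim occurrences. The sketch below is a simplification under one loud assumption: it matches claims as exact strings, whereas the actual synthesis matched them by agent judgment. All claim texts and map names are illustrative:

```python
from collections import Counter

CONSENSUS_THRESHOLD = 3  # a claim must appear in >= 3 of the 5 domain maps

def field_consensus(domain_maps: dict[str, set[str]]) -> list[str]:
    """Return claims independently asserted by at least 3 domain maps."""
    counts = Counter(claim for claims in domain_maps.values() for claim in claims)
    return sorted(c for c, n in counts.items() if n >= CONSENSUS_THRESHOLD)

# Illustrative inputs, not actual domain-map content.
maps = {
    "D1": {"scour reduces foundation stiffness", "CPT correlations dominate design"},
    "D2": {"scour reduces foundation stiffness", "natural frequency tracks scour depth"},
    "D3": {"scour reduces foundation stiffness"},
    "D4": {"natural frequency tracks scour depth"},
    "D5": {"scour reduces foundation stiffness", "natural frequency tracks scour depth"},
}
```

Here the stiffness claim (4 of 5 maps) and the frequency claim (3 of 5) survive, while the single-map CPT claim is dropped — the same redundancy test that distinguishes field consensus from a one-agent assertion.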
Round 4 — per-paper claim extraction (in progress)¶
- Scope. Each of the 11 paper agents consumes the master map plus its own paper's current introduction, and produces an enhanced gap claim — a paragraph that positions the paper against the field rather than against whatever the author happened to have read.
- Output. `_shared/ENHANCED_GAP_CLAIMS.md` — one section per paper.
- Status. Completed for J1, J2, J3, V1. The remaining seven are being generated during the current writing sprint.
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
MM["<b>Master Map</b><br/><span style='font-size:14px'>consensus · debates<br/>gaps · coverage</span>"]:::mm
PI["<b>Paper's current<br/>introduction</b><br/><span style='font-size:14px'>manuscript.qmd</span>"]:::pi
PA["<b>Paper agent</b><br/><span style='font-size:14px'>one of 11</span>"]:::pa
EGC["<b>Enhanced gap claim</b><br/><span style='font-size:14px'>~1 paragraph</span>"]:::egc
RL["<b>Reading list</b><br/><span style='font-size:14px'>top-5 citations</span>"]:::rl
INTRO["<b>Paper's intro opener</b><br/><span style='font-size:14px'>updated in place</span>"]:::intro
MM ==> PA
PI ==> PA
PA ==> EGC
PA ==> RL
EGC -. <b>replaces</b> .-> INTRO
classDef mm fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#e65100
classDef pi fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
classDef pa fill:#fff,stroke:#333,stroke-width:2px
classDef egc fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c
classDef rl fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
classDef intro fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#880e4f
The enhanced gap claim replaces the paper's existing introduction opener. The effect is that the same sentence ("the field has failed to do X") appears in both the master map and the paper, making the paper's contribution traceable to the synthesis rather than to the author's memory.
Provenance chain¶
Every claim in a paper's introduction can be traced back to its source:
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
P["<b>Paper intro claim</b><br/><span style='font-size:14px'>e.g. 'the field has failed to X'</span>"]:::p
MM["<b>Master map · consensus #N</b><br/><span style='font-size:14px'>MASTER_KNOWLEDGE_MAP.md</span>"]:::m
D["<b>Domain map · section X.Y</b><br/><span style='font-size:14px'>DOMAIN_{n}.md</span>"]:::d
B["<b>Batch summary · agent M</b><br/><span style='font-size:14px'>batchNN_agentM.md</span>"]:::b
O["<b>Original paper</b><br/><span style='font-size:14px'>literature_review/*.md</span>"]:::o
P == "cites" ==> MM
MM == "synthesises" ==> D
D == "compresses" ==> B
B == "reads" ==> O
classDef p fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c
classDef m fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#e65100
classDef d fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
classDef b fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
classDef o fill:#f4f4f4,stroke:#444,stroke-width:2px,color:#333
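The chain can also be walked programmatically. The sketch below models each layer as a record pointing at its source; the `Node` type and every file reference in the example are hypothetical, not the project's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    layer: str                     # "intro", "master", "domain", "batch", "paper"
    ref: str                       # file (and section) the claim lives in
    source: Optional["Node"] = None

def provenance(claim: Node) -> list[str]:
    """Walk a claim down to its original paper, outermost layer first."""
    trail, node = [], claim
    while node is not None:
        trail.append(f"{node.layer}: {node.ref}")
        node = node.source
    return trail

# Illustrative chain only -- these names are placeholders.
paper  = Node("paper",  "literature_review/example_paper.md")
batch  = Node("batch",  "batch03_agent2.md", paper)
domain = Node("domain", "DOMAIN_5_example.md", batch)
master = Node("master", "MASTER_KNOWLEDGE_MAP.md", domain)
claim  = Node("intro",  "J5 manuscript opening claim", master)
```

`provenance(claim)` returns the five-step trail from intro claim to original paper, which is exactly the audit path the diagram above depicts.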
Why this worked¶
The four-round structure has the same information-theoretic property as a hierarchical reduction tree:
- Each round reduces the information to its invariants under independent re-readings.
- A claim that survives three domain maps is much more reliable than a claim asserted once in a linear review.
- The dissertation introduction now has a provenance for every claim it makes: master map → domain map → batch summary → original paper.
The cost is agent time and disk space. The `_shared/literature_summaries/` directory carries ~50 Markdown files totalling under a megabyte — a small price for a traceable, auditable lit review.
What is not in the synthesis¶
- Korean-language literature that is not also available in English is underrepresented; only papers already in the internal digested-literature store were included.
- Conference proceedings without associated journal papers (ISFOG, Offshore Site Investigation, etc.) are captured where they exist in the store but are not exhaustive.
- Papers published after 2026-04-17 (the synthesis date) are not in the knowledge map. The planned regeneration cadence is quarterly.