Synthesis Methodology

How 1,952 papers were compressed into one decision-ready knowledge map in a single session.

The problem

Two facts made a conventional linear reading impossible:

  1. Scale. The dissertation spans five research communities and covers ~2,000 papers from 1902 through 2026. Reading each paper takes 15–60 minutes. Linear reading would cost 6–12 months.
  2. Coupling. Every paper's contribution only makes sense against the rest of the field. Writing one paper's introduction requires knowing what the field already believes, debates, and leaves open — which is itself a property of the whole corpus, not of any one paper.

Conventional lit-review workflows (snowball sampling from seed papers, read-as-you-write) produce introductions that are defensive rather than offensive — they argue against what the student happens to have read, not against what the field actually is.

The approach: hierarchical divide-and-conquer

Parallel agents compress the corpus in four rounds. Each round's output becomes the next round's input. Each round reduces information by roughly an order of magnitude while preserving the claims that survive cross-referencing.

%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":30,"rankSpacing":50,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
    CORPUS[("<b>1,952 papers</b><br/><span style='font-size:14px'>literature_review/</span>")]:::corpus

    subgraph R1 ["&nbsp;<b>ROUND 1 · parallel reading</b>&nbsp;"]
        direction LR
        A1[Agent 1<br/>~40 papers]
        A2[Agent 2<br/>~40 papers]
        A3[…45 agents…]
        A44[Agent 44<br/>~40 papers]
        A45[Agent 45<br/>~40 papers]
    end

    subgraph R2 ["&nbsp;<b>ROUND 2 · domain compression</b>&nbsp;"]
        direction LR
        D1[D1<br/>Geotechnical]
        D2[D2<br/>SHM]
        D3[D3<br/>Centrifuge]
        D4[D4<br/>ML · decision]
        D5[D5<br/>OWT scour]
    end

    subgraph R3 ["&nbsp;<b>ROUND 3 · master synthesis</b>&nbsp;"]
        MM["<b>Master<br/>Knowledge<br/>Map</b>"]
    end

    subgraph R4 ["&nbsp;<b>ROUND 4 · per-paper gap claims</b>&nbsp;"]
        direction LR
        P1[J1] --- P2[J2] --- P3[J3] --- P4[J5] --- P5[J11]
        P6[V1] --- P7[V2] --- P8[E] --- P9[A] --- P10[B] --- P11[Op3]
    end

    CORPUS ==> R1
    R1 ==> R2
    R2 ==> R3
    R3 ==> R4

    classDef corpus fill:#f4f4f4,stroke:#444,stroke-width:2px,color:#222
    style R1 fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style R2 fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style R3 fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style R4 fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px

The principle: no single agent reads more than a bounded slice (~40 papers in most Round 1 slots, ~70 in batch 09; 45 one-page summaries in Round 2), yet every claim in the final map is backed by at least three domain maps, and every domain map is backed by all nine batches. The information is verified by redundancy across independent agents, not by any single agent's thoroughness.

Round 1 — parallel reading (45 agents)

  • Scope. 1,952 papers sorted alphabetically, split into 9 batches (eight of 200 papers plus a final batch of 352), each further split into 5 agent slots of ~40 papers (~70 in batch 09).
  • Prompt pattern. Read all papers in your slot, extract core finding, tag by domain (geotechnical / SHM / ML / centrifuge / offshore wind / reliability / general-mechanics), note cross-references, produce one-page summary.
  • Output. 45 Markdown files named batch{N}_agent{M}.md in _shared/literature_summaries/.
  • Browse: Batch summaries index.
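
The batching arithmetic above can be sketched in a few lines. This is a reconstruction from the counts reported on this page; the function name and the remainder-folding rule are inferred, not taken from the actual pipeline:

```python
def assign_slots(papers, batch_size=200, agents_per_batch=5):
    """Split an alphabetically sorted corpus into batches, then agent slots.

    The remainder is folded into the final batch rather than opening an
    undersized one: 1,952 files -> 8 batches of 200 plus batch 09 with 352.
    """
    n_batches = len(papers) // batch_size                  # 1952 // 200 = 9
    batches = [papers[i * batch_size:(i + 1) * batch_size]
               for i in range(n_batches - 1)]
    batches.append(papers[(n_batches - 1) * batch_size:])  # batch 09: files 1601-1952
    slots = {}
    for b, batch in enumerate(batches, start=1):
        per_agent = -(-len(batch) // agents_per_batch)     # ceiling division
        for a in range(agents_per_batch):
            slots[f"batch{b:02d}_agent{a + 1}"] = batch[a * per_agent:(a + 1) * per_agent]
    return slots

slots = assign_slots([f"{i:04d}.md" for i in range(1, 1953)])
# 45 slots: 40 papers per slot in batches 01-08, ~70 per slot in batch 09.
```

The slot keys mirror the batch{N}_agent{M}.md output naming, so each summary file maps one-to-one onto the slice of the corpus it covers.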
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
    Corpus[("<b>1,952 papers</b><br/><span style='font-size:14px'>alphabetical sort</span>")]:::c

    subgraph Part ["&nbsp;<b>9 partitions</b>&nbsp;"]
        direction LR
        B1[Batch 01<br/>files 1–200]:::b
        B2[Batch 02<br/>files 201–400]:::b
        B3[Batch 03<br/>files 401–600]:::b
        B4[Batch 04<br/>files 601–800]:::b
        B5[Batch 05<br/>files 801–1000]:::b
        B6[Batch 06<br/>files 1001–1200]:::b
        B7[Batch 07<br/>files 1201–1400]:::b
        B8[Batch 08<br/>files 1401–1600]:::b
        B9[Batch 09<br/>files 1601–1952]:::b
    end

    subgraph Reading ["&nbsp;<b>Each batch → 5 agents → 5 summaries</b>&nbsp;"]
        direction LR
        A1["5 agents<br/>~40 papers each"]:::a
        A2["5 agents<br/>~70 papers each<br/>(batch 09)"]:::a
        S1[5 summaries]:::s
        S2[5 summaries]:::s
        A1 --> S1
        A2 --> S2
    end

    Corpus ==> Part
    B1 & B2 & B3 & B4 & B5 & B6 & B7 & B8 --> A1
    B9 --> A2

    classDef c fill:#f4f4f4,stroke:#444,stroke-width:2px
    classDef b fill:#e3f2fd,stroke:#1565c0,color:#0d47a1
    classDef a fill:#fff3e0,stroke:#e65100,color:#e65100
    classDef s fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20

    style Part fill:#f0f7ff,stroke:#90caf9,stroke-dasharray:5 5
    style Reading fill:#fff8ef,stroke:#ffcc80,stroke-dasharray:5 5

What each batch summary contains

  1. Paper inventory table — author, year, core finding, tags.
  2. Cross-references — which papers cite or build on which.
  3. Emergent themes for the batch.
  4. Gaps or contradictions noticed during reading.
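
A minimal sketch of the record behind item 1, assuming one flat row per paper; the field names are illustrative, not the pipeline's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class InventoryRow:
    """One row of a batch summary's paper inventory (illustrative schema)."""
    author: str
    year: int
    core_finding: str
    tags: list[str] = field(default_factory=list)        # e.g. ["SHM", "centrifuge"]
    cross_refs: list[str] = field(default_factory=list)  # papers cited or built on

# Stub files carry a marker in place of a finding rather than a summary:
row = InventoryRow(author="Gibbs", year=1902, core_finding="-- stub --",
                   tags=["general-mechanics"])
```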

What went right

  • Parallel execution completed the full corpus in roughly one session's worth of agent time rather than months of linear reading.
  • Tag-based categorisation allowed Round 2 to select batches by domain rather than re-reading.
  • Redundancy across agents caught contradictions (different summaries flagged the same paper differently — these flags propagated upward).

What went differently from plan

  • The plan called for 10 batches of ~200 files each; the actual run used 9 because the final bucket (152 files) was too small to warrant its own batch and was folded into batch 09.
  • Some "stub" files (Gibbs 1902, Feynman 1942 thesis) had no extractable content — flagged as -- stub -- rather than summarised.
  • Several Korean-translated duplicates (ISO 19901-4, etc.) were noted but not double-counted.

Round 2 — domain compression (5 agents)

  • Scope. Each of the 5 domain agents read all 45 batch summaries but wrote up only the material belonging to its own domain.
  • Output. 5 files DOMAIN_{N}_{name}.md in _shared/literature_summaries/.
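
The read-everything, write-one-slice pattern amounts to a tag filter over the batch summaries. A sketch, with an illustrative entry shape and function name:

```python
def domain_slice(batch_summaries, domain):
    """Each Round 2 agent reads every entry in all 45 batch summaries but
    keeps only those tagged with its own domain."""
    return [entry
            for summary in batch_summaries
            for entry in summary
            if domain in entry["tags"]]

# Toy input: two batch summaries, each a list of tagged entries.
summaries = [
    [{"paper": "A", "tags": ["SHM", "ML"]}, {"paper": "B", "tags": ["geotechnical"]}],
    [{"paper": "C", "tags": ["SHM"]}],
]
shm = domain_slice(summaries, "SHM")  # keeps A and C, drops B
```

This is why the Round 1 tag vocabulary mattered: Round 2 never re-reads papers, it only filters and compresses what Round 1 already tagged.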
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
    subgraph Inputs ["&nbsp;<b>Input</b>&nbsp;"]
        B["<b>45 batch summaries</b><br/><span style='font-size:14px'>all 45 read by every<br/>domain agent</span>"]:::in
    end

    subgraph Agents ["&nbsp;<b>5 domain agents (parallel)</b>&nbsp;"]
        direction LR
        A1[D1 agent<br/>Geotech]:::ag
        A2[D2 agent<br/>SHM]:::ag
        A3[D3 agent<br/>Centrifuge]:::ag
        A4[D4 agent<br/>ML · decision]:::ag
        A5[D5 agent<br/>OWT scour]:::ag
    end

    subgraph Outputs ["&nbsp;<b>5 domain maps</b>&nbsp;"]
        direction LR
        O1["<b>D1 · Geotechnical</b><br/>170 lines"]:::out
        O2["<b>D2 · SHM</b><br/>194 lines"]:::out
        O3["<b>D3 · Centrifuge</b><br/>199 lines"]:::out
        O4["<b>D4 · ML · decision</b><br/>181 lines"]:::out
        O5["<b>D5 · OWT scour</b><br/>204 lines"]:::out
    end

    B ==> A1 & A2 & A3 & A4 & A5
    A1 ==> O1
    A2 ==> O2
    A3 ==> O3
    A4 ==> O4
    A5 ==> O5

    classDef in fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
    classDef ag fill:#fff,stroke:#555,stroke-width:1.5px
    classDef out fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20

    style Inputs fill:#f0f7ff,stroke:#90caf9,stroke-dasharray:5 5
    style Agents fill:#fafafa,stroke:#bdbdbd,stroke-dasharray:5 5
    style Outputs fill:#f1f8e9,stroke:#aed581,stroke-dasharray:5 5

Structure of each domain map

  1. Established knowledge — consensus claims within the domain, with anchor citations.
  2. Active frontiers — what the domain has advanced in 2023–2025.
  3. Open questions and debates — where the domain has not yet converged.
  4. Methods and tools — dominant techniques.
  5. Dissertation relevance — which of the 11 papers lives in this domain and what each contributes.

Browse: D1 Geotechnical · D2 SHM · D3 Centrifuge · D4 ML and Decision · D5 Offshore Wind Scour.

Round 3 — master synthesis (1 agent)

  • Scope. Single agent consumed all 5 domain maps and produced one integrated view.
  • Output. _shared/MASTER_KNOWLEDGE_MAP.md — published here as Master Knowledge Map.
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
    subgraph In ["&nbsp;<b>Input · 5 domain maps</b>&nbsp;"]
        direction LR
        D1[D1 · Geotech]:::d
        D2[D2 · SHM]:::d
        D3[D3 · Centrifuge]:::d
        D4[D4 · ML]:::d
        D5[D5 · OWT scour]:::d
    end

    MM["<b>Master Knowledge Map</b><br/><span style='font-size:14px'>single-agent synthesis</span>"]:::mm

    subgraph Out ["&nbsp;<b>Six output sections</b>&nbsp;"]
        direction LR
        S1["<b>Field consensus</b><br/>10 claims"]:::s
        S2["<b>Open debates</b><br/>5 controversies"]:::s
        S3["<b>Verified gaps</b><br/>10 ranked"]:::s
        S4["<b>Coverage matrix</b><br/>11 × 10"]:::s
        S5["<b>Offensive framings</b><br/>11 sentences"]:::s
        S6["<b>Ultimate questions</b><br/>11 RQs"]:::s
    end

    D1 & D2 & D3 & D4 & D5 ==> MM
    MM ==> S1 & S2 & S3 & S4 & S5 & S6

    classDef d fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
    classDef mm fill:#fff3e0,stroke:#e65100,stroke-width:3px,color:#e65100
    classDef s fill:#f3e5f5,stroke:#7b1fa2,stroke-width:1.5px,color:#4a148c

    style In fill:#f1f8e9,stroke:#aed581,stroke-dasharray:5 5
    style Out fill:#faf5ff,stroke:#e1bee7,stroke-dasharray:5 5

What the master map produces

  1. Paper index — dissertation portfolio table.
  2. Field consensus — 10 claims appearing in ≥3 of 5 domain maps.
  3. Field debates — 5 unresolved controversies.
  4. Verified gaps — 10 gaps confirmed across multiple domains, severity-ranked.
  5. PhD coverage map — 11-paper × 10-gap matrix.
  6. Offensive gap framings — one-sentence indictment per paper.
  7. Ultimate research questions — one unanswerable-by-current-literature question per paper.
  8. Cross-cutting synthesis — dissertation architecture (mechanics → modelling → monitoring → decision) and the single largest remaining vulnerability.
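
The ≥3-of-5 rule behind the field-consensus section can be sketched as a vote count over normalised claim strings. The claim keys and function name here are illustrative; the real maps carry full sentences with anchor citations:

```python
from collections import Counter

def field_consensus(domain_maps, threshold=3):
    """Keep claims asserted independently by at least `threshold` domain maps."""
    votes = Counter(claim
                    for claims in domain_maps.values()
                    for claim in set(claims))  # at most one vote per map
    return {claim for claim, n in votes.items() if n >= threshold}

maps = {
    "D1": ["scour reduces foundation stiffness", "monopiles dominate installations"],
    "D2": ["scour reduces foundation stiffness", "labelled damage data is scarce"],
    "D3": ["scour reduces foundation stiffness"],
    "D4": ["labelled damage data is scarce"],
    "D5": ["monopiles dominate installations"],
}
consensus = field_consensus(maps)  # only the 3-of-5 claim survives
```

The same counting, inverted (claims asserted in some maps and contradicted in others), yields the debates section.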

Round 4 — per-paper claim extraction (in progress)

  • Scope. Each of the 11 paper agents consumes the master map plus its own paper's current introduction, and produces an enhanced gap claim — a paragraph that positions the paper against the field rather than against whatever the author happened to have read.
  • Output. _shared/ENHANCED_GAP_CLAIMS.md — one section per paper.
  • Status. Completed for J1, J2, J3, V1; the remaining seven are being generated during the current writing sprint.
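
One plausible shape for assembling the Round 4 input, assuming the file paths named on this page; the intro-extraction heuristic (everything before the first level-2 heading) and the prompt wording are illustrative:

```python
from pathlib import Path

def build_gap_claim_input(master_map: Path, manuscript: Path) -> str:
    """Concatenate the master map with the paper's current introduction."""
    master = master_map.read_text(encoding="utf-8")
    # Heuristic: the introduction is the text before the first "## " heading.
    intro = manuscript.read_text(encoding="utf-8").split("\n## ", 1)[0]
    return ("Position this paper against the field, not against what its "
            "author happens to have read.\n\n"
            f"=== MASTER KNOWLEDGE MAP ===\n{master}\n\n"
            f"=== CURRENT INTRODUCTION ===\n{intro}\n")
```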
%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
    MM["<b>Master Map</b><br/><span style='font-size:14px'>consensus · debates<br/>gaps · coverage</span>"]:::mm
    PI["<b>Paper's current<br/>introduction</b><br/><span style='font-size:14px'>manuscript.qmd</span>"]:::pi

    PA["<b>Paper agent</b><br/><span style='font-size:14px'>one of 11</span>"]:::pa

    EGC["<b>Enhanced gap claim</b><br/><span style='font-size:14px'>~1 paragraph</span>"]:::egc
    RL["<b>Reading list</b><br/><span style='font-size:14px'>top-5 citations</span>"]:::rl
    INTRO["<b>Paper's intro opener</b><br/><span style='font-size:14px'>updated in place</span>"]:::intro

    MM ==> PA
    PI ==> PA
    PA ==> EGC
    PA ==> RL
    EGC -. <b>replaces</b> .-> INTRO

    classDef mm fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#e65100
    classDef pi fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
    classDef pa fill:#fff,stroke:#333,stroke-width:2px
    classDef egc fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c
    classDef rl fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
    classDef intro fill:#fce4ec,stroke:#c2185b,stroke-width:2px,color:#880e4f

The enhanced gap claim replaces the paper's existing introduction opener. The effect is that the same sentence ("the field has failed to do X") appears in both the master map and the paper, making the paper's contribution traceable to the synthesis rather than to the author's memory.

Provenance chain

Every claim in a paper's introduction can be traced back to its source:

%%{init: {"theme":"base","themeVariables":{"fontSize":"16px","fontFamily":"Inter, system-ui, -apple-system, sans-serif","primaryTextColor":"#1a1a1a","lineColor":"#666"},"flowchart":{"nodeSpacing":35,"rankSpacing":55,"padding":16,"useMaxWidth":true}}}%%
flowchart TB
    P["<b>Paper intro claim</b><br/><span style='font-size:14px'>e.g. 'the field has failed to X'</span>"]:::p
    MM["<b>Master map · consensus #N</b><br/><span style='font-size:14px'>MASTER_KNOWLEDGE_MAP.md</span>"]:::m
    D["<b>Domain map · section X.Y</b><br/><span style='font-size:14px'>DOMAIN_{n}.md</span>"]:::d
    B["<b>Batch summary · agent M</b><br/><span style='font-size:14px'>batchNN_agentM.md</span>"]:::b
    O["<b>Original paper</b><br/><span style='font-size:14px'>literature_review/*.md</span>"]:::o

    P == "cites" ==> MM
    MM == "synthesises" ==> D
    D == "compresses" ==> B
    B == "reads" ==> O

    classDef p fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#4a148c
    classDef m fill:#fff3e0,stroke:#e65100,stroke-width:2px,color:#e65100
    classDef d fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px,color:#1b5e20
    classDef b fill:#e3f2fd,stroke:#1565c0,stroke-width:2px,color:#0d47a1
    classDef o fill:#f4f4f4,stroke:#444,stroke-width:2px,color:#333

Why this worked

The four-round structure earns its reliability the way repeated independent measurements do:

  • Each round reduces the information to its invariants under independent re-readings.
  • A claim that survives three domain maps is far more reliable than a claim asserted once in a linear review.
  • The dissertation introduction now has a provenance for every claim it makes: master map → domain map → batch summary → original paper.
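
That provenance chain is just a walk over derived-from links. A sketch, with an illustrative link table and artefact names (not the actual index):

```python
def trace(artefact, derived_from):
    """Follow derived-from links until reaching an artefact with no source,
    i.e. an original paper under literature_review/."""
    chain = [artefact]
    while chain[-1] in derived_from:
        chain.append(derived_from[chain[-1]])
    return chain

derived_from = {
    "J2 intro, gap claim": "MASTER_KNOWLEDGE_MAP.md#consensus-4",
    "MASTER_KNOWLEDGE_MAP.md#consensus-4": "DOMAIN_2_SHM.md#2.3",
    "DOMAIN_2_SHM.md#2.3": "batch03_agent2.md",
    "batch03_agent2.md": "literature_review/example_paper.md",
}
chain = trace("J2 intro, gap claim", derived_from)  # ends at the original paper
```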

The cost is agent time and disk space. The _shared/literature_summaries/ directory carries ~50 Markdown files totalling under a megabyte — a small price for a traceable, auditable lit review.

What is not in the synthesis

  • Korean-language literature that is not also available in English is underrepresented; only papers already in the internal digested-literature store were included.
  • Conference proceedings without associated journal papers (ISFOG, Offshore Site Investigation, etc.) are captured where they exist in the store, but coverage is not exhaustive.
  • Papers published after 2026-04-17 (the synthesis date) are not in the knowledge map. The planned regeneration cadence is quarterly.