
Overview

vectorless-compiler is a Rust crate that compiles documents (Markdown, PDF) into agent-friendly intermediate artifacts. It follows a traditional compiler architecture, but instead of compiling source code into machine code, it compiles documents into structured trees, symbol tables, and navigation indexes.

Compiler Analogy

The main stages and artifacts of a traditional compiler each have a counterpart in this crate:

| Compiler Concept | Vectorless Equivalent | What It Does |
|---|---|---|
| Source code | PDF / Markdown / bytes | Raw input |
| Lexer | Markdown / PDF parser | Breaks document into nodes |
| AST | DocumentTree | Hierarchical data structure |
| Semantic analysis | Validate + Enhance (LLM) | Enriches semantic information |
| IR generation | Split + Enrich | Optimizes intermediate representation |
| Code generation | Reasoning / Navigation index | Generates lookup indexes |
| Symbol table | ReasoningIndex | name → location mapping |
| Debug info | NavigationIndex | Runtime navigation data |
| Linker | RoutePass / ChainPass | Pre-computed routing + reasoning chains |
| Dead code elimination | OverlapPass | Detects duplicate content regions |
| Optimization hints | ScorePass | Evidence quality scoring per node |
| Object file | PersistedDocument | Serialized to disk |
| Incremental compilation | Fingerprint + incremental | Only recompiles changed parts |
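
The last row, fingerprint-based incremental compilation, can be illustrated with a minimal sketch. It does not use the crate's API: `fingerprint`, `Action`, and `plan_actions` are hypothetical names, and hashing section text with the standard library's `DefaultHasher` is an assumption made for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical fingerprint: a 64-bit hash of a section's text.
/// `DefaultHasher` output is not guaranteed stable across Rust releases,
/// so a fingerprint persisted to disk would need a fixed hash function.
fn fingerprint(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

/// What to do with a section on recompilation (illustrative names).
#[derive(Debug, PartialEq, Eq)]
enum Action {
    Reuse,
    Recompile,
}

/// Compare fingerprints stored by the previous run against the current
/// section texts. New or changed sections are marked for recompilation.
fn plan_actions<'a>(
    previous: &HashMap<String, u64>,
    current: &[(&'a str, &'a str)], // (section id, section text)
) -> Vec<(&'a str, Action)> {
    current
        .iter()
        .map(|&(id, text)| {
            let action = match previous.get(id) {
                Some(&old) if old == fingerprint(text) => Action::Reuse,
                _ => Action::Recompile,
            };
            (id, action)
        })
        .collect()
}
```

Given stored fingerprints for two sections and a new document in which one section changed and a third was added, `plan_actions` marks only the unchanged section as `Reuse`.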

Architecture

The pipeline is organized into four phases, each containing one or more passes:

Frontend   10: Parse      → Break document into raw nodes
Frontend   20: Build      → Construct tree + apply thinning
Analysis   22: Validate   → Tree integrity checks (optional)
Transform  25: Split      → Break oversized leaf nodes (optional)
Analysis   30: Enhance    → LLM summaries (optional)
Transform  40: Enrich     → Metadata + cross-references
Backend    45: Reasoning  → Keyword→path symbol table
Backend    47: Concept    → Key concept extraction (optional)
Backend    50: Navigation → Runtime navigation index
Backend    52: Route      → Query routing table (optional)
Backend    54: Chain      → Reasoning chain index (optional)
Backend    56: Overlap    → Content overlap detection (optional)
Backend    58: Score      → Evidence quality scoring (optional)
Backend    55: Verify     → Output validation
Backend    60: Optimize   → Final tree optimization
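
To make the "Keyword→path symbol table" produced by the Reasoning pass more concrete, here is a small sketch of such a mapping. It is not the crate's `ReasoningIndex`: the `SymbolTable` type, its methods, the case-insensitive matching, and the slash-separated path strings are all assumptions chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Illustrative stand-in for a keyword -> node-path symbol table.
#[derive(Default)]
struct SymbolTable {
    entries: BTreeMap<String, Vec<String>>,
}

impl SymbolTable {
    /// Record that `keyword` occurs at `path`. Keywords are lowercased
    /// so lookups are case-insensitive; duplicate paths are ignored.
    fn insert(&mut self, keyword: &str, path: &str) {
        let paths = self.entries.entry(keyword.to_lowercase()).or_default();
        if !paths.iter().any(|p| p == path) {
            paths.push(path.to_string());
        }
    }

    /// Return every path recorded for `keyword`, in insertion order,
    /// or an empty slice if the keyword is unknown.
    fn lookup(&self, keyword: &str) -> &[String] {
        self.entries
            .get(&keyword.to_lowercase())
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

An agent holding such a table can jump from a query term straight to candidate tree nodes without scanning the document, which is the role the analogy assigns to a symbol table.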

Each pass is an independent unit that declares its dependencies and access patterns. The orchestrator resolves the dependency graph, groups independent passes for parallel execution, and handles failures with configurable policies.
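
One way to picture the dependency resolution described above is to group passes into stages, where every pass in a stage has all of its dependencies satisfied by earlier stages. The sketch below is not the crate's orchestrator: `schedule` is a hypothetical function, and the dependency edges used with it are assumed for illustration. Failure policies are left out.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group passes into stages. Passes within one stage have no unmet
/// dependencies and could run in parallel. Returns an error naming the
/// stuck passes if there is a cycle or a dependency on an unknown pass.
fn schedule<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<Vec<String>>, String> {
    let mut done: BTreeSet<&str> = BTreeSet::new();
    let mut stages: Vec<Vec<String>> = Vec::new();

    while done.len() < deps.len() {
        // Every not-yet-scheduled pass whose dependencies are all done.
        let ready: Vec<&str> = deps
            .iter()
            .filter(|(name, needs)| {
                !done.contains(*name) && needs.iter().all(|d| done.contains(d))
            })
            .map(|(name, _)| *name)
            .collect();

        if ready.is_empty() {
            // Nothing can make progress: report what is left over.
            let stuck: Vec<&str> = deps
                .keys()
                .copied()
                .filter(|n| !done.contains(n))
                .collect();
            return Err(format!("unresolvable dependencies: {}", stuck.join(", ")));
        }

        done.extend(ready.iter().copied());
        stages.push(ready.into_iter().map(String::from).collect());
    }
    Ok(stages)
}
```

With assumed edges parse → build → enrich, and both reasoning and navigation depending on enrich, the two backend passes land in the same stage and are candidates for parallel execution. A `BTreeMap` keeps the order inside each stage deterministic.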

Module Structure

vectorless-compiler/src/
├── config.rs          PipelineOptions, SourceFormat, ThinningConfig
├── parse/             Document parsers (Markdown, PDF)
├── pipeline/          Executor, orchestrator, context, checkpoint
├── passes/
│   ├── frontend/      ParsePass, BuildPass
│   ├── analysis/      ValidatePass, EnhancePass
│   ├── transform/     SplitPass, EnrichPass
│   └── backend/       ReasoningPass, ConceptPass, NavigationPass,
│                      RoutePass, ChainPass, OverlapPass, ScorePass,
│                      VerifyPass, OptimizePass
├── summary/           Summary strategies (Full, Selective, Lazy)
└── incremental/       Change detection, action resolution, tree update

Quick Example

use vectorless_compiler::{PipelineExecutor, PipelineOptions};
use vectorless_compiler::pipeline::CompilerInput;

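// This snippet runs inside an async function that returns a Result.
// `llm_client` is an LLM client constructed elsewhere.

// Create executor with LLM enhancement
let executor = PipelineExecutor::with_llm(llm_client);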

// Compile a document
let input = CompilerInput::file("./report.pdf");
let options = PipelineOptions::default();
let result = executor.execute(input, options).await?;

// Access outputs
let tree = result.tree.expect("tree must exist");
let reasoning = result.reasoning_index;
let navigation = result.navigation_index;
let routes = result.query_routes; // Agent acceleration
let chains = result.chain_index; // Cross-section reasoning
let overlaps = result.content_overlap; // Dedup hints
let scores = result.evidence_scores; // Priority scoring