This document explains the architecture and operational principles of the incremental indexing system in RagCode.
Incremental indexing allows RagCode to update the search index efficiently by processing only the files that have changed since the last indexing run. This significantly reduces the time and computational resources required to keep the knowledge base up-to-date.
The system relies on three main concepts:
The state of the workspace is persisted in a JSON file located at .ragcode/state.json within the workspace root.
Structure:
{
  "files": {
    "/path/to/file.go": {
      "mod_time": "2023-10-27T10:00:00Z",
      "size": 1024
    }
  },
  "last_indexed": "2023-10-27T10:05:00Z"
}
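As a rough illustration, the structure above could map onto Go types like the following. The type and function names (FileState, WorkspaceState, parseState) are assumptions made for this sketch; only the JSON keys are taken from the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// FileState mirrors one entry under "files" in state.json.
// The type name is hypothetical; the JSON keys come from the example above.
type FileState struct {
	ModTime time.Time `json:"mod_time"` // RFC 3339 timestamp in the JSON
	Size    int64     `json:"size"`     // file size in bytes
}

// WorkspaceState mirrors the top-level object in state.json (hypothetical name).
type WorkspaceState struct {
	Files       map[string]FileState `json:"files"`
	LastIndexed time.Time            `json:"last_indexed"`
}

// parseState decodes the raw bytes of a state file.
func parseState(data []byte) (WorkspaceState, error) {
	var s WorkspaceState
	if err := json.Unmarshal(data, &s); err != nil {
		return WorkspaceState{}, fmt.Errorf("decode state: %w", err)
	}
	return s, nil
}

func main() {
	raw := []byte(`{
  "files": {
    "/path/to/file.go": {"mod_time": "2023-10-27T10:00:00Z", "size": 1024}
  },
  "last_indexed": "2023-10-27T10:05:00Z"
}`)
	state, err := parseState(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for path, entry := range state.Files {
		fmt.Printf("%s: %d bytes, modified %s\n",
			path, entry.Size, entry.ModTime.Format(time.RFC3339))
	}
}
```

Keying the map by file path keeps the lookup for a given file cheap when the current scan is later compared against the saved state.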
Incremental indexing can be triggered in two ways:
- Manually, through the index_workspace tool or the index-all CLI utility.
- Automatically, when any tool (search_code, get_function_details, find_type_definition, etc.) accesses an already indexed workspace. The Manager detects the collection, runs checkAndReindexIfNeeded in a goroutine, and if changes are detected, starts IndexLanguage in the background without blocking the agent’s response.

The diagram below describes the common flow used in both scenarios:
graph TD
A[Start Indexing] --> B{Collection Exists?}
B -- No --> C[Full Indexing]
B -- Yes --> D["Load State (.ragcode/state.json)"]
D --> E[Scan Current Files]
E --> F{Compare with State}
F -->|New/Modified| G[Add to Index List]
F -->|Deleted/Modified| H[Add to Delete List]
F -->|Unchanged| I[Ignore]
H --> J[Delete Old Chunks from Qdrant]
G --> K[Index New Content]
J --> L[Update State]
K --> L
L --> M[Save State]
M --> N[Finish]
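The automatic trigger, where a tool call returns right away while the change check runs in a goroutine, might look roughly like the sketch below. It is not RagCode's actual code: the Manager fields, NewManager, OnToolCall, and Wait are hypothetical, and the real signatures of checkAndReindexIfNeeded and IndexLanguage are not given in this document. The sketch also assumes that only one check per workspace should be in flight at a time.

```go
package main

import (
	"fmt"
	"sync"
)

// Manager is a hypothetical stand-in for the workspace manager.
type Manager struct {
	mu         sync.Mutex
	running    map[string]bool             // workspaces with a check in flight
	wg         sync.WaitGroup              // lets callers wait in tests and demos
	hasChanges func(workspace string) bool // assumed change-detection hook
	reindex    func(workspace string)      // stands in for IndexLanguage
}

// NewManager wires in the two hooks (both are assumptions of this sketch).
func NewManager(hasChanges func(string) bool, reindex func(string)) *Manager {
	return &Manager{
		running:    make(map[string]bool),
		hasChanges: hasChanges,
		reindex:    reindex,
	}
}

// OnToolCall answers from the existing index and schedules the check
// in a goroutine, so the caller is never blocked by re-indexing.
func (m *Manager) OnToolCall(workspace string) string {
	m.mu.Lock()
	if !m.running[workspace] {
		m.running[workspace] = true
		m.wg.Add(1)
		go m.checkAndReindexIfNeeded(workspace)
	}
	m.mu.Unlock()
	return "results from the existing index"
}

// checkAndReindexIfNeeded runs in the background and re-indexes only
// when the change check reports differences.
func (m *Manager) checkAndReindexIfNeeded(workspace string) {
	defer m.wg.Done()
	defer func() {
		m.mu.Lock()
		delete(m.running, workspace)
		m.mu.Unlock()
	}()
	if m.hasChanges(workspace) {
		m.reindex(workspace)
	}
}

// Wait blocks until all background checks have finished.
func (m *Manager) Wait() { m.wg.Wait() }

func main() {
	m := NewManager(
		func(workspace string) bool { return true }, // pretend a file changed
		func(workspace string) { fmt.Println("re-indexing", workspace, "in the background") },
	)
	fmt.Println(m.OnToolCall("/path/to/project")) // returns without waiting
	m.Wait()                                      // only so the demo does not exit early
}
```

The per-workspace flag is a design choice of this sketch: it avoids launching overlapping re-index runs when several tool calls arrive in quick succession.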
The WorkspaceManager detects the workspace and attempts to load .ragcode/state.json. If the file doesn’t exist, it assumes a fresh state.
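A minimal sketch of this load step, assuming the state file has the JSON shape shown earlier. The loadState function and the type names are hypothetical; the only behavior taken from the text is that a missing file yields a fresh, empty state rather than an error.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"time"
)

// FileState and WorkspaceState are hypothetical types matching state.json.
type FileState struct {
	ModTime time.Time `json:"mod_time"`
	Size    int64     `json:"size"`
}

type WorkspaceState struct {
	Files       map[string]FileState `json:"files"`
	LastIndexed time.Time            `json:"last_indexed"`
}

// loadState reads .ragcode/state.json under workspaceRoot.
// A missing file is treated as a fresh state, not as an error.
func loadState(workspaceRoot string) (*WorkspaceState, error) {
	path := filepath.Join(workspaceRoot, ".ragcode", "state.json")
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return &WorkspaceState{Files: map[string]FileState{}}, nil
	}
	if err != nil {
		return nil, fmt.Errorf("read %s: %w", path, err)
	}
	var s WorkspaceState
	if err := json.Unmarshal(data, &s); err != nil {
		return nil, fmt.Errorf("parse %s: %w", path, err)
	}
	if s.Files == nil {
		s.Files = map[string]FileState{} // tolerate a file without "files"
	}
	return &s, nil
}

func main() {
	state, err := loadState(".")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d files tracked\n", len(state.Files))
}
```

With an empty Files map, every scanned file counts as new, so a fresh state naturally leads to indexing everything.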
The system iterates through all currently detected source files for the target language:
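- New: a file with no entry in the saved state is added to the index list.
- Modified: if a file differs from its saved entry in mod_time or size, it is marked for re-indexing.
- Deleted: a file that appears in the saved state but is no longer found on disk is added to the delete list.

For every file marked as Modified or Deleted, the system performs a cleanup in the vector database: the old chunks for that file are removed with DeleteByMetadata(ctx, "file", filePath).

The system runs the standard indexing pipeline (Analyzer -> Chunker -> Embedder -> Vector DB) only for the list of new or modified files.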
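The change detection and cleanup steps can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: diffFiles, cleanup, recordingStore, and the VectorStore interface are hypothetical, and the error return on DeleteByMetadata is assumed. Only the call shape DeleteByMetadata(ctx, "file", filePath) comes from the text.

```go
package main

import (
	"context"
	"fmt"
	"sort"
	"time"
)

// FileState holds the two attributes compared during change detection.
type FileState struct {
	ModTime time.Time
	Size    int64
}

// VectorStore is a hypothetical interface; the error return is assumed.
type VectorStore interface {
	DeleteByMetadata(ctx context.Context, key, value string) error
}

// diffFiles compares the saved state with the current scan.
// New and modified files go to toIndex; modified and deleted files go
// to toDelete so their old chunks can be removed first.
func diffFiles(stored, current map[string]FileState) (toIndex, toDelete []string) {
	for path, cur := range current {
		old, known := stored[path]
		switch {
		case !known: // new file
			toIndex = append(toIndex, path)
		case !old.ModTime.Equal(cur.ModTime) || old.Size != cur.Size: // modified
			toIndex = append(toIndex, path)
			toDelete = append(toDelete, path)
		}
	}
	for path := range stored {
		if _, ok := current[path]; !ok { // deleted file
			toDelete = append(toDelete, path)
		}
	}
	sort.Strings(toIndex) // map iteration order is random; sort for stable output
	sort.Strings(toDelete)
	return toIndex, toDelete
}

// cleanup removes the old chunks of every path in the delete list.
func cleanup(ctx context.Context, store VectorStore, paths []string) error {
	for _, p := range paths {
		if err := store.DeleteByMetadata(ctx, "file", p); err != nil {
			return fmt.Errorf("delete chunks for %s: %w", p, err)
		}
	}
	return nil
}

// recordingStore is a fake store that only records what was deleted.
type recordingStore struct{ deleted []string }

func (r *recordingStore) DeleteByMetadata(_ context.Context, key, value string) error {
	r.deleted = append(r.deleted, value)
	return nil
}

func main() {
	t0 := time.Date(2023, 10, 27, 10, 0, 0, 0, time.UTC)
	stored := map[string]FileState{
		"a.go": {ModTime: t0, Size: 100},
		"b.go": {ModTime: t0, Size: 200},
		"c.go": {ModTime: t0, Size: 300},
	}
	current := map[string]FileState{
		"a.go": {ModTime: t0, Size: 100},                // unchanged
		"b.go": {ModTime: t0.Add(time.Hour), Size: 250}, // modified
		"d.go": {ModTime: t0, Size: 50},                 // new
	}
	toIndex, toDelete := diffFiles(stored, current)
	store := &recordingStore{}
	if err := cleanup(context.Background(), store, toDelete); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("to index:", toIndex)
	fmt.Println("chunks deleted for:", store.deleted)
}
```

A modified file lands in both lists on purpose: its stale chunks are deleted before the new content is indexed, which avoids duplicate results for the same file.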
Finally, the in-memory state is updated with the new file information, and state.json is rewritten to disk.
- On first use, a tool call with file_path will detect the workspace and trigger collection creation and full indexing.
- On later calls, tools check state.json and automatically trigger incremental re-indexing when they detect changed files. There is no need to call index_workspace manually.

You can still force a manual run using index_workspace:
# First run - indexes all files
index_workspace --file_path /path/to/project
# Subsequent runs - only indexes changed files
index_workspace --file_path /path/to/project
The index-all command-line utility also supports incremental indexing:
# First run
./bin/index-all -paths /path/to/project
# Output: "📝 Indexing 77 new/modified files..."
# Second run (no changes)
./bin/index-all -paths /path/to/project
# Output: "✨ No code changes detected for language 'go'"
Currently, markdown files are re-indexed on every run. The incremental logic applies only to source code files (Go, PHP, etc.). Future versions will extend incremental indexing to documentation files as well.
The state file lives at .ragcode/state.json under the workspace root. The .ragcode directory should be added to .gitignore because it contains local indexing state that should not be shared between developers.
To verify incremental indexing is working:
Example output showing successful incremental operation:
🔎 Indexing Go files in '.' (incremental)...
2025/11/23 22:40:56 🚀 Starting indexing for workspace: .
2025/11/23 22:40:56 Collection: do-ai-code
2025/11/23 22:40:56 Language: go
2025/11/23 22:40:56 ✨ No code changes detected for language 'go'
- Logs are written to ~/.local/state/ragcode/mcp.log (configurable via logging.path).
- Set logging.level: debug to see messages like 🔄 Auto-detected file changes…, 📝 Indexing N new/modified files…, ✨ No code changes detected…, etc.

tail -f ~/.local/state/ragcode/mcp.log | grep -E "Auto-reindex|Indexing|No code"