Code analyzer for extracting symbols, structure, and relationships from Python files. Indexes code for semantic search in Qdrant.
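
The Python analyzer parses `.py` files and extracts classes, methods, module-level functions, properties, constants, and module-level variables, together with the relationships between them (inheritance, dependencies, and method calls). This information is converted to CodeChunks, which are then indexed in Qdrant for semantic search.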

    ┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
    │   .py Files     │────▶│  Python Analyzer │────▶│   CodeChunks    │
    │  (source code)  │     │  (regex parsing) │     │  (structured)   │
    └─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                              │
                                                              ▼
                                                     ┌─────────────────┐
                                                     │     Qdrant      │
                                                     │  (vector store) │
                                                     └─────────────────┘


Classes (`type: "class"`):

    @dataclass
    class User(BaseModel, LoggingMixin, metaclass=ABCMeta):
        """Represents a user in the system."""
        name: str
        email: str

Extracted information:

| Field | Value | Description |
|---|---|---|
| `name` | `"User"` | Class name |
| `bases` | `["BaseModel", "LoggingMixin"]` | Parent classes (inheritance) |
| `decorators` | `["dataclass"]` | Applied decorators |
| `is_abstract` | `true` | If it's an abstract class (ABC) |
| `is_dataclass` | `true` | If decorated with `@dataclass` |
| `is_enum` | `false` | If it inherits from `Enum` |
| `is_protocol` | `false` | If it's a `Protocol` (typing) |
| `is_mixin` | `true` | If it is/uses a mixin |
| `metaclass` | `"ABCMeta"` | Specified metaclass |
| `dependencies` | `["BaseModel", "LoggingMixin"]` | All class dependencies |
| `docstring` | `"Represents a user..."` | Class documentation |

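
To make the class fields above more concrete, here is a minimal Python sketch of how a regex-based pass could recover `name`, `bases`, `decorators`, and `metaclass` from a single-line class header. The analyzer itself is written in Go and its actual patterns are not shown in this document, so `CLASS_RE`, `DECORATOR_RE`, and `parse_class` are hypothetical names, and the abstract/enum checks are simplified assumptions.

```python
import re

# Hypothetical patterns (not the analyzer's own): a single-line class header
# and a decorator line directly above it.
CLASS_RE = re.compile(r"^class\s+(?P<name>\w+)\s*(?:\((?P<args>[^)]*)\))?\s*:")
DECORATOR_RE = re.compile(r"^@(?P<name>[\w.]+)")


def parse_class(lines):
    """Return metadata for the first class header found in `lines`, or None."""
    decorators = []
    for line in lines:
        stripped = line.strip()
        dec = DECORATOR_RE.match(stripped)
        if dec:
            decorators.append(dec.group("name"))
            continue
        match = CLASS_RE.match(stripped)
        if not match:
            decorators = []  # decorators only count directly above the class
            continue
        bases, metaclass = [], ""
        for arg in (match.group("args") or "").split(","):
            arg = arg.strip()
            if arg.startswith("metaclass="):
                metaclass = arg.split("=", 1)[1].strip()
            elif arg and "=" not in arg:
                bases.append(arg)
        return {
            "name": match.group("name"),
            "bases": bases,
            "decorators": decorators,
            "metaclass": metaclass,
            "is_dataclass": "dataclass" in decorators,
            # Simplified assumption: abstract means ABCMeta or an ABC base.
            "is_abstract": metaclass == "ABCMeta" or "ABC" in bases,
            "is_enum": any(b.split(".")[-1] == "Enum" for b in bases),
        }
    return None


source = """\
@dataclass
class User(BaseModel, LoggingMixin, metaclass=ABCMeta):
    name: str
"""
info = parse_class(source.splitlines())
print(info["name"], info["bases"], info["metaclass"])
```

Base classes that contain parentheses of their own, or headers split across several lines, would defeat this pattern, which is the kind of edge case a regex-based approach can miss.
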

Methods (`type: "method"`):

    class UserService:
        async def get_user(self, user_id: int) -> User:
            """Returns a user by ID."""
            self.validate_id(user_id)
            user = await self.repository.find(user_id)
            return user

Extracted information:

| Field | Value | Description |
|---|---|---|
| `name` | `"get_user"` | Method name |
| `signature` | `"async def get_user(self, user_id: int) -> User"` | Complete signature |
| `class_name` | `"UserService"` | Parent class |
| `parameters` | `[{name: "user_id", type: "int"}]` | Parameters with types |
| `return_type` | `"User"` | Return type |
| `is_async` | `true` | If it's an async method |
| `is_static` | `false` | If it's a `@staticmethod` |
| `is_classmethod` | `false` | If it's a `@classmethod` |
| `calls` | `[{name: "validate_id", receiver: "self"}, ...]` | Called methods |
| `type_deps` | `["User"]` | Used types (dependencies) |
| `docstring` | `"Returns a user..."` | Method documentation |

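
The `parameters`, `return_type`, and `is_async` fields can be illustrated with a small signature parser. This is a sketch under stated assumptions, not the analyzer's Go implementation: `DEF_RE`, `PARAM_RE`, `split_params`, and `parse_signature` are hypothetical names, only single-line `def` headers are handled, and `self`/`cls` are dropped to match the table above.

```python
import re

# Hypothetical patterns for a single-line "def" header and one parameter.
DEF_RE = re.compile(
    r"^\s*(?P<async>async\s+)?def\s+(?P<name>\w+)\s*"
    r"\((?P<params>.*)\)\s*(?:->\s*(?P<ret>[^:]+?))?\s*:\s*$"
)
PARAM_RE = re.compile(r"\s*(\*{0,2}\w+)\s*(?::\s*([^=]+?))?\s*(?:=.*)?$")


def split_params(params):
    """Split on commas that are not nested inside brackets."""
    parts, depth, current = [], 0, ""
    for ch in params:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current.strip():
        parts.append(current)
    return [p.strip() for p in parts]


def parse_signature(line):
    """Return name, async flag, typed parameters and return type, or None."""
    match = DEF_RE.match(line)
    if not match:
        return None
    parameters = []
    for part in split_params(match.group("params")):
        param = PARAM_RE.match(part)
        if not param or param.group(1) in ("self", "cls"):
            continue  # skip bare "*" or "/" markers and the receiver
        parameters.append({"name": param.group(1), "type": param.group(2) or ""})
    return {
        "name": match.group("name"),
        "is_async": match.group("async") is not None,
        "parameters": parameters,
        "return_type": (match.group("ret") or "").strip(),
    }


info = parse_signature("    async def get_user(self, user_id: int) -> User:")
print(info)
```

Type hints come back as plain strings here, which mirrors the "No Type Resolution" limitation listed at the end of this document.
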

Functions (`type: "function"`):

    @lru_cache(maxsize=100)
    async def fetch_data(url: str) -> dict:
        """Downloads data from URL."""
        # `yield from` is a syntax error inside `async def`,
        # so the items are re-yielded one by one.
        async for item in process(url):
            yield item

Extracted information:

| Field | Value | Description |
|---|---|---|
| `name` | `"fetch_data"` | Function name |
| `signature` | `"async def fetch_data(url: str) -> dict"` | Signature |
| `is_async` | `true` | If it's async |
| `is_generator` | `true` | If it uses `yield` |
| `decorators` | `["lru_cache"]` | Applied decorators |

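
As an illustration of the `is_generator` flag, the sketch below scans each function body for the `yield` keyword, using indentation to decide where a body ends. It is an assumption about how such a check could work, not the analyzer's own code; `generator_flags` and `function_body` are hypothetical helpers, and a `yield` inside a nested function would also mark the enclosing function.

```python
import re

# Hypothetical patterns: a def header (capturing its indent) and the keyword.
DEF_RE = re.compile(r"^(?P<indent>\s*)(?:async\s+)?def\s+(?P<name>\w+)")
YIELD_RE = re.compile(r"\byield\b")


def function_body(lines, def_index):
    """Collect the lines indented deeper than the def at `def_index`."""
    indent = len(DEF_RE.match(lines[def_index]).group("indent"))
    body = []
    for line in lines[def_index + 1:]:
        if line.strip() and len(line) - len(line.lstrip()) <= indent:
            break  # dedent: the function has ended
        body.append(line)
    return body


def generator_flags(source):
    """Map each function name to whether its body contains `yield`."""
    lines = source.splitlines()
    flags = {}
    for i, line in enumerate(lines):
        match = DEF_RE.match(line)
        if match:
            # Naive comment stripping: a '#' inside a string is cut as well.
            code = [b.split("#", 1)[0] for b in function_body(lines, i)]
            flags[match.group("name")] = any(YIELD_RE.search(c) for c in code)
    return flags


source = """\
def numbers(limit):
    for i in range(limit):
        yield i

def total(limit):
    return sum(numbers(limit))  # no yield here
"""
flags = generator_flags(source)
print(flags)
```
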

Properties (`type: "property"`):

    class User:
        @property
        def full_name(self) -> str:
            return f"{self.first_name} {self.last_name}"

        @full_name.setter
        def full_name(self, value: str):
            self.first_name, self.last_name = value.split()

Extracted information:

| Field | Value | Description |
|---|---|---|
| `name` | `"full_name"` | Property name |
| `type` | `"str"` | Return type |
| `has_getter` | `true` | Has getter (`@property`) |
| `has_setter` | `true` | Has setter (`@x.setter`) |
| `has_deleter` | `false` | Has deleter (`@x.deleter`) |

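
The getter/setter/deleter flags can be derived from decorator lines alone. The following Python sketch shows one way to do that; `collect_properties` and its patterns are hypothetical and assume each decorator sits on its own line directly above a single-line `def`.

```python
import re

# Hypothetical patterns for property decorators and the def that follows.
PROPERTY_RE = re.compile(r"^\s*@property\s*$")
ACCESSOR_RE = re.compile(r"^\s*@(?P<name>\w+)\.(?P<kind>setter|deleter)\s*$")
DEF_RE = re.compile(
    r"^\s*def\s+(?P<name>\w+)\s*\(.*\)\s*(?:->\s*(?P<ret>[^:]+?))?\s*:"
)


def collect_properties(source):
    """Return one record per @property, updated by later setter/deleter lines."""
    lines = source.splitlines()
    props = {}
    for i, line in enumerate(lines):
        if PROPERTY_RE.match(line):
            header = DEF_RE.match(lines[i + 1]) if i + 1 < len(lines) else None
            if header:
                props[header.group("name")] = {
                    "name": header.group("name"),
                    "type": (header.group("ret") or "").strip(),
                    "has_getter": True,
                    "has_setter": False,
                    "has_deleter": False,
                }
            continue
        accessor = ACCESSOR_RE.match(line)
        if accessor and accessor.group("name") in props:
            props[accessor.group("name")]["has_" + accessor.group("kind")] = True
    return props


source = '''\
class User:
    @property
    def full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"

    @full_name.setter
    def full_name(self, value: str):
        self.first_name, self.last_name = value.split()
'''
props = collect_properties(source)
print(props["full_name"])
```
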

Constants (`type: "const"`), detected by their UPPER_CASE names:

    MAX_CONNECTIONS: int = 100
    API_BASE_URL = "https://api.example.com"

Module-level variables (`type: "var"`):

    logger = logging.getLogger(__name__)
    default_config: Config = Config()

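
The split between `const` and `var` chunks can be sketched as a naming check on module-level assignments. This is an illustration only: `ASSIGN_RE`, `CONST_NAME_RE`, and `classify_assignment` are hypothetical, and the exact UPPER_CASE rule used by the analyzer is an assumption.

```python
import re

# Hypothetical patterns: an unindented assignment with an optional annotation,
# and an UPPER_CASE name (leading underscores allowed by assumption).
ASSIGN_RE = re.compile(
    r"^(?P<name>[A-Za-z_]\w*)\s*(?::\s*(?P<hint>[^=]+?))?\s*=(?!=)"
)
CONST_NAME_RE = re.compile(r"^_*[A-Z][A-Z0-9_]*$")


def classify_assignment(line):
    """Classify a module-level assignment as "const" or "var", else None."""
    match = ASSIGN_RE.match(line)
    if not match:
        return None  # indented, not an assignment, or a comparison
    name = match.group("name")
    return {
        "name": name,
        "type": "const" if CONST_NAME_RE.match(name) else "var",
        "annotation": (match.group("hint") or "").strip(),
    }


samples = [
    "MAX_CONNECTIONS: int = 100",
    'API_BASE_URL = "https://api.example.com"',
    "logger = logging.getLogger(__name__)",
    "default_config: Config = Config()",
]
results = [classify_assignment(line) for line in samples]
for result in results:
    print(result["type"], result["name"], result["annotation"])
```

Because the pattern is anchored at column zero, assignments inside functions or classes are ignored, which matches the "module-level" wording used for variables.
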

The analyzer builds a dependency graph between classes:

    class OrderService:
        repository: OrderRepository  # ← dependency

        def create_order(self, user: User) -> Order:  # ← dependencies: User, Order
            notification = NotificationService()  # ← dependency (from calls)
            return Order(...)

Detected dependencies:

- `OrderRepository` - from type hint on variable
- `User` - from parameter
- `Order` - from return type
- `NotificationService` - from method calls

Method calls are extracted together with their receivers:

    def process(self, data):
        self.validate(data)            # → self.validate
        result = Helper.compute(data)  # → Helper.compute (static call)
        super().process(data)          # → super().process
        save_to_db(result)             # → save_to_db (function call)

Detected calls:

    {
      "calls": [
        {"name": "validate", "receiver": "self", "line": 2},
        {"name": "compute", "receiver": "Helper", "class_name": "Helper", "line": 3},
        {"name": "process", "receiver": "super()", "line": 4},
        {"name": "save_to_db", "line": 5}
      ]
    }

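
The call records above can be reproduced with a short regex pass over the function's lines. The sketch below is a hypothetical Python illustration, not the analyzer's Go code: `CALL_RE` and `extract_calls` are invented names, and treating a capitalized receiver as a class name is an assumption based on the `Helper` example.

```python
import keyword
import re

# Hypothetical pattern: an optional receiver ("self", "Helper", "super()",
# "self.repository", ...) followed by the called name and an opening paren.
CALL_RE = re.compile(
    r"(?<![\w.])"
    r"(?:(?P<receiver>super\(\)|[A-Za-z_][\w.]*)\.)?"
    r"(?P<name>[A-Za-z_]\w*)\s*\("
)
DEFINITION_RE = re.compile(r"\s*(?:async\s+)?(?:def|class)\b")


def extract_calls(lines, first_line=1):
    """Return one record per call found in `lines` (1-based line numbers)."""
    calls = []
    for offset, line in enumerate(lines):
        code = line.split("#", 1)[0]  # naive comment stripping
        if DEFINITION_RE.match(code):
            continue  # a definition header is not a call
        for match in CALL_RE.finditer(code):
            name, receiver = match.group("name"), match.group("receiver")
            if keyword.iskeyword(name):
                continue  # e.g. "if (", "return ("
            call = {"name": name, "line": first_line + offset}
            if receiver:
                call["receiver"] = receiver
                if receiver[0].isupper():
                    call["class_name"] = receiver  # assumed static call
            calls.append(call)
    return calls


source = """\
def process(self, data):
    self.validate(data)  # self.validate
    result = Helper.compute(data)
    super().process(data)
    save_to_db(result)
"""
calls = extract_calls(source.splitlines())
for call in calls:
    print(call)
```

Calls nested inside argument lists or string literals are not distinguished here; a full AST would be needed for that.
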
Package layout:

    python/
    ├── types.go          # Types: ModuleInfo, ClassInfo, MethodInfo, MethodCall, etc.
    ├── analyzer.go       # PathAnalyzer implementation (1500+ lines)
    ├── api_analyzer.go   # Legacy APIAnalyzer (build-tagged out)
    ├── analyzer_test.go  # 26 comprehensive tests
    └── README.md         # This documentation

Usage from Go:

    import (
    	"fmt"
    	"log"

    	"github.com/doITmagic/rag-code-mcp/internal/ragcode/analyzers/python"
    )

    // Create analyzer (excludes test files by default)
    analyzer := python.NewCodeAnalyzer()

    // Analyze directories/files
    chunks, err := analyzer.AnalyzePaths([]string{"./myproject"})
    if err != nil {
    	log.Fatal(err)
    }

    for _, chunk := range chunks {
    	fmt.Printf("[%s] %s.%s\n", chunk.Type, chunk.Package, chunk.Name)
    	fmt.Printf("  Dependencies: %v\n", chunk.Metadata["dependencies"])
    }

To include test files, create the analyzer with options instead:

    analyzer := python.NewCodeAnalyzerWithOptions(true)

The Python analyzer is automatically selected for:

- `python`, `py` - generic Python projects
- `django` - Django projects
- `flask` - Flask projects
- `fastapi` - FastAPI projects

Python projects are detected by:

| File | Description |
|---|---|
| `pyproject.toml` | PEP 518 - modern Python |
| `setup.py` | Setuptools legacy |
| `requirements.txt` | pip dependencies |
| `Pipfile` | Pipenv |

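
Marker-file detection of this kind amounts to checking whether any of the listed files exists in the project root. A minimal sketch, assuming the check only looks at the top-level directory (the document does not say whether subdirectories are searched); `detect_markers` and `is_python_project` are hypothetical names.

```python
import tempfile
from pathlib import Path

# Marker files taken from the table above.
MARKER_FILES = ("pyproject.toml", "setup.py", "requirements.txt", "Pipfile")


def detect_markers(root):
    """Return the marker files present directly under `root`."""
    root = Path(root)
    return [name for name in MARKER_FILES if (root / name).is_file()]


def is_python_project(root):
    """A directory counts as a Python project if any marker file exists."""
    return bool(detect_markers(root))


# Usage: an empty directory is not detected until a marker file appears.
with tempfile.TemporaryDirectory() as tmp:
    before = is_python_project(tmp)
    (Path(tmp) / "requirements.txt").write_text("requests\n")
    after = detect_markers(tmp)

print(before, after)
```
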
Chunk types produced by the analyzer:

| Type | Description | Example |
|---|---|---|
| `class` | Class definition | `class User(BaseModel):` |
| `method` | Class method | `def get_user(self):` |
| `function` | Module-level function | `def helper():` |
| `property` | `@property` | `@property def name(self):` |
| `const` | UPPER_CASE constant | `MAX_SIZE = 100` |
| `var` | Module-level variable | `logger = getLogger()` |

Class metadata:

    {
      "bases": ["BaseModel", "Mixin"],
      "decorators": ["dataclass"],
      "is_abstract": false,
      "is_dataclass": true,
      "is_enum": false,
      "is_protocol": false,
      "is_mixin": false,
      "metaclass": "",
      "dependencies": ["BaseModel", "Mixin", "User", "Order"]
    }

Method metadata:

    {
      "class_name": "UserService",
      "is_static": false,
      "is_classmethod": false,
      "is_async": true,
      "is_abstract": false,
      "decorators": ["cache"],
      "calls": [
        {"name": "validate", "receiver": "self", "line": 10},
        {"name": "save", "receiver": "self.repository", "line": 12}
      ],
      "type_deps": ["User", "Order"]
    }

Function metadata:

    {
      "is_async": true,
      "is_generator": false,
      "decorators": ["lru_cache"]
    }

Run the tests with `go test`:

    # Run all tests (26 tests)
    go test ./internal/ragcode/analyzers/python/

    # With verbose output
    go test -v ./internal/ragcode/analyzers/python/

    # Specific test
    go test -v -run TestMethodCallExtraction ./internal/ragcode/analyzers/python/

    # With coverage
    go test -cover ./internal/ragcode/analyzers/python/

The analyzer automatically skips:

- `__pycache__/` - Python cache
- `.venv/`, `venv/`, `env/` - virtual environments
- `.git/` - Git
- `.tox/`, `.pytest_cache/`, `.mypy_cache/` - caches
- `dist/`, `build/` - distributions
- `test_*.py`, `*_test.py` - test files (by default)

| Limitation | Description |
|---|---|
| Regex-based | Doesn't use full Python AST - may miss edge cases |
| No Type Resolution | Type hints are extracted as strings, not resolved |
| Single-file | Each file is analyzed independently |
| No Runtime Info | Doesn't execute code, only static analysis |
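
The skip rules listed earlier in this section can be expressed as a small path filter. This is a sketch in Python under stated assumptions rather than the analyzer's Go code: `should_skip` is a hypothetical helper, paths are treated as POSIX-style, and the `include_tests` flag mirrors the boolean passed to `NewCodeAnalyzerWithOptions`.

```python
import fnmatch
from pathlib import PurePosixPath

# Directory and file patterns taken from the skip list in this section.
SKIP_DIRS = {
    "__pycache__", ".venv", "venv", "env", ".git",
    ".tox", ".pytest_cache", ".mypy_cache", "dist", "build",
}
TEST_PATTERNS = ("test_*.py", "*_test.py")


def should_skip(path, include_tests=False):
    """Return True if `path` should not be analyzed."""
    parts = PurePosixPath(path).parts
    if not parts:
        return True
    if any(part in SKIP_DIRS for part in parts[:-1]):
        return True  # inside a cache, VCS, build, or virtualenv directory
    filename = parts[-1]
    if not filename.endswith(".py"):
        return True  # only Python sources are analyzed
    if not include_tests:
        # fnmatchcase keeps matching case-sensitive on every platform.
        return any(fnmatch.fnmatchcase(filename, p) for p in TEST_PATTERNS)
    return False


for candidate in ("src/app/models.py", ".venv/lib/site.py", "tests/test_models.py"):
    print(candidate, should_skip(candidate))
```
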