rag-code-mcp

Python Code Analyzer

Code analyzer for extracting symbols, structure, and relationships from Python files. Indexes code for semantic search in Qdrant.

Status: ✅ FULLY IMPLEMENTED


🎯 What This Analyzer Does

The Python analyzer parses .py files and extracts:

  1. Symbols - classes, methods, functions, variables, constants
  2. Relationships - inheritance, dependencies, method calls
  3. Metadata - decorators, type hints, docstrings

The extracted information is converted to CodeChunks, which are then indexed in Qdrant for semantic search.


📊 Data Flow

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   .py Files     │────▶│  Python Analyzer │────▶│   CodeChunks    │
│  (source code)  │     │  (regex parsing) │     │  (structured)   │
└─────────────────┘     └──────────────────┘     └────────┬────────┘
                                                          │
                                                          ▼
                                                 ┌─────────────────┐
                                                 │     Qdrant      │
                                                 │  (vector store) │
                                                 └─────────────────┘

🔍 What We Index

1. Classes (type: "class")

@dataclass
class User(BaseModel, LoggingMixin, metaclass=ABCMeta):
    """Represents a user in the system."""
    name: str
    email: str

Extracted information:

| Field | Value | Description |
|-------|-------|-------------|
| name | "User" | Class name |
| bases | ["BaseModel", "LoggingMixin"] | Parent classes (inheritance) |
| decorators | ["dataclass"] | Applied decorators |
| is_abstract | true | If it's an abstract class (ABC) |
| is_dataclass | true | If decorated with @dataclass |
| is_enum | false | If it inherits from Enum |
| is_protocol | false | If it's a Protocol (typing) |
| is_mixin | true | If it is/uses a mixin |
| metaclass | "ABCMeta" | Specified metaclass |
| dependencies | ["BaseModel", "LoggingMixin"] | All class dependencies |
| docstring | "Represents a user..." | Class documentation |
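As a rough illustration of how fields such as bases, decorators, and metaclass can be pulled from a class header with regular expressions, here is a minimal Python sketch. It is not the analyzer's Go implementation: the patterns, the `parse_class_header` helper, and the `is_abstract` heuristic are assumptions made for this example, and only single-line class headers are handled.

```python
import re

# Hypothetical patterns for illustration; not the analyzer's actual regexes.
CLASS_RE = re.compile(r"^\s*class\s+(?P<name>\w+)\s*(?:\((?P<args>[^)]*)\))?\s*:")
DECORATOR_RE = re.compile(r"^\s*@(?P<name>[\w.]+)")


def parse_class_header(lines):
    """Return metadata for the first single-line class header in `lines`, or None."""
    decorators = []
    for line in lines:
        dec = DECORATOR_RE.match(line)
        if dec:
            decorators.append(dec.group("name"))
            continue
        cls = CLASS_RE.match(line)
        if not cls:
            decorators = []  # decorators only count when directly above the class
            continue
        bases, metaclass = [], ""
        for arg in (a.strip() for a in (cls.group("args") or "").split(",")):
            if arg.startswith("metaclass="):
                metaclass = arg.split("=", 1)[1].strip()
            elif arg and "=" not in arg:
                bases.append(arg)
        return {
            "name": cls.group("name"),
            "bases": bases,
            "decorators": decorators,
            "metaclass": metaclass,
            "is_dataclass": "dataclass" in decorators,
            # Assumed heuristic: ABCMeta metaclass or a direct ABC base.
            "is_abstract": metaclass == "ABCMeta" or "ABC" in bases,
        }
    return None
```

Splitting the argument list on commas is deliberately naive: a base such as `Generic[K, V]` would be split in two, which is the kind of edge case a regex-based approach has to handle separately.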

2. Methods (type: "method")

class UserService:
    async def get_user(self, user_id: int) -> User:
        """Returns a user by ID."""
        self.validate_id(user_id)
        user = await self.repository.find(user_id)
        return user

Extracted information:

| Field | Value | Description |
|-------|-------|-------------|
| name | "get_user" | Method name |
| signature | "async def get_user(self, user_id: int) -> User" | Complete signature |
| class_name | "UserService" | Parent class |
| parameters | [{name: "user_id", type: "int"}] | Parameters with types |
| return_type | "User" | Return type |
| is_async | true | If it's an async method |
| is_static | false | If it's a @staticmethod |
| is_classmethod | false | If it's a @classmethod |
| calls | [{name: "validate_id", receiver: "self"}, ...] | Called methods |
| type_deps | ["User"] | Used types (dependencies) |
| docstring | "Returns a user..." | Method documentation |
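The signature fields above can be approximated with a single pattern over the `def` line. The following Python sketch is illustrative only: `SIG_RE` and `parse_signature` are hypothetical names, the real analyzer is written in Go, and the comma split does not cope with annotations that contain commas (for example `Dict[str, int]`).

```python
import re

# Hypothetical pattern for a single-line (async) def; not the analyzer's own regex.
SIG_RE = re.compile(
    r"^\s*(?P<async>async\s+)?def\s+(?P<name>\w+)\s*"
    r"\((?P<params>.*)\)\s*(?:->\s*(?P<ret>[^:]+))?:\s*$"
)


def parse_signature(line):
    """Split a def line into name, parameters, return type and the async flag."""
    m = SIG_RE.match(line)
    if not m:
        return None
    params = []
    for raw in m.group("params").split(","):
        raw = raw.strip()
        if not raw or raw in ("self", "cls"):
            continue  # the implicit receiver is not listed as a parameter
        head, _, _default = raw.partition("=")   # drop any default value
        name, _, annotation = head.partition(":")
        params.append({"name": name.strip(), "type": annotation.strip()})
    return {
        "name": m.group("name"),
        "is_async": bool(m.group("async")),
        "parameters": params,
        "return_type": (m.group("ret") or "").strip(),
    }
```

Note that the type values stay plain strings; nothing here resolves `User` to an actual class.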

3. Functions (type: "function")

@lru_cache(maxsize=100)
async def fetch_data(url: str) -> dict:
    """Downloads data from URL."""
    # `yield from` is a SyntaxError inside `async def`, so iterate explicitly
    for item in process(url):
        yield item

Extracted information:

| Field | Value | Description |
|-------|-------|-------------|
| name | "fetch_data" | Function name |
| signature | "async def fetch_data(url: str) -> dict" | Signature |
| is_async | true | If it's async |
| is_generator | true | If it uses yield |
| decorators | ["lru_cache"] | Applied decorators |
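A flag like `is_generator` can be derived by scanning the function body for a `yield` token. This is a minimal sketch under that assumption, written in Python for brevity; `YIELD_RE` and `is_generator` are hypothetical names and the check is textual, so a `yield` inside a string literal or a nested function would also count.

```python
import re

# Hypothetical pattern: the keyword `yield` as a standalone token.
YIELD_RE = re.compile(r"(?<![\w.])yield\b")


def is_generator(body_lines):
    """Return True if any line of the function body contains a yield."""
    for line in body_lines:
        # Naive comment stripping; a '#' inside a string literal is not handled.
        code = line.split("#", 1)[0]
        if YIELD_RE.search(code):
            return True
    return False
```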

4. Properties (type: "property")

class User:
    @property
    def full_name(self) -> str:
        return f"{self.first_name} {self.last_name}"
    
    @full_name.setter
    def full_name(self, value: str):
        self.first_name, self.last_name = value.split()

Extracted information:

| Field | Value | Description |
|-------|-------|-------------|
| name | "full_name" | Property name |
| type | "str" | Return type |
| has_getter | true | Has getter (@property) |
| has_setter | true | Has setter (@x.setter) |
| has_deleter | false | Has deleter (@x.deleter) |
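The getter/setter/deleter flags can be collected by pairing each accessor decorator with the `def` that follows it. The Python sketch below shows one way to do this; `ACCESSOR_RE`, `DEF_RE`, and `collect_properties` are hypothetical names chosen for the example, and it assumes the decorator sits on the line directly above the `def`.

```python
import re

# Hypothetical patterns; they match "@property", "@name.setter" and "@name.deleter".
ACCESSOR_RE = re.compile(
    r"^\s*@(?:(?P<getter>property)|(?P<name>\w+)\.(?P<kind>setter|deleter))\s*$"
)
DEF_RE = re.compile(r"^\s*def\s+(?P<name>\w+)\s*\(")


def collect_properties(lines):
    """Map property name -> which accessors (getter/setter/deleter) are defined."""
    props = {}
    pending = None  # accessor kind announced by the decorator on the previous line
    for line in lines:
        dec = ACCESSOR_RE.match(line)
        if dec:
            pending = "getter" if dec.group("getter") else dec.group("kind")
            continue
        definition = DEF_RE.match(line)
        if definition and pending:
            entry = props.setdefault(
                definition.group("name"),
                {"has_getter": False, "has_setter": False, "has_deleter": False},
            )
            entry["has_" + pending] = True
        pending = None
    return props
```

Merging the accessors under one property name is what lets a single "property" chunk carry all three flags instead of producing one chunk per decorated method.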

5. Constants (type: "const")

MAX_CONNECTIONS: int = 100
API_BASE_URL = "https://api.example.com"

Extracted information:

6. Variables (type: "var")

logger = logging.getLogger(__name__)
default_config: Config = Config()
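One plausible way to separate constants from variables is by naming convention: UPPER_CASE module-level names become "const", everything else "var". The sketch below assumes exactly that rule; `ASSIGN_RE`, `CONST_NAME_RE`, and `classify_assignment` are hypothetical names, and only unindented (module-level) single-target assignments are considered.

```python
import re

# Hypothetical patterns: a module-level assignment with an optional annotation.
ASSIGN_RE = re.compile(
    r"^(?P<name>[A-Za-z_]\w*)\s*(?::\s*(?P<type>[^=]+?))?\s*=(?!=)\s*(?P<value>.+)$"
)
CONST_NAME_RE = re.compile(r"^[A-Z][A-Z0-9_]*$")


def classify_assignment(line):
    """Classify an unindented assignment as "const" or "var"; None if not one."""
    m = ASSIGN_RE.match(line)
    if not m:
        return None
    name = m.group("name")
    return {
        "name": name,
        "kind": "const" if CONST_NAME_RE.match(name) else "var",
        "annotation": (m.group("type") or "").strip(),
    }
```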

🔗 Relationship Detection

Dependency Graph

The analyzer builds a dependency graph between classes:

class OrderService:
    repository: OrderRepository  # → dependency
    
    def create_order(self, user: User) -> Order:  # → dependencies: User, Order
        notification = NotificationService()  # → dependency (from calls)
        return Order(...)

Detected dependencies:

Method Call Analysis

def process(self, data):
    self.validate(data)           # → self.validate
    result = Helper.compute(data) # → Helper.compute (static call)
    super().process(data)         # → super().process
    save_to_db(result)            # → save_to_db (function call)

Detected calls:

{
  "calls": [
    {"name": "validate", "receiver": "self", "line": 2},
    {"name": "compute", "receiver": "Helper", "class_name": "Helper", "line": 3},
    {"name": "process", "receiver": "super()", "line": 4},
    {"name": "save_to_db", "line": 5}
  ]
}
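To make the call records above concrete, here is a small Python sketch that produces the same shape from the body of a function. It is an approximation under stated assumptions, not the analyzer's Go code: `CALL_RE` and `extract_calls` are hypothetical, line numbers are counted from the `def` line (line 1), and a receiver starting with an uppercase letter is assumed to be a class.

```python
import keyword
import re

# Hypothetical pattern: optional receiver ("self", "Helper", "super()", "a.b"),
# then the called name, then an opening parenthesis.
CALL_RE = re.compile(
    r"(?<![\w.])(?:(?P<receiver>super\(\)|[A-Za-z_][\w.]*)\.)?"
    r"(?P<name>[A-Za-z_]\w*)\s*\("
)


def extract_calls(func_lines):
    """Return call records for a function; func_lines[0] is the def line."""
    calls = []
    for lineno, line in enumerate(func_lines[1:], start=2):
        code = line.split("#", 1)[0]  # ignore trailing comments (naive)
        for m in CALL_RE.finditer(code):
            name, receiver = m.group("name"), m.group("receiver")
            if keyword.iskeyword(name):
                continue  # e.g. "if (x):" is not a call
            call = {"name": name}
            if receiver:
                call["receiver"] = receiver
                if receiver[0].isupper():
                    call["class_name"] = receiver  # assumed: capitalised means class
            call["line"] = lineno
            calls.append(call)
    return calls
```

Run on the `process` example, this yields the four records listed in the JSON above. Calls inside string literals would be picked up too, since the scan is purely textual.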

🏗️ File Structure

python/
├── types.go           # Types: ModuleInfo, ClassInfo, MethodInfo, MethodCall, etc.
├── analyzer.go        # PathAnalyzer implementation (1500+ lines)
├── api_analyzer.go    # Legacy APIAnalyzer (build-tagged out)
├── analyzer_test.go   # 26 comprehensive tests
└── README.md          # This documentation

💻 Usage

Standard Analysis

import (
    "fmt"
    "log"

    "github.com/doITmagic/rag-code-mcp/internal/ragcode/analyzers/python"
)

// Create analyzer (excludes test files by default)
analyzer := python.NewCodeAnalyzer()

// Analyze directories/files
chunks, err := analyzer.AnalyzePaths([]string{"./myproject"})
if err != nil {
    log.Fatalf("analysis failed: %v", err)
}

// Print each chunk with its type, package, name, and detected dependencies
for _, chunk := range chunks {
    fmt.Printf("[%s] %s.%s\n", chunk.Type, chunk.Package, chunk.Name)
    fmt.Printf("  Dependencies: %v\n", chunk.Metadata["dependencies"])
}

With Options

// Include test files
analyzer := python.NewCodeAnalyzerWithOptions(true)

🔌 Integration

Language Manager

The Python analyzer is automatically selected for:

Workspace Detection

Python projects are detected by:

| File | Description |
|------|-------------|
| pyproject.toml | PEP 518 - modern Python |
| setup.py | Setuptools legacy |
| requirements.txt | pip dependencies |
| Pipfile | Pipenv |
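Marker-file detection of this kind amounts to checking a directory for any of the listed file names. A minimal Python sketch follows; `PYTHON_MARKERS` and `detect_python_project` are hypothetical names, and whether the real detector also looks in parent or nested directories is not covered here.

```python
from pathlib import Path

# Marker files taken from the table above.
PYTHON_MARKERS = ("pyproject.toml", "setup.py", "requirements.txt", "Pipfile")


def detect_python_project(root):
    """Return the marker files present directly under `root` (empty if none)."""
    root = Path(root)
    return [name for name in PYTHON_MARKERS if (root / name).is_file()]
```

A non-empty result would mark the directory as a Python workspace; returning the matched names rather than a boolean makes it easy to log which marker triggered the detection.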


📋 CodeChunk Types

| Type | Description | Example |
|------|-------------|---------|
| class | Class definition | class User(BaseModel): |
| method | Class method | def get_user(self): |
| function | Module-level function | def helper(): |
| property | @property | @property def name(self): |
| const | UPPER_CASE constant | MAX_SIZE = 100 |
| var | Module-level variable | logger = getLogger() |

🏷️ Complete Metadata

Class Metadata

{
  "bases": ["BaseModel", "Mixin"],
  "decorators": ["dataclass"],
  "is_abstract": false,
  "is_dataclass": true,
  "is_enum": false,
  "is_protocol": false,
  "is_mixin": false,
  "metaclass": "",
  "dependencies": ["BaseModel", "Mixin", "User", "Order"]
}

Method Metadata

{
  "class_name": "UserService",
  "is_static": false,
  "is_classmethod": false,
  "is_async": true,
  "is_abstract": false,
  "decorators": ["cache"],
  "calls": [
    {"name": "validate", "receiver": "self", "line": 10},
    {"name": "save", "receiver": "self.repository", "line": 12}
  ],
  "type_deps": ["User", "Order"]
}
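Because `calls` is stored as structured metadata, a consumer can answer questions such as "which methods call `save`?" without re-parsing source. The Python sketch below assumes chunks have been loaded as dictionaries with `name` and `metadata` keys; that layout and the `find_callers` helper are assumptions made for illustration, not part of the analyzer's API.

```python
def find_callers(chunks, target, receiver=None):
    """Return names of chunks whose `calls` metadata mentions `target`.

    `chunks` is assumed to be a list of dicts shaped like
    {"name": ..., "metadata": {"calls": [...]}} (hypothetical layout).
    """
    callers = []
    for chunk in chunks:
        for call in chunk.get("metadata", {}).get("calls", []):
            if call.get("name") != target:
                continue
            if receiver is not None and call.get("receiver") != receiver:
                continue
            callers.append(chunk["name"])
            break  # one matching call per chunk is enough
    return callers
```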

Function Metadata

{
  "is_async": true,
  "is_generator": false,
  "decorators": ["lru_cache"]
}

🧪 Testing

# Run all tests (26 tests)
go test ./internal/ragcode/analyzers/python/

# With verbose output
go test -v ./internal/ragcode/analyzers/python/

# Specific test
go test -v -run TestMethodCallExtraction ./internal/ragcode/analyzers/python/

# With coverage
go test -cover ./internal/ragcode/analyzers/python/

🚫 Excluded Paths

The analyzer automatically skips:


⚠️ Limitations

| Limitation | Description |
|------------|-------------|
| Regex-based | Doesn't use full Python AST - may miss edge cases |
| No Type Resolution | Type hints are extracted as strings, not resolved |
| Single-file | Each file is analyzed independently |
| No Runtime Info | Doesn't execute code, only static analysis |

🔮 Future Improvements