Transform complex, hard-to-understand Python code into clear, well-documented, maintainable code. This skill guides systematic refactoring that prioritizes human comprehension without sacrificing correctness or reasonable performance.
When to Invoke
Invoke this skill when:
User explicitly requests "human", "readable", "maintainable", "clean", or "refactor" code improvements
Code review processes flag comprehension or maintainability issues
Working with legacy code that needs modernization
Preparing code for team onboarding or educational contexts
Functions or modules are difficult to understand or modify
RED FLAG indicators: file >500 lines with scattered functions and global state, multiple global statements, no clear module/class organization, configuration mixed with business logic
Do NOT invoke this skill when:
Code is performance-critical and profiling shows optimization is needed first
User explicitly requests performance optimization over readability
Core Principles
Follow these principles in priority order:
Prefer structured OOP for complex code - Code with shared state, multiple concerns, or scattered global functions should be restructured into well-organized classes and modules. Script-like code with global state and tangled dependencies benefits most from OOP. However, simple modules with pure functions, CLI tools using click/argparse, and functional data pipelines don't need to be forced into classes.
Progressive disclosure - Reveal complexity in layers, not all at once
Reasonable performance - Never sacrifice >2x performance without explicit approval
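As a minimal sketch of the first principle, a script-like module with global state can be restructured into a class that owns that state. The `Inventory` name and fields here are hypothetical, not taken from any real codebase:

```python
# Before: script-like code with a module-level dict (hypothetical)
# stock = {}
# def add_item(name, qty): stock[name] = stock.get(name, 0) + qty

# After: the shared state is encapsulated in a class with one responsibility
class Inventory:
    """Tracks item quantities; replaces a module-level `stock` dict."""

    def __init__(self) -> None:
        self._stock: dict[str, int] = {}

    def add_item(self, name: str, qty: int) -> None:
        self._stock[name] = self._stock.get(name, 0) + qty

    def quantity(self, name: str) -> int:
        return self._stock.get(name, 0)
```

The class can now be instantiated per test, whereas the global dict leaked state between callers and tests.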
Key Constraints
ALWAYS observe these constraints:
SAFETY BY DESIGN - Use mandatory migration checklists for destructive changes. Create new structure, search all usages, migrate all, verify, only then remove old code. NEVER remove code before 100% migration verified.
STATIC ANALYSIS FIRST - Run flake8 --select=F821,E999 before tests to catch NameErrors immediately (E0602 is a pylint code, not a flake8 one)
PRESERVE BEHAVIOR - All existing tests must pass after refactoring
NO PERFORMANCE REGRESSION - Never degrade performance >2x without explicit user approval
NO API CHANGES - Public APIs remain unchanged unless explicitly requested and documented
After each micro-change (not at the end, EVERY SINGLE ONE):
flake8 --select=F821,E999 -> 0 errors
pytest -x -> all passing
Spot check 1 edge case for unchanged behavior
If ANY check fails: STOP -> REVERT -> ANALYZE -> FIX APPROACH -> RETRY
ANY REGRESSION = TOTAL FAILURE OF THE REFACTORING
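The check-after-every-micro-change loop above can be sketched as a small helper. The command lists mirror the checks in this section; the injectable `runner` parameter is an assumption added purely so the loop itself is testable:

```python
import subprocess

CHECKS = [
    ["flake8", "--select=F821,E999"],  # static analysis first: catch NameErrors
    ["pytest", "-x"],                  # then tests: -x stops at the first failure
]

def run_checks(checks=CHECKS, runner=None):
    """Return True only if every check passes; stop at the first failure."""
    if runner is None:
        def runner(cmd):
            return subprocess.run(cmd).returncode
    for cmd in checks:
        if runner(cmd) != 0:
            return False  # STOP -> revert -> analyze before retrying
    return True
```

Ordering the commands this way means a NameError introduced by an incomplete migration surfaces in seconds, before the slower test suite runs.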
Refactoring Workflow
Execute refactoring in four phases with validation at each step.
Phase 1: Analysis
Before making any changes, analyze the code comprehensively:
Read the entire codebase section being refactored to understand context
Identify readability issues using the anti-patterns reference (see references/anti-patterns.md):
Check for script-like/procedural code (global state, scattered functions, no clear structure)
Check for God Objects/Classes (classes doing too much)
Complex nested conditionals, long functions, magic numbers, cryptic names, etc.
Assess architecture (see references/oop_principles.md):
Is code organized in proper classes and modules?
Is there global state that should be encapsulated?
Are responsibilities properly separated?
Are SOLID principles followed?
Is dependency injection used instead of hard-coded dependencies?
Measure current metrics using scripts/measure_complexity.py or scripts/analyze_multi_metrics.py
Run linting analysis (see Tooling Recommendations below for which tool to use)
Check test coverage - Identify gaps that need filling before refactoring
Document findings using the analysis template (see assets/templates/analysis_template.md)
Output: Prioritized list of issues by impact and risk.
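One of the architecture checks above, dependency injection instead of hard-coded dependencies, can be illustrated with a hypothetical `ReportService` (all names here are invented for the sketch):

```python
# Hard-coded dependency: hides coupling and blocks testing
# class ReportService:
#     def __init__(self):
#         self.store = PostgresStore()  # cannot swap in a fake

# Injected dependency: the caller decides which store implementation to use
class ReportService:
    def __init__(self, store):
        self.store = store

    def total(self, key: str) -> int:
        return sum(self.store.values(key))

class FakeStore:
    """Test double satisfying the implicit `values(key)` protocol."""

    def values(self, key):
        return [1, 2, 3]
```

With injection, the analysis phase can flag the coupling and the test strategy can use `FakeStore` without touching a database.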
Phase 2: Planning
Plan the refactoring approach systematically with safety by design:
Identify changes by type:
Non-destructive: Renames, documentation, type hints -> Low risk
Destructive: Removing globals, deleting functions, replacing APIs -> High risk
For DESTRUCTIVE changes - CREATE MIGRATION PLAN (MANDATORY):
Search for ALL usages of each element to be removed
Document every found usage with file, line number, and usage type
If you cannot create a complete migration plan, you CANNOT proceed with the destructive change
Risk assessment for each proposed change (Low/Medium/High)
Dependency identification - What else depends on this code?
Test strategy - What tests are needed? What might break?
Change ordering - Sequence changes from safest to riskiest
Expected outcomes - Document what metrics should improve and by how much
Output: Refactoring plan with sequenced changes, migration plans for destructive changes, test strategy, and rollback plan.
Phase 3: Execution
Apply refactoring patterns using safety-by-design workflow.
For NON-DESTRUCTIVE changes (safe to do anytime):
Rename variables/functions for clarity
Extract magic numbers/strings to named constants
Add/improve documentation and type hints
Add guard clauses to reduce nesting
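A guard-clause rewrite of a nested conditional might look like this (the `process_order` example is hypothetical):

```python
# Before: three levels of nesting bury the happy path
def process_order_nested(order):
    if order is not None:
        if order.get("items"):
            if not order.get("cancelled"):
                return sum(order["items"])
    return 0

# After: guard clauses return early, leaving the happy path flat
def process_order(order):
    if order is None:
        return 0
    if not order.get("items"):
        return 0
    if order.get("cancelled"):
        return 0
    return sum(order["items"])
```

Both versions behave identically, which is exactly what makes this a non-destructive change.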
For DESTRUCTIVE changes (removing/replacing code) - STRICT PROTOCOL:
CREATE new structure (no removal yet) - write new classes/functions, add tests
SEARCH comprehensively for ALL usages of the element being removed
CREATE migration checklist documenting every found usage
MIGRATE one usage at a time, checking off the list, running static analysis + tests after each
VERIFY complete migration - re-run original searches, should find zero old references
REMOVE old code only after 100% migration verified
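During steps 1 through 5, old and new code coexist. One common bridge, sketched below with hypothetical names, is to keep the old entry point as a thin delegating wrapper until every usage on the checklist has been migrated:

```python
import warnings

class PriceCalculator:
    """New structure (step 1): created before anything is removed."""

    def __init__(self, tax_rate: float) -> None:
        self.tax_rate = tax_rate

    def total(self, amount: float) -> float:
        return amount * (1 + self.tax_rate)

def calc_total(amount: float, tax_rate: float = 0.2) -> float:
    """Old API, kept until step 6; delegates so behavior stays identical."""
    warnings.warn("calc_total is deprecated; use PriceCalculator",
                  DeprecationWarning, stacklevel=2)
    return PriceCalculator(tax_rate).total(amount)
```

Because the wrapper delegates, static analysis and tests stay green after each migrated usage, and the final removal in step 6 touches only dead code.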
Execution Rules
NEVER skip the migration checklist for destructive changes
Run static analysis BEFORE tests - Catch NameErrors immediately
One pattern at a time - Never mix multiple refactoring patterns in one change
Atomic commits - Each migration step gets its own commit
Stop on ANY error - Static analysis errors OR test failures require immediate fix/revert
Refactoring order (recommended sequence):
Transform script-like code to proper architecture (if code has global state and scattered functions). See references/examples/script_to_oop_transformation.md
Rename variables/functions for clarity
Extract magic numbers/strings to named constants (as class constants or enums)
Add/improve documentation and type hints
Extract methods to reduce function length
Simplify conditionals with guard clauses
Reduce nesting depth
Final review: Ensure separation of concerns is clean
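Step 3 of the sequence above (magic numbers to class constants or enums) in miniature, with hypothetical names:

```python
from enum import IntEnum

class Status(IntEnum):
    ACTIVE = 1
    SUSPENDED = 2

class Account:
    MAX_LOGIN_ATTEMPTS = 3  # was a bare `3` scattered through the code

    def __init__(self) -> None:
        self.status = Status.ACTIVE
        self.failed_logins = 0

    def record_failure(self) -> None:
        self.failed_logins += 1
        if self.failed_logins >= self.MAX_LOGIN_ATTEMPTS:
            self.status = Status.SUSPENDED
```

The threshold now has one authoritative definition, and the enum gives status comparisons a readable name instead of an integer.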
Output: Refactored code passing all tests with clear commit history.
Phase 4: Validation
Validate improvements objectively:
Run static analysis FIRST (catch errors before tests), then the full test suite, then re-measure the Phase 1 metrics and compare against the baseline
Verify documentation standards:
Module Documentation - Purpose and key dependencies
Inline Comments - Only for non-obvious "why"
Type Hints - All public APIs and complex internals
OOP Transformation Patterns
For transforming script-like code to structured OOP. See references/examples/script_to_oop_transformation.md for a complete guide and references/oop_principles.md for SOLID principles.
Anti-Patterns to Fix
See references/anti-patterns.md for the full catalog. Priority order:
Critical: Script-like/procedural code with global state, God Object/God Class
High: Complex nested conditionals (>3 levels), long functions (>30 lines), magic numbers, cryptic names, missing type hints, missing docstrings
Medium: Duplicate code, primitive obsession, long parameter lists (>5)
Low: Inconsistent naming, redundant comments, unused imports
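Two of the medium-priority items, primitive obsession and long parameter lists, often share one fix: group related primitives into a small dataclass. The names below are hypothetical:

```python
from dataclasses import dataclass

# Before: six loose parameters, four of which always travel together
# def ship(name, street, city, zip_code, country, express): ...

@dataclass(frozen=True)
class Address:
    street: str
    city: str
    zip_code: str
    country: str

def ship(name: str, address: Address, express: bool = False) -> str:
    speed = "express" if express else "standard"
    return f"{speed} shipment for {name} to {address.city}, {address.country}"
```

The parameter list shrinks below the threshold, and the address gains a type that can later carry validation.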
Tooling Recommendations
Primary Stack: Ruff + Complexipy (recommended for new projects)
pip install ruff complexipy radon wily
ruff check src/ # Fast linting (Rust, replaces flake8+plugins)
complexipy src/ --max-complexity-allowed 15 # Cognitive complexity (Rust)
radon mi src/ -s # Maintainability Index
See references/cognitive_complexity_guide.md for complete configuration (pyproject.toml, pre-commit hooks, GitHub Actions, CLI usage).
Alternative: Flake8 (for projects already using it)
The scripts/analyze_with_flake8.py and scripts/compare_flake8_reports.py scripts use flake8. See references/flake8_plugins_guide.md for the curated plugin list.
Multi-Metric Analysis
Use scripts/analyze_multi_metrics.py to combine cognitive complexity (complexipy), cyclomatic complexity (radon), and maintainability index in a single report.
| Metric | Tool | Use |
| --- | --- | --- |
| Cognitive Complexity | complexipy | Human comprehension |
| Cyclomatic Complexity | ruff (C901), radon | Test planning |
| Maintainability Index | radon | Overall code health |
Metric Targets
Cyclomatic complexity: <10 per function (warning at 15, error at 20)
Cognitive complexity: <15 per function (SonarQube default, warning at 20)
Function length: <30 lines (warning at 50)
Nesting depth: <=3 levels
Docstring coverage: >80% for public functions
Type hint coverage: >90% for public APIs
Historical Tracking with Wily
Monitor trends over time, not just thresholds. See references/cognitive_complexity_guide.md for setup and CI integration.
Common Refactoring Mistakes
See references/REGRESSION_PREVENTION.md for the full guide. Key traps:
Incomplete Migration - Removing old code before ALL usages are migrated (causes NameErrors)
Partial Pattern Application - Applying refactoring to some functions but not others
Breaking Public APIs - Changing function signatures used by external code
Output Format
Structure refactoring output using the template from assets/templates/summary_template.md. Include:
Changes made with rationale and risk level
Before/after metrics comparison table
Test results and performance impact
Risk assessment and human review recommendation
Related tools -- when to use what
humanize (agent, humanize plugin) -- Multi-language cosmetic cleanup. Renames local variables, improves comments, simplifies structure. Lowest regression risk. Use for: "make this readable", "clean up naming".
python-refactor (this skill) -- Python-only deep restructuring. OOP transformation, SOLID principles, complexity metrics, migration checklists, benchmark validation. Use for: "refactor this module", "reduce complexity", "transform to OOP".
Escalation path: humanize -> python-refactor (from safest to most thorough).
Integration with Same-Package Skills
python-tdd - Set up tests before refactoring, validate coverage after
python-performance-optimization - Deep profiling before/after refactoring
python-packaging - If refactoring a library, handle pyproject.toml and distribution
uv-package-manager - Use uv run ruff, uv run complexipy for tool execution
async-python-patterns - Reference async patterns when refactoring async code
Edge Cases and Limitations
When NOT to Refactor: Performance-critical optimized code (profile first), code scheduled for deletion, external dependencies (contribute upstream), stable legacy code nobody needs to modify.
Limitations: Cannot improve algorithmic complexity (that's algorithm change, not refactoring). Cannot add domain knowledge not in code/comments. Cannot guarantee correctness without tests. Code style preferences vary - adjust based on team conventions.
Examples
See references/examples/ for before/after examples:
script_to_oop_transformation.md - Complete transformation from script-like code to clean OOP architecture
python_complexity_reduction.md - Nested conditionals and long functions
typescript_naming_improvements.md - Variable and function naming patterns (cross-language reference)
Success Criteria
Refactoring is successful when:
ZERO regressions - All existing tests pass, behavior unchanged
Golden master match - Identical output for documented critical cases
Complexity metrics improved (documented in summary)
No performance regression >10% (or explicit approval obtained)
Documentation coverage improved
Code is easier for humans to understand
No new security vulnerabilities introduced
Changes are atomic and well-documented in git history
Wily trend - Complexity not increased compared to previous commit