Python Performance by pluginagentmarketplace/custom-plugin-python
npx skills add https://github.com/pluginagentmarketplace/custom-plugin-python --skill 'Python Performance'

掌握 Python 性能优化。学习如何分析代码、识别瓶颈、优化算法、高效管理内存,并利用高性能库处理计算密集型任务。
代码示例:
import timeit
import cProfile
import pstats
# 1. timeit 用于微基准测试
def list_comprehension():
    """Return the squares of 0..999, built with a list comprehension."""
    return [value * value for value in range(1000)]
def map_function():
    """Return the squares of 0..999, produced via map() with a lambda."""
    squared = map(lambda value: value ** 2, range(1000))
    return list(squared)
# Compare the two approaches over 10,000 runs each with timeit
time_lc = timeit.timeit(list_comprehension, number=10000)
time_map = timeit.timeit(map_function, number=10000)
print(f"List comprehension: {time_lc:.4f}s")
print(f"Map function: {time_map:.4f}s")
# 2. cProfile 用于函数性能分析
def process_data():
    """Build a list of the first 100,000 squares and return their sum.

    Deliberately uses an explicit append loop so the work shows up
    clearly in a cProfile report.
    """
    squares = []
    for n in range(100000):
        squares.append(n * n)
    return sum(squares)
# Profile process_data() and print the ten most expensive entries
profiler = cProfile.Profile()
profiler.enable()
result = process_data()
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')  # sort by cumulative time, including callees
stats.print_stats(10)
# 3. 行分析(需要 line_profiler 包)
# @profile 装饰器(为 line_profiler 手动添加)
def slow_function():
    """Sum the squares of 0..999,999 with a plain Python loop.

    Intentionally unoptimized so line_profiler has something to report.
    """
    accumulator = 0
    for value in range(1000000):
        accumulator += value * value
    return accumulator
# 运行命令:kernprof -l -v script.py
# 4. 内存分析
from memory_profiler import profile
@profile
def memory_intensive():
    """Allocate a large list and dict so memory_profiler can show per-line usage."""
    large_list = [i for i in range(1000000)]
    large_dict = {i: i**2 for i in range(1000000)}
    return len(large_list) + len(large_dict)
# 运行命令:python -m memory_profiler script.py
广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
代码示例:
import bisect
from collections import deque, Counter, defaultdict
import time
# 1. 成员测试:列表 vs 集合
# 差:O(n) 查找
def find_in_list(items, target):
    """Return True if *target* occurs in *items* — a linear O(n) scan."""
    return any(element == target for element in items)
# 好:O(1) 查找
def find_in_set(items, target):
    """Return True if *target* is in *items*, using an O(1) hash lookup.

    Note: converting *items* to a set is itself O(n), so the speed-up only
    materializes when the caller reuses one set across many lookups — or
    passes a set directly.  The conversion is skipped when *items* is
    already set-like, so repeated calls with a prebuilt set stay O(1).
    """
    if not isinstance(items, (set, frozenset)):
        items = set(items)
    return target in items
items = list(range(100000))
# List: ~0.001s per lookup, set: ~0.000001s (roughly 1000x faster per lookup,
# once the set has been built)
# 2. Generator expressions for memory efficiency
# Bad: materializes the entire list in memory
squares_list = [x**2 for x in range(1000000)]  # ~4MB
# Good: yields values on demand
squares_gen = (x**2 for x in range(1000000))  # ~128 bytes
# 3. deque for efficient queue operations
# Bad: popping from the front of a list is O(n)
queue_list = list(range(10000))
queue_list.pop(0)  # slow
# Good: O(1) pops at either end
queue_deque = deque(range(10000))
queue_deque.popleft()  # fast
# 4. bisect for maintaining a sorted list
# Bad: re-sorting the whole list after every insertion
sorted_list = []
for i in [5, 2, 8, 1, 9]:
    sorted_list.append(i)
    sorted_list.sort()
# Good: insort keeps the list sorted as it grows
# (the binary search is O(log n); the underlying element shift is still O(n))
sorted_list = []
for i in [5, 2, 8, 1, 9]:
    bisect.insort(sorted_list, i)
# 5. Counter for frequency counting
# Bad: manual counting with a plain dict
# NOTE(review): `words` is not defined anywhere in this snippet — presumably
# an iterable of strings supplied by the surrounding program; confirm.
word_count = {}
for word in words:
    if word in word_count:
        word_count[word] += 1
    else:
        word_count[word] = 1
# Good: Counter does the same in one call
word_count = Counter(words)
most_common = word_count.most_common(10)
代码示例:
import gc
import sys
from weakref import WeakValueDictionary
# 1. __slots__ 用于内存高效的类
# 差:普通类(每个实例 56 字节)
class RegularPoint:
    """2-D point stored via a per-instance __dict__ — the memory-hungry baseline."""
    def __init__(self, x, y):
        self.x = x
        self.y = y
# 好:使用 Slots 的类(每个实例 32 字节 - 小 43%!)
class SlottedPoint:
    """2-D point using __slots__: no per-instance __dict__, so each instance
    is smaller and attribute access is slightly faster."""
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y
print(sys.getsizeof(RegularPoint(1, 2))) # 56 字节
print(sys.getsizeof(SlottedPoint(1, 2))) # 32 字节
# 2. 对象池用于昂贵的对象
class ObjectPool:
    """A bounded pool that recycles expensive-to-build objects.

    Objects are created lazily via *factory*; released objects are
    retained for reuse until *max_size* is reached.
    """

    def __init__(self, factory, max_size=10):
        self.factory = factory
        self.max_size = max_size
        self.pool = []

    def acquire(self):
        """Return a recycled object if one is pooled, else build a new one."""
        try:
            return self.pool.pop()
        except IndexError:
            return self.factory()

    def release(self, obj):
        """Return *obj* to the pool; silently dropped when the pool is full."""
        if len(self.pool) < self.max_size:
            self.pool.append(obj)
# 用法
# Usage
# NOTE(review): DatabaseConnection is not defined in this snippet —
# presumably a project class; confirm against the surrounding code.
db_pool = ObjectPool(lambda: DatabaseConnection(), max_size=5)
conn = db_pool.acquire()
# ... use the connection ...
db_pool.release(conn)
# 3. 弱引用防止内存泄漏
class Cache:
    """Cache whose entries disappear automatically once no other strong
    reference keeps the cached value alive (backed by WeakValueDictionary).

    NOTE(review): values must support weak references; built-ins such as
    int, str and tuple do not, and storing them raises TypeError — confirm
    the intended value types.
    """
    def __init__(self):
        self._cache = WeakValueDictionary()
    def get(self, key):
        # Returns None when the key is absent or its value was collected.
        return self._cache.get(key)
    def set(self, key, value):
        self._cache[key] = value
# 4. 大型操作的手动垃圾回收
def process_large_dataset():
    """Process the global `large_data` batch by batch, forcing a GC pass per batch.

    NOTE(review): `large_data` and `process_batch` are not defined in this
    snippet — presumably supplied by the surrounding program; confirm.
    """
    for batch in large_data:
        process_batch(batch)
        # Force garbage collection after each batch to cap peak memory
        gc.collect()
# 5. 用于资源清理的上下文管理器
class ManagedResource:
    """Context manager: allocates a resource on entry, cleans it up on exit."""
    def __enter__(self):
        # NOTE(review): `allocate_resource` is defined elsewhere — presumably
        # returns an object exposing cleanup(); confirm against the caller.
        self.resource = allocate_resource()
        return self.resource
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.resource.cleanup()
        # Returning False propagates any exception raised inside the block.
        return False
代码示例:
import numpy as np
from numba import jit
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
# 1. NumPy 向量化
# 差:Python 循环(慢)
def python_sum(n):
    """Sum of squares 0..n-1 using a pure-Python accumulation loop."""
    accumulator = 0
    for value in range(n):
        accumulator += value * value
    return accumulator
# 好:NumPy 向量化(快 100 倍!)
def numpy_sum(n):
    """Sum of squares 0..n-1 via a vectorized NumPy expression.

    Uses an explicit int64 dtype: np.arange defaults to the platform
    integer, which is 32-bit on some platforms (e.g. Windows), so
    ``arr ** 2`` would silently overflow for n above ~46341.  Returns a
    plain Python int (backward-compatible with the previous np.int64).
    """
    arr = np.arange(n, dtype=np.int64)
    return int(np.sum(arr * arr))
# 基准测试:python_sum(1000000) = 0.15s
# numpy_sum(1000000) = 0.002s
# 2. Numba JIT 编译
@jit(nopython=True)  # compile to native machine code; no Python-object fallback
def fast_function(n):
    """Sum of squares 0..n-1; the loop is JIT-compiled by Numba on first call."""
    total = 0
    for i in range(n):
        total += i ** 2
    return total
# 第一次调用:编译 + 执行
# 后续调用:比纯 Python 快 50 倍!
# 3. 多进程处理 CPU 密集型任务
def cpu_intensive_task(n):
    """Sum of squares 0..n-1 — pure CPU work for the multiprocessing demo."""
    total = 0
    for value in range(n):
        total += value * value
    return total
# Single process
result = cpu_intensive_task(10000000)
# Multiple processes
with ProcessPoolExecutor(max_workers=4) as executor:
    ranges = [2500000, 2500000, 2500000, 2500000]
    results = executor.map(cpu_intensive_task, ranges)
    total = sum(results)
# 4x speedup on 4 cores!
# NOTE(review): `total` here is 4 * sum(i*i for i in range(2500000)), which is
# NOT equal to `result` — cpu_intensive_task always starts at 0, so these four
# chunks do not partition range(10000000).  The single- vs multi-process
# comparison is therefore not apples-to-apples; verify intent.
# 4. 缓存昂贵的计算
from functools import lru_cache
@lru_cache(maxsize=128)
def fibonacci(n):
    """Return the n-th Fibonacci number; memoization makes the recursion linear."""
    if n >= 2:
        return fibonacci(n - 2) + fibonacci(n - 1)
    return n
# fibonacci(100) 无缓存:~永远
# fibonacci(100) 有缓存:瞬间
# 5. 用于零拷贝操作的内存视图
def process_array(data):
    """Return a zero-copy memoryview of data[1000:2000].

    Slicing *data* directly (``data[1000:2000]``) would allocate and copy
    1000 elements; a memoryview slice shares the underlying buffer instead.
    The original body also computed that copying slice and discarded it —
    removed.  Returning the view (the original returned None) is
    backward-compatible for callers that ignored the result.
    """
    return memoryview(data)[1000:2000]
构建一个全面的性能分析工具。
要求:
关键技能: 分析工具、可视化、分析
优化数据处理管道。
要求:
关键技能: NumPy、内存优化、基准测试
实现并行算法。
要求:
关键技能: 并行处理、性能测量
掌握 Python 性能优化后,可以探索:
每周安装次数
–
代码仓库
GitHub 星标数
5
首次出现
–
安全审计
Master performance optimization in Python. Learn to profile code, identify bottlenecks, optimize algorithms, manage memory efficiently, and leverage high-performance libraries for compute-intensive tasks.
Code Example:
import timeit
import cProfile
import pstats
# 1. timeit for micro-benchmarks
def list_comprehension():
    """Return the squares of 0..999, built with a list comprehension."""
    return [x**2 for x in range(1000)]
def map_function():
    """Return the squares of 0..999, produced via map() with a lambda."""
    return list(map(lambda x: x**2, range(1000)))
# Compare performance
time_lc = timeit.timeit(list_comprehension, number=10000)
time_map = timeit.timeit(map_function, number=10000)
print(f"List comprehension: {time_lc:.4f}s")
print(f"Map function: {time_map:.4f}s")
# 2. cProfile for function profiling
def process_data():
    """Append 100,000 squares to a list and return their sum (cProfile demo)."""
    data = []
    for i in range(100000):
        data.append(i ** 2)
    return sum(data)
profiler = cProfile.Profile()
profiler.enable()
result = process_data()
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)
# 3. Line profiling (requires line_profiler package)
# @profile decorator (add manually for line_profiler)
def slow_function():
    """Sum the squares of 0..999,999 — intentionally slow, for line_profiler demos."""
    total = 0
    for i in range(1000000):
        total += i ** 2
    return total
# Run with: kernprof -l -v script.py
# 4. Memory profiling
from memory_profiler import profile
@profile
def memory_intensive():
large_list = [i for i in range(1000000)]
large_dict = {i: i**2 for i in range(1000000)}
return len(large_list) + len(large_dict)
# Run with: python -m memory_profiler script.py
Code Example:
import bisect
from collections import deque, Counter, defaultdict
import time
# 1. List vs Set for membership testing
# Bad: O(n) lookup
def find_in_list(items, target):
    """Return True if *target* occurs in *items* — O(n) scan for a list."""
    return target in items  # Linear search
# Good: O(1) lookup
def find_in_set(items, target):
    """Return True if *target* is in *items*, using an O(1) hash lookup.

    Note: converting *items* to a set is itself O(n), so the speed-up only
    materializes when the caller reuses one set across many lookups — or
    passes a set directly.  The conversion is skipped when *items* is
    already set-like, so repeated calls with a prebuilt set stay O(1).
    """
    if not isinstance(items, (set, frozenset)):
        items = set(items)
    return target in items
items = list(range(100000))
# List: ~0.001s per lookup, set: ~0.000001s (1000x faster — provided the set is built once and reused across lookups)
# 2. Generator expressions for memory efficiency
# Bad: Creates entire list in memory
squares_list = [x**2 for x in range(1000000)] # ~4MB
# Good: Generates on-demand
squares_gen = (x**2 for x in range(1000000)) # ~128 bytes
# 3. Deque for efficient queue operations
# Bad: O(n) pop from beginning
queue_list = list(range(10000))
queue_list.pop(0) # Slow
# Good: O(1) pop from both ends
queue_deque = deque(range(10000))
queue_deque.popleft() # Fast
# 4. Bisect for maintaining sorted lists
# Bad: O(n) insertion into sorted list
sorted_list = []
for i in [5, 2, 8, 1, 9]:
sorted_list.append(i)
sorted_list.sort()
# Good: O(log n) insertion
sorted_list = []
for i in [5, 2, 8, 1, 9]:
bisect.insort(sorted_list, i)
# 5. Counter for frequency counting
# Bad: Manual counting
word_count = {}
for word in words:
if word in word_count:
word_count[word] += 1
else:
word_count[word] = 1
# Good: Counter
word_count = Counter(words)
most_common = word_count.most_common(10)
Code Example:
import gc
import sys
from weakref import WeakValueDictionary
# 1. __slots__ for memory-efficient classes
# Bad: Regular class (56 bytes per instance)
class RegularPoint:
    """2-D point stored via a per-instance __dict__ — the memory-hungry baseline."""
    def __init__(self, x, y):
        self.x = x
        self.y = y
# Good: Slots class (32 bytes per instance - 43% smaller!)
class SlottedPoint:
    """2-D point using __slots__: no per-instance __dict__, so each instance
    is smaller and attribute access is slightly faster."""
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y
print(sys.getsizeof(RegularPoint(1, 2))) # 56 bytes
print(sys.getsizeof(SlottedPoint(1, 2))) # 32 bytes
# 2. Object pooling for expensive objects
class ObjectPool:
    """A bounded pool that recycles objects created by *factory*."""
    def __init__(self, factory, max_size=10):
        # factory: zero-argument callable producing a fresh object.
        self.factory = factory
        self.max_size = max_size
        self.pool = []
    def acquire(self):
        """Return a recycled object if one is pooled, else build a new one."""
        if self.pool:
            return self.pool.pop()
        return self.factory()
    def release(self, obj):
        """Return *obj* to the pool; silently dropped when the pool is full."""
        if len(self.pool) < self.max_size:
            self.pool.append(obj)
# Usage
db_pool = ObjectPool(lambda: DatabaseConnection(), max_size=5)
conn = db_pool.acquire()
# Use connection
db_pool.release(conn)
# 3. Weak references to prevent memory leaks
class Cache:
    """Cache whose entries disappear once no other strong reference keeps
    the cached value alive (backed by WeakValueDictionary).

    NOTE(review): values must support weak references; built-ins such as
    int, str and tuple do not, and storing them raises TypeError — confirm
    the intended value types.
    """
    def __init__(self):
        self._cache = WeakValueDictionary()
    def get(self, key):
        # Returns None when the key is absent or its value was collected.
        return self._cache.get(key)
    def set(self, key, value):
        self._cache[key] = value
# 4. Manual garbage collection for large operations
def process_large_dataset():
    """Process the global `large_data` batch by batch, forcing a GC pass per batch.

    NOTE(review): `large_data` and `process_batch` are not defined in this
    snippet — presumably supplied by the surrounding program; confirm.
    """
    for batch in large_data:
        process_batch(batch)
        # Force garbage collection after each batch to cap peak memory
        gc.collect()
# 5. Context managers for resource cleanup
class ManagedResource:
    """Context manager: allocates a resource on entry, cleans it up on exit."""
    def __enter__(self):
        # NOTE(review): `allocate_resource` is defined elsewhere — presumably
        # returns an object exposing cleanup(); confirm against the caller.
        self.resource = allocate_resource()
        return self.resource
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.resource.cleanup()
        # Returning False propagates any exception raised inside the block.
        return False
Code Example:
import numpy as np
from numba import jit
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
# 1. NumPy vectorization
# Bad: Python loops (slow)
def python_sum(n):
    """Sum of squares 0..n-1 using a pure-Python accumulation loop."""
    total = 0
    for i in range(n):
        total += i ** 2
    return total
# Good: NumPy vectorization (100x faster!)
def numpy_sum(n):
    """Sum of squares 0..n-1 via a vectorized NumPy expression.

    Uses an explicit int64 dtype: np.arange defaults to the platform
    integer, which is 32-bit on some platforms (e.g. Windows), so
    ``arr ** 2`` would silently overflow for n above ~46341.  Returns a
    plain Python int (backward-compatible with the previous np.int64).
    """
    arr = np.arange(n, dtype=np.int64)
    return int(np.sum(arr * arr))
# Benchmark: python_sum(1000000) = 0.15s
# numpy_sum(1000000) = 0.002s
# 2. Numba JIT compilation
@jit(nopython=True)  # Compile to machine code
def fast_function(n):
    """Sum of squares 0..n-1; the loop is JIT-compiled by Numba on first call."""
    total = 0
    for i in range(n):
        total += i ** 2
    return total
# First call: compilation + execution
# Subsequent calls: 50x faster than pure Python!
# 3. Multiprocessing for CPU-bound tasks
def cpu_intensive_task(n):
    """Sum of squares 0..n-1 — pure CPU work for the multiprocessing demo."""
    return sum(i * i for i in range(n))
# Single process
result = cpu_intensive_task(10000000)
# Multiple processes
with ProcessPoolExecutor(max_workers=4) as executor:
    ranges = [2500000, 2500000, 2500000, 2500000]
    results = executor.map(cpu_intensive_task, ranges)
    total = sum(results)
# 4x speedup on 4 cores!
# NOTE(review): `total` here is 4 * sum(i*i for i in range(2500000)), which is
# NOT equal to `result` — cpu_intensive_task always starts at 0, so these four
# chunks do not partition range(10000000).  The single- vs multi-process
# comparison is therefore not apples-to-apples; verify intent.
# 4. Caching for expensive computations
from functools import lru_cache
@lru_cache(maxsize=128)
def fibonacci(n):
    """Return the n-th Fibonacci number; lru_cache makes the recursion linear."""
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)
# fibonacci(100) without cache: ~forever
# fibonacci(100) with cache: instant
# 5. Memory views for zero-copy operations
def process_array(data):
    """Return a zero-copy memoryview of data[1000:2000].

    Slicing *data* directly (``data[1000:2000]``) would allocate and copy
    1000 elements; a memoryview slice shares the underlying buffer instead.
    The original body also computed that copying slice and discarded it —
    removed.  Returning the view (the original returned None) is
    backward-compatible for callers that ignored the result.
    """
    return memoryview(data)[1000:2000]
Build a comprehensive profiling tool.
Requirements:
Key Skills: Profiling tools, visualization, analysis
Optimize data processing pipeline.
Requirements:
Key Skills: NumPy, memory optimization, benchmarking
Implement parallel algorithms.
Requirements:
Key Skills: Parallelism, performance measurement
After mastering Python performance, explore:
Weekly Installs
–
Repository
GitHub Stars
5
First Seen
–
Security Audits
agent-browser 浏览器自动化工具 - Vercel Labs 命令行网页操作与测试
147,400 周安装