Pandas Data Analysis by pluginagentmarketplace/custom-plugin-python
npx skills add https://github.com/pluginagentmarketplace/custom-plugin-python --skill 'Pandas Data Analysis'
Master data analysis with Pandas, the powerful Python library for data manipulation and analysis. Learn to clean, transform, analyze, and visualize data effectively.
Code Example: DataFrame basics
import pandas as pd
import numpy as np
# Create DataFrame
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David'],
    'age': [25, 30, 35, 28],
    'salary': [50000, 60000, 75000, 55000],
    'department': ['IT', 'HR', 'IT', 'Sales']
}
df = pd.DataFrame(data)
# Indexing and filtering
it_employees = df[df['department'] == 'IT']
high_earners = df.loc[df['salary'] > 55000, ['name', 'salary']]
# Adding calculated columns
df['annual_bonus'] = df['salary'] * 0.10
df['age_group'] = pd.cut(df['age'], bins=[0, 30, 40, 100], labels=['Young', 'Mid', 'Senior'])
print(df)
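One detail in the `age_group` line is easy to miss. By default `pd.cut` builds right-closed intervals, so with `bins=[0, 30, 40, 100]` an age of exactly 30 falls in 'Young', not 'Mid'. The sketch below contrasts the default with `right=False`. The extra age 41 is made-up sample data.

```python
import pandas as pd

ages = pd.Series([25, 30, 35, 28, 41])  # 41 is an illustrative extra value
labels = ['Young', 'Mid', 'Senior']

# Default right=True: intervals are (0, 30], (30, 40], (40, 100]
right_closed = pd.cut(ages, bins=[0, 30, 40, 100], labels=labels)

# right=False: intervals are [0, 30), [30, 40), [40, 100)
left_closed = pd.cut(ages, bins=[0, 30, 40, 100], labels=labels, right=False)

print(right_closed.tolist())  # ['Young', 'Young', 'Mid', 'Young', 'Senior']
print(left_closed.tolist())   # ['Young', 'Mid', 'Mid', 'Young', 'Senior']
```

With the default settings the lowest edge is also excluded, so an age of 0 becomes NaN unless you pass `include_lowest=True`.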
Code Example: Data cleaning
import pandas as pd
# Load data with missing values
df = pd.read_csv('sales_data.csv')
# Handle missing values
df['price'] = df['price'].fillna(df['price'].median())
df['category'] = df['category'].fillna('Unknown')
df.dropna(subset=['customer_id'], inplace=True)
# Clean text data
df['product_name'] = df['product_name'].str.strip().str.lower()
df['product_name'] = df['product_name'].str.replace('[^a-zA-Z0-9 ]', '', regex=True)
# Convert dates
df['order_date'] = pd.to_datetime(df['order_date'])
df['year'] = df['order_date'].dt.year
df['month'] = df['order_date'].dt.month
# Remove duplicates
df.drop_duplicates(subset=['order_id'], keep='first', inplace=True)
# Apply custom function
def categorize_price(price):
    if price < 50:
        return 'Low'
    elif price < 100:
        return 'Medium'
    else:
        return 'High'
df['price_category'] = df['price'].apply(categorize_price)
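The cleaning steps can be tried on a small in-memory frame. This sketch uses hypothetical toy data in place of `sales_data.csv` and assigns the `fillna` result back to the column. Calling `fillna(inplace=True)` on a column selection is chained assignment, which pandas warns about under Copy-on-Write. The sketch also shows a vectorized `pd.cut` alternative to the row-by-row `categorize_price` function.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for sales_data.csv
df = pd.DataFrame({
    'price': [20.0, np.nan, 75.0, 100.0, 150.0],
    'category': ['A', None, 'B', 'B', None],
})

# Assign the filled column back instead of using inplace=True on a selection
df['price'] = df['price'].fillna(df['price'].median())  # median of 20, 75, 100, 150 is 87.5
df['category'] = df['category'].fillna('Unknown')

# Vectorized equivalent of categorize_price: left-closed bins give
# price < 50 -> Low, 50 <= price < 100 -> Medium, price >= 100 -> High
df['price_category'] = pd.cut(
    df['price'],
    bins=[-np.inf, 50, 100, np.inf],
    labels=['Low', 'Medium', 'High'],
    right=False,
)
print(df)
```

The two approaches differ on missing values. `categorize_price` returns 'High' for a NaN price, because both comparisons are False, whereas `pd.cut` leaves the result as NaN.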
Code Example: Aggregation and grouping
import pandas as pd
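# Sample data; assumes sales.csv has the columns department, salary, employee_id,
# region, product_category, quarter, product_id and sales
df = pd.read_csv('sales.csv')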
# GroupBy aggregation
dept_stats = df.groupby('department').agg({
    'salary': ['mean', 'min', 'max'],
    'employee_id': 'count'
})
# Multiple groupby
sales_by_region_product = df.groupby(['region', 'product_category'])['sales'].sum()
# Pivot table
pivot = df.pivot_table(
    values='sales',
    index='product_category',
    columns='quarter',
    aggfunc='sum',
    fill_value=0
)
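# Rolling window (moving average)
# Rolling and cumulative results follow row order: sort by date within each product first.
# window=7 counts rows, so this is a 7-day average only with one row per product per day.
df['sales_ma_7d'] = df.groupby('product_id')['sales'].transform(
    lambda x: x.rolling(window=7, min_periods=1).mean()
)
# Cumulative sum per product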
df['cumulative_sales'] = df.groupby('product_id')['sales'].cumsum()
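Rolling and cumulative per-group calculations are only meaningful once rows are in time order. The sketch below uses hypothetical, deliberately unsorted daily data with a `date` column (an assumed name) and sorts it first. It then applies the same `transform`/`cumsum` pattern with a 2-row window so the results are easy to check. It ends with named aggregation, which produces flat column names instead of the MultiIndex that a dict of lists passed to `agg` creates.

```python
import pandas as pd

# Hypothetical daily sales: one row per product per day, deliberately unsorted
df = pd.DataFrame({
    'product_id': ['A', 'B', 'A', 'B', 'A'],
    'date': pd.to_datetime(['2024-01-03', '2024-01-01', '2024-01-01', '2024-01-02', '2024-01-02']),
    'sales': [30, 5, 10, 15, 20],
})

# Rolling and cumulative results follow row order, so sort within each product first
df = df.sort_values(['product_id', 'date']).reset_index(drop=True)

# 2-row moving average per product (the window counts rows, not days)
df['sales_ma_2'] = df.groupby('product_id')['sales'].transform(
    lambda x: x.rolling(window=2, min_periods=1).mean()
)
df['cumulative_sales'] = df.groupby('product_id')['sales'].cumsum()

# Named aggregation: flat output columns named by the keyword arguments
summary = df.groupby('product_id').agg(
    total_sales=('sales', 'sum'),
    days=('date', 'count'),
)
print(df)
print(summary)
```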
Code Example: Visualization
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Set style
sns.set_style('whitegrid')
# Load data (assumes columns month, sales, category, price, department, salary, age, years_experience)
df = pd.read_csv('sales_data.csv')
# 1. Line plot - Sales trend over time
df.groupby('month')['sales'].sum().plot(kind='line', figsize=(10, 6))
plt.title('Monthly Sales Trend')
plt.xlabel('Month')
plt.ylabel('Total Sales ($)')
plt.show()
# 2. Bar plot - Sales by category
category_sales = df.groupby('category')['sales'].sum().sort_values(ascending=False)
category_sales.plot(kind='bar', figsize=(10, 6))
plt.title('Sales by Category')
plt.xlabel('Category')
plt.ylabel('Total Sales ($)')
plt.xticks(rotation=45)
plt.show()
# 3. Histogram - Price distribution
df['price'].hist(bins=30, figsize=(10, 6))
plt.title('Price Distribution')
plt.xlabel('Price ($)')
plt.ylabel('Frequency')
plt.show()
# 4. Box plot - Salary by department
df.boxplot(column='salary', by='department', figsize=(10, 6))
plt.title('Salary Distribution by Department')
plt.suptitle('')  # clear the automatic "Boxplot grouped by department" title
plt.show()
# 5. Heatmap - Correlation matrix
corr = df[['age', 'salary', 'years_experience']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0)
plt.title('Correlation Matrix')
plt.show()
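The monthly line chart only plots months that appear in the data. A month with no orders disappears, and the line draws straight across the gap. If a missing month really means zero sales (an assumption about the data), reindexing to all twelve months makes the gap explicit. Here is a pandas-only sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical order rows: nothing recorded in month 2 or after month 4
df = pd.DataFrame({'month': [1, 1, 3, 4], 'sales': [100, 50, 80, 120]})

monthly = df.groupby('month')['sales'].sum()
print(monthly.index.tolist())  # [1, 3, 4]

# Reindex to months 1-12, treating absent months as zero sales
monthly_full = monthly.reindex(range(1, 13), fill_value=0)
print(monthly_full.tolist())  # [150, 0, 80, 120, 0, 0, 0, 0, 0, 0, 0, 0]

# monthly_full.plot(kind='line', figsize=(10, 6)) would then draw all 12 months
```

If missing months should instead show as breaks in the line, leave out `fill_value` so they become NaN.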
Analyze customer purchase behavior and segmentation.
Requirements:
Key Skills: Data cleaning, aggregation, visualization
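A customer segmentation project typically starts by rolling order-level rows up to one row per customer. The sketch below shows that step with named aggregation and quantile-based tiers from `pd.qcut`. The column names (`customer_id`, `order_id`, `amount`) and the data are hypothetical, and two tiers are used only because the toy data is tiny.

```python
import pandas as pd

# Hypothetical order-level data
orders = pd.DataFrame({
    'customer_id': ['c1', 'c1', 'c2', 'c3', 'c3', 'c3', 'c4'],
    'order_id': [1, 2, 3, 4, 5, 6, 7],
    'amount': [20.0, 30.0, 200.0, 15.0, 25.0, 10.0, 90.0],
})

# One row per customer: number of orders and total spend
customers = orders.groupby('customer_id').agg(
    orders=('order_id', 'nunique'),
    total_spend=('amount', 'sum'),
)
customers['avg_order_value'] = customers['total_spend'] / customers['orders']

# Quantile-based tiers; q=2 splits at the median total spend (70 here)
customers['spend_tier'] = pd.qcut(customers['total_spend'], q=2, labels=['Low', 'High'])
print(customers)
```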
Analyze sales trends and forecast future performance.
Requirements:
Key Skills: Time series operations, rolling windows, plotting
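The time-series skills listed for this project can be seen on a short daily series. The sketch uses made-up values. It shows weekly resampling, a time-based 7-day rolling mean (unlike `window=7`, which counts rows), and week-over-week change. It does not implement any forecasting method.

```python
import pandas as pd

# Hypothetical daily sales for two weeks starting Monday 2024-01-01
daily = pd.Series(range(1, 15), index=pd.date_range('2024-01-01', periods=14, freq='D'))

# Weekly totals; 'W' bins end on Sunday and are labelled by the week-end date
weekly = daily.resample('W').sum()
print(weekly)  # 28 for the week ending 2024-01-07, 77 for the week ending 2024-01-14

# Time-based window: each value averages the trailing 7 calendar days
ma_7d = daily.rolling('7D').mean()
print(ma_7d.iloc[6], ma_7d.iloc[-1])  # 4.0 11.0

# Week-over-week percentage change as a simple trend signal
print(weekly.pct_change())
```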
Build an automated data quality assessment tool.
Requirements:
Key Skills: Data validation, statistical analysis, reporting
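A data quality tool could begin with a per-column summary like the one sketched here. `data_quality_report` is a hypothetical helper, not part of pandas. It combines `dtypes`, `isna`, and `nunique` into one report frame, and a duplicate-key count is computed separately. The table and its columns are made up.

```python
import numpy as np
import pandas as pd


def data_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row of basic quality metrics per column."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'missing': df.isna().sum(),
        'missing_pct': (df.isna().mean() * 100).round(1),
        'unique': df.nunique(),  # nunique ignores NaN by default
    })


# Hypothetical orders table with a repeated order_id and missing prices
orders = pd.DataFrame({
    'order_id': [1, 2, 2, 4],
    'price': [10.0, np.nan, np.nan, 5.0],
})

report = data_quality_report(orders)
duplicate_ids = int(orders.duplicated(subset=['order_id']).sum())
print(report)
print('duplicate order_id rows:', duplicate_ids)
```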
After mastering Pandas, explore:
Weekly Installs: –
Repository
GitHub Stars: 5
First Seen: –
Security Audits