datanalysis-credit-risk by github/awesome-copilot
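npx skills add https://github.com/github/awesome-copilot --skill datanalysis-credit-risk
# Run the complete data cleaning pipeline
python ".github/skills/datanalysis-credit-risk/scripts/example.py"
The data cleaning pipeline consists of the following 11 steps. Each step runs independently and does not delete the original data: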
| Function | Purpose | Module |
|---|---|---|
| `get_dataset()` | Load and format data | references.func |
| `org_analysis()` | Organization sample analysis | references.func |
| `missing_check()` | Calculate missing rate | references.func |
| `drop_abnormal_ym()` | Filter abnormal months | references.analysis |
| `drop_highmiss_features()` | Drop features with a high missing rate | references.analysis |
| `drop_lowiv_features()` | Drop low-IV features | references.analysis |
| `drop_highpsi_features()` | Drop high-PSI features | references.analysis |
| `drop_highnoise_features()` | Null Importance denoising | references.analysis |
| `drop_highcorr_features()` | Drop highly correlated features | references.analysis |
| `iv_distribution_by_org()` | IV distribution statistics | references.analysis |
| `psi_distribution_by_org()` | PSI distribution statistics | references.analysis |
| `value_ratio_distribution_by_org()` | Value ratio distribution statistics | references.analysis |
| `export_cleaning_report()` | Export cleaning report | references.analysis |
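Several of the steps above filter features by PSI (Population Stability Index). As an illustration only, here is a minimal sketch of the textbook PSI formula over pre-binned counts. The function name `psi_from_counts`, the `eps` floor, and the sample counts are assumptions made for this example; this is not the skill's own implementation in `references.analysis`.

```python
import math


def psi_from_counts(base_counts, current_counts, eps=1e-6):
    """Textbook PSI between two binned samples (hypothetical helper).

    base_counts / current_counts: per-bin sample counts, same bin order.
    eps: floor applied to each bin proportion so empty bins do not
         cause a division by zero or log(0).
    """
    base_total = sum(base_counts)
    current_total = sum(current_counts)
    total = 0.0
    for b, c in zip(base_counts, current_counts):
        p_base = max(b / base_total, eps)
        p_cur = max(c / current_total, eps)
        # Each bin contributes (current - base) * ln(current / base).
        total += (p_cur - p_base) * math.log(p_cur / p_base)
    return total


# Identical distributions give a PSI of zero.
stable = psi_from_counts([50, 50], [50, 50])

# A shifted distribution gives a PSI well above 0.1,
# the default documented for psi_threshold.
shifted = psi_from_counts([50, 50], [80, 20])
print(stable, shifted)
```

Under this reading, a feature whose PSI exceeds the `psi_threshold` default of 0.1 in a given month would count as unstable for that month. How the skill aggregates unstable months and organizations (`max_months_ratio`, `max_orgs`) is not shown here.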
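- `DATA_PATH`: Data file path (Parquet format preferred)
- `DATE_COL`: Date column name
- `Y_COL`: Label column name
- `ORG_COL`: Organization column name
- `KEY_COLS`: List of primary key column names
- `OOS_ORGS`: List of out-of-sample organizations
- `min_ym_bad_sample`: Minimum number of bad samples per month (default 10)
- `min_ym_sample`: Minimum total number of samples per month (default 500)
- `missing_ratio`: Overall missing rate threshold (default 0.6)
- `overall_iv_threshold`: Overall IV threshold (default 0.1)
- `org_iv_threshold`: Single-organization IV threshold (default 0.1)
- `max_org_threshold`: Maximum tolerated number of low-IV organizations (default 2)
- `psi_threshold`: PSI threshold (default 0.1)
- `max_months_ratio`: Maximum ratio of unstable months (default 1/3)
- `max_orgs`: Maximum number of unstable organizations (default 6)
- `n_estimators`: Number of trees (default 100)
- `max_depth`: Maximum tree depth (default 5)
- `gain_threshold`: Gain difference threshold (default 50)
- `max_corr`: Correlation threshold (default 0.9)
- `top_n_keep`: Number of top features to keep, ranked by original gain (default 20)

The generated Excel report contains the following sheets: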
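Weekly installs: 5.6K
Repository GitHub stars: 26.9K
First seen: Mar 2, 2026
Security audits: Gen Agent Trust Hub (Warn), Socket (Pass), Snyk (Pass)
Installed on: codex (5.5K), gemini-cli (5.5K), opencode (5.5K), cursor (5.5K), github-copilot (5.5K), kimi-cli (5.5K)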