fabric-lakehouse by github/awesome-copilot
npx skills add https://github.com/github/awesome-copilot --skill fabric-lakehouse
Use this skill when you need to:
A Lakehouse in Microsoft Fabric is an item that gives users a place to store tabular data (tables) and non-tabular data (files). It combines the flexibility of a data lake with the management capabilities of a data warehouse. It provides:
Tabular data is stored as tables under the "Tables" folder. The main format for tables in a Lakehouse is Delta. A Lakehouse can also store tabular data in other formats, such as CSV or Parquet, but those formats are available only for Spark querying. Tables can be internal (the data is stored under the "Tables" folder) or external (only a reference to the table is stored under "Tables" while the data itself lives in the referenced location). Tables are referenced through Shortcuts, which can be internal (pointing to another location in Fabric) or external (pointing to data stored outside of Fabric).
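As a sketch of the default behavior, a Spark SQL cell in a Fabric notebook can create an internal Delta table under the "Tables" folder; the table, column, and file-path names below are illustrative, not from the source:

```sql
-- Creates an internal (managed) Delta table; its data lands under Tables/.
-- "sales" and its columns are hypothetical names for illustration.
CREATE TABLE IF NOT EXISTS sales (
    order_id   BIGINT,
    order_date DATE,
    amount     DECIMAL(18, 2)
) USING DELTA;

-- CSV or Parquet data kept under Files/ remains queryable from Spark,
-- but only Delta tables are registered under Tables/.
SELECT * FROM csv.`Files/raw/orders.csv` LIMIT 10;
```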
When creating a lakehouse, users can choose to enable schemas, which organize Lakehouse tables. Schemas are implemented as folders under the "Tables" folder, with tables stored inside those folders. The default schema is "dbo"; it cannot be deleted or renamed. All other schemas are optional and can be created, renamed, or deleted. Users can reference a schema located in another lakehouse with a Schema Shortcut, which references all tables in the target schema through a single shortcut.
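With schemas enabled, the folder-per-schema layout can be sketched in Spark SQL as follows; "bronze" and the table name are hypothetical:

```sql
-- "dbo" is the default schema and cannot be dropped or renamed.
-- "bronze" is a hypothetical user-defined schema.
CREATE SCHEMA IF NOT EXISTS bronze;

-- Tables created in a schema are stored as Tables/<schema>/<table>.
CREATE TABLE IF NOT EXISTS bronze.raw_orders (
    order_id BIGINT,
    payload  STRING
) USING DELTA;
```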
Files are stored under the "Files" folder. Users can create folders and subfolders to organize them. Any file format can be stored in a Lakehouse.
A set of pre-computed tables that are automatically refreshed on a schedule. They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL and stored in an associated Notebook.
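In Spark SQL, a materialized view over an existing table might look like the sketch below; the exact DDL keyword varies by Fabric runtime version, and "sales" is a hypothetical source table, so check the current Fabric documentation for the form your runtime accepts:

```sql
-- Pre-computes a daily revenue aggregate, refreshed on a schedule.
-- Syntax is a sketch; "sales" is a hypothetical source table.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM sales
GROUP BY order_date;
```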
Logical tables defined by a SQL query. They store no data but provide a virtual layer for querying. Views are defined using Spark SQL and stored in the Lakehouse next to Tables.
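A plain view, by contrast, uses standard Spark SQL DDL; as before, the table and column names are illustrative:

```sql
-- Stores only the query definition, not the data.
-- "sales" is a hypothetical source table.
CREATE OR REPLACE VIEW recent_orders AS
SELECT order_id, order_date, amount
FROM sales
WHERE order_date >= date_sub(current_date(), 30);
```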
Users can hold workspace roles (Admin, Member, Contributor, Viewer) that grant different levels of access to the Lakehouse and its contents. Users can also be granted access through the Lakehouse's sharing capabilities.
For data access, use the OneLake security model, which is based on Microsoft Entra ID (formerly Azure Active Directory) and role-based access control (RBAC). Lakehouse data is stored in OneLake, so access to the data is controlled through OneLake permissions. In addition to object-level permissions, a Lakehouse also supports column-level and row-level security for tables, allowing fine-grained control over who can see specific columns or rows.
Shortcuts create virtual links to data without copying it.
For faster data reads through a semantic model, enable V-Order optimization on Delta tables. V-Order pre-sorts data in a way that improves query performance for common access patterns.
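Per-table enablement can be sketched with a Delta table property; the property name below reflects Fabric documentation at the time of writing and should be verified against your runtime, and "sales" remains a hypothetical table:

```sql
-- Enable V-Order writes for one Delta table.
-- Property name is an assumption; verify against current Fabric docs.
ALTER TABLE sales
SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true');
```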
Tables can also be optimized with the OPTIMIZE command, which compacts small files into larger ones and can apply Z-ordering to improve query performance on specific columns. Regular optimization helps maintain performance as data is ingested and updated over time. The VACUUM command cleans up old files and frees storage space, which is especially useful after updates and deletes.
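The two maintenance commands above can be sketched as standard Delta Lake SQL; "sales" and the Z-order column are illustrative:

```sql
-- Compact small files and Z-order by a frequently filtered column.
OPTIMIZE sales ZORDER BY (order_date);

-- Remove files no longer referenced by the table, subject to the
-- default retention window.
VACUUM sales;
```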
The Lakehouse item supports lineage, which allows users to track the origin and transformations of data. Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies.
See PySpark code for details.
See Get data for details.
Weekly Installs
7.3K
Repository
GitHub Stars
26.9K
First Seen
Feb 18, 2026
Security Audits
Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Pass
Installed on
codex: 7.2K
gemini-cli: 7.2K
opencode: 7.2K
cursor: 7.2K
github-copilot: 7.2K
kimi-cli: 7.2K