fabric-lakehouse by github/awesome-copilot
npx skills add https://github.com/github/awesome-copilot --skill fabric-lakehouse
Use this skill when you need to:
A Lakehouse in Microsoft Fabric is an item that gives users a place to store tabular data (tables) and non-tabular data (files). It combines the flexibility of a data lake with the management capabilities of a data warehouse. It provides:
Tabular data is stored as tables under the "Tables" folder. The main format for tables in a Lakehouse is Delta. A Lakehouse can also store tabular data in other formats, such as CSV or Parquet, but those formats are available only for Spark querying. Tables can be internal (the data is stored under the "Tables" folder) or external (only a reference to the table is stored under "Tables" while the data itself lives in the referenced location). Tables are referenced through Shortcuts, which can be internal (pointing to another location in Fabric) or external (pointing to data stored outside of Fabric).
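As a sketch of the default behavior, a Spark SQL cell in a Fabric notebook can create an internal Delta table under the "Tables" folder; the table, column, and file-path names below are illustrative, not from the source:

```sql
-- Creates an internal (managed) Delta table; its data lands under Tables/.
-- "sales" and its columns are hypothetical names for illustration.
CREATE TABLE IF NOT EXISTS sales (
    order_id   BIGINT,
    order_date DATE,
    amount     DECIMAL(18, 2)
) USING DELTA;

-- CSV or Parquet data kept under Files/ remains queryable from Spark,
-- but only Delta tables are registered under Tables/.
SELECT * FROM csv.`Files/raw/orders.csv` LIMIT 10;
```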
When creating a lakehouse, users can choose to enable schemas, which organize Lakehouse tables. Schemas are implemented as folders under the "Tables" folder, with tables stored inside those folders. The default schema is "dbo"; it cannot be deleted or renamed. All other schemas are optional and can be created, renamed, or deleted. Users can reference a schema located in another lakehouse with a Schema Shortcut, which references all tables in the target schema through a single shortcut.
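With schemas enabled, the folder-per-schema layout can be sketched in Spark SQL as follows; "bronze" and the table name are hypothetical:

```sql
-- "dbo" is the default schema and cannot be dropped or renamed.
-- "bronze" is a hypothetical user-defined schema.
CREATE SCHEMA IF NOT EXISTS bronze;

-- Tables created in a schema are stored as Tables/<schema>/<table>.
CREATE TABLE IF NOT EXISTS bronze.raw_orders (
    order_id BIGINT,
    payload  STRING
) USING DELTA;
```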
Files are stored under the "Files" folder. Users can create folders and subfolders to organize them. Any file format can be stored in a Lakehouse.
A set of pre-computed tables that are automatically refreshed on a schedule. They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL and stored in an associated Notebook.
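In Spark SQL, a materialized view over an existing table might look like the sketch below; the exact DDL keyword varies by Fabric runtime version, and "sales" is a hypothetical source table, so check the current Fabric documentation for the form your runtime accepts:

```sql
-- Pre-computes a daily revenue aggregate, refreshed on a schedule.
-- Syntax is a sketch; "sales" is a hypothetical source table.
CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS daily_revenue AS
SELECT order_date, SUM(amount) AS revenue
FROM sales
GROUP BY order_date;
```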
Logical tables defined by a SQL query. They store no data but provide a virtual layer for querying. Views are defined using Spark SQL and stored in the Lakehouse next to Tables.
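A plain view, by contrast, uses standard Spark SQL DDL; as before, the table and column names are illustrative:

```sql
-- Stores only the query definition, not the data.
-- "sales" is a hypothetical source table.
CREATE OR REPLACE VIEW recent_orders AS
SELECT order_id, order_date, amount
FROM sales
WHERE order_date >= date_sub(current_date(), 30);
```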
Users can hold workspace roles (Admin, Member, Contributor, Viewer) that grant different levels of access to the Lakehouse and its contents. Users can also be granted access through the Lakehouse's sharing capabilities.
For data access, use the OneLake security model, which is based on Microsoft Entra ID (formerly Azure Active Directory) and role-based access control (RBAC). Lakehouse data is stored in OneLake, so access to the data is controlled through OneLake permissions. In addition to object-level permissions, a Lakehouse also supports column-level and row-level security for tables, allowing fine-grained control over who can see specific columns or rows.
Shortcuts create virtual links to data without copying it.
For faster data reads through a semantic model, enable V-Order optimization on Delta tables. V-Order pre-sorts data in a way that improves query performance for common access patterns.
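Per-table enablement can be sketched with a Delta table property; the property name below reflects Fabric documentation at the time of writing and should be verified against your runtime, and "sales" remains a hypothetical table:

```sql
-- Enable V-Order writes for one Delta table.
-- Property name is an assumption; verify against current Fabric docs.
ALTER TABLE sales
SET TBLPROPERTIES ('delta.parquet.vorder.enabled' = 'true');
```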
Tables can also be optimized with the OPTIMIZE command, which compacts small files into larger ones and can apply Z-ordering to improve query performance on specific columns. Regular optimization helps maintain performance as data is ingested and updated over time. The VACUUM command cleans up old files and frees storage space, which is especially useful after updates and deletes.
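The two maintenance commands above can be sketched as standard Delta Lake SQL; "sales" and the Z-order column are illustrative:

```sql
-- Compact small files and Z-order by a frequently filtered column.
OPTIMIZE sales ZORDER BY (order_date);

-- Remove files no longer referenced by the table, subject to the
-- default retention window.
VACUUM sales;
```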
The Lakehouse item supports lineage, which allows users to track the origin and transformations of data. Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies.
See PySpark code for details.
See Get data for details.
Weekly Installs
7.3K
Repository
GitHub Stars
26.9K
First Seen
Feb 18, 2026
Security Audits
Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Pass
Installed on
codex: 7.2K
gemini-cli: 7.2K
opencode: 7.2K
cursor: 7.2K
github-copilot: 7.2K
kimi-cli: 7.2K