Knowledge Graph Builder by daffy0208/ai-dev-standards
npx skills add https://github.com/daffy0208/ai-dev-standards --skill 'Knowledge Graph Builder'
通过关系知识构建结构化知识图谱,以提升 AI 系统性能。
知识图谱使隐式关系显式化,使 AI 系统能够推理连接、验证事实并避免幻觉。
目标:为您的领域定义实体、关系和属性
实体类型(节点):
关系类型(边):
属性(特性):
广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
本体示例:
# RDF/Turtle format
@prefix : <http://example.org/ontology#> .
:Person a owl:Class ;
rdfs:label "Person" .
:Organization a owl:Class ;
rdfs:label "Organization" .
:worksFor a owl:ObjectProperty ;
rdfs:domain :Person ;
rdfs:range :Organization ;
rdfs:label "works for" .
验证:
决策矩阵:
Neo4j(对大多数情况推荐):
Amazon Neptune:
ArangoDB:
TigerGraph:
技术栈:
graph_database: 'Neo4j Community' # or Enterprise for production
vector_integration: 'Pinecone' # For hybrid search
embeddings: 'text-embedding-3-large' # OpenAI
etl: 'Apache Airflow' # For data pipelines
Neo4j 模式设置:
// Create constraints for uniqueness
CREATE CONSTRAINT person_id IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT org_name IF NOT EXISTS
FOR (o:Organization) REQUIRE o.name IS UNIQUE;
// Create indexes for performance
CREATE INDEX entity_search IF NOT EXISTS
FOR (e:Entity) ON (e.name, e.type);
CREATE INDEX relationship_type IF NOT EXISTS
FOR ()-[r:RELATED_TO]-() ON (r.type, r.confidence);
目标:从数据源中提取实体和关系
数据源:
实体提取管道:
class EntityExtractionPipeline:
    """Three-stage entity pipeline: NER extraction -> linking -> deduplication."""

    def __init__(self):
        # NOTE(review): load_ner_model / EntityLinker / EntityDeduplicator are
        # project collaborators defined elsewhere (spaCy or Hugging Face backed).
        self.ner_model = load_ner_model()  # spaCy, Hugging Face
        self.entity_linker = EntityLinker()
        self.deduplicator = EntityDeduplicator()

    def process_text(self, text: str) -> List[Entity]:
        """Extract named entities from *text*, resolve them against known
        entities, and return the deduplicated result."""
        # Stage 1: named-entity recognition.
        raw_entities = self.ner_model.extract(text)
        # Stage 2: entity resolution against existing entities.
        linked = self.entity_linker.link(raw_entities)
        # Stage 3: merge duplicates and resolve conflicts.
        return self.deduplicator.resolve(linked)
关系提取:
class RelationshipExtractor:
    """Extract ontology-valid relationships between known entities in text."""

    def extract_relationships(self, entities: List[Entity],
                              text: str) -> List[Relationship]:
        """Return relationships found in *text*, validated against the ontology.

        Uses dependency parsing (or an LLM) sentence by sentence, then keeps
        only candidates the ontology allows.
        """
        doc = self.nlp(text)
        candidates = []
        for sentence in doc.sents:
            candidates.extend(self.extract_from_sentence(sentence, entities))
        # Drop candidates that violate the ontology's domain/range rules.
        return self.validate_relationships(candidates)
基于 LLM 的提取(用于复杂关系):
def extract_with_llm(text: str) -> List[Relationship]:
    """LLM-based extraction for relationships too complex for parsing rules.

    Prompts the model for (subject, relation, object, confidence) tuples and
    parses its reply into Relationship objects.
    """
    prompt = f"""
Extract entities and relationships from this text:
{text}
Format: (Entity1, Relationship, Entity2, Confidence)
Only extract factual relationships.
"""
    # NOTE(review): `llm` and `parse_llm_response` are module-level
    # collaborators defined elsewhere in the project.
    raw_response = llm.generate(prompt)
    return parse_llm_response(raw_response)
验证:
目标:将结构化图与语义向量搜索相结合
架构:
class HybridKnowledgeSystem:
    """Pair a graph store (structure) with a vector store (semantics)."""

    def __init__(self):
        self.graph_db = Neo4jConnection()
        self.vector_db = PineconeClient()
        self.embedding_model = OpenAIEmbeddings()

    def store_entity(self, entity: Entity):
        """Persist *entity* twice: a node in the graph, an embedding in the
        vector database (keyed by the same id)."""
        self.graph_db.create_node(entity)
        vector = self.embedding_model.embed(entity.description)
        self.vector_db.upsert(
            id=entity.id,
            values=vector,
            metadata=entity.metadata,
        )

    def hybrid_search(self, query: str, top_k: int = 10) -> SearchResults:
        """Vector recall, then graph expansion, then merged ranking.

        Over-fetches 100 vector hits so the 2-hop graph expansion has a wide
        seed set before the merged ranking is trimmed to *top_k*.
        """
        # 1. Semantic recall from the vector store.
        query_vector = self.embedding_model.embed(query)
        vector_hits = self.vector_db.query(
            vector=query_vector,
            top_k=100,
        )
        # 2. Structural expansion: 2-hop subgraph around the vector hits.
        seed_ids = [hit.id for hit in vector_hits.matches]
        graph_hits = self.graph_db.get_subgraph(seed_ids, max_hops=2)
        # 3. Merge both result sets and keep the best top_k.
        ranked = self.merge_results(vector_hits, graph_hits)
        return ranked[:top_k]
混合方法的优势:
常见查询模式:
1. 查找实体:
MATCH (e:Entity {id: $entity_id})
RETURN e
2. 查找关系:
MATCH (source:Entity {id: $entity_id})-[r]-(target)
RETURN source, r, target
LIMIT 20
3. 实体间路径:
MATCH path = shortestPath(
(source:Person {id: $source_id})-[*..5]-(target:Person {id: $target_id})
)
RETURN path
4. 多跳遍历:
MATCH (p:Person {name: $name})-[:WORKS_FOR]->(o:Organization)-[:LOCATED_IN]->(l:Location)
RETURN p.name, o.name, l.city
5. 推荐查询:
// Find people similar to this person based on shared organizations
MATCH (p1:Person {id: $person_id})-[:WORKS_FOR]->(o:Organization)<-[:WORKS_FOR]-(p2:Person)
WHERE p1 <> p2
RETURN p2, COUNT(o) AS shared_orgs
ORDER BY shared_orgs DESC
LIMIT 10
知识图谱 API:
class KnowledgeGraphAPI:
    """Thin query layer over a Neo4j-style driver (`graph.run(query, **params)`).

    Fuzzy lookups require the APOC plugin (apoc.text.levenshtein,
    apoc.path.subgraphAll).
    """

    def __init__(self, graph_db):
        self.graph = graph_db

    def find_entity(self, entity_name: str) -> Entity:
        """Find entity by name with fuzzy matching"""
        # Substring match first, then rank by edit distance (APOC).
        query = """
MATCH (e:Entity)
WHERE e.name CONTAINS $name
RETURN e
ORDER BY apoc.text.levenshtein(e.name, $name)
LIMIT 1
"""
        return self.graph.run(query, name=entity_name).single()

    def find_relationships(self, entity_id: str,
                           relationship_type: str = None,
                           max_hops: int = 2) -> List[Relationship]:
        """Find relationships within specified hops"""
        # Variable-length pattern bounds cannot be Cypher parameters, so
        # max_hops must be interpolated into the query text. Coerce to int
        # first so a malicious/malformed value cannot inject Cypher.
        max_hops = int(max_hops)
        query = f"""
MATCH (source:Entity {{id: $entity_id}})
MATCH path = (source)-[r*1..{max_hops}]-(target)
RETURN path, relationships(path) AS rels
LIMIT 100
"""
        # NOTE(review): `relationship_type` is accepted but not yet applied as
        # a filter — TODO wire it into the relationship pattern.
        return self.graph.run(query, entity_id=entity_id).data()

    def get_subgraph(self, entity_ids: List[str],
                     max_hops: int = 2) -> Subgraph:
        """Get connected subgraph for multiple entities"""
        max_hops = int(max_hops)  # same injection guard as find_relationships
        query = f"""
MATCH (e:Entity)
WHERE e.id IN $entity_ids
CALL apoc.path.subgraphAll(e, {{maxLevel: {max_hops}}})
YIELD nodes, relationships
RETURN nodes, relationships
"""
        return self.graph.run(query, entity_ids=entity_ids).data()
目标:使用知识图谱来锚定 LLM 响应并检测幻觉
知识图谱 RAG:
class KnowledgeGraphRAG:
    """Ground LLM answers in a knowledge-graph subgraph (KG-RAG)."""

    def __init__(self, kg_api, llm_client):
        self.kg = kg_api
        self.llm = llm_client

    def retrieve_context(self, query: str) -> str:
        """Build an LLM-ready context string from the 2-hop subgraph around
        the entities mentioned in *query*."""
        query_entities = self.extract_entities_from_query(query)
        subgraph = self.kg.get_subgraph(
            [entity.id for entity in query_entities],
            max_hops=2,
        )
        return self.format_subgraph_for_llm(subgraph)

    def generate_with_grounding(self, query: str) -> GroundedResponse:
        """Answer *query* using only graph-derived context, returning the
        response together with its sources and a confidence score."""
        context = self.retrieve_context(query)
        prompt = f"""
Context from knowledge graph:
{context}
User query: {query}
Answer based only on the provided context. Include source entities.
"""
        answer = self.llm.generate(prompt)
        return GroundedResponse(
            response=answer,
            sources=self.extract_sources(context),
            confidence=self.calculate_confidence(answer, context),
        )
幻觉检测:
class HallucinationDetector:
    """Verify LLM claims against the knowledge graph."""

    def __init__(self, knowledge_graph):
        self.kg = knowledge_graph

    def verify_claim(self, claim: str) -> VerificationResult:
        """Classify *claim* as supported or unsupported (and, when
        unsupported, whether the graph explicitly contradicts it)."""
        # Parse the free-text claim into a (subject, predicate, object) triple.
        triple = self.parse_claim(claim)
        supporting = self.kg.find_evidence(
            triple.subject,
            triple.predicate,
            triple.object,
        )
        if supporting:
            return VerificationResult(
                is_supported=True,
                evidence=supporting,
                confidence=supporting.confidence,
            )
        # No support found — check for explicit counter-evidence before
        # reporting the claim as merely unsupported.
        counter = self.kg.find_contradiction(triple)
        return VerificationResult(
            is_supported=False,
            is_contradicted=bool(counter),
            contradiction=counter,
        )
在摄取数据之前定义您的模式。后期更改本体成本高昂。
积极进行实体去重。"Apple Inc"、"Apple"、"Apple Computer" → 同一实体。
每个关系都应具有置信度分数(0.0-1.0)和来源。
不要试图一次性建模整个领域。从核心实体开始并逐步扩展。
结合图遍历(结构化)和向量搜索(语义)以获得最佳结果。
1. 问答:
2. 推荐:
3. 欺诈检测:
4. 知识发现:
5. 语义搜索:
对于 MVP(<10K 实体):
对于生产环境(10K-1M 实体):
对于企业级(1M+ 实体):
相关技能:
rag-implementer - 用于混合 KG+RAG 系统
multi-agent-architect - 用于知识图谱驱动的智能体
api-designer - 用于 KG API 设计
相关模式:
META/DECISION-FRAMEWORK.md - 图数据库选择
STANDARDS/architecture-patterns/knowledge-graph-pattern.md - KG 架构(创建时)
相关操作手册:
PLAYBOOKS/deploy-neo4j.md - Neo4j 部署(创建时)
PLAYBOOKS/build-kg-rag-system.md - KG-RAG 集成(创建时)
每周安装次数
0
仓库
GitHub 星标数
18
首次出现
Jan 1, 1970
安全审计
Build structured knowledge graphs for enhanced AI system performance through relational knowledge.
Knowledge graphs make implicit relationships explicit, enabling AI systems to reason about connections, verify facts, and avoid hallucinations.
Goal : Define entities, relationships, and properties for your domain
Entity Types (Nodes):
Relationship Types (Edges):
Properties (Attributes):
Example Ontology :
# RDF/Turtle format
@prefix : <http://example.org/ontology#> .
:Person a owl:Class ;
rdfs:label "Person" .
:Organization a owl:Class ;
rdfs:label "Organization" .
:worksFor a owl:ObjectProperty ;
rdfs:domain :Person ;
rdfs:range :Organization ;
rdfs:label "works for" .
Validation :
Decision Matrix :
Neo4j (Recommended for most):
Amazon Neptune :
ArangoDB :
TigerGraph :
Technology Stack :
graph_database: 'Neo4j Community' # or Enterprise for production
vector_integration: 'Pinecone' # For hybrid search
embeddings: 'text-embedding-3-large' # OpenAI
etl: 'Apache Airflow' # For data pipelines
Neo4j Schema Setup :
// Create constraints for uniqueness
CREATE CONSTRAINT person_id IF NOT EXISTS
FOR (p:Person) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT org_name IF NOT EXISTS
FOR (o:Organization) REQUIRE o.name IS UNIQUE;
// Create indexes for performance
CREATE INDEX entity_search IF NOT EXISTS
FOR (e:Entity) ON (e.name, e.type);
CREATE INDEX relationship_type IF NOT EXISTS
FOR ()-[r:RELATED_TO]-() ON (r.type, r.confidence);
Goal : Extract entities and relationships from data sources
Data Sources :
Entity Extraction Pipeline :
class EntityExtractionPipeline:
    """Three-stage entity pipeline: NER extraction -> linking -> deduplication."""

    def __init__(self):
        # NOTE(review): load_ner_model / EntityLinker / EntityDeduplicator are
        # project collaborators defined elsewhere (spaCy or Hugging Face backed).
        self.ner_model = load_ner_model()  # spaCy, Hugging Face
        self.entity_linker = EntityLinker()
        self.deduplicator = EntityDeduplicator()

    def process_text(self, text: str) -> List[Entity]:
        """Extract named entities from *text*, resolve them against known
        entities, and return the deduplicated result."""
        # Stage 1: named-entity recognition.
        raw_entities = self.ner_model.extract(text)
        # Stage 2: entity resolution against existing entities.
        linked = self.entity_linker.link(raw_entities)
        # Stage 3: merge duplicates and resolve conflicts.
        return self.deduplicator.resolve(linked)
Relationship Extraction :
class RelationshipExtractor:
    """Extract ontology-valid relationships between known entities in text."""

    def extract_relationships(self, entities: List[Entity],
                              text: str) -> List[Relationship]:
        """Return relationships found in *text*, validated against the ontology.

        Uses dependency parsing (or an LLM) sentence by sentence, then keeps
        only candidates the ontology allows.
        """
        doc = self.nlp(text)
        candidates = []
        for sentence in doc.sents:
            candidates.extend(self.extract_from_sentence(sentence, entities))
        # Drop candidates that violate the ontology's domain/range rules.
        return self.validate_relationships(candidates)
LLM-Based Extraction (for complex relationships):
def extract_with_llm(text: str) -> List[Relationship]:
    """LLM-based extraction for relationships too complex for parsing rules.

    Prompts the model for (subject, relation, object, confidence) tuples and
    parses its reply into Relationship objects.
    """
    prompt = f"""
Extract entities and relationships from this text:
{text}
Format: (Entity1, Relationship, Entity2, Confidence)
Only extract factual relationships.
"""
    # NOTE(review): `llm` and `parse_llm_response` are module-level
    # collaborators defined elsewhere in the project.
    raw_response = llm.generate(prompt)
    return parse_llm_response(raw_response)
Validation :
Goal : Combine structured graph with semantic vector search
Architecture :
class HybridKnowledgeSystem:
    """Pair a graph store (structure) with a vector store (semantics)."""

    def __init__(self):
        self.graph_db = Neo4jConnection()
        self.vector_db = PineconeClient()
        self.embedding_model = OpenAIEmbeddings()

    def store_entity(self, entity: Entity):
        """Persist *entity* twice: a node in the graph, an embedding in the
        vector database (keyed by the same id)."""
        self.graph_db.create_node(entity)
        vector = self.embedding_model.embed(entity.description)
        self.vector_db.upsert(
            id=entity.id,
            values=vector,
            metadata=entity.metadata,
        )

    def hybrid_search(self, query: str, top_k: int = 10) -> SearchResults:
        """Vector recall, then graph expansion, then merged ranking.

        Over-fetches 100 vector hits so the 2-hop graph expansion has a wide
        seed set before the merged ranking is trimmed to *top_k*.
        """
        # 1. Semantic recall from the vector store.
        query_vector = self.embedding_model.embed(query)
        vector_hits = self.vector_db.query(
            vector=query_vector,
            top_k=100,
        )
        # 2. Structural expansion: 2-hop subgraph around the vector hits.
        seed_ids = [hit.id for hit in vector_hits.matches]
        graph_hits = self.graph_db.get_subgraph(seed_ids, max_hops=2)
        # 3. Merge both result sets and keep the best top_k.
        ranked = self.merge_results(vector_hits, graph_hits)
        return ranked[:top_k]
Benefits of Hybrid Approach :
Common Query Patterns :
1. Find Entity :
MATCH (e:Entity {id: $entity_id})
RETURN e
2. Find Relationships :
MATCH (source:Entity {id: $entity_id})-[r]-(target)
RETURN source, r, target
LIMIT 20
3. Path Between Entities :
MATCH path = shortestPath(
(source:Person {id: $source_id})-[*..5]-(target:Person {id: $target_id})
)
RETURN path
4. Multi-Hop Traversal :
MATCH (p:Person {name: $name})-[:WORKS_FOR]->(o:Organization)-[:LOCATED_IN]->(l:Location)
RETURN p.name, o.name, l.city
5. Recommendation Query :
// Find people similar to this person based on shared organizations
MATCH (p1:Person {id: $person_id})-[:WORKS_FOR]->(o:Organization)<-[:WORKS_FOR]-(p2:Person)
WHERE p1 <> p2
RETURN p2, COUNT(o) AS shared_orgs
ORDER BY shared_orgs DESC
LIMIT 10
Knowledge Graph API :
class KnowledgeGraphAPI:
    """Thin query layer over a Neo4j-style driver (`graph.run(query, **params)`).

    Fuzzy lookups require the APOC plugin (apoc.text.levenshtein,
    apoc.path.subgraphAll).
    """

    def __init__(self, graph_db):
        self.graph = graph_db

    def find_entity(self, entity_name: str) -> Entity:
        """Find entity by name with fuzzy matching"""
        # Substring match first, then rank by edit distance (APOC).
        query = """
MATCH (e:Entity)
WHERE e.name CONTAINS $name
RETURN e
ORDER BY apoc.text.levenshtein(e.name, $name)
LIMIT 1
"""
        return self.graph.run(query, name=entity_name).single()

    def find_relationships(self, entity_id: str,
                           relationship_type: str = None,
                           max_hops: int = 2) -> List[Relationship]:
        """Find relationships within specified hops"""
        # Variable-length pattern bounds cannot be Cypher parameters, so
        # max_hops must be interpolated into the query text. Coerce to int
        # first so a malicious/malformed value cannot inject Cypher.
        max_hops = int(max_hops)
        query = f"""
MATCH (source:Entity {{id: $entity_id}})
MATCH path = (source)-[r*1..{max_hops}]-(target)
RETURN path, relationships(path) AS rels
LIMIT 100
"""
        # NOTE(review): `relationship_type` is accepted but not yet applied as
        # a filter — TODO wire it into the relationship pattern.
        return self.graph.run(query, entity_id=entity_id).data()

    def get_subgraph(self, entity_ids: List[str],
                     max_hops: int = 2) -> Subgraph:
        """Get connected subgraph for multiple entities"""
        max_hops = int(max_hops)  # same injection guard as find_relationships
        query = f"""
MATCH (e:Entity)
WHERE e.id IN $entity_ids
CALL apoc.path.subgraphAll(e, {{maxLevel: {max_hops}}})
YIELD nodes, relationships
RETURN nodes, relationships
"""
        return self.graph.run(query, entity_ids=entity_ids).data()
Goal : Use knowledge graph to ground LLM responses and detect hallucinations
Knowledge Graph RAG :
class KnowledgeGraphRAG:
    """Ground LLM answers in a knowledge-graph subgraph (KG-RAG)."""

    def __init__(self, kg_api, llm_client):
        self.kg = kg_api
        self.llm = llm_client

    def retrieve_context(self, query: str) -> str:
        """Build an LLM-ready context string from the 2-hop subgraph around
        the entities mentioned in *query*."""
        query_entities = self.extract_entities_from_query(query)
        subgraph = self.kg.get_subgraph(
            [entity.id for entity in query_entities],
            max_hops=2,
        )
        return self.format_subgraph_for_llm(subgraph)

    def generate_with_grounding(self, query: str) -> GroundedResponse:
        """Answer *query* using only graph-derived context, returning the
        response together with its sources and a confidence score."""
        context = self.retrieve_context(query)
        prompt = f"""
Context from knowledge graph:
{context}
User query: {query}
Answer based only on the provided context. Include source entities.
"""
        answer = self.llm.generate(prompt)
        return GroundedResponse(
            response=answer,
            sources=self.extract_sources(context),
            confidence=self.calculate_confidence(answer, context),
        )
Hallucination Detection :
class HallucinationDetector:
    """Verify LLM claims against the knowledge graph."""

    def __init__(self, knowledge_graph):
        self.kg = knowledge_graph

    def verify_claim(self, claim: str) -> VerificationResult:
        """Classify *claim* as supported or unsupported (and, when
        unsupported, whether the graph explicitly contradicts it)."""
        # Parse the free-text claim into a (subject, predicate, object) triple.
        triple = self.parse_claim(claim)
        supporting = self.kg.find_evidence(
            triple.subject,
            triple.predicate,
            triple.object,
        )
        if supporting:
            return VerificationResult(
                is_supported=True,
                evidence=supporting,
                confidence=supporting.confidence,
            )
        # No support found — check for explicit counter-evidence before
        # reporting the claim as merely unsupported.
        counter = self.kg.find_contradiction(triple)
        return VerificationResult(
            is_supported=False,
            is_contradicted=bool(counter),
            contradiction=counter,
        )
Define your schema before ingesting data. Changing ontology later is expensive.
Deduplicate entities aggressively. "Apple Inc", "Apple", "Apple Computer" → same entity.
Every relationship should have a confidence score (0.0-1.0) and source.
Don't try to model entire domain at once. Start with core entities and expand.
Combine graph traversal (structured) with vector search (semantic) for best results.
1. Question Answering :
2. Recommendation :
3. Fraud Detection :
4. Knowledge Discovery :
5. Semantic Search :
For MVPs ( <10K entities):
For Production (10K-1M entities) :
For Enterprise (1M+ entities) :
Related Skills :
rag-implementer - For hybrid KG+RAG systems
multi-agent-architect - For knowledge-graph-powered agents
api-designer - For KG API design
Related Patterns:
META/DECISION-FRAMEWORK.md - Graph DB selection
STANDARDS/architecture-patterns/knowledge-graph-pattern.md - KG architectures (when created)
Related Playbooks:
PLAYBOOKS/deploy-neo4j.md - Neo4j deployment (when created)
PLAYBOOKS/build-kg-rag-system.md - KG-RAG integration (when created)
Weekly Installs
0
Repository
GitHub Stars
18
First Seen
Jan 1, 1970
Security Audits
超能力技能使用指南:AI助手技能调用优先级与工作流程详解
45,100 周安装