# Applying the AutoSchemaKG approach in new2

References:

- Paper: https://arxiv.org/abs/2505.23628
- GitHub: https://github.com/HKUST-KnowComp/AutoSchemaKG
- Code worth studying closely:
  - `atlas_rag/kg_construction/triple_extraction.py`
  - `atlas_rag/kg_construction/concept_generation.py`

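## What cannot be copied directly

AutoSchemaKG targets open-domain text KGs: its goal is to build ATLAS automatically, without a predefined schema. This project is a city knowledge graph and already has a structured, high-trust, georeferenced data source in Amap (Gaode) POI data. new2 should therefore not abandon its domain anchors altogether, but adopt a hybrid architecture:
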
```text
Structured, high-trust data: the Anchor Layer
Unstructured, multi-source data: the Evidence Layer
LLM extraction: produces Candidate Knowledge
Automatic schema evolution: produces Proposals only, never edits the production graph directly
```

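The separation between the four layers can be sketched as a small in-memory store. This is a minimal sketch, assuming hypothetical names (`LayeredStore`, `add_anchor`, `add_evidence`, `add_candidate`, `propose`) that do not come from new2 or AutoSchemaKG: anchor records are stored directly, LLM output only lands in a candidate pool and must cite known evidence, and schema proposals are queued while the production schema is held as a read-only `types.MappingProxyType`.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class LayeredStore:
    """Hypothetical sketch of the four-layer split (not new2's real API)."""

    schema: MappingProxyType  # production schema, exposed as a read-only view
    anchors: dict = field(default_factory=dict)     # Anchor Layer (e.g. POIs)
    evidence: dict = field(default_factory=dict)    # Evidence Layer documents
    candidates: list = field(default_factory=list)  # LLM Candidate Knowledge
    proposals: list = field(default_factory=list)   # schema Proposals only

    def add_anchor(self, anchor_id: str, record: dict) -> None:
        # Structured, high-trust records are stored as-is.
        self.anchors[anchor_id] = record

    def add_evidence(self, doc_id: str, text: str) -> None:
        self.evidence[doc_id] = text

    def add_candidate(self, statement: dict, evidence_ids: list) -> None:
        # LLM output is only a candidate and must cite known evidence documents.
        if not evidence_ids or any(e not in self.evidence for e in evidence_ids):
            raise ValueError("candidate knowledge must cite known evidence")
        self.candidates.append({**statement, "evidence": list(evidence_ids)})

    def propose(self, proposal: dict) -> None:
        # Schema evolution only queues a proposal; self.schema is never written.
        self.proposals.append(proposal)
```

With this shape, a candidate without evidence raises `ValueError`, and any attempt to assign into `store.schema[...]` raises `TypeError`, so the "Proposals only" rule is enforced by the data structure rather than by convention.
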
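## Core ideas to adopt

The key to AutoSchemaKG is not "extracting a few more triples", but:

1. Extraction covers entities, events, entity-entity relations, and event-event relations.
2. Relations themselves are also treated as objects that can be conceptualized.
3. Conceptualization builds an abstract semantic bridge for both instances and relations.
4. The schema is not fixed once and for all; it is continuously induced and revised from the data.

Mapped onto new2:
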
```text
stage_1: Entity-Entity Relation
stage_2: Entity-Event Participation / HappensAt
stage_3: Event-Event Relation
stage_4: Concept Induction
stage_5: Schema Proposal & Review
```

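The stage order matters because later stages consume earlier outputs: concept induction needs the entities, events, and relation types already extracted, and schema proposals need to know which labels were actually used. A minimal sketch, assuming each stage is a plain function standing in for an LLM prompt; the stage bodies, the toy records (`Museum A`, `District B`, `Festival C`), the `HAPPENS_AT` label, the small seed set, and the placeholder concept rule are all hypothetical.

```python
from typing import Callable

# A stage is (name, fn); fn stands in for an LLM prompt, sees the source text
# and everything earlier stages produced, and returns {output_key: [items]}.
Stage = tuple[str, Callable[[str, dict], dict]]


def run_stages(text: str, stages: list[Stage]) -> dict:
    """Run stages strictly in order, merging each stage's output into state."""
    state: dict = {}
    for _name, fn in stages:
        for key, items in fn(text, state).items():
            state.setdefault(key, []).extend(items)
    return state


def induce_concepts(text: str, state: dict) -> dict:
    """stage_4 sketch: conceptualize entities, events, AND relation types."""
    names = [e["name"] for e in state.get("entities", []) + state.get("events", [])]
    names += sorted({r["type"] for r in state.get("relations", [])})
    # Placeholder rule instead of an LLM call: one concept per node.
    return {"concepts": [{"instance": n, "concept": f"concept_of:{n}"} for n in names]}


def propose_schema(text: str, state: dict) -> dict:
    """stage_5 sketch: relation types outside the seed set become proposals."""
    seed = {"LOCATED_IN", "PARTICIPATED_BY", "BEFORE", "AFTER"}
    seen = {r["type"] for r in state.get("relations", [])}
    return {"schema_proposals": [{"kind": "relation", "name": t} for t in sorted(seen - seed)]}


# Toy stages 1-3 returning fixed records (hypothetical data, no LLM involved).
STAGES: list[Stage] = [
    ("stage_1", lambda t, s: {
        "entities": [{"name": "Museum A"}, {"name": "District B"}],
        "relations": [{"type": "LOCATED_IN", "head": "Museum A", "tail": "District B"}],
    }),
    ("stage_2", lambda t, s: {
        "events": [{"name": "Festival C"}],
        "relations": [{"type": "HAPPENS_AT", "head": "Festival C", "tail": "Museum A"}],
    }),
    ("stage_3", lambda t, s: {"relations": []}),  # no event-event relation here
    ("stage_4", induce_concepts),
    ("stage_5", propose_schema),
]
```

Running `run_stages("...", STAGES)` yields concepts for the two entities, the event, and both relation types, and a single proposal for `HAPPENS_AT`, because that label is outside the toy seed set. Nothing in the pipeline edits a schema; stage_5 only emits proposals for review.
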
## Unified extraction output in new2

Every extraction run must output:

```json
{
  "entities": [],
  "events": [],
  "concepts": [],
  "relations": [],
  "statements": [],
  "schema_proposals": [],
  "evidence_links": []
}
```

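Where:

- `entities`: real-world objects, such as places, organizations, people, and facilities.
- `events`: facts that have a time anchor or describe an action or process.
- `concepts`: abstract categories, topics, scenarios, tags, and business semantics.
- `relations`: relation type definitions or candidate relations.
- `statements`: candidate facts with a subject, predicate, object, evidence, and confidence.
- `schema_proposals`: new types, fields, or relations suggested by the model.
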
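A payload can be checked against this contract before anything is stored. A minimal sketch using only the standard library; the statement field names (`subject`, `predicate`, `object`, `evidence`, `confidence`), the non-empty evidence check, and the `[0, 1]` confidence range are assumptions derived from the field descriptions above, not a fixed new2 format.

```python
import json

# Top-level keys, copied from the output contract above.
REQUIRED_KEYS = ("entities", "events", "concepts", "relations",
                 "statements", "schema_proposals", "evidence_links")
# Assumed statement fields, following "subject, predicate, object,
# evidence, confidence" in the field descriptions.
STATEMENT_FIELDS = ("subject", "predicate", "object", "evidence", "confidence")


def validate_extraction(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the payload passes."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top level must be a JSON object"]
    problems = [f"missing or non-list key: {k}"
                for k in REQUIRED_KEYS if not isinstance(payload.get(k), list)]
    statements = payload.get("statements")
    for i, stmt in enumerate(statements if isinstance(statements, list) else []):
        if not isinstance(stmt, dict):
            problems.append(f"statements[{i}] is not an object")
            continue
        missing = [f for f in STATEMENT_FIELDS if f not in stmt]
        if missing:
            problems.append(f"statements[{i}] missing {missing}")
            continue
        if not stmt["evidence"]:
            # Assumed rule: a candidate fact must point at some evidence.
            problems.append(f"statements[{i}] has no evidence")
        conf = stmt["confidence"]
        # Assumed convention: confidence is a number in [0, 1] (bools rejected).
        valid_conf = (isinstance(conf, (int, float)) and not isinstance(conf, bool)
                      and 0.0 <= conf <= 1.0)
        if not valid_conf:
            problems.append(f"statements[{i}] confidence out of [0, 1]")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easy to log every defect of a rejected LLM response in one pass.
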
## Schema Auto update principles

The model must never modify the production schema directly. The correct flow is:

```text
Extraction discovers a new type/field/relation
-> write it to schema_proposals
-> merge similar proposals
-> tally evidence count, source count, and confidence
-> human review
-> generate a schema version
-> subsequent extractions use the new schema version
```

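The middle steps of this flow (merge, tally, review, version) can be sketched as follows. This is a minimal sketch, assuming hypothetical names and a hypothetical raw proposal shape (`kind`, `name`, `evidence_id`, `source`, `confidence`); "similar" is approximated by a case- and separator-insensitive name match, the thresholds (3 evidence items, 2 sources, mean confidence 0.7) are illustrative only, and the `approve` callback stands in for human review.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


def similarity_key(name: str) -> str:
    """Hypothetical merge rule: ignore case, whitespace, '_' and '-'."""
    return re.sub(r"[\s_\-]+", "", name).lower()


@dataclass
class MergedProposal:
    kind: str
    name: str  # first spelling seen is kept as the canonical name
    evidence: set = field(default_factory=set)
    sources: set = field(default_factory=set)
    confidences: list = field(default_factory=list)

    @property
    def mean_confidence(self) -> float:
        return sum(self.confidences) / len(self.confidences)


def merge_proposals(raw: list[dict]) -> list[MergedProposal]:
    """Merge similar proposals and tally evidence, sources, and confidence."""
    merged: dict[tuple[str, str], MergedProposal] = {}
    for p in raw:
        key = (p["kind"], similarity_key(p["name"]))
        m = merged.setdefault(key, MergedProposal(p["kind"], p["name"]))
        m.evidence.add(p["evidence_id"])
        m.sources.add(p["source"])
        m.confidences.append(p["confidence"])
    return list(merged.values())


def release_version(schema: dict, merged: list[MergedProposal],
                    approve: Callable[[MergedProposal], bool]) -> dict:
    """Build a new schema version from proposals that pass thresholds and review.

    The input schema is never mutated; if nothing is accepted it is returned as-is.
    """
    new = dict(schema)
    changed = False
    for m in merged:
        if len(m.evidence) < 3 or len(m.sources) < 2 or m.mean_confidence < 0.7:
            continue  # not enough support yet; keep collecting evidence
        if m.name in new.get(m.kind, ()) or not approve(m):
            continue  # already present, or rejected by the reviewer
        new[m.kind] = tuple(new.get(m.kind, ())) + (m.name,)
        changed = True
    if not changed:
        return schema
    new["version"] = schema["version"] + 1
    return new
```

Because `release_version` returns a new dict with a bumped `version` and leaves the old one untouched, earlier extractions stay reproducible against the schema version they actually used.
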
## Initial seed schema for the city domain

Start the city KG with a lightweight seed and avoid over-modeling:

```text
Entity:
  Place, Area, Organization, Person, Facility, Product, Route

Event:
  OpeningEvent, RenovationEvent, FestivalEvent, HistoricalEvent,
  AwardEvent, OperationChangeEvent, IncidentEvent

Concept:
  NightTour, LocalFood, HistoricalLandmark, BusinessDistrict,
  FamilyFriendly, FirstTimerFriendly, CulturalTourism, TransitConvenient

Relation:
  LOCATED_IN, NEAR, IN_CELL, HAS_CONCEPT, HAS_EVENT, SUPPORTED_BY,
  OPERATED_BY, HAS_FACILITY, PARTICIPATED_BY, BEFORE, AFTER, RELATED_TO
```

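Because the seed is deliberately small, labels the extractor finds outside it should go to `schema_proposals` instead of being silently accepted. A minimal sketch, assuming extracted type labels arrive grouped by kind; `SEED_SCHEMA` and `route_labels` are hypothetical names, and the sets are a direct transcription of the seed above.

```python
# The seed schema above, transcribed as immutable sets.
SEED_SCHEMA = {
    "Entity": frozenset({"Place", "Area", "Organization", "Person",
                         "Facility", "Product", "Route"}),
    "Event": frozenset({"OpeningEvent", "RenovationEvent", "FestivalEvent",
                        "HistoricalEvent", "AwardEvent", "OperationChangeEvent",
                        "IncidentEvent"}),
    "Concept": frozenset({"NightTour", "LocalFood", "HistoricalLandmark",
                          "BusinessDistrict", "FamilyFriendly", "FirstTimerFriendly",
                          "CulturalTourism", "TransitConvenient"}),
    "Relation": frozenset({"LOCATED_IN", "NEAR", "IN_CELL", "HAS_CONCEPT",
                           "HAS_EVENT", "SUPPORTED_BY", "OPERATED_BY",
                           "HAS_FACILITY", "PARTICIPATED_BY", "BEFORE",
                           "AFTER", "RELATED_TO"}),
}


def route_labels(labels: dict[str, list[str]]) -> tuple[dict, list]:
    """Split extracted labels into seed-schema matches and schema proposals."""
    known: dict[str, list[str]] = {}
    proposals: list[dict] = []
    for kind, names in labels.items():
        allowed = SEED_SCHEMA.get(kind, frozenset())
        for name in names:
            if name in allowed:
                known.setdefault(kind, []).append(name)
            else:
                # Unknown label: record it for review, do not extend the schema.
                proposals.append({"kind": kind, "name": name})
    return known, proposals
```

For example, `route_labels({"Entity": ["Place", "Bridge"]})` keeps `Place` and turns `Bridge` into a proposal.
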
When migrating to another business domain later, replace only the seed schema and the prompts, not the KG core.

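One way to keep the KG core replaceable-free is to pass everything domain-specific in as a single configuration object. A minimal sketch, assuming hypothetical names (`DomainConfig`, `KGCore`) and a `string.Template` prompt; the `retail` config exists only to illustrate the swap and is not part of the design.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class DomainConfig:
    """Everything domain-specific: the seed schema and the extraction prompt."""
    name: str
    seed_schema: dict[str, frozenset]
    prompt: Template


class KGCore:
    """Domain-agnostic core: it never hard-codes city (or any other) types."""

    def __init__(self, config: DomainConfig) -> None:
        self.config = config

    def build_prompt(self, text: str) -> str:
        # Render the allowed types from the config, in a stable sorted order.
        allowed = "; ".join(
            f"{kind}: {', '.join(sorted(names))}"
            for kind, names in sorted(self.config.seed_schema.items())
        )
        return self.config.prompt.substitute(schema=allowed, text=text)

    def is_known(self, kind: str, name: str) -> bool:
        return name in self.config.seed_schema.get(kind, frozenset())
```

Migrating then means constructing `KGCore` with a different `DomainConfig`; the class itself, and any storage or review logic built on it, stays unchanged.
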