Dynamic schema
DynamicSchema
For the unified knowledge graph, especially its semantic part, the schema is dynamic.
There are two ways to approach this:
- Top-down: from a methodological perspective, we can implement the schema in an ontology-based way.
    - However, this requires quite a lot of preparatory work before we can make use of an LLM.
- Bottom-up: this is the perspective we use instead.
    - We take the defined schema first, and then merge it.
    - The merge process has two parts:
        - Machine-based, automatic merge
            - Frequency-based merge
            - Similarity-based merge
            - Other strategies
        - Human-based, manual merge
Source code in Docs2KG/kg/dynamic_schema.py
__init__(kg_json_file, merge_freq=10, merge_similarity=0.98)
Initialize the dynamic schema class.
Args:
- kg_json_file (Path): The path of the knowledge graph JSON file
- merge_freq (int): The frequency threshold for a label; labels that occur less often than this are ignored
- merge_similarity (float): The similarity threshold for the merge
Source code in Docs2KG/kg/dynamic_schema.py
human_in_the_loop_input()
Convert the schema into a dict of the form { key: number of occurrences, ... }.
A human then decides, based on these counts, how the labels should be mapped.
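The manual step can be pictured with a minimal sketch. It assumes the occurrence dict has already been produced and that the reviewer's decisions arrive as a plain `{original label: new label}` dict. The function name `apply_manual_mapping` and the sample labels are hypothetical and not part of Docs2KG.

```python
def apply_manual_mapping(label_freq, manual_mapping):
    """Fold the counts of mapped labels into their human-chosen targets."""
    merged = {}
    for label, count in label_freq.items():
        # Labels the reviewer did not mention keep their own name.
        target = manual_mapping.get(label, label)
        merged[target] = merged.get(target, 0) + count
    return merged


# Hypothetical occurrence counts, as shown to the reviewer.
label_freq = {"paragraph": 120, "para": 4, "title": 30, "heading": 3}
# Hypothetical decisions made by the reviewer.
manual_mapping = {"para": "paragraph", "heading": "title"}

merged = apply_manual_mapping(label_freq, manual_mapping)
print(merged)
```

Showing the counts next to each label is what lets a reviewer tell a rare spelling variant from a label that is genuinely distinct.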
Source code in Docs2KG/kg/dynamic_schema.py
schema_extraction()
Extract the schema from the knowledge graph
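As a rough illustration of what schema extraction can look like, the sketch below walks a nested graph and counts every label it meets. The node layout (a `"label"` string plus an optional `"children"` list) and the name `extract_label_freq` are assumptions made for this example; the actual Docs2KG JSON structure may differ.

```python
from collections import Counter


def extract_label_freq(node, freq=None):
    """Count how often each label occurs in a nested node tree."""
    if freq is None:
        freq = Counter()
    # Assumed layout: each node is a dict with a "label" string and an
    # optional "children" list. The real Docs2KG JSON may differ.
    freq[node["label"]] += 1
    for child in node.get("children", []):
        extract_label_freq(child, freq)
    return freq


# Hypothetical graph, for illustration only.
kg = {
    "label": "document",
    "children": [
        {"label": "title"},
        {
            "label": "section",
            "children": [{"label": "paragraph"}, {"label": "paragraph"}],
        },
    ],
}

label_freq = dict(extract_label_freq(kg))
print(label_freq)
```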
Source code in Docs2KG/kg/dynamic_schema.py
schema_freq_merge()
Replace every label whose frequency is below the threshold with the text_block label.
Returns:
Name | Type | Description |
---|---|---|
merge_mapping | dict | The mapping of the merge, key is the original label, value is the new label |
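A minimal sketch of a frequency-based merge, assuming the schema is available as a `{label: count}` dict. The function name `freq_merge` and the sample counts are hypothetical; only the `text_block` target, the default threshold of 10, and the shape of the returned mapping come from the description above.

```python
def freq_merge(label_freq, merge_freq=10, fallback="text_block"):
    """Map every label rarer than merge_freq to the fallback label."""
    merge_mapping = {}
    for label, count in label_freq.items():
        # The fallback label itself is never remapped, even if it is rare.
        if count < merge_freq and label != fallback:
            merge_mapping[label] = fallback
    return merge_mapping


# Hypothetical label counts, for illustration only.
label_freq = {
    "paragraph": 120,
    "title": 30,
    "caption": 4,
    "footnote": 9,
    "text_block": 2,
}

mapping = freq_merge(label_freq)
print(mapping)
```

The comparison is strict here, so a label that occurs exactly `merge_freq` times is kept. Whether the real implementation treats the boundary the same way is an assumption.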
Source code in Docs2KG/kg/dynamic_schema.py
schema_similarity_merge()
Merge labels in the schema whose similarity reaches the similarity threshold.
Returns:
Name | Type | Description |
---|---|---|
merge_mapping | dict | The mapping of the merge, key is the original label, value is the new label |
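The sketch below shows one possible shape for a similarity-based merge. The source does not say how similarity is measured, so this example swaps in plain string similarity from the standard library (`difflib.SequenceMatcher`) as a stand-in. Merging the rarer label into the more frequent one is also an assumption, and `similarity_merge` and the sample counts are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_merge(label_freq, merge_similarity=0.98):
    """Map each label to a more frequent label that it closely resembles."""
    merge_mapping = {}
    kept = []  # canonical labels, most frequent first
    for label in sorted(label_freq, key=label_freq.get, reverse=True):
        for canonical in kept:
            # Stand-in measure: character-level similarity in [0, 1].
            score = SequenceMatcher(None, label, canonical).ratio()
            if score >= merge_similarity:
                merge_mapping[label] = canonical
                break
        else:
            # No close match among the kept labels: keep this one too.
            kept.append(label)
    return merge_mapping


# Hypothetical label counts, for illustration only.
label_freq = {
    "paragraph": 120,
    "paragraphs": 7,
    "title": 30,
    "titles": 2,
    "table": 15,
}

# A lower threshold than the 0.98 default, so the string-based
# stand-in measure produces visible merges.
mapping = similarity_merge(label_freq, merge_similarity=0.9)
print(mapping)
```

With the default threshold of 0.98, this string-based measure merges nothing in the sample, which shows how strict that default is.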
Source code in Docs2KG/kg/dynamic_schema.py