Skip to content

How to further enhance the unified multimodal knowledge graph?

After the Unified Multimodal KG is generated, we can find that the KG schema (node label) is dynamic generated.

Which will not be perfect as you expected at the beginning.

To improve the schema, one way to do it is using the ontology based way to do the prompting and pre checking.

We will provide that solution later.

Currently, what we think is more intuitive is

  • automatic schema merge
    • node label frequency based merge
    • label semantic similarity based merge
  • human in the loop
    • human review and further enhance the KG schema
from Docs2KG.kg.dynamic_schema import DynamicSchema
from Docs2KG.utils.constants import DATA_OUTPUT_DIR

if __name__ == "__main__":
    """
    1. From the generated KG, how can we do the schema merge?
        - You can hook this into the neo4j after the KG is loaded
    2. Human in the loop for the schema merge
    """
    kg_json_file = (
            DATA_OUTPUT_DIR / "Excellent_Example_Report.pdf" / "kg" / "triplets_kg.json"
    )
    dynamic_schema = DynamicSchema(kg_json_file=kg_json_file)
    dynamic_schema.schema_extraction()
    dynamic_schema.schema_freq_merge()
    dynamic_schema.schema_similarity_merge()
    dynamic_schema.human_in_the_loop_input()

As the function name indicates, automatically ones will remove the low frequency labels and merge them into the high frequency labels.

The semantic similarity based merge will use the word embedding to calculate the similarity between the labels.

The human in the loop will provide the frequency of the labels and the label itself, ask the human to decide whether to merge them or not.