Modules in Docs2KG

Docs2KG provides several modules to help you construct a unified multimodal knowledge graph.

We separate them into two main categories:

  • Native
  • LLM

Native modules use traditional programming methods, while LLM modules rely on a language model.

Native modules are more deterministic, while LLM modules are more probabilistic.

LLM modules are more powerful, but they are harder to test, less stable, and more expensive.

The modules we have include:

  • Native:
    • Markdown2JSON
  • LLM:
    • Markdown2JSON
    • Image2Description
    • Sheet2Metadata
    • OpenAICall (Wrapper)
    • OpenAIEmbedding (Wrapper)
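
To make the Native side of this split concrete, here is a minimal sketch of a rule-based Markdown-to-JSON conversion. It is not the Docs2KG Native Markdown2JSON implementation: the function name markdown_to_tree and the output layout (title, level, content, children) are assumptions made for illustration. The point is that the same input always yields the same output, which is what makes native modules easy to test.

```python
import json
import re

# Matches ATX-style Markdown headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_to_tree(md_text: str) -> dict:
    """Hypothetical helper: nest Markdown sections by heading level."""
    root = {"title": "root", "level": 0, "content": [], "children": []}
    stack = [root]  # path from the root to the section currently being filled
    for line in md_text.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            node = {
                "title": match.group(2).strip(),
                "level": level,
                "content": [],
                "children": [],
            }
            # Climb back up until the top of the stack is a shallower heading.
            while stack[-1]["level"] >= level:
                stack.pop()
            stack[-1]["children"].append(node)
            stack.append(node)
        elif line.strip():
            # Non-empty, non-heading lines belong to the current section.
            stack[-1]["content"].append(line.strip())
    return root


if __name__ == "__main__":
    sample = "# Intro\nSome text\n## Details\nMore text\n# Summary"
    print(json.dumps(markdown_to_tree(sample), indent=2))
```

Because the conversion is plain string processing, it can be covered by ordinary unit tests, but it only understands the structures it was explicitly written to handle.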

The LLM modules generally use a prompt to generate their output.

For example, LLM.Markdown2JSON is used inside PDF2Text to convert Markdown to JSON:

from Docs2KG.parser.pdf.pdf2text import PDF2Text

# Path to the PDF file you want to process
pdf_file = "path/to/your/pdf/file"

# Extract the PDF content as plain text and as Markdown
pdf_to_text = PDF2Text(pdf_file)
text = pdf_to_text.extract2text(output_csv=True)
md_text = pdf_to_text.extract2markdown(output_csv=True)
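
The prompt-based pattern used by the LLM modules can be sketched roughly as follows. This is an illustration under stated assumptions, not the Docs2KG code: llm_markdown_to_json, PROMPT_TEMPLATE, and the call_llm parameter are hypothetical names, and the model call is passed in as a plain function so the sketch does not depend on any particular client library. Since model output is probabilistic, the sketch validates the response as JSON and retries when parsing fails.

```python
import json
from typing import Callable

# Hypothetical prompt; the real prompt used by Docs2KG may differ.
PROMPT_TEMPLATE = (
    "Convert the following Markdown into a JSON object with the keys "
    "'title' and 'sections'. Return JSON only.\n\n{markdown}"
)


def llm_markdown_to_json(
    md_text: str,
    call_llm: Callable[[str], str],
    max_retries: int = 2,
) -> dict:
    """Hypothetical helper: ask a model for JSON and retry on invalid output.

    call_llm is an assumed callable that takes a prompt string and returns
    the model's raw text response.
    """
    prompt = PROMPT_TEMPLATE.format(markdown=md_text)
    last_error = None
    for _ in range(max_retries + 1):
        raw = call_llm(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # malformed JSON: ask again
            continue
        if isinstance(parsed, dict):
            return parsed
        last_error = TypeError("model returned JSON that is not an object")
    raise ValueError("no valid JSON object after retries") from last_error


if __name__ == "__main__":
    # Stand-in for a real model call, used only to exercise the helper.
    def fake_llm(prompt: str) -> str:
        return '{"title": "Intro", "sections": []}'

    print(llm_markdown_to_json("# Intro", fake_llm))
```

Passing the model call in as a parameter keeps the retry and validation logic testable with a stub, which offsets some of the testing difficulty noted above for LLM modules.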