Word docling
DOCXMammoth
Bases: DigitizationBase
DOCXMammoth class for processing Word documents using mammoth.
Source code in Docs2KG/digitization/native/word_docling.py
export_markdown(content)
Export content to a markdown file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
content | str | The markdown content to export | required |
Returns:
Type | Description |
---|---|
Path | Path to the generated markdown file |
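The reference does not say where the markdown file is written. A minimal sketch of the behaviour, assuming the content is saved as `<stem>.md` inside an output directory. `output_dir` and `stem` are hypothetical parameters standing in for whatever state the class keeps on the instance; the documented method takes only `content`.

```python
import tempfile
from pathlib import Path


def export_markdown(content: str, output_dir: Path, stem: str) -> Path:
    """Write markdown content to <output_dir>/<stem>.md and return the path.

    Hypothetical signature: output_dir and stem are assumptions, not part
    of the documented DOCXMammoth.export_markdown(content) method.
    """
    output_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    md_path = output_dir / f"{stem}.md"
    md_path.write_text(content, encoding="utf-8")  # overwrite any earlier export
    return md_path


# Usage: export a small document into a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = export_markdown("# Title\n\nBody text.", Path(tmp), "report")
    print(path.name)  # report.md
    print(path.read_text(encoding="utf-8").startswith("# Title"))  # True
```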
process()
Process DOCX document and generate markdown output.
Returns:
Type | Description |
---|---|
Path | Path to the generated markdown file |
Raises:
Type | Description |
---|---|
ValueError | If input is not a valid DOCX file |
FileNotFoundError | If DOCX file doesn't exist |
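The documented return value and exceptions suggest a validate, convert, export flow. The sketch below is not the Docs2KG implementation: `output_dir` and `convert` are hypothetical parameters, and `convert` stands in for the mammoth conversion step so the example runs with the standard library alone. Checking existence before format is also an assumption.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Callable, Union


def process(
    docx_path: Union[str, Path],
    output_dir: Path,
    convert: Callable[[Path], str],
) -> Path:
    """Validate a DOCX path, convert it to markdown, and write the result.

    output_dir and convert are hypothetical; convert replaces the
    mammoth-based conversion used by the real class.
    """
    path = Path(docx_path)
    if not path.exists():
        raise FileNotFoundError(f"DOCX file not found: {path}")
    # A DOCX file is a ZIP container, so check both the suffix and the format.
    if path.suffix.lower() != ".docx" or not zipfile.is_zipfile(path):
        raise ValueError(f"Not a valid DOCX file: {path}")

    markdown = convert(path)  # the conversion step proper

    output_dir.mkdir(parents=True, exist_ok=True)
    md_path = output_dir / f"{path.stem}.md"
    md_path.write_text(markdown, encoding="utf-8")
    return md_path


# Usage with a stub converter; a real one would read the document body.
with tempfile.TemporaryDirectory() as tmp:
    docx = Path(tmp) / "report.docx"
    with zipfile.ZipFile(docx, "w") as zf:
        zf.writestr("word/document.xml", "placeholder")  # not real WordprocessingML
    out = process(docx, Path(tmp) / "out", lambda p: f"# {p.stem}\n")
    print(out.name)  # report.md
```

Passing the converter in as a callable keeps the validation and file-writing logic testable without a real Word document; in practice it would wrap a DOCX-to-markdown library such as mammoth.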
validate_input(input_data) (staticmethod)
Validate if the input is a valid DOCX file path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_data | Union[str, Path] | Path to DOCX file (string or Path object) | required |
Returns:
Type | Description |
---|---|
bool | True if input is valid DOCX file, False otherwise |
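The reference does not define what counts as a "valid" DOCX path. One plausible sketch, assuming validity means an existing file with a case-insensitive .docx suffix that is also a ZIP archive (DOCX files are ZIP containers). These criteria are assumptions, and the documented staticmethod is shown here as a plain function.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Union


def validate_input(input_data: Union[str, Path]) -> bool:
    """Return True if input_data points to an existing DOCX file.

    Assumed criteria (not confirmed by the source): existing regular file,
    .docx suffix (any case), and a readable ZIP archive.
    """
    try:
        path = Path(input_data)
    except TypeError:
        return False  # neither a str nor a path-like object
    return (
        path.is_file()
        and path.suffix.lower() == ".docx"
        and zipfile.is_zipfile(path)
    )


# Usage: a real ZIP container passes, a renamed text file and a missing path do not.
with tempfile.TemporaryDirectory() as tmp:
    real = Path(tmp) / "real.docx"
    with zipfile.ZipFile(real, "w") as zf:
        zf.writestr("word/document.xml", "placeholder")
    fake = Path(tmp) / "fake.docx"
    fake.write_text("not a zip archive")
    print(validate_input(real))  # True
    print(validate_input(str(fake)))  # False
    print(validate_input(Path(tmp) / "missing.docx"))  # False
```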