Base
Base class for digitization methods that handles conversion of various input formats into standardized digital representations.
DigitizationBase
Bases: ABC
Abstract base class for digitization agents that defines the common interface and functionality for all digitization implementations.
Attributes:
Name | Type | Description |
---|---|---|
name | str | Unique identifier for the digitization agent |
supported_formats | List[str] | List of input formats this agent can process |
The output will be exported to:
- markdown
- JSON for tables
- JSON for images and files
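The contract described above can be illustrated with a minimal, self-contained sketch. It does not import Docs2KG: `SketchDigitizationBase`, `PlainTextAgent`, and the shape of the returned dict (a `markdown` string plus `tables` and `images` lists) are hypothetical stand-ins modelled on the attribute table and output list, not the library's actual API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class SketchDigitizationBase(ABC):
    """Hypothetical stand-in mirroring the documented attributes."""

    def __init__(self, name: str, supported_formats: List[str]) -> None:
        self.name = name  # unique identifier for the agent
        self.supported_formats = supported_formats  # formats the agent accepts

    @abstractmethod
    def process(self, input_data: Any) -> Dict[str, Any]:
        """Subclasses must turn input_data into a digitized representation."""
        raise NotImplementedError


class PlainTextAgent(SketchDigitizationBase):
    """Toy agent: wraps plain text as markdown, with no tables or images."""

    def __init__(self) -> None:
        super().__init__(name="plain_text", supported_formats=["txt"])

    def process(self, input_data: Any) -> Dict[str, Any]:
        if not isinstance(input_data, str):
            raise ValueError("PlainTextAgent only accepts str input")
        return {
            "markdown": input_data.strip(),  # markdown output
            "tables": [],  # JSON-serializable table records (none for plain text)
            "images": [],  # JSON-serializable image/file records (none here)
        }
```

Because `process` is abstract, instantiating `SketchDigitizationBase` directly raises `TypeError`; only subclasses that override it, such as `PlainTextAgent`, can be created.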
Source code in Docs2KG/digitization/base.py (lines 14-122)
output_dir
property
Get the output directory for the digitization agent.
Returns:
Type | Description |
---|---|
Path | Output directory path |
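A minimal sketch of how such a property could be written, assuming (hypothetically) that each agent writes into a subdirectory named after itself under a configurable base directory. The `base_output_dir` parameter, the `OutputDirSketch` class, and the `<base>/<name>` layout are assumptions for illustration, not taken from Docs2KG.

```python
from pathlib import Path


class OutputDirSketch:
    """Hypothetical agent fragment showing an output_dir property."""

    def __init__(self, name: str, base_output_dir: str = "output") -> None:
        self.name = name
        self._base_output_dir = Path(base_output_dir)  # assumed configurable root

    @property
    def output_dir(self) -> Path:
        # Computed on access, so it always reflects the current name.
        # No directory is created here; callers decide when to mkdir.
        return self._base_output_dir / self.name
```

For example, `OutputDirSketch("pdf_agent", "/tmp/out").output_dir` evaluates to `Path("/tmp/out/pdf_agent")`. Keeping the property free of side effects makes it cheap to call repeatedly.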
get_agent_info()
Get information about the digitization agent.
Returns:
Type | Description |
---|---|
Dict[str, Any] | Dict containing agent metadata and configuration |
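A sketch of what such metadata might contain. The documentation only says the result holds agent metadata and configuration, so the keys used here (`name`, `class`, `supported_formats`) and the `AgentInfoSketch` class are hypothetical.

```python
from typing import Any, Dict, List


class AgentInfoSketch:
    """Hypothetical agent fragment illustrating get_agent_info()."""

    def __init__(self, name: str, supported_formats: List[str]) -> None:
        self.name = name
        self.supported_formats = supported_formats

    def get_agent_info(self) -> Dict[str, Any]:
        # Plain, JSON-serializable values so the result can be logged
        # or exported alongside the digitized output.
        return {
            "name": self.name,
            "class": type(self).__name__,
            "supported_formats": list(self.supported_formats),  # copy, not alias
        }
```

Returning a copy of `supported_formats` keeps callers from mutating the agent's configuration through the returned dict.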
Source code in Docs2KG/digitization/base.py (lines 104-114)
process(input_data)
abstractmethod
Process the input data and return digitized output.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_data | Any | The data to be digitized | required |
Returns:
Type | Description |
---|---|
Union[Dict, Any] | Digitized representation of the input data |
Raises:
Type | Description |
---|---|
NotImplementedError | If the child class doesn't implement this method |
ValueError | If input format is not supported |
Source code in Docs2KG/digitization/base.py (lines 57-72)
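The documented contract (reject unsupported input with `ValueError`, otherwise return a digitized representation) can be sketched as a concrete override. Everything here is hypothetical: `ProcessContractSketch` stands in for the base class, `CsvTableAgent` is a toy subclass, and treating the input as in-memory CSV text that becomes JSON-style table rows is an assumption chosen to keep the example self-contained.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Any, Dict


class ProcessContractSketch(ABC):
    """Hypothetical base mirroring the documented process() contract."""

    def validate_input(self, input_data: Any) -> bool:
        # Assumed rule for this sketch: any non-empty string is acceptable.
        return isinstance(input_data, str) and bool(input_data.strip())

    @abstractmethod
    def process(self, input_data: Any) -> Dict[str, Any]:
        raise NotImplementedError


class CsvTableAgent(ProcessContractSketch):
    """Toy agent: parses CSV text into JSON-style table rows."""

    def process(self, input_data: Any) -> Dict[str, Any]:
        # Reject unsupported input up front, as the Raises table describes.
        if not self.validate_input(input_data):
            raise ValueError("expected non-empty CSV text")
        # The first line is used as the header; each row becomes a dict.
        rows = list(csv.DictReader(io.StringIO(input_data)))
        return {"tables": [rows]}
```

With this sketch, `CsvTableAgent().process("a,b\n1,2\n")` returns `{"tables": [[{"a": "1", "b": "2"}]]}`, and an empty string raises `ValueError`.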
validate_input(input_data)
Validate if the input data format is supported by this agent.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_data | Any | The data to validate | required |
Returns:
Type | Description |
---|---|
bool | True if input format is supported, False otherwise |
Source code in Docs2KG/digitization/base.py (lines 92-102)
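One possible way to implement this check, assuming (hypothetically) that inputs are file paths and that `supported_formats` lists file extensions. The `ValidateInputSketch` class and the case-insensitive, dot-optional extension matching are illustrative assumptions, not Docs2KG's actual behaviour.

```python
from pathlib import Path
from typing import Any, List


class ValidateInputSketch:
    """Hypothetical agent fragment illustrating validate_input()."""

    def __init__(self, supported_formats: List[str]) -> None:
        # Normalize once: lowercase, no leading dot (e.g. ".PDF" -> "pdf").
        self.supported_formats = {fmt.lower().lstrip(".") for fmt in supported_formats}

    def validate_input(self, input_data: Any) -> bool:
        # Only str and Path inputs are treated as file paths in this sketch.
        if not isinstance(input_data, (str, Path)):
            return False
        suffix = Path(input_data).suffix.lower().lstrip(".")
        return suffix in self.supported_formats
```

Returning `False` rather than raising keeps `validate_input` a pure predicate; raising `ValueError` is then left to `process`, matching the split between the two methods documented above.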