BaseExtractorStrategy
BaseExtractor
- Uses a fixed multi-stage pipeline of extraction strategies.
- Tries each strategy in order within a stage until one succeeds.
- Feeds the output of one stage into the next for processing.
- Supports async execution for efficient processing.
- Provides batch processing and resource monitoring options.
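The staged fallback behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: `SketchStrategy`, its `extract` method, the three example strategies, and `run_pipeline` are all hypothetical names, and "succeeds" is assumed to mean "returns a non-None value without raising".

```python
import asyncio
from typing import Any, List, Optional


class SketchStrategy:
    """Hypothetical stand-in for a BaseExtractorStrategy subclass."""

    async def extract(self, data: Any) -> Optional[Any]:
        raise NotImplementedError


class FailingStrategy(SketchStrategy):
    async def extract(self, data: Any) -> Optional[Any]:
        # Simulates a strategy that cannot handle the input.
        raise ValueError("cannot handle input")


class StripStrategy(SketchStrategy):
    async def extract(self, data: Any) -> Optional[Any]:
        # Returns None (assumed to mean "no result") for non-string input.
        return data.strip() if isinstance(data, str) else None


class UpperStrategy(SketchStrategy):
    async def extract(self, data: Any) -> Optional[Any]:
        return data.upper()


async def run_pipeline(pipeline: List[List[SketchStrategy]], data: Any) -> Any:
    """Run stages in order; within a stage, stop at the first success."""
    current = data
    for stage in pipeline:
        for strategy in stage:
            try:
                result = await strategy.extract(current)
            except Exception:
                continue  # this strategy failed, try the next one
            if result is not None:
                current = result  # output of this stage feeds the next
                break
        else:
            # The inner loop finished without a break: nothing succeeded.
            raise RuntimeError("no strategy in this stage succeeded")
    return current


pipeline = [[FailingStrategy(), StripStrategy()], [UpperStrategy()]]
result = asyncio.run(run_pipeline(pipeline, "  hello  "))
print(result)  # HELLO
```

In the first stage, `FailingStrategy` raises, so `StripStrategy` is tried next; its output then becomes the input of the second stage.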
init
- pipeline (List[List[BaseExtractorStrategy]]): A fixed list of stages, where each stage is a list of extractor strategies tried in order.
- cache_templates (bool): Whether to cache extraction templates. (default: :obj:`True`)
- max_cache_size (int): Maximum number of templates to cache. (default: :obj:`1000`)
- extraction_timeout (float): Maximum time for extraction in seconds. (default: :obj:`30.0`)
- batch_size (int): Size of batches for parallel extraction. (default: :obj:`10`)
- monitoring_interval (float): Interval in seconds between resource checks. (default: :obj:`5.0`)
- cpu_threshold (float): CPU usage percentage threshold for scaling down. (default: :obj:`80.0`)
- memory_threshold (float): Memory usage percentage threshold for scaling down. (default: :obj:`85.0`)
- **kwargs: Additional extractor parameters.
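To make the roles of `extraction_timeout`, `batch_size`, `cpu_threshold`, and `memory_threshold` more concrete, here is a small sketch of how such settings might be applied. It only borrows the parameter names and defaults from the list above; the helper functions (`extract_with_timeout`, `extract_in_batches`, `should_scale_down`) are hypothetical, and the actual class may enforce these limits differently. Resource readings are passed in as plain numbers, so no monitoring library is assumed.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Optional


async def extract_with_timeout(
    fn: Callable[[Any], Awaitable[Any]],
    item: Any,
    extraction_timeout: float = 30.0,
) -> Optional[Any]:
    """Run one extraction, returning None if it exceeds the timeout."""
    try:
        return await asyncio.wait_for(fn(item), timeout=extraction_timeout)
    except asyncio.TimeoutError:
        return None


async def extract_in_batches(
    fn: Callable[[Any], Awaitable[Any]],
    items: List[Any],
    batch_size: int = 10,
    extraction_timeout: float = 30.0,
) -> List[Optional[Any]]:
    """Process items batch by batch; items within a batch run concurrently."""
    results: List[Optional[Any]] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # gather returns results in the same order as the batch.
        results.extend(
            await asyncio.gather(
                *(extract_with_timeout(fn, item, extraction_timeout) for item in batch)
            )
        )
    return results


def should_scale_down(
    cpu_percent: float,
    memory_percent: float,
    cpu_threshold: float = 80.0,
    memory_threshold: float = 85.0,
) -> bool:
    """Assumed policy: scale down when either reading exceeds its threshold."""
    return cpu_percent > cpu_threshold or memory_percent > memory_threshold


async def double(x: int) -> int:
    # Item 3 is deliberately slow so that it hits the timeout.
    await asyncio.sleep(1.0 if x == 3 else 0)
    return x * 2


results = asyncio.run(
    extract_in_batches(double, [1, 2, 3, 4, 5], batch_size=2, extraction_timeout=0.1)
)
print(results)  # [2, 4, None, 8, 10]
```

Returning `None` on timeout is one possible design choice; raising or retrying would be equally plausible, and the source does not say which the class uses.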