DriverManage

Overview

The DriverManage class is an abstract base class that defines the interface for initializing and managing drivers in a data extraction pipeline, particularly in an Airflow environment.
It dynamically loads driver classes, parses cookies and API keys, and handles errors raised during driver initialization.

Workflow

The DriverManage class is responsible for setting up and managing platform-specific drivers through the following steps:

  1. Retrieve driver configuration
    Uses the SchedulerHook to fetch metadata about the driver (such as driver class path and type) based on the provided driver_name.

  2. Parse and validate cookies for authentication
    If cookies are provided, attempts to parse them as JSON. If standard parsing fails (e.g., due to encoding issues), retries after decoding the string with unicode_escape. If both attempts fail, raises a detailed parsing error.

  3. Parse and validate the API key
    Attempts to parse the api_key as a JSON object. If parsing fails, the raw string is used instead. This ensures compatibility with drivers expecting either a string or a structured API key.

  4. Dynamically instantiate the driver class using reflection
    Loads the driver class at runtime using its fully qualified class path and initializes it with the necessary parameters (e.g., cookies, API key, remote URL, driver language, etc.).

  5. Handle initialization errors and update the system accordingly
    If instantiation fails or the driver object is invalid, logs the issue, deletes the related DAG run mapping, optionally sends error notifications (e.g., for failed cookie login), and raises a DriverInitException.
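
Steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual DriverManage code: the function names parse_cookies and parse_api_key and the CookieParseError type are hypothetical, and the real class may report parsing failures differently.

```python
import json
from typing import Any, Optional


class CookieParseError(ValueError):
    """Hypothetical error type; the docs only say a detailed parsing error is raised."""


def parse_cookies(cookies: Optional[str]) -> Any:
    """Parse cookies as JSON, retrying once after unicode_escape decoding."""
    if not cookies:
        return None
    try:
        return json.loads(cookies)
    except json.JSONDecodeError as first_error:
        try:
            # Undo literal escape sequences such as \" before retrying.
            decoded = cookies.encode("utf-8").decode("unicode_escape")
            return json.loads(decoded)
        except (json.JSONDecodeError, UnicodeDecodeError) as second_error:
            raise CookieParseError(
                f"Could not parse cookies as JSON ({first_error}); "
                f"retry after unicode_escape decoding also failed ({second_error})"
            ) from second_error


def parse_api_key(api_key: Optional[str]) -> Any:
    """Return the parsed JSON value if api_key is valid JSON, else the raw string."""
    if api_key is None:
        return None
    try:
        return json.loads(api_key)
    except json.JSONDecodeError:
        return api_key
```

Note that json.loads accepts any JSON value, so in this sketch a purely numeric key such as "12345" would come back as an int rather than a string. A driver that expects a plain string key may need an explicit type check.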


Constructor (__init__)

Initializes the driver manager with necessary parameters, authentication details, and environment values.

Parameters:

  • driver_name (str): Name of the driver.
  • dauth (DAuthenticatorHook): Authentication hook for handling credentials.
  • dag_run_id (str): Unique identifier for the DAG run.
  • cookies (str, optional): Authentication cookies.
  • need_selenium (bool, optional): Whether the driver requires Selenium.
  • remote_url (str, optional): Remote URL for Selenium server.
  • api_key (str, optional): API key used for authentication.
  • account_id (int, optional): ID of the account related to this job.
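
The parameter list above corresponds roughly to the signature below. This is only a sketch: the class name DriverManageSketch is hypothetical, the default values for the optional parameters are assumptions (the documentation does not state them), and dauth is typed as Any because the DAuthenticatorHook interface is not described here. No abstract methods are declared, since the documentation does not say which methods subclasses must implement.

```python
from abc import ABC
from typing import Any, Optional


class DriverManageSketch(ABC):
    """Illustrative skeleton of the constructor only."""

    def __init__(
        self,
        driver_name: str,
        dauth: Any,  # a DAuthenticatorHook in the real class
        dag_run_id: str,
        cookies: Optional[str] = None,      # assumed default
        need_selenium: bool = False,        # assumed default
        remote_url: Optional[str] = None,   # assumed default
        api_key: Optional[str] = None,      # assumed default
        account_id: Optional[int] = None,   # assumed default
    ) -> None:
        self.driver_name = driver_name
        self.dauth = dauth
        self.dag_run_id = dag_run_id
        self.cookies = cookies
        self.need_selenium = need_selenium
        self.remote_url = remote_url
        self.api_key = api_key
        self.account_id = account_id
```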

Driver Instantiation (__call__)

Dynamically loads and initializes the driver class based on the driver configuration.

Workflow:

  1. Retrieves driver details from SchedulerHook.
  2. Loads the driver class using load_class().
  3. Attempts to parse cookies using json.loads() with fallback for decoding issues.
  4. Attempts to parse the API key as JSON.
  5. Instantiates the driver class with the necessary parameters.
  6. Handles and logs driver errors, and deletes failed DAG runs if needed.
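
Loading a class from its fully qualified path is commonly done with importlib. The sketch below shows one possible shape for load_class(); the real helper's signature and error behavior are not documented here, so treat both functions as assumptions. build_driver is a hypothetical name used only for illustration.

```python
import importlib
from typing import Any


def load_class(class_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object."""
    module_path, _, class_name = class_path.rpartition(".")
    if not module_path:
        raise ImportError(f"Expected 'package.module.ClassName', got {class_path!r}")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"Module {module_path!r} has no attribute {class_name!r}") from exc


def build_driver(class_path: str, **driver_kwargs: Any) -> Any:
    """Load the driver class and instantiate it with the given keyword arguments."""
    driver_cls = load_class(class_path)
    return driver_cls(**driver_kwargs)
```

In the real class, the keyword arguments would be the parsed cookies, the API key, the remote URL, and the other parameters listed for the constructor.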

Parameters:

  • need_cookie (bool): Whether cookies are required for the driver (default is True).

Returns:

  • An instance of the initialized driver class.

Notes:

  • If the driver cannot be instantiated, the DAG run is deleted and the error is logged and raised.

Error Handling (delete_dag_run_and_update_error_message)

Handles failed driver initialization by cleaning up DAG run mappings and updating error messages.

Parameters:

  • error (str, optional): Error message to be logged (default is "Login Failed").

Workflow:

  1. Deletes the DAG run to prevent retrying a failed execution.
  2. Sets an error message when a cookie-based login has failed.
  3. Raises DriverInitException with the failure reason.
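
The three steps can be sketched as below. Everything except the method name, the default error string, and the DriverInitException name is an assumption: InMemoryRunStore is a hypothetical stand-in for wherever the pipeline keeps DAG run mappings and error messages, and the used_cookies flag is an illustrative way to model the cookie-based case. The real method belongs to DriverManage and is written here as a free function for brevity.

```python
from typing import Dict, Optional


class DriverInitException(Exception):
    """Stand-in for the project's exception of the same name."""


class InMemoryRunStore:
    """Hypothetical store for DAG run mappings and per-account error messages."""

    def __init__(self) -> None:
        self.dag_runs: Dict[str, str] = {}
        self.error_messages: Dict[Optional[int], str] = {}


def delete_dag_run_and_update_error_message(
    store: InMemoryRunStore,
    dag_run_id: str,
    account_id: Optional[int],
    used_cookies: bool,
    error: str = "Login Failed",
) -> None:
    # 1. Delete the DAG run so the failed execution is not retried.
    store.dag_runs.pop(dag_run_id, None)
    # 2. Record an error message when the failure was a cookie-based login.
    if used_cookies:
        store.error_messages[account_id] = error
    # 3. Always raise, so the caller never receives an invalid driver.
    raise DriverInitException(error)
```

Because the function always raises, callers do not need to check a return value; they either get a working driver from the manager or see a DriverInitException.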