Retrieve and add domains

Retrieve Domains DAG

The retrieve_domains_dag is responsible for retrieving the domains associated with a specific school or group of schools from the Webserver and storing them in the Crawlserver MongoDB database.

Parameters

The DAG accepts two parameters:

- school_identifier (int): School ID for domain retrieval
- group_school_identifier (int): Group school ID for domain retrieval

Tasks

The DAG has three PythonOperator tasks:

1. `validate_params`

This task ensures that the input parameters are set correctly before domain retrieval proceeds. It checks that exactly one of school_identifier and group_school_identifier is provided: supplying both, or neither, is rejected.
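The check described here can be sketched as a plain Python callable. This is a minimal illustration, not the DAG's actual code: the function signature, the use of `ValueError`, and treating `None` as "not provided" are all assumptions.

```python
from typing import Optional


def validate_params(
    school_identifier: Optional[int] = None,
    group_school_identifier: Optional[int] = None,
) -> str:
    """Return the name of the supplied identifier; raise if zero or both are given."""
    has_school = school_identifier is not None
    has_group = group_school_identifier is not None

    if has_school == has_group:
        # Both True (two identifiers) or both False (none): reject the run.
        raise ValueError(
            "Provide exactly one of school_identifier or group_school_identifier."
        )
    return "school_identifier" if has_school else "group_school_identifier"
```

Raising an exception inside a PythonOperator callable fails the task, so the downstream tasks would not run with invalid input.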

2. `get_auth_token`

Retrieves an authentication token from the Webserver, which is required to access its protected API endpoints, and stores it in XCom for use by the subsequent tasks.
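A rough sketch of this step follows. The authentication URL, the credentials payload, and the `access_token` field name are hypothetical, since the page does not describe the Webserver's authentication API. The HTTP call is passed in as `post` (for example `requests.post`) so the function can be exercised without a live server.

```python
from typing import Any, Callable, Mapping


def get_auth_token(
    post: Callable[..., Any],
    auth_url: str,
    credentials: Mapping[str, str],
) -> str:
    """Request a token from the Webserver and return it."""
    # `post` is expected to behave like requests.post.
    response = post(auth_url, json=dict(credentials), timeout=30)
    response.raise_for_status()

    # Hypothetical response shape: {"access_token": "<token>"}.
    token = response.json().get("access_token")
    if not token:
        raise RuntimeError("Authentication response did not contain a token.")
    return token
```

When a PythonOperator callable returns a value, Airflow pushes it to XCom under the key `return_value` by default, so a later task can read it with `ti.xcom_pull(task_ids="get_auth_token")`.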

3. `retrieve_domains`

Fetches domains from the Webserver and stores them in the domain MongoDB collection. It uses the access token obtained by the get_auth_token task to send a request to the get-domains-for-crawlserver API, passing either the school identifier or the group school identifier as appropriate. The task then checks whether each retrieved domain already exists in the collection and stores only those that do not.
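The "store only if missing" step can be sketched as follows. This is an illustration under assumptions: the helper name `store_new_domains` and the document shape `{"domain": ...}` are hypothetical, and `collection` is any object exposing `find_one` and `insert_one` the way a PyMongo collection does.

```python
from typing import Any, Iterable, List


def store_new_domains(collection: Any, domains: Iterable[str]) -> List[str]:
    """Insert domains not yet in the collection; return the ones that were added."""
    added = []
    for domain in domains:
        # Assumed document shape: {"domain": "<name>"}.
        if collection.find_one({"domain": domain}) is None:
            collection.insert_one({"domain": domain})
            added.append(domain)
    return added
```

The check-then-insert pattern is simple to read but is not atomic. If several runs could write to the collection at the same time, a unique index on the domain field combined with an upsert would likely be more robust.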

`Cleaning XCom`

When a DAG run completes successfully, all XCom entries related to that run are deleted.
