
Plugins

General Architecture:

[Figure: airflow_arch, general architecture diagram]

1. Scheduler plugin

  • get_pending_jobs: gets the sorted pending jobs from a MongoDB collection. The selection is based on the created_at field, and the jobs are filtered using the value of query.

  • get_jobs: gets all jobs from MongoDB for each social media and each task

  • insert_in_collection: inserts data into MongoDB through the Airflow MongoHook (pymongo)

  • get_num_pending_jobs: gets the number of pending jobs

  • get_media_pending_jobs: gets the number of jobs for each task of each social media

  • update_mongo: updates the state of a job in the database. The state can be "running", "done" or "failed"; it defaults to "running".

  • get_driver_task_details: gets the information related to driver tasks from the core_tasks collection

  • get_driver_details: gets the information related to the driver from the childs_socialmedia collection

  • get_driver_infos: gets the available tasks for each social media
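The MongoDB-side functions above can be sketched with plain pymongo collection calls (`find(...).sort(...)`, `count_documents`, `update_one` with `$set`). This is a minimal sketch, not the plugin's actual code: the function names come from the list above, but their parameters, the document layout (a `state` field and a `created_at` timestamp), the `"pending"` state value, the oldest-first sort direction and the helper `build_pending_filter` are all assumptions. The `collection` argument is any pymongo `Collection`, such as one obtained through the Airflow MongoHook.

```python
from typing import Any, Dict, List, Optional

# Hypothetical document layout: each job carries a "state" field and a
# "created_at" timestamp. The "pending" value is an assumption; the states
# documented for update_mongo are "running", "done" and "failed".
PENDING_STATE = "pending"
JOB_STATES = {"running", "done", "failed"}
ASCENDING = 1  # same value as pymongo.ASCENDING


def build_pending_filter(query: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge the caller's query with the pending-state condition.

    The state condition is applied last so the caller cannot override it.
    """
    return {**(query or {}), "state": PENDING_STATE}


def get_pending_jobs(collection: Any, query: Optional[Dict[str, Any]] = None) -> List[Dict[str, Any]]:
    """Return the pending jobs matching `query`, oldest created_at first."""
    cursor = collection.find(build_pending_filter(query))
    return list(cursor.sort("created_at", ASCENDING))


def get_num_pending_jobs(collection: Any, query: Optional[Dict[str, Any]] = None) -> int:
    """Count the pending jobs with count_documents instead of fetching them."""
    return collection.count_documents(build_pending_filter(query))


def update_mongo(collection: Any, job_id: Any, state: str = "running") -> None:
    """Set the state of one job, defaulting to "running" as documented."""
    if state not in JOB_STATES:
        raise ValueError(f"unsupported job state: {state!r}")
    collection.update_one({"_id": job_id}, {"$set": {"state": state}})
```

Keeping the filter construction in a separate pure function makes the pending-job condition identical for fetching and counting, and lets it be checked without a running MongoDB instance.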

2. Dauthenticator plugin

The Dauthenticator plugin is responsible for requesting the available accounts needed to launch tasks that require a login, and for managing and updating the state of the accounts in use.

The following is the list of the Dauthenticator plugin functions:

  • get_available_accounts: gets the accounts and cookies available from Dauthenticator for a given social media

  • update_dagrun_account_mappings: creates or updates the DAG run entry in Dauthenticator; this entry records which DAG run has been started and which account it uses

  • delete_dagrun_account_mappings: deletes the DAG run entry from Dauthenticator once the crawl has ended

  • attribute_tasks: calculates the number of accounts each DAG needs

  • attribute_accounts_tasks: given the number of profiles for each DAG, assigns accounts to each DAG and balances them across the DAGs

  • update_accounts: sets cookie_real_end and updates the cookies

  • set_cookie_error_message: if an error occurs during driver instantiation, sets the error message in the issue field of the account that was used
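attribute_accounts_tasks has to share a limited pool of accounts between DAGs. The sketch below shows one possible balancing strategy, a round-robin that gives each DAG with remaining demand one account per round and never exceeds a DAG's need. The strategy and the parameter shapes (a list of account identifiers, and a dict of per-DAG needs such as attribute_tasks might compute) are assumptions for illustration, not the plugin's actual implementation.

```python
from typing import Dict, List


def attribute_accounts_tasks(accounts: List[str], needs: Dict[str, int]) -> Dict[str, List[str]]:
    """Assign accounts to DAGs round-robin, capped at each DAG's need.

    Hypothetical strategy: when there are fewer accounts than the total need,
    the shortfall is spread across DAGs instead of starving the last ones.
    """
    assignment: Dict[str, List[str]] = {dag: [] for dag in needs}
    remaining = iter(accounts)
    pending = [dag for dag, n in needs.items() if n > 0]
    while pending:
        next_round = []
        for dag in pending:
            account = next(remaining, None)
            if account is None:
                return assignment  # pool exhausted: return the partial assignment
            assignment[dag].append(account)
            if len(assignment[dag]) < needs[dag]:
                next_round.append(dag)  # this DAG still needs more accounts
        pending = next_round
    return assignment
```

For example, with three accounts and two DAGs that each need two, the first DAG receives two accounts and the second one, rather than the first DAG taking both of its accounts before the second gets any.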