Hr Airflow¶
Hr Airflow is an Airflow application that schedules the data collection pipelines for Netethic. It mainly schedules the collection of social media profiles, including profile information, posts, and the lists of followers and followings for a given profile ID. It also collects data from workspace-based platforms such as Microsoft and Google. The solution also includes a recovery system that collects only the new data (delta data) produced since the latest run.
Features¶
- Retrieves pending jobs for each social media platform
- Retrieves pending jobs for domains per task
- Manages the data collection process
- Updates the job state and execution date for the recovery system
- Checks whether any files have not been pushed to MinIO
- Checks for blocked jobs and resets their state
Technology and Tools:¶
- Docker
- Airflow
- MongoDB
GitLab Branches¶
Develop:¶
- Used for the development environment
- The Docker Compose file to use is dev.yml
Master:¶
- Used to deploy the project in the preprod environment
- The Docker Compose file to use is docker-compose.yml
Usage¶
To create your own branch, use develop as the base branch.
1. Clone the project¶
$ git clone https://gitlab.kaisens.fr/kaisensdata/apps/4inshield/back/airflow
$ cd airflow
$ git checkout $branch
2. Before starting the project¶
In the project root folder, run the following commands:
$ mkdir -p airflow/logs
$ chmod -R 777 airflow/logs
3. Add the .env file:¶
- Create a copy of .env_sample and name it .env
- The file must be located in the project root directory
- Make sure to update the settings in .env for Airflow, the MongoDB connection, and the connection to dauthenticator.
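The copy step above can be scripted. The following is a minimal sketch, assuming a POSIX shell; `create_env_file` is a hypothetical helper name, not part of the project. It refuses to overwrite an existing .env so that local settings are not lost.

```shell
# Hypothetical helper: copy .env_sample to .env in the given project root.
# An existing .env is left untouched.
create_env_file() {
    root="${1:-.}"   # project root, defaults to the current directory
    if [ ! -f "$root/.env_sample" ]; then
        echo "missing $root/.env_sample" >&2
        return 1
    fi
    if [ -f "$root/.env" ]; then
        echo "$root/.env already exists, leaving it untouched" >&2
        return 0
    fi
    cp "$root/.env_sample" "$root/.env"
}
```

Calling `create_env_file .` from the project root creates the file; the values inside .env still have to be edited by hand.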
4. Log in to the Kaisens Docker registry:¶
To build the project, make sure you have access to the Kaisens Docker registry.
$ docker login $registry_server -u $username -p $password
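Passing the password with -p can leave it in the shell history. As an alternative sketch, `docker login` also accepts the password on standard input through its `--password-stdin` flag. `registry_login` below is a hypothetical helper name; the three arguments correspond to the $registry_server, $username and $password placeholders used above.

```shell
# Hypothetical helper: log in to a registry without putting the password
# in the docker command's arguments.
# $1 = registry server, $2 = username, $3 = password
registry_login() {
    printf '%s' "$3" | docker login "$1" -u "$2" --password-stdin
}
```

It would be called as `registry_login "$registry_server" "$username" "$password"`.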
5. Launch the app:¶
In the project root directory, run the following command:
$ docker-compose -f <name_docker_compose_file.yml> up -d --build
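The compose file depends on the branch (dev.yml for develop, docker-compose.yml for master). A small sketch of that mapping, where `compose_file_for_branch` is a hypothetical helper and not something shipped with the project:

```shell
# Hypothetical helper: print the compose file that matches a branch name.
# Only the two branches documented in this README are mapped.
compose_file_for_branch() {
    case "$1" in
        develop) echo "dev.yml" ;;
        master)  echo "docker-compose.yml" ;;
        *)       echo "unknown branch: $1" >&2; return 1 ;;
    esac
}
```

For example, `docker-compose -f "$(compose_file_for_branch develop)" up -d --build` starts the development stack. Personal branches are not mapped and return an error, so the file has to be chosen explicitly for them.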
License:¶
This software is supplied to you by Kaisens Data. Any person who copies or redistributes this software outside Kaisens Data, or attempts to do so, could be sued for intellectual property theft and violation of corporate rules.