Scaling On-Prem Airflow With 2,000 DAGs at Numberly with Sébastien Crocquevieille
Scaling 2,000+ data pipelines isn’t easy. But with the right tools and a self-hosted mindset, it becomes achievable.
In this episode, Sébastien Crocquevieille, Data Engineer at Numberly, unpacks how the team scaled their on-prem Airflow setup using open-source tooling and Kubernetes. We explore orchestration strategies, UI-driven stakeholder access and Airflow’s evolving features.
Key Takeaways:
00:00 Introduction.
02:13 Overview of the company’s operations and global presence.
04:00 The tech stack and structure of the data engineering team.
04:24 Running nearly 2,000 DAGs in production using Airflow.
05:42 How Airflow’s UI empowers stakeholders to self-serve and troubleshoot.
07:05 Details on the Kubernetes-based Airflow setup using Helm charts.
09:31 Transition from GitSync to NFS for DAG syncing due to performance issues.
14:11 Making every team member Airflow-literate through local installation.
17:56 Using custom libraries and plugins to extend Airflow functionality.
Resources Mentioned:
Sébastien Crocquevieille | LinkedIn
https://www.linkedin.com/in/scroc/
Numberly | LinkedIn
https://www.linkedin.com/company/numberly/
Numberly | Website
https://numberly.com/
Apache Airflow
https://airflow.apache.org/
Grafana
https://grafana.com/
Apache Kafka
https://kafka.apache.org/
Helm Chart for Apache Airflow
https://airflow.apache.org/docs/helm-chart/stable/index.html
Kubernetes
https://kubernetes.io/
GitLab
https://about.gitlab.com/
KubernetesPodOperator – Airflow
https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html
https://astronomer.io/beyond/dataflowcast
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
69 episodes