DevOps Course Curriculum

Module 8: DevOps, Containerization & APIs (Docker, Kubernetes, FastAPI, Flask)

Duration: 50 Hours

Topic 8.1: REST APIs with Flask and FastAPI

Theory:

  • REST concepts: endpoints, methods (GET, POST, PUT, DELETE)

  • JSON serialization

  • Error handling

  • API documentation with Swagger/OpenAPI

  • Authentication basics (JWT, API key)

Lab:

  • Build an API to return account balance by customer ID

  • Create a REST API to serve fraud prediction model

Scenarios:

  • Provide internal dashboard access via secured API

  • Serve credit score predictions via API for instant loans

Tasks:

  • Create GET endpoint to fetch transaction history

  • Add JWT token-based authentication

Challenges:

  • Handling large payloads

  • Token expiration and renewal issues

  • Ensuring secure endpoints (HTTPS, rate-limiting)
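
The JWT task and the token-expiration challenge above can be explored without any framework. The following is a minimal sketch of an HS256-signed token with an `exp` claim, written with the standard library only so the mechanics stay visible. The function names `issue_token` and `verify_token` and the `sub` claim carrying a customer ID are assumptions made for illustration; in the lab itself a maintained JWT library would normally be used instead of hand-rolled code.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input, secret):
    """HMAC-SHA256 over the 'header.payload' string."""
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(customer_id, secret, ttl_seconds, now=None):
    """Build a signed token that expires ttl_seconds after `now` (hypothetical helper)."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": customer_id, "exp": int(now) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the payload if the signature matches and the token has not expired."""
    now = time.time() if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(header_b64 + "." + payload_b64, secret)
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if now >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Passing `now` explicitly makes expiry testable without waiting: a token issued at `now=1000` with `ttl_seconds=60` verifies at `now=1030` and is rejected at `now=1060`. A renewal flow would issue a fresh token before the old one reaches `exp`.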

Topic 8.2: Docker – Containerization

Theory:

  • Images vs Containers

  • Dockerfile: build custom images

  • Volumes and networking

  • docker-compose for multi-container apps

Lab:

  • Containerize a Flask ETL API

  • Mount volumes for log and config persistence

Scenarios:

  • Package and run ETL pipelines on any developer system

  • Build isolated test environments

Tasks:

  • Write Dockerfile for model-serving app

  • Use docker-compose to run app + MongoDB

Challenges:

  • Managing image size and build times

  • Container not exposing ports properly

  • Volume misconfiguration leading to data loss

Topic 8.3: Kubernetes Orchestration

Theory:

  • Kubernetes components: Pod, Deployment, Service

  • Helm charts (optional)

  • Secrets and ConfigMaps

  • Horizontal Pod Autoscaling (HPA)

Lab:

  • Deploy containerized fraud detection API on Kubernetes

  • Use ConfigMap to manage runtime configs

Scenarios:

  • Auto-scale transaction processing microservices

  • Isolate environments for dev/test/stage

Tasks:

  • Create YAML manifests to deploy a service

  • Expose FastAPI endpoint using Kubernetes service

Challenges:

  • Debugging failed pods due to resource limits

  • Network misrouting in service-to-service calls

  • Secrets not mounted properly into containers

Module 9: CI/CD and Monitoring (Airflow, Prometheus, Grafana, Unit Testing)

Duration: 40 Hours

Topic 9.1: Workflow Orchestration with Apache Airflow

Theory:

  • DAGs: Directed Acyclic Graphs for task dependency

  • Operators: PythonOperator, BashOperator, EmptyOperator (formerly DummyOperator)

  • Scheduling with cron expressions

  • XComs, retries, SLA monitoring

  • Best practices for DAG structuring

Lab:

  • Create a DAG to run a daily ETL on banking transaction logs

  • Add retries and failure alerts via email

Scenarios:

  • Orchestrate pipeline: ingest → clean → analyze → report

  • Automate DBT model updates and ML model refresh

Tasks:

  • Build a 3-step DAG to load data, run SQL, and generate reports

  • Add on_failure_callback to notify Slack or email

Challenges:

  • DAG import errors due to broken dependencies

  • Scheduler hangs or skipped task retries

  • Managing large DAGs with dynamic branching
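
The ideas in this topic (dependency ordering, retries, a failure callback) can be seen in miniature without installing Airflow. The sketch below is not Airflow's API: `run_pipeline` and its parameters are hypothetical names, and the scheduling is a plain loop over a topological order from the standard library's `graphlib` (Python 3.9+). It assumes every name in `upstream` also appears in `tasks`.

```python
from graphlib import TopologicalSorter  # raises CycleError if the graph is not acyclic


def run_pipeline(tasks, upstream, retries=1, on_failure=None):
    """Run callables in dependency order, retrying each failed task.

    tasks: task name -> zero-argument callable
    upstream: task name -> set of task names that must finish first
    Returns the names of tasks that completed, in execution order.
    """
    completed = []
    graph = {name: upstream.get(name, set()) for name in tasks}
    for name in TopologicalSorter(graph).static_order():
        # One initial attempt plus `retries` further attempts.
        for attempt in range(1, retries + 2):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception as exc:
                if attempt == retries + 1:
                    # Out of attempts: notify and stop, so downstream tasks never run.
                    if on_failure is not None:
                        on_failure(name, exc)
                    return completed
    return completed
```

For a linear chain such as ingest, clean, analyze, report, `upstream` would be `{"clean": {"ingest"}, "analyze": {"clean"}, "report": {"analyze"}}`. The `on_failure` argument plays the role that a Slack or email notifier would play in a real DAG.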

Topic 9.2: Monitoring with Prometheus & Grafana

Theory:

  • Time-series monitoring with Prometheus

  • Metric types: counters, gauges, histograms

  • PromQL queries

  • Dashboards and alerts with Grafana

  • Integrating with Airflow and FastAPI

Lab:

  • Monitor CPU and memory usage of ETL containers

  • Create Grafana dashboard to track DAG runtimes

Scenarios:

  • Alert when a job exceeds 5-minute SLA

  • Visualize task success/failure trends over time

Tasks:

  • Configure Prometheus scrape jobs

  • Build a Grafana panel showing DAG task durations

Challenges:

  • Dashboard not updating due to time range mismatch

  • Misconfigured alert thresholds triggering noise

  • Prometheus storage bloat due to high cardinality metrics
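
To make the three metric types listed in the theory concrete, here is a small sketch of their semantics in plain Python. These classes are stand-ins written for illustration, not the Prometheus client library; the bucket boundaries in the usage note are assumptions chosen to match the 5-minute SLA scenario.

```python
import bisect


class Counter:
    """Monotonically increasing total, e.g. number of failed task runs."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Value that can go up or down, e.g. current memory usage of a container."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)


class Histogram:
    """Counts observations into buckets, e.g. job durations in seconds."""

    def __init__(self, upper_bounds):
        self.upper_bounds = sorted(upper_bounds)
        # One slot per bound, plus a final slot for values above every bound (+Inf).
        self.bucket_counts = [0] * (len(self.upper_bounds) + 1)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # bisect_left finds the first bound that is >= value.
        self.bucket_counts[bisect.bisect_left(self.upper_bounds, value)] += 1
        self.sum += value
        self.count += 1

    def cumulative(self):
        """Return (upper_bound, observations <= upper_bound) pairs."""
        running, result = 0, []
        for bound, n in zip(self.upper_bounds + [float("inf")], self.bucket_counts):
            running += n
            result.append((bound, running))
        return result
```

With bounds of 60, 300, and 900 seconds, the number of jobs that exceeded a 5-minute SLA is the total count minus the cumulative count at the 300-second bucket. Every distinct label combination on a real metric creates its own series, which is why high-cardinality labels inflate storage.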

Topic 9.3: Testing and CI/CD (GitHub Actions / GitLab)

Theory:

  • Importance of unit and integration testing

  • Writing tests with pytest

  • Mocking external services

  • Setting up CI pipelines for test + build

  • Linting and static analysis tools

Lab:

  • Write unit tests for a Python ETL function

  • Configure GitHub Actions to run tests on each commit

Scenarios:

  • Prevent code merge if unit tests fail

  • Automate code formatting checks before deployment

Tasks:

  • Create .github/workflows/test.yml to trigger pytest

  • Add test coverage reports to PR checks

Challenges:

  • CI build failing due to dependency conflicts

  • Mock errors in tests for API services

  • Managing secrets securely in pipeline configs
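
As an illustration of unit testing an ETL function while mocking an external service, the sketch below uses `unittest.mock` from the standard library. The function `load_high_value_transactions` and the client method `get_transactions` are hypothetical names invented for this example. The test functions follow the `test_` naming that pytest discovers; under pytest the second test would more idiomatically use `pytest.raises`.

```python
from unittest.mock import Mock


def load_high_value_transactions(client, customer_id, threshold):
    """Fetch transactions from an external service and keep those at or above threshold."""
    rows = client.get_transactions(customer_id)
    return [row for row in rows if row["amount"] >= threshold]


def test_filters_by_threshold():
    # The mock replaces the real service, so the test needs no network access.
    client = Mock()
    client.get_transactions.return_value = [
        {"id": 1, "amount": 50.0},
        {"id": 2, "amount": 2500.0},
    ]
    result = load_high_value_transactions(client, "C1001", threshold=1000.0)
    assert [row["id"] for row in result] == [2]
    client.get_transactions.assert_called_once_with("C1001")


def test_propagates_service_errors():
    # side_effect makes the mocked call raise, simulating an outage.
    client = Mock()
    client.get_transactions.side_effect = ConnectionError("service unavailable")
    try:
        load_high_value_transactions(client, "C1001", threshold=1000.0)
    except ConnectionError:
        pass
    else:
        raise AssertionError("expected ConnectionError")
```

Passing the client in as an argument is the design choice that makes mocking straightforward: the test controls the dependency without patching module globals.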

DevOps for Data Engineering (60 Hours)

Module D1: Infrastructure as Code & Automation

Theory

  • IaC concepts (declarative vs imperative)

  • Terraform fundamentals (providers, resources, modules, workspaces)

  • Ansible for configuration management

Labs

  • Provision AWS S3, RDS, EMR with Terraform

  • Automate EC2 setup for Spark cluster with Ansible

Scenarios

  • Version-controlled infrastructure for repeatable data lake setup

Tasks

  • Write a Terraform module for Kafka on AWS

Challenges

  • Handle drift detection, multi-region deployments
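
Drift detection follows directly from the declarative model: the tool holds a description of the desired state and compares it with what actually exists. The sketch below shows that comparison on plain dictionaries. It is a conceptual illustration only, not how Terraform is implemented; `detect_drift` and the resource attributes in the usage note are hypothetical.

```python
def detect_drift(desired, actual):
    """Compare declared resource attributes with observed ones.

    Both arguments map resource name -> dict of attributes.
    Returns the resources to create, delete, and update.
    """
    to_create = sorted(set(desired) - set(actual))
    to_delete = sorted(set(actual) - set(desired))
    to_update = {}
    for name in sorted(set(desired) & set(actual)):
        # Record (observed, declared) for every attribute that differs.
        changed = {
            key: (actual[name].get(key), value)
            for key, value in desired[name].items()
            if actual[name].get(key) != value
        }
        if changed:
            to_update[name] = changed
    return {"create": to_create, "delete": to_delete, "update": to_update}
```

For example, if the declared state has a bucket with `versioning` set to `True` and the observed bucket has it set to `False`, the result lists that bucket under `update` with the pair `(False, True)`. An imperative script, by contrast, would only describe the steps to run and would have no desired state to compare against.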

Module D2: CI/CD Pipelines for Data

Theory

  • Jenkins vs GitHub Actions vs GitLab CI

  • Canary deployments, blue-green deployments

Labs

  • Build Jenkins pipeline to deploy DBT models

  • Automate Spark job packaging with Docker + CI

Scenarios

  • Auto-deploy ML model retraining when new data lands in S3

Tasks

  • Build CI/CD pipeline for Kafka consumers

Challenges

  • Secrets management, rollback automation

Module D3: Observability & Monitoring

Theory

  • Logs, metrics, tracing (ELK, Prometheus, Grafana, OpenTelemetry)

  • SLA/SLO/SLI for data pipelines

Labs

  • Instrument Spark jobs with Prometheus

  • Grafana dashboard for Kafka lag

Scenarios

  • Detect slow ETL jobs before SLA breach

Tasks

  • Build alerting system for data quality errors

Challenges

  • Avoid false-positive alerts, scale Prometheus for large clusters

Module D4: Container Orchestration at Scale

Theory

  • Kubernetes advanced (Operators, StatefulSets, CRDs)

  • Helm charts for reproducible deployments

Labs

  • Deploy Airflow on Kubernetes using Helm

  • Build multi-tenant ETL platform

Scenarios

  • Auto-scale ML inference API based on request volume

Tasks

  • Write Helm chart for DBT transformation jobs

Challenges

  • Debug pod failures in Spark-on-K8s deployments

DevSecOps for Data Engineering (50 Hours)

Module S1: Security in CI/CD

Theory

  • Shift-left security concepts

  • Static code analysis (Bandit, SonarQube)

  • Dependency scanning (Snyk, Trivy)

Labs

  • Add security scan stage in GitHub Actions

  • Scan Python ETL containers with Trivy

Scenarios

  • Prevent vulnerable Spark images from going live

Tasks

  • Integrate Snyk into Airflow DAG build pipeline

Challenges

  • Handling false positives in scans

Module S2: Secrets & Identity Management

Theory

  • HashiCorp Vault, AWS Secrets Manager

  • IAM policies, principle of least privilege

Labs

  • Rotate database credentials automatically

  • Secure Airflow connections with Vault

Scenarios

  • Protect API keys used by ML pipelines

Tasks

  • Create least-privilege IAM role for S3 ingestion pipeline

Challenges

  • Avoid hardcoding secrets in code

Module S3: Compliance & Data Security

Theory

  • GDPR, HIPAA, PCI-DSS in data engineering

  • Data masking, tokenization, encryption at rest/in-transit

Labs

  • Build data masking pipeline in Spark

  • Enable TLS for Kafka cluster

Scenarios

  • Encrypt PII data before storing in Snowflake

Tasks

  • Add audit logging to Spark job

Challenges

  • Balancing security vs performance in streaming
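
The difference between masking and tokenization can be shown on a single field before moving to Spark. The sketch below is plain Python with hypothetical helper names. Note the substitution: `tokenize` uses a keyed hash (HMAC-SHA256) as a simple stand-in, whereas production tokenization often relies on a token vault that maps tokens back to the original values.

```python
import hashlib
import hmac


def mask_account_number(account_number, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    hidden = max(len(account_number) - visible, 0)
    return "*" * hidden + account_number[hidden:]


def tokenize(value, key):
    """Return a deterministic keyed token (HMAC-SHA256 hex digest) for a value.

    The same value and key always give the same token, so joins on the
    tokenized column still work, but the original cannot be read from it.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

Masking keeps part of the value readable for display, while the token is suitable as a join key across datasets. The key should come from a secrets manager rather than from source code.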

Module S4: Runtime Security & Zero Trust

Theory

  • Kubernetes security (Pod Security Admission, which replaced the PodSecurityPolicies removed in Kubernetes 1.25; NetworkPolicies)

  • Runtime protection (Falco, Aqua Security)

  • Zero Trust Architecture

Labs

  • Apply Pod Security Standards to block privileged containers (PodSecurityPolicy on clusters older than Kubernetes 1.25)

  • Monitor runtime anomalies with Falco

Scenarios

  • Detect unauthorized ETL pod spawning in cluster

Tasks

  • Configure zero-trust authentication for APIs

Challenges

  • Minimize false alarms in runtime monitoring
