DevOps Course Curriculum
Module 8: DevOps, Containerization & APIs (Docker, Kubernetes, FastAPI, Flask)
Duration: 50 Hours
Topic 8.1: REST APIs with Flask and FastAPI
Theory:
REST concepts: endpoints, methods (GET, POST, PUT, DELETE)
JSON serialization
Error handling
API documentation with Swagger/OpenAPI
Authentication basics (JWT, API key)
Lab:
Build an API to return account balance by customer ID
Create a REST API to serve fraud prediction model
Scenarios:
Provide internal dashboard access via secured API
Serve credit score predictions via API for instant loans
Tasks:
Create GET endpoint to fetch transaction history
Add JWT token-based authentication
Challenges:
Handling large payloads
Token expiration and renewal issues
Ensuring secure endpoints (HTTPS, rate-limiting)
Topic 8.2: Docker – Containerization
Theory:
Images vs Containers
Dockerfile: build custom images
Volumes and networking
docker-compose for multi-container apps
Lab:
Containerize a Flask ETL API
Mount volumes for log and config persistence
Scenarios:
Package and run ETL pipelines on any developer system
Build isolated test environments
Tasks:
Write Dockerfile for model-serving app
Use docker-compose to run app + MongoDB
Challenges:
Managing image size and build times
Container not exposing ports properly
Volume misconfiguration leading to data loss
Topic 8.3: Kubernetes – Orchestration
Theory:
Kubernetes components: Pod, Deployment, Service
Helm charts (optional)
Secrets and ConfigMaps
Horizontal Pod Autoscaling (HPA)
Lab:
Deploy containerized fraud detection API on Kubernetes
Use ConfigMap to manage runtime configs
Scenarios:
Auto-scale transaction processing microservices
Isolate environments for dev/test/stage
Tasks:
Create YAML manifests to deploy a service
Expose FastAPI endpoint using Kubernetes service
Challenges:
Debugging failed pods due to resource limits
Network misrouting in service-to-service calls
Secrets not mounted properly into containers
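The Horizontal Pod Autoscaling item above rests on a simple ratio documented for the HPA controller: desired replicas = ceil(current replicas × current metric / target metric). The sketch below reproduces only that arithmetic. The function name and the min/max defaults are assumptions, and the real controller also applies a tolerance band and stabilization behavior that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Core HPA ratio, clamped to the configured replica bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against a 60% target scale out to 6 pods.
print(desired_replicas(4, 90, 60))
```

The same ratio scales in when load drops: 3 pods at 20% against a 60% target reduce to 1.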
Module 9: CI/CD and Monitoring (Airflow, Prometheus, Grafana, Unit Testing)
Duration: 40 Hours
Topic 9.1: Workflow Orchestration with Apache Airflow
Theory:
DAGs: Directed Acyclic Graphs for task dependency
Operators: PythonOperator, BashOperator, EmptyOperator (formerly DummyOperator)
Scheduling with cron expressions
XComs, retries, SLA monitoring
Best practices for DAG structuring
Lab:
Create a DAG to run a daily ETL on banking transaction logs
Add retries and failure alerts via email
Scenarios:
Orchestrate pipeline: ingest → clean → analyze → report
Automate dbt model updates and ML model refresh
Tasks:
Build a 3-step DAG to load data, run SQL, and generate reports
Add an on_failure_callback to notify Slack or email
Challenges:
DAG import errors due to broken dependencies
Scheduler hangs or skipped task retries
Managing large DAGs with dynamic branching
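The ideas behind this topic (task dependencies as a DAG, retries, and a failure callback) can be sketched in plain Python before touching Airflow itself. This is not the Airflow API. It uses the standard-library `graphlib` module, and `PIPELINE`, `execution_order`, `is_valid_dag`, and `run_with_retries` are hypothetical names chosen for the example.

```python
from graphlib import CycleError, TopologicalSorter

# Each key lists the tasks that must finish before it can start
# (the ingest -> clean -> analyze -> report scenario).
PIPELINE = {"clean": {"ingest"}, "analyze": {"clean"}, "report": {"analyze"}}


def execution_order(dependencies):
    """Return task names in an order that respects every upstream dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def is_valid_dag(dependencies):
    """A dependency graph with a cycle is not a DAG and cannot be scheduled."""
    try:
        execution_order(dependencies)
    except CycleError:
        return False
    return True


def run_with_retries(task, retries, on_failure=None):
    """Call task(); retry up to `retries` extra times, then report the final error."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == retries:
                if on_failure is not None:
                    on_failure(exc)  # e.g. post to Slack or send an email
                raise


print(execution_order(PIPELINE))
```

In Airflow the scheduler performs the ordering and retry bookkeeping; the sketch only shows why a cyclic dependency fails up front and when a failure callback would fire.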
Topic 9.2: Monitoring with Prometheus & Grafana
Theory:
Time-series monitoring with Prometheus
Metric types: counters, gauges, histograms
PromQL queries
Dashboards and alerts with Grafana
Integrating with Airflow and FastAPI
Lab:
Monitor CPU and memory usage of ETL containers
Create Grafana dashboard to track DAG runtimes
Scenarios:
Alert when a job exceeds 5-minute SLA
Visualize task success/failure trends over time
Tasks:
Configure Prometheus scrape jobs
Build a Grafana panel showing DAG task durations
Challenges:
Dashboard not updating due to time range mismatch
Misconfigured alert thresholds triggering noise
Prometheus storage bloat due to high cardinality metrics
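Of the metric types listed above, histograms are the least intuitive because Prometheus exposes them as cumulative buckets: each bucket counts every observation less than or equal to its upper bound, with a final +Inf bucket. The sketch below mimics that bookkeeping in plain Python so the 5-minute SLA scenario can be read off the buckets. The `Histogram` class and the bucket bounds are assumptions for illustration, not the Prometheus client library.

```python
import bisect
import math


class Histogram:
    """Toy histogram with Prometheus-style cumulative 'le' buckets."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0   # plays the role of the _sum series
        self.count = 0     # plays the role of the _count series

    def observe(self, value):
        # bisect_left finds the first bound that is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.total += value
        self.count += 1

    def cumulative(self):
        """Map each upper bound to the number of observations <= that bound."""
        running, result = 0, {}
        for bound, bucket_count in zip(self.bounds, self.counts):
            running += bucket_count
            result[bound] = running
        return result


# Hypothetical DAG run durations in seconds, with a bucket edge at the 300 s SLA.
durations = Histogram([60, 300, 600])
for seconds in (45, 60, 130, 310, 290):
    durations.observe(seconds)

buckets = durations.cumulative()
over_sla = durations.count - buckets[300]
print(buckets, over_sla)
```

Placing a bucket boundary exactly at the SLA threshold is what makes the "runs over SLA" count exact rather than interpolated.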
Topic 9.3: Testing and CI/CD (GitHub Actions / GitLab)
Theory:
Importance of unit and integration testing
Writing tests with pytest
Mocking external services
Setting up CI pipelines for test + build
Linting and static analysis tools
Lab:
Write unit tests for a Python ETL function
Configure GitHub Actions to run tests on each commit
Scenarios:
Prevent code merge if unit tests fail
Automate code formatting checks before deployment
Tasks:
Create .github/workflows/test.yml to trigger pytest
Add test coverage reports to PR checks
Challenges:
CI build failing due to dependency conflicts
Mock errors in tests for API services
Managing secrets securely in pipeline configs
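As a sketch of the pytest and mocking items above, the example below tests a small ETL function without calling a real service. The function `load_high_value_transactions` and the client method `fetch_transactions` are hypothetical; the mock comes from the standard-library `unittest.mock`, and pytest would collect the `test_` function automatically.

```python
from unittest.mock import Mock


def load_high_value_transactions(client, min_amount):
    """Fetch rows from an external service and keep those at or above min_amount."""
    cleaned = []
    for row in client.fetch_transactions():
        amount = round(float(row["amount"]), 2)
        if amount >= min_amount:
            cleaned.append({"id": row["id"], "amount": amount})
    return cleaned


def test_load_high_value_transactions():
    # The mock stands in for the real API client, so the test needs no network.
    client = Mock()
    client.fetch_transactions.return_value = [
        {"id": "t1", "amount": "1500.456"},
        {"id": "t2", "amount": "20"},
        {"id": "t3", "amount": 1000},
    ]

    result = load_high_value_transactions(client, min_amount=1000)

    assert result == [{"id": "t1", "amount": 1500.46}, {"id": "t3", "amount": 1000.0}]
    client.fetch_transactions.assert_called_once_with()
```

Passing the client in as an argument is the design choice that makes mocking easy; a function that constructs its own client internally would need patching instead.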
DevOps for Data Engineering (60 Hours)
Module D1: Infrastructure as Code & Automation
Theory
IaC concepts (declarative vs imperative)
Terraform fundamentals (providers, resources, modules, workspaces)
Ansible for configuration management
Labs
Provision AWS S3, RDS, EMR with Terraform
Automate EC2 setup for Spark cluster with Ansible
Scenarios
Version-controlled infrastructure for repeatable data lake setup
Tasks
Write a Terraform module for Kafka on AWS
Challenges
Handle drift detection, multi-region deployments
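The declarative model and drift detection mentioned in this module come down to comparing a desired state with the observed state and deriving the changes needed. The sketch below shows that comparison on plain dictionaries. It is not how Terraform is implemented, and the resource names and attributes are invented for the example.

```python
def plan(desired, actual):
    """Compare desired and actual resources; report what must change."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(
            name for name in desired.keys() & actual.keys()
            if desired[name] != actual[name]
        ),
        "delete": sorted(actual.keys() - desired.keys()),
    }


# Hypothetical desired state, as it might be declared in version control.
desired_state = {
    "s3:raw-bucket": {"versioning": True},
    "rds:warehouse": {"instance_class": "db.m5.large"},
}
# Hypothetical observed state: someone resized the database by hand
# and left behind a manually created bucket.
actual_state = {
    "rds:warehouse": {"instance_class": "db.m5.xlarge"},
    "s3:scratch-bucket": {"versioning": False},
}

print(plan(desired_state, actual_state))
```

An empty plan means no drift. An imperative script, by contrast, would rerun its steps without knowing whether the current state already matches.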
Module D2: CI/CD Pipelines for Data
Theory
Jenkins vs GitHub Actions vs GitLab CI
Canary deployments, blue-green deployments
Labs
Build Jenkins pipeline to deploy dbt models
Automate Spark job packaging with Docker + CI
Scenarios
Auto-deploy ML model retraining when new data lands in S3
Tasks
Build CI/CD pipeline for Kafka consumers
Challenges
Secrets management, rollback automation
Module D3: Observability & Monitoring
Theory
Logs, metrics, tracing (ELK, Prometheus, Grafana, OpenTelemetry)
SLA/SLO/SLI for data pipelines
Labs
Instrument Spark jobs with Prometheus
Grafana dashboard for Kafka lag
Scenarios
Detect slow ETL jobs before SLA breach
Tasks
Build alerting system for data quality errors
Challenges
Avoid false-positive alerts, scale Prometheus for large clusters
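The SLI and SLO terms above can be tied together with a short worked calculation: the SLI is the measured fraction of pipeline runs that finished on time, the SLO is the target for that fraction, and the error budget is the share of allowed failures not yet used up. The function names, the 300-second threshold, and the 90% target below are assumptions for illustration.

```python
def sli_on_time(durations_s, threshold_s):
    """SLI: fraction of runs that finished within the threshold."""
    if not durations_s:
        raise ValueError("no runs to evaluate")
    on_time = sum(1 for duration in durations_s if duration <= threshold_s)
    return on_time / len(durations_s)


def error_budget_remaining(sli, slo):
    """Share of the allowed failure rate still unused; negative means the SLO is breached."""
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failure = 1 - slo
    observed_failure = 1 - sli
    return 1 - observed_failure / allowed_failure


# Hypothetical window: 19 runs at 100 s and one slow run at 400 s.
runs = [100] * 19 + [400]
sli = sli_on_time(runs, threshold_s=300)        # 19 of 20 runs on time
budget = error_budget_remaining(sli, slo=0.90)  # about half the budget is left
print(sli, budget)
```

Alerting on the remaining budget rather than on each slow run is one way to warn before an SLA breach while limiting false-positive alerts.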
Module D4: Container Orchestration at Scale
Theory
Kubernetes advanced (Operators, StatefulSets, CRDs)
Helm charts for reproducible deployments
Labs
Deploy Airflow on Kubernetes using Helm
Build multi-tenant ETL platform
Scenarios
Auto-scale ML inference API based on request volume
Tasks
Write Helm chart for dbt transformation jobs
Challenges
Debug pod failures in Spark-on-K8s deployments
DevSecOps for Data Engineering (50 Hours)
Module S1: Security in CI/CD
Theory
Shift-left security concepts
Static code analysis (Bandit, SonarQube)
Dependency scanning (Snyk, Trivy)
Labs
Add security scan stage in GitHub Actions
Scan Python ETL containers with Trivy
Scenarios
Prevent vulnerable Spark images from going live
Tasks
Integrate Snyk into Airflow DAG build pipeline
Challenges
Handling false positives in scans
Module S2: Secrets & Identity Management
Theory
HashiCorp Vault, AWS Secrets Manager
IAM policies, principle of least privilege
Labs
Rotate database credentials automatically
Secure Airflow connections with Vault
Scenarios
Protect API keys used by ML pipelines
Tasks
Create least-privilege IAM role for S3 ingestion pipeline
Challenges
Avoid hardcoding secrets in code
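A minimal sketch of the "avoid hardcoding secrets" challenge: the code asks for a secret by name at runtime and fails loudly if it is missing, instead of embedding the value in source. Environment variables stand in here for whatever Vault or AWS Secrets Manager would inject; `get_secret`, `MissingSecretError`, and the variable name `WAREHOUSE_DB_PASSWORD` are hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret was not provided to the process."""


def get_secret(name, environ=None):
    """Look up a secret by name; never fall back to a default baked into code."""
    source = os.environ if environ is None else environ
    value = source.get(name)
    if not value:
        # The message names the secret but never echoes a value.
        raise MissingSecretError(f"secret {name!r} is not set")
    return value


# Usage with an explicit mapping, as a test or local run might do.
fake_env = {"WAREHOUSE_DB_PASSWORD": "example-only"}
password = get_secret("WAREHOUSE_DB_PASSWORD", environ=fake_env)
```

Failing at startup when a secret is absent is usually easier to debug than a connection error deep inside a pipeline run.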
Module S3: Compliance & Data Security
Theory
GDPR, HIPAA, PCI-DSS in data engineering
Data masking, tokenization, encryption at rest/in-transit
Labs
Build data masking pipeline in Spark
Enable TLS for Kafka cluster
Scenarios
Encrypt PII data before storing in Snowflake
Tasks
Add audit logging to Spark job
Challenges
Balancing security vs performance in streaming
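Data masking and tokenization, listed in the theory above, differ in what they preserve. Masking hides most of a value for display, while tokenization replaces it with a stable surrogate so records can still be joined. The sketch below shows one simple variant of each in plain Python; the key, function names, and sample account number are assumptions, and the keyed hash used here is one-way, unlike vault-based tokenization that can be reversed by an authorized service.

```python
import hashlib
import hmac

# Hypothetical tokenization key; in practice it would come from a secret manager.
TOKEN_KEY = b"demo-tokenization-key"


def mask_account_number(account_number, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    if len(account_number) <= visible:
        return "*" * len(account_number)
    hidden = len(account_number) - visible
    return "*" * hidden + account_number[-visible:]


def tokenize(value, key=TOKEN_KEY):
    """Deterministic keyed hash: equal inputs give equal tokens, so joins still work."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


account = "1234567890123456"  # made-up sample value
print(mask_account_number(account))
print(tokenize(account))
```

In a Spark lab, functions like these could be wrapped as column transformations; the per-row hashing cost is part of the security versus performance trade-off noted in the challenges.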
Module S4: Runtime Security & Zero Trust
Theory
Kubernetes security (Pod Security Admission, the replacement for PodSecurityPolicy, which was removed in Kubernetes 1.25; NetworkPolicies)
Runtime protection (Falco, Aqua Security)
Zero Trust Architecture
Labs
Apply Pod Security Standards (enforced through Pod Security Admission) to block privileged containers
Monitor runtime anomalies with Falco
Scenarios
Detect unauthorized ETL pod spawning in cluster
Tasks
Configure zero-trust authentication for APIs
Challenges
Minimize false alarms in runtime monitoring