$ shohanuzzaman
$ cat nifty-lakehouse.md

Nifty Lakehouse

Banking Data Lakehouse on Kubernetes with CDC-driven Medallion Architecture.

ClickHouse · dbt · Airbyte · Airflow · MinIO · MindsDB · Superset · Kubernetes · Oracle XE · MySQL

Deployed a Banking Data Lakehouse proof of concept (POC) on a self-managed Kubernetes cluster running on AWS EC2, implementing the Medallion Architecture (Bronze → Silver → Gold). Two CDC sources, Oracle XE via LogMiner and MySQL via GTID-based binlog replication, land Parquet files in MinIO. ClickHouse queries the Bronze layer directly through S3 external tables, dbt handles the Silver and Gold transformations, and Apache Airflow orchestrates the pipeline end to end.
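For CDC data, the Bronze → Silver step boils down to collapsing an append-only change log into the latest state per primary key. The sketch below shows that idea in plain Python. The `ChangeEvent` shape, the `seq` ordering key (standing in for a per-source monotonic position such as an Oracle SCN or a MySQL binlog offset) and the `op` values are hypothetical, not the project's actual Bronze schema or dbt code. In ClickHouse the same logic would more likely live in SQL, for example in a dbt model or a ReplacingMergeTree table.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ChangeEvent:
    """One Bronze CDC record (hypothetical shape, for illustration only)."""
    pk: str               # primary key of the source row
    seq: int              # assumed monotonic ordering key within one source
    op: str               # "insert", "update" or "delete"
    row: dict[str, Any]   # full after-image of the row; ignored for deletes


def build_silver(events: list[ChangeEvent]) -> dict[str, dict[str, Any]]:
    """Collapse an append-only change log into current state, one row per key."""
    latest: dict[str, ChangeEvent] = {}
    for ev in events:
        # Keep only the newest event per key; Parquet files may be read out of order.
        cur = latest.get(ev.pk)
        if cur is None or ev.seq > cur.seq:
            latest[ev.pk] = ev
    # Keys whose newest event is a delete are dropped from the Silver snapshot.
    return {pk: ev.row for pk, ev in latest.items() if ev.op != "delete"}


if __name__ == "__main__":
    events = [
        ChangeEvent("A1", 10, "insert", {"account_id": "A1", "balance": 100}),
        ChangeEvent("A2", 11, "insert", {"account_id": "A2", "balance": 50}),
        ChangeEvent("A1", 13, "update", {"account_id": "A1", "balance": 80}),
        ChangeEvent("A2", 12, "delete", {}),
    ]
    print(build_silver(events))  # {'A1': {'account_id': 'A1', 'balance': 80}}
```

Ordering by `seq` rather than by arrival order is the key design choice: it keeps the Silver snapshot correct even when Bronze files are ingested late or replayed.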

Highlights

  • AI-assisted dbt model generation produces transformation configuration directly from source schemas.
  • MindsDB, backed by OpenAI gpt-4o-mini, answers natural-language questions over the Gold tables by translating them into SQL.
  • Apache Superset dashboards for business metrics and anomaly reporting.
  • Longhorn-backed persistent volumes across all stateful workloads.
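The model-generation idea in the first highlight can be illustrated with a deterministic template: given a source table's columns and target types, emit a dbt model that casts and renames them. This is a minimal sketch, assuming a hypothetical `render_silver_model` helper and a simple `{column: ClickHouse type}` input; it does not call an LLM and is not the project's actual generator. Only `config(materialized=...)` and `source(...)` are standard dbt Jinja functions.

```python
def render_silver_model(source_name: str, table: str,
                        columns: dict[str, str], pk: str) -> str:
    """Render a dbt model selecting typed, lower-cased columns from a Bronze source.

    `columns` maps source column name -> ClickHouse target type (hypothetical input).
    """
    if pk not in columns:
        raise ValueError(f"primary key {pk!r} not in columns")
    # One cast per column; Oracle-style upper-case names become lower-case aliases.
    select_lines = [
        f"    cast({name} as {ctype}) as {name.lower()}"
        for name, ctype in columns.items()
    ]
    return (
        "{{ config(materialized='table') }}\n\n"
        "select\n"
        + ",\n".join(select_lines)
        # Doubled braces in the f-string render as literal {{ }} for dbt's Jinja.
        + f"\nfrom {{{{ source('{source_name}', '{table}') }}}}\n"
        + f"where {pk} is not null\n"
    )


if __name__ == "__main__":
    print(render_silver_model(
        "bronze", "ACCOUNTS",
        {"ACCOUNT_ID": "String", "BALANCE": "Decimal(18, 2)"},
        pk="ACCOUNT_ID",
    ))
```

In an AI-assisted flow, the model would typically propose the `columns` mapping (types, renames, tests) while a template like this keeps the emitted SQL consistent.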