$ cat nifty-lakehouse.md
Nifty Lakehouse
Banking Data Lakehouse on Kubernetes with CDC-driven Medallion Architecture.
ClickHouse · dbt · Airbyte · Airflow · MinIO · MindsDB · Superset · Kubernetes · Oracle XE · MySQL
Deployed a Banking Data Lakehouse POC on a self-managed Kubernetes cluster (AWS EC2) implementing Medallion Architecture (Bronze → Silver → Gold) with dual CDC sources (Oracle XE via LogMiner and MySQL via GTID-based binlog) landing Parquet files in MinIO. ClickHouse queries Bronze directly through S3 external tables, dbt handles the Silver and Gold transformations, and Apache Airflow orchestrates the pipeline end to end.
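As a rough illustration of the Bronze layer described above, the following sketch builds the DDL for a ClickHouse table backed by the `S3` table engine, pointed at Parquet files in MinIO. It is not taken from the project: the helper name `bronze_table_ddl`, the endpoint `http://minio:9000`, the bucket `bronze`, the prefix, the column names, and the placeholder credentials are all assumptions made for the example.

```python
def bronze_table_ddl(name, columns, endpoint, bucket, prefix, access_key, secret_key):
    """Return CREATE TABLE DDL for a ClickHouse table using the S3 engine.

    columns maps column name -> ClickHouse type (hypothetical schema).
    """
    # One indented "name Type" line per column, in insertion order.
    cols = ",\n".join(f"    {col} {ctype}" for col, ctype in columns.items())
    # Glob over every Parquet file landed under the table's prefix.
    path = f"{endpoint.rstrip('/')}/{bucket}/{prefix.strip('/')}/*.parquet"
    return (
        f"CREATE TABLE IF NOT EXISTS {name}\n(\n{cols}\n)\n"
        # Credentials are inlined only to keep the sketch short.
        f"ENGINE = S3('{path}', '{access_key}', '{secret_key}', 'Parquet')"
    )


# Hypothetical Bronze table over CDC output from the MySQL source.
ddl = bronze_table_ddl(
    name="bronze_accounts",
    columns={"account_id": "UInt64", "balance": "Decimal(18, 2)"},
    endpoint="http://minio:9000",
    bucket="bronze",
    prefix="mysql/accounts",
    access_key="ACCESS_KEY",
    secret_key="SECRET_KEY",
)
print(ddl)
```

Because the table only points at the object store, new Parquet files that match the glob become visible to queries without a separate load step, which is presumably why Bronze can be queried "directly" here.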
Highlights
- AI-assisted dbt model generation automates transformation config from source schemas.
- MindsDB (OpenAI gpt-4o-mini) serves natural-language SQL queries over Gold tables.
- Apache Superset dashboards for business metrics and anomaly reporting.
- Longhorn-backed persistent volumes across all stateful workloads.
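
The first highlight mentions generating dbt transformation config from source schemas. The project's actual generator and prompts are not described here, so the sketch below swaps in a plain deterministic template instead of an LLM, purely to show the kind of artifact such a step might emit. The function name `render_silver_model`, the source name `bronze`, and the column schema are hypothetical.

```python
def render_silver_model(source_name, table, columns):
    """Render a dbt model that casts Bronze columns to typed Silver columns.

    columns maps source column name -> target ClickHouse type (assumed schema).
    This is a template stand-in, not the AI-assisted generator itself.
    """
    # Cast each source column and normalise its name to lower case.
    select_list = ",\n".join(
        f"    CAST({col} AS {ctype}) AS {col.lower()}"
        for col, ctype in columns.items()
    )
    return (
        # Plain (non-f) strings keep the Jinja braces literal.
        "{{ config(materialized='table') }}\n\n"
        "SELECT\n"
        f"{select_list}\n"
        "FROM {{ source('" + source_name + "', '" + table + "') }}\n"
    )


# Hypothetical schema as it might arrive from the Oracle XE source.
model_sql = render_silver_model(
    "bronze",
    "accounts",
    {"ACCOUNT_ID": "UInt64", "BALANCE": "Decimal(18, 2)"},
)
print(model_sql)
```

In an AI-assisted setup, the same schema mapping would more likely be passed to a model as context, with the generated SQL reviewed before it is committed to the dbt project.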