Projects

Things I'm building or plan to build. Code goes on GitHub as it ships.

Data Engineering

Streaming cost simulator (Coming soon)

Model the cost of Kafka/Kinesis pipelines before you deploy. Partitions, retention, consumer groups — priced out.

Python, Streamlit
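
A minimal sketch of the kind of arithmetic such a simulator could run, not the project's implementation. It assumes a Kafka-style topic at steady state (the log is full up to its retention window). `TopicSpec`, its fields, and `price_per_gb_month` are hypothetical names; the price is an input you supply, not a real vendor rate.

```python
from dataclasses import dataclass


@dataclass
class TopicSpec:
    """Hypothetical description of one topic's workload."""
    ingress_mb_per_sec: float   # average producer throughput
    retention_hours: float      # how long data is kept
    replication_factor: int     # copies of each partition
    consumer_groups: int        # independent readers of the full stream


def retained_storage_gb(spec: TopicSpec) -> float:
    """Steady-state storage: throughput x retention window x replicas."""
    seconds = spec.retention_hours * 3600
    return spec.ingress_mb_per_sec * seconds * spec.replication_factor / 1024


def egress_mb_per_sec(spec: TopicSpec) -> float:
    """Each consumer group reads the whole stream once."""
    return spec.ingress_mb_per_sec * spec.consumer_groups


def monthly_storage_cost(spec: TopicSpec, price_per_gb_month: float) -> float:
    """Storage cost for a caller-supplied (assumed) price per GB-month."""
    return retained_storage_gb(spec) * price_per_gb_month


spec = TopicSpec(ingress_mb_per_sec=1.0, retention_hours=24,
                 replication_factor=3, consumer_groups=2)
print(retained_storage_gb(spec), egress_mb_per_sec(spec))
```

Keeping the price as a parameter means the same model can be re-run against whatever rate card applies.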

Schema evolution tester (Coming soon)

Validate Avro/Protobuf schema changes against a registry and surface breaking changes before they hit production.

Python, Confluent Schema Registry
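
To make "breaking change" concrete, here is a simplified sketch of one backward-compatibility rule for Avro record schemas: a field added in the new schema needs a default, otherwise data written with the old schema can't be read. This is an illustration only. It does not call a schema registry, the function name is hypothetical, and it flags every type change without checking Avro's allowed type promotions.

```python
def backward_breaking_changes(old_schema: dict, new_schema: dict) -> list[str]:
    """Return human-readable problems when the new schema reads old data."""
    old_fields = {f["name"]: f for f in old_schema["fields"]}
    problems = []
    for field in new_schema["fields"]:
        name = field["name"]
        if name not in old_fields:
            # A new field is only safe if the reader can fall back to a default.
            if "default" not in field:
                problems.append(f"added field '{name}' has no default")
        elif field["type"] != old_fields[name]["type"]:
            # Conservative: promotions such as int -> long are not modelled here.
            problems.append(f"field '{name}' changed type")
    return problems


old = {"type": "record", "name": "User",
       "fields": [{"name": "id", "type": "long"}]}
new = {"type": "record", "name": "User",
       "fields": [{"name": "id", "type": "long"},
                  {"name": "email", "type": "string"}]}

print(backward_breaking_changes(old, new))
```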

Pipeline completeness tracker (Coming soon)

Track partition-level freshness and completeness across S3-based data lakes. Alert before downstream jobs read stale data.

Python, AWS CDK, DynamoDB
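
A rough sketch of the core check, assuming daily `dt=YYYY-MM-DD` partitions and an `observed` mapping of partition name to last-modified time that something else has already collected (for example from an S3 listing). The partition layout and all function names are assumptions for illustration, not the project's design.

```python
from datetime import date, datetime, timedelta, timezone


def expected_partitions(start: date, end: date) -> list[str]:
    """Daily partition names from start to end, inclusive."""
    days = (end - start).days
    return [f"dt={(start + timedelta(days=i)).isoformat()}" for i in range(days + 1)]


def completeness_report(expected: list[str],
                        observed: dict[str, datetime],
                        now: datetime) -> dict:
    """Compare expected partitions with what actually landed."""
    missing = [p for p in expected if p not in observed]
    newest = max(observed.values()) if observed else None
    return {
        "missing": missing,
        "complete": not missing,
        # Time since the most recent partition write; None if nothing landed.
        "freshness_lag": now - newest if newest else None,
    }


now = datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)
observed = {
    "dt=2024-01-01": datetime(2024, 1, 2, 1, 0, tzinfo=timezone.utc),
    "dt=2024-01-03": datetime(2024, 1, 3, 10, 0, tzinfo=timezone.utc),
}
report = completeness_report(
    expected_partitions(date(2024, 1, 1), date(2024, 1, 3)), observed, now)
print(report)
```

A downstream job could refuse to start while `complete` is false or `freshness_lag` exceeds a threshold of your choosing.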

RAG pipeline with observability (In progress)

End-to-end retrieval-augmented generation — chunking, embedding, vector search — with latency and quality metrics baked in.

Python, LangChain, Pinecone, OpenTelemetry
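
For readers new to the chunking step, a minimal sketch of fixed-size character chunking with overlap. This is a generic illustration in plain Python, not LangChain's splitter and not necessarily what the project uses; `chunk_text` and its defaults are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of embedding some text twice.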

Feature store from scratch (Coming soon)

Offline + online feature serving with point-in-time correctness. Training-serving parity without the vendor lock-in.

Python, Redis, Parquet, FastAPI
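
Point-in-time correctness means a training row only sees feature values that existed at that row's timestamp. A minimal standard-library sketch of that lookup, assuming each feature's history is a list of `(timestamp, value)` pairs sorted by timestamp; the function name and data shape are hypothetical, and a real store would do this as a join over Parquet data.

```python
from bisect import bisect_right


def point_in_time_lookup(history: list[tuple[int, float]], as_of: int):
    """Latest value recorded at or before as_of, or None if none exists."""
    timestamps = [ts for ts, _ in history]
    # bisect_right gives the count of entries with timestamp <= as_of.
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None


history = [(1, 10.0), (5, 20.0), (9, 30.0)]  # (event time, feature value)
print(point_in_time_lookup(history, as_of=6))
```

Whether a value stamped exactly at `as_of` counts as visible is a design choice; this sketch includes it.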

ML evaluation framework (Coming soon)

Automated model evaluation pipeline — versioned test sets, metric tracking, regression gates before deployment.

Python, MLflow, Pandas
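
A regression gate can be as small as the sketch below: compare a candidate model's metrics with a baseline and block deployment if any metric drops by more than a tolerance. It assumes every metric is "higher is better"; the function name, the tolerance default, and the metric values are made up for illustration.

```python
def regression_gate(baseline: dict[str, float],
                    candidate: dict[str, float],
                    tolerance: float = 0.01) -> dict[str, tuple]:
    """Return metrics that regressed, as {name: (baseline, candidate)}."""
    failures = {}
    for name, base_value in baseline.items():
        cand_value = candidate.get(name)
        # A missing metric is treated as a failure rather than silently passed.
        if cand_value is None or cand_value < base_value - tolerance:
            failures[name] = (base_value, cand_value)
    return failures


baseline = {"accuracy": 0.90, "f1": 0.80}    # illustrative numbers only
candidate = {"accuracy": 0.905, "f1": 0.75}
failures = regression_gate(baseline, candidate)
print("blocked" if failures else "ok", failures)
```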