Projects
Things I'm building or plan to build. Code goes on GitHub as it ships.
Data Engineering
- Streaming cost simulator (Coming soon)
  Model the cost of Kafka/Kinesis pipelines before you deploy, with partitions, retention, and consumer groups priced out.
  Python · Streamlit
- Schema evolution tester (Coming soon)
  Validate Avro/Protobuf schema changes against a registry and surface breaking changes before they hit production.
  Python · Confluent Schema Registry
- Pipeline completeness tracker (Coming soon)
  Track partition-level freshness and completeness across S3-based data lakes, and alert before downstream jobs read stale data.
  Python · AWS CDK · DynamoDB
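As a rough illustration of what "partition-level freshness and completeness" means, here is a minimal sketch of the core check. It is not the tracker's code. It assumes hourly partitions identified by their UTC start time and an in-memory set of the partitions that exist. In practice that set would come from something like an S3 listing or a DynamoDB table, and the function name and parameters here are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def check_partitions(present, start, end, now, max_lag):
    """Hypothetical completeness + freshness check for hourly partitions.

    present:   set of partition start times (UTC, truncated to the hour)
    start/end: expected window of partitions, half-open [start, end)
    now:       current time
    max_lag:   how long after the newest partition's hour closes it may stay newest
    Returns (missing_partitions, is_fresh).
    """
    # Completeness: every expected hourly partition in the window must exist.
    missing = []
    t = start
    while t < end:
        if t not in present:
            missing.append(t)
        t += timedelta(hours=1)

    # Freshness: the newest partition must cover data up to within max_lag of now.
    newest = max(present, default=None)
    is_fresh = newest is not None and now - (newest + timedelta(hours=1)) <= max_lag
    return missing, is_fresh


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    present = {start + timedelta(hours=h) for h in (0, 1, 2, 4)}  # hours 3 and 5 absent
    missing, is_fresh = check_partitions(
        present, start, start + timedelta(hours=6),
        now=start + timedelta(hours=6, minutes=30), max_lag=timedelta(hours=1),
    )
    # A downstream job would be held (or an alert raised) if either check fails.
    print([m.hour for m in missing], is_fresh)  # [3, 5] False
```

Completeness and freshness are reported separately in this sketch because they fail differently. A gap in the middle of the window (hour 3) can exist even when recent data is arriving, and a pipeline can be gap-free yet still lag behind.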
AI/ML
- RAG pipeline with observability (In progress)
  End-to-end retrieval-augmented generation (chunking, embedding, vector search) with latency and quality metrics built in.
  Python · LangChain · Pinecone · OpenTelemetry
- Feature store from scratch (Coming soon)
  Offline and online feature serving with point-in-time correctness, giving training-serving parity without vendor lock-in.
  Python · Redis · Parquet · FastAPI
- ML evaluation framework (Coming soon)
  Automated model evaluation pipeline with versioned test sets, metric tracking, and regression gates before deployment.
  Python · MLflow · pandas
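Point-in-time correctness means each training row may only see feature values that were already known at that row's timestamp. Otherwise, future data leaks into training and the model looks better offline than it will in serving. The following is a minimal pure-Python sketch of that lookup, not the feature store's actual design. It assumes features are kept as per-entity lists of `(timestamp, value)` sorted by time, and that a value stamped exactly at the event time counts as already visible. The names `feature_log`, `point_in_time_lookup`, and `build_training_rows` are hypothetical. With pandas, the same backward-looking join is usually done with `pandas.merge_asof(..., by=..., direction="backward")`.

```python
from bisect import bisect_right


def point_in_time_lookup(feature_log, entity_id, as_of):
    """Latest feature value for entity_id whose timestamp is <= as_of, else None.

    feature_log: hypothetical dict of entity_id -> [(timestamp, value), ...],
                 each list sorted by timestamp.
    """
    history = feature_log.get(entity_id, [])
    timestamps = [ts for ts, _ in history]
    # bisect_right returns the index of the first entry strictly later than as_of,
    # so the entry just before it is the newest value already known at as_of.
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None


def build_training_rows(labels, feature_log):
    """Attach point-in-time features to (entity_id, event_time, label) rows."""
    return [
        (entity_id, event_time, point_in_time_lookup(feature_log, entity_id, event_time), label)
        for entity_id, event_time, label in labels
    ]


if __name__ == "__main__":
    feature_log = {"user_1": [(10, 0.2), (20, 0.5), (30, 0.9)]}
    labels = [("user_1", 5, 0), ("user_1", 20, 1), ("user_1", 25, 0)]
    for row in build_training_rows(labels, feature_log):
        print(row)
    # ('user_1', 5, None, 0)   no feature value existed yet at t=5
    # ('user_1', 20, 0.5, 1)   a value written at t=20 counts as known at t=20
    # ('user_1', 25, 0.5, 0)   the t=30 value is in the future, so it is excluded
```

A naive join that always takes the latest value would give every row 0.9, leaking the t=30 value into examples labeled at t=5 and t=25. That leak is what point-in-time correctness rules out.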