Prasad

Projects

Things I'm building or plan to build. Code goes on GitHub as it ships.

Data Engineering

  • Streaming cost simulator (Coming soon)

    Model the cost of Kafka/Kinesis pipelines before you deploy, with partitions, retention, and consumer groups priced out.

    Python · Streamlit
  • Schema evolution tester (Coming soon)

    Validate Avro/Protobuf schema changes against a registry and surface breaking changes before they hit production.

    Python · Confluent Schema Registry
  • Pipeline completeness tracker (Coming soon)

    Track partition-level freshness and completeness across S3-based data lakes. Alert before downstream jobs read stale data.

    Python · AWS CDK · DynamoDB
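One check a completeness tracker like this might run, sketched in Python under stated assumptions: partitions are hourly, the set of partition hours that have already landed was collected elsewhere (for example, from an S3 listing), and a partition counts as overdue once its hour has closed plus an allowed lag. The function `overdue_partitions` and its `allowed_lag` parameter are hypothetical names for illustration, not code from the project.

```python
from datetime import datetime, timedelta


def overdue_partitions(
    landed: set[datetime],
    window_start: datetime,
    window_end: datetime,
    now: datetime,
    allowed_lag: timedelta,
) -> list[datetime]:
    """Return hourly partitions in [window_start, window_end) that should have landed by `now` but have not.

    Hypothetical helper; assumes window_start falls on an hour boundary.
    """
    overdue = []
    hour = window_start
    while hour < window_end:
        # Partition `hour` covers [hour, hour + 1h); it is due once that hour closes plus the allowed lag.
        deadline = hour + timedelta(hours=1) + allowed_lag
        if hour not in landed and now >= deadline:
            overdue.append(hour)
        hour += timedelta(hours=1)
    return overdue


# Hypothetical example: hours 00:00 to 05:00 expected; only 00:00, 01:00 and 03:00 have landed.
landed = {datetime(2024, 1, 1, h) for h in (0, 1, 3)}
late = overdue_partitions(
    landed,
    window_start=datetime(2024, 1, 1, 0),
    window_end=datetime(2024, 1, 1, 6),
    now=datetime(2024, 1, 1, 5, 30),
    allowed_lag=timedelta(minutes=30),
)
print([h.strftime("%H:%M") for h in late])  # ['02:00', '04:00']
```

The 05:00 partition is also missing but is not flagged, because its deadline (06:30) has not passed yet; that keeps the check from alerting on data that simply hasn't had time to arrive.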

AI/ML

  • RAG pipeline with observability (In progress)

    End-to-end retrieval-augmented generation (chunking, embedding, vector search) with latency and quality metrics baked in.

    Python · LangChain · Pinecone · OpenTelemetry
  • Feature store from scratch (Coming soon)

    Offline and online feature serving with point-in-time correctness: training-serving parity without vendor lock-in.

    Python · Redis · Parquet · FastAPI
  • ML evaluation framework (Coming soon)

    Automated model evaluation pipeline with versioned test sets, metric tracking, and regression gates before deployment.

    Python · MLflow · Pandas
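Point-in-time correctness is the subtle part of a feature store: when building training rows, each label may only see feature values that were already known at the label's timestamp. A minimal stdlib sketch of that lookup, assuming features arrive as hypothetical `(entity_id, feature_ts, value)` tuples and labels as `(entity_id, label_ts)`; `point_in_time_join` is an illustrative name, not the project's implementation.

```python
from bisect import bisect_right
from collections import defaultdict


def point_in_time_join(labels, feature_rows):
    """Attach to each (entity_id, label_ts) the latest feature value with feature_ts <= label_ts.

    Hypothetical input shapes (assumptions, not a real library API):
      labels:       list of (entity_id, label_ts)
      feature_rows: list of (entity_id, feature_ts, value)
    Returns a list of (entity_id, label_ts, value), with value None when nothing was known yet.
    """
    # Group each entity's feature history and sort it by timestamp for binary search.
    history = defaultdict(list)
    for entity_id, ts, value in feature_rows:
        history[entity_id].append((ts, value))
    for rows in history.values():
        rows.sort(key=lambda row: row[0])
    timestamps = {entity: [ts for ts, _ in rows] for entity, rows in history.items()}

    joined = []
    for entity_id, label_ts in labels:
        rows = history.get(entity_id, [])
        # bisect_right returns the index of the first row strictly after label_ts, so the row
        # just before it is the newest value already known at label time (no future leakage).
        idx = bisect_right(timestamps.get(entity_id, []), label_ts)
        value = rows[idx - 1][1] if idx > 0 else None
        joined.append((entity_id, label_ts, value))
    return joined


features = [("u1", 1, 10.0), ("u1", 9, 30.0), ("u1", 5, 20.0)]  # deliberately unsorted
labels = [("u1", 5), ("u1", 4), ("u1", 0), ("u2", 3)]
print(point_in_time_join(labels, features))
# [('u1', 5, 20.0), ('u1', 4, 10.0), ('u1', 0, None), ('u2', 3, None)]
```

Treating a feature written at exactly the label timestamp as available (`bisect_right`) is a design choice; switching to `bisect_left` would exclude it if same-instant writes should count as the future.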