
Shaurya's Blog

Data, ML and Software Engineering

Kafka topics to Iceberg Tables
Tags: flink, kafka, iceberg, spark, real-time-analytics
In this post, we'll explore different approaches for ingesting Kafka data into Iceberg tables, examine strategies for managing schema evolution, and discuss when to choose one method over another based on your specific use case.
Published on 2025-11-30
Distributed Data Systems: Understanding Join Algorithms
Tags: distributed-systems, databases, python, spark
A query engine's or database's join algorithm is the mechanism by which datasets are combined, relationships are discovered, and raw data is turned into meaningful insights.
Published on 2025-09-13
Data Processing with PySpark, Delta Lake and AWS EMR
Tags: aws, delta-lake, spark
In this post, we'll discuss processing data with PySpark using the Delta Lake format and deploying the job on AWS Elastic MapReduce (EMR).
Published on 2024-06-27