~/resources
6 notes found
spark

Apache Spark

DAG scheduler, stage boundaries, shuffle internals, RDD vs DataFrame, and memory management in Spark 3.x.

notes_in_progress...
hadoop

Hadoop

HDFS architecture, MapReduce internals, YARN resource management, and an overview of the Hadoop ecosystem.

notes_in_progress...
hive

Hive

HiveQL, partitioning strategies, bucketing, metastore architecture, and query optimization techniques.

notes_in_progress...
pyspark

PySpark

DataFrame API, transformations vs actions, UDFs, Spark SQL integration, and performance tuning patterns.

notes_in_progress...
python

Python

Data engineering patterns: generators, decorators, async I/O, type hints, and testing with pytest.

notes_in_progress...
sql

SQL

Window functions, CTEs, execution plans, indexing strategies, and advanced aggregation patterns.

notes_in_progress...