Mastering PySpark — File formats
May 3, 2024
A cheatsheet for working with different file formats in Spark.
Native formats need no extra JAR: Spark can read them out of the box. External formats need a separate JAR on the classpath before Spark can read the files. Let us see some examples of how we can read these formats in Spark.
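To make the native/external split concrete, here is a minimal sketch of one way to work out which JARs a session needs. The helper names (`packages_for`, `build_session`) are hypothetical, and the Maven coordinates and versions are assumptions that must match the Spark and Scala versions of your own cluster. `spark.jars.packages` is the standard Spark setting for pulling packages in at startup.

```python
# Formats bundled with Spark itself (no extra JAR required).
NATIVE_FORMATS = {"csv", "json", "parquet", "orc", "text"}

# Illustrative Maven coordinates for external formats. The Scala and
# library versions are assumptions; match them to your own cluster.
EXTERNAL_PACKAGES = {
    "avro": "org.apache.spark:spark-avro_2.12:3.5.1",
    "xml": "com.databricks:spark-xml_2.12:0.18.0",
}


def packages_for(formats):
    """Return the comma-separated value for spark.jars.packages."""
    coords = []
    for fmt in formats:
        name = fmt.lower()
        if name in NATIVE_FORMATS:
            continue  # native format: nothing to add
        if name not in EXTERNAL_PACKAGES:
            raise ValueError(f"no package known for format: {fmt}")
        coord = EXTERNAL_PACKAGES[name]
        if coord not in coords:  # avoid listing the same JAR twice
            coords.append(coord)
    return ",".join(coords)


def build_session(formats, app_name="file-formats"):
    """Create a SparkSession able to read every format in `formats`."""
    from pyspark.sql import SparkSession  # lazy import; requires pyspark

    builder = SparkSession.builder.appName(app_name)
    packages = packages_for(formats)
    if packages:
        # Only external formats add anything to the session config.
        builder = builder.config("spark.jars.packages", packages)
    return builder.getOrCreate()
```

With this in place, `build_session(["csv", "avro"])` would return a session that can also handle `spark.read.format("avro")`, while a CSV-only session needs no extra configuration at all.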
CSV
Reading CSV:
df_csv = spark.read.csv('path/to/your/csvfile.csv'…