
Shuffling in Apache Spark

Senthil Nayagan

Shuffling is the act of redistributing data so that it’s grouped differently across partitions. This typically involves copying data across executors and machines, making the shuffle a complex and costly operation.
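To make the idea of redistribution concrete, here is a minimal pure-Python sketch. It does not use Spark and is not how Spark implements a shuffle. The function name `shuffle_by_key`, the modulo-based partitioner, and the sample records are all assumptions made for illustration. It only shows records being regrouped so that equal keys land in the same partition, and counts how many records had to leave their original partition.

```python
def shuffle_by_key(partitions, num_partitions):
    """Regroup (key, value) records so that equal keys share a partition.

    Hypothetical helper for illustration only; keys are assumed to be ints.
    """
    shuffled = [[] for _ in range(num_partitions)]
    moved = 0
    for source_index, partition in enumerate(partitions):
        for key, value in partition:
            # Toy partitioner: pick the target partition from the key.
            target_index = key % num_partitions
            if target_index != source_index:
                # In a real cluster this record would be copied to another
                # executor, possibly on another machine.
                moved += 1
            shuffled[target_index].append((key, value))
    return shuffled, moved


# Records start out spread across partitions with no regard to key.
before = [
    [(1, "a"), (2, "b")],
    [(1, "c"), (2, "d")],
]

after, moved = shuffle_by_key(before, num_partitions=2)
print(after)  # [[(2, 'b'), (2, 'd')], [(1, 'a'), (1, 'c')]]
print(moved)  # 2
```

Even in this tiny example, half of the records change partitions. At cluster scale that movement means serialization, disk I/O, and network transfer, which is where the cost comes from.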

Writing in progress: If you have any suggestions for improving the content or notice any inaccuracies, please email me at [email protected]. Thanks!

What is shuffling in Spark?
