pyspark.RDD.cleanShuffleDependencies
- RDD.cleanShuffleDependencies(blocking=False)
Removes the shuffles of an RDD and of its non-persisted ancestors.
When running without a shuffle service, cleaning up shuffle files enables downscaling. If you use the RDD after this call, you should checkpoint and materialize it first.
New in version 3.3.0.
- Parameters
- blocking : bool, optional, default False
Whether to block on shuffle cleanup tasks.
Notes
This API is a developer API.
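Examples
The advice to checkpoint and materialize the RDD before cleaning could be followed roughly as in the minimal sketch below. The helper name `checkpoint_then_clean` and the `demo` function are hypothetical and not part of PySpark. The sketch assumes a checkpoint directory has already been set with `SparkContext.setCheckpointDir` and that no job has run on the RDD yet.

```python
def checkpoint_then_clean(rdd, blocking=False):
    """Checkpoint and materialize `rdd`, then remove its shuffle files.

    Hypothetical helper, not part of PySpark. Assumes a checkpoint
    directory was set with SparkContext.setCheckpointDir and that no
    job has been executed on `rdd` yet.
    """
    rdd.checkpoint()  # mark the RDD for checkpointing
    rdd.count()       # run an action so the checkpoint is actually written
    rdd.cleanShuffleDependencies(blocking=blocking)
    return rdd


def demo():
    # Illustrative only: needs a local Spark installation (PySpark >= 3.3.0).
    # Imports are local so that defining this function does not require Spark.
    import tempfile
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "clean-shuffle-demo")
    try:
        sc.setCheckpointDir(tempfile.mkdtemp())
        pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
        summed = pairs.reduceByKey(lambda x, y: x + y)  # introduces a shuffle
        checkpoint_then_clean(summed, blocking=True)
        # The RDD is read back from the checkpoint, not recomputed
        # from the removed shuffle files.
        return sorted(summed.collect())
    finally:
        sc.stop()
```

Passing `blocking=True` makes the call wait for the cleanup tasks to finish, which may be preferable when executors are about to be released.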