Cluster Types

Preemptible VM
Updating Clusters
Custom Clusters
Storage
HDFS storage on Dataproc is built on top of persistent disks (PDs) attached to worker nodes. This means data stored in HDFS is transient (unless it is copied to Cloud Storage or other persistent storage) and comes with relatively higher storage costs, so it is recommended to minimize the use of HDFS storage. However, there are valid scenarios where you need to maintain a small HDFS footprint, typically for performance reasons. In such cases, you can provision Dataproc clusters with limited HDFS storage and offload all persistent storage needs to GCS.
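As a sketch of this pattern, a cluster can be created with small worker boot disks and an explicit Cloud Storage staging bucket, keeping the HDFS footprint minimal. The cluster name, region, bucket, and sizes below are illustrative assumptions, not values from this guide:

```shell
# Illustrative sketch: create a Dataproc cluster with a small HDFS footprint.
# Cluster name, region, bucket, and disk sizes are example values.
# Small worker boot disks keep PD-backed HDFS capacity minimal;
# persistent data is offloaded to the Cloud Storage staging bucket.
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --num-workers=2 \
    --worker-boot-disk-size=100GB \
    --bucket=example-staging-bucket
```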
REF: https://cloud.google.com/blog/topics/developers-practitioners/dataproc-best-practices-guide
Autoscaling
Do not use autoscaling with:
Graceful Decommissioning
When you downscale a cluster, work in progress may stop before completion. You can use Graceful Decommissioning to finish work in progress on a worker before it is removed from the Cloud Dataproc cluster.
The preemptible (secondary) worker group continues to provision or delete workers to reach its expected size even after a cluster scaling operation is marked complete.
If you attempt to gracefully decommission a secondary worker and receive an error message similar to the following:
"Secondary worker group cannot be modified outside of Dataproc. If you recently created or updated this cluster, wait a few minutes before gracefully decommissioning to allow all secondary instances to join or leave the cluster. Expected secondary worker group size: x, actual size: y",
wait a few minutes, then repeat the graceful decommissioning request.
The graceful decommissioning timeout should ideally be set longer than the longest-running job on the cluster, so that in-progress work can finish before a worker is removed.
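As a hedged sketch, graceful decommissioning is requested by passing a timeout along with the downscale operation. The cluster name, region, target worker count, and the 1h timeout below are illustrative; in practice the timeout would be chosen to exceed the longest-running job:

```shell
# Illustrative sketch: downscale a cluster with graceful decommissioning.
# The 1h timeout is an example value; set it longer than the cluster's
# longest-running job so in-progress work can complete before removal.
gcloud dataproc clusters update example-cluster \
    --region=us-central1 \
    --num-workers=2 \
    --graceful-decommission-timeout=1h
```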
Also note: