What is Cloud Dataproc?
A fully managed service for running Apache Spark and Apache Hadoop clusters for data processing.
What are key points of Cloud Dataproc?
When do you choose Cloud Dataproc over Cloud Dataflow?
If you have dependencies on Hadoop or Spark, or if you want more hands-on management and control of the cluster.
How do you create a Cloud Dataproc cluster from the command line?
gcloud dataproc clusters create [CLUSTER NAME] --zone [ZONE]
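As a minimal sketch of the command above with the placeholders filled in: the helper below only prints the command so it can be reviewed before running. The function name `dataproc_create_cmd` and the values `example-cluster`, `us-central1` and `us-central1-a` are hypothetical, and the `--region` flag is an assumption added here because recent gcloud releases usually expect a region for Dataproc commands.

```shell
# Prints the gcloud command for creating a cluster; it does not run it.
# Usage: dataproc_create_cmd CLUSTER_NAME REGION ZONE
dataproc_create_cmd() {
  local cluster="$1" region="$2" zone="$3"
  printf 'gcloud dataproc clusters create %s --region %s --zone %s\n' \
    "$cluster" "$region" "$zone"
}

# Hypothetical values, for illustration only.
dataproc_create_cmd example-cluster us-central1 us-central1-a
```

Once the printed command looks right, paste it into a shell where gcloud is installed and authenticated.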
How do you submit a job to Cloud Dataproc via the shell?
gcloud dataproc jobs submit [TYPE] --cluster [CLUSTER NAME] --jar [JAR FILE]
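The submit command can be sketched the same way. This helper prints a job submission for a JAR-based job type such as `spark` or `hadoop`; it does not contact any cluster. The function name, the cluster name and the `gs://example-bucket/wordcount.jar` path are hypothetical, and `--region` is again an assumption about what a current gcloud installation expects.

```shell
# Prints the gcloud command for submitting a JAR-based job; it does not run it.
# Usage: dataproc_submit_cmd JOB_TYPE CLUSTER_NAME REGION JAR_PATH
dataproc_submit_cmd() {
  local job_type="$1" cluster="$2" region="$3" jar="$4"
  printf 'gcloud dataproc jobs submit %s --cluster %s --region %s --jar %s\n' \
    "$job_type" "$cluster" "$region" "$jar"
}

# Hypothetical values, for illustration only.
dataproc_submit_cmd spark example-cluster us-central1 gs://example-bucket/wordcount.jar
```

Arguments for the job itself, if any, normally go after a bare `--` at the end of the real command.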
What cluster modes can you choose when setting up Cloud Dataproc?
What job types are available for Cloud Dataproc?
How do you import or export data to Cloud Dataproc?
You don’t. It’s a data analysis platform, not a database.
You can, however, export and import a cluster's configuration to save and restore it:
gcloud beta dataproc clusters export [CLUSTER NAME] --destination=[PATH TO EXPORT FILE]
gcloud beta dataproc clusters import [CLUSTER NAME] --source=[SOURCE FILE]
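A rough sketch of a save-and-restore round trip: export one cluster's configuration to a YAML file, then create a second cluster from that file. The helper only prints the two commands. It assumes that `import` takes the name of the cluster to create plus a `--source` file, and that both commands accept `--region`; the function name, cluster names and `cluster.yaml` are hypothetical.

```shell
# Prints the export and import commands for copying a cluster configuration;
# it does not run them.
# Usage: dataproc_config_roundtrip_cmds SOURCE_CLUSTER NEW_CLUSTER REGION FILE
dataproc_config_roundtrip_cmds() {
  local source_cluster="$1" new_cluster="$2" region="$3" file="$4"
  printf 'gcloud beta dataproc clusters export %s --region %s --destination=%s\n' \
    "$source_cluster" "$region" "$file"
  printf 'gcloud beta dataproc clusters import %s --region %s --source=%s\n' \
    "$new_cluster" "$region" "$file"
}

# Hypothetical values, for illustration only.
dataproc_config_roundtrip_cmds example-cluster example-cluster-copy us-central1 cluster.yaml
```

Only the cluster configuration travels through the file; job data stays wherever it is stored.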