Google Cloud Professional Data Engineer
15_Dataprep Flashcards
(3 cards)
1
Q: What is Dataprep?
A:
A service for exploring, cleaning and preparing data
Data cleaning/processing service offered in partnership with Trifacta
Fully managed, serverless and web-based
User-friendly interface
Clean data by clicking on it
Visually define transformations
Export to Cloud Dataflow
Supported file types:
Input: CSV, JSON (including nested), plain text, Excel, LOG, TSV and Avro
Output: CSV, JSON, Avro, BigQuery table
CSV/JSON can be compressed or uncompressed
2
Q: How does Dataprep work?
A:
Backed by Cloud Dataflow
After the data is prepared, Dataflow processes it via an Apache Beam pipeline
In effect, a "user-friendly Dataflow pipeline"
Dataprep process:
Create a flow, which is a container object used to access and manipulate datasets
Import a dataset
Transform sampled data with recipes
Run a Dataflow job to apply the recipes to the full dataset
Export results (GCS, BigQuery)
Intelligent suggestions:
Selecting data often surfaces the most relevant suggestion automatically
Recipes can be created manually, but simple tasks (removing outliers, de-duplicating) should use auto-suggestions
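The process on this card (build a recipe against a sample, then run it over the whole dataset) can be sketched in plain Python. This is only an illustration of the idea, not Dataprep's or Dataflow's API: the function names `deduplicate`, `remove_outliers` and `run_recipe`, the `amount` column, the median-based outlier rule and the sample data are all assumptions made for the example.

```python
from statistics import median

# Hypothetical recipe steps; illustrative only, not part of any Dataprep API.

def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def remove_outliers(rows, column="amount", factor=10.0):
    """Drop rows whose value exceeds `factor` times the column median (assumed rule)."""
    if not rows:
        return []
    limit = factor * median(row[column] for row in rows)
    return [row for row in rows if row[column] <= limit]


def run_recipe(rows, recipe):
    """Apply each recipe step in order and return the transformed rows."""
    for step in recipe:
        rows = step(rows)
    return rows


# Made-up dataset standing in for an imported file.
dataset = [
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 10.0},   # exact duplicate
    {"id": 2, "amount": 12.0},
    {"id": 3, "amount": 11.0},
    {"id": 4, "amount": 5000.0},  # outlier
]

recipe = [deduplicate, remove_outliers]

# Design the recipe against a small sample first...
preview = run_recipe(dataset[:3], recipe)

# ...then apply the same recipe to the full dataset (the role the Dataflow job plays).
result = run_recipe(dataset, recipe)
print([row["id"] for row in result])
```

The point of the sketch is the separation between the recipe (an ordered list of steps) and the data it runs on: the same recipe is previewed on a sample and later executed at full scale.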
3
Q: IAM
A:
Dataprep User: run Dataprep in a project
Dataprep Service Agent: gives Trifacta the necessary access to project resources
Access to GCS buckets, Dataflow Developer, BigQuery User/Data Editor
Necessary for cross-project access + GCE service account