watsonx.data Flashcards

(148 cards)

1
Q

What is the primary purpose of watsonx.data?

A

To provide a hybrid, open data lakehouse that unifies, prepares, and delivers AI-ready enterprise data.

2
Q

Which architecture does watsonx.data use to avoid vendor lock-in?

A

Open-source data formats with a unified metadata layer.

3
Q

Name two vector databases integrated with watsonx.data.

A

Milvus and DataStax.

4
Q

What advantage does watsonx.data give AI agents over conventional RAG methods?

A

Up to 40% higher accuracy.

5
Q

List three deployment options for watsonx.data.

A

SaaS, client-managed VPC, and on-premises.

6
Q

What is the core benefit of the multi-engine architecture in watsonx.data?

A

Optimized price-performance across workloads.

7
Q

Which IBM acquisition enhances unstructured data handling in watsonx.data?

A

DataStax.

8
Q

What does "hybrid by design" mean for watsonx.data?

A

Support for on-premises, private-cloud, and public-cloud environments with seamless data movement.

9
Q

Name a key capability of watsonx.data Integration.

A

Any of: bulk ETL/ELT, real-time streaming, data replication, or data observability.

10
Q

What tool does watsonx.data Integration use for data observability?

A

IBM Databand.

11
Q

Which capability helps detect schema changes early?

A

Pipeline-level monitoring in Data Observability.

12
Q

What metric is used to bill watsonx.data Integration services?

A

Resource Units (RUs).

13
Q

What is the SaaS price per RU for watsonx.data Integration?

A

USD 25 per RU.

14
Q

What is the minimum RU purchase for SaaS?

A

200 RUs.
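
The two pricing cards combine into simple arithmetic: at the listed SaaS rate of USD 25 per RU, the 200-RU minimum comes to USD 5,000. A minimal Python sketch of that calculation; the helper name and the rule that smaller purchases are rejected are illustrative assumptions, not IBM's billing logic.

```python
# Rate and minimum taken from the two SaaS pricing cards.
PRICE_PER_RU_USD = 25
MIN_RU_PURCHASE = 200


def saas_ru_cost(rus: int) -> int:
    """Return the USD cost of a SaaS RU purchase (hypothetical helper).

    Assumption for illustration: purchases below the minimum are refused.
    """
    if rus < MIN_RU_PURCHASE:
        raise ValueError(f"SaaS purchases start at {MIN_RU_PURCHASE} RUs, got {rus}")
    return rus * PRICE_PER_RU_USD


print(saas_ru_cost(200))  # → 5000
```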

15
Q

What integration styles does watsonx.data Integration support?

A

No-code, low-code, and SQL-based authoring.

16
Q

What problem does the unified control plane solve?

A

Tool sprawl across multiple data-integration products.

17
Q

Name a real-time streaming component of watsonx.data Integration.

A

StreamSets.

18
Q

Which industry is a lead adopter of data observability?

A

BFSI (Banking, Financial Services, and Insurance).

19
Q

What benefit does Data Observability provide for mean-time-to-detect (MTTD)?

A

It reduces MTTD, often to near zero.

20
Q

Which IBM product enables continuous data quality checks?

A

watsonx.data Integration Data Observability.

21
Q

What does composable architecture enable in watsonx.data Integration?

A

Portability of pipelines across execution engines.

22
Q

How does watsonx.data Integration handle unstructured data?

A

Through the DIUD (Data Integration for Unstructured Data) capability.

23
Q

What is the primary use case for watsonx.data Integration Real-time Streaming?

A

Low-latency data ingestion for analytics and AI.

24
Q

Name two use cases for Real-time Streaming in finance.

A

Fraud detection and regulatory compliance alerts.

25
What benefits did real-time streaming deliver for a European bank (customer example)?
Latency cut from days to 1 minute, plus £400K in annual cost savings.
26
Which IBM product provides automated data lineage?
watsonx.data Intelligence Data Lineage.
27
What level of detail does Data Lineage capture?
Column-level lineage with transformation expressions.
28
How does Data Lineage support compliance?
By visualizing data flow and impact for audit readiness.
29
What open API does Data Lineage expose?
The OpenLineage API, for CI/CD integration.
30
Which IBM product offers automated data governance?
watsonx.data Intelligence Data Governance.
31
What AI technology powers data quality assessment in Data Governance?
IBM Large Language Models (LLMs).
32
How many default data-quality rules are provided?
190+ rules.
33
What is a knowledge accelerator in Data Governance?
Industry-specific vocabularies that map regulations to data.
34
Name a bank that has adopted Data Governance for a multi-country data estate.
Standard Bank of South Africa.
35
What SLA can be assigned in Data Governance?
Data-quality Service Level Agreements for critical data elements.
36
What is the primary benefit of Data Sharing?
Self-service marketplace for reusable data products.
37
How much faster can new use cases be implemented with Data Sharing?
Up to 90% faster.
38
What is the shop-for-data experience?
A marketplace-like UI for discovering and requesting data products.
39
Which IBM component underpins Data Sharing?
Data Product Hub.
40
What does "open" mean for Data Sharing?
Integration with IBM and third-party tools for high-scale sharing.
41
What governance feature protects sensitive data in Data Sharing?
Data contracts and usage guidelines.
42
Which industry-specific vocabularies are pre-loaded in Data Governance?
Healthcare, Energy & Utilities, Insurance, and Financial Services.
43
What is the primary outcome of using Data Governance for a client?
Trusted high-quality data that meets regulatory standards.
44
Name a key benefit of watsonx.data Intelligence overall.
Improved decision-making, reduced rework, and compliance assurance.
45
Which three personas benefit most from Data Intelligence?
Data Architect, Data Engineer, and Data Consumer.
46
What does AI-ready data mean in the watsonx.data context?
Data that is clean, governed, and indexed for model training.
47
How does watsonx.data support generative AI?
Through vector search, embeddings, and retrieval-augmented generation (RAG).
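
Vector search works by comparing embeddings: the question is turned into a vector, stored vectors are ranked by similarity to it, and the closest matches are passed to the model as retrieval context. A toy Python sketch of that ranking step, using brute-force cosine similarity over invented 3-dimensional vectors. It is not the Milvus or DataStax API; production vector databases typically rely on approximate nearest-neighbour indexes rather than a full scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(index, key=lambda doc_id: cosine(query, index[doc_id]), reverse=True)
    return ranked[:k]


# Invented toy "embeddings"; a real system gets these from an embedding model.
index = {
    "fraud-policy": [0.9, 0.1, 0.0],
    "loan-terms": [0.1, 0.9, 0.0],
    "fraud-alerts": [0.8, 0.0, 0.2],
}
print(top_k([1.0, 0.0, 0.0], index))  # → ['fraud-policy', 'fraud-alerts']
```

The retrieved ids would then be used to fetch the matching text chunks that are added to the prompt.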
48
What is the primary advantage of the hybrid data lakehouse for AI?
Consistent governance while accessing both structured and unstructured data.
49
Which IBM product helps visualize data lineage for business users?
watsonx.data Intelligence Data Lineage UI.
50
What is the impact of Data Observability on pipeline reliability?
Higher SLA adherence and reduced downtime.
51
Which capability enables "design once, run anywhere"?
Composable pipelines in watsonx.data Integration.
52
What does "FinOps-conscious" refer to in watsonx.data Integration?
Cost-aware execution planning and resource optimization.
53
How does watsonx.data Integration reduce tool sprawl?
By consolidating ETL, ELT, streaming, replication, and observability in one UI.
54
What is the typical mean-time-to-resolution (MTTR) improvement with Data Observability?
From weeks to days (or minutes in some cases).
55
Which IBM product provides automated data profiling?
watsonx.data Intelligence Data Governance (LLM-driven profiling).
56
What is the role of the Resource Unit (RU) in pricing?
A unified metric that maps usage across all integration capabilities.
57
How can a client reallocate RUs between capabilities?
RUs work like a gift card: they can be shifted between capabilities, for example from streaming to batch.
58
What is the primary function of the Data Product Hub?
To catalog, publish, and manage data products enterprise-wide.
59
Which IBM solution helps enforce data contracts?
watsonx.data Intelligence Data Sharing.
60
What is the benefit of lightweight customization for data consumers?
Ability to tailor data products without needing developer resources.
61
Name a key metric tracked by Data Observability.
Pipeline latency, data freshness, schema changes, or null-value rates.
62
What does continuous data observability enable?
Proactive alerts before data quality issues impact downstream processes.
63
Which IBM product integrates with Collibra and Alation for metadata?
watsonx.data Intelligence Data Lineage.
64
What is the primary use case for Data Governance in BFSI?
Regulatory compliance and privacy protection.
65
Which IBM product helps accelerate AI model development via data products?
watsonx.data Intelligence Data Sharing.
66
What reduction in total cost of ownership (TCO) does Data Sharing typically achieve?
Around 30% lower TCO.
67
Which IBM product provides a unified metadata catalog?
watsonx.data Intelligence (core catalog).
68
What does being "AI-driven" require beyond being "data-driven"?
Trusted governed data that can be reliably used for model training.
69
How does watsonx.data support schema-on-read?
By allowing unstructured data to be queried without predefined schemas.
70
Which IBM product includes a shop-for-data marketplace?
watsonx.data Intelligence Data Sharing.
71
What is the primary benefit of the Data Product Lifecycle framework?
End-to-end governance from ideation to retirement.
72
Which capability enables real-time decision making in streaming use cases?
Low-latency ingestion and processing via StreamSets.
73
What does the term data product refer to?
A packaged governed dataset or AI asset ready for consumption.
74
What is the impact of Data Governance on compliance risk?
Reduces fines and reputational damage by ensuring policy adherence.
75
Which IBM product uses LLMs for metadata enrichment?
watsonx.data Intelligence Data Governance.
76
What is the primary advantage of using the Open API in Data Lineage?
Integration with CI/CD pipelines for automated lineage capture.
77
Which product helps organizations meet GDPR and CCPA requirements?
watsonx.data Intelligence Data Governance.
78
What is the typical SLA improvement for data quality after implementing Data Governance?
Higher data-quality scores and faster issue remediation.
79
Which IBM solution helps reduce data-pipeline rework?
watsonx.data Integration composable pipelines.
80
What does "unified control plane" mean in watsonx.data Integration?
A single interface to author, deploy, and monitor all pipeline types.
81
Which product provides inline testing for data pipelines?
watsonx.data Integration Data Observability (Databand).
82
What is the benefit of automated profiling in Data Governance?
Faster discovery of data characteristics and anomalies.
83
How does watsonx.data enable high-scale sharing?
Through the Data Product Hub marketplace architecture.
84
What is the primary outcome of using Data Lineage for impact analysis?
Understanding downstream effects of code or schema changes.
85
Which IBM product is positioned as a leader in the 2024 Gartner Magic Quadrant for Data Integration Tools?
watsonx.data Integration.
86
What key benefit does "design once, run anywhere" provide?
Reduced vendor lock-in and easier migration across clouds.
87
Which product helps monitor pipeline execution and pipeline latency?
watsonx.data Integration Data Observability.
88
What is the role of knowledge accelerators in Data Governance?
Provide pre-built regulatory vocabularies for faster compliance.
89
Which capability helps detect duplicate data early?
Data Observability data-content checks.
90
What is the primary value proposition of watsonx.data for Generative AI?
Trusted searchable data that improves retrieval-augmented generation accuracy.
91
Which product includes DataStage-as-a-Service for ETL/ELT?
watsonx.data Integration ETL and ELT.
92
What does continuous data quality enable for AI pipelines?
Consistent model performance and reduced retraining cycles.
93
Which IBM solution helps organizations achieve FinOps-conscious data engineering?
watsonx.data Integration smart execution and RU-based pricing.
94
What is the typical reduction in mean-time-to-detect (MTTD) with Data Observability?
From days to minutes.
95
Which product provides automated data quality assessment using 7 dimensions?
watsonx.data Intelligence Data Governance.
96
What is the benefit of data-product lifecycle management for data engineers?
Streamlined creation, publishing, and retirement of data assets.
97
Which IBM product supports vector search for unstructured data?
watsonx.data (via integrated Milvus/DataStax vectors).
98
What does "open ecosystem strategy" refer to in Data Sharing?
Ability to integrate third-party tools and platforms seamlessly.
99
Which capability helps enforce data contracts in Data Sharing?
Policy-driven access controls within the Data Product Hub.
100
What is the primary advantage of self-service data product discovery?
Faster time-to-value for analytics and AI teams.
101
Which IBM product helps automate testing and deployment of data pipelines?
watsonx.data Intelligence Data Sharing (via CI/CD integration).
102
What is the impact of high-scale data sharing on organizational productivity?
Reduces duplication and accelerates cross-team collaboration.
103
Which product offers column-level lineage visualization?
watsonx.data Intelligence Data Lineage.
104
What is the primary benefit of AI-driven data quality rules?
Dynamic, context-aware validation that adapts to new data patterns.
105
Which IBM solution helps monitor data drift in real-time?
watsonx.data Integration Real-time Streaming with observability.
106
What does resource unit (RU) conversion enable for clients?
Flexible allocation of consumption across multiple integration capabilities.
107
Which product provides a pre-built workflow engine for data quality remediation?
watsonx.data Intelligence Data Governance.
108
What is the key differentiator of IBM Data Observability vs. competitors?
Deep integration with the control plane and automated incident management.
109
Which IBM product helps visualize data dependencies for business users?
watsonx.data Intelligence Data Lineage UI.
110
What is the primary outcome of data product marketplace adoption?
Around 30% lower total cost of ownership and up to 90% faster use-case rollout.
111
What is the default open table format used in watsonx.data?
Apache Iceberg.
112
What query engine does watsonx.data use for interactive SQL queries?
Presto (Java or C++ variants).
113
What is Prestissimo in watsonx.data?
The C++ implementation of Presto for improved performance.
114
How do you verify Presto C++ is active in watsonx.data?
Run SELECT * FROM system.runtime.nodes and look for an ibm-lh-prestissimo node.
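
The check on this card can be scripted once the query result is in hand. A minimal Python sketch that scans rows, in the tuple form a DB-API cursor's fetchall() returns, for the ibm-lh-prestissimo marker. Which column carries the node name is an assumption, so every column is searched; the sample rows are hypothetical.

```python
PRESTISSIMO_MARKER = "ibm-lh-prestissimo"  # node name fragment cited on the card


def prestissimo_active(rows: list[tuple]) -> bool:
    """Return True if any row from system.runtime.nodes mentions the C++ worker.

    Every column is checked as text because which column holds the node
    name (node_id, http_uri, ...) is an assumption in this sketch.
    """
    return any(PRESTISSIMO_MARKER in str(value) for row in rows for value in row)


# Hypothetical query result, trimmed to (node_id, coordinator, state).
rows = [
    ("ibm-lh-lakehouse-presto-01", True, "active"),
    ("ibm-lh-prestissimo-worker-0", False, "active"),
]
print(prestissimo_active(rows))  # → True
```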
115
What is the primary function of the Hive Metastore in watsonx.data?
Unified metadata management for catalogs and table schemas.
116
Which Spark procedure compacts small files in Iceberg tables?
rewrite_data_files with target-file-size-bytes option.
117
How do you expire old snapshots in an Iceberg table?
CALL lakehouse.system.expire_snapshots with older_than and retain_last parameters.
118
What command removes orphan files from Iceberg tables?
CALL lakehouse.system.remove_orphan_files(table => 'schema.tablename').
119
What does the rewrite_manifests procedure do?
Optimizes Iceberg table manifest files for better query performance.
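
The four maintenance procedures above (cards 116 to 119) are commonly run together: compact, expire old snapshots, delete orphaned files, then rewrite manifests. A minimal Python sketch that builds the Spark SQL CALL statements, assuming an Iceberg catalog registered as lakehouse (as on card 117) and the 209715200-byte target from card 138; the helper name, table name, and retention values are illustrative. Each string would be passed to spark.sql() in a session with the Iceberg extensions enabled.

```python
from datetime import datetime

TARGET_FILE_SIZE_BYTES = 200 * 1024 * 1024  # 209715200 bytes


def maintenance_sql(table: str, older_than: datetime, retain_last: int,
                    catalog: str = "lakehouse") -> list[str]:
    """Build Iceberg maintenance CALLs for `table` ('schema.tablename')."""
    proc = f"CALL {catalog}.system"
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S")
    return [
        # 1. Compact small data files toward the target size.
        f"{proc}.rewrite_data_files(table => '{table}', "
        f"options => map('target-file-size-bytes', '{TARGET_FILE_SIZE_BYTES}'))",
        # 2. Expire snapshots older than the cutoff, keeping the newest few.
        f"{proc}.expire_snapshots(table => '{table}', "
        f"older_than => TIMESTAMP '{ts}', retain_last => {retain_last})",
        # 3. Delete files that no snapshot references any more.
        f"{proc}.remove_orphan_files(table => '{table}')",
        # 4. Rewrite manifest files to speed up query planning.
        f"{proc}.rewrite_manifests(table => '{table}')",
    ]


for stmt in maintenance_sql("sales.orders", datetime(2024, 1, 1), retain_last=5):
    print(stmt)  # in a Spark job: spark.sql(stmt)
```

Expiring snapshots before removing orphan files matters: files from expired snapshots only become unreferenced once those snapshots are gone.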
120
How do you add a column to an Iceberg table in watsonx.data?
ALTER TABLE tablename ADD COLUMN column_name data_type.
121
What Spark setting disables vectorized reads for Iceberg Parquet V2?
spark.sql.iceberg.vectorization.enabled set to false.
122
What is the presto-cli utility used for?
Running interactive SQL queries against Presto engines from the command line.
123
How do you enable the Query Optimizer in Presto C++?
Run SET SESSION enable_wxd_query_optimizer = true (or SET SESSION is_query_rewriter_plugin_enabled = true).
124
What storage protocols does watsonx.data support for Spark applications?
s3a (S3/COS), abfss (ADLS), and gs (GCS).
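
Each protocol on this card implies a different URI shape for Spark paths. A minimal Python sketch that builds them; the bucket, container, and account names are placeholders, and the ADLS form uses the standard ADLS Gen2 endpoint pattern (container@account.dfs.core.windows.net).

```python
from typing import Optional


def spark_storage_uri(store: str, bucket: str, path: str,
                      account: Optional[str] = None) -> str:
    """Build a Spark URI for one of the three object stores on the card.

    store: "s3" (AWS S3 or IBM COS), "adls" (Azure Data Lake Storage Gen2)
    or "gcs" (Google Cloud Storage). For ADLS, `bucket` is the container
    and `account` the storage account name.
    """
    path = path.lstrip("/")
    if store == "s3":
        return f"s3a://{bucket}/{path}"
    if store == "gcs":
        return f"gs://{bucket}/{path}"
    if store == "adls":
        if not account:
            raise ValueError("ADLS URIs need a storage account name")
        return f"abfss://{bucket}@{account}.dfs.core.windows.net/{path}"
    raise ValueError(f"unsupported store: {store!r}")


print(spark_storage_uri("s3", "lake-bucket", "/warehouse/orders"))
# → s3a://lake-bucket/warehouse/orders
```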
125
What is the dev-sandbox command in watsonx.data?
Provides a containerized environment with pre-installed utilities for exploring the lakehouse.
126
How do you shut down watsonx.data on OpenShift?
oc patch wxd lakehouse with spec.shutdown set to true.
127
What is the force shutdown command for watsonx.data?
oc patch wxd lakehouse with spec.shutdown set to force.
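
Cards 126 and 127 differ only in the value written to spec.shutdown. A minimal Python sketch that assembles the oc command as an argument list, assuming a JSON merge patch (--type=merge) and that the field takes the string values "true" and "force"; the namespace is a placeholder. The helper only builds the command, so verify it against the IBM documentation for your release before passing it to subprocess.run.

```python
import json
import shlex


def wxd_shutdown_cmd(mode: str, namespace: str) -> list[str]:
    """Return the oc argv that patches spec.shutdown on wxd/lakehouse.

    Assumption: "true" requests a normal shutdown and "force" a forced one.
    """
    if mode not in ("true", "force"):
        raise ValueError("mode must be 'true' or 'force'")
    patch = json.dumps({"spec": {"shutdown": mode}})
    return ["oc", "patch", "wxd/lakehouse", "--type=merge", "-n", namespace, "-p", patch]


# Print a shell-quoted version for review instead of running it.
print(shlex.join(wxd_shutdown_cmd("true", "my-namespace")))
```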
128
What API authentication does watsonx.data Spark use?
ZenApiKey with base64-encoded username:apikey.
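
The encoding on this card is plain Base64 over the string username:apikey, prefixed with the scheme name. A minimal Python sketch that builds the header, assuming it is sent as the standard HTTP Authorization header; the username and key values are placeholders.

```python
import base64


def zen_api_key_header(username: str, api_key: str) -> dict[str, str]:
    """Build 'Authorization: ZenApiKey <base64(username:apikey)>'."""
    raw = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"ZenApiKey {token}"}


print(zen_api_key_header("admin", "secret"))
# → {'Authorization': 'ZenApiKey YWRtaW46c2VjcmV0'}
```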
129
What is the ibm-lh data-copy command used for?
Ingesting data files from the local file system into the watsonx.data lakehouse.
130
What configuration file format is used for watsonx.data ingestion jobs?
INI format with global-ingest-config and ingest-config sections.
131
What is the staging-location parameter in ingestion jobs?
S3 path used for temporary storage during data ingestion.
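
Cards 130 and 131 describe an INI job file, which Python's configparser can read directly. A minimal sketch of validating such a file; the section names and the staging-location key come from the cards, while placing staging-location under ingest-config, the other keys, and all values are hypothetical placeholders.

```python
import configparser

# Hypothetical ingestion job file; only the section names and the
# staging-location key are taken from the cards.
JOB_INI = """
[global-ingest-config]
target-table = iceberg_data.sales.orders

[ingest-config]
source-files = s3://my-bucket/raw/orders.parquet
staging-location = s3://my-bucket/staging/
"""


def staging_location(ini_text: str) -> str:
    """Parse an ingestion job config and return its staging-location."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    for section in ("global-ingest-config", "ingest-config"):
        if not parser.has_section(section):
            raise ValueError(f"missing [{section}] section")
    location = parser.get("ingest-config", "staging-location")
    if not location.startswith(("s3://", "s3a://")):
        raise ValueError("staging-location should be an S3 path")
    return location


print(staging_location(JOB_INI))  # → s3://my-bucket/staging/
```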
132
How do you access full watsonx.data Intelligence features on AWS?
Append ?context=df to the aws.data.ibm.com URL before logging in.
133
What Spark extensions are required for Iceberg in watsonx.data?
org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions.
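
The extension class on this card is normally set alongside a catalog definition. A minimal sketch, assuming an Iceberg catalog named lakehouse backed by the Hive Metastore: the spark.sql.catalog.* keys are standard Apache Iceberg properties, spark.sql.catalogImplementation=hive comes from card 145, and the metastore URI is a placeholder. The watsonx.data-specific metastore authentication settings (card 144's PLAIN mode, the truststore) are left out because their property names are not given here.

```python
import shlex


def iceberg_spark_conf(catalog: str, metastore_uri: str) -> dict[str, str]:
    """Spark properties for an Iceberg catalog backed by a Hive Metastore."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        "spark.sql.catalogImplementation": "hive",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hive",
        f"{prefix}.uri": metastore_uri,  # placeholder Thrift endpoint
    }


def as_submit_args(conf: dict[str, str]) -> list[str]:
    """Flatten properties into spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = iceberg_spark_conf("lakehouse", "thrift://metastore.example.com:9083")
print(shlex.join(["spark-submit", *as_submit_args(conf), "job.py"]))
```

In PySpark, the same dictionary can instead be applied with SparkSession.builder.config(key, value) for each entry.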
134
What is the default truststore password for watsonx.data connections?
changeit.
135
How do you create a database in watsonx.data with Spark?
spark.sql("CREATE DATABASE IF NOT EXISTS catalog.dbname LOCATION 's3path'"), with s3path replaced by the database's S3 location.
136
What metadata table shows Iceberg data files?
tablename.files (e.g. SELECT file_path FROM schema.table.files).
137
What metadata table shows Iceberg snapshots?
tablename.snapshots (e.g. SELECT snapshot_id FROM schema.table.snapshots).
138
What is the recommended target file size for Iceberg compaction?
209715200 bytes (200 MiB).
139
How do you list databases in a watsonx.data catalog?
SHOW DATABASES FROM catalogname.
140
What is the purpose of the LhInstanceId header in watsonx.data APIs?
Identifies the specific watsonx.data instance for API requests.
141
What CLI command adds a Presto engine connection?
connect-lh --op=add with name, host, port, username, and password parameters.
142
How do you enable Hive Metastore NodePort access?
oc patch wxd/lakehouse with spec.expose_hive_metastore set to true.
143
What is the EXT_METASTORE_STATS_SYNC procedure used for?
Gathering enhanced statistics for Iceberg tables for Query Optimizer.
144
What authentication mode does Hive Metastore use in watsonx.data?
PLAIN authentication with username and password.
145
What is the spark.sql.catalogImplementation setting for watsonx.data?
hive for Hive Metastore integration.
146
How do you create an Iceberg table from Parquet data?
df.writeTo("catalog.schema.tablename").create() after reading the Parquet data into a DataFrame.
147
What is the manage-engines --op=list command for?
Listing available Presto engines in watsonx.data.
148
What Delta Lake catalog implementation does watsonx.data support?
org.apache.spark.sql.delta.catalog.DeltaCatalog.
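
Delta Lake also needs its session extension (io.delta.sql.DeltaSparkSessionExtension, per the Delta Lake documentation) in addition to the catalog class on this card. A minimal sketch, assuming the Delta catalog is registered as spark_catalog (the Delta Lake default; a watsonx.data instance may use a different catalog name) and that Delta and Iceberg share the comma-separated spark.sql.extensions list.

```python
ICEBERG_EXT = "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
DELTA_EXT = "io.delta.sql.DeltaSparkSessionExtension"


def add_delta(conf: dict[str, str], catalog: str = "spark_catalog") -> dict[str, str]:
    """Return a copy of `conf` with Delta Lake's extension and catalog added.

    spark.sql.extensions takes a comma-separated list, so an existing
    extension (for example Iceberg's) is kept rather than overwritten.
    """
    merged = dict(conf)
    current = [e for e in merged.get("spark.sql.extensions", "").split(",") if e]
    if DELTA_EXT not in current:
        current.append(DELTA_EXT)
    merged["spark.sql.extensions"] = ",".join(current)
    merged[f"spark.sql.catalog.{catalog}"] = "org.apache.spark.sql.delta.catalog.DeltaCatalog"
    return merged


conf = add_delta({"spark.sql.extensions": ICEBERG_EXT})
for key, value in conf.items():
    print(f"{key}={value}")
```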