watsonx.data Flashcards

Question

What benefit does real-time streaming give to a European bank?

Answer 1

Reduced latency from days to 1 minute and £400K annual cost savings.

Answer 2

watsonx.data Intelligence Data Lineage.

Answer 3

Column-level lineage with transformation expressions.

Answer 4

By visualizing data flow and impact for audit readiness.

Answer 5

Open Lineage API for CI/CD integration.

Answer 6

watsonx.data Intelligence Data Governance.

Answer 7

IBM Large Language Models (LLMs).

Answer 8

190+ rules.

Answer 9

Industry-specific vocabularies that map regulations to data.

Answer 10

Standard Bank of South Africa (example).

Answer 11

Data-quality Service Level Agreements for critical data elements.

Answer 12

Self-service marketplace for reusable data products.

Answer 13

Up to 90% faster.

Answer 14

A marketplace-like UI for discovering and requesting data products.

Answer 15

Data Product Hub.

Answer 16

Integration with IBM and third-party tools for high-scale sharing.

Answer 17

Data contracts and usage guidelines.

Answer 18

Healthcare, Energy & Utilities, Insurance, Financial Services.

Answer 19

Trusted, high-quality data that meets regulatory standards.

Answer 20

Improved decision-making, reduced rework, and compliance assurance.

Answer 21

Data Architect, Data Engineer, Data Consumer.

Answer 22

Data that is clean, governed, and indexed for model training.

Answer 23

Through vector search, embeddings, and retrieval-augmented generation.

Answer 24

Consistent governance while accessing both structured and unstructured data.

Answer 25

watsonx.data Intelligence Data Lineage UI.

Answer 26

Higher SLA adherence and reduced downtime.

Answer 27

Composable pipelines in watsonx.data Integration.

Answer 28

Cost-aware execution planning and resource optimization.

Answer 29

By consolidating ETL, ELT, streaming, replication, and observability in one UI.

Answer 30

From weeks to days (or minutes in some cases).

Answer 31

watsonx.data Intelligence Data Governance (LLM-driven profiling).

Answer 32

A unified metric that maps usage across all integration capabilities.

Answer 33

RUs act like a gift card; they can be moved between capabilities, for example from streaming to batch.

Answer 34

Catalog, publish, and manage data products enterprise-wide.

Answer 35

watsonx.data Intelligence Data Sharing.

Answer 36

Ability to tailor data products without needing developer resources.

Answer 37

Pipeline latency, data freshness, schema changes, or null-value rates.

Answer 38

Proactive alerts before data quality issues impact downstream processes.

Answer 39

watsonx.data Intelligence Data Lineage.

Answer 40

Regulatory compliance and privacy protection.

Answer 41

watsonx.data Intelligence Data Sharing.

Answer 42

Around 30% lower total ownership costs.

Answer 43

watsonx.data Intelligence (core catalog).

Answer 44

Trusted, governed data that can be reliably used for model training.

Answer 45

By allowing unstructured data to be queried without predefined schemas.

Answer 46

watsonx.data Intelligence Data Sharing.

Answer 47

End-to-end governance from ideation to retirement.

Answer 48

Low-latency ingestion and processing via StreamSets.

Answer 49

A packaged, governed dataset or AI asset ready for consumption.

Answer 50

Reduces fines and reputational damage by ensuring policy adherence.

Answer 51

watsonx.data Intelligence Data Governance.

Answer 52

Integration with CI/CD pipelines for automated lineage capture.

Answer 53

watsonx.data Intelligence Data Governance.

Answer 54

Higher data-quality scores and faster issue remediation.

Answer 55

watsonx.data Integration composable pipelines.

Answer 56

Single interface to author, deploy, and monitor all pipeline types.

Answer 57

watsonx.data Integration Data Observability (Databand).

Answer 58

Faster discovery of data characteristics and anomalies.

Answer 59

Through the Data Product Hub marketplace architecture.

Answer 60

Understanding downstream effects of code or schema changes.

Answer 61

watsonx.data Integration.

Answer 62

Reduced vendor lock-in and easier migration across clouds.

Answer 63

watsonx.data Integration Data Observability.

Answer 64

Provide pre-built regulatory vocabularies for faster compliance.

Answer 65

Data Observability data-content checks.

Answer 66

Trusted, searchable data that improves retrieval-augmented generation accuracy.

Answer 67

watsonx.data Integration ETL and ELT.

Answer 68

Consistent model performance and reduced retraining cycles.

Answer 69

watsonx.data Integration smart execution and RU-based pricing.

Answer 70

From days to minutes.

Answer 71

watsonx.data Intelligence Data Governance.

Answer 72

Streamlined creation, publishing, and retirement of data assets.

Answer 73

watsonx.data (via integrated Milvus/DataStax vectors).

Answer 74

Ability to integrate third-party tools and platforms seamlessly.

Answer 75

Policy-driven access controls within the Data Product Hub.

Answer 76

Faster time-to-value for analytics and AI teams.

Answer 77

watsonx.data Intelligence Data Sharing (via CI/CD integration).

Answer 78

Reduces duplication and accelerates cross-team collaboration.

Answer 79

watsonx.data Intelligence Data Lineage.

Answer 80

Dynamic, context-aware validation that adapts to new data patterns.

Answer 81

watsonx.data Integration Real-time Streaming with observability.

Answer 82

Flexible allocation of consumption across multiple integration capabilities.

Answer 83

watsonx.data Intelligence Data Governance.

Answer 84

Deep integration with the control plane and automated incident management.

Answer 85

watsonx.data Intelligence Data Lineage UI.

Answer 86

30% lower total ownership cost and 90% faster use-case rollout.

Answer 87

Apache Iceberg.

Answer 88

Presto (Java or C++ variants).

Answer 89

The C++ implementation of Presto for improved performance.

Answer 90

Run SELECT * FROM system.runtime.nodes and look for an ibm-lh-prestissimo node.

Answer 91

Unified metadata management for catalogs and table schemas.

Answer 92

rewrite_data_files with target-file-size-bytes option.

Answer 93

CALL lakehouse.system.expire_snapshots with older_than and retain_last parameters.
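
To make the shape of that call concrete, here is a minimal sketch that only assembles the statement text. The helper name `expire_snapshots_sql` is hypothetical, and the `name => value` argument style and `TIMESTAMP` literal are assumptions modeled on the remove_orphan_files card; the exact syntax accepted may differ by engine and version. It does not validate or escape its inputs, so it is for illustration only.

```python
from datetime import datetime


def expire_snapshots_sql(table: str, older_than: datetime, retain_last: int,
                         catalog: str = "lakehouse") -> str:
    """Build the text of an expire_snapshots CALL (hypothetical helper).

    Only formats a string; nothing is sent to an engine.
    """
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    # Assumed timestamp literal format; check what your engine accepts.
    ts = older_than.strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{ts}', "
        f"retain_last => {retain_last})"
    )


statement = expire_snapshots_sql("schema.tablename", datetime(2024, 1, 1), 5)
print(statement)
```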

Answer 94

CALL lakehouse.system.remove_orphan_files(table => 'schema.tablename').

Answer 95

Optimizes Iceberg table manifest files for better query performance.

Answer 96

ALTER TABLE tablename ADD COLUMN(column_name data_type).

Answer 97

spark.sql.iceberg.vectorization.enabled set to false.

Answer 98

Running interactive SQL queries against Presto engines from the command line.

Answer 99

Set session enable_wxd_query_optimizer=true or is_query_rewriter_plugin_enabled=true.

Answer 100

s3a (S3/COS), abfss (ADLS), and gs (GCS).
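
A small sketch of how the scheme prefix identifies the storage type, using only the Python standard library. The bucket, container, and account names are made up, and the `storage_kind` helper is hypothetical.

```python
from urllib.parse import urlparse

# Scheme-to-store mapping taken from the card above.
SCHEMES = {"s3a": "S3/COS", "abfss": "ADLS", "gs": "GCS"}


def storage_kind(uri: str) -> str:
    """Return the storage label for a URI, based on its scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in SCHEMES:
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return SCHEMES[scheme]


# Placeholder locations, not real buckets or accounts.
print(storage_kind("s3a://example-bucket/warehouse/db"))
print(storage_kind("abfss://container@account.dfs.core.windows.net/db"))
print(storage_kind("gs://example-bucket/db"))
```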

Answer 101

Provides a containerized environment with pre-installed utilities for exploring the lakehouse.

Answer 102

oc patch wxd lakehouse with spec.shutdown set to true.

Answer 103

oc patch wxd lakehouse with spec.shutdown set to force.

Answer 104

ZenApiKey with base64-encoded username:apikey.
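
As a minimal sketch, the token can be built with Python's standard library. The username and API key below are placeholders, and the `Authorization: ZenApiKey <token>` header layout is an assumption about how the token is passed; confirm it against your deployment.

```python
import base64


def zen_api_key_token(username: str, apikey: str) -> str:
    """Return base64("username:apikey") as ASCII text."""
    raw = f"{username}:{apikey}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def auth_header(username: str, apikey: str) -> dict:
    """Build a request header dict (assumed header layout)."""
    return {"Authorization": f"ZenApiKey {zen_api_key_token(username, apikey)}"}


# Placeholder credentials for illustration only.
token = zen_api_key_token("admin", "secret")
headers = auth_header("admin", "secret")
```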

Answer 105

Ingesting data files from the local file system into the watsonx.data lakehouse.

Answer 106

INI format with global-ingest-config and ingest-config sections.
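
The two-section layout can be illustrated with Python's `configparser`. Only the section names come from the card; every key and value in the sample is hypothetical and stands in for whatever the ingestion tool actually expects.

```python
import configparser

# Section names follow the card; the keys and values are invented placeholders.
SAMPLE = """
[global-ingest-config]
target-table = demo_catalog.demo_schema.demo_table

[ingest-config]
source-file = /tmp/example.parquet
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

sections = parser.sections()
global_cfg = dict(parser["global-ingest-config"])
ingest_cfg = dict(parser["ingest-config"])
print(sections, global_cfg, ingest_cfg)
```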

Answer 107

S3 path used for temporary storage during data ingestion.

Answer 108

Append ?context=df to the aws.data.ibm.com URL before logging in.

Answer 109

org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions.

Answer 110

spark.sql("CREATE DATABASE IF NOT EXISTS catalog.dbname LOCATION 's3path'").

Answer 111

tablename.files (e.g. SELECT file_path FROM schema.table.files).

Answer 112

tablename.snapshots (e.g. SELECT snapshot_id FROM schema.table.snapshots).

Answer 113

209715200 bytes (200MB).
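
The byte count uses binary units, so "200MB" here means 200 × 1024 × 1024 bytes. A quick check:

```python
MIB = 1024 * 1024          # bytes in one mebibyte
target_bytes = 200 * MIB   # 200 MiB expressed in bytes
print(target_bytes)        # 209715200
```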

Answer 114

SHOW DATABASES FROM catalogname.

Answer 115

Identifies the specific watsonx.data instance for API requests.

Answer 116

connect-lh --op=add with name, host, port, username, and password parameters.

Answer 117

oc patch wxd/lakehouse with spec.expose_hive_metastore set to true.

Answer 118

Gathering enhanced statistics on Iceberg tables for the Query Optimizer.

Answer 119

PLAIN authentication with username and password.

Answer 120

hive for Hive Metastore integration.

Answer 121

df.writeTo("catalog.schema.tablename").create() after reading Parquet into a DataFrame.

Answer 122

Listing available Presto engines in watsonx.data.

Answer 123

org.apache.spark.sql.delta.catalog.DeltaCatalog.

(148 cards)