What is data transformation?
Data transformation is the process of converting raw data into a clean, structured, and usable format.
Example (Python – change column format):
import pandas as pd
df = pd.DataFrame({'date': ['2025-01-01', '2025-01-02']})
df['date'] = pd.to_datetime(df['date'])
Why is data transformation important?
It ensures data consistency, quality, and compatibility for analysis and reporting.
Practical example:
Converting text-based dates into date format for reporting.
What are common types of data transformation?
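Common types include normalization, aggregation, filtering, mapping, format conversion, schema transformation, standardization, and enrichment. Each is covered in the questions that follow.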
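Difference between data transformation and data cleansing?
Cleansing fixes or removes bad data (nulls, duplicates, errors); transformation changes the format or structure of the data.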
Example (SQL – cleansing + transformation):
SELECT DISTINCT UPPER(name) AS name
FROM customers
WHERE name IS NOT NULL;
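Python sketch of the same two steps, assuming plain Python lists rather than a database. The sample names are made up. Dropping missing values and duplicates is the cleansing part; upper-casing is the transformation part.

```python
# Hypothetical sample data; None stands in for SQL NULL.
names = ["alice", "Bob", None, "ALICE", "bob"]

# Cleansing: drop missing values (WHERE name IS NOT NULL).
present = [n for n in names if n is not None]

# Transformation: convert every name to upper case (UPPER(name)).
upper = [n.upper() for n in present]

# Cleansing: remove duplicates, keeping first-seen order (DISTINCT).
unique_names = list(dict.fromkeys(upper))

print(unique_names)
```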
When is data transformation performed?
After data collection and before data analysis (ETL process).
This step is crucial for ensuring data is ready for analysis.
What is ETL?
ETL stands for Extract, Transform, Load.
Azure Example:
Extract: Azure Data Factory pulls data from SQL Server.
Transform: Mapping Data Flow cleans data.
Load: Store data in Azure Data Lake or Synapse.
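Python sketch of the three ETL steps, kept local so it can run anywhere. This is only an illustration, not how Azure Data Factory works internally: the CSV text, the sales table, and the function names are all assumptions, and an in-memory SQLite database stands in for the target store.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a source system.
RAW_CSV = "id,name,amount\n1, alice ,10.5\n2,bob,\n3,carol,7\n"

def extract(text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and title-case names, skip rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # no amount recorded, so the row is dropped
        cleaned.append(
            (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        )
    return cleaned

def load(rows, conn):
    """Load: write the transformed rows into a target table."""
    conn.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
print(loaded)
```

Keeping each step in its own function makes it possible to test the transform logic without touching the source or the target.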
What is data normalization?
Scaling data to a standard range or structure.
Python Example (Min-Max scaling):
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
df[['salary']] = scaler.fit_transform(df[['salary']])
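Python sketch of the arithmetic behind min-max scaling, without any library: each value x becomes (x - min) / (max - min). The function name and the sample salaries are made up, and returning zeros when all values are equal is a choice made here to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal; return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

salaries = [30000, 45000, 60000]  # hypothetical sample values
scaled = min_max_scale(salaries)
print(scaled)  # [0.0, 0.5, 1.0]
```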
What is data aggregation?
Summarizing data (sum, count, average).
SQL Example:
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
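Python sketch of the same group-then-average logic, assuming the rows are already in memory as tuples. The departments and salaries are made-up sample data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (department, salary) rows.
employees = [("IT", 5000), ("IT", 7000), ("HR", 4000)]

# Group salaries by department.
by_department = defaultdict(list)
for department, salary in employees:
    by_department[department].append(salary)

# Average each group, as AVG(salary) with GROUP BY department does.
avg_salary = {dept: mean(vals) for dept, vals in by_department.items()}
print(avg_salary)
```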
What is data filtering?
Selecting only required data.
SQL Example:
SELECT * FROM orders
WHERE order_date >= '2025-01-01';
What is data mapping?
Matching fields from source to destination.
Example:
cust_id → customer_id, dob → birth_date.
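Python sketch of applying that mapping to one record. The mapping pairs come from the example above; the function name, the extra city field, and the sample values are assumptions for illustration.

```python
# Mapping from the example above: source field -> destination field.
FIELD_MAP = {"cust_id": "customer_id", "dob": "birth_date"}

def map_fields(record, field_map):
    """Rename keys found in field_map; leave other keys unchanged."""
    return {field_map.get(key, key): value for key, value in record.items()}

source_row = {"cust_id": 42, "dob": "1990-05-17", "city": "Pune"}  # hypothetical record
mapped_row = map_fields(source_row, FIELD_MAP)
print(mapped_row)
```

Keeping the mapping in a dictionary means a new field pair can be added without changing the code.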
What is data format conversion?
Changing data from one format to another.
Python Example:
df['phone'] = df['phone'].astype(str)
What is schema transformation?
Changing table structure (columns, data types).
SQL Example:
ALTER TABLE employees
ADD email VARCHAR(100);
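Python sketch that runs a similar schema change against an in-memory SQLite database and then lists the columns. SQLite is an assumption made so the example is runnable; the exact ALTER TABLE syntax varies between database systems, and the starting table here is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")

# Add the new column; the COLUMN keyword is used here for SQLite.
conn.execute("ALTER TABLE employees ADD COLUMN email VARCHAR(100)")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]
print(columns)  # ['id', 'name', 'email']
```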
Difference between structured and unstructured data transformation?
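Structured data (tables with a fixed schema) is transformed with SQL or dataframe operations. Unstructured data (text, images, logs) must first be parsed or have fields extracted before it fits a schema.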
What is data standardization?
Making data consistent by converting equivalent values to a single agreed representation.
SQL Example:
UPDATE customers
SET country = 'USA'
WHERE country IN ('US', 'United States');
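Python sketch of the same idea using a lookup of known variants. The alias pairs come from the SQL example; the function name and sample values are made up, and trimming whitespace is an extra step added here that the SQL does not perform.

```python
# Variants that should all be stored as one canonical value.
COUNTRY_ALIASES = {"US": "USA", "United States": "USA"}

def standardize_country(value):
    """Return the canonical country name, or the trimmed input if no alias matches."""
    cleaned = value.strip()
    return COUNTRY_ALIASES.get(cleaned, cleaned)

countries = ["US", "United States ", "USA", "India"]  # hypothetical column values
standardized = [standardize_country(c) for c in countries]
print(standardized)  # ['USA', 'USA', 'USA', 'India']
```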
What is data enrichment?
Adding extra information to existing data.
Example:
Adding city names using ZIP codes via external API.
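Python sketch of enrichment by lookup. A small in-memory dictionary stands in for the external API, which is an assumption made so the example runs offline; the function name and customer records are also made up.

```python
# Hypothetical lookup table; a real pipeline might call an external service instead.
ZIP_TO_CITY = {"10001": "New York", "94105": "San Francisco"}

def enrich_with_city(record, lookup):
    """Return a copy of the record with a 'city' field added (None if the ZIP is unknown)."""
    enriched = dict(record)
    enriched["city"] = lookup.get(record["zip"])
    return enriched

customers = [{"name": "Ann", "zip": "10001"}, {"name": "Raj", "zip": "00000"}]
enriched_customers = [enrich_with_city(c, ZIP_TO_CITY) for c in customers]
print(enriched_customers)
```

Returning a copy leaves the source records untouched, and an unknown ZIP code gives None rather than raising an error.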
What are tools used for data transformation?
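Common tools include SQL, Python libraries such as pandas and scikit-learn, and cloud services such as Azure Data Factory, Azure Stream Analytics, and Azure Synapse.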
Role of SQL in data transformation?
Used for filtering, joining, and aggregating data.
Example:
SELECT c.name, o.total
FROM customers c
JOIN orders o ON c.id = o.customer_id;
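Difference between batch and real-time transformation?
Batch transformation processes accumulated data at scheduled intervals; real-time (streaming) transformation processes each record as it arrives.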
Azure Example:
Azure Stream Analytics for real-time data.
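Python sketch contrasting the two styles, with the same per-record transformation in both. This is a toy illustration and not how Azure Stream Analytics is implemented: the function names and the temperature readings are made up, and a generator stands in for a live stream.

```python
def to_celsius(reading):
    """Per-record transformation: Fahrenheit to Celsius, rounded to 1 decimal."""
    return round((reading - 32) * 5 / 9, 1)

def transform_batch(readings):
    """Batch: receive the whole collected dataset and return all results at once."""
    return [to_celsius(r) for r in readings]

def transform_stream(readings):
    """Real-time style: yield each result as soon as its record is read."""
    for r in readings:
        yield to_celsius(r)

readings_f = [32, 212, 50]  # hypothetical sensor readings

batch_result = transform_batch(readings_f)

stream_result = []
for value in transform_stream(iter(readings_f)):
    # Each value is available before the next record is read.
    stream_result.append(value)

print(batch_result, stream_result)
```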
What is cloud-based data transformation?
Transformation done using cloud services.
Azure Example:
Azure Data Factory Mapping Data Flow.
What is a data pipeline?
Automated flow of data from source to destination.
Azure Example:
Source → Azure Data Factory (ADF) → Data Lake → Synapse.
How does transformation affect data quality?
Improves accuracy, consistency, and usability.
This is essential for reliable analysis.
What are challenges in data transformation?
Common challenges include inconsistent source formats, missing or invalid values, large data volumes, and schemas that change over time.
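How to handle missing values?
Either remove the affected rows or fill the gaps with a default or computed value (such as 0 or the column mean).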
Python Example:
df.fillna(0, inplace=True)
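Python sketch of two fill strategies on a plain list, where None marks a missing value. The function name, the strategy parameter, and the sample ages are made up. Filling with 0 can distort averages, so the mean of the known values is shown as one alternative.

```python
from statistics import mean

def fill_missing(values, strategy="zero"):
    """Replace None entries with 0 or with the mean of the known values."""
    known = [v for v in values if v is not None]
    if strategy == "mean" and known:
        fill = mean(known)
    else:
        fill = 0
    return [fill if v is None else v for v in values]

ages = [30, None, 50]  # hypothetical column with one missing value
filled_zero = fill_missing(ages)
filled_mean = fill_missing(ages, strategy="mean")
print(filled_zero, filled_mean)  # [30, 0, 50] [30, 40, 50]
```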
What is data validation?
Checking that transformed data meets the defined rules, for example by querying for rows that violate them.
SQL Example:
SELECT * FROM employees
WHERE salary < 0;
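Python sketch of a rule check that reports which rows fail and why. The negative-salary rule comes from the SQL example; the missing-salary rule, the function name, and the sample rows are assumptions added for illustration.

```python
def find_invalid(rows):
    """Return (row, reason) pairs for rows that break a rule."""
    problems = []
    for row in rows:
        if row["salary"] is None:
            problems.append((row, "salary is missing"))
        elif row["salary"] < 0:
            problems.append((row, "salary is negative"))
    return problems

# Hypothetical rows: one valid, one negative, one missing.
employees = [
    {"id": 1, "salary": 5000},
    {"id": 2, "salary": -100},
    {"id": 3, "salary": None},
]
invalid = find_invalid(employees)
print(invalid)
```

An empty result means every row passed; anything returned can be logged or routed for correction before loading.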