SQL vs NoSQL
SQL databases are relational, offer strong consistency, and are suitable for structured data with fixed schemas.
* like an Excel sheet: rows and columns
NoSQL databases are non-relational, offer flexible schemas and horizontal scalability, and excel with unstructured data or when scale is the priority.
* e.g. document-based stores, often used for analytics and logging
How to write sargable queries
Sargable = Search ARGument ABLE: queries that let the DB engine use an index to search instead of scanning every row
1- avoid using functions on indexed columns in WHERE
2- use direct comparisons on the column instead of wrapping it in a function
3- if a function is unavoidable, use a computed column or a function-based index instead of wrapping the column in the query
non-sargable:
WHERE YEAR(order_date) >= 2023
sargable:
WHERE order_date >= '2023-01-01'
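
A small sketch to see the difference in a query plan, using Python's built-in sqlite3. Assumptions: the table `orders`, the index `idx_orders_date` and the sample rows are made up for the demo; dates are stored as ISO text so string comparison works; SQLite has no YEAR(), so strftime('%Y', ...) stands in for it. Other engines word their plans differently, but the idea is the same.

```python
import sqlite3

# Hypothetical demo table and index (names are made up for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")
conn.executemany(
    "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
    [("2022-12-31", 10.0), ("2023-01-01", 20.0), ("2023-06-15", 30.0)],
)

# Function wrapped around the indexed column: the index cannot be used.
NON_SARGABLE = (
    "SELECT * FROM orders "
    "WHERE CAST(strftime('%Y', order_date) AS INTEGER) >= 2023"
)
# Direct comparison on the indexed column: the index can be used.
SARGABLE = "SELECT * FROM orders WHERE order_date >= '2023-01-01'"


def plan(sql):
    """Return the human-readable part of SQLite's query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


non_sargable_plan = plan(NON_SARGABLE)
sargable_plan = plan(SARGABLE)
print("non-sargable:", non_sargable_plan)  # expect a full table SCAN
print("sargable:    ", sargable_plan)      # expect a SEARCH using the index

# Both queries return the same rows; only the access path differs.
non_sargable_rows = sorted(conn.execute(NON_SARGABLE).fetchall())
sargable_rows = sorted(conn.execute(SARGABLE).fetchall())
```

The exact plan text depends on the SQLite version, so look for SCAN vs SEARCH rather than an exact string.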
What is Elastic DB (Elasticsearch) and why do we need it
It's a search engine that stores data as JSON documents; we need it for fast full-text search.
SQL vs Spark SQL vs Presto (Trino)
Presto (Trino) feature: federated queries across different types of databases in a single SQL query.
A Hive table
A Hive table creates a structured interface over raw files, allowing you to use SQL instead of writing complex code.
1. The Scenario
Imagine you have a 1TB log file stored in HDFS at /data/logs/server_logs.txt. The data looks like this:
101,2026-01-28,INFO,User logged in
102,2026-01-28,ERROR,Connection failed
Without Hive, you would need to write a MapReduce program in Java (hundreds of lines of code) just to count how many “ERROR” messages occurred.