Database Flashcards

(5 cards)

1
Q

SQL vs NoSQL

A

SQL databases are relational, offer strong consistency, and are suitable for structured data with fixed schemas.
* like an Excel sheet: rows and columns

NoSQL databases are non-relational, provide scalability and flexibility, and excel with unstructured data or when scalability is a priority.
* e.g. document-based stores, often used for analytics and logging
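
A minimal sketch of the contrast, assuming Python's built-in sqlite3 as the relational side and plain JSON documents as a stand-in for a document store. The users table and its fields are hypothetical.

```python
import json
import sqlite3

# Relational side: the schema is declared up front and enforced on every write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

try:
    # 'nickname' is not part of the schema, so the engine rejects the write.
    conn.execute("INSERT INTO users (id, name, nickname) VALUES (2, 'Bob', 'bobby')")
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True

# Document side: each record is a self-describing JSON document,
# and two documents in the same collection may carry different fields.
documents = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Bob", "nickname": "bobby", "tags": ["admin"]},
]
serialized = [json.dumps(doc) for doc in documents]
```

The relational write fails until the schema is changed, while the second document simply carries its extra fields with it.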

2
Q

How to write sargable queries

A

Search ARGument ABLE queries: queries written so that the database engine can use an index to search instead of scanning every row.

1- avoid using functions on indexed columns in the WHERE clause

2- use direct comparisons against the column instead of wrapping it in a function

3- if a function is unavoidable, use a computed column or a function-based index instead of wrapping the column in a function

non-sargable

WHERE YEAR(order_date) >= 2023

sargable

WHERE order_date >= '2023-01-01'
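
The difference can be checked with a query plan. The sketch below assumes SQLite (via Python's sqlite3), which has no YEAR() function, so strftime('%Y', ...) stands in for it. The orders table, index name, and sample dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT)")
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")
conn.executemany(
    "INSERT INTO orders (order_date) VALUES (?)",
    [("2022-06-15",), ("2022-12-31",), ("2023-01-01",), ("2023-08-20",)],
)

# Function wrapped around the indexed column: the index cannot be used to seek.
NON_SARGABLE = (
    "SELECT id FROM orders "
    "WHERE CAST(strftime('%Y', order_date) AS INTEGER) >= 2023"
)
# Direct comparison on the bare column: the engine can seek into the index.
SARGABLE = "SELECT id FROM orders WHERE order_date >= '2023-01-01'"


def plan(sql):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text


def ids(sql):
    """Return the matching order ids, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


print(plan(NON_SARGABLE))  # expected to report a SCAN of every row
print(plan(SARGABLE))      # expected to report a SEARCH using the index
```

Both queries return the same rows; only the access path differs. The exact plan wording varies between SQLite versions.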
3
Q

What is Elastic DB (Elasticsearch), and why do we need it?

A

It's a distributed search engine and document store for text/JSON data.

  • It indexes data at write time so that it is ready to be searched (faster queries)
  • It's distributed and horizontally scalable
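
Search engines such as Elasticsearch get their query speed from an inverted index that is built as documents are written. The toy sketch below illustrates that idea only. TinyIndex is a hypothetical class, not Elasticsearch's API, and it ignores distribution, scoring, and text analysis.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the set of document ids containing it."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def index(self, doc_id, text):
        # The work happens at write time: tokenize and record each term.
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        # A query intersects the posting sets instead of scanning every document.
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)


idx = TinyIndex()
idx.index(1, "Connection failed for user alice")
idx.index(2, "User bob logged in")
idx.index(3, "Connection restored")
```

For example, idx.search("connection failed") only touches the posting sets for the two terms, however many documents are stored.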
4
Q

SQL vs Spark Sql vs Presto (Trino)

A
  1. Standard SQL
    The SQL of traditional relational databases (e.g. MySQL, PostgreSQL). Optimized for transactional data on a single machine or small cluster.
  2. Spark SQL
    Used within the Apache Spark ecosystem to process data stored in formats like Parquet or Avro. It allows mixing SQL with programming languages like Python or Scala.
    Feature: Well suited to long-running ETL jobs, because Spark retries failed tasks instead of failing the whole job.
  3. Presto (Trino)
    A distributed, in-memory query engine designed for fast interactive queries. It is known for its connectors, which let you join a MySQL table with a Hive table in a single query.
    Feature: Federated queries across different types of databases.
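
To give a feel for a federated query, here is a rough analogy only, assuming two separate SQLite databases attached to one connection. Trino does the same kind of thing across different systems, addressing tables as catalog.schema.table. The table names and rows below are invented.

```python
import sqlite3

# Two independent databases reachable through one connection:
# 'main' plays the role of the MySQL source, 'warehouse' the Hive source.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

conn.execute("CREATE TABLE main.users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO main.users VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])

conn.execute("CREATE TABLE warehouse.events (user_id INTEGER, level TEXT)")
conn.executemany(
    "INSERT INTO warehouse.events VALUES (?, ?)",
    [(1, "ERROR"), (1, "INFO"), (2, "ERROR"), (1, "ERROR")],
)

# One query joins tables that live in different databases,
# each addressed by a qualified name.
errors_per_user = conn.execute(
    """
    SELECT u.name, COUNT(*) AS errors
    FROM main.users AS u
    JOIN warehouse.events AS e ON e.user_id = u.id
    WHERE e.level = 'ERROR'
    GROUP BY u.name
    ORDER BY u.name
    """
).fetchall()
```

In a real federated engine the two sides would be different storage systems reached through connectors, not two SQLite files.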

5
Q

What is a Hive table?

A

A Hive table creates a structured interface over raw files, allowing you to use SQL instead of writing complex code.
1. The Scenario
Imagine you have a 1TB log file stored in HDFS at /data/logs/server_logs.txt. The data looks like this:
101,2026-01-28,INFO,User logged in
102,2026-01-28,ERROR,Connection failed
Without Hive, you would need to write a MapReduce program in Java (hundreds of lines of code) just to count how many “ERROR” messages occurred.
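
With a table defined over the file, the same count becomes a single SQL statement. The sketch below uses SQLite as a stand-in for Hive and loads the two sample lines into it. In Hive itself the table would typically be declared with CREATE EXTERNAL TABLE over the HDFS directory, so the file is read in place. The column names (id, log_date, level, message) are assumptions, since the card does not name them.

```python
import csv
import io
import sqlite3

# The two sample lines from the scenario, standing in for the raw log file.
RAW_LINES = (
    "101,2026-01-28,INFO,User logged in\n"
    "102,2026-01-28,ERROR,Connection failed\n"
)

conn = sqlite3.connect(":memory:")
# The table definition is the "structured interface": it names and types
# the comma-separated fields (column names are hypothetical).
conn.execute(
    "CREATE TABLE server_logs (id INTEGER, log_date TEXT, level TEXT, message TEXT)"
)
conn.executemany(
    "INSERT INTO server_logs VALUES (?, ?, ?, ?)",
    csv.reader(io.StringIO(RAW_LINES)),
)

# Counting errors is now one declarative query instead of a custom program.
error_count = conn.execute(
    "SELECT COUNT(*) FROM server_logs WHERE level = 'ERROR'"
).fetchone()[0]
```

Unlike this stand-in, Hive does not copy the data into a database; the table is metadata pointing at the files.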
