Databases & Persistence Flashcards by Ádám Demeter

How do you choose the right database for a new project and why?

Assess requirements:
* Data structure (relational vs. non-relational)
* Scalability and performance needs
* Transaction support and consistency
* Integration with existing systems
* Team expertise and support

Relational DB (e.g., PostgreSQL, MySQL):
* Use for structured data, complex queries, and ACID transactions.

NoSQL DB (e.g., MongoDB, Cassandra):
* Use for flexible schemas, high scalability, or large volumes of unstructured data.

How well did you know this?

Not at all

Perfectly

What are transactions?

A transaction is a sequence of operations performed as a single logical unit of work.

Ensures ACID properties:
* Atomicity: All operations succeed or none do.
* Consistency: Data remains valid before and after.
* Isolation: Concurrent transactions don’t interfere.
* Durability: Once committed, changes are permanent.

How well did you know this?

Not at all

Perfectly

What is ACID?

ACID stands for four key properties of database transactions:
* Atomicity: All operations in a transaction are completed or none are.
* Consistency: Transactions bring the database from one valid state to another.
* Isolation: Concurrent transactions do not affect each other’s outcome.
* Durability: Once a transaction is committed, changes are permanent.

How well did you know this?

Not at all

Perfectly

What is the difference between the ACID and BASE models?

ACID:
* Ensures strong consistency with Atomicity, Consistency, Isolation, and Durability.
* Used in traditional relational databases.

BASE:
* Stands for Basically Available, Soft state, Eventual consistency.
* Focuses on high availability and scalability, allowing temporary inconsistencies.
* Used in many NoSQL databases.

How well did you know this?

Not at all

Perfectly

How would you store and process PII data?

Store securely:
* Use encryption at rest and in transit.
* Restrict access with strong authentication and authorization.
* Store only necessary PII, following data minimization principles.

Process carefully:
* Mask or anonymize data when possible.
* Log access and changes for auditability.
* Comply with regulations (e.g., GDPR, CCPA).
* Regularly review and update security measures.

How well did you know this?

Not at all

Perfectly

How is propagation and isolation important in transactions?

Propagation:
* Controls how transactions relate when one transactional method calls another.
* Determines if a new transaction starts, joins an existing one, or runs non-transactionally (e.g., REQUIRED, REQUIRES_NEW).

Isolation:
* Defines how/when changes made by one transaction are visible to others.
* Prevents issues like dirty reads, non-repeatable reads, and phantom reads (e.g., READ_COMMITTED, SERIALIZABLE).

How well did you know this?

Not at all

Perfectly

How does Hibernate/JPA handle mapping of class hierarchies onto relational tables?

Supports three main inheritance strategies:
* Single Table: All classes in the hierarchy are mapped to one table (uses a discriminator column).
* Joined: Each class has its own table; tables are joined for queries.
* Table per Class: Each concrete class has its own table with all fields.

Choose a strategy with @Inheritance annotation.

How well did you know this?

Not at all

Perfectly

What database transaction isolation levels are there?

Read Uncommitted:
* Allows dirty reads (can see uncommitted changes).

Read Committed:
* Only sees committed data; prevents dirty reads.

Repeatable Read:
* Same query returns the same data within a transaction; prevents non-repeatable reads.

Serializable:
* Highest level; transactions are fully isolated, preventing dirty, non-repeatable, and phantom reads.

How well did you know this?

Not at all

Perfectly

Would you use JDBC on your next project? Why or why not?

Not directly, unless:
* The project is very simple or requires fine-grained control over SQL.

Prefer using:
* Higher-level abstractions like Spring Data JPA or JdbcTemplate for easier development, less boilerplate, and better maintainability.

Why:
* ORMs and templates handle resource management, mapping, and error handling, reducing manual effort and risk of bugs.

How well did you know this?

Not at all

Perfectly

What is the difference between inner join and full outer join?

Inner Join:
Returns only rows with matching values in both tables.

Full Outer Join:
Returns all rows from both tables; unmatched rows are filled with NULL for missing values.

How well did you know this?

Not at all

Perfectly

What is a query plan? How can it help in performance investigations?

Query plan:
* A detailed roadmap generated by the database showing how it will execute a SQL query (e.g., which indexes, joins, and scan methods will be used).

Helps by:
* Identifying slow operations (like full table scans).
* Revealing missing indexes or inefficient joins.
* Guiding query optimization for better performance.

How well did you know this?

Not at all

Perfectly

How do you encrypt data sent to the database?

Use TLS/SSL:
* Enable encrypted connections between your application and the database to protect data in transit.

Application-level encryption:
* Encrypt sensitive data in your application code before sending it to the database.

Database-level encryption:
* Use database features like Transparent Data Encryption (TDE) to encrypt data at rest.

How well did you know this?

Not at all

Perfectly

What are the different statement types in SQL and when would you use each?

DDL (Data Definition Language):
* Defines or alters database structure (e.g., CREATE, ALTER, DROP).
* Use when creating or modifying tables, schemas, or indexes.

DML (Data Manipulation Language):
* Manages data within tables (e.g., SELECT, INSERT, UPDATE, DELETE).
* Use for querying or changing data.

DCL (Data Control Language):
* Controls access to data (e.g., GRANT, REVOKE).
* Use for managing permissions and security.

TCL (Transaction Control Language):
* Manages transactions (e.g., COMMIT, ROLLBACK, SAVEPOINT).
* Use for controlling transaction boundaries.

How well did you know this?

Not at all

Perfectly

What is normalization and denormalization? When would you denormalize?

Normalization:
* Organizing data to reduce redundancy and improve data integrity (using normal forms).

Denormalization:
* Introducing redundancy by combining tables or duplicating data to improve read performance.

When to denormalize:
* When query performance is critical and joins are too costly.
In reporting, analytics, or read-heavy systems.

How well did you know this?

Not at all

Perfectly

What is lazy loading in Hibernate and how does it work?

Lazy loading:
* Associated entities or collections are not loaded from the database until they are accessed.

How it works:
* Hibernate creates proxy objects for associations.
* When you access the association, Hibernate fetches the data on demand.

How well did you know this?

Not at all

Perfectly

Which mapping annotations are lazy by default in Hibernate?

Study These Flashcards

@OneToMany and @ManyToMany associations are lazy by default.

@ManyToOne and @OneToOne are eager by default, but can be set to lazy with fetch = FetchType.LAZY.

What are the different fetching strategies in Hibernate?

Study These Flashcards

Lazy fetching:
* Data is loaded only when accessed (on demand).

Eager fetching:
* Data is loaded immediately with the parent entity.

Fetch modes:
* SELECT: Separate SQL queries for associations.
* JOIN: Uses SQL joins to fetch associations in a single query.
* SUBSELECT: Loads collections for all parent entities in a single subselect query.

How do you handle the mapping between your application model and database model?

Study These Flashcards

Use ORM frameworks (like Hibernate/JPA) with annotations such as @Entity, @Table, and @Column to map Java classes and fields to database tables and columns.

Define relationships using annotations like @OneToMany, @ManyToOne, etc.

Use DTOs or projections to transfer only required data and decouple domain models from database schemas.

What are the benefits and trade-offs of normalization?

Study These Flashcards

Benefits:
* Reduces data redundancy.
* Improves data integrity and consistency.
* Simplifies updates and maintenance.

Trade-offs:
* Can require complex queries with multiple joins.
* May reduce read performance in analytics or reporting scenarios.
* Sometimes denormalization is needed for performance optimization.

Can a primary key be non-unique or empty? Can a foreign key be non-unique or empty?

Study These Flashcards

Primary key:
Must be unique and cannot be empty (NULL); every row must have a unique, non-null value.

Foreign key:
* Can be non-unique (multiple rows can reference the same value).
* Can be empty (NULL) if the relationship is optional.

What is the downside of indexes? When should we avoid them?

Study These Flashcards

Downsides:
* Indexes use extra storage space.
* Slow down write operations (INSERT, UPDATE, DELETE) due to index maintenance.

Avoid indexes:
* On columns with high update/delete frequency.
* On columns with low selectivity (many duplicate values).
* When storage or write performance is a concern.

What types of NoSQL databases are there?

Study These Flashcards

Document stores: Store data as documents (e.g., MongoDB, CouchDB).

Key-value stores: Store data as key-value pairs (e.g., Redis, DynamoDB).

Column-family stores: Store data in columns rather than rows (e.g., Cassandra, HBase).

Graph databases: Store data as nodes and relationships (e.g., Neo4j, Amazon Neptune).

How would you optimize a slow-performing SQL query? Can you share an example?

Study These Flashcards

Optimization steps:
* Analyze the query plan to find bottlenecks.
* Add or adjust indexes on columns used in WHERE, JOIN, or ORDER BY clauses.
* Rewrite the query to reduce unnecessary joins or subqueries.
* Select only needed columns (avoid SELECT *).
* Use proper filtering and limit result sets.

Example:
* If a query joining orders and customers is slow, add an index on orders.customer_id and select only required fields.

What other approaches can you use to improve database performance?

Study These Flashcards

Caching: Store frequently accessed data in memory.

Connection pooling: Reuse database connections to reduce overhead.

Partitioning/sharding: Split large tables across multiple disks or servers.

Denormalization: Reduce joins by duplicating data for faster reads.

Archiving old data: Move rarely used data out of main tables.

Optimize hardware: Use faster disks, more memory, or better CPUs.

In the scenario where performance degrades after migrating from Oracle SQL to PostgreSQL, what steps would you take to identify and resolve bottlenecks?

**Analyze query plans**: Compare execution plans in PostgreSQL to Oracle to spot differences. **Check indexing**: Ensure indexes are present and optimized for PostgreSQL. **Review SQL syntax**: Adapt queries for PostgreSQL-specific features and optimizations. **Tune configuration**: Adjust PostgreSQL settings (e.g., work_mem, shared_buffers) for your workload. **Monitor resource usage**: Check CPU, memory, and disk I/O for bottlenecks. **Test and iterate**: Benchmark changes and monitor improvements.

After decomposing your app, each microservice has its own DB. What happens if one service fails to update its data? How do you ensure consistency?

If one service fails: * Data can become inconsistent across services (no distributed transaction by default). Ensuring consistency: * Use patterns like sagas (orchestration/choreography) to manage distributed transactions. * Implement event-driven architecture with reliable messaging and compensation actions. * Use idempotent operations and retries to handle failures gracefully.

Imagine two services: User (manages user info) and Notification (sends notifications on data change). What type of database would you choose for each and why?

**User service**: Use a relational database (e.g., PostgreSQL, MySQL) for structured user data, strong consistency, and support for complex queries and transactions. **Notification service**: Use a NoSQL database (e.g., MongoDB, Redis) for flexible schema, high write throughput, and fast access—ideal for storing and processing notification events.

You are tasked with migrating an existing SQL-based application to NoSQL. What factors and challenges do you consider?

Factors: * Data model differences (schema-less vs. structured tables) * Query requirements (NoSQL may lack complex joins/transactions) * Consistency needs (eventual vs. strong consistency) * Scalability and performance goals * Integration with existing systems * Team expertise with NoSQL technologies Challenges: * Data migration and transformation * Rewriting queries and application logic * Ensuring data integrity and consistency * Handling transactions and relationships differently * Adapting to new backup, monitoring, and security practices

What is CAP theorem?

CAP theorem states that a distributed system can only guarantee two out of three properties at the same time: * **Consistency**: Every read receives the most recent write or an error. * **Availability**: Every request receives a (non-error) response, even if some nodes are down. * **Partition tolerance**: The system continues to operate despite network partitions.

What is BASE?

BASE stands for: **Basically Available**: The system guarantees availability. **Soft state**: The state of the system may change over time, even without input. **Eventual consistency**: The system will become consistent over time, given no new updates. Used in many NoSQL databases to achieve high availability and scalability, trading off strong consistency.

Databases & Persistence Flashcards

(30 cards)