How do you choose the right database for a new project and why?
Assess requirements:
* Data structure (relational vs. non-relational)
* Scalability and performance needs
* Transaction support and consistency
* Integration with existing systems
* Team expertise and support
Relational DB (e.g., PostgreSQL, MySQL):
* Use for structured data, complex queries, and ACID transactions.
NoSQL DB (e.g., MongoDB, Cassandra):
* Use for flexible schemas, high scalability, or large volumes of unstructured data.
What are transactions?
A transaction is a sequence of operations performed as a single logical unit of work.
Ensures ACID properties:
* Atomicity: All operations succeed or none do.
* Consistency: Data remains valid before and after.
* Isolation: Concurrent transactions don’t interfere.
* Durability: Once committed, changes are permanent.
What is ACID?
ACID stands for four key properties of database transactions:
* Atomicity: All operations in a transaction are completed or none are.
* Consistency: Transactions bring the database from one valid state to another.
* Isolation: Concurrent transactions do not affect each other’s outcome.
* Durability: Once a transaction is committed, changes are permanent.
What is the difference between the ACID and BASE models?
ACID:
* Ensures strong consistency with Atomicity, Consistency, Isolation, and Durability.
* Used in traditional relational databases.
BASE:
* Stands for Basically Available, Soft state, Eventual consistency.
* Focuses on high availability and scalability, allowing temporary inconsistencies.
* Used in many NoSQL databases.
How would you store and process PII data?
Store securely:
* Use encryption at rest and in transit.
* Restrict access with strong authentication and authorization.
* Store only necessary PII, following data minimization principles.
Process carefully:
* Mask or anonymize data when possible.
* Log access and changes for auditability.
* Comply with regulations (e.g., GDPR, CCPA).
* Regularly review and update security measures.
How is propagation and isolation important in transactions?
Propagation:
* Controls how transactions relate when one transactional method calls another.
* Determines if a new transaction starts, joins an existing one, or runs non-transactionally (e.g., REQUIRED, REQUIRES_NEW).
Isolation:
* Defines how/when changes made by one transaction are visible to others.
* Prevents issues like dirty reads, non-repeatable reads, and phantom reads (e.g., READ_COMMITTED, SERIALIZABLE).
How does Hibernate/JPA handle mapping of class hierarchies onto relational tables?
Supports three main inheritance strategies:
* Single Table: All classes in the hierarchy are mapped to one table (uses a discriminator column).
* Joined: Each class has its own table; tables are joined for queries.
* Table per Class: Each concrete class has its own table with all fields.
Choose a strategy with @Inheritance annotation.
What database transaction isolation levels are there?
Read Uncommitted:
* Allows dirty reads (can see uncommitted changes).
Read Committed:
* Only sees committed data; prevents dirty reads.
Repeatable Read:
* Same query returns the same data within a transaction; prevents non-repeatable reads.
Serializable:
* Highest level; transactions are fully isolated, preventing dirty, non-repeatable, and phantom reads.
Would you use JDBC on your next project? Why or why not?
Not directly, unless:
* The project is very simple or requires fine-grained control over SQL.
Prefer using:
* Higher-level abstractions like Spring Data JPA or JdbcTemplate for easier development, less boilerplate, and better maintainability.
Why:
* ORMs and templates handle resource management, mapping, and error handling, reducing manual effort and risk of bugs.
What is the difference between inner join and full outer join?
Inner Join:
Returns only rows with matching values in both tables.
Full Outer Join:
Returns all rows from both tables; unmatched rows are filled with NULL for missing values.
What is a query plan? How can it help in performance investigations?
Query plan:
* A detailed roadmap generated by the database showing how it will execute a SQL query (e.g., which indexes, joins, and scan methods will be used).
Helps by:
* Identifying slow operations (like full table scans).
* Revealing missing indexes or inefficient joins.
* Guiding query optimization for better performance.
How do you encrypt data sent to the database?
Use TLS/SSL:
* Enable encrypted connections between your application and the database to protect data in transit.
Application-level encryption:
* Encrypt sensitive data in your application code before sending it to the database.
Database-level encryption:
* Use database features like Transparent Data Encryption (TDE) to encrypt data at rest.
What are the different statement types in SQL and when would you use each?
DDL (Data Definition Language):
* Defines or alters database structure (e.g., CREATE, ALTER, DROP).
* Use when creating or modifying tables, schemas, or indexes.
DML (Data Manipulation Language):
* Manages data within tables (e.g., SELECT, INSERT, UPDATE, DELETE).
* Use for querying or changing data.
DCL (Data Control Language):
* Controls access to data (e.g., GRANT, REVOKE).
* Use for managing permissions and security.
TCL (Transaction Control Language):
* Manages transactions (e.g., COMMIT, ROLLBACK, SAVEPOINT).
* Use for controlling transaction boundaries.
What is normalization and denormalization? When would you denormalize?
Normalization:
* Organizing data to reduce redundancy and improve data integrity (using normal forms).
Denormalization:
* Introducing redundancy by combining tables or duplicating data to improve read performance.
When to denormalize:
* When query performance is critical and joins are too costly.
In reporting, analytics, or read-heavy systems.
What is lazy loading in Hibernate and how does it work?
Lazy loading:
* Associated entities or collections are not loaded from the database until they are accessed.
How it works:
* Hibernate creates proxy objects for associations.
* When you access the association, Hibernate fetches the data on demand.
Which mapping annotations are lazy by default in Hibernate?
@OneToMany and @ManyToMany associations are lazy by default.
@ManyToOne and @OneToOne are eager by default, but can be set to lazy with fetch = FetchType.LAZY.
What are the different fetching strategies in Hibernate?
Lazy fetching:
* Data is loaded only when accessed (on demand).
Eager fetching:
* Data is loaded immediately with the parent entity.
Fetch modes:
* SELECT: Separate SQL queries for associations.
* JOIN: Uses SQL joins to fetch associations in a single query.
* SUBSELECT: Loads collections for all parent entities in a single subselect query.
How do you handle the mapping between your application model and database model?
Use ORM frameworks (like Hibernate/JPA) with annotations such as @Entity, @Table, and @Column to map Java classes and fields to database tables and columns.
Define relationships using annotations like @OneToMany, @ManyToOne, etc.
Use DTOs or projections to transfer only required data and decouple domain models from database schemas.
What are the benefits and trade-offs of normalization?
Benefits:
* Reduces data redundancy.
* Improves data integrity and consistency.
* Simplifies updates and maintenance.
Trade-offs:
* Can require complex queries with multiple joins.
* May reduce read performance in analytics or reporting scenarios.
* Sometimes denormalization is needed for performance optimization.
Can a primary key be non-unique or empty? Can a foreign key be non-unique or empty?
Primary key:
Must be unique and cannot be empty (NULL); every row must have a unique, non-null value.
Foreign key:
* Can be non-unique (multiple rows can reference the same value).
* Can be empty (NULL) if the relationship is optional.
What is the downside of indexes? When should we avoid them?
Downsides:
* Indexes use extra storage space.
* Slow down write operations (INSERT, UPDATE, DELETE) due to index maintenance.
Avoid indexes:
* On columns with high update/delete frequency.
* On columns with low selectivity (many duplicate values).
* When storage or write performance is a concern.
What types of NoSQL databases are there?
Document stores: Store data as documents (e.g., MongoDB, CouchDB).
Key-value stores: Store data as key-value pairs (e.g., Redis, DynamoDB).
Column-family stores: Store data in columns rather than rows (e.g., Cassandra, HBase).
Graph databases: Store data as nodes and relationships (e.g., Neo4j, Amazon Neptune).
How would you optimize a slow-performing SQL query? Can you share an example?
Optimization steps:
* Analyze the query plan to find bottlenecks.
* Add or adjust indexes on columns used in WHERE, JOIN, or ORDER BY clauses.
* Rewrite the query to reduce unnecessary joins or subqueries.
* Select only needed columns (avoid SELECT *).
* Use proper filtering and limit result sets.
Example:
* If a query joining orders and customers is slow, add an index on orders.customer_id and select only required fields.
What other approaches can you use to improve database performance?
Caching: Store frequently accessed data in memory.
Connection pooling: Reuse database connections to reduce overhead.
Partitioning/sharding: Split large tables across multiple disks or servers.
Denormalization: Reduce joins by duplicating data for faster reads.
Archiving old data: Move rarely used data out of main tables.
Optimize hardware: Use faster disks, more memory, or better CPUs.