How do you find the top N records per group in SQL?
You use a window function like ROW_NUMBER or RANK partitioned by the group and ordered by the value you care about. Then you keep only rows where the rank is less than or equal to N. This lets each group independently pick its top records.
What problem does the Top-N-per-group query solve?
Sometimes you want the best results within each category rather than across the entire dataset. For example the top 3 products in each region.
How can SQL detect duplicate rows?
You group rows by the columns that should be unique and count how many rows exist. If the count is greater than one then duplicates exist.
How do you remove duplicates in SQL?
You usually assign ROW_NUMBER within groups of duplicate rows and keep only row number 1. This keeps one clean record while removing extra copies.
How can SQL calculate a running total?
You use a window function with SUM and an ordered window. This adds values progressively so each row sees the sum of all previous rows.
What is a running total useful for?
It helps track cumulative values over time like total sales growing day by day.
How do you find gaps in sequences in SQL?
You compare each row to the previous row using window functions like LAG. If the difference between numbers is larger than expected then a gap exists.
Why is detecting gaps useful?
It helps identify missing records such as skipped invoice numbers or missing dates in time series data.
How do you compare a row to the previous row in SQL?
You use the LAG window function. It lets the current row see values from earlier rows in the ordered dataset.
How do you compare a row to the next row in SQL?
You use the LEAD window function which looks forward to the next row in the sequence.
What is cohort analysis?
Cohort analysis groups users based on a shared starting event such as signup date and tracks their behavior over time.
Why do analysts use cohort analysis?
It helps understand how user behavior changes over time instead of mixing users from different starting periods.
What is sessionization in SQL?
Sessionization groups events that occur close in time into sessions. For example grouping website clicks into one browsing session.
How does SQL detect sessions?
By comparing timestamps between events. If the gap between events exceeds a threshold like 30 minutes then a new session begins.
How do you calculate retention in SQL?
You track users who return after their initial event by comparing activity dates to the original signup or first activity date.
What is a pivot query?
A pivot reshapes rows into columns so categories become separate columns for easier comparison.
What is an unpivot query?
An unpivot turns columns into rows so data becomes easier to analyze in aggregated form.
How do you find the second highest value in SQL?
You rank values using RANK or DENSE_RANK and then filter for rank equal to 2.
How can SQL find missing dates in a dataset?
You generate a sequence of expected dates and then compare it to existing records using joins to find dates that are missing.
What is a self join?
A self join joins a table to itself so rows can be compared within the same dataset.
Why are self joins useful?
They allow comparisons like employee-manager relationships or detecting relationships between rows in the same table.
What is a recursive query?
A recursive query repeatedly runs a query using its previous result. It is useful for hierarchical data like organizational trees.
What is hierarchical data in databases?
Data where rows relate in parent-child relationships such as company org charts or folder structures.
What is a moving average query?
A moving average calculates the average value across a sliding window of rows such as the last 7 days.