LAG accesses data from previous rows and LEAD accesses data from subsequent rows in the result set. Both functions can specify an offset and a default value. Example: LAG(price, 1, 0) OVER (ORDER BY date) returns the previous row's price or 0 if none exists.
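As a minimal sketch of the LAG/LEAD behavior described above, the following runs the query against an in-memory SQLite database through Python's sqlite3 module (assuming SQLite 3.25 or later, which added window functions). The prices table and its rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one price per day.
conn.execute("CREATE TABLE prices (date TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-02", 12.0), ("2024-01-03", 9.0)],
)

lag_lead_rows = conn.execute(
    """
    SELECT date,
           price,
           LAG(price, 1, 0)  OVER (ORDER BY date) AS prev_price,
           LEAD(price, 1, 0) OVER (ORDER BY date) AS next_price
    FROM prices
    ORDER BY date
    """
).fetchall()

for row in lag_lead_rows:
    print(row)
```

The first row has no predecessor and the last row has no successor, so those positions receive the default value 0.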
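Moving averages are calculated using AVG with a window frame specification: AVG(value) OVER (ORDER BY date ROWS BETWEEN n PRECEDING AND CURRENT ROW). This averages the current row and the n rows before it (n + 1 rows in total). Near the start of a partition fewer rows exist, so the average covers only the rows that are available.
ROWS defines the frame by physical row count relative to the current row, while RANGE defines it by logical value ranges of the ORDER BY expression. With ROWS, every row is treated individually, even when it ties with its neighbors. With RANGE, rows that share the same ORDER BY value (peers) always enter the frame together.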
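The difference between ROWS and RANGE shows up only when ORDER BY values tie. The sketch below, using a hypothetical orders table in an in-memory SQLite database (SQLite 3.25+ assumed), computes the same cumulative sum with both frame types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: orders 1 and 2 share the same day.
conn.execute("CREATE TABLE orders (id INTEGER, day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 2, 30)],
)

frame_rows = conn.execute(
    """
    SELECT id, day, amount,
           SUM(amount) OVER (ORDER BY day
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)  AS rows_sum,
           SUM(amount) OVER (ORDER BY day
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM orders
    ORDER BY id
    """
).fetchall()

for row in frame_rows:
    print(row)
```

With RANGE, both day-1 rows receive the combined total of their peers. With ROWS, the two tied rows receive different totals, and which one comes first is arbitrary unless ORDER BY includes a tie-breaker.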
Percent of total is calculated by dividing the current row's value by the sum over the entire partition: (value * 100.0) / SUM(value) OVER (PARTITION BY group). This shows each row's value as a percentage of its group total.
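A short sketch of the percent-of-total pattern, assuming a hypothetical sales table with a region column standing in for the group (in-memory SQLite 3.25+ via Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two regions with different totals.
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 30), ("east", 70), ("west", 50)],
)

pct_rows = conn.execute(
    """
    SELECT region,
           amount,
           (amount * 100.0) / SUM(amount) OVER (PARTITION BY region) AS pct_of_region
    FROM sales
    ORDER BY region, amount
    """
).fetchall()

for row in pct_rows:
    print(row)
```

Multiplying by 100.0 rather than 100 forces floating-point division, which matters in databases that truncate integer division.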
Window functions may require sorting operations and memory for frame processing. Performance can be improved by proper indexing on PARTITION BY and ORDER BY columns, limiting frame sizes, and considering materialized views for complex calculations.
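Year-over-year growth can be calculated by using LAG to fetch the previous year's value and then computing the percentage change: (value - LAG(value, 1) OVER (ORDER BY year)) * 100.0 / LAG(value, 1) OVER (ORDER BY year). The first year has no predecessor, so its growth is NULL. Wrapping the divisor in NULLIF(..., 0) avoids a division-by-zero error when the previous value is 0.
CUME_DIST calculates the cumulative distribution of a value: the fraction of rows in the partition whose ORDER BY value is less than or equal to the current row's. PERCENT_RANK calculates the relative rank as (rank - 1) / (rows in partition - 1). CUME_DIST returns values greater than 0 and up to 1, while PERCENT_RANK returns values from 0 to 1 inclusive. Both are useful for statistical analysis and percentile calculations.
Window functions can be applied before a PIVOT operation (on the source rows) or after it (on the pivoted result) to perform calculations across pivoted columns. PIVOT syntax is vendor-specific (for example SQL Server and Oracle). Partitioning and ordering need careful consideration, because pivoting changes which columns are available to partition and order by.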
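The year-over-year calculation can be sketched as follows, assuming a hypothetical revenue table with one row per year (in-memory SQLite 3.25+). The NULLIF guard is an addition for safety, not part of the formula above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one value per year.
conn.execute("CREATE TABLE revenue (year INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?)",
    [(2021, 100), (2022, 120), (2023, 90)],
)

yoy_rows = conn.execute(
    """
    SELECT year,
           value,
           (value - LAG(value, 1) OVER (ORDER BY year)) * 100.0
               / NULLIF(LAG(value, 1) OVER (ORDER BY year), 0) AS yoy_pct
    FROM revenue
    ORDER BY year
    """
).fetchall()

for row in yoy_rows:
    print(row)
```

The first year yields NULL (None in Python) because LAG finds no previous row.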
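To make the difference between the two functions concrete, here is a small sketch over a hypothetical scores table containing one tie (in-memory SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: the two middle scores tie.
conn.execute("CREATE TABLE scores (score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?)", [(10,), (20,), (20,), (30,)])

dist_rows = conn.execute(
    """
    SELECT score,
           CUME_DIST()    OVER (ORDER BY score) AS cume_dist,
           PERCENT_RANK() OVER (ORDER BY score) AS pct_rank
    FROM scores
    ORDER BY score
    """
).fetchall()

for row in dist_rows:
    print(row)
```

Tied rows receive the same value from both functions. The lowest score gets a PERCENT_RANK of 0 but a CUME_DIST of 0.25, since one of the four rows is less than or equal to it.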
Running totals with resets use PARTITION BY to define reset boundaries and ORDER BY for sequence. SUM(value) OVER (PARTITION BY reset_column ORDER BY date) calculates totals that reset based on the partition column.
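A minimal sketch of a running total that resets per partition, assuming a hypothetical transactions table where the account column plays the role of reset_column (in-memory SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two accounts, two transactions each.
conn.execute("CREATE TABLE transactions (account TEXT, date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("A", "2024-01-01", 10),
        ("A", "2024-01-02", 20),
        ("B", "2024-01-01", 5),
        ("B", "2024-01-02", 7),
    ],
)

reset_rows = conn.execute(
    """
    SELECT account,
           date,
           amount,
           SUM(amount) OVER (PARTITION BY account ORDER BY date) AS running_total
    FROM transactions
    ORDER BY account, date
    """
).fetchall()

for row in reset_rows:
    print(row)
```

The total restarts from the first amount each time the account changes.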
A window function performs calculations across a set of table rows related to the current row. Unlike regular aggregate functions that group rows into a single output row, window functions retain the individual rows while adding computed values based on the specified window of rows.
The OVER clause defines the window or set of rows on which the window function operates. It can contain PARTITION BY to divide rows into groups, ORDER BY to sequence rows, and frame specifications to limit the rows within the partition.
PARTITION BY divides rows into groups for window function calculations while maintaining individual rows in the result set. GROUP BY collapses rows into single summary rows. PARTITION BY is used within window functions, while GROUP BY is used with aggregate functions.
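The contrast between GROUP BY and PARTITION BY is easiest to see by running both over the same rows. This sketch assumes a hypothetical purchases table (in-memory SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: alice has two purchases, bob has one.
conn.execute("CREATE TABLE purchases (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?)",
    [("alice", 10), ("alice", 30), ("bob", 20)],
)

# GROUP BY collapses each customer's rows into one summary row.
grouped = conn.execute(
    "SELECT customer, SUM(amount) FROM purchases GROUP BY customer ORDER BY customer"
).fetchall()

# PARTITION BY keeps every row and attaches the customer total to each.
windowed = conn.execute(
    """
    SELECT customer,
           amount,
           SUM(amount) OVER (PARTITION BY customer) AS customer_total
    FROM purchases
    ORDER BY customer, amount
    """
).fetchall()

print(grouped)
print(windowed)
```

The grouped query returns two rows, while the windowed query returns all three original rows with the total repeated within each customer.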
Window frames define the set of rows within a partition using ROWS or RANGE with frame boundaries like UNBOUNDED PRECEDING, CURRENT ROW, or N PRECEDING/FOLLOWING. They control which rows are included in window function calculations.
Exclusive frames (BETWEEN n PRECEDING AND 1 PRECEDING) exclude the current row, while inclusive frames (BETWEEN n PRECEDING AND CURRENT ROW) include it. This affects calculations like moving averages and running totals.
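The effect of excluding the current row can be sketched by computing the same average with both frame types. The readings table and n = 2 are hypothetical choices (in-memory SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: four sequential readings.
conn.execute("CREATE TABLE readings (seq INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)],
)

frame_avg_rows = conn.execute(
    """
    SELECT value,
           AVG(value) OVER (ORDER BY seq
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS inclusive_avg,
           AVG(value) OVER (ORDER BY seq
               ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS exclusive_avg
    FROM readings
    ORDER BY seq
    """
).fetchall()

for row in frame_avg_rows:
    print(row)
```

For the first row the exclusive frame is empty, so AVG returns NULL. An exclusive frame is the usual choice when the current value is to be compared against a baseline that it should not influence.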
Multiple window functions can be used in the same query with different OVER clauses. You can also define named windows using WINDOW clause and reference them to avoid repetition and maintain consistency.
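A brief sketch of a named window shared by two functions, assuming a hypothetical metrics table (in-memory SQLite 3.25+, which supports the WINDOW clause):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one value per day.
conn.execute("CREATE TABLE metrics (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 20), ("2024-01-03", 30)],
)

named_rows = conn.execute(
    """
    SELECT date,
           SUM(value) OVER w AS running_sum,
           AVG(value) OVER w AS running_avg
    FROM metrics
    WINDOW w AS (ORDER BY date)
    ORDER BY date
    """
).fetchall()

for row in named_rows:
    print(row)
```

Defining the window once means a change to the ordering or partitioning applies to every function that references it.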
Anomaly detection uses window functions to calculate statistics (avg, stddev) over windows of data, then identifies values that deviate significantly from these statistics using comparison operations.
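One possible sketch of this approach flags values more than two standard deviations from the mean. The sensor table, the threshold of 2, and the use of a single window over all rows are assumptions for illustration. Because a STDDEV function is not available in every database, the variance is derived here as AVG(value * value) minus the squared mean, and squared quantities are compared to avoid a square root (in-memory SQLite 3.25+).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: reading 5 is an obvious outlier.
conn.execute("CREATE TABLE sensor (ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO sensor VALUES (?, ?)",
    [(1, 10.0), (2, 11.0), (3, 9.0), (4, 10.0), (5, 50.0), (6, 10.0)],
)

anomalies = conn.execute(
    """
    SELECT ts, value
    FROM (
        SELECT ts,
               value,
               AVG(value) OVER () AS mean,
               -- population variance: E[x^2] - (E[x])^2
               AVG(value * value) OVER ()
                   - AVG(value) OVER () * AVG(value) OVER () AS variance
        FROM sensor
    )
    -- |value - mean| > 2 * stddev, written with squares
    WHERE (value - mean) * (value - mean) > 4 * variance
    ORDER BY ts
    """
).fetchall()

print(anomalies)
```

In practice the window would often be a rolling frame or a partition per device rather than the whole table, and an outlier large enough to inflate the statistics may call for a more robust measure.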
ROW_NUMBER() assigns unique sequential numbers to rows, RANK() assigns the same rank to ties with gaps in sequence, and DENSE_RANK() assigns the same rank to ties without gaps. For example, ROW_NUMBER: 1,2,3,4; RANK: 1,2,2,4; DENSE_RANK: 1,2,2,3.
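The three numbering sequences above can be reproduced with a sketch like the following, using a hypothetical results table in which the second and third scores tie (in-memory SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two players tie on 90.
conn.execute("CREATE TABLE results (player TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("ann", 100), ("bob", 90), ("cid", 90), ("dee", 80)],
)

rank_rows = conn.execute(
    """
    SELECT player,
           score,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM results
    ORDER BY row_num
    """
).fetchall()

for row in rank_rows:
    print(row)
```

Which of the two tied players receives row number 2 is arbitrary here. Adding a tie-breaker such as the player column to ORDER BY makes it deterministic.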
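Running totals can be calculated using SUM as a window function with an ORDER BY clause: SUM(value) OVER (ORDER BY date). Each row then holds the total of all previous rows plus the current row. Because the default frame with ORDER BY is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, rows that share the same date all receive the same total; specify ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to accumulate strictly row by row.
NULL values in window functions can be handled with the IGNORE NULLS option on LAG, LEAD, FIRST_VALUE, and LAST_VALUE where the database supports it, or by substituting a default with COALESCE (or ISNULL in SQL Server). Aggregate window functions such as SUM and AVG skip NULLs, and NULLs in the ORDER BY column sort first or last depending on the database, so the treatment of NULLs affects both which rows fall in a frame and the calculation results.
Median can be calculated using PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY group) in databases that support it as a window function, or by combining ROW_NUMBER with aggregation to find the middle value (or the average of the two middle values) in each ordered set.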
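The ROW_NUMBER approach needs no percentile function, so it can be sketched in SQLite. The samples table is hypothetical, and the expressions (cnt + 1) / 2 and (cnt + 2) / 2 assume integer division, which holds in SQLite for integer operands but not in every database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: group a has 3 values, group b has 4.
conn.execute("CREATE TABLE samples (grp TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("a", 1), ("a", 3), ("a", 100), ("b", 1), ("b", 2), ("b", 3), ("b", 4)],
)

medians = conn.execute(
    """
    SELECT grp, AVG(value) AS median
    FROM (
        SELECT grp,
               value,
               ROW_NUMBER() OVER (PARTITION BY grp ORDER BY value) AS rn,
               COUNT(*)     OVER (PARTITION BY grp)                AS cnt
        FROM samples
    )
    -- odd cnt: both expressions pick the single middle row;
    -- even cnt: they pick the two middle rows, which AVG combines
    WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
    GROUP BY grp
    ORDER BY grp
    """
).fetchall()

print(medians)
```

Group a shows why the median can be preferable to the mean: the value 100 does not pull it upward.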
Gap analysis uses LAG/LEAD to compare consecutive values, identifying missing or irregular values in sequences. Common applications include finding missing sequence numbers or time gaps in event data.
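Finding missing sequence numbers can be sketched as follows, assuming a hypothetical events table whose id column should be contiguous (in-memory SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: ids 4-5 and 8-9 are missing.
conn.execute("CREATE TABLE events (id INTEGER)")
conn.executemany("INSERT INTO events VALUES (?)", [(1,), (2,), (3,), (6,), (7,), (10,)])

gaps = conn.execute(
    """
    SELECT prev_id + 1 AS gap_start,
           id - 1      AS gap_end
    FROM (
        SELECT id, LAG(id) OVER (ORDER BY id) AS prev_id
        FROM events
    )
    WHERE id - prev_id > 1
    ORDER BY gap_start
    """
).fetchall()

print(gaps)
```

The same shape works for time gaps by comparing each timestamp with its predecessor and filtering on the difference.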
Window function results can be stored in materialized views for performance, but this requires careful consideration of refresh strategies and storage requirements. Not all databases support window functions in materialized views.
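Window functions work inside stored procedures like in any other query. When the query is assembled as dynamic SQL, build the string carefully, pass values as parameters rather than concatenating them, and consider the performance impact. Error handling and SQL injection prevention are crucial.
FIRST_VALUE returns the first value in a window frame, and LAST_VALUE returns the last value. They're useful for comparing current rows with initial or final values in a group, like finding the first or last price in a time period. Note that with ORDER BY and no explicit frame, the default frame ends at the current row, so LAST_VALUE returns the current row's value (or that of its last peer); to get the last value of the whole partition, specify ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING.
Percentiles can be calculated using the PERCENTILE_CONT or PERCENTILE_DISC functions, for example PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY group). PERCENTILE_CONT interpolates between adjacent values and may return a value not present in the data, while PERCENTILE_DISC returns an actual value from the dataset. Support varies: some databases offer these only as ordered-set aggregates, not as window functions.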
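A common surprise with LAST_VALUE is that the default frame (when ORDER BY is present and no frame is given) stops at the current row. The sketch below, over a hypothetical prices table (in-memory SQLite 3.25+), shows LAST_VALUE with the default frame and with an explicit full-partition frame.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one price per day.
conn.execute("CREATE TABLE prices (date TEXT, price REAL)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-02", 12.0), ("2024-01-03", 9.0)],
)

first_last_rows = conn.execute(
    """
    SELECT date,
           FIRST_VALUE(price) OVER (ORDER BY date) AS first_price,
           LAST_VALUE(price)  OVER (ORDER BY date) AS last_default_frame,
           LAST_VALUE(price)  OVER (ORDER BY date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full_frame
    FROM prices
    ORDER BY date
    """
).fetchall()

for row in first_last_rows:
    print(row)
```

With the default frame, LAST_VALUE simply echoes each row's own price; only the explicit frame returns the final price for every row.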
NTILE divides ordered rows into a specified number of roughly equal groups (buckets). For example, NTILE(4) OVER (ORDER BY value) assigns numbers 1-4 to rows, creating quartiles. It's useful for creating equal-sized groupings of ordered data.
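A minimal sketch of quartile assignment with NTILE(4), assuming a hypothetical measurements table with eight rows (in-memory SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: the values 1 through 8.
conn.execute("CREATE TABLE measurements (value INTEGER)")
conn.executemany("INSERT INTO measurements VALUES (?)", [(v,) for v in range(1, 9)])

quartile_rows = conn.execute(
    """
    SELECT value,
           NTILE(4) OVER (ORDER BY value) AS quartile
    FROM measurements
    ORDER BY value
    """
).fetchall()

for row in quartile_rows:
    print(row)
```

When the row count does not divide evenly by the bucket count, the earlier buckets receive one extra row each.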
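When ties occur in ORDER BY, each window function handles them according to its own rules. ROW_NUMBER assigns unique values in an arbitrary order among the tied rows, while RANK and DENSE_RANK assign the same value to all of them. For frames, RANGE treats tied rows as peers and includes them together, whereas ROWS counts them individually. Adding a unique tie-breaker column to ORDER BY makes results deterministic.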
Date/time windows can use RANGE with date intervals or ROWS with specific counts. Consider timezone handling, date arithmetic, and appropriate frame specifications for time-based analysis.
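Window functions cannot be nested directly and cannot be used in WHERE, GROUP BY, or HAVING clauses, because they are evaluated after those clauses. To filter on a window function's result, compute it in a subquery or CTE and filter in the outer query. They may also have performance implications on large datasets and are not available in all SQL databases or versions.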
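Since a window function cannot appear in WHERE, a common workaround is to compute it in a subquery and filter outside. This sketch picks the highest-paid employee per department from a hypothetical employees table (in-memory SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two departments, two employees each.
conn.execute("CREATE TABLE employees (dept TEXT, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("eng", "Ann", 120), ("eng", "Bob", 100), ("ops", "Cid", 90), ("ops", "Dee", 80)],
)

top_paid = conn.execute(
    """
    SELECT dept, name, salary
    FROM (
        SELECT dept,
               name,
               salary,
               ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) AS rn
        FROM employees
    )
    WHERE rn = 1          -- filtering happens outside the windowed query
    ORDER BY dept
    """
).fetchall()

print(top_paid)
```

The same query can be written with a CTE. Some databases also provide a QUALIFY clause for this purpose, but it is not universally available.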
Rolling calculations use window frames with fixed sizes (e.g., ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) combined with aggregate functions. This enables calculations like moving averages, rolling sums, or sliding window analysis.