GROUPING SETS allows you to specify multiple grouping combinations in a single query. It's a shorthand for combining multiple GROUP BY operations with UNION ALL, producing multiple levels of aggregation simultaneously.
The CUBE operator generates all possible combinations of grouping columns, producing a cross-tabulation report. It's useful for generating subtotals and grand totals across multiple dimensions in data analysis.
Groups with specific patterns can be found using HAVING with aggregate functions to filter groups based on conditions like COUNT(), MIN(), MAX(), or custom calculations that identify the desired patterns.
The FILTER clause allows conditional aggregation by specifying which rows to include in the aggregate calculation. It's more readable than CASE expressions and can improve performance.
Median calculation varies by database system. Common approaches include using PERCENTILE_CONT(0.5), specialized functions like MEDIAN(), or calculating it manually using window functions and row numbers.
Aggregate functions group rows into a single result row, while analytic functions (window functions) perform calculations across rows while maintaining individual row details in the result set.
COUNT(*) counts all rows including NULL values, while COUNT(column_name) counts only non-NULL values in the specified column. This can lead to different results when the column contains NULL values.
WHERE filters individual rows before grouping, while HAVING filters groups after grouping. HAVING can use aggregate functions in its conditions, but WHERE cannot because it processes rows before aggregation occurs.
Division by zero can be handled using NULLIF or CASE statements within aggregate functions. For example, AVG(value/NULLIF(divisor,0)) prevents division by zero errors by converting zero divisors to NULL.
ROLLUP generates hierarchical subtotals based on the specified columns' order, while CUBE generates all possible combinations. ROLLUP is used for hierarchical data analysis, creating subtotals for each level.
Percentages within groups can be calculated using window functions, such as SUM(value) OVER (PARTITION BY group) to get the group total, then dividing individual values by this total and multiplying by 100.
FIRST_VALUE and LAST_VALUE are window functions that return the first and last values in a window frame, respectively. They're useful for comparing current rows with initial or final values in a group.
Timezone differences can be handled by converting timestamps to a standard timezone using AT TIME ZONE or converting to UTC before grouping. This ensures consistent grouping across different timezones.
The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows. It is typically used with aggregate functions to perform calculations on each group of rows rather than the entire table.
The five basic aggregate functions in SQL are COUNT(), SUM(), AVG(), MAX(), and MIN(). These functions perform calculations across a set of rows and return a single value.
When DISTINCT is used with aggregate functions (e.g., COUNT(DISTINCT column)), it counts or aggregates only unique values in the specified column, eliminating duplicates before performing the aggregation.
NULL values are handled differently by different aggregate functions: COUNT(*) includes them, COUNT(column) ignores them, SUM and AVG ignore them, and MAX and MIN ignore them. This can significantly impact calculation results.
Running totals can be calculated using window functions with the OVER clause and ORDER BY, such as SUM(value) OVER (ORDER BY date). This creates a cumulative sum while maintaining individual row details.
The mode can be found using COUNT and GROUP BY, then selecting the value with the highest count using ORDER BY COUNT(*) DESC and LIMIT 1 or ranking functions like ROW_NUMBER().
ROW_NUMBER() assigns unique numbers, RANK() assigns same number to ties with gaps, and DENSE_RANK() assigns same number to ties without gaps. They're used for different ranking scenarios within groups.
LAG() accesses data from previous rows while LEAD() accesses data from subsequent rows in a result set. Both are window functions useful for comparing current rows with offset rows within groups.
Moving averages are calculated using window functions with ROWS or RANGE in the OVER clause, such as AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW).
Data pivoting can be achieved using aggregate functions with CASE expressions or the PIVOT operator (if supported by the database). This transforms row values into columns, creating cross-tabulated results.
The HAVING clause is used to filter groups based on aggregate function results. It's similar to WHERE but operates on groups rather than individual rows and can use aggregate functions in its conditions.
A window function performs calculations across a set of rows related to the current row, unlike regular aggregation which groups rows into a single output row. Window functions preserve the individual rows while adding aggregate calculations.
Outliers can be identified using window functions to calculate statistical measures like standard deviation within groups, then using WHERE or HAVING to filter values that deviate significantly from the group's average.
ORDER BY in window functions determines the sequence of rows for operations like running totals, moving averages, and LAG/LEAD functions. It's crucial for time-series analysis and sequential calculations.
ROWS defines the window frame based on physical row count, while RANGE defines it based on logical value ranges. ROWS is used for fixed-size windows, RANGE for value-based windows.
Group concatenation can be achieved using STRING_AGG() or GROUP_CONCAT() (depending on the database system), which combines values from multiple rows into a single string within each group.