GROUP BY: Aggregating Data

PostgreSQL GROUP BY: Aggregating Data

Welcome back to our PostgreSQL tutorial series! In this lesson, we'll focus on the GROUP BY clause, which is essential for aggregating data and performing calculations on groups of rows. Understanding how to use the GROUP BY clause effectively allows you to generate insightful summaries from your data.

Introduction to the GROUP BY Clause

The GROUP BY clause in PostgreSQL is used with aggregate functions (like COUNT, SUM, AVG, MAX, MIN) to group rows that have the same values in specified columns into summary rows. It's a powerful tool for data analysis and reporting.

Basic Syntax

SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1;
  • SELECT: Specifies the columns and aggregate functions to retrieve.
  • FROM: Specifies the table to query.
  • GROUP BY: Specifies the column(s) to group the results by.

Using GROUP BY in PostgreSQL

Let's explore various ways to use the GROUP BY clause with practical examples using our library and movies database.

1. Counting Rows in Groups

Example: Count the number of books by each author

SELECT author, COUNT(*) AS book_count
FROM books
GROUP BY author;

View Schema Initialization

Explanation:

  • Groups the books by author.
  • Counts the number of books for each author.
  • The result shows each author and how many books they have in the books table.

2. Calculating Sums

Example: Calculate the total number of pages for each author

SELECT author, SUM(pages) AS total_pages
FROM books
GROUP BY author;

Explanation:

  • Groups the books by author.
  • Sums up the pages for each author's books.
  • The result shows each author and the total pages of their books.

3. Calculating Averages

Example: Find the average price of books for each author

SELECT author, AVG(price) AS average_price
FROM books
GROUP BY author;

Explanation:

  • Groups the books by author.
  • Calculates the average price for each author's books.
  • The result shows each author and the average price of their books.

4. Finding Maximum and Minimum Values

Example: Find the highest and lowest priced books for each author

SELECT author, MAX(price) AS max_price, MIN(price) AS min_price
FROM books
GROUP BY author;

Explanation:

  • Groups the books by author.
  • Finds the maximum (MAX) and minimum (MIN) price for each author's books.
  • The result shows each author, their highest priced book, and their lowest priced book.

5. Grouping by Multiple Columns

Example: Count the number of books by author and price

Suppose we want to see how many books each author has at each price point.

SELECT author, price, COUNT(*) AS book_count
FROM books
GROUP BY author, price;

Explanation:

  • Groups the books by both author and price.
  • Counts the number of books for each author at each price.
  • The result shows combinations of authors and prices with the count of books for each.

6. Using GROUP BY with HAVING Clause

The HAVING clause allows you to filter groups based on aggregate functions.

Example: Find authors with more than one book

SELECT author, COUNT(*) AS book_count
FROM books
GROUP BY author
HAVING COUNT(*) > 1;

Explanation:

  • Groups the books by author.
  • Counts the number of books for each author.
  • The HAVING clause filters to include only authors with more than one book.

7. Combining GROUP BY with ORDER BY

You can sort the grouped results using ORDER BY.

Example: List authors and their total pages, ordered by total pages descending

SELECT author, SUM(pages) AS total_pages
FROM books
GROUP BY author
ORDER BY total_pages DESC;

Explanation:

  • Groups the books by author.
  • Sums up the pages for each author.
  • Orders the results by total_pages in descending order.

8. Grouping and Filtering Data

Example: Calculate average price of books priced over $10, grouped by author

SELECT author, AVG(price) AS average_price
FROM books
WHERE price > 10
GROUP BY author;

Explanation:

  • Filters books with price > 10.
  • Groups the remaining books by author.
  • Calculates the average price for each author.

9. Using GROUP BY with JOINs

You can group data from multiple tables using JOINs.

Example: Count the number of books for authors who are in the authors table

SELECT a.name AS author_name, COUNT(b.title) AS book_count
FROM authors a
LEFT JOIN books b ON a.name = b.author
GROUP BY a.name;

Explanation:

  • Performs a LEFT JOIN between authors and books.
  • Groups the results by a.name (author's name).
  • Counts the number of books for each author, including those with zero books.

10. Grouping with Expressions

You can use expressions in the GROUP BY clause.

Example: Group books by price range

Let's group books into price ranges: Under $10, $10 to $12, Over $12.

SELECT
    CASE
        WHEN price < 10 THEN 'Under $10'
        WHEN price BETWEEN 10 AND 12 THEN '$10 - $12'
        ELSE 'Over $12'
    END AS price_range,
    COUNT(*) AS book_count
FROM books
GROUP BY price_range;

Explanation:

  • Uses a CASE expression to create price_range categories.
  • Groups the books by price_range.
  • Counts the number of books in each price range.

11. Using DISTINCT with GROUP BY

Although GROUP BY can often replace DISTINCT, you can use both if needed.

Example: List distinct authors from the books table

SELECT DISTINCT author
FROM books;

Or using GROUP BY:

SELECT author
FROM books
GROUP BY author;

Explanation:

  • Both queries return a list of unique authors from the books table.

12. Grouping with DATE Functions

If we had date information, we could group by date parts. Since our schema doesn't include a publication_date column, we'll skip this example.

Conclusion

The GROUP BY clause in PostgreSQL is a powerful tool for aggregating and summarizing data. By mastering GROUP BY and aggregate functions, you can derive meaningful insights from your data.

Key Takeaways:

  • Use GROUP BY to group rows that share a value in one or more columns.
  • Aggregate functions like COUNT, SUM, AVG, MAX, and MIN perform calculations on grouped data.
  • The HAVING clause filters groups based on aggregate values.
  • Combine GROUP BY with ORDER BY to sort your grouped results.
  • You can group by expressions and use CASE statements for custom groupings.

Schema Initialization

To run the examples provided, ensure your PostgreSQL database is set up with the following schema:

-- Create the books table
CREATE TABLE books (
    title VARCHAR(255),
    author VARCHAR(255),
    price NUMERIC(10, 2),
    pages INTEGER
);
 
-- Insert sample data into the books table
INSERT INTO books (title, author, price, pages) VALUES
('The Great Gatsby', 'F. Scott Fitzgerald', 10.99, 180),
('This Side of Paradise', 'F. Scott Fitzgerald', 7.99, 305),
('To Kill a Mockingbird', 'Harper Lee', 7.99, 281),
('Go Set a Watchman', 'Harper Lee', 8.99, 278),
('1984', 'George Orwell', 8.99, 328),
('Animal Farm', 'George Orwell', 5.99, 112),
('Pride and Prejudice', 'Jane Austen', 6.99, 279),
('Sense and Sensibility', 'Jane Austen', 5.99, 226),
('The Catcher in the Rye', 'J.D. Salinger', 9.99, 214),
('Moby-Dick', 'Herman Melville', 11.99, 635),
('War and Peace', 'Leo Tolstoy', 12.99, 1225);
 
-- Create the authors table
CREATE TABLE authors (
    name VARCHAR(255)
);
 
-- Insert sample data into the authors table
INSERT INTO authors (name) VALUES
('F. Scott Fitzgerald'),
('Harper Lee'),
('George Orwell'),
('Jane Austen'),
('J.D. Salinger'),
('Herman Melville'),
('Leo Tolstoy'),
('Mark Twain');  -- Added an author without books for JOIN examples
 
-- Create the movies table
CREATE TABLE movies (
    title VARCHAR(255)
);
 
-- Insert sample data into the movies table
INSERT INTO movies (title) VALUES
('The Shawshank Redemption'),
('The Godfather'),
('The Dark Knight'),
('Pulp Fiction'),
('The Lord of the Rings');