# SQL Group By and Having Clauses

## Objectives

This tutorial covers the following objectives:

• Construct and execute a SQL query using GROUP BY
• Construct and execute a SQL query using GROUP BY … HAVING
• Construct and execute a GROUP BY on more than one column
• Nest group functions

## Purpose

• If you wanted to know the average height of all students?
• You could write a query that looks like this:
``SELECT AVG(height) FROM students;``
• But what if you wanted to know the average height of the students based on their year in school?
• With what you know right now, you would have to write a number of different SQL statements to accomplish this:
``SELECT AVG(height) FROM students WHERE year_in_school = 10;``
``SELECT AVG(height) FROM students WHERE year_in_school = 11;``
``SELECT AVG(height) FROM students WHERE year_in_school = 12;``
• And so on!
• To simplify problems like this with just one statement, you use the GROUP BY and HAVING clauses.

## GROUP BY Use in SQL

• You use the GROUP BY clause to divide the rows in a table into smaller groups.
• You can then use the group functions to return summary information for each group.
``````SELECT department_id, AVG(salary)
FROM employees
GROUP BY department_id
ORDER BY department_id;``````
• In the SELECT statement shown, the rows are being grouped by department_id.
• The AVG function is then applied to each group.

### GROUP BY Example

• What if we wanted to find the maximum salary of employees in each department?
• We use a GROUP BY clause stating which column to use to group the rows.
``````SELECT MAX(salary)
FROM employees
GROUP BY department_id;``````
• But how can we tell which maximum salary belongs to which department?

### GROUP BY in SELECT

• Usually we want to include the GROUP BY column in the SELECT list.

### GROUP BY Clause

• Group functions require that any column listed in the SELECT clause that is not part of a group function must be listed in a GROUP BY clause.
• What is wrong with this example?

Job_id is fine in the SELECT list, but last_name is not, because each unique group of job_id‘s produces only one row of output (as it is the GROUP BY column). However, there may be many different employees having that same job_id, for example there are three employees with a job_id of SA_REP.

### COUNT

• This example shows how many countries are in each region.
• Remember that group functions ignore null values, so if any country does not have a country name, it will not be included in the COUNT.
• Of course this is unlikely, but when constructing SQL statements, we have to think about all of the possibilities.
• It would be better to write the query using COUNT(*):
``````SELECT COUNT(*), region_id
FROM wf_countries
GROUP BY region_id
ORDER BY region_id;``````
• This would count all of the rows in each region group, without the need to check which columns contained NULL values.

### WHERE Clause in SQL

• We can also use a WHERE clause to exclude rows before the remaining rows are formed into groups.
``````SELECT department_id, MAX(salary)
FROM employees
WHERE last_name != 'King'
GROUP BY department_id;``````

As employee King is excluded by the WHERE clause, the MAX(salary) for department 90 is returned as 17000.

### More GROUP BY Examples

• Show the average population of all countries in each region.
• Round the average to a whole number.
``````SELECT region_id, ROUND(AVG(population)) AS population
FROM wf_countries
GROUP BY region_id
ORDER BY region_id;``````
• Count the number of spoken languages for all countries.
``````SELECT country_id, COUNT(language_id) AS "Number of languages"
FROM wf_spoken_languages
GROUP BY country_id;``````

### GROUP BY Guidelines

• Important guidelines to remember when using a GROUP BY clause are:
• If you include a group function (AVG, SUM, COUNT, MAX, MIN, STDDEV, VARIANCE) in a SELECT clause along with any other individual columns, each individual column must also appear in the GROUP BY clause.
• You cannot use a column alias in the GROUP BY clause.
• The WHERE clause excludes rows before they are divided into groups.

#### Groups Within GROUPS

• Sometimes you need to divide groups into smaller groups.
• For example, you may want to group all employees by department; then, within each department, group them by job.
• This example shows how many employees are doing each job within each department.
``````SELECT department_id, job_id, count(*)
FROM employees
WHERE department_id > 40
GROUP BY department_id, job_id;``````

### Nesting Group Functions

• Group functions can be nested to a depth of two when GROUP BY is used.
``````SELECT max(avg(salary))
FROM employees
GROUP by department_id;``````
• How many values will be returned by this query?
• The answer is one – the query will find the average salary for each department, and then from that list, select the single largest value.

## HAVING

• Suppose we want to find the maximum salary in each department, but only for those departments which have more than one employee?
• What is wrong with this example?

A WHERE clause can be used only to include/exclude individual rows, not groups of rows. Therefore we cannot use group functions in a WHERE clause.

• In the same way you used the WHERE clause to restrict the rows that you selected, you can use the HAVING clause to restrict groups.
• In a query using a GROUP BY and HAVING clause, the rows are first grouped, group functions are applied, and then only those groups matching the HAVING clause are displayed.
• The WHERE clause is used to restrict rows; the HAVING clause is used to restrict groups returned from a GROUP BY clause.

The above query first finds the MAX salary for each department in the employees table. The HAVING clause then restricts the groups returned to those departments that have more than 1 employee.

• This query finds the average population of the countries in each region.
• It then only returns the region groups with a lowest population greater than three hundred thousand.

The HAVING and GROUP BY clauses can use different columns. The example of GROUPs BY region_id, but the HAVING clause restricts groups based on population.

• Although the HAVING clause can precede the GROUP BY clause in a SELECT statement, it is recommended that you place each clause in the order shown.
• The ORDER BY clause (if used) is always last!

## Terminology

Key terms used in this tutorial included:

• GROUP BY
• HAVING

## Summary

In this tutorial, you should have learned how to:

• Construct and execute a SQL query using GROUP BY
• Construct and execute a SQL query using GROUP BY … HAVING
• Construct and execute a GROUP BY on more than one column
• Nest group functions