Skip to content

Grouping (Filtring)

Objectives

After completing this lab, you will be able to:

  1. Filter the output of a SELECT query by using string patterns, ranges, or sets of values.
  2. Sort the result set in either ascending or descending order in accordance with a pre-determined column.
  3. Group the outcomes of a query based on a selected parameter to further refine the response.

%load_ext sql %sql sqlite:///HR.db

%%sql

CREATE TABLE EMPLOYEES (
                            EMP_ID CHAR(9) NOT NULL, 
                            F_NAME VARCHAR(15) NOT NULL,
                            L_NAME VARCHAR(15) NOT NULL,
                            SSN CHAR(9),
                            B_DATE DATE,
                            SEX CHAR,
                            ADDRESS VARCHAR(30),
                            JOB_ID CHAR(9),
                            SALARY DECIMAL(10,2),
                            MANAGER_ID CHAR(9),
                            DEP_ID CHAR(9) NOT NULL,
                            PRIMARY KEY (EMP_ID));

  CREATE TABLE JOB_HISTORY (
                            EMPL_ID CHAR(9) NOT NULL, 
                            START_DATE DATE,
                            JOBS_ID CHAR(9) NOT NULL,
                            DEPT_ID CHAR(9),
                            PRIMARY KEY (EMPL_ID,JOBS_ID));

 CREATE TABLE JOBS (
                            JOB_IDENT CHAR(9) NOT NULL, 
                            JOB_TITLE VARCHAR(30),
                            MIN_SALARY DECIMAL(10,2),
                            MAX_SALARY DECIMAL(10,2),
                            PRIMARY KEY (JOB_IDENT));

CREATE TABLE DEPARTMENTS (
                            DEPT_ID_DEP CHAR(9) NOT NULL, 
                            DEP_NAME VARCHAR(15) ,
                            MANAGER_ID CHAR(9),
                            LOC_ID CHAR(9),
                            PRIMARY KEY (DEPT_ID_DEP));

CREATE TABLE LOCATIONS (
                            LOCT_ID CHAR(9) NOT NULL,
                            DEP_ID_LOC CHAR(9) NOT NULL,
                            PRIMARY KEY (LOCT_ID,DEP_ID_LOC));
%%sql

Insert into JOB_HISTORY (EMPL_ID, START_DATE, JOBS_ID, DEPT_ID)
VALUES 
("E1001",2000-08-01,100,2),
("E1002",2001-08-01,200,5),
("E1003",2001-08-16,300,5),
("E1004",2000-08-16,400,5),
("E1005",2000-05-30,500,2),
("E1006",2001-08-16,600,2),
("E1007",2002-05-30,650,7),
("E1008",2010-05-06,660,7),
("E1009",2016-08-16,234,7),
("E1010",2016-08-16,220,5);

%%sql

Insert into LOCATIONS (LOCT_ID, DEP_ID_LOC)
VALUES
('L0001',2),
('L0002',5),
('L0003',7)

%%sql

Insert into EMPLOYEES (EMP_ID, F_NAME, L_NAME, SSN, B_DATE, SEX, ADDRESS, JOB_ID, SALARY,MANAGER_ID,DEP_ID)
VALUES
("E1001","John","Thomas",123456,1976-09-01,"M","5631 Rice, OakPark,IL",100,100000,30001,2),
("E1002","Alice","James",123457,1972-07-31,"F","980 Berry ln, Elgin,IL",200,80000,30002,5),
("E1003","Steve","Wells",123458,1980-10-08,"M","291 Springs, Gary,IL",300,50000,30002,5),
("E1004","Santosh","Kumar",123459,1985-07-20,"M","511 Aurora Av, Aurora,IL",400,60000,30002,5),
("E1005","Ahmed","Hussain",123410,1981-04-01,"M","216 Oak Tree, Geneva,IL",500,70000,30001,2),
("E1006","Nancy","Allen",123411,1978-06-02,"F","111 Green Pl, Elgin,IL",600,90000,30001,2),
("E1007","Mary","Thomas",123412,1975-05-05,"F","100 Rose Pl, Gary,IL",650,65000,30003,7),
("E1008","Bharath","Gupta",123413,1985-06-05,"M","145 Berry Ln, Naperville,IL",660,65000,30003,7),
("E1009","Andrea","Jones",123414,1990-09-07,"F","120 Fall Creek, Gary,IL",234,70000,30003,7),
("E1010","Ann","Jacob",123415,1982-03-30,"F","111 Britany Springs,Elgin,IL",220,70000,30002,5)

%%sql

Insert into DEPARTMENTS (DEPT_ID_DEP, DEP_NAME, MANAGER_ID, LOC_ID)
VALUES
(2,"Architect Group",30001,"L0001"),
(5,"Software Group",30002,"L0002"),
(7,"Design Team",30003,"L0003")

%%sql

Insert into JOBS (JOB_IDENT, JOB_TITLE, MIN_SALARY, MAX_SALARY)
VALUES
(100,"Sr. Architect",60000,100000),
(200,"Sr. Software Developer",60000,80000),
(300,"Jr.Software Developer",40000,60000),
(400,"Jr.Software Developer",40000,60000),
(500,"Jr. Architect",50000,70000),
(600,"Lead Architect",70000,100000),
(650,"Jr. Designer",60000,70000),
(660,"Jr. Designer",60000,70000),
(234,"Sr. Designer",70000,90000),
(220,"Sr. Designer",70000,90000)
%%sql

SELECT F_NAME, L_NAME
FROM EMPLOYEES
WHERE ADDRESS LIKE '%Elgin,IL%';
F_NAME L_NAME
Alice James
Nancy Allen
Ann Jacob

Now assume that you want to identify the employees who were born during the 70s. The query above can be modified to

%%sql

SELECT F_NAME, L_NAME
FROM EMPLOYEES
WHERE B_DATE LIKE '197%';
F_NAME L_NAME
Ahmed Hussain
Nancy Allen
Bharath Gupta
Andrea Jones

Let us retrieve all employee records in department 5 where salary is between 60000 and 70000. The query that will be used is

%%sql

SELECT *
FROM EMPLOYEES
WHERE (SALARY BETWEEN 60000 AND 70000) AND DEP_ID = 5;
EMP_ID F_NAME L_NAME SSN B_DATE SEX ADDRESS JOB_ID SALARY MANAGER_ID DEP_ID
E1004 Santosh Kumar 123459 1958 M 511 Aurora Av, Aurora,IL 400 60000 30002 5
E1010 Ann Jacob 123415 1949 F 111 Britany Springs,Elgin,IL 220 70000 30002 5

Sorting

retrieve a list of employees ordered by department ID.

%%sql
SELECT F_NAME, L_NAME, DEP_ID 
FROM EMPLOYEES
ORDER BY DEP_ID;
F_NAME L_NAME DEP_ID
John Thomas 2
Ahmed Hussain 2
Nancy Allen 2
Alice James 5
Steve Wells 5
Santosh Kumar 5
Ann Jacob 5
Mary Thomas 7
Bharath Gupta 7
Andrea Jones 7

the records should be ordered in descending alphabetical order by last name. For descending order, you can make use of the DESC clause

%%sql

SELECT F_NAME, L_NAME, DEP_ID 
FROM EMPLOYEES
ORDER BY DEP_ID DESC, L_NAME DESC;
F_NAME L_NAME DEP_ID
Mary Thomas 7
Andrea Jones 7
Bharath Gupta 7
Steve Wells 5
Santosh Kumar 5
Alice James 5
Ann Jacob 5
John Thomas 2
Ahmed Hussain 2
Nancy Allen 2

Grouping

AVG is a function that can be used to calculate the Average or Mean of all values of a specified column in the result set.

  • A good example of grouping would be if For each department ID, we wish to retrieve the number of employees in the department.
%%sql

SELECT DEP_ID, COUNT(*)
FROM EMPLOYEES
GROUP BY DEP_ID;
DEP_ID COUNT(*)
2 3
5 4
7 3

retrieve the number of employees in the department and the average employee salary in the department

%%sql
SELECT DEP_ID, COUNT(*), AVG(SALARY)
FROM EMPLOYEES
GROUP BY DEP_ID;
DEP_ID COUNT(*) AVG(SALARY)
2 3 86666.66666666667
5 4 65000.0
7 3 66666.66666666667

Label the computed columns in the result set of the last problem as NUM_EMPLOYEES and AVG_SALARY

%%sql

SELECT DEP_ID, COUNT(*) AS "NUM_EMPLOYEES", AVG(SALARY) AS "AVG_SALARY"
FROM EMPLOYEES
GROUP BY DEP_ID;
DEP_ID NUM_EMPLOYEES AVG_SALARY
2 3 86666.66666666667
5 4 65000.0
7 3 66666.66666666667

e can sort the result of the previous query by average salary

%%sql

SELECT DEP_ID, COUNT(*) AS "NUM_EMPLOYEES", AVG(SALARY) AS "AVG_SALARY"
FROM EMPLOYEES
GROUP BY DEP_ID
ORDER BY AVG_SALARY;
DEP_ID NUM_EMPLOYEES AVG_SALARY
5 4 65000.0
7 3 66666.66666666667
2 3 86666.66666666667

we wish to limit the result to departments with fewer than 4 employees

%%sql

SELECT DEP_ID, COUNT(*) AS "NUM_EMPLOYEES", AVG(SALARY) AS "AVG_SALARY"
FROM EMPLOYEES
GROUP BY DEP_ID
HAVING count(*) < 4
ORDER BY AVG_SALARY;
DEP_ID NUM_EMPLOYEES AVG_SALARY
7 3 66666.66666666667
2 3 86666.66666666667

Practice Questions

1.Retrieve the list of all employees, first and last names, whose first names start with ‘S’

%%sql
SELECT F_NAME, L_NAME
FROM EMPLOYEES
where F_NAME like 'S%';
F_NAME L_NAME
Steve Wells
Santosh Kumar

2. Arrange all the records of the EMPLOYEES table in ascending order of the date of birth

%%sql

SELECT* FROM EMPLOYEES
ORDER BY B_DATE
EMP_ID F_NAME L_NAME SSN B_DATE SEX ADDRESS JOB_ID SALARY MANAGER_ID DEP_ID
E1002 Alice James 123457 1934 F 980 Berry ln, Elgin,IL 200 80000 30002 5
E1010 Ann Jacob 123415 1949 F 111 Britany Springs,Elgin,IL 220 70000 30002 5
E1004 Santosh Kumar 123459 1958 M 511 Aurora Av, Aurora,IL 400 60000 30002 5
E1003 Steve Wells 123458 1962 M 291 Springs, Gary,IL 300 50000 30002 5
E1007 Mary Thomas 123412 1965 F 100 Rose Pl, Gary,IL 650 65000 30003 7
E1001 John Thomas 123456 1966 M 5631 Rice, OakPark,IL 100 100000 30001 2
E1006 Nancy Allen 123411 1970 F 111 Green Pl, Elgin,IL 600 90000 30001 2
E1008 Bharath Gupta 123413 1974 M 145 Berry Ln, Naperville,IL 660 65000 30003 7
E1009 Andrea Jones 123414 1974 F 120 Fall Creek, Gary,IL 234 70000 30003 7
E1005 Ahmed Hussain 123410 1976 M 216 Oak Tree, Geneva,IL 500 70000 30001 2

3. Group the records in terms of the department IDs and filter them of ones that have average salary more than or equal to 60000. Display the department ID and the average salary

%%sql
SELECT DEP_ID,AVG(SALARY) as avg_salary from EMPLOYEES
GROUP BY DEP_ID
HAVING AVG(SALARY) >= 60000;
DEP_ID avg_salary
2 86666.66666666667
5 65000.0
7 66666.66666666667

For the problem above, sort the results for each group in descending order of average salary

%%sql
SELECT DEP_ID,AVG(SALARY) as avg_salary from EMPLOYEES
GROUP BY DEP_ID
HAVING AVG(SALARY) >= 60000
order by AVG(SALARY) desc
DEP_ID avg_salary
2 86666.66666666667
7 66666.66666666667
5 65000.0

Exercise 1: String Patterns

1. Retrieve all employees whose address is in Elgin,IL

%%sql

select * from Employees
where address like "%Elgin,IL%"
EMP_ID F_NAME L_NAME SSN B_DATE SEX ADDRESS JOB_ID SALARY MANAGER_ID DEP_ID
E1002 Alice James 123457 1934 F 980 Berry ln, Elgin,IL 200 80000 30002 5
E1006 Nancy Allen 123411 1970 F 111 Green Pl, Elgin,IL 600 90000 30001 2
E1010 Ann Jacob 123415 1949 F 111 Britany Springs,Elgin,IL 220 70000 30002 5

2. Retrieve all employees who were born during the 1970’s

%%sql

select * from Employees
where B_DATE like "197%"
EMP_ID F_NAME L_NAME SSN B_DATE SEX ADDRESS JOB_ID SALARY MANAGER_ID DEP_ID
E1005 Ahmed Hussain 123410 1976 M 216 Oak Tree, Geneva,IL 500 70000 30001 2
E1006 Nancy Allen 123411 1970 F 111 Green Pl, Elgin,IL 600 90000 30001 2
E1008 Bharath Gupta 123413 1974 M 145 Berry Ln, Naperville,IL 660 65000 30003 7
E1009 Andrea Jones 123414 1974 F 120 Fall Creek, Gary,IL 234 70000 30003 7

3. Retrieve all employees in department 5 whose salary is between 60000 and 70000

%%sql

select * from Employees
where DEP_ID =5 and salary between 60000 and 70000
EMP_ID F_NAME L_NAME SSN B_DATE SEX ADDRESS JOB_ID SALARY MANAGER_ID DEP_ID
E1004 Santosh Kumar 123459 1958 M 511 Aurora Av, Aurora,IL 400 60000 30002 5
E1010 Ann Jacob 123415 1949 F 111 Britany Springs,Elgin,IL 220 70000 30002 5

Exercise 2: Sorting

Retrieve a list of employees ordered by department ID

%%sql
select * from employees
Order by DEP_ID
EMP_ID F_NAME L_NAME SSN B_DATE SEX ADDRESS JOB_ID SALARY MANAGER_ID DEP_ID
E1001 John Thomas 123456 1966 M 5631 Rice, OakPark,IL 100 100000 30001 2
E1005 Ahmed Hussain 123410 1976 M 216 Oak Tree, Geneva,IL 500 70000 30001 2
E1006 Nancy Allen 123411 1970 F 111 Green Pl, Elgin,IL 600 90000 30001 2
E1002 Alice James 123457 1934 F 980 Berry ln, Elgin,IL 200 80000 30002 5
E1003 Steve Wells 123458 1962 M 291 Springs, Gary,IL 300 50000 30002 5
E1004 Santosh Kumar 123459 1958 M 511 Aurora Av, Aurora,IL 400 60000 30002 5
E1010 Ann Jacob 123415 1949 F 111 Britany Springs,Elgin,IL 220 70000 30002 5
E1007 Mary Thomas 123412 1965 F 100 Rose Pl, Gary,IL 650 65000 30003 7
E1008 Bharath Gupta 123413 1974 M 145 Berry Ln, Naperville,IL 660 65000 30003 7
E1009 Andrea Jones 123414 1974 F 120 Fall Creek, Gary,IL 234 70000 30003 7

2. Retrieve a list of employees ordered by department name, and within each department ordered alphabetically in descending order by last name

%%sql

select * from departments,employees
WHERE DEP_ID = DEPT_ID_DEP

Order by DEP_NAME,L_NAME   desc
DEPT_ID_DEP DEP_NAME MANAGER_ID LOC_ID EMP_ID F_NAME L_NAME SSN B_DATE SEX ADDRESS JOB_ID SALARY MANAGER_ID_1 DEP_ID
2 Architect Group 30001 L0001 E1001 John Thomas 123456 1966 M 5631 Rice, OakPark,IL 100 100000 30001 2
2 Architect Group 30001 L0001 E1005 Ahmed Hussain 123410 1976 M 216 Oak Tree, Geneva,IL 500 70000 30001 2
2 Architect Group 30001 L0001 E1006 Nancy Allen 123411 1970 F 111 Green Pl, Elgin,IL 600 90000 30001 2
7 Design Team 30003 L0003 E1007 Mary Thomas 123412 1965 F 100 Rose Pl, Gary,IL 650 65000 30003 7
7 Design Team 30003 L0003 E1009 Andrea Jones 123414 1974 F 120 Fall Creek, Gary,IL 234 70000 30003 7
7 Design Team 30003 L0003 E1008 Bharath Gupta 123413 1974 M 145 Berry Ln, Naperville,IL 660 65000 30003 7
5 Software Group 30002 L0002 E1003 Steve Wells 123458 1962 M 291 Springs, Gary,IL 300 50000 30002 5
5 Software Group 30002 L0002 E1004 Santosh Kumar 123459 1958 M 511 Aurora Av, Aurora,IL 400 60000 30002 5
5 Software Group 30002 L0002 E1002 Alice James 123457 1934 F 980 Berry ln, Elgin,IL 200 80000 30002 5
5 Software Group 30002 L0002 E1010 Ann Jacob 123415 1949 F 111 Britany Springs,Elgin,IL 220 70000 30002 5
%%sql
SELECT D.DEP_NAME , E.F_NAME, E.L_NAME
FROM EMPLOYEES as E, DEPARTMENTS as D
WHERE E.DEP_ID = D.DEPT_ID_DEP
ORDER BY D.DEP_NAME, E.L_NAME DESC;
DEP_NAME F_NAME L_NAME
Architect Group John Thomas
Architect Group Ahmed Hussain
Architect Group Nancy Allen
Design Team Mary Thomas
Design Team Andrea Jones
Design Team Bharath Gupta
Software Group Steve Wells
Software Group Santosh Kumar
Software Group Alice James
Software Group Ann Jacob

Exercise 3: Grouping

1. For each department ID retrieve the number of employees in the department

%%sql

select DEP_ID, count(*) as num_of_emp from employees
Group By DEP_ID
DEP_ID num_of_emp
2 3
5 4
7 3

For each department retrieve the number of employees in the department, and the average employee salary in the department

%%sql
select DEP_ID,count(*) as num_of_emp,AVG(SALARY) from employees
Group By DEP_ID;
DEP_ID num_of_emp AVG(SALARY)
2 3 86666.66666666667
5 4 65000.0
7 3 66666.66666666667

3. Label the computed columns in the result set of SQL problem 2 (Exercise 3 Problem 2) as NUM_EMPLOYEES and AVG_SALARY

%%sql
SELECT DEP_ID, COUNT(*) AS "NUM_EMPLOYEES", AVG(SALARY) AS "AVG_SALARY"
FROM EMPLOYEES
GROUP BY DEP_ID;
DEP_ID NUM_EMPLOYEES AVG_SALARY
2 3 86666.66666666667
5 4 65000.0
7 3 66666.66666666667

In SQL problem 3 (Exercise 3 Problem 3), order the result set by Average Salary

%%sql
SELECT DEP_ID, COUNT(*) AS "NUM_EMPLOYEES", AVG(SALARY) AS "AVG_SALARY"
FROM EMPLOYEES
GROUP BY DEP_ID
ORDER BY AVG_SALARY;
DEP_ID NUM_EMPLOYEES AVG_SALARY
5 4 65000.0
7 3 66666.66666666667
2 3 86666.66666666667

In SQL problem 4 (Exercise 3 Problem 4), limit the result to departments with fewer than 4 employees

%%sql
SELECT DEP_ID, COUNT(*) AS "NUM_EMPLOYEES", AVG(SALARY) AS "AVG_SALARY"
FROM EMPLOYEES
GROUP BY DEP_ID
HAVING count(*) < 4
ORDER BY AVG_SALARY;
DEP_ID NUM_EMPLOYEES AVG_SALARY
7 3 66666.66666666667
2 3 86666.66666666667

You want to select the author's lastname from a table, but you only remember that it starts with the letter J. Which of the following queries uses the correct string pattern?

SELECT lastname from author where lastname like J% 

SELECT lastname from author where lastname like J* 

SELECT lastname from author where lastname like J$ 

SELECT lastname from author where lastname like J# 

Question 2 In SQL, which of the following will be the correct way to sort a result set in descending order?

SELECT ID FROM TABLE_NAME ORDER BY ID DESC 

SELECT * FROM TABLE_NAME ORDER BY ID DESC 

SELECT * FROM TABLE_NAME ORDER BY ID 

SELECT ID FROM TABLE_NAME ORDER BY ID 

What is the role of HAVING clause in SQL queries in MySQL?

Acts as an alternative to WHERE clause in SQL queries. It may not necessarily organize the result set in a specific order. Check whether data records meet the specified condition is met or not. Restricts the result set for a query using GROUP BY clause.

Question 4 Which of the choices best describe the function of the following SQL query?

SELECT * FROM employees ORDER BY emp_name LIMIT 5; 

Retrieves all the columns of the top 5 rows of the table, sorted alphabetically based on emp_names

Retrieves all the columns of the top 5 rows of the table, sorted reverse alphabetically based on emp_names

Retrieves the entire contents of the table, sorted alphabetically based on emp_names

Retrieves the top 5 emp_names ordered alphabetically.

Question 5 Which of the following SQL statements lists the number of customers in each country, showing only the countries with more than five customers?

SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country HAVING CustomerID > 5; 

SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country HAVING COUNT(CustomerID) > 5; 

SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country HAVING COUNT(CustomerID) < 5; 

SELECT COUNT(CustomerID), Country FROM Customers GROUP BY Country HAVING COUNT(Customers) > 5;