Understanding PostgreSQL DISTINCT Clause
The PostgreSQL DISTINCT clause is a special node that is used to exclude the repetitions of a query string output from it. It is nearly impossible to avoid such cases, whenever we have to deal with databases, where the same database data may appear more than one time due to either the joining method, the collapsing, or the nature of the data storage. PostgreSQL DISTINCT keyword allows for the exclusion of redundant records, therefore making it possible to better the representation of the results through such an approach.
Inside this guide, we are going to walk thorough the PostgreSQL DISTINCT statement by looking into its syntax, use cases in different scenarios, and providing examples. Also, we will take a look at how PostgreSQL DISTINCT really works, what the best scenarios are when it should be appropriated DISTINCT, and how it can be combined with other clauses of SQL to assist the user in getting a refined and meaningful dataset.
Getting the hang of PostgreSQL DISTINCT usage is indispensable to SQL users, mainly when dealing with large databases or complex questions. Through filtration of repetitive contents, records from PostgreSQL DISTINCT will be of high quality, which is to say, your data will be improved in terms of speed and accuracy of your database queries.
What is the DISTINCT Keyword in PostgreSQL?
- The DISTINCT keyword is employed for the purpose of eliminating the duplicate rows from the result set of a SELECT query.
- In the event, when you query a table, you may retrieve privately the rows with the same values on each column selected.
- With the use of the DISTINCT keyword, you may guide PostgreSQL to only display unique rows in the result set.
The DISTINCT Clause Syntax
- The basic syntax for using the DISTINCT clause is given below:
SELECT DISTINCT column1, column2, …, columnN
FROM table_name;
- column1, column2, …, columnN: These are the columns selected for distinct values.
- table_name: The table to which the data is assigned.
- The DISTINCT keyword means that the values in all the specified columns are compared with each other; and, if they are same in another row, only one of those rows will appear in the result set.
How DISTINCT Works
- When you use SELECT DISTINCT in PostgreSQL, all the rows in your result set are compared to each other.
- When two rows have an identical value in all the selected columns, only one of them gets back.
- By the way, DISTINCT does not adhere to a column-separated basis; it takes all the selected columns as a mean of evaluation.
For instance, when you pick more than one column with DISTINCT, the database system deletes the duplicates only if the combination of the values in the selected columns is the same in different rows. Here is an example of a simple table:
id | name | city |
1 | Alice | New York |
2 | Bob | Chicago |
3 | Alice | New York |
4 | Charlie | Boston |
5 | Alice | Chicago |
- If we run the query:
SELECT DISTINCT name, city FROM table_name;
- The result will be:
name | city |
Alice | New York |
Bob | Chicago |
Alice | Chicago |
Charlie | Boston |
- Notice that the combination of “Alice” and “New York” appeared twice, but only one instance is returned due to DISTINCT.
Common Use Cases of DISTINCT
Distinct is a keyword that enables you to perform operations without a select statement on a SQL statement. It comes in handy in several cases:
- Eliminating Duplicate Records: Should you come across doubled records in connection with joins or operations, you can use DISTINCT to eliminate them.
- Counting Unique Values: More often than not, it is combined with the COUNT() function to count distinct values, for instance, the number of unique cities in the table:
SELECT COUNT(DISTINCT city) FROM table_name;
- Retrieval of Unique Combinations of Multiple Columns: If a user desires to return only unique combinations of multiple columns, then DISTINCT can be utilized to filter the results.A good example is to get only unique
SELECT DISTINCT name, city FROM table_name;
- How to Simplify Aggregate Functions: The use of the DISTINCT keyword inside aggregate functions is to execute only on the distinct values.This is used, for example, when you want to calculate the average salary but the salary results should be unique, you could use
SELECT AVG(DISTINCT salary) FROM employees;
DISTINCT with Multiple Columns
- Usually the most common application of DISTINCT is when you want to ensure that the result set contains unique combinations of values across more than one column.
- When DISTINCT is added to a query, PostgreSQL asserts true for the selected rows only if the combination of values in those columns is unique.
- Take a look at the given data:
id | name | city | age |
1 | Alice | New York | 30 |
2 | Bob | Chicago | 25 |
3 | Alice | New York | 35 |
4 | Alice | Chicago | 30 |
5 | Bob | Chicago | 25 |
- If you run the following query:
SELECT DISTINCT name, city FROM table_name;
- You would get the following result:
name | city |
Alice | New York |
Bob | Chicago |
Alice | Chicago |
- The unique combination of the name and the city is required for more than one row at a time, so even if Alice shows up more than once, she is only mentioned once unique per the city iteration.
DISTINCT with COUNT()
- You can be a DISTINCT keyword with the COUNT() function to determine the number of unique values in a column.
- A simple example is the counting of how many different cities are in the table:
SELECT COUNT(DISTINCT city) FROM table_name;
- This query would retrieve a unique city name, without repetition.
Examples of DISTINCT in PostgreSQL
Example 1: Basic Usage of DISTINCT
- Suppose a table employees contains the data below:
id | name | department |
1 | John | Sales |
2 | Alice | HR |
3 | Bob | Sales |
4 | John | Marketing |
5 | Alice | Sales |
- If we want to get the distinct department names, the query would be:
SELECT DISTINCT department FROM employees;
- The result would be:
department |
Sales |
HR |
Marketing |
- Here, DISTINCT ensures that “Sales” is listed only once, even though multiple employees belong to the Sales department.
Example 2: Using DISTINCT with Multiple Columns
- If you would like to get the unique employee name-departments, you may use DISTINCT with multiple columns like this:
SELECT DISTINCT name, department FROM employees;
- The output will be like this:
name | department |
John | Sales |
Alice | HR |
Bob | Sales |
John | Marketing |
Alice | Sales |
- It displays the unique combinations of name and department.
Example 3: Using DISTINCT with Aggregate Functions
- DISTINCT is often used to determine only unique values with aggregate functions such as COUNT() and AVG().
- For instance, if you want to count how many unique departments there are:
SELECT COUNT(DISTINCT department) FROM employees;
- The result will be :
|
- It means that there are 3 departments, which are unique.
Example 4: DISTINC with ORDER BY
- Use of DISTINCT along with the ORDER BY clause can also be a way to ensure the result set is ordered in a particular order.
- For example, running a query that would fetch the distinct city names and order them alphabetically as well:
SELECT DISTINCT city FROM employees
ORDER BY city;
- This will result unique cities in alphabetical order.
Example 5: DISTINCT and JOIN Operations
- When distinctive in queries are used in JOIN operations, the uniqueness will apply to the combination of all columns from the joined tables.
- Assume you have two tables: employees and departments.
employees:
id | name | department_id |
1 | John | 1 |
2 | Alice | 2 |
3 | Bob | 1 |
departments:
department_id | department_name |
1 | Sales |
2 | HR |
- In case you join the tables and want to select different department names:
SELECT DISTINCT department_name
FROM employees
JOIN departments ON employees.department_id = departments.department_id;
- The outcome will be:
department_name |
Sales |
HR |
- Even though there are multiple employees in the Sales department, DISTINCT allows for the each group to appear only once.
Performance Considerations
Even as the DISTINCT is a powerful function, it does have a relatively high cost of performance, especially when it handles larger data sets. The DISTINCT functions of PostgreSQL obliges it to compare all rows across the resulting set, which is cumbersome if the table is substantial, and many selected columns are there. To optimize performance:
- Indexing: Beyond making the columns to use with DISTINCT indexed, ensure that those columns are indexed. This can aid in decreasing the initial overhead required for sorting and comparison of rows.
- Limit Results: Rather than using DISTINCT, try to limit the number of results to make sure that the computation of large datasets is not performed unnecessarily.
- Where Possible, Use More Efficient Queries: In such situations, it is feasible to achieve the same result without using DISTINCT. For example, a correctly structured GROUP BY query might be more efficient.
