7 Powerful Ways to Use PostgreSQL DISTINCT for Cleaner, More Efficient Data Queries

Understanding PostgreSQL DISTINCT Clause

The PostgreSQL DISTINCT clause removes duplicate rows from the result set of a query. Duplicates are hard to avoid when working with databases: the same data may appear more than once because of a join, because selecting only some columns collapses otherwise different rows into identical ones, or because of how the data is stored. The DISTINCT keyword excludes these redundant rows, which makes the results easier to read and to work with.

In this guide, we walk through the PostgreSQL DISTINCT clause: its syntax, its use cases in different scenarios, and worked examples. We also look at how DISTINCT actually works, when it is appropriate to use it, and how it can be combined with other SQL clauses to produce a refined and meaningful dataset.

Knowing how to use DISTINCT is essential for SQL users, especially when dealing with large databases or complex queries. By filtering out repeated rows, DISTINCT gives you result sets that are smaller and more accurate for tasks such as listing or counting unique values. The deduplication itself is not free, a point covered under Performance Considerations.

What is the DISTINCT Keyword in PostgreSQL?

  • The DISTINCT keyword eliminates duplicate rows from the result set of a SELECT query.
  • When you query a table, you may get back several rows that have the same values in every selected column.
  • With the DISTINCT keyword, you tell PostgreSQL to return only unique rows in the result set.

The DISTINCT Clause Syntax

  • The basic syntax for using the DISTINCT clause is given below:

SELECT DISTINCT column1, column2, …, columnN
FROM table_name;

    • column1, column2, …, columnN: The columns whose combined values must be unique in the result.
    • table_name: The table to query.
  • With DISTINCT, the values in all the specified columns are compared between rows; if two or more rows match in every one of those columns, only one of them appears in the result set.

How DISTINCT Works

  • When you use SELECT DISTINCT in PostgreSQL, all the rows in your result set are compared to each other.
  • When two rows have identical values in all the selected columns, only one of them is returned.
  • DISTINCT is not applied column by column; it evaluates all the selected columns together.

For instance, when you select more than one column with DISTINCT, PostgreSQL removes a row as a duplicate only if the combination of values in the selected columns matches another row. Here is a simple example table:

 id | name    | city
----+---------+----------
  1 | Alice   | New York
  2 | Bob     | Chicago
  3 | Alice   | New York
  4 | Charlie | Boston
  5 | Alice   | Chicago

  • If we run the query:

SELECT DISTINCT name, city FROM table_name;

  • The result will be:

 name    | city
---------+----------
 Alice   | New York
 Bob     | Chicago
 Alice   | Chicago
 Charlie | Boston

  • Notice that the combination of “Alice” and “New York” appeared twice, but only one instance is returned due to DISTINCT.
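
To try this example yourself, the sketch below recreates the sample data. The table name people is hypothetical (the article only calls it table_name), and a temporary table is used so that nothing persists after the session ends.

```sql
-- Hypothetical table name; the article refers to it only as table_name.
CREATE TEMP TABLE people (
    id   integer PRIMARY KEY,
    name text,
    city text
);

INSERT INTO people (id, name, city) VALUES
    (1, 'Alice',   'New York'),
    (2, 'Bob',     'Chicago'),
    (3, 'Alice',   'New York'),
    (4, 'Charlie', 'Boston'),
    (5, 'Alice',   'Chicago');

-- Rows 1 and 3 share the same (name, city) pair, so four rows come back.
-- ORDER BY is added only to make the row order predictable.
SELECT DISTINCT name, city
FROM people
ORDER BY name, city;
```

Without the ORDER BY, PostgreSQL is free to return the four rows in any order.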

Common Use Cases of DISTINCT

DISTINCT is a keyword used inside a SELECT statement, either directly after SELECT or inside an aggregate function call. It comes in handy in several cases:

  • Eliminating Duplicate Records: If joins or other operations produce duplicated rows, you can use DISTINCT to eliminate them.
  • Counting Unique Values: DISTINCT is often combined with the COUNT() function to count distinct values, for instance, the number of unique cities in the table:

SELECT COUNT(DISTINCT city) FROM table_name;

  • Retrieving Unique Combinations of Multiple Columns: If you want only the unique combinations of several columns, DISTINCT filters the results accordingly. A good example is getting only the unique name and city pairs:

SELECT DISTINCT name, city FROM table_name;

  • Restricting Aggregate Functions to Unique Values: Placing the DISTINCT keyword inside an aggregate function makes it operate only on the distinct input values. For example, to calculate the average salary while counting each salary value only once, you could use:

SELECT AVG(DISTINCT salary) FROM employees;
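
The difference between the two averages is easiest to see on a tiny data set. The following sketch uses made-up salary values supplied inline through a VALUES list, so it does not depend on any existing table.

```sql
-- Inline sample data; the salary values are made up for illustration.
SELECT
    AVG(salary)          AS avg_all,       -- (100 + 100 + 200) / 3, about 133.33
    AVG(DISTINCT salary) AS avg_distinct   -- (100 + 200) / 2 = 150
FROM (VALUES (100), (100), (200)) AS employees(salary);
```

The two columns answer different questions: avg_all is the average over employees, while avg_distinct is the average over salary levels, so it is worth checking which one a report actually needs.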

DISTINCT with Multiple Columns

  • A common application of DISTINCT is ensuring that the result set contains only unique combinations of values across more than one column.
  • When DISTINCT is added to a query, PostgreSQL returns one row for each unique combination of values in the selected columns.
  • Take a look at the given data:

 id | name  | city     | age
----+-------+----------+-----
  1 | Alice | New York |  30
  2 | Bob   | Chicago  |  25
  3 | Alice | New York |  35
  4 | Alice | Chicago  |  30
  5 | Bob   | Chicago  |  25

  • If you run the following query:

SELECT DISTINCT name, city FROM table_name;

  • You would get the following result:

 name  | city
-------+----------
 Alice | New York
 Bob   | Chicago
 Alice | Chicago

  • Uniqueness is evaluated on the name and city pair across all rows, so although Alice appears three times in the table, she is listed only once per city in the result.
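
Plain DISTINCT cannot return a column that is left out of the uniqueness check, such as age here. PostgreSQL also offers DISTINCT ON, a PostgreSQL-specific extension that keeps one row per combination of the listed expressions. The sketch below is one possible use with the data above, supplied inline under the hypothetical name people: it keeps the lowest age for each name and city pair.

```sql
WITH people(id, name, city, age) AS (
    VALUES
        (1, 'Alice', 'New York', 30),
        (2, 'Bob',   'Chicago',  25),
        (3, 'Alice', 'New York', 35),
        (4, 'Alice', 'Chicago',  30),
        (5, 'Bob',   'Chicago',  25)
)
-- One row per (name, city); ORDER BY decides which row of each group is kept.
-- The DISTINCT ON expressions must match the leftmost ORDER BY expressions.
SELECT DISTINCT ON (name, city) name, city, age
FROM people
ORDER BY name, city, age;
```

For Alice in New York, the row with age 30 is kept because it sorts first within its group.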

DISTINCT with COUNT()

  • You can use the DISTINCT keyword with the COUNT() function to determine the number of unique values in a column.
  • A simple example is the counting of how many different cities are in the table:

SELECT COUNT(DISTINCT city) FROM table_name;

  • This query returns a single number: how many different city values the table contains.
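
One detail worth knowing is how the different COUNT() forms treat repeated and NULL values. The following sketch compares them on made-up city values supplied inline; the column aliases are illustrative only.

```sql
SELECT
    COUNT(*)             AS all_rows,         -- 6: every row, NULLs included
    COUNT(city)          AS non_null_cities,  -- 5: the NULL is skipped
    COUNT(DISTINCT city) AS unique_cities     -- 3: New York, Chicago, Boston
FROM (VALUES
    ('New York'), ('Chicago'), ('New York'),
    ('Boston'), ('Chicago'), (NULL)
) AS t(city);
```

COUNT(DISTINCT city) ignores NULL, so a table with missing city values will not report an extra "city" for them.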

Examples of DISTINCT in PostgreSQL

Example 1: Basic Usage of DISTINCT

  • Suppose a table employees contains the data below:

 id | name  | department
----+-------+------------
  1 | John  | Sales
  2 | Alice | HR
  3 | Bob   | Sales
  4 | John  | Marketing
  5 | Alice | Sales

  • If we want to get the distinct department names, the query would be:

SELECT DISTINCT department FROM employees;

  • The result would be:

 department
------------
 Sales
 HR
 Marketing

  • Here, DISTINCT ensures that “Sales” is listed only once, even though multiple employees belong to the Sales department.

Example 2: Using DISTINCT with Multiple Columns

  • If you would like to get the unique combinations of employee name and department, you can use DISTINCT with multiple columns like this:

SELECT DISTINCT name, department FROM employees;

  • The output will be like this:

 name  | department
-------+------------
 John  | Sales
 Alice | HR
 Bob   | Sales
 John  | Marketing
 Alice | Sales

  • It displays the unique combinations of name and department.

Example 3: Using DISTINCT with Aggregate Functions

  • DISTINCT is often placed inside aggregate functions such as COUNT() and AVG() so that they consider only unique values.
  • For instance, if you want to count how many unique departments there are:

SELECT COUNT(DISTINCT department) FROM employees;

  • The result will be:

 count
-------
     3

  • This means the table contains 3 distinct departments.

Example 4: DISTINCT with ORDER BY

  • DISTINCT can be combined with the ORDER BY clause to return the unique rows in a particular order.
  • For example, assuming the employees table also has a city column, this query fetches the distinct city names and orders them alphabetically:

SELECT DISTINCT city FROM employees
ORDER BY city;

  • This returns the unique cities in alphabetical order.
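
One restriction to keep in mind: with SELECT DISTINCT, PostgreSQL only accepts ORDER BY expressions that also appear in the select list. The sketch below shows the rejected form in a comment and one possible workaround using GROUP BY. The names and cities are made up and supplied inline, so the query does not touch a real employees table.

```sql
WITH employees(name, city) AS (
    VALUES
        ('John',  'Chicago'),
        ('Alice', 'Boston'),
        ('Bob',   'Chicago')
)
-- This variant is rejected, because name is not in the select list:
--   SELECT DISTINCT city FROM employees ORDER BY name;
--   ERROR:  for SELECT DISTINCT, ORDER BY expressions must appear in select list
-- Grouping by city and sorting by an aggregate gives a well-defined order instead.
SELECT city
FROM employees
GROUP BY city
ORDER BY MIN(name);
```

Here each city is sorted by the alphabetically first employee name in it, which is only one of several reasonable ways to define the order.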

Example 5: DISTINCT and JOIN Operations

  • When DISTINCT is used in a query with JOIN operations, uniqueness applies to the combination of the selected columns, whichever of the joined tables they come from.
  • Assume you have two tables: employees and departments.

employees:

 id | name  | department_id
----+-------+---------------
  1 | John  |             1
  2 | Alice |             2
  3 | Bob   |             1

departments:

 department_id | department_name
---------------+-----------------
             1 | Sales
             2 | HR

  • If you join the tables and want the distinct department names:

SELECT DISTINCT department_name
FROM employees
JOIN departments ON employees.department_id = departments.department_id;

  • The outcome will be:

 department_name
-----------------
 Sales
 HR

  • Even though there are multiple employees in the Sales department, DISTINCT ensures that each department name appears only once.
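
When the goal is simply "departments that have at least one employee", the duplicates can also be avoided at the source instead of being removed afterwards. The following is a sketch of that alternative using EXISTS, with the two sample tables supplied inline; the aliases d and e are illustrative.

```sql
WITH departments(department_id, department_name) AS (
    VALUES (1, 'Sales'), (2, 'HR')
),
employees(id, name, department_id) AS (
    VALUES (1, 'John', 1), (2, 'Alice', 2), (3, 'Bob', 1)
)
-- Each department row is tested once, so no duplicates are produced
-- and no DISTINCT is needed.
SELECT d.department_name
FROM departments AS d
WHERE EXISTS (
    SELECT 1
    FROM employees AS e
    WHERE e.department_id = d.department_id
);
```

This relies on department names being unique within departments, as they are in the sample data; if they were not, DISTINCT would still be needed.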

Performance Considerations

Although DISTINCT is powerful, it can carry a significant performance cost, especially on larger data sets. To remove duplicates, PostgreSQL has to compare rows across the whole result set, typically by sorting or hashing them, which gets expensive when the table is large and many columns are selected. To optimize performance:

  • Indexing: Make sure the columns used with DISTINCT are indexed. An index can reduce the sorting and comparison work the query needs.
  • Limit Results: Restrict the rows the query has to process, for example with a WHERE clause, so that DISTINCT is not computed over more data than necessary.
  • Where Possible, Use More Efficient Queries: In some situations it is possible to get the same result without DISTINCT. For example, a well-structured GROUP BY query might be more efficient.
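
Rather than assuming which formulation is faster, it is usually better to ask PostgreSQL for its plan. The sketch below sets up a small hypothetical table (employees_demo and the index name are made up for this example), adds an index, and uses EXPLAIN to compare the DISTINCT query with its GROUP BY counterpart.

```sql
-- Hypothetical demo table; substitute your own table and column names.
CREATE TEMP TABLE employees_demo (
    id         integer PRIMARY KEY,
    name       text,
    department text
);

INSERT INTO employees_demo (id, name, department) VALUES
    (1, 'John',  'Sales'),
    (2, 'Alice', 'HR'),
    (3, 'Bob',   'Sales'),
    (4, 'John',  'Marketing'),
    (5, 'Alice', 'Sales');

-- An index on the deduplicated column gives the planner one more option;
-- whether it is used depends on table size and statistics.
CREATE INDEX employees_demo_department_idx ON employees_demo (department);

-- Show the plan PostgreSQL chooses for the DISTINCT query.
EXPLAIN SELECT DISTINCT department FROM employees_demo;

-- GROUP BY formulation that returns the same set of rows.
EXPLAIN SELECT department FROM employees_demo GROUP BY department;
```

On a table this small the planner will most likely ignore the index, so the comparison only becomes meaningful on realistic data volumes.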