SQL Query to Remove Duplicate Emails

 

Approach

  1. Normalize Emails: For each email, convert it to lowercase and trim any leading or trailing whitespace. This normalization ensures that emails differing only in case or whitespace are treated as identical.

  2. Identify Minimum IDs: For each normalized email, find the smallest id among all duplicate records.

  3. Retrieve Original Records: Using the identified minimum id values, fetch the corresponding original records (including the original email) from the users table.

  4. Order Results: Sort the final result by id in ascending order.

Solution Code

sql
SELECT u1.id, u1.email
FROM users u1
JOIN (
    SELECT MIN(id) AS min_id
    FROM users
    GROUP BY LOWER(TRIM(email))
) u2 ON u1.id = u2.min_id
ORDER BY u1.id;

Explanation

  1. Subquery for Minimum IDs: The inner query (SELECT MIN(id) AS min_id FROM users GROUP BY LOWER(TRIM(email))) processes the users table by:

    • Normalizing each email using LOWER(TRIM(email)) to handle case insensitivity and whitespace.

    • Grouping records by these normalized emails.

    • For each group, selecting the smallest id using MIN(id).

  2. Joining with Original Table: The outer query joins the original users table (u1) with the results of the subquery (u2) on the condition u1.id = u2.min_id. This ensures that only the records with the smallest id for each normalized email are selected.

  3. Ordering Results: The final result is ordered by id to meet the requirement of returning records in ascending order of id.

No comments:

Post a Comment

Best for you