It's easy to find duplicates with one field:
SELECT email, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1
So if we have a table
ID NAME EMAIL
1 John asd@asd.com
2 Sam asd@asd.com
3 Tom asd@asd.com
4 Bob bob@asd.com
5 Tom asd@asd.com
This query will give us John, Sam, Tom, Tom because they all have the same email.
However, what I want is to get duplicates with the same email and name.
That is, I want to get "Tom", "Tom".
The reason I need this: I made a mistake, and allowed inserting duplicate name and email values. Now I need to remove/change the duplicates, so I need to find them first.
Solution 1
SELECT
name, email, COUNT(*)
FROM
users
GROUP BY
name, email
HAVING
COUNT(*) > 1
Simply group on both of the columns.
Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
Support is not consistent:
- Recent PostgreSQL supports it.
- SQL Server (as at SQL Server 2017) still requires all non-aggregated columns in the GROUP BY.
- MySQL is unpredictable and you need
sql_mode=only_full_group_by:- GROUP BY lname ORDER BY showing wrong results;
- Which is the least expensive aggregate function in the absence of ANY() (see comments in accepted answer).
- Oracle isn't mainstream enough (warning: humour, I don't know about Oracle).
Solution 2
try this:
declare @YourTable table (id int, name varchar(10), email varchar(50))
INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')
SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
OUTPUT:
name email CountOf
---------- ----------- -----------
John John-email 2
sam sam-email 2
(2 row(s) affected)
if you want the IDs of the dups use this:
SELECT
y.id,y.name,y.email
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
OUTPUT:
id name email
----------- ---------- ------------
1 John John-email
2 John John-email
5 sam sam-email
6 sam sam-email
(4 row(s) affected)
to delete the duplicates try:
DELETE d
FROM @YourTable d
INNER JOIN (SELECT
y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
FROM @YourTable y
INNER JOIN (SELECT
name,email, COUNT(*) AS CountOf
FROM @YourTable
GROUP BY name,email
HAVING COUNT(*)>1
) dt ON y.name=dt.name AND y.email=dt.email
) dt2 ON d.id=dt2.id
WHERE dt2.RowRank!=1
SELECT * FROM @YourTable
OUTPUT:
id name email
----------- ---------- --------------
1 John John-email
3 fred John-email
4 fred fred-email
5 sam sam-email
(4 row(s) affected)
Solution 3
Try this:
SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )
Solution 4
If you want to delete the duplicates, here's a much simpler way to do it than having to find even/odd rows into a triple sub-select:
SELECT id, name, email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
And so to delete:
DELETE FROM users
WHERE id IN (
SELECT id/*, name, email*/
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)
Much more easier to read and understand IMHO
Note: The only issue is that you have to execute the request until there is no rows deleted, since you delete only 1 of each duplicate each time
Solution 5
In contrast to other answers you can view the whole records containing all columns if there are any. In the PARTITION BY part of row_number function choose the desired unique/duplicit columns.
SELECT *
FROM (
SELECT a.*
, Row_Number() OVER (PARTITION BY Name, Age ORDER BY Name) AS r
FROM Customers AS a
) AS b
WHERE r > 1;
When you want to select ALL duplicated records with ALL fields you can write it like
CREATE TABLE test (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, c1 integer
, c2 text
, d date DEFAULT now()
, v text
);
INSERT INTO test (c1, c2, v) VALUES
(1, 'a', 'Select'),
(1, 'a', 'ALL'),
(1, 'a', 'multiple'),
(1, 'a', 'records'),
(2, 'b', 'in columns'),
(2, 'b', 'c1 and c2'),
(3, 'c', '.');
SELECT * FROM test ORDER BY 1;
SELECT *
FROM test
WHERE (c1, c2) IN (
SELECT c1, c2
FROM test
GROUP BY 1,2
HAVING count(*) > 1
)
ORDER BY 1;
Tested in PostgreSQL.
Solution 6
SELECT name, email
FROM users
WHERE email in
(SELECT email FROM users
GROUP BY email
HAVING COUNT(*)>1)
Solution 7
A little late to the party but I found a really cool workaround to finding all duplicate IDs:
SELECT email, GROUP_CONCAT(id)
FROM users
GROUP BY email
HAVING COUNT(email) > 1;
Solution 8
This selects/deletes all duplicate records except one record from each group of duplicates. So, the delete leaves all unique records + one record from each group of the duplicates.
Select duplicates:
SELECT *
FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Delete duplicates:
DELETE FROM table
WHERE
id NOT IN (
SELECT MIN(id)
FROM table
GROUP BY column1, column2
);
Be aware of larger amounts of records, it can cause performance problems.
Solution 9
try this code
WITH CTE AS
( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER()OVER(PARTITION BY Name,Age ORDER BY ccn)
FROM ccnmaster )
select * from CTE
Solution 10
In case you work with Oracle, this way would be preferable:
create table my_users(id number, name varchar2(100), email varchar2(100));
insert into my_users values (1, 'John', 'asd@asd.com');
insert into my_users values (2, 'Sam', 'asd@asd.com');
insert into my_users values (3, 'Tom', 'asd@asd.com');
insert into my_users values (4, 'Bob', 'bob@asd.com');
insert into my_users values (5, 'Tom', 'asd@asd.com');
commit;
select *
from my_users
where rowid not in (select min(rowid) from my_users group by name, email);
Solution 11
select name, email
, case
when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
else 'No'
end "duplicated ?"
from users
Solution 12
If you wish to see if there is any duplicate rows in your table, I used below Query:
create table my_table(id int, name varchar(100), email varchar(100));
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (1, 'shekh', 'shekh@rms.com');
insert into my_table values (2, 'Aman', 'aman@rms.com');
insert into my_table values (3, 'Tom', 'tom@rms.com');
insert into my_table values (4, 'Raj', 'raj@rms.com');
Select COUNT(1) As Total_Rows from my_table
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc
Solution 13
SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;
I think this will work properly to search repeated values in a particular column.
Solution 14
select id,name,COUNT(*) from user group by Id,Name having COUNT(*)>1
Solution 15
select emp.ename, emp.empno, dept.loc
from emp
inner join dept
on dept.deptno=emp.deptno
inner join
(select ename, count(*) from
emp
group by ename, deptno
having count(*) > 1)
t on emp.ename=t.ename order by emp.ename
/
Solution 16
This is the easy thing I've come up with. It uses a common table expression (CTE) and a partition window (I think these features are in SQL 2008 and later).
This example finds all students with duplicate name and dob. The fields you want to check for duplication go in the OVER clause. You can include any other fields you want in the projection.
with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
ORDER BY DOB, LName
Solution 17
How we can count the duplicated values?? either it is repeated 2 times or greater than 2. just count them, not group wise.
as simple as
select COUNT(distinct col_01) from Table_01
Solution 18
By Using CTE also we can find duplicate value like this
with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]
)
select * from MyCTE where Duplicate>1
Solution 19
I think this will help you
SELECT name, email, COUNT(* )
FROM users
GROUP BY name, email
HAVING COUNT(*)>1
Solution 20
Well this question has been answered very neatly in all the above answers. But I would like to list all the possible manners, we can do this in various ways which may impart the understanding how we can do it and seeker can pick one of the solution which best fits to his/her need as this is one of the most common query SQL developer come across different business usecases or sometime in interviews as well.
Creating Sample Data
I will start with setting up some sample data from this question only.
Create table NewTable (id int, name varchar(10), email varchar(50))
INSERT NewTable VALUES (1,'John','asd@asd.com')
INSERT NewTable VALUES (2,'Sam','asd@asd.com')
INSERT NewTable VALUES (3,'Tom','asd@asd.com')
INSERT NewTable VALUES (4,'Bob','bob@asd.com')
INSERT NewTable VALUES (5,'Tom','asd@asd.com')
1. USING GROUP BY CLAUSE
SELECT
name,email, COUNT(*) AS Occurence
FROM NewTable
GROUP BY name,email
HAVING COUNT(*)>1
How it works:
- the GROUP BY clause groups the rows into groups by values in both name and email columns.
- Then, the COUNT() function returns the number of occurrences of each group (name,email).
- Then, the HAVING clause keeps only duplicate groups, which are groups that have more than one occurrence.
2. Using CTE:
To return the entire row for each duplicate row, you join the result of the above query with the NewTable table using a common table expression (CTE):
WITH cte AS (
SELECT
name,
email,
COUNT(*) occurrences
FROM NewTable
GROUP BY
name,
email
HAVING COUNT(*) > 1
)
SELECT
t1.Id,
t1.name,
t1.email
FROM NewTable t1
INNER JOIN cte ON
cte.name = t1.name AND
cte.email = t1.email
ORDER BY
t1.name,
t1.email;
3. Using ROW_NUMBER() function
WITH cte AS (
SELECT
name,
email,
ROW_NUMBER() OVER (
PARTITION BY name,email
ORDER BY name,email) rownum
FROM
NewTable t1
)
SELECT
*
FROM
cte
WHERE
rownum > 1;
How it works:
ROW_NUMBER()distributes rows of theNewTabletable into partitions by values in thenameandemailcolumns. The duplicate rows will have repeated values in thenameandemailcolumns, but different row numbers- Outer query removes the first row in each group.
Well Now I believe, you can have sound Idea of how to find duplicates and apply the logic to find duplicate in all possible scenarios. Thanks.
Solution 21
This should also work, maybe give it try.
Select * from Users a
where EXISTS (Select * from Users b
where ( a.name = b.name
OR a.email = b.email)
and a.ID != b.id)
Especially good in your case If you search for duplicates who have some kind of prefix or general change like e.g. new domain in mail. then you can use replace() at these columns
Solution 22
SELECT * FROM users u where rowid = (select max(rowid) from users u1 where
u.email=u1.email);
Solution 23
SELECT name, email,COUNT(email)
FROM users
WHERE email IN (
SELECT email
FROM users
GROUP BY email
HAVING COUNT(email) > 1)
Solution 24
The most important thing here is to have the fastest function. Also indices of duplicates should be identified. Self join is a good option but to have a faster function it is better to first find rows that have duplicates and then join with original table for finding id of duplicated rows. Finally order by any column except id to have duplicated rows near each other.
SELECT u.*
FROM users AS u
JOIN (SELECT username, email
FROM users
GROUP BY username, email
HAVING COUNT(*)>1) AS w
ON u.username=w.username AND u.email=w.email
ORDER BY u.email;
Solution 25
If you want to find duplicate data (by one or several criterias) and select the actual rows.
with MYCTE as (
SELECT DuplicateKey1
,DuplicateKey2 --optional
,count(*) X
FROM MyTable
group by DuplicateKey1, DuplicateKey2
having count(*) > 1
)
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/
Solution 26
To delete records whose names are duplicate
;WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS T FROM @YourTable
)
DELETE FROM CTE WHERE T > 1
Solution 27
To Check From duplicate Record in a table.
select * from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
or
select * from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
To Delete the duplicate record in a table.
delete from users s
where rowid < any
(select rowid from users k where s.name = k.name and s.email = k.email);
or
delete from users s
where rowid not in
(select max(rowid) from users k where s.name = k.name and s.email = k.email);
Solution 28
Another easy way you can try this using analytic function as well:
SELECT * from
(SELECT name, email,
COUNT(name) OVER (PARTITION BY name, email) cnt
FROM users)
WHERE cnt >1;
Solution 29
SELECT column_name,COUNT(*) FROM TABLE_NAME GROUP BY column1, HAVING COUNT(*) > 1;
Solution 30
You may want to try this
SELECT NAME, EMAIL, COUNT(*)
FROM USERS
GROUP BY 1,2
HAVING COUNT(*) > 1
