Some MySQL tables may contain duplicate records. Sometimes duplicates are acceptable, but at other times they need to be removed.

In this blog post, we will show you how to prevent duplicate data from appearing in a table and how to remove duplicate data from a table.


Prevent duplicate data in the table

You can define a PRIMARY KEY or a UNIQUE index on a MySQL table to ensure that its data is unique.

Let’s try an example: The following table has no indexes or primary keys, so it allows multiple duplicate records.

CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);

To make the combination of first_name and last_name unique, you can define a composite primary key on the two columns. Columns that belong to a primary key cannot hold NULL, so they are declared NOT NULL. As follows:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

Once a primary key or unique index is in place, an INSERT that would create a duplicate row fails with an error.
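
As a minimal sketch of that failure, assuming person_tbl was created with the composite primary key shown above (the sample values 'Smith' and 'Anna' are made up for illustration):

```sql
-- Assumes person_tbl has PRIMARY KEY (last_name, first_name).
-- The values below are hypothetical sample data.
INSERT INTO person_tbl (last_name, first_name) VALUES ('Smith', 'Anna');
-- Succeeds: no row with this key exists yet.

INSERT INTO person_tbl (last_name, first_name) VALUES ('Smith', 'Anna');
-- Fails with error 1062 (duplicate entry for the primary key);
-- the table still holds a single ('Smith', 'Anna') row.
```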

INSERT IGNORE INTO skips data that is already in the database: if no matching row exists, the new row is inserted; if one exists, the new row is skipped. Existing data is preserved, and only the missing rows are filled in.

The following example uses INSERT IGNORE INTO, which runs without error and does not insert the duplicate row into the table:

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES('Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES('Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)

With INSERT IGNORE INTO, inserting duplicate data after uniqueness has been set returns no error, only a warning. REPLACE INTO, by contrast, first deletes any existing record with the same primary-key or unique-index value and then inserts the new record.
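
The difference between the two statements can be sketched as follows. This assumes person_tbl has the composite primary key and already contains the ('Jay', 'Thomas') row from the example above; the value 'M' for sex is made up for illustration.

```sql
-- Assumes PRIMARY KEY (last_name, first_name) and an existing
-- ('Jay', 'Thomas') row whose sex column is NULL.
INSERT IGNORE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');
SHOW WARNINGS;  -- lists the duplicate-key condition that was downgraded to a warning

-- The existing row is untouched, so sex is still NULL here.
SELECT last_name, first_name, sex FROM person_tbl;

-- REPLACE deletes the conflicting row, then inserts the new one.
REPLACE INTO person_tbl (last_name, first_name, sex)
VALUES ('Jay', 'Thomas', 'M');

-- The row now carries the new value 'M'.
SELECT last_name, first_name, sex FROM person_tbl;
```

Because REPLACE removes the old row before inserting, any column not listed in the statement falls back to its default instead of keeping its previous value.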

Another way to set the uniqueness of the data is to add a UNIQUE index like this:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);
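
If the table already exists, the same constraint can be added afterwards. The following is a sketch only; the index name uq_person_name is a hypothetical choice, not something from the original table definition.

```sql
-- uq_person_name is a hypothetical index name.
-- This statement fails with a duplicate-entry error if the table
-- already contains rows that share the same (last_name, first_name).
ALTER TABLE person_tbl
    ADD UNIQUE INDEX uq_person_name (last_name, first_name);

-- Confirm that the index was created.
SHOW INDEX FROM person_tbl;
```

Unlike a primary key, a UNIQUE index on nullable columns still permits multiple rows containing NULL, which is one reason the columns above are declared NOT NULL.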

Counting duplicate data

The following query counts how many times each combination of first_name and last_name occurs in the table and returns the combinations that occur more than once:

mysql> SELECT COUNT(*) as repetitions, last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name
    -> HAVING repetitions > 1;

The above query returns, for each duplicated name, the number of matching records in the person_tbl table. Generally, to query duplicate values, perform the following operations:

  • Determine which columns may contain duplicate values.
  • List those columns in the SELECT column list together with COUNT(*).
  • List the same columns in the GROUP BY clause.
  • Add a HAVING clause that keeps only groups whose count is greater than 1.
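
A small worked run of these steps, assuming the first, unconstrained definition of person_tbl (no primary key or unique index) and an initially empty table; the sample rows are invented for illustration:

```sql
-- Hypothetical sample data: one name appears twice, one appears once.
INSERT INTO person_tbl (first_name, last_name, sex) VALUES
    ('Thomas', 'Jay', 'M'),
    ('Thomas', 'Jay', 'M'),
    ('Anna',   'Lee', 'F');

-- Group on the candidate columns and keep only groups with more than one row.
SELECT COUNT(*) AS repetitions, last_name, first_name
FROM person_tbl
GROUP BY last_name, first_name
HAVING repetitions > 1;
-- Returns a single row: repetitions = 2, last_name = 'Jay', first_name = 'Thomas'
```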

Filtering duplicate data

If you need to read unique data, use the DISTINCT keyword in the SELECT statement to filter duplicate data.

mysql> SELECT DISTINCT last_name, first_name
    -> FROM person_tbl
    -> ORDER BY last_name;

You can also use GROUP BY to read non-repeating data from a table:

mysql> SELECT last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name;

Deleting Duplicate Data

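If you want to delete duplicate data from a table, you can rebuild the table from a grouped SELECT. MIN(sex) is used here so that the query also runs when the ONLY_FULL_GROUP_BY SQL mode is enabled; it keeps one sex value per name:

mysql> CREATE TABLE tmp SELECT last_name, first_name, MIN(sex) AS sex
    ->                  FROM person_tbl
    ->                  GROUP BY last_name, first_name;
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;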

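You can also delete duplicate entries from a table by adding an INDEX or PRIMARY KEY with ALTER IGNORE TABLE. Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so this works only on older servers. The method is as follows: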

mysql> ALTER IGNORE TABLE person_tbl
    -> ADD PRIMARY KEY (last_name, first_name);
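
On servers that reject ALTER IGNORE TABLE, a similar effect can be approximated by copying the rows into a keyed table with INSERT IGNORE. This is a sketch under several assumptions: person_tbl currently has no primary key, its name columns contain no NULL values, and the table names person_tbl_dedup and person_tbl_old are hypothetical.

```sql
-- person_tbl_dedup and person_tbl_old are hypothetical table names.
-- Create an empty copy of the table structure and add the key to it.
CREATE TABLE person_tbl_dedup LIKE person_tbl;
ALTER TABLE person_tbl_dedup ADD PRIMARY KEY (last_name, first_name);

-- Copy the rows; any row whose key is already present is skipped.
INSERT IGNORE INTO person_tbl_dedup (last_name, first_name, sex)
SELECT last_name, first_name, sex FROM person_tbl;

-- Swap the tables in one statement, keeping the original as a backup.
RENAME TABLE person_tbl TO person_tbl_old,
             person_tbl_dedup TO person_tbl;

-- Drop the backup only after checking the result:
-- DROP TABLE person_tbl_old;
```

Which of several duplicate rows survives depends on the order in which the SELECT returns them, so this approach fits best when the duplicates are identical in every column.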