Avoiding Duplicate Rows in Many-to-Many Relationships in SQL

Understanding Many-to-Many Relationships in SQL

When dealing with many-to-many relationships between tables, it’s common to encounter duplicate rows as a result. In this article, we’ll explore the issue of duplicate rows in many-to-many relationships and how to avoid them.

The Problem with Duplicate Rows

The question arises when trying to join two or more tables that have many-to-many relationships. For example, consider a film with multiple actors and writers. If we try to join these tables on a single query, we’ll end up with duplicate rows. This is because the database has no way of knowing which actor attribute corresponds to which writer attribute.

Let’s consider an example using the provided schema:

CREATE TABLE `videos` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `description` varchar(10000) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=47606 DEFAULT CHARSET=utf8;

CREATE TABLE `actors` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `actors_UN` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=152216 DEFAULT CHARSET=utf8;

CREATE TABLE `film_actors` (
  `actor_id` int(11) NOT NULL,
  `film_id` int(11) NOT NULL,
  PRIMARY KEY (`film_id`,`actor_id`),
  KEY `FKrs472oyyff3hfwq10pyo94k1d` (`actor_id`),
  CONSTRAINT `FK12uvap3je50qd8cq3s0jf7h7r` FOREIGN KEY (`film_id`) REFERENCES `videos` (`id`) ON DELETE CASCADE,
  CONSTRAINT `FKrs472oyyff3hfwq10pyo94k1d` FOREIGN KEY (`actor_id`) REFERENCES `actors` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `writers` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `writers_UN` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=39875 DEFAULT CHARSET=utf8;

CREATE TABLE `film_writers` (
  `film_id` int(11) NOT NULL,
  `writer_id` int(11) NOT NULL,
  PRIMARY KEY (`writer_id`,`film_id`),
  KEY `FKh0kfwnarp6utb4f80ycj1lap` (`film_id`),
  CONSTRAINT `FK4hwpb5l48m0xps6jqn1wyjb63` FOREIGN KEY (`writer_id`) REFERENCES `writers` (`id`) ON DELETE CASCADE,
  CONSTRAINT `FKh0kfwnarp6utb4f80ycj1lap` FOREIGN KEY (`film_id`) REFERENCES `videos` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The provided SQL query attempts to join the films, actors, and writers tables on a single query:

SELECT *
FROM
  films 
  INNER JOIN actors on films.id = actors.filmid
  INNER JOIN writers on films.id = writers.filmid

This results in duplicate rows, as we’ll see below.

The Cartesian Product

When trying to join two or more tables with many-to-many relationships, the database will produce a Cartesian product of the rows. In our example, this means that every actor will be associated with every writer, resulting in 110 rows (10 actors × 11 writers).

Let’s illustrate this using the provided data:

Actor1/Writer1
Actor1/Writer2
...
Actor1/Writer11
Actor2/Writer1
Actor2/Writer2
...
Actor2/Writer11
Actor3/Writer1
Actor3/Writer2
...
Actor3/Writer11
...

As we can see, every actor is associated with every writer, resulting in duplicate rows.

Avoiding Duplicate Rows

To avoid these duplicate rows, we must recognize that there is no relationship between a film actor and a film writer. The two entities share no common attributes or data that would allow us to join them on a single query.

Instead of trying to perform a single query that joins all three tables, we should run separate queries for each table:

SELECT *
FROM films 
INNER JOIN actors on films.id = actors.filmid;

SELECT *
FROM writers;

By running these two separate queries, we can retrieve the desired data without producing duplicate rows.

Conclusion

When dealing with many-to-many relationships in SQL, it’s essential to recognize that there may not be a relationship between the entities involved. In such cases, attempting to join all tables on a single query will result in duplicate rows due to the Cartesian product.

By breaking down these queries into separate ones for each table and recognizing the lack of common attributes or data between film actors and writers, we can avoid producing duplicate rows and achieve our desired results.


Last modified on 2023-08-02