1. An overview of the

A HashSet is a collection used to store unique elements.

In this article, we’ll discuss the performance of the removeAll() method in the java.util.hashset class.

2. HashSet.removeAll()

The removeAll method of the HashSet removes all elements containing the specified collection:

Set<Integer> set = new HashSet<Integer>();
set.add(1);
set.add(2);
set.add(3);
set.add(4);

Collection<Integer> collection = new ArrayList<Integer>();
collection.add(1);
collection.add(3);

set.removeAll(collection);

Integer[] actualElements = new Integer[set.size()];
Integer[] expectedElements = new Integer[] { 2, 4 };
assertArrayEquals(expectedElements, set.toArray(actualElements));
Copy the code

As a result, elements 1 and 3 from the original collection will be deleted from the.

3. Internal implementation and time complexity

The removeAll() method first determines which collection is smaller and executes logic differently depending on the size of the collection. This is done by calling the size() method on the original collection and on the specified collection.

If the specified collection has fewer elements than the specified original, it iterates over the specified collection with the time complexity O(n). It also checks if the element exists in a set of O(1) time complexity. If an element exists, it is removed from the original collection using the collection’s remove() method, which has a time complexity of O(1). So the total time is order n.

If the original collection has fewer elements than the specified collection, it iterates over the original collection with O(n). It then checks for the presence of each element in the specified collection by calling its contains() method. If such an element exists, it is removed from the original collection. So it depends on the time complexity of the contains() method.

Now in this case, if the specified collection is an ArrayList, the time complexity of the contains() method is O(m). Thus, the overall time complexity of removing all elements present in the ArrayList from the collection HashSet is O(n * m).

If you specify that the collection is again a HashSet, the time complexity of the contains() method is O(1). Thus, the overall time complexity of removing all elements that exist in the HashSet from the collection HashSet is O(n).

Performance of 4.

To see the performance differences in these three cases, let’s write a simple JMH benchmark.

In the first case, we initialize the original collection and the specified collection, where the original collection has more elements than the specified collection. In the second case, the specified collection has more elements than the original collection. In the third case, the second set has more elements than the first:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5)
public class HashSetBenchmark {

    @State(Scope.Thread)
    public static class MyState {
        private Set employeeSet1 = new HashSet<>();
        private List employeeList1 = new ArrayList<>();
        private Set employeeSet2 = new HashSet<>();
        private List employeeList2 = new ArrayList<>();
        private Set<Employee> employeeSet3 = new HashSet<>();
        private Set<Employee> employeeSet4 = new HashSet<>();

        private long set1Size = 60000;
        private long list1Size = 50000;
        private long set2Size = 50000;
        private long list2Size = 60000;
        private long set3Size = 50000;
        private long set4Size = 60000;

        @Setup(Level.Trial)
        public void setUp() {
            // populating sets
        }
    }
}
Copy the code

After that, we add our benchmark:

@Benchmark
public boolean given_SizeOfHashsetGreaterThanSizeOfCollection_whenRemoveAllFromHashSet_thenGoodPerformance(MyState state) {
    return state.employeeSet1.removeAll(state.employeeList1);
}

@Benchmark
public boolean given_SizeOfHashsetSmallerThanSizeOfCollection_whenRemoveAllFromHashSet_thenBadPerformance(MyState state) {
    return state.employeeSet2.removeAll(state.employeeList2);
}

@Benchmark
public boolean given_SizeOfHashsetSmallerThanSizeOfAnotherHashSet_whenRemoveAllFromHashSet_thenGoodPerformance(MyState state) {
    return state.employeeSet3.removeAll(state.employeeSet4);
}
Copy the code

The results are as follows:

Benchmark Mode Cnt Score Error Units HashSetBenchmark. TestHashSetSizeGreaterThanCollection avgt 20 + / - 2700457.099 475673.379 ns/op HashSetBenchmark. TestHashSetSmallerThanCollection avgt 20 31522676649.950 + / - 3556834894.168 ns/op HashSetBenchmark. TestHashSetSmallerThanOtherHashset avgt 20 2672757.784 + / - 224505.866 ns/opCopy the code

We can see that hashset.removeall () behaves badly when the HashSet has fewer elements than Collection, which is passed to the removeAll() method as an argument. But when the other collection is again a HashSet, the performance is fine.

Conclusion 5.

In this article, we looked at the performance of removeAll() in HashSet. When the original collection has fewer elements than the collection, the performance of removeAll() depends on the time complexity of the contains() method of the collection.

Finally, the full code for this article is available on GitHub.

Reference: www.baeldung.com/java-hashse…