Counting Duplicates with Java Stream API
The Java Stream API provides powerful tools for processing collections of data. One of them is the Collector interface, which, together with the factory methods in Collectors, lets us group and count the elements of a collection by different criteria. In this article we show how to use the Stream API to count duplicates, remove them, and calculate how often each element occurs.
Count Duplicates
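We can use the distinct method from java.util.stream.Stream to count the distinct elements of a list. Comparing that number with the size of the list tells us whether the list contains duplicates: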
var test = List.of(1, 3, 3, 4);
var distinctSize = test.stream().distinct().count();
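To check whether the list has duplicates, we compare its size with the number of distinct elements. If the two values differ, at least one element occurs more than once:
System.out.println(test.size() == distinctSize); // false, so the list contains duplicates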
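The size comparison above only tells us that duplicates exist. As a minimal sketch of how the same idea could report how many surplus elements there are and which values are repeated, the following uses a hypothetical helper class DuplicateFinder (the class and method names are assumptions, not part of the original example). The findDuplicates method relies on a stateful filter, so it is only meant for sequential streams.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper class, introduced only for illustration.
public class DuplicateFinder {

    // Number of surplus elements: list size minus the number of distinct elements.
    public static long countDuplicates(List<?> items) {
        return items.size() - items.stream().distinct().count();
    }

    // Values that occur more than once. Set.add returns false when the element
    // is already present, so the filter keeps only repeated occurrences.
    // Assumes a sequential stream, because the lambda mutates shared state.
    public static <T> Set<T> findDuplicates(List<T> items) {
        Set<T> seen = new HashSet<>();
        return items.stream()
                .filter(item -> !seen.add(item))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        var test = List.of(1, 3, 3, 4);
        System.out.println(countDuplicates(test)); // 1
        System.out.println(findDuplicates(test));  // [3]
    }
}
```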
The elements in the first example are Integer objects, which already implement equals and hashCode. If the list contains instances of our own classes, those classes have to implement equals and hashCode consistently as well:
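import lombok.AllArgsConstructor;
import lombok.EqualsAndHashCode;

@AllArgsConstructor
@EqualsAndHashCode // generates equals and hashCode based on id
public class Devlabs {
    long id;
}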
var test = List.of(
    new Devlabs(1),
    new Devlabs(3),
    new Devlabs(3),
    new Devlabs(4));
var distinctSize = test.stream().distinct().count();
System.out.println(test.size() == distinctSize); // false
In this example we use Lombok's @EqualsAndHashCode to generate the equals and hashCode methods, which by default take all non-static, non-transient fields into account. With these methods in place, the two new Devlabs(3) objects are treated as duplicates.
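For readers who do not use Lombok, here is a rough hand-written equivalent. It is a sketch of the same idea, not the code Lombok actually generates, and the class name DevlabsPlain is an assumption chosen to avoid clashing with the Lombok version.

```java
import java.util.List;

// Hypothetical hand-written counterpart to the Lombok-annotated Devlabs class.
public class DevlabsPlain {
    private final long id;

    public DevlabsPlain(long id) {
        this.id = id;
    }

    // Two instances are equal when their ids match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DevlabsPlain)) return false;
        return id == ((DevlabsPlain) o).id;
    }

    // Must be consistent with equals: equal objects produce equal hash codes.
    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }

    public static void main(String[] args) {
        var test = List.of(
                new DevlabsPlain(1),
                new DevlabsPlain(3),
                new DevlabsPlain(3),
                new DevlabsPlain(4));
        System.out.println(test.stream().distinct().count()); // 3
    }
}
```

If equals and hashCode were left out, each instance would only be equal to itself, and distinct would keep all four elements.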
If we want to remove the duplicates, we combine distinct with the collect method, which accumulates the remaining elements into a new List:
var noDuplicates = test.stream()
    .distinct()
    .collect(Collectors.toList());
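For an ordered stream such as one created from a List, distinct keeps the first occurrence of each element and preserves the encounter order. The sketch below illustrates this and, for comparison, shows a non-stream alternative based on LinkedHashSet. The class and method names are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, introduced only for illustration.
public class RemoveDuplicates {

    // Stream-based variant: keeps the first occurrence of each element.
    public static <T> List<T> withDistinct(List<T> items) {
        return items.stream()
                .distinct()
                .collect(Collectors.toList());
    }

    // Collection-based variant: LinkedHashSet drops repeats and keeps insertion order.
    public static <T> List<T> withLinkedHashSet(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    public static void main(String[] args) {
        var test = List.of(4, 3, 3, 1, 4);
        System.out.println(withDistinct(test));      // [4, 3, 1]
        System.out.println(withLinkedHashSet(test)); // [4, 3, 1]
    }
}
```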
Calculate the Frequency
If we want to count the frequency of the items in the list, we can use the java.util.stream.Stream.collect method in combination with Collectors.groupingBy:
var test = List.of(1, 3, 3, 4);
Map<Integer, Long> freq = test
    .stream()
    .collect(Collectors.groupingBy(
        Function.identity(),
        Collectors.counting()
    ));
// {1=1, 3=2, 4=1}
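The frequency map can also be used to report only the duplicated elements together with their counts. The following is a minimal sketch under that assumption; the class DuplicateFrequencies and the method duplicatesWithCount are hypothetical names introduced for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, introduced only for illustration.
public class DuplicateFrequencies {

    // Builds the frequency map first, then keeps only entries that occur more than once.
    public static <T> Map<T, Long> duplicatesWithCount(List<T> items) {
        Map<T, Long> freq = items.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        Collectors.counting()));

        return freq.entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        var test = List.of(1, 3, 3, 4);
        System.out.println(duplicatesWithCount(test)); // {3=2}
    }
}
```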
Another possibility is to use the java.util.Collections.frequency method:
import static java.util.Collections.frequency;
// Calculate the frequency for a single item in the list
System.out.println(frequency(test, 3)); // 2
// Calculate the frequency for all objects in the list
Map<Integer, Integer> freqTest = test.stream()
    .collect(
        Collectors.toMap(
            Function.identity(),
            v -> frequency(test, v),
            (v1, v2) -> v1)
    );
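In the last example, we use the frequency method to calculate the frequency of every item in the list. Because the stream still contains the duplicates, toMap sees the same key more than once. Without a merge function it would throw an IllegalStateException on the first duplicate key, so we pass (v1, v2) -> v1. Which of the two values we keep does not matter, since frequency returns the same result for equal elements. Note that Collections.frequency scans the whole list on every call, so this approach takes quadratic time; for larger lists the groupingBy variant is the better choice.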
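To make the role of the merge function concrete, the sketch below first shows that toMap without a merge function fails on duplicate keys, and then counts frequencies in a single pass with Map.merge as a possible alternative to calling frequency for each element. The class and method names are assumptions made for this illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class, introduced only for illustration.
public class FrequencyAlternatives {

    // Returns true if the two-argument toMap rejects the list because of duplicate keys.
    public static boolean toMapWithoutMergeFails(List<Integer> items) {
        try {
            items.stream().collect(Collectors.toMap(Function.identity(), v -> 1));
            return false;
        } catch (IllegalStateException e) {
            return true; // thrown on the first duplicate key
        }
    }

    // Single pass over the list: Map.merge inserts 1 for a new key,
    // otherwise adds 1 to the existing count.
    public static <T> Map<T, Integer> countInOnePass(List<T> items) {
        Map<T, Integer> freq = new HashMap<>();
        for (T item : items) {
            freq.merge(item, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        var test = List.of(1, 3, 3, 4);
        System.out.println(toMapWithoutMergeFails(test)); // true
        System.out.println(countInOnePass(test));         // {1=1, 3=2, 4=1}
    }
}
```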