Top K Frequent Items

HKT, Morgen Peschke HKT, Morgen Peschke
Views
Article hero image

You may have encountered the classic interview question to find the top k frequent items in an array, particularly in O(N) time, which utilizes a bucket sort.

In this post, we will explore a few ways to implement the solution. We will discuss imperative, functional, and hybrid approaches, as well as linear and logarithmic runtime complexities.

Here is a typical Java implementation that an interviewer would rejoice seeing from the candidate:

public static List<Integer> topKFrequentOrderN(int[] nums, int k) {
  final var freqMap = new HashMap<Integer, Integer>();
  for (int num : nums) {
    freqMap.put(num, freqMap.getOrDefault(num, 0) + 1);
  }

  @SuppressWarnings("unchecked")
  final List<Integer>[] buckets = new List[nums.length + 1];

  for (int i = 0; i < buckets.length; ++i) {
    buckets[i] = new ArrayList<>();
  }

  for (var entry : freqMap.entrySet()) {
    buckets[entry.getValue()].add(entry.getKey());
  }

  final var result = new ArrayList<Integer>();
  for (int i = buckets.length - 1; i >= 0 && result.size() < k; --i) {
    for (var num : buckets[i]) {
      result.add(num);
      if (result.size() == k) {
        break;
      }
    }
  }

  return result;
}

Here is the Scala equivalent, which is relatively succinct:

def topKFrequentOrderN(nums: Array[Int], k: Int): Array[Int] = {
  val freqMap = mutable.Map[Int, Int]()
  nums.foreach { num =>
    freqMap.put(num, freqMap.getOrElse(num, 0) + 1)
  }

  val buckets = ArrayBuffer.fill(nums.length + 1)(ArrayBuffer[Int]())
  freqMap.foreach { entry =>
    buckets(entry._2).addOne(entry._1)
  }

  buckets.indices.view.reverse
    .flatMap(i => buckets(i))
    .take(k)
    .toArray
}

Although the above imperative solution runs in O(n) time, it can be somewhat tricky to understand, especially in Java. Let us try to implement using a functional approach.

Here is a more functional style implementation in Scala, which is pleasing to my eyes:

def topKFrequentNLogN(nums: Array[Int], k: Int): Array[Int] =
  nums
    .groupBy(identity)
    .view
    .mapValues(_.length)
    .toArray // The frequency map
    .sortBy(-_._2) // Instead of creating buckets, sort by the count and take the top k
    .take(k)
    .map(_._1)

We can implement the same in Java using the Streams API.

import java.util.*;
import java.util.stream.Collectors;

public static List<Integer> topKFrequentNLogN(int[] nums, int k) {
  return Arrays.stream(nums)
    .boxed()
    .collect(Collectors.groupingBy(num -> num, Collectors.counting())) // Create frequency map
    .entrySet()
    .stream()
    .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed()) // Reverse sort by count
    .limit(k) // Take top k
    .map(Map.Entry::getKey) // Extract the keys
    .collect(Collectors.toList()); // Collect to List
}

While the functional implementation is elegant, concise, and comprehensible, it has a time complexity of O(n log n), which may not be accepted. The interviewer is looking for linear time complexity. We do too, but are also looking for elegant code.

A hybrid approach of functional and imperative should yield some elegant code with linear time complexity.

The following is a linear-time complexity implementation using a hybrid approach in Scala. While buckets is mutated, the rest of the code brings in the functional flavor to the mix.

def topKFrequentOrderN(nums: Array[Int], k: Int): Array[Int] = {
  val buckets = Array.fill[List[Int]](nums.length + 1)(List.empty[Int])

  nums
    .groupBy(identity)
    .view
    .mapValues(_.length)
    .toMap
    .foreach { case t @ (num, count) =>
      buckets(count) = num :: buckets(count)
    }

  buckets.reverse.flatten.take(k)
}

Here is the equivalent hybrid implementation in Java with linear time complexity:

public static int[] topKFrequentOrderN2(int[] nums, int k) {
  @SuppressWarnings("unchecked")
  List<Integer>[] buckets =
      IntStream
        .range(0, nums.length + 1)
        .mapToObj(i -> new ArrayList<Integer>())
        .toArray(List[]::new);

  Arrays
    .stream(nums)
    .boxed()
    .collect(Collectors.groupingBy(num1 -> num1, Collectors.counting())) // Frequency map
    .forEach((num, count) -> buckets[count.intValue()].add(num)); // Fill the buckets

  return Arrays.stream(buckets)
    .flatMap(List::stream)
    .limit(k)
    .mapToInt(Integer::intValue)
    .toArray();
}

Unfortunately, interviewers will not encourage a functional style of implementation as a solution to this problem. In an interview setting, an interviewer expects to see the mechanics of iterating over the array and other low-level concepts. However, I bet you will see the functional implementation in a typical large codebase where readability is paramount compared to the imperative style. Of course, it depends! And it’s all about trade-offs.


P.S: This post was co-authored by Morgen Peschke.

java scala problem