Solution – Choosing Oranges

38thFriday Fun Session – 3rd Nov 2017

Given a set of goodness scores of oranges and a window length, we need to find the highest scoring oranges within the window as we move it from left to the end.

This is the solution for JLTI Code Jam – Oct 2017.

Using priority queue

Suppose we have n scores and the window length is m. We can simply move the window from left to to right and take (consecutive) m scores within the window and each time compute the max of them, and output it, if it is already not outputted. Finding max from m scores would take O(m) and as we do it n times (n-m+1 times, to be precise), the complexity would be O(mn). However, it was expected that the complexity would be better than this.

Alternatively, we can use a max-heap, where we push each score as we encounter it. At the same time, we retrieve the top of the max-heap and if all is fine – output it. By if all is fine, we mean to say, we need to make sure that the orange has not been already outputted and that it belongs to the current window. At the same time, if the top item is out-dated, we can pop it, meaning take it out of the heap. Note that, max-heap is a data structure that retains the max element at the top.

Let us walk through an example

Let us take the first example as mentioned here. For the scores 1 3 5 7 3 5 9 1 2 5 with window size 5, let us walk through the process.

At first, we push the first 4 items (4 is one less than the window size 5). The max-heap would look like: 1 3 5 7 where 7 is the top element.

Then for each of the remaining items, we do the following:

1. If the (new) item is greater than or equal to the top item in the max-heap, pop it (out) and push the new item into it. Output the new item (if the same orange is not already outputted). We do it because the new item is the max in the present window. And the existing top one is of no further use. We can now move to the next item.
2. Keep on popping the top as long it is not one belonging to the current window. We do it, as we are interested to find the max within the window, not the out-dated ones those are no longer inside the window.
3. Output the top (if the same orange is not already outputted). We do it as it the max within the current window.
4. If the top item is the oldest (left-most/first/earliest/starting one) in the current window, pop it. We do it because this item is going to go out of the window as the next item gets in.

Score 3:

3 is not >= 7 (top in heap)

Existing top (7) is within the current window. Output it.

Push 3; max-heap looks like: 1 3 3 5 7

Score 5:

5 is not >= 7 (top in heap)

Existing top (7) is within the current window (3 5 7 3 5). But this orange is already outputted (we can use index of the item to track it, meaning instead of just pushing the score, retain the index along with it). No output this time.

Push 5; max-heap looks like: 1 3 3 5 5 7

Score 9:

9 >= 7 (top in heap)

Pop 7, push 9, output 9.

New max-heap: 1 3 3 5 5 9

Score 1:

1 is not >= 9 (top in heap)

Existing top (9) is within the current window. But this orange is already outputted. No output this time.

Push 1; max-heap looks like: 1 1 3 3 5 5 9

Score 2:

2 is not >= 9 (top in heap)

Existing top (9) is within the current window. But this orange is already outputted. No output this time.

Push 2; max-heap looks like: 1 1 2 3 3 5 5 9

Score 5:

Existing top (9) is within the current window. But this orange is already outputted. No output this time.

Push 5; max-heap looks like: 1 1 2 3 3 5 5 5 9

No item left. We are done!

Finally, output is 7, 9.

Complexity

If we closely observe, we see that the size of the max-heap would be always around m. Because, if the new item is greater or equal we are popping the top – hence the max-heap size is not increasing. If new item is smaller, we are pushing it and the size of the max-heap is increasing – true; but then soon the top would be out-dated and then we would pop that. So the max-heap size remains around m. Pushing (in) or popping (out) an item would cost log m, and since we would do it n times – the complexity would be O(n log m). Please note that getting the top of the max-heap costs O(1).

GitHub: Choosing Oranges

Index

Author: Gopal Das

I am working as a Data Scientist at CrimsonLogic Pte Ltd, Singapore. I have a BS in Computer Science & Engineering from Khulna University and ME in Internet Science & Engineering from Indian Institute of Science (IISc). I have a few publications on Query Optimization in RDBMS in ACM SIGMOD, IEEE ICDE etc. I was a founding team member and VP Engineering of iTwin, a spinoff from A*STAR. I am working as a software engineer/data scientist for 19 years. I am interested in Algorithms, Database, and Machine Learning among others. I am a father of three children (two at NUSH and one is too little to go to school). I live in Singapore. www.linkedin.com/in/dasgopal https://github.com/gopalcdas