Looping through numpy arrays (e.g. moving/rolling window)

Numpy is the cornerstone of matrix based calculations in QGIS (and elsewhere). It does wonders with raster data (unless it hits the limit of available live memory…).

A recurrent problem with Numpy is the implementation of various looping routines, such as the sliding window which is frequently used in image filtering and other approaches focused on cell neighbourhood. Below is the illustration of the problem: for each cell the window needs to query a specified neighbourhood (square, circular or other).

Now, there exists a solution in the form of “stride tricks” within the Numpy library. However it is really cryptic (the SciPy cookbook uses the “devious” epithet); after all the very name “tricks” implies it being … tricky. Numpy documentation actually discourages its use. We can do it in a more understandable manner – which also means more configurable – and without any particular computational overhead.

Let’s first examine the problem. Without Numpy we would need four nested loops: two for traversing the matrix and two for the analysed window.

#Level 1: traversing the matrix 
for y in range(rows):
    for x in range(columns):

        #Level 2: traversing the window (3x3 size) 
        for win_y in range(y-1,y+2):
            for  win_x in range(x-1, x+2):
                temp\_sum += matrix\[win\_y, win_x\]

       #Analysis inside the window is finished
       #Write result to a copy of the input matrix
       output\[y, x\] = calculate\_something(temp\_sum) 

We can immediately see that the level 2 loop can be easily replaced with a Numpy window: win = matrix[y-1: y+2, x-1: x+2]. Easy enough … but this is not the ideal approach. The window has 9 cells, while the matrix would have millions: shouldn’t we focus on the level 1 loop instead?

To avoid the level 1 loop we need to move the entire matrix, as when overlapping two sheets of paper. We’re now working on offsets between the two – which is the only tricky part. The (0,0) cell of the window now corresponds to all cells in the matrix that are on the upper left relative to analysed cells – which simply means shifting the entire matrix to the upper left. In order to move the matrix in that direction, we need to calculate offsets in relation to the central point. That would be window_index - window_radius , which for the upper left cell gives (-1, -1) (see below).

The only problem resides in handling these offsets on the level of the entire matrix. We need to extract matrix views that do not spill over the borders AND that match each other in size and shape. The overlap issue is represented on the illustration below.

This problem can be avoided with padding: adding extra row(s) and column(s) around the edges. The depth of the padded area should correspond to the maximum possible offset. The solution is now as follows:

import numpy as np

rows, columns = data.shape 
temp_sum = numpy.zeros(rows, columns)

# create a padded copy 
matrix = np.pad(data, 1, 'edge')

# Level 2: traversing the window (3x3 size)

for y in range(window\_y\_size):
    for x in range(window\_x\_size):

        # Level 1: handling the matrix
        offset_y = y-1 #1 is the window radius
        offset_x = x-1

        # rows and columns correspond to the non-padded array!
        temp\_sum += matrix\[offset\_y : rows + offset_y,
                          offset\_x : columns + offset\_x\]

#Analysis inside the window is finished
output = calculate\_something(temp\_sum) 

And we’re fine!

Refinement: unpadded solution

It’s kind of ugly to use pads. We are producing copies of the original data, which can be particularly problematic for large offsets. Furthermore, padding may influence the result in unexpected ways.

Returning to the illustration above, we can see that the cropped window view cannot match the main view (the one pointing to the original and/or temporary data). The latter also needs to be cropped, but in the opposite direction. This is logical: the top row cannot have a neighbour to the north anyway.

Therefore, we need to adjust both views at the same time. Note that these views are opposite to each other, which means that we can play with swapping: when the window is placed to the north-west (-1, -1), the main view is to the south-east (1,1), but when the window is on the south-east (1,1), the main view will be on the north-west (-1, -1).

To finish, let’s throw in an additional parameter for steps – the range to be skipped between analysed cells (see numpy docs for steps). Let’s now wrap all this to a function which will return matching views:

import numpy as np

Function returning two matching numpy views for moving window routines.
\- offset\_y and offset\_x refer to the shift in relation to the analysed (central) cell 
\- size\_y and size\_x refer to the size of the data matrix (not of the window!)
\- view\_in is the shifted view and view\_out is the position of central cells

All comments are welcome at: www.qgiswhisperer.com/2017/11/numpy-loop.html

def view (offset\_y, offset\_x, size\_y, size\_x, step=1):

    x = abs(offset_x)
    y = abs(offset_y)

    x\_in = slice(x , size\_x, step) 
    x\_out = slice(0, size\_x - x, step)

    y\_in = slice(y, size\_y, step)
    y\_out = slice(0, size\_y - y, step)

    # the swapping trick    
    if offset\_x < 0: x\_in, x\_out = x\_out, x_in                                 
    if offset\_y < 0: y\_in, y\_out = y\_out, y_in

    # return window view (in) and main view (out)
    return np.s_\[y\_in, x\_in\], np.s_\[y\_out, x\_out\]

These two views should be used as follows (regarding the main routine as above):  

view\_in, view\_out = view(offset\_y, offset\_x, rows, columns)

temp\_sum\[view\_out\] += matrix\[view_in\]

Perhaps we could do better, but the advantage of this approach is that we can adapt it to any imaginable shape of the moving window – because we conserve the loop over window cells. We are also working on views, as opposed to hard copies of the data, which is what Numpy likes a lot.

Any suggestions are welcome!

Leave a Reply

Your email address will not be published. Required fields are marked *