GEOG3/71551 Understanding GIS

Tuesday 9-12, Simon Building Room 6.004

Assessment 2 - Weighted Redistribution

In which you apply the skills that you have learned to reproduce an algorithm from the


Introduction

Ever since the emergence of large scale web services for geocoding and plotting data, there

has been a proliferation of ‘heatmaps’ seeking to reveal the spatial distribution of pretty

much any phenomena that mappers could get their hands on. You can find these maps


You should always be wary of heatmaps for a number of reasons, chief amongst which is that

changing the settings of the heatmap can quite dramatically change the results, and there is

rarely any justification for the choice of those settings. However, the worst heatmaps occur

when the underlying data are at multiple scales (i.e. multiple levels of generalisation),

which often happens when they are passively geocoded (meaning that the locations were

derived from place names in the dataset, which were not originally intended to be used to

locate the data on a map).

The variation in scale in passively geocoded datasets means that they tend to suffer from

an issue called false hotspots, in which the heatmaps unexpectedly give false patterns. You

can see this effect in the below image, where a suffers from false hotspots; and b is the same

dataset, restored to a pattern more reflective of the ‘true’ location of each data point.

This problem is discussed (in the context of Twitter data relating to the Royal Wedding of Prince

William and Kate Middleton) in a paper by Dr. Jonny Huck in the Journal of Spatial

Information Science. The paper includes a simple algorithm, which allows false hotspots to

be dissipated into realistic patterns based on a weighting surface (population density in this

case). In general terms, Weighted Redistribution could be considered an example of a spatial

disaggregation algorithm (of which there are many examples), but this one is quite unusual

in that it disaggregates data into a surface, rather than smaller areal units.
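To make the idea of a weighting surface more concrete, here is a minimal sketch in Python. It is an illustration only and is not the pseudocode from the paper: the function name `best_candidate`, the toy grid `surface`, and the rule of keeping the most heavily weighted of several random candidates are all assumptions made for this example, so check them against the article before relying on them.

```python
from random import Random

# toy weighting surface: rows of population density values (hypothetical data)
surface = [
    [0, 1, 2],
    [1, 5, 9],
    [0, 2, 3],
]

def best_candidate(surface, n_candidates, rng):
    """Draw random cells from the surface and keep the most heavily weighted."""
    best_cell, best_weight = None, -1
    for _ in range(n_candidates):
        # pick a random row and column within the surface
        row = rng.randrange(len(surface))
        col = rng.randrange(len(surface[0]))
        # keep this cell if it has the highest weight seen so far
        if surface[row][col] > best_weight:
            best_cell, best_weight = (row, col), surface[row][col]
    return best_cell, best_weight

# the more candidates are drawn, the more strongly the result favours high weights
cell, weight = best_candidate(surface, 20, Random(42))
print(cell, weight)
```

With a single candidate the weighting surface has no influence at all, whereas a large number of candidates pulls almost every point to the highest weighted cell. That trade-off is the kind of choice that needs justifying in a report.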

Download and read the article, making sure that you understand the problem of False

Hotspots and the principles of the Weighted Redistribution algorithm that may be used

to solve it.

The Question

Produce your own implementation of the weighted redistribution algorithm and apply it

to the provided ‘tweets’ dataset in order to determine which parts of Greater Manchester

were most interested in the Royal Wedding as part of a scheme to target advertising during

such events.

You will have a random sample of 1,023 level 3 tweets relating to the Royal Wedding

that have been geocoded to districts in Greater Manchester. Using these and the weighted

redistribution algorithm, you should work out a likely spatial distribution for Twitter

activity across Manchester relating to the Royal Wedding.

Produce a 1,000 word report explaining and justifying your implementation of the weighted

redistribution algorithm, as applied to the level 3 Twitter data, in order to produce your map.

The analysis and outputs should be completed entirely in Python and the report should

include at least one map. You must submit both your code and report.

Make sure that you read this whole document before you start!

The datasets for the Assessment are available here. This contains:

level3-tweets-subset: the tweets themselves

100m_pop_2019: a weighting surface (simple population density surface at 100m


gm-districts: polygons representing the districts (level 3) of Greater Manchester to

which these tweets were geocoded.

Please note that you are not limited to the data provided, nor are you required to use the

provided weighting surface - I would encourage you to include any data that you like to

embellish the analytical or cartographic quality of your report.

Contents

Rules of Engagement

Some Pointers

Marking Criteria

Rules of Engagement

Submission

This work must be submitted by 14:00 on Thursday 14th December 2023 (Week 12).

You must submit your code and a report of up to 1000 words in length (±10%)

describing the choices that you made in order to implement this algorithm in an elegant,

efficient and robust manner.

Code snippets can be used in the report where appropriate (and are not included in the word

count) - but they should be styled using an appropriate font (I recommend courier).

Remember that I do not just want a line by line description of your code (that is what your

comments are for)!

Note that you must submit your assessment as described here - not having read

the below instructions will not be accepted as a reason for late or incorrect

submission.

1. The Code part of your submission must be compressed into a .zip (not *.rar or any

other format) file and submitted via a Dropbox File Request - links provided below.

The filename should be named using your student number in the format


1. Undergraduates should submit here.

2. Postgraduates should submit here.

2. The Report part of your submission must be submitted as usual via Turnitin on the

Assessment page of the Blackboard site. The report should contain the map(s) that

you have produced. Submitted files should be named in the format: 123456789.docx

or 123456789.pdf etc.

Code Template

In order to assess the efficiency of your algorithm, I need to be able to time how long your

code takes to execute the task. Accordingly, you must use the below template for your

code, and ALL of your code must be between the # NO CODE ABOVE HERE and # NO



Understanding GIS: Assessment 2


An Implementation Weighted Redistribution Algorithm (Huck et al. 20"""

In addition to this, please do not use any libraries that are not already available

within the understandinggis anaconda environment.

Zip File Structure

Remember that your submission must include all files that are required for your code to run

successfully (otherwise your code won’t work!).

I would recommend the following file structure within your main Assessment folder:

- 123456789/

- 123456789.py

- data/

- 100m_pop_2019.*

- gm-districts.*

- level3-tweets-subset.*

- [any other data files that you need]

- out/

- manchester_tweets.png

Please make sure that you name both your directory and python file using your

student number!

It is also important that all of your file paths are relative file paths (e.g.

./data/my_file.shp). If you use absolute file paths (e.g.

C:/jimbob/understandinggis/assessment1/data/my_file.shp) then your

code will not work when I run it! Please also make sure that the paths link to the data

files inside your zip file, and not to elsewhere on your computer!

# import the time library

from time import time

# set start time

start_time = time() # NO CODE ABOVE HERE


# report runtime

print(f"completed in: {time() - start_time} seconds") # NO CODE B

A good way to ensure that this has worked is to either:

write your code in a new directory that is not inside your understandinggis

directory - this will avoid accidentally pointing your code to the wrong version of your datasets

(i.e. the one that you use in the practicals)


write your code inside your understandinggis directory as normal, then just before

submission extract your zip file to a location elsewhere on your machine and test to see if

it still works
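As a rough aid for the relative path rule, the following sketch checks whether a path string is relative. The helper name `is_relative` is hypothetical, and testing against both POSIX and Windows conventions is an assumption made here so that the check gives the same answer whichever machine it runs on.

```python
from pathlib import PurePosixPath, PureWindowsPath

def is_relative(file_path):
    """True if the path is relative under both POSIX and Windows conventions."""
    return not (PurePosixPath(file_path).is_absolute()
                or PureWindowsPath(file_path).is_absolute())

# the two example paths used in this document
print(is_relative("./data/my_file.shp"))
print(is_relative("C:/jimbob/understandinggis/assessment1/data/my_file.shp"))
```

A check like this only catches absolute paths. It cannot tell you whether a relative path points at the copy of the data inside your submission, which is why testing from a freshly extracted zip file is still worthwhile.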

If you don’t know how to zip a folder, you can see how to do so here:



Please make sure that you use .zip format (as per the instructions) - and not other

forms of compression (e.g. .rar, .tar, .gz, .7z etc.).

Some Pointers

Remember the golden rule: Don’t Panic! You have done almost every part of this

assessment in the course already, this is just a matter of finding the right bits and putting

them together! The Pseudocode for this assessment is given in the paper, so all you have to

do is implement it.

As with Assessment 1 - the key is to plan what you are going to do before you start coding -

don’t just dive in and hope for the best! All you need to do is keep breaking down each stage

into smaller and smaller jobs until you have a clear idea of how the program should look

(think back to our session on computational thinking). As with Assessment 1, everything

that you need to be able to do has been covered in the course before, so if you find yourself

heading too far off the beaten track, then this is a good clue that you might be

overcomplicating things!

As before, you can reproduce this algorithm entirely using things that you have already

done in this course, and additional help is available in the Hints and Tips section below.

Remember - break each part of the algorithm down into small steps again and again until

each step equates to approximately one line of code.

Note that you are only undertaking this analysis for a single level, so you can ignore the

outer (first) loop described in the pseudocode in the paper (which loops through multiple


The eagle-eyed among you will note from the paper that the original software is Open Source

(written in Java) and published here. You are welcome to look at this to help you (this is the

file with the algorithm in it if you want it) - but please don’t be too reliant on this code, or feel

bad if you struggle to read it (much of it probably isn’t that useful to you, and it is in a

completely different and much lower level language). Focus on the approaches that you have

learned in this course, and remember to maximise efficiency wherever possible! Crucially, do

not attempt to simply copy and paste this code and convert it to Python - this will

not result in a good mark.


In the report you should briefly explain why the algorithm is necessary and how it works,

before providing justification for the approaches that you have taken in your

implementation, and a discussion around the limitations of the algorithm and your

implementation of it. In all cases, we are looking for you to demonstrate that you have

understood both the algorithm and what you have done to implement it.

Creating a Random Point (Minimum Bounding Radius)

In the original version, random points are generated using a random distance and direction

from the centroid of the polygon. You will notice that this is quite different to the approach

given in Understanding Distortion, which simply generates a random x and y coordinate based

on the bounding box of the feature. Both are equally effective, and there are no more or less

marks for either approach - though the bounding box-based approach from Understanding

Distortion is likely easier, and perhaps slightly more efficient.
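A minimal sketch of the bounding box-based approach is given below, using only the standard library. The re-drawing of points that fall outside the polygon is an assumption added here (the text above only describes drawing from the bounding box), and the helpers `bounding_box`, `point_in_polygon` and `random_point_in_polygon` are hypothetical names - in practice a geometry library would normally handle the containment test.

```python
from random import Random

def bounding_box(polygon):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) nodes."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def point_in_polygon(x, y, polygon):
    """Ray casting test: count the edges crossed by a ray heading right."""
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_point_in_polygon(polygon, rng):
    """Draw points in the bounding box until one falls inside the polygon."""
    min_x, min_y, max_x, max_y = bounding_box(polygon)
    while True:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, polygon):
            return x, y

# an L-shaped polygon (hypothetical coordinates in a projected CRS)
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(random_point_in_polygon(l_shape, Random(1)))
```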

If you do want to use the minimum bounding radius approach, it is achieved simply by the

following steps:

1. Get the centroid of the polygon

2. Loop through each node in the polygon to find the distance from the centroid to the

furthest away node

3. Generate a random number between 0 and 360 (direction) and between 0 and the

distance that you calculated in step 2

4. Calculate the location of the new point at the specified distance and direction from the

centroid - you can do this using the offset() function that we used in week 5
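The four steps above can be sketched as follows, using only the standard library. This is a sketch under stated assumptions rather than the original implementation: `offset_planar` is a hypothetical stand-in for the offset() function from week 5 and assumes projected (planar) coordinates, with the direction measured in degrees clockwise from north; the other function names are also made up for this example.

```python
from math import cos, hypot, radians, sin
from random import Random

def centroid(polygon):
    """Area-weighted centroid of a simple polygon given as (x, y) nodes."""
    area2 = cx = cy = 0.0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)

def bounding_radius(polygon, centre):
    """Distance from the centre to the furthest node of the polygon."""
    return max(hypot(x - centre[0], y - centre[1]) for x, y in polygon)

def offset_planar(x, y, distance, direction):
    """Move a point by a distance along a bearing (assumed planar coordinates)."""
    return (x + distance * sin(radians(direction)),
            y + distance * cos(radians(direction)))

def random_point_in_radius(polygon, rng):
    """Random point within the minimum bounding radius of the polygon."""
    centre = centroid(polygon)                          # step 1
    radius = bounding_radius(polygon, centre)           # step 2
    direction = rng.uniform(0, 360)                     # step 3
    distance = rng.uniform(0, radius)
    return offset_planar(*centre, distance, direction)  # step 4

# a simple square (hypothetical coordinates in a projected CRS)
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(random_point_in_radius(square, Random(1)))
```

Note that the circle described by the bounding radius extends beyond the polygon, so a point generated this way is not guaranteed to fall inside it.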

Code Presentation

Broadly speaking, your code should be presented in the following form (all of which should

be contained in the appropriate part of the template, marked by '''ALL CODE MUST BE


1. import statements

2. functions (if any)

3. main code
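As a small illustration of that ordering, the sketch below uses a made-up `distance` function. It is not part of the assessment, and in a real submission all three parts would sit inside the template.

```python
# 1. import statements
from math import hypot

# 2. functions
def distance(x1, y1, x2, y2):
    """Return the straight line distance between two points."""
    return hypot(x2 - x1, y2 - y1)

# 3. main code
d = distance(0, 0, 3, 4)
print(f"distance: {d}")
```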

Your code should be well commented (remember the rule of thirds!), and you should not leave in

any unnecessary testing material, such as print() statements that do not contribute to

the user experience.

Also make sure that you are using all of the libraries that you have imported - if you load a lot

of unnecessary libraries, this demonstrates poor understanding (as well as slowing down

your code)!

Error Messages

Remember, when you get an error message in the console, it is not the computer telling you

off or making fun of you - it is trying to help you! If you read it carefully it will tell you both

what the problem is and even which line of code is causing it - so don’t get upset when you

get a lot of errors: find the line of code that is causing the problem, read the description

(paste it into Google if you don’t understand it) and see if you can solve it - you’ll probably

find that in most cases you can!

It is also worth remembering that the error message is extremely unlikely to be incorrect

(though it can have been caused by another problem just above it); and it is extremely

unlikely that there is an error in one of the libraries in the understandinggis

environment. If, for example, it says it can’t find a file and you are sure that it’s there, then

either your file path or the working directory (top right hand corner of Spyder) is wrong.
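If a file cannot be found, a quick check along these lines shows where Python is actually looking. The helper `describe_path` is a hypothetical name, and the path in the example is the illustrative one used earlier in this document rather than a real dataset.

```python
from pathlib import Path

def describe_path(file_path):
    """Report where a path resolves to and whether a file exists there."""
    path = Path(file_path)
    return {
        "working_directory": str(Path.cwd()),
        "resolved_path": str(path.resolve()),
        "exists": path.exists(),
    }

# print each piece of information on its own line
for key, value in describe_path("./data/my_file.shp").items():
    print(f"{key}: {value}")
```

If the resolved path is not where you expected, the working directory is the likely culprit. If it is where you expected but the file is still not found, check the file name and extension.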

The fact is, programming is 20% writing code and 80% debugging it. This doesn’t change as

you get more experienced, the errors just get more complicated!!

Hints & Tips

1. This is not like an essay that you can bash out the night before - the longer you give

yourself to complete this, the easier you will make it for yourself.

2. Be sure to select your parameters (such as and ) for the algorithm carefully - the paper

explains how they affect the analysis, so use this knowledge and some

experimentation to get the best result for your client - you will definitely want to

explain the variables and your selected values to your client.

3. Note that you are only undertaking this analysis for a single level, so you can ignore the

outer (first) loop described in the pseudocode in the paper (which loops through

multiple levels).

4. Look back to the lectures and practical material, and make use of the Hints and Solutions


5. If you do feel like you are starting to panic - don’t suffer in silence - let me know!

Marking Criteria

Marks will be given based upon:

The quality of the report (35% of assignment grade), including:

A clear and concise explanation of what you have done and why, demonstrating an

understanding of the process and why it is necessary. Consider if you might

illustrate the process most effectively using more than one map. You should cover


The algorithm itself (presented in the paper)

Your implementation of it (how you achieved this in Python)

A detailed explanation of the limitations of your analysis, including a clear

demonstration of your understanding of the Geographical Information Science

issues involved.

In both of the above, you would expect clear links to the material that we have

covered in class as well as the Huck et al. 2015 paper.

A brief justification of your chosen CRS.

The quality of the algorithm (35% of assignment grade), including:

Efficiency (the speed with which your algorithm resolves the problem)

Robustness (the script will not fail if it encounters ‘normal’ problems, such as

incorrect file paths, missing data, unexpected inputs, and so on).

The quality of the code (20% of assignment grade), including:


Neatness and Elegance (the script is well written and well presented)

Comments (demonstrating a thorough understanding of the approaches that you

have used)

The cartographic quality of the resulting map(s) (10% of assignment grade), including:

Aesthetic quality and readability

Selection (and justification) of a suitable projection
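To illustrate the Robustness criterion above, here is a minimal sketch of handling an incorrect file path gracefully. It reads a plain text file only to keep the example self-contained: the function name `read_lines` is hypothetical, and the same pattern would need adapting to whichever library you use to open the datasets.

```python
def read_lines(file_path):
    """Read a text file, returning None with a clear message if it fails."""
    try:
        with open(file_path, encoding="utf-8") as file:
            return file.read().splitlines()
    except FileNotFoundError:
        print(f"Could not find '{file_path}' - please check the file path.")
    except PermissionError:
        print(f"No permission to read '{file_path}'.")
    return None

# a missing file produces a helpful message rather than a crash
lines = read_lines("./data/this_file_does_not_exist.txt")
if lines is None:
    print("Stopping early because the input data could not be loaded.")
```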

The key to this is in the name of the course: In both your code and report I want you to

demonstrate a clear Understanding of what you have done - this is why comments are so

important in your code!

Finished!

