CS251 – Data Structures and Algorithms
Summer 2024
Project 3: Graphs
Due: July 26 at 11:59PM
General Guidelines:
The APIs given below describe the required methods in each class that will be tested.
You may need additional methods or classes that will not be directly tested but may be necessary to complete the assignment. Keep in mind that anything that does not need to be public should generally be kept private (instance variables, helper methods, etc.).
Unless otherwise stated in this handout, you are welcome (and encouraged) to add to/alter the provided java files as well as create new java files as needed.
In general, you are not allowed to import any packages in your code without explicit permission from your instructor!
Note on Academic Dishonesty:
Please note that it is considered academic dishonesty to read anyone else’s solution code to this problem, whether it is another student’s code, code from a textbook, or something you found online. You MUST do your own work. You are allowed to use resources to help you, but those resources should not be code, so be careful what you look up online.
Note on implementation details:
Note that even if your solution passes all test cases, if a visual inspection of your code shows that you did not follow the guidelines in the project, you will receive a 0.
Note on grading and provided tests:
The provided tests are to help you as you implement the project. The Vocareum tests will be similar but not exactly the same. Keep in mind that the points you see when you run the local tests are only an indicator of how you are doing. The final grade will be determined by your results on Vocareum.
GritTok Hiring Team:
Congratulations! You’ve been hired to be a part of our dev team for the new Purdue social media platform, GritTok. GritTokis a space where students can post reactions to recent exams, assignments, and professors … but there’s a twist. Once a certain number of people see the post, it disappears! This feature introduces the core ingenuity of our platform, which is the feeling of FOMO, or Fear Of Missing Out. This FOMO causes users to check their feeds often, and boosts our platform’s usage. While still in Beta, we need your help in creating a recommender system for user’s posts. Goodluck!
Project:
This project will have four parts, all of which will be written in Java.
Part 1: You will first implement a tool that will help you visualize the social network.
You’ll then modify it so that you can quickly see which users have seen a post and which users are yet to see it.
Part 2: Once you have a good visualization, you will implement Dijkstra’salgorithm and Prim’s algorithm which will be important in creating the recommender.
Part 3: Using Dijkstra’s algorithm and Prim’s algorithm, you will define a function that tells you the predicted FOMO that will be induced with a specific recommendation.
Using this FOMO score, you will have a method to pick the next recommendation (who to show the post).
Part 4: The FOMO score from part 3 will have a tunable hyperparameter. As a data scientist, you will analyze how this hyperparameter is important in a good FOMO scoring function as the size of the social network varies. To do this, you will simulate a post being viewed within the social network according to your FOMO score and you will tune the hyperparameter using these simulations.
Note: Do NOT ask for help or clarification before you have read through and tried to understand ALL comments. This applies to all projects and is a life lesson in independent thinking.
Part 1: Graph Visualization
In this part, you will use JUNG to create good visualizations of a social network. The other teams working at GritTok have put a lot of effort into building a good social network, since it is important for your part in creating good recommendations.
The social network that you are being given is an undirected graph, where each node represents a person, and an edge represents some connection between people. These connections are based on various interactions, such as liking posts, commenting, direct messages, and time spent looking at profile. In this specific graph, a smaller edge weight represents a stronger connection (think of it like how close they are to each other). Two nodes with no connection represent people who have little to no interaction history.
Creating a recommender system is a difficult task, and it is nearly impossible to do so without first visualizing what the data looks like. To represent the graph, you will need to
use JUNG (https://jrtom.github.io/jung/javadoc/), which is a great library for these types of visualizations. In this part, you will receive as input a representation of the social network graph and a subset of nodes which have already seen the post (just borrow the test cases from part 3’s input since there is no input/output in this part). It will be your job to plot the graph in such a way that weights are labeled, and such that the special nodes which have seen the post are a different color.
For example, in the above graph, the weights are explicitly written on the edges, and the yellow nodes represent people who have already seen the post. Note that the edge lengths do not necessarily need to match its weights. Your task in the next parts will be to identify which person to recommend this post to maximize FOMO. Who would you pick in this graph?
This part will have no input/output but is very important for debugging in future parts.
Download JUNG using this linkhttps://sourceforge.net/projects/jung/files/by clicking on “Download Latest Version.”
Follow the steps here https://www.jetbrains.com/help/idea/working-with-module-dependencies.htmlto add module dependencies after you download the zip file. JUNG visualizations will only work locally, but will be very useful in debugging later on. (Don’t try to use JUNG on Vocareum, the import statements will be removed anyway).
Part 2: Dijkstra and Prim
In this section, you will implement both Dijkstra’s algorithm and Prim’s algorithm, but for sparse representations. In a social network, people are only close with a very small proportion of the total population. Therefore, there are serious complexity gains to be had from a sparse representation. In this project you will be using the adjacency lists (AL) representation.
In the version of Dijkstra’s algorithm that you will implement, you will compute the minimum distance from a source node to each other node. For Prim’s, you will compute the minimum spanning tree (MST) of the graph. The difference between ‘normal’ Dijkstra and Dijkstra for sparse representations is the speed with which you can find neighbor
nodes. Make sure to use a priority queue (you can use the java.util.PriorityQueue implementation).
Important: for both implementations, you must represent the graph as an adjacency list, and you must do it yourself. For Dijkstra, if two nodes have the same distance, the smaller-numbered node comes first. For Prim, if two edges have the same weight, then the edge with the smaller-numbered source node comes first, and if both are the same, then the edge with the smaller-numbered destination node comes first. Finally, for Prim, make sure that you always start the algorithm on the 0th node.
As input, you will receive an AL representing the social network followed by the source node’s index. First, you will output the minimum distance between the source and every other node as a list of tuples [(node_1,distance_1), (node_2,distance_2), … , (node_n, distance_n)] in ascending order. Then you will output the AL representation of the MST in the format of the given test cases.
NOTE: At any point in this project, if your solution is running too slowly, it is your responsibility to think about the data structures and algorithms that we have used and learned to speed it up.
Part 3: Defining a FOMO Score
Now that you have a way to check shortest paths and an MST, you are ready to define a FOMO scoring function. Think about what it means to have FOMO. The key insight is that a person can only have FOMO if people they know well have seen the post, but they have not. This means that we want to recommend a post to someone who is as close as possible to the most people. Right? Well, sort of. Let’s look at how you will do this.
Consider the following example:
Assume that the yellow nodes are people that have seen the post, and 8 people can see the post before it disappears. Who should we recommend it to first? Let’s look at node 25 as a candidate and node 12. Using Dijkstra’s algorithm, we can compute each
shortest distance from 25 to the yellow nodes. Doing so gives us 174, 101, 149, 78, and 40. We can do the same thing for node 12 to get 310, 260, 342, 317, and 246. We now take a central metric of both; for this project, you’ll use mean (but using median would be interesting). So we get means of 108.4 and 295 for nodes 25 and 12 respectively. So clearly node 25 is a better recommendation. But now consider the following situation:
Using our previous metric and calculating the mean distance from the viewers shows that node 2 is one of the best choices. But consider what happens if we recommend to
2. Node 2 is what you might call a “spreader”. That is, once node 2 shares the post, it could spread in any direction! For example, filling out the 8 viewers might look
something like this:
If all we care about is being as close as possible to as many viewers, then we can get an issue like this. Clearly the users here would not feel that their group of friends is privy to some post, because the post has been shared high and wide among the network. So
how can we take this into account? The solution here is to use a minimum spanning tree (MST). In this network, the minimum spanning tree can be thought of as an “information highway”. That is, it’s a likely way that posts will be spread. Now look at the MST for the previous example:
Notice that nodes 2, 3, 0, and 23 are linchpins in the information spread. The way we can quantify this is by a node’s degree in the MST, or the number of edges it has. The principle you will follow is that high-degree nodes should be avoided. We have these two competing metrics: mean distance (closeness) and MST degree (popularity), so naturally we combine them! The FOMO score will look like this:
FOMO = α * closeness + popularity
Where α is a tunable parameter. We need this because we don’t know which one is
more important and their scales are completely different. If the closeness metric is much more important, then α should be a big number, but if the popularity is more important, then it should be small. Remember, closeness and popularity are to be minimized, so we want to recommend people with smaller FOMO scores.
For this part, you will take as input the AL representation, a list of nodes representing
people who have seen the post, and an alpha value. You will compute the FOMO score for each person who can be recommended (non-viewers) and output them in ascending FOMO score order (the person with the smallest FOMO score using the given alpha
comes first). See test cases for formatting.
Part 4: Tuning the FOMO Score
Finally, you will need to figure out what values of α should generally be used for certain networks given their size and sparsity. To do this, you need a way of evaluating how
good your α is, and therefore how good your FOMO score is. Once a post’s view count reaches its maximum and the post disappears, how might we measure how ‘satisfied’ a user is with the app? We use the assumption that a person is satisfied with the app if they feel that they have seen a post among friends and that others have not. That is, for each of the viewer nodes, we can calculate its average distance from all other viewer nodes. Then we can calculate its average distance from all other nodes in general. Taking the ratio of these two means, we will have a metric that is small when users are ‘satisfied’ and large when they are not. Here is the formula:
mean_viewer_to_viewer_distance
satisfaction =
mean_viewer_to_all_nodes_distance
In the numerator, every distance from every viewer to every viewer is included (including repetitions), and in the denominator, every distance from every viewer to all nodes is included (including viewers). Make sure to account for the fact that a node’s distance from itself is always 0. Note that again a lower satisfaction score means that your recommender works better.
Now that you have a way of measuring the outcome of your recommendations, you need to somehow guess who the viewers will be. To simplify this, you will make a reasonable assumption that every person who is recommended a post will view it. Using this assumption, you will simulate the recommendation process until the post disappears, and then analyze the users’ satisfaction.
In this part, you will be given an AL representation, a list of viewers, and a maximum number of viewers (before the post disappears). We want to know what the best value of α is, so it will be your job to simulate the post’s spread for several values of α and
report which value results in the highest satisfaction. The higher-ups in our company will use this information in determining α values for various network sizes and sparsities.
When you simulate the spread, you will need find the top recommended person and add them to the list of viewers and repeat this process until the post disappears (when it
reaches the maximum number of viewers given to you). Make sure to recalculate FOMO scores at each step, as these highly depend on who has seen the post! The best value of α could be small (0-1) or large (>1). You will perform a grid search on α ranging from
0-1 (inclusive) and 1-10 (inclusive).
Use the provided linspace function in Utils.java to uniformly sample 20 points from 0-1, and 5 points from 1-10. Please consult the comments on the use of this function. Note that since 1 overlaps between these ranges you should have a total of 24 datapoints.
Your output will be the α value which minimizes the satisfaction of your simulated
spread. If there are multiple values of α which minimize the satisfaction, then report the smallest one (cutoff at 2 decimal places, not rounded). See the test cases for examples.
Testing and Grading
You will be provided with 200 correct input and output test cases for each part (in the Vocareum starter code). These tests act as an indicator of your progress but do not guarantee your score. The final grade on both parts will be determined by your results on the tests run on Vocareum. You are encouraged to create your
own main test files to debug any errors you face.
If your test case output differs from the correct output by even one character, it will be counted wrong, so make sure to test your code using the test cases you have been given.
To do this, you must use the given test cases locally (e.g. go to the directory
above “src” and call javac -cp src src/Part2Main.java and then call java -cp src
src/Part2Main.java input0 input5 … etc with as many tests as you want). You can then call (for example) diff part1/outputs/output0 part1/correct_outputs/output0 to make sure there are no differences between your output and the correct
output. You can also write a more sophisticated script if you wish to check many tests at once.
The points for each part are as follows:
Part 1 |
0 |
Part 2 |
30 |
Part 3 |
20 |
Part 4 |
50 |
Total |
100 |
Submission Instructions
Your file structure should look like this (necessary for local testing):
• P3/src/Part2Main.java (your relevant work for part 2)
• P3/src/Part3Main.java (your relevant work for part 3)
• P3/src/Part4Main.java (your relevant work for part 4)
• P3/src/InputOutput.java
• P3/src/Utils.java
• P3/src/any_additional_files.java (helper methods/methods common to all parts …)
• P3/part2/inputs/input0, input1, punt2 … etc. with the given input cases
• P3/part2/correct_outputs/output0, output1, … etc. with the given output cases
• P3/part3/inputs/input0, input1, punt2 … etc. with the given input cases
• P3/part3/correct_outputs/output0, output1, … etc. with the given output cases
• P3/part2/outputs (this will be where your program sends its outputs)
• P3/part2/outputs (this will be where your program sends its outputs)
• Etc. for all parts of the project
*do NOT turn in these files, as they may prevent Vocareum from properly grading; only use them for local testing
Your main scripts (Part2Main.java, Part3Main.java, etc.) should receive a list of
arguments that are each a file name from the “inputs” directory in the form of
“input#” where # is the test case number. Your main should then produce a
solution for each file name and write it to the corresponding “output#” file in the “outputs” directory. The InputOutput.java class that does this has been provided to you. In order to use it, you just need to have it call your solution from the
corresponding part.
There are also helper methods that are provided for you to use in Utils.java.
Your submission in Vocareum must be the “starter_code” directory (put inside the “work” directory), and it must include the Part2Main.java, Part3Main.java, etc.
scripts to run.
This project will only allow the following import statements:
import java.util.Iterator;
import java.io.BufferedWriter; import java.io.File;
import java.io.FileInputStream; import java.io.FileOutputStream; import java.io.FileWriter;
import java.io.IOException; import java.io.InputStream; import java.io.OutputStream;
import java.io.PrintWriter; import java.util.Scanner; import java.util.ArrayList;
import java.util.EmptyStackException; import java.util.List;
import java.lang.reflect.Array; import java.util.Arrays;
import java.lang.Math;
import java.util.Random;
import java.math.BigDecimal;
import java.math.RoundingMode; import java.util.PriorityQueue;
import java.util.Collections; import java.util.HashMap;
ANY OTHER IMPORTS WILL BE REMOVED. It is your responsibility to make
sure that none of your code uses an illegal import. Any line with a print statement will also be removed, so do not put important code on the same line as print statements (or even better, remove all print statements).
Notes on Vocareum
Vocareum is not a debugging tool. As computer scientists, you are expected to
work with large amounts of data, which often require big resources and time. You can expect the grader to take >15 mins for some parts of this assignment, which is why we highly encourage you to debug locally. It is your responsibility to make sure that your code is efficient in time, otherwise it will time out and fail the test cases.
You will receive exactly 10 submissions per part. There will be no extra
submissions. You are free (and encouraged) to write your own test cases and use them to test your code (just follow the given naming convention).
Any submission past the deadline will receive a score of 0. If your code does not compile you will receive a score of 0. If you have an infinite loop in your code
even for one test case, we cannot guarantee points for any other test cases.
版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:821613408 微信:horysk8 电子信箱:[email protected]
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。