Date: 2024-04-12 08:02

Faculty of Engineering

School of Electrical Engineering and Computer Science

CSI 2120

Programming Paradigms

Similarity image search

Comprehensive assignment (24%)

Winter 2024

Project for a group of 2 students at most

Part 1 due February 16th before 23:59

Part 2 due March 8th before 23:59

Parts 3 and 4 due April 22nd before 23:59

Late assignment policy: minus 10% for each day late. For example, a project due Friday night but

handed in on Monday morning: -30%

Problem description

Nowadays, images are created and accumulated at a frenetic pace. It has become essential to have powerful

computer tools capable of analyzing these images and facilitating the search, classification, and discovery

of images of interest. In this project, you are asked to program a simple method for searching similar

images. By similar images, we mean images that resemble each other in terms of the content they present

and their visual appearance, colors and textures (for example, images showing sunsets).

Your program will manipulate color digital images. Let's then explain how these images are structured. A

digital image is divided into a rectangular grid in which each element is a pixel ('picture element'). This

grid contains a certain number of columns and rows, defining the image's resolution (for example, your

phone can probably produce images with a resolution of 4032x3024 pixels). A pixel contains the color

information associated with the corresponding position in the image. If your image were in grayscale (a

'black and white' image), each pixel would have a value between 0 and 255 (this 8-bit representation is

the most common), with 0 being black, 255 being white and the other values representing different shades

of gray. In the case of a color image, each pixel contains three values (three channels), corresponding to

the three primary colors: red, green, blue (RGB). These three values represent the amount of red, green,

and blue required to produce the desired color. For example, red = 255, green = 255, blue = 0 will

produce a bright yellow, while red = 50, green = 0, blue = 0 will result in a dark red. The values

associated with a pixel are usually represented by a vector [R, G, B] containing the values for these three

channels (e.g. [255, 255, 0] for bright yellow). Since each channel can take a value between 0 and 255, the

combination of the three channels produces 256 x 256 x 256 different colors (i.e. more than 16 million).

CSI 2120 page 2

_________________________________________________________________________________________________

We are looking for similar images. So, let’s assume that images with similar colors should have similar

content. This is a simplistic assumption that is not always true but generally yields acceptable albeit

imperfect results. This is what we will verify in this project. Therefore, it is necessary to calculate the

histogram of an image and compare these histograms.

The histogram is simply the count of the colors contained in an image. It involves counting how many

pixels have the color [0, 0, 0], how many have the color [0, 0, 1], and so on. However, this would mean

counting the pixels for all 16 million possible colors, which is expensive and not very precise. It is

therefore recommended to reduce the color space. This can be done by simply reducing the number of

possible values, for example, by going from 8-bit values to 3-bit values per channel, resulting in a color

space of only 8 x 8 x 8 = 512 possible colors. In this case, a simple bit right-shift by 8 – 3 = 5 positions

reduces the space. The histogram will then have only 512 entries, a bin for each of the possible colors. To

compare images with different resolutions (different numbers of pixels), it is necessary to normalize the

histogram, i.e., divide each entry by the total number of pixels (such that by summing all the values of the

histogram, we will obtain 1.0).
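
The reduction and normalization described above can be sketched in Go as follows. This is only an illustration under stated assumptions: the `Pixel` type, the function name `reducedHistogram`, and the way the three reduced channels are packed into one bin index are choices made for this sketch, not something prescribed by the assignment.

```go
package main

import "fmt"

// Pixel holds one 8-bit RGB triple (hypothetical type for this sketch).
type Pixel struct{ R, G, B uint8 }

// reducedHistogram counts pixels per reduced color and normalizes the counts.
// depth is the number of bits kept per channel (1 to 8).
func reducedHistogram(pixels []Pixel, depth uint) []float64 {
	shift := 8 - depth
	n := 1 << (3 * depth) // number of bins, e.g. 512 when depth is 3
	hist := make([]float64, n)
	for _, p := range pixels {
		r := int(p.R >> shift)
		g := int(p.G >> shift)
		b := int(p.B >> shift)
		// Pack the three reduced channels into a single bin index (assumed layout).
		hist[(r<<(2*depth))+(g<<depth)+b]++
	}
	if len(pixels) == 0 {
		return hist
	}
	// Divide by the pixel count so that all bins sum to 1.0.
	for i := range hist {
		hist[i] /= float64(len(pixels))
	}
	return hist
}

func main() {
	pixels := []Pixel{{255, 255, 0}, {250, 250, 10}, {50, 0, 0}, {0, 0, 0}}
	h := reducedHistogram(pixels, 3)
	fmt.Println(len(h), h[504], h[64], h[0])
}
```

In this example the two yellowish pixels fall into the same bin after the 5-bit right shift, which is the effect the reduction is meant to achieve.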

Histogram comparison

As explained, the images will be compared by comparing their histograms. This can be done using

histogram intersection, which can be computed, for the normalized histograms H1 and H2 with N bins, as follows:

$$\sum_{i=1}^{N} \min\big(H_1(i),\, H_2(i)\big)$$

If the two histograms are identical, this sum will give a value equal to 1.0. Conversely, if the two images

have no colors in common, then their histogram intersection will be equal to 0.0. Consequently, the more

similar two images are, the closer their histogram intersection will be to 1.0.
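
Histogram intersection is commonly defined as the sum, over all bins, of the smaller of the two bin values. Assuming that definition and two normalized histograms of equal length, a minimal Go sketch could look like this (the function name `intersection` is chosen for this example only):

```go
package main

import (
	"fmt"
	"math"
)

// intersection returns the sum of the bin-wise minima of two normalized
// histograms. It assumes both slices have the same length.
func intersection(h1, h2 []float64) float64 {
	sum := 0.0
	for i := range h1 {
		sum += math.Min(h1[i], h2[i])
	}
	return sum
}

func main() {
	a := []float64{0.5, 0.5, 0.0, 0.0}
	b := []float64{0.0, 0.0, 0.25, 0.75}
	c := []float64{0.25, 0.25, 0.5, 0.0}
	fmt.Println(intersection(a, a)) // identical histograms: 1
	fmt.Println(intersection(a, b)) // no colors in common: 0
	fmt.Println(intersection(a, c)) // partial overlap: 0.5
}
```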

Searching similar images

You are asked to find the images that are similar to a query image using color histogram intersection. An

image dataset will be provided for the search. The algorithm that searches the K most similar images to a

query image I using a color space reduced to D bits is as follows:

1. Compute the reduced color histogram of I

a. Reduce the pixel values by applying (8-D) right bit shifts for each channel R, G, B.

i. R' = R >> (8-D)

ii. G' = G >> (8-D)

iii. B' = B >> (8-D)

b. The number of bins in the histogram H will be $N = 2^{3D}$

c. Count how many pixels of each color are contained in I to obtain histogram H. The

histogram H is an array of N elements.

i. The index of the histogram bin corresponding to color [R',G',B'] can be computed

as (R' << (2 * D)) + (G' << D) + B'

d. Normalize H such that the values of all its bins sum to 1.0

2. Compare H with all pre-computed histograms in the image set.

a. This comparison is done using histogram intersection

b. Return the K images whose intersection values are closest to 1.0
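
Step 2 of the algorithm can be illustrated with a short Go sketch. It assumes that the dataset histograms are already loaded into a map from image name to normalized histogram. The `Score` type, the function names, and the tie-breaking by name are assumptions of this example, not part of the assignment.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Score pairs an image name with its intersection value (hypothetical type).
type Score struct {
	Name  string
	Value float64
}

// intersection sums the bin-wise minima of two equal-length histograms.
func intersection(h1, h2 []float64) float64 {
	sum := 0.0
	for i := range h1 {
		sum += math.Min(h1[i], h2[i])
	}
	return sum
}

// topK compares the query with every dataset histogram and returns the
// k best scores, highest intersection first.
func topK(query []float64, dataset map[string][]float64, k int) []Score {
	scores := make([]Score, 0, len(dataset))
	for name, h := range dataset {
		scores = append(scores, Score{name, intersection(query, h)})
	}
	sort.Slice(scores, func(i, j int) bool {
		if scores[i].Value != scores[j].Value {
			return scores[i].Value > scores[j].Value
		}
		return scores[i].Name < scores[j].Name // tie-break for a repeatable order
	})
	if k > len(scores) {
		k = len(scores)
	}
	return scores[:k]
}

func main() {
	query := []float64{0.5, 0.5, 0.0, 0.0}
	dataset := map[string][]float64{
		"a.jpg": {0.5, 0.5, 0.0, 0.0},
		"b.jpg": {0.0, 0.0, 0.5, 0.5},
		"c.jpg": {0.25, 0.75, 0.0, 0.0},
	}
	for _, s := range topK(query, dataset, 2) {
		fmt.Println(s.Name, s.Value)
	}
}
```

Sorting every score is the simplest approach. For large datasets, keeping only the K best scores seen so far would avoid sorting the whole list.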

Programming

You have to write programs under different paradigms that solve different versions of this problem. You

will receive specific instructions for each language.

Each program will be marked as follows:

Program produces the correct result [3 points]

Adherence to programming paradigm [2 points]

Quality of programming (structures, organisation, etc) [1 point]

All your files must include a header showing student IDs and names of the group members.

All files must be submitted in a zip file.

Dataset

• You have access to a dataset of images. The images are provided in jpg format. The histogram of

each image in this dataset has been computed (3 bits per channel) and saved in a text file.

• We also give you 16 query images. You will have to find the 5 most similar images to each query

image. The query images are provided in jpg and ppm formats; you can use either one.

1. Object-oriented part (Java) [6% of your final mark]

Since this solution must follow the object-oriented paradigm, your program must be composed of a set of

classes. Specifically, it must include, among others, the classes listed below.

In addition to the source code of your solution, you must also submit a document that includes a UML

diagram of all your classes (showing attributes, associations and methods). Do not use static methods,

except for the main function. This document must also cite all references used to build your solution.

• The SimilaritySearch class

o that contains the main method

▪ you must specify the image filename, the image dataset directory

• java SimilaritySearch q01.jpg imageDataset2_15_20

▪ you can assume that the histograms of the image dataset have been pre-computed

▪ however, you must compute the histogram of the query image

▪ you can assume that the search is done on 3-bit color reduced images but make

your program as generic as possible (no hard-coding of the depth value except in

the main method).

▪ The program must print the names of the 5 most similar images to the query image

• The ColorImage class that includes

o A constructor that creates an image from a file

▪ public ColorImage(String filename)

▪ you can read the image from the jpg or the ppm format (just choose one format)

• you can use the JMF Java API to read jpg images

• the ppm format is just a text file with the RGB values listed

▪ the pixel values of the images are stored in an array representation of your choice

(to be described in the submitted document)

o The following image attributes (and the corresponding getter methods)

▪ int width

▪ int height

▪ int depth (the number of bits per pixel)

o A getPixel method that returns the 3-channel value of pixel at column i row j in the

form of a 3-element array

▪ public int[] getPixel(int i, int j)

o A reduceColor method that reduces the color space to a d-bit representation

▪ public void reduceColor(int d)

• The ColorHistogram class that includes

o A constructor that constructs a ColorHistogram instance for a d-bit image

▪ public ColorHistogram (int d)

o A constructor that constructs a ColorHistogram from a text file

▪ public ColorHistogram (String filename)

o A setImage method that associates an image with a histogram instance

▪ public void setImage(ColorImage image)

o A getHistogram method that returns the normalized histogram of the image

▪ public double[] getHistogram()

o A compare method that returns the intersection between two histograms

▪ public double compare(ColorHistogram hist)

o A save method that saves the histogram into a text file

▪ public void save(String filename)

o plus any other classes, methods or attributes you judge necessary

2. Functional programming part (Scheme) [6% of your final mark]

For this part of the comprehensive assignment, we ask you to implement the Image Similarity Search

algorithm following the functional paradigm. Refer to the general problem description section for the

algorithmic steps.

The requirements are the same as for the Object-oriented part, except that, this time, you do not have to

generate the histograms of the query images. You can use the histogram files that you have generated

using your Java program.

You must create the following function in order to start your program:

(similaritySearch queryHistogramFilename imageDatasetDirectory)

This function should return the names of the 5 most similar images to the query image. You will obviously

have to create other functions. Remember that under the functional paradigm, it is much better to create

several short functions than a few long ones.

Submit your project in a zip file containing the Scheme functions file and a document listing the functions

you have created, and the output obtained for each of the query images. This document must also cite all

references you may have used to build your solution.

All your Scheme functions must have a header describing what the function does, the input parameters

and the output.

You are not allowed to use functions ending in ! (such as the set! function), and you must not use

iterative loops; use recursion instead.

3. Concurrent programming part (Go) [6% of your final mark]

For the concurrent part of the comprehensive assignment, we ask you to program the image similarity

search algorithm using multiple threads. To make it more computationally expensive, your program will

have to compute all histograms (from the query and database images) each time you perform a query.

Your Go program is executed with the following arguments:

> go run similaritySearch queryImageFilename imageDatasetDirectory

Note: os.Args provides access to the command line arguments.

Reading a jpeg image

Fortunately, if you look at the Go documentation about the image package, https://pkg.go.dev/image, you

will find an example showing how to compute the histogram of an image. This is not exactly what you

have to do but this is a great starting point. A slightly modified version of this example is provided; it has

the signature of the histogram Go function you will have to write.

type Histo struct {
    Name string
    H    []int
}

func computeHistogram(imagePath string, depth int) (Histo, error)

This function computes the histogram of the specified jpeg image and reduces it to the number of bits

given by the depth parameter. The starting code is relatively easy to follow. One thing to notice is that the

RGBA() method of a Go color returns each channel as a uint32 in the range 0 to 65535, so you must right-shift

each channel by 8 positions to get the correct range (0 to 255). Check the provided code; it simply displays

the RGB values of an image.
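
The channel handling described above can be sketched as follows. To stay self-contained, this example builds a small in-memory image instead of decoding a jpeg file; in a real program the `image.Image` value would typically come from `jpeg.Decode` in the standard `image/jpeg` package. The helper names `solidImage` and `countColors` are hypothetical, and the sketch returns raw counts rather than the `Histo` struct.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
)

// solidImage builds a w x h in-memory image filled with a single opaque
// color (hypothetical helper standing in for a decoded jpeg).
func solidImage(w, h int, r, g, b uint8) image.Image {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			img.Set(x, y, color.RGBA{R: r, G: g, B: b, A: 255})
		}
	}
	return img
}

// countColors returns the raw (unnormalized) pixel counts per reduced color.
func countColors(img image.Image, depth uint) []int {
	hist := make([]int, 1<<(3*depth))
	shift := 8 - depth
	bounds := img.Bounds()
	for y := bounds.Min.Y; y < bounds.Max.Y; y++ {
		for x := bounds.Min.X; x < bounds.Max.X; x++ {
			r, g, b, _ := img.At(x, y).RGBA() // 16-bit values stored in uint32
			r8, g8, b8 := r>>8, g>>8, b>>8    // back to the 0 to 255 range
			idx := ((r8 >> shift) << (2 * depth)) + ((g8 >> shift) << depth) + (b8 >> shift)
			hist[idx]++
		}
	}
	return hist
}

func main() {
	h := countColors(solidImage(4, 2, 255, 255, 0), 3)
	fmt.Println(len(h), h[504]) // all 8 pixels land in the same bin
}
```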

You will also need a function that computes the histograms of a slice of image filenames.

func computeHistograms(imagePath []string, depth int, hChan chan<- Histo)

When a histogram is computed, it is sent to the given channel.

The main function

Your main function must perform the following operations:

1. Create the channel of histograms;

2. Get the list of all image filenames in the dataset directory;

3. Split this list into K slices and pass each slice to a goroutine running computeHistograms;

4. In a separate thread, open the query image and compute its histogram;

5. Read the channel of histograms

a. When a histogram is received compare it to the query histogram

b. Based on the similarity results, maintain a list of the 5 most similar images

6. Once all images have been processed, print the list of the 5 most similar images.

7. Close all channels and make sure all threads are stopped before the program terminates.
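
The splitting and channel steps above can be sketched as below. This is a simplified illustration, not the required program: `fakeHistogram` is a placeholder for the real per-image computation, and `splitSlices` and `collect` are hypothetical names. One common way to know when to close the channel is a `sync.WaitGroup`, which is the approach assumed here.

```go
package main

import (
	"fmt"
	"sync"
)

// Histo mirrors the struct given in the assignment text.
type Histo struct {
	Name string
	H    []int
}

// fakeHistogram stands in for the real per-image computation (placeholder).
func fakeHistogram(name string) Histo {
	return Histo{Name: name, H: []int{len(name)}}
}

// splitSlices cuts files into at most k contiguous slices of near-equal size.
func splitSlices(files []string, k int) [][]string {
	if k < 1 {
		k = 1
	}
	var parts [][]string
	size := (len(files) + k - 1) / k // ceiling division
	for start := 0; start < len(files); start += size {
		end := start + size
		if end > len(files) {
			end = len(files)
		}
		parts = append(parts, files[start:end])
	}
	return parts
}

// collect starts one goroutine per slice and gathers every histogram
// that arrives on the channel.
func collect(files []string, k int) []Histo {
	hChan := make(chan Histo)
	var wg sync.WaitGroup
	for _, part := range splitSlices(files, k) {
		wg.Add(1)
		go func(names []string) {
			defer wg.Done()
			for _, n := range names {
				hChan <- fakeHistogram(n)
			}
		}(part)
	}
	// Close the channel once all workers are done so the range loop ends.
	go func() {
		wg.Wait()
		close(hChan)
	}()
	var all []Histo
	for h := range hChan {
		all = append(all, h)
	}
	return all
}

func main() {
	files := []string{"1.jpg", "2.jpg", "3.jpg", "4.jpg", "5.jpg"}
	fmt.Println(len(splitSlices(files, 2)), len(collect(files, 2)))
}
```

In the actual program, the receiving loop is where each histogram would be compared with the query histogram.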

Experiments

In order to determine the optimal configuration for your concurrent algorithm, we ask you to perform the

following experiments and report the execution time for each case:

• K=1

• K=2

• K=4

• K=16

• K=64

• K=256

• K=1048

Create a graph showing running time versus number of threads (use the average running time for all

queries). Do not print text to the console while you are measuring running time; this would

considerably slow down your program. Also specify the operating system and the specifications of your

processor (including the number of cores). You can also add your own experiments with other

configurations. Remember that your program must compute all histograms (from query and dataset

images); do not use the pre-computed histogram text files.
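
One possible way to obtain the average running time over all queries is sketched below, using `time.Now` and `time.Since` from the standard library. The helper name `averageDuration` and the `work` callback are assumptions of this example; in the real experiment the callback would run one full similarity search for a given K.

```go
package main

import (
	"fmt"
	"time"
)

// averageDuration runs work once per query and returns the mean
// wall-clock time of those runs.
func averageDuration(queries []string, work func(query string)) time.Duration {
	if len(queries) == 0 {
		return 0
	}
	var total time.Duration
	for _, q := range queries {
		start := time.Now()
		work(q) // nothing is printed while the clock is running
		total += time.Since(start)
	}
	return total / time.Duration(len(queries))
}

func main() {
	queries := []string{"q00.jpg", "q01.jpg"}
	avg := averageDuration(queries, func(string) {
		time.Sleep(10 * time.Millisecond) // stand-in for a real search
	})
	fmt.Println("average:", avg) // printed once, after timing is finished
}
```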

In addition to your source code, you must submit a document showing the results of your experiments.

4. Logical programming part (Prolog) [6% of your final mark]

For this last part of the comprehensive assignment, you have to implement the Image Similarity Search

algorithm following the logic programming paradigm. Refer to the general problem description section

for the algorithmic steps.

The requirements are the same as for the other parts, except that, this time, you do not have to generate

the histograms of the images. You can use the histogram text files for both the query and dataset images

that are provided.

You must create the following predicate that solves this problem as follows:

?- similarity_search('q00.jpg.txt',S).
S = [('2144.jpg.txt', 0.8799533333333334), ('1998.jpg.txt', 0.86362),
     ('3538.jpg.txt', 0.79226), ('3920.jpg.txt', 0.77334),
     ('4923.jpg.txt', 0.76828)] .

(note that the solution shown is not correct, it is only provided to demonstrate the format of the solution).

We give you a starter project that includes most of the predicates required. In particular, the one that reads

a histogram file and returns a list will be very useful.

?- read_hist_file('q00.jpg.txt',H).
H = [2715, 22, 0, 0, 0, 0, 0, 0, 2|...] .

The one that generates the list of text files in a directory is also useful.

?- dataset(D),directory_textfiles(D,L).
D = 'C:\\Users\\Documents\\imageDataset2_15_20\\',
L = ['1000.jpg.txt', '1001.jpg.txt', '1003.jpg.txt', '1004.jpg.txt',
     '1005.jpg.txt', '1006.jpg.txt', '1007.jpg.txt', '1008.jpg.txt',
     '1009.jpg.txt'|...].

Note how the dataset directory path is provided through the predicate dataset/1.

Finally, we also provide you with the predicate that performs the high-level algorithm.

similarity_search(QueryFile,DatasetDirectory, DatasetFiles,Best):-
    read_hist_file(QueryFile,QueryHisto),               %1.
    compare_histograms(QueryHisto, DatasetDirectory,
                       DatasetFiles, Scores),           %2.
    sort(2,@>,Scores,Sorted),                           %3.
    take(Sorted,5,Best).                                %4.

As you can see, this predicate first reads the histogram file of the query image (read_hist_file/2). It then

compares this histogram with all the histograms in the list of histogram files (compare_histograms/4) by

producing a list of (HistogramFilename,Score) pairs. The next step sorts the obtained list (sort/4) from

which the first 5 pairs are extracted (take/3).

Read this Prolog file carefully and implement the missing predicates.

Submit your project in a zip file containing the well-commented Prolog predicates; all your Prolog

predicates must have a short header describing what the predicate does and its parameters. Also include a

document showing the output obtained for each of the query images.

Rules

You can do this assignment in a group of two to learn teamwork. Make sure you collaborate with your

partner both in thinking and brainstorming about this problem and in programming. Any similarity

between your programs and those of other groups is considered plagiarism. If you do not like teamwork, you

can do it alone. Do not use any code or program from the Internet because it is also considered plagiarism.

See the university policies on plagiarism at the following link.

https://www2.uottawa.ca/about-us/provost

Important Note: Using ChatGPT is strictly forbidden. If your TA sees that your code comes from

ChatGPT, you will receive 0 for all parts of this project immediately.

Measures that we take to detect plagiarism

Teaching assistants have been instructed to report to the professor any suspicion of plagiarism

they find when they mark assignments.

If plagiarism has been detected in any part or in the whole assignment, the professor will take

appropriate measures. Recall that it is equally bad to copy a solution and to let someone else

copy your solution.

