
Date: 2024-05-06 09:31

COMP5328 - Advanced Machine Learning

Assignment 1

This assignment is to be completed in groups of 2 to 3 students. It is worth 25% of your

total mark.

1 Objective

The objective of this assignment is to implement Non-negative Matrix Factorization

(NMF) algorithms and analyze the robustness of NMF algorithms when the dataset is

contaminated by large magnitude noise or corruption. More specifically, you should

implement at least two NMF algorithms and compare their robustness.

2 Instructions

2.1 Dataset description

In this assignment, you need to apply NMF algorithms on two real-world face image

datasets: (1) ORL dataset [1]; (2) Extended YaleB dataset [2].

• ORL dataset: it contains 400 images of 40 distinct subjects (i.e., 10 images per

subject). For some subjects, the images were taken at different times, varying the

lighting, facial expressions, and facial details (glasses / no glasses). All the images

were taken against a dark homogeneous background with the subjects in an

upright, frontal position. All images are cropped and resized to 92×112 pixels.

• Extended YaleB dataset: it contains 2414 images of 38 subjects under 9 poses

and 64 illumination conditions. All images are manually aligned, cropped, and

then resized to 168×192 pixels.

[1] https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

[2] http://vision.ucsd.edu/iskwak/ExtYaleDatabase/ExtYaleB.html

Figure 1: An example face image and its occluded versions by b × b blocks with b = 10, 12, and 14 pixels.

Note: we provide a tutorial for this assignment, which contains example code for loading a dataset into a numpy array. Please find more details in assignment1.ipynb.

2.2 Assignment tasks

1. You need to implement at least two Non-negative Matrix Factorization (NMF)

algorithms:

• You should implement at least two NMF algorithms with at least one not

taught in this course (e.g., L1-Norm Based NMF, Hypersurface Cost Based

NMF, L1-Norm Regularized Robust NMF, and L2,1-Norm Based NMF).

• For each algorithm, you need to describe the definition of cost function as

well as the optimization methods used in your implementation.

2. You need to analyze the robustness of each algorithm on two datasets:

• You are allowed to design your own data pre-processing method (if

necessary).

• You need to use block-occlusion noise similar to that shown in Figure 1. The noise is generated by setting the pixel values in the block to 255. You should choose your own value for b (not necessarily 10, 12, or 14). You are also encouraged to design your own noise other than the block-occlusion noise.

• You need to demonstrate each type of noise used in your experiment (show

the original image as well as the image contaminated by noise).

• You should carefully choose the NMF algorithms and design experiment

settings to clearly show the different robustness of the algorithms you have

implemented.

3. You are only allowed to use the Python standard library, numpy, and scipy (if necessary) to implement NMF algorithms.
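The block-occlusion noise described in task 2 can be sketched with numpy alone. This is a minimal illustration, not the tutorial's code: the function name `add_block_occlusion`, the one-flattened-image-per-column layout, and the choice of one randomly placed block per image are all assumptions.

```python
import numpy as np

def add_block_occlusion(images, height, width, b, rng=None):
    """Set one randomly placed b-by-b block of every image to 255.

    Assumed layout: `images` has shape (height * width, n_samples), with each
    column holding one image flattened in row-major order.
    """
    rng = np.random.default_rng() if rng is None else rng
    # np.array copies, so the clean data passed in is left untouched.
    noisy = np.array(images, dtype=float, order="C")
    n_samples = noisy.shape[1]
    # For a C-contiguous array this reshape is a view: blocks[r, c, j] is pixel (r, c) of image j.
    blocks = noisy.reshape(height, width, n_samples)
    for j in range(n_samples):
        top = rng.integers(0, height - b + 1)   # upper bound is exclusive
        left = rng.integers(0, width - b + 1)
        blocks[top:top + b, left:left + b, j] = 255
    return noisy
```

Passing a seeded generator, for example `rng=np.random.default_rng(0)`, makes the contaminated dataset reproducible across runs.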

2.3 Programming and External Libraries

This assignment is required to be finished in Python 3. When you implement NMF algorithms, you are not allowed to use external libraries that contain NMF implementations, such as scikit-learn and Nimfa (i.e., you have to implement the NMF algorithms yourself). You are allowed to use scikit-learn for evaluation only (please find more details in assignment1.ipynb). If you are unsure whether you can use a particular library or function, please post on Canvas under the Assignment 1 thread.
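To illustrate what a from-scratch implementation using only numpy might look like, here is a hedged sketch of two NMF variants. The first minimises the squared Frobenius norm with the classic multiplicative updates; the second targets the L2,1-norm cost with the commonly cited reweighted multiplicative updates. The symbols `W` and `H`, the function names, the random initialisation, and the fixed iteration count are assumptions made for this example, not requirements of the assignment.

```python
import numpy as np

EPS = 1e-10  # keeps denominators away from zero

def _init(X, rank, seed):
    """Random non-negative starting point for the two factors."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + EPS
    H = rng.random((rank, X.shape[1])) + EPS
    return W, H

def nmf_l2(X, rank, n_iter=200, seed=0):
    """Minimise ||X - W H||_F^2 subject to W, H >= 0 (multiplicative updates)."""
    W, H = _init(X, rank, seed)
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
        W *= (X @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def nmf_l21(X, rank, n_iter=200, seed=0):
    """Minimise sum_j ||x_j - W h_j||_2 subject to W, H >= 0.

    Each sample (column) j is reweighted by d[j] = 1 / ||x_j - W h_j||_2, so
    samples with large residuals pull less on the factors.
    """
    W, H = _init(X, rank, seed)
    for _ in range(n_iter):
        d = 1.0 / (np.linalg.norm(X - W @ H, axis=0) + EPS)
        XD = X * d  # broadcasting scales column j by d[j]
        W *= (XD @ H.T) / (((W @ H) * d) @ H.T + EPS)
        H *= (W.T @ XD) / (W.T @ ((W @ H) * d) + EPS)
    return W, H
```

Because the updates only multiply non-negative quantities, `W` and `H` stay non-negative as long as `X` is non-negative. A convergence check on the cost could replace the fixed `n_iter` if runtime matters.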

2.4 Evaluation metrics

To compare the performance and robustness of different NMF algorithms, we provide three evaluation metrics: (1) Root Mean Square Error; (2) Average Accuracy; (3) Normalized Mutual Information. For all experiments, you need to use at least two metrics, i.e., Root Mean Square Error and Average Accuracy.

• Root Mean Square Error (RMSE): let X denote the contaminated dataset (by adding noise), and X̂ denote the clean dataset. Given the two factor matrices that result from the factorization on X̂, the Root Mean Square Error can then be defined as follows:

(1)

• Average Accuracy: You need to run a clustering algorithm (e.g., K-means) with the number of clusters equal to the number of classes. Each example is assigned the label of its cluster (please find more details in assignment1.ipynb). Lastly, you can evaluate the accuracy of the predictions Ypred as follows:

(2)

• Normalized Mutual Information (NMI):

(3)

where I(·,·) is mutual information and H(·) is entropy.
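The three metrics can be sketched in numpy as follows. Several details are assumptions rather than the assignment's definitions: the factor matrices are called `W` and `H`, the RMSE averages over all entries of the data matrix, each cluster is mapped to the most frequent true label among its members, and NMI uses the arithmetic-mean normalisation 2 I(Y, Ypred) / (H(Y) + H(Ypred)). The tutorial notebook remains the reference for the expected definitions.

```python
import numpy as np

def rmse(X_clean, W, H):
    """Root mean square error between the clean data and W @ H (mean over all entries)."""
    return np.sqrt(np.mean((X_clean - W @ H) ** 2))

def assign_cluster_labels(Y, cluster_ids):
    """Give every cluster the most frequent true label among its members."""
    Y_pred = np.zeros_like(Y)
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        labels, counts = np.unique(Y[members], return_counts=True)
        Y_pred[members] = labels[np.argmax(counts)]
    return Y_pred

def accuracy(Y, Y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    return np.mean(Y == Y_pred)

def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_information(a, b):
    """Mutual information of two label vectors, from their joint histogram."""
    _, a_idx = np.unique(np.asarray(a).ravel(), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b).ravel(), return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)  # count co-occurrences
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) * joint.sum(axis=0, keepdims=True)
    nz = joint > 0  # skip empty cells to avoid log(0)
    return np.sum(joint[nz] * np.log(joint[nz] / outer[nz]))

def nmi(Y, Y_pred):
    """Normalised mutual information, 2 I / (H(Y) + H(Y_pred)) (assumed variant)."""
    denom = entropy(Y) + entropy(Y_pred)
    return 2 * mutual_information(Y, Y_pred) / denom if denom > 0 else 1.0
```

The cluster ids passed to `assign_cluster_labels` would come from a clustering step such as K-means on the learned representation; that step is outside this sketch.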

Note: we expect you to have a rigorous performance evaluation. To provide an estimate of the performance of the algorithms in the report, you can repeat each experiment multiple times (e.g., 5 times), each time randomly sampling 90% of the data from the whole dataset, and average the metrics over the different subsets. You are also required to report the standard deviations.
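The repeated-subsampling protocol in the note above can be sketched as follows. `run_experiment` is a hypothetical callback standing for "add noise, factorize, compute metrics" that returns a tuple of numbers; the samples-as-columns layout and sampling without replacement are also assumptions.

```python
import numpy as np

def evaluate_with_subsampling(X, Y, run_experiment, n_repeats=5, fraction=0.9, seed=0):
    """Run `run_experiment` on random subsets and return (mean, std) per metric.

    X: data matrix with one sample per column (assumed layout).
    Y: label vector with one entry per sample.
    run_experiment: hypothetical callable (X_subset, Y_subset) -> tuple of metric values.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[1]
    subset_size = int(round(fraction * n_samples))
    scores = []
    for _ in range(n_repeats):
        # Draw a fresh subset of the samples without replacement.
        idx = rng.choice(n_samples, size=subset_size, replace=False)
        scores.append(run_experiment(X[:, idx], Y[idx]))
    scores = np.asarray(scores, dtype=float)  # shape (n_repeats, n_metrics)
    return scores.mean(axis=0), scores.std(axis=0)
```

Using the same `seed` for every algorithm means all of them are compared on identical subsets, which makes differences in the averaged metrics easier to attribute to the algorithms themselves.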

3 Report

The report should be organized like research papers, and should contain the following

sections:

• In the abstract, you should briefly introduce the topic of this assignment and describe the organization of your report.

• In the introduction, you should first introduce the main idea of NMF as well as its applications. You should then give an overview of the methods you want to use.

• In related work, you are expected to review the main ideas of related NMF algorithms (including their advantages and disadvantages).

• In methods, you should describe the details of your method (including the definition of cost functions as well as optimization steps). You should also describe your choices of noise, and you are encouraged to explain the robustness of each algorithm from a theoretical point of view.

• In experiments, you should first introduce the experimental setup (e.g., datasets, algorithms, and noise used in your experiment for comparison). Second, you should show the experimental results and give some comments.

• In the conclusion, you should summarize your results and discuss your insights for future work.

• In references, you should list all references cited in your report and format all references in a consistent way.

The layout of the report:

• Font: Times New Roman; Title: font size 14; Body: font size 12

• Length: Ideally 10 to 15 pages - maximum 20 pages

Note: You are encouraged to use LaTeX. Optionally, a MS-Word template is provided.

4 Submissions

The submission contains two parts: source code and report. Detailed instructions are

as follows:

1. Go to Canvas and upload the following files.

   a. report (a pdf file): the report should include each member’s details (student id and name).

   b. code (a folder) as a zip file

      i. algorithm (a sub-folder): your code could be multiple files inside the algorithm sub-folder.

      ii. data (an empty sub-folder): although the two datasets should be inside the data folder, please do not include them in the zip file. We will copy the two datasets to the data folder when we test the code.

2. Only one student needs to submit the report as pdf file and code as zip file which

must be named as student ID numbers of all group members separated by

underscores.

E.g., “xxxxx_xxxxx_xxxxx_code.zip” and “xxxxx_xxxxx_xxxxx_report.pdf”.

3. Your submission should include the report and the code. A plagiarism checker

will be used.

4. You need to clearly provide instructions on how to run your code in the appendix

of the report.

5. Indicate the contribution of each group member.

6. A penalty of 1.25 marks (5%) applies for each day after the due date (email late submissions to the TA and confirm late submission dates with the TA). The maximum delay is 5 days; assignments more than 5 days late will get 0.

5 Plagiarism

• Please read the University Policy on Academic Honesty carefully:

http://sydney.edu.au/elearning/student/EI/academic_honesty.shtml

• All cases of academic dishonesty and plagiarism will be investigated.

• There is a new process and a centralised University system and database.

• Three types of offences:

1. Plagiarism – When you copy from another student, website or other

source. This includes copying the whole assignment or only a part of it.

2. Academic Dishonesty – When you make your work available to another

student to copy (the whole assignment or a part of it). There are other

examples of academic dishonesty.

3. Misconduct - When you engage another person to complete your

assignment (or a part of it), for payment or not. This is a very serious

matter, and the Policy requires that your case is forwarded to the

University Registrar for investigation.

• The penalties are severe and include:

1. A permanent record of academic dishonesty, plagiarism, and misconduct

in the University database and on your student file.

2. Mark deduction, ranging from 0 for the assignment to Fail for the course.

3. Expulsion from the University and cancelling of your student visa.

• When there is copying between students, note that both students are penalised –

the student who copies and the student who makes his/her work available for

copying.

• Note that at most 30% similarity (including references) is acceptable. Cases of high plagiarism will be reported to the school.


6 Marking scheme

Category Criterion Marks Comments

Report [20]

  Abstract [0.75]
    • Problem, methods, organization.

  Introduction [1.25]
    • What is the problem you intend to solve?
    • Why is this problem important?

  Previous work [1.5]
    • Previous relevant methods used in the literature?

  Methods [6.25]
    • Pre-processing (if any)
    • NMF algorithm’s formulation.
    • Noise choice and description.

  Experiments and Discussions [6.25]
    • Experiments, comparisons, and evaluation
    • Extensive analysis and discussion of results
    • Relevant personal reflection

  Conclusions and Future work [0.75]
    • Meaningful conclusions based on results
    • Meaningful future work suggested

  Presentation [1.25]
    • Grammatical sentences, no spelling mistakes
    • Good structure and layout, consistent formatting
    • Appropriate citation and referencing
    • Use graphs and tables to summarize data

  Other [2]
    • At the discretion of the marker: for impressing the marker, exceeding expectations, etc. Examples include clear presentation, well-designed experiment, fast code, etc.

Code [5]
    • Code runs within a feasible time
    • Well organized, commented and documented

Penalties [−]
    • Badly written code: [−5]
    • Not including instructions on how to run your code: [−5]

Note: Marks for each category are indicated in square brackets. The minimum mark for the assignment will be 0 (zero).

