
Date: 2024-12-06 08:31

EE5434 final project

The data were released on Nov. 5 (see the Kaggle website).

Report and source code due: 11:59 PM, Dec. 6

Full mark: 100 pts.

Throughout the project, you can keep trying new machine learning models to improve the classification accuracy.

You are encouraged to form groups of 2 with your classmates so that the team can implement multiple learning models and compare their performance. If you cannot find a partner, please post a message on the group discussion board and briefly introduce your expertise. If you prefer to do this project on your own, you will get 5 bonus points.

Submission format: The report should be in PDF format. The source code should be submitted as a notebook file (.ipynb) and also saved as an HTML file (.html). Thus, there are three files you need to upload to Canvas. Remember that you must not copy anyone else's code, which can lead to failure of this course.

Files and naming rules: If your team has two members, start the file name with G2; otherwise, start it with G1. For example, if you have a teammate and the team members are Jackie Lee and Xuantian Chan, name the files G2-Lee-Chan.xxx. 5 pts will be deducted if the naming rule is not followed. In your report, please clearly list the group members.
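The naming rule can be checked with a few lines of Python. This is only a sketch: the helper `submission_name` is hypothetical, and the solo form `G1-Surname.xxx` is an assumption, since the text only gives the two-member example.

```python
def submission_name(surnames, extension):
    """Build a file name such as 'G2-Lee-Chan.pdf' from the team surnames."""
    if len(surnames) not in (1, 2):
        raise ValueError("a team has one or two members")
    prefix = f"G{len(surnames)}"  # G1 for solo work, G2 for a pair
    return "-".join([prefix, *surnames]) + "." + extension.lstrip(".")


# The three files to upload for the example team in the text.
files = [submission_name(["Lee", "Chan"], ext) for ext in ("pdf", "ipynb", "html")]
print(files)
```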

How do we grade your report? We will consider the following factors.

1. You will get 30% (the basic grade) if you correctly apply two learning models to our classification problem. The accuracy should be much better than random guessing. Your report should be written in generally correct English, be easy to follow, and include a clear explanation of your implementation details and a basic analysis of the results.

2. Factors in grading:

a. Applied/implemented and compared at least 2 different models. You show good sense in choosing appropriate models (such as NLP-related models).

b. For each model, gave a clear explanation of the feature encoding methods, model structure, etc. Carefully tuned multiple sets of parameters or feature engineering methods. Provided evidence of multiple methods to boost the performance.

c. Considered performance metrics beyond accuracy (such as the confusion matrix, recall, ROC, etc.). Carefully compared the performance of different methods/models/parameter sets. Presented the results using the most insightful means, such as tables and figures.

d. Wrote a well-organized report that is easy to follow and read.

e. Final ranking on Kaggle.

For each factor, the rating is unsatisfactory (1), acceptable (2), satisfactory (3), good (4), or excellent (5). The sum over the factors determines the grade. For example, student A got 4 good and 1 acceptable for a to e. Then, A's total score is 4*4+2=18. The full mark for a to e is 25. So, A's percentage is 72%.
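The factor score is a plain sum of rating points divided by the 25-point maximum. A minimal sketch of that arithmetic follows; the function name `factor_percentage` is hypothetical. With four "good" ratings and one "acceptable" rating, the sum works out to 18 of 25, i.e. 72%.

```python
RATING_POINTS = {
    "unsatisfactory": 1,
    "acceptable": 2,
    "satisfactory": 3,
    "good": 4,
    "excellent": 5,
}


def factor_percentage(ratings):
    """Return (total points, percentage of the 25-point maximum) for factors a-e."""
    if len(ratings) != 5:
        raise ValueError("expected one rating for each of the factors a to e")
    total = sum(RATING_POINTS[r] for r in ratings)
    max_points = 5 * len(ratings)  # five factors, at most 5 points each
    return total, 100 * total / max_points


total, pct = factor_percentage(["good", "good", "good", "good", "acceptable"])
print(total, pct)  # 18 72.0
```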

Note that if the final performance is very close (e.g., 0.65 vs. 0.66), the corresponding submissions belong to the same group in the ranking.
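Factor c asks for metrics beyond accuracy. The sketch below computes a confusion matrix and per-class precision, recall, and F1 in plain Python so that the definitions are visible. The labels and predictions are hypothetical and only illustrate the calculation; they are not taken from the project data.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def per_class_report(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} derived from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred, labels)
    report = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(row[i] for row in cm)  # column sum: predicted as this label
        actual = sum(cm[i])                    # row sum: truly this label
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return report


# Hypothetical labels and predictions, for illustration only.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "pos", "neg", "neg", "pos", "neg", "neg"]
labels = ["neg", "pos"]

cm = confusion_matrix(y_true, y_pred, labels)
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
report = per_class_report(y_true, y_pred, labels)
print(cm, accuracy, report)
```

In a notebook, `sklearn.metrics.confusion_matrix` and `sklearn.metrics.classification_report` give the same quantities with less code.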

Factors that can increase your grade:

1. You used a new learning model or feature engineering method that was not taught in class. This requires some reading and a clear explanation of why you think this model fits the problem.

2. Your model's performance is much better than others' because of a new or optimized method.

The format of the report

1. There is no page limit for the report. If you don't have much to report, keep it simple. Also, minimize language issues by proofreading.

2. To make our grading more consistent, please use the following sections:

a. Abstract. Summarize the report (what you did, what methods you used, and your conclusions) in fewer than 300 words.

b. Data properties (exploratory data analysis). Describe your understanding and analysis of the data properties.

c. Methods/models. Describe the models you implemented and provide the key parameters. For example, what are the features? If you use kNN, what is k and how did you compute the distance? If you use an ANN, what is the architecture? You should separate the high-level description of the models from the tuning of the hyperparameters.

d. Experimental results. Compare and summarize the results using appropriate tables/figures. Simply copying screenshots is acceptable but will certainly lead to a low mark. Instead, you should *summarize* your results. You can also compare the performance of your model under different hyperparameters.

e. Conclusion and discussion. Discuss why your models perform well or poorly.

f. Future work. Discuss what you could do if more time were given.

3. For each model you tried, provide the code of the version with the best performance. In your report, you can detail the performance of this model with different parameters.
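One way to summarize results instead of pasting screenshots is to collect each run in a small record and print a sorted table. The sketch below assumes a hypothetical helper `format_results_table`, and the numbers are placeholders, not results obtained on the project data.

```python
def format_results_table(rows, columns, sort_by):
    """Render result dicts as an aligned plain-text table, best score first."""
    ordered = sorted(rows, key=lambda row: row[sort_by], reverse=True)
    cells = [
        [f"{row[c]:.3f}" if isinstance(row[c], float) else str(row[c]) for c in columns]
        for row in ordered
    ]
    widths = [
        max([len(c)] + [len(line[i]) for line in cells])
        for i, c in enumerate(columns)
    ]
    header = "  ".join(c.ljust(w) for c, w in zip(columns, widths))
    rule = "  ".join("-" * w for w in widths)
    body = ["  ".join(v.ljust(w) for v, w in zip(line, widths)) for line in cells]
    return "\n".join([header, rule, *body])


# Placeholder numbers for illustration only; they are NOT results on the project data.
runs = [
    {"model": "kNN", "setting": "k=5, cosine", "val_accuracy": 0.61, "val_macro_f1": 0.58},
    {"model": "kNN", "setting": "k=15, cosine", "val_accuracy": 0.64, "val_macro_f1": 0.60},
    {"model": "ANN", "setting": "1 hidden layer, 64 units", "val_accuracy": 0.70, "val_macro_f1": 0.67},
]
columns = ["model", "setting", "val_accuracy", "val_macro_f1"]
table = format_results_table(runs, columns, sort_by="val_accuracy")
print(table)
```

Keeping the model, its setting, and several metrics in one row makes the comparison across parameter sets readable at a glance.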

The code

The code should include:

1. Preprocessing of the data

2. Construction of the model

3. Training

4. Validation

5. Testing

6. Any other code that is necessary
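The steps listed above could be laid out in a notebook roughly as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes the task is text classification, uses a toy kNN over bag-of-words counts with cosine similarity, and runs on hypothetical sentences in place of the Kaggle data. The class `KNNTextClassifier` and all data are invented for illustration.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """1. Preprocessing: lowercase, split into word tokens, count them."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class KNNTextClassifier:
    """2. Model construction: k-nearest neighbours over bag-of-words vectors."""

    def __init__(self, k):
        self.k = k
        self.examples = []

    def fit(self, texts, labels):
        """3. Training: kNN simply stores the encoded training examples."""
        self.examples = [(preprocess(t), y) for t, y in zip(texts, labels)]
        return self

    def predict(self, texts):
        predictions = []
        for text in texts:
            query = preprocess(text)
            ranked = sorted(self.examples,
                            key=lambda ex: cosine_similarity(query, ex[0]),
                            reverse=True)
            votes = Counter(label for _, label in ranked[: self.k])
            predictions.append(votes.most_common(1)[0][0])
        return predictions


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data standing in for the competition files.
train_texts = ["great movie loved it", "wonderful acting great story",
               "loved the wonderful cast", "terrible movie hated it",
               "boring story terrible acting", "hated the boring plot"]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
val_texts = ["great story loved the cast", "boring movie hated the plot"]
val_labels = ["pos", "neg"]
test_texts = ["wonderful cast great acting", "terrible boring plot"]

# 4. Validation: choose k on held-out data, never on the test set.
best_k, best_val = None, -1.0
for k in (1, 3):
    model = KNNTextClassifier(k).fit(train_texts, train_labels)
    val_acc = accuracy(val_labels, model.predict(val_texts))
    if val_acc > best_val:
        best_k, best_val = k, val_acc

# 5. Testing: predict on unseen data with the selected setting.
final_model = KNNTextClassifier(best_k).fit(train_texts, train_labels)
test_predictions = final_model.predict(test_texts)
print(best_k, best_val, test_predictions)
```

On Kaggle the test labels are normally hidden, so the testing step usually ends by writing the predictions to a submission file in whatever format the competition page specifies.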

This is the link that you need to use to join the competition.

https://www.kaggle.com/t/79178536956041b8acb64b6268afb4de

