联系方式

您当前位置:首页 >> Algorithm 算法作业Algorithm 算法作业

日期:2024-10-17 06:48

CBS5503 of MScGAH

AY 202425 Sem 1

Individual Assignment Guidelines

Application of Machine Learning in Fake News Detection

Introduction

This practical assignment is designed to assess students’ understanding and programming skills of conducting Machine Learning using the task of Fake News Detection. Students will be assessed on their implementation of building  a supervised machine learning pipeline, including  data preprocessing, feature extraction (engineering)model selection, and model evaluation. They are also required to report and reflect on the classification results as indicated by the evaluation metrics.

Instruction

o This assignment constitutes 50% of the total marks of this course.

o Plagiarism will be penalized. Any student who commits an act of academic dishonesty is considered to have violated academic integrity and will therefore be subject to academic disciplinary actions.

o Include statement of GenAI usage at the end of the assignment.

o Submission:

   Submit a word doc including the results and reflections, named as <yourname>.docx.

   The Word document should be like a factual report, concise and less than 500 words.

   Submit the implementation code as well, named as <yourname>.ipynb.

   Deadline: 23:59, 20 October 2024 (Sunday of Week 7).

o Late submission

Submission after the deadline

Mark deduction

Within 7 calendar days

50%

More than 7 days

100%

Data

Structured as the sample dataset demonstrated in the tutorial, two different subsets of clean data are provided. Each subset contains 1,000 instances of news, together with the respective labels of truth (True vs. False) that are balanced. Each student will work on one of the two subsets data according to the number of the last digit in your student ID. Please refer to Table 1 in Appendix A to find the subset data for you to work on.


Tasks

    Common   Task   1:  Apart   from   the   existing   functions   in   Text_Cleaning(Text)   e.g. punctuation/digit/website  removal,  add  one  self-built  function  to  remove  keywords “Reuters” and all the city names and apply to the news Text.

    Common Task 2:  Visualize the fivegram distribution (plot top 10) of the True and False news respectively. Describe your observation after comparison.

    Task 3A: Construct the TFIDF feature matrix based on bigrams. Run the ML pipeline and compare the results on bigrams with unigrams. Discuss the difference and implications in the end of the file.

    Task 3B: Construct the TFIDF feature matrix based on trigrams. Run the ML pipeline and compare the results on trigrams with unigrams. Discuss the difference and implications in the end of the file.

    Task 4A: Split data proportion 9: 1 and use Logistic Regression in model selection. Try to change one of the key paramers and report the classification results respectively.

    Task 4B: Split data proportion 8: 2 and use Decision Tree in modelselection. Try to change one of the key paramers and report the classification results respectively.

Formatting Template

    Begin the your assignment with indication of the following information:

•    Course Code & Title

•    Individual Assignment

•    Title: Application of Machine Learning in Fake News Detection

•    Student Name and ID

    Run the tasks respectively with codes successfully run without errors; and

o For students with odd final digit: Report the results to common tasks 1-2 and tasks 3A, 4A. Provide basic descriptions to each of the result and interpret the results with proper analysis/account of the observation.

o For students with even final digit: Report the results to common tasks 1-2 and tasks 3B, 4B. Provide basic descriptions to each of the result and interpret the results with proper analysis/account of the observation.

 Conclusion: Summarize the overall findings of any interesting patterns that you have observed in the tasks. Conclude the assignment by offering recommendations or suggestions for further improvements.


Grading Criterion

Items

Evaluation Standard

Marks

Common Task 1

Success in running task 1 and report the result

10

Common Task 2

Success in running task 2 and report the result

10

Task 3

Success in running task 3 and report the result

30

Task 4

Success in running task 4 and report the result

30

Documentation

Tidy and neat with proper comments#

10

Analysis

In-depth thinking and reflections

10

Total:

100

Appendix A

You should work on one of the given datasets according to the last digit of your student ID.

Last digit of student ID

Name of subset data

0-4

fn_subset1.csv

5-9

fn_subset2.csv

Table 1 - Mapping of Student ID and Subset Number.

 


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:821613408 微信:horysk8 电子信箱:[email protected]
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:horysk8