联系方式

您当前位置:首页 >> Database作业Database作业

日期:2024-07-03 10:24

Due Jul 2 by 11:59pm

Points 100

Submitting a file upload

File Types rmd and html

Available until Jul 14 at 11:59pm

:[HY[(ZZPNUTLU[

5V[H)LUL

Read this entire page twice before you get started.

The due date is July 2 at 11:59pm ET. Late submissions are accepted (with the usual penalty of

2.5% per day late) until July 14 at 11:59pm ET. No submissions are accepted after that -- not

submitting the practicum by then will make it impossible to earn a passing grade for the course.

Work on the practicum for at least three hours every day and use the time during the week prior to

the practicum to start working on it, especially the configuration of the MySQL Server and the loading

of the data. Do not wait to get started.

A gentle reminder that the average of both practicums must be above 70% to pass this course, so be

sure to complete on time and seek help right away. Do not procrastinate -- things that appear simple

often take more time than expected and, of course, programming is fraught with potholes on the road

to success. So plan accordingly. Do not wait until shortly before it is due.

The average time to complete the practicum is 15-25 hours. Do not wait to start. Seek help early.

Submit often and as soon as you have enough code that works. We will only grade the last

submission. Check your submission before you submit and after.

6]LY]PL^

In this practicum you will build a database that can be used to analyze bird strikes on aircraft. For an

existing data set from the FAA [1], you will build a logical data model, a relational schema, realize the

relational schema in MySQL/MariaDB, load data into the database, execute SQL queries, a finally

perform some simple analysis of the data.

The graphic below shows some statistics regarding bird strikes and helps frame the data in the data

file.

Use the provided time estimates for each tasks to time-box your time. Seek assistance if you spend

more time than specified on a task -- you are likely not solving the problem correctly. A key objective

is to learn how to look things up, how to navigate complex problems, and how to identify and resolve

programming errors.

3LHYUPUN6IQLJ[P]LZ

In this practicum you will learn how to:

configure cloud MySQL

connect to cloud MySQL from R in an R Notebook

implement a relational schema for an existing data set

load data from CSV files into a relational database through R

execute SQL queries against a MySQL database through R

perform simple analytics in R

identify and resolve programming errors

look up details for R, SQL, and MySQL

time-box work

-VYTH[

Complete individually.

7YLYLX\PZP[L;HZRZ

1. Read the Hints and Tips section below and go back to them often when you encounter problem.

Most problems are addressed in that section. Consult the list before contacting us for help as

you'll be able to resolve the issue more quickly.

2. Create an R Project and within that project create an R Notebook in which to do all of your work.

Place all files within the project folder and do not use absolute file paths. Embed any diagrams inthe R Notebook and be sure to submit them as files in your submission. Keep updating this

notebook as you complete tasks. Learn how to use journaling in data analysis work. Use 6.204 --

Literate Programming with R Notebooks (http://artificium.us/lessons/06.r/l-6-204-rnotebooks/l-6-204.html)

for reference. Of course, in practice, you will likely build different

programs (perhaps even using different programming language) for database creation, data

loading, and the dashboard. But here we will do it all in one R Notebook.

3. Become familiar with the problem of bird strikes on aircraft. The report in [1] provides an overview.

You should scan it to become familiar with the domain but you do not need to read it deeply.

4. Download the file BirdStrikesData.csv (https://s3.us-east2.amazonaws.com/artificium.us/datasets/BirdStrikesData-V4-SuF24.csv)

and save it locally to your

R Project folder (the same folder that contains you .Rmd and your .Rproj files) or reference it from

the URL. You may wish to create a new data file that is a subset of the full data that you use for

development so loading takes less time. This is a common strategy in practice. To download the

file, use the right mouse button and choose "Save Link As..." or a similar menu choice on your

browser. Do not click on the link as that will cause the browser to open and display the file.

Read the Hints & Tips section frequently and before posting questions.

2L`9LZV\YJLZ

7YLYLX\PZP[L3LZZVUZ

06.103 Working with Vectors and Data Frames in R (http://artificium.us/lessons/06.r/l-6-

103-vecs-and-dfs/l-6-103.html)

06.106 Import Data into R from CSV, TSV, and Excel Files

(http://artificium.us/lessons/06.r/l-6-106-load-csv-tsv-excel-files/l-6-106.html)

06.108 Loops and Iteration in R (http://artificium.us/lessons/06.r/l-6-108-loops-iteration-inr/l-6-108.html)

06.112

Basics of Text & String Processing in R (http://artificium.us/lessons/06.r/l-6-112-

text-proc/l-6-112.html)

06.121 Writing Functions in R (http://artificium.us/lessons/06.r/l-6-121-funcs-in-r/l-6-

121.html)

06.204 Literate Programming with R Notebooks (http://artificium.us/lessons/06.r/l-6-204-rnotebooks/l-6-204.html)

06.191

Debugging R Code (http://artificium.us/lessons/06.r/l-6-191-debugging/l-6-191.html)

06.301 Working with Databases in R (http://artificium.us/lessons/06.r/l-6-681-key-value-dbredis-from-r/l-6-301.html)

06.304

Configure and Connect to Cloud MySQL from R (http://artificium.us/lessons/06.r/l6-304-cloudMySQL-from-r/l-6-304.html)

06.302

Bulk Load Data from CSV into Database in R (http://artificium.us/lessons/06.r/l-6-

302-bulkload-data-into-db/l-6-302.html)

06.306 Dates in R and SQLite (http://artificium.us/lessons/06.r/l-6-306-dates-in-r-and-sql/l-6-

306.html)70.907 Stored Procedures in MySQL (http://artificium.us/lessons/70.sql/l-70-907-storedprocs-mysql/l-70-907.html)

;HZRZ

Before

you start, read all of the questions. Inspect the CSV data file that you downloaded so you are

familiar with its columns, data types, and overall structure. Assume that this database will eventually

be used for an app that can be used by pilots (of any kind of aircraft) to report wildlife incidents.

All R and SQL code blocks must be named, as shown in the example below. This is necessary so

that you can reference your code blocks in the self-evaluation rubric to be filled out at the end of the

practicum. The names of code blocks must be unique.

```{r nameOfRCodeBlock, eval = T, warning = F}

```{sql nameOfSQLCodeBlock, connection = xDB}

You may add any additional block parameters as needed.

Use functions to structure your code so that it becomes more readable and easier to develop and

debug. Use headers to segment your notebook and add explanations as to what each code block

means. Follow common coding practices and format your code so it is readable, and use functions to

break down complex code.

1. (5 pts / 2.0 hrs) Set up and configure a MySQL Server database on db4free.net

(http://db4free.net/) or any other MySQL cloud service of your choice such as

freemysqlhosting.net (https://www.freemysqlhosting.net/) , Google Cloud or AWS. Note that

you may use SQLite instead of MySQL but you will not get credit for this question nor credit for the

question below that asks you to create a stored procedure, as those are not supported in SQLite.

See the Hints below for information on db4free. Lesson 6.304 -- Configure and Connect to

Cloud MySQL from R (http://artificium.us/lessons/06.r/l-6-304-cloudMySQL-from-r/l-6-304.html)

provides some advice on connecting to these two databases.

If you use freemysqlhosting, we recommend that you pay (for the remainder of the term or the

entire year) for a 100MB upgrade (really cheap, like cost of couple cups of coffee) as otherwise

exceeding the 5MB limit will get your account suspended. You may share a subscription as it

allows five databases. Alternatively, you could split the data over multiple databases. Of course,

using a different MySQL host such as Google Cloud or Amazon AWS (or any other of your choice)

can solve this issue too, but they are a bit more difficult to set up. If you use AWS RDS you need

to pay particular attention to ensuring you set global access policies and proper security policies --

this can be quite tricky and therefore we do not recommend using AWS. We recommend that you

purchase an educational subscription to freemysqlhosting (and share it if you'd like to); it is the

easiest to set up and work with.

Both Amazon AWS and Google Cloud offer free credits, but be sure to monitor usage so you do

not exceed the free credit or are prepared to pay (they can be a bit costly, so be careful to

suspend database when not in use and delete after the Practicum has been graded).You may collaborate and work with others to set up a cloud MySQL installation, but not for the

remainder of the practicum. A cloud MySQL installation is necessary for us to run your code -- we

cannot connect to a local installation of MySQL, although you may use one for testing and

development.

2. (1 pts / 0.1 hrs) Create an R Notebook named "LastNameFirstInitial.CS5200.PractI-SuF24.Rmd"

where LastName is your last name and FirstInitial is the first letter of your first name, e.g., GilesM.

Set the title of the R Notebook to "Analysis of Wildlife Strikes to Aircraft" and the subtitle to

"Practicum I CS5200", add an author field, and set the date to "Summer Full 2024". Be sure to do

all work within an R Project (named as you wish).

3. (4 pts / 0.1 hrs) Add an R code chunk that connects to your MySQL database. Use headers for all

other questions with appropriate titles so you (and we) can navigate the notebook more easily. If

you have difficulty connecting to or setting up MySQL, then use SQLite and proceed. You can

always come back to this question and change your configuration so that you connect to MySQL.

This is the benefit of relational databases: you can easily switch between databases without

changing your code. Do not echo the code in the notebook and supress any warnings or other

messages. Add an appropriate code chunk label.

4. (30 pts / 3.5 hrs) In a single R code chunk that is not echoed (i.e., set "echo = F" for the code

chunk and suppress all messages), create the database schema described below (do not use

{sql} code chunks). Add appropriate constraints, primary key and foreign key definitions. In the

schema definitions below, primary keys are underlined and foreign keys are bolded.

A. (5 pts / 0.5 hrs) Create a table flights that stores information about flights that experienced

wildlife strikes and follows this schema:

flights (fid : integer,

date : date, originAirport : integer,

airlineName : text, aircraftType : text, isHeavy: boolean)

Make isHeavy a Boolean flag and use TRUE if the aircraft is a "heavy" aircraft, FALSE

otherwise -- this information is in the CSV. For aircraftType use the column aircraft from the

CSV. Use appropriate data types for the columns and store any date as a date type not as text

(subject to the data types your chosen database supports). If date or boolean data types are

not supported, choose another data type that will work or split the dates into month, day, and

year columns. Note that some columns contain periods so that will require special treatment in

SQL -- investigate how to deal with this common issue.

The column originAirport is a foreign key linking a flight to an airport (defined in the table

below).B. (5 pts / 0.5 hrs) Create a table that stores airports and that follows this schema:

airports (aid : integer, airportName : text, airportState : text, airportCode : text)

aid is a synthetic primary key, airportName is the name of the airport and is from the airport

column in the CSV, airportState corresponds to the airport state (the origin column) from the

data file. The airportCode should be the airport's international code, e.g., BOS for Boston or

LGA for LaGuardia. For now, set it to the default value "ZZZ" -- it is for future expansion.

C. (4 pts / 0.3 hrs) Link the flights and airports tables via the origin foreign key in flights to the

primary key aid in airports. The origin is an FK to the airport in the airports table. Update the

above table definitions for airports and flights as necessary.

D. (4 pts / 0.5 hrs) Create a lookup table called conditions defined as follows:

conditions (cid, sky_condition, explanation)

Link this lookup table to the incidents table defined below through the conditions foreign key.

This table contains the values of all sky conditions, e.g., 'Overcast'. Leave

the explanation column empty (future expansion). Use the values from the CSV.

E. (4 pts / 0.5 hrs) Create a table that stores wildlife strike incidents called incidents and that

follows this schema:

incidents (iid : integer, fid : integer, wlsize : text, impact : text, altitude : integer ≥ 0, conditions

: {...})

iid is a synthetic key. fid is a foreign key for the flight for which this strike occurred for. The

conditions column is a foreign key link to the conditions lookup table. The column wlsize is the

wildlife size from the corresponding column in the CSV. The other columns are directly from

the CSV, or, if not present in the CSV, set them to a default value for now. The column altitude

should be restricted to positive integers greater than or equal to zero using either application

logic or a CHECK constraint (if the database supports constraints; MySQL version prior to

8.0.16 do not enforce any CHECK constraints). Using a trigger is also an option, but do not

turn on triggers until all data is inserted, as it will have significant a negative performance

impact on row insertion.

F. (3 pts / 0.3 hrs) Link the incidents and flights tables via the fid foreign key in incidents to the

primary key fid in flights.

G. (2 pts / 0.3 hrs) Add one or more code chunks that are not evaluated (eval = F) when the

Notebook is knitted but can be used to test your table definitions. Add whatever test code youneed to assure yourself that your table definitions are correct.

5. (1 pts / 0.1 hrs) If you haven't yet, download the bird strikes CSV data file from the link provided

before. Place the CSV file into the same folder as your R Notebook and then, in a separate code

chunk (not echoed) load it into a dataframe called bds.raw. Do not use a path name when loading.

The default path is the local folder that contains the R Notebook when you have the R Notebook

in an R Project. Be sure to always re-open the project when returning to R Studio.

6. (20 pts / 8 hrs) Using the table definitions and the data in the dataframe bds.raw from above,

populate the tables with the data from the appropriate columns. Omit any columns from the CSV

that are not referenced in the tables. You do not need to create any additional tables. Because we

are not adding additional tables there will be (unnormalized) data and repetitions, for example for

aircraft -- that is acceptable for this practicum due to time constraints but would not be if this were

an actual analytics database project. Assume "Business" to be an airline name (it is actually a

private flight but we assume it is the airline called "Business") and store those incidents.

You may (nd should) use functions to structure your code. Be sure not to echo any code chunks

and supress messages.

Use default values where the data file does not contain values or leave empty. If there is no

airport or airline, then link to a "sentinel" airline or airport, i.e., add an "unknown" airline and airport

to the tables rather than leaving the value NULL. Assign synthetic key values as and where

needed and use them as primary keys. Whether you generate them in R code or the database is

up to you -- each has pros and cons and part of the objective of this practicum is for you to think

through such decisions.

Map the values in the data file to appropriate values in any lookup tables using reasonable rules

that you may define as necessary.

All data manipulation and importing work must occur in R. You may not modify the original

data outside of R -- that would not be reproducible work. It may be helpful to create a subset

of the data for development and testing as the full file is quite large and takes time to load.

7. (1 pts / 0.5 hr) Add code chunks that you can run and that show that the loading of the data

worked by displaying parts of each table (do not show the entire tables). Do not echo the results

of those code chunks and make sure they are eval=F so they are not included in any knitting

process.

8. (3 pts / 1 hr) Add a level two (##) header with the title "Top Airlines with Strikes" and below create

a SQL query against your database to find the top 5 airlines with the greatest number of wildlife

strike incidents. You must use an R function to execute the query. It must be a single query.

Display the airline and the number of incidents. Note that every row in the incidents tableconstitutes one "incident". Do not echo the code.

9. (10 pts / 1 hr) Add a level two (##) header with the title "Analysis by Airport". Create a SQL query

against your database to find the airports that had an above average number wildlife strike

incidents. You must use an R function to execute the query. It must be a single query. To do this,

find the number of bird strike incidents for each airport (remember that each row in the incidents

table is a single bird strike incident). Then calculate the average across all airports and from there

find those airports which had an above average number of bird strike incidents. List the names of

the top 5 airports and the number of incidents for each. Do not echo the code, only the result.

10. (8 pts / 1 hr) Add a level two (##) header with the title "Analysis by Year". Create a SQL query

against your database to find the (total) number of wildlife strikes per year. Save the result of the

query in a dataframe. You must use an R function to execute the query. It must be a single query.

Format the result with appropriate table headers using the kableExtra package. This query can

help answer the question which year has more wildlife striking aircraft than normal. Do not echo

the code, only the result.

11. (5 pts / 2 hrs) Add a level two (##) header with the title "Trend by Year". Using the dataframe from

Question 10 above, build a line chart that plots year along the x-axis versus number of the

number of strikes on the y-axis. Adorn the graph with appropriate axis labels, titles, legend, data

labels, etc. You should use the standard R plot() function; you do not need to use packages

such as ggplot, ggplot2, or plotly -- although you may, of course. This tutorial

(https://www.statmethods.net/graphs/scatterplot.html) may help you get started.

12. (10 pts / 3 hrs) Create a stored procedure in MySQL (note that if you used SQLite, then you

cannot complete this step) that updates a strike (identified by its primary key) from the database.

You may decide what you need to pass to the stored procedure to update a wildlife incident. You

must log this update in an audit log that contains the type of modification (an update), the table

that was manipulated, the time and date of removal, and the original values of the strike that was

updated. Create the "update_log" table if it does not exist.

Show (in R) that your procedure works and that the log table is updated as required.

Note that if you used SQLite rather than the required MySQL for the practicum, then you cannot

complete this question as SQLite does not support stored procedures.

13. (5 pts) Create professionally developed code that is well documented and all chunks are labeled.

/PU[ZHUK;PWZ

Ask clarifying questions in the "Practicum I QnA" channel on Teams.

If you find it helpful, draw an ERD.

Ask questions as soon as you encounter them.

While you need to look up details for functions and R, the solution cannot be found via Google.Do not spend extraordinary time on code errors; ask for help if you cannot resolve them within 30-

minutes. First through Teams and then by going to TA office hours -- the TAs will alert the

instructor if there are doubts they cannot resolve.

The TA can help you and answer questions, but they are not your personal tutors; do not rely on

them to debug your code. Code small, test often.

You may use AI assistants as needed but let us know which ones you used and for what; add

acknowledgments in your code and you must be able to explain any code you copied or

borrowed.

If you have trouble connecting, be sure to disable any firewall or anti-virus software that may be

clocking port 3306 -- or add port 3306 to the list of open ports in your firewall software

configuration.

The function dbWriteTable() is disabled for bulk loading on the MySQL cloud installation

on db4free. It does work fine for MySQL on freemysqlhosting.net and AWS RDS and for local

installations of MySQL if you allow for batch loading.

Be sure to click the activation link in the email from db4free.net after you create your database; if

you get "Access denied..." error messages then you did not activate your account.

Be sure to shut down your databases if you are not using them; otherwise you might exceed the

monthly free time. And, be sure to delete the database after the course is done or you will be

billed.

Batch loading (via dbWriteTable() ) is disabled for security reasons by default on local installations

of MySQL. Here's how to enable it: mysql - ERROR: Loading local data is disabled - this must

be enabled on both the client and server sides - Stack Overflow

(https://stackoverflow.com/questions/59993844/error-loading-local-data-is-disabled-this-must-beenabled-on-both-the-client)

. Note that you cannot enable this for db4free.net -- it does not give

you privileged access to do this.

Enable functions, procedures, and triggers on Amazon AWS RDS:

https://aws.amazon.com/premiumsupport/knowledge-center/rds-mysql-functions/

(https://aws.amazon.com/premiumsupport/knowledge-center/rds-mysql-functions/)

On AWS, if you get errors that the service is not reachable, then you need to edit the "inbound

rules" of the security group that is attached to the RDS -- add the MySQL/Aurora type and port

range 3306 with cidr 0.0.0.0/0

If you use sqldf for manipulating or querying internal data frames, be mindful of conflicts between

SQLite and MySQL; see mysql - How to use sqldf in R to manipulate local dataframes? -

Stack Overflow. (https://stackoverflow.com/questions/60117431/how-to-use-sqldf-in-r-tomanipulate-local-dataframes)

A potential issue with sqldf can occur when you connect to MySQL

or a non-SQLite database as sqldf attempts to use your existing database connection as a

backing store for its data; this will often not work due to security constraints. So, you need to add

the R code options(sqldf.driver = 'SQLite') which forces sqldf to use SQLite as its backing

store.

sqldf is very slow for querying or extracting data from a data frame as it actually copies the data

frame to an in-memory SQLite database and then runs a SQL query; so, only use sqldf is there's

no simple way to do a native R "query" with logical operations and which() and any() , e.g., use

sqldf to do grouping but not much elseIf you get errors connecting to MySQL, make sure you have the latest version of R and upgrade R

and all packages as necessary.

Using mixed upper and lower case for table names sometimes causes issues with

dbSendStatement() and dbWriteTable() when using MySQL; make all table names and attributes

names fully lower case; SQL is not case sensitive when it comes to keywords like INSERT vs

insert BUT it is when it comes to table and attribute names

Put columns that contain special character such as period (.) or have names that are keywords

into backticks, e.g., SELECT `condition` FROM incidents;

If you get a message "Can't initialize character set unknown", see

https://stackoverflow.com/questions/52613809/rmysql-error-cant-initialize-character-setunknown

(https://stackoverflow.com/questions/52613809/rmysql-error-cant-initialize-characterset-unknown)

and https://github.com/pBlueG/SA-MP-MySQL/issues/203

(https://github.com/pBlueG/SA-MP-MySQL/issues/203)

There appears to be a bug in MySQL stored procedures that contain a SELECT as the last

statement in the procedure; if you get the message "Commands out of sync; you can't run this

command now", you must use INTO with your SELECT; see

https://stackoverflow.com/questions/6583020/mysql-stored-procedure-caused-commandsout-of-sync

(https://stackoverflow.com/questions/6583020/mysql-stored-procedure-causedcommands-out-of-sync)

All

your work must be within your programs; we will run your code against one of our MySQL

servers with a blank database (or SQLite if that's what you used) and it has to run from beginning

to end; so if you did work outside of your code then we cannot reproduce your work you'll get a

grade of 0.

Before submitting, test that you code runs and your any notebooks that may be required knit in a

clean environment. So, remove all objects first by either including this code in the beginning of

your R Notebook rm(list = ls()) or by clearing your environment in R Studio with the menu item

Session/Clear Workspace.

Explain your approaches and any manipulations, omissions, deletions, or modifications of data.

Be sure to install packages within your code (but only if not installed) to ensure they get installed

when the graders run your code. Here's an elegant way to do this:

https://statsandr.com/blog/an-efficient-way-to-install-and-load-r-packages/

(https://statsandr.com/blog/an-efficient-way-to-install-and-load-r-packages/)

In R, if you want to change a global variable (one used first outside a function), then you need to

use a special assignment operator: <<- instead of <-.

Unlike MySQL and other databases, SQLite does not have support for a separate "Boolean" data

type; instead Boolean values are stored as integers, although SQLite recognizes the keyword

TRUE (1) and FALSE (0) as of version 3.23.0 but those are just synonyms for the integers values

1 (TRUE) and 0 (FALSE).

If your data frames are not written to the database when you use dbWriteTable() then check to

see if your data frame contains columns of type date -- there may be issues in converting R dates

to SQL dates; for more information on this, see Lesson 6.306 Dates in R and SQLite

(http://artificium.us/lessons/06.r/l-6-306-dates-in-r-and-sql/l-6-306.html)To find version of your database, issue this SQL meta-query from an R code chunk or via MySQL

Workbench: SELECT version()

freemysqlhosting does not support triggers and has limits on constraints.

If you are using Google Cloud or AWS MySQL (which are awesome), be sure to suspend when

not using and shut down when you are done -- you will continue to be charged and eventually

your free credits will run out.

If you use freemysqlhosting, we recommend that you pay for a 100MB upgrade (really cheap, like

cost of lunch or dinner out) as otherwise exceeding the 5MB limit may get your account

suspended. Alternatively, you could split the data over multiple databases, or you could reduce

the size of the CSV so that the database is below 5MB. You may share a single subscription

among several students as it allows for up to five independent databases. You may not share the

database, of course.

4`:83*SV\K:LY]LYZ

Amazon RDS Free Tier | Cloud Relational Database | Amazon Web Services

(https://aws.amazon.com/rds/free/)

db4free.net - MySQL Database (https://www.db4free.net/)

Hosting with PHP, MySQL and cPanel (freehosting.host) (https://freehosting.host/)

Hosting with PHP, MySQL (AwardSpace.com) (https://www.awardspace.com/)

0UZWLJ[PUN3HYNL*:=-PSLZ

In the video tutorial below, Khoury Boston's Prof. Feinberg demonstrates how to find the path of a file

and how to inspect large CSV files prior to loading into R or another programming

environment. Follow the steps as you watch the tutorial. Stop the tutorial as needed. This

demonstration presumes that you have MacOS or use a Linux/Unix environment.

    0:00 / 5:59  

:\ITPZZPVU+L[HPSZ

Before submitting your code, complete the self-evaluation rubric (separate "assignment"; see

Canvas).1. Provide in your R Notebook:

A. name and email

B. table creation SQL for MySQL/MariaDB (or SQLite)

C. results of queries, visualizations, computations to show that your code works as expected

D. clear explanations of your code that is decomposed into chunks and preceded by headers

2. Complete the self-evaluation rubric: [30 min] COMPLETE: Practicum I / Self-Evaluation

(https://northeastern.instructure.com/courses/180750/assignments/2248391?wrap=1)

3. Submit the .Rmd source file of your R Notebook. The code must run from start to end, so be sure

to test carefully, load any required libraries, and ensure that your chunks run sequentially from

start to end.

4. Submit a knitted HTML of your Rmd; the HTML must be a compact page, so pay attention to

formatting, headers, and how much data you print. If you have trouble knitting on your local

installation of R Studio, consider using the cloud version of R Studio to do the knitting. Not

submitting a knitted file results in a loss of 5 points. Submitting a poorly formatted HTML results in

a loss up to 5 points.

5. All your work must be within your R Notebook; during the demonstration, we will ask you to run

your R Notebook against one of our MySQL servers with a blank database and it has to run from

beginning to end; so if you did work outside of your R Notebook (e.g., in Excel or MySQL

Workbench) then we won't be able to reproduce your work and you will get a grade of 0.

6. Your code has to run, obviously, but it also has to run somewhat efficiently... if everyone else's

code runs in 10-30 minutes but yours takes several hours then clearly is due to poor programming

and not due to the inherent complexity of the problem... follow common coding strategies for

writing efficient code such as factoring out invariants from loops, not calling functions repeatedly,

pre-allocating memory, not copying objects needlessly, not calling expensive functions when

simpler ones will do (e.g., call substring() instead of doing regular expressions), use which() when

searching and don't use sqldf, and so forth. These practices are not specific to R, although there

are R specific performance issues, but those are less likely to be a concern here.

9LX\PYLK5L_[:[LWZ

Check your submission to ensure the correct files have been submitted

Complete and submit the self-evaluation rubric

Schedule demo


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:821613408 微信:horysk8 电子信箱:[email protected]
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:horysk8