COSC 2P03 – Assignment 3 – All about that base
Introduction:
Huzzah! It’s officially the simplest assignment ever! Or the most elaborate. Or somewhere in between depending on how much planning you put into it.
Databases store data in myriad interesting ways, but each of these ways have something very important in common: massive levels of abstraction. In particular, the way you specify your query does inform. the Database Management
System (DBMS) on how to decide which Records to include in its results, but the system itself is free to decide how to organize, index, and manage the actual underlying data representation to facilitate those queries.
Don’t worry, we’re not going to try replicating all of that. We’re not even going to worry about storing records on disk. Instead, we’re just going to focus on one small aspect: the role of indexes on queries.
Records:
For the sake of this assignment, we’re going to define a Record as something that holds some number of values, each of which identified by some field (or attribute).
Put another way: each Record is its own mapping from keys to values (very much like in your A1). Not all Records necessarily have the same fields as every other Record.
For the sake of your implementation, a Record should both have a hasField function (to indicate whether or not it
has an entry for the specified field), and throw a NoSuchElementException if you request a field that does not
exist. (Note that the latter shouldn’t actually come up within this assignment; you’re just including it because it’s correct to do so)
You’ll be defining a collection to hold these Records: a RecordPool.
Queries:
A query is simply a sort of question: you somehow express what you’d like. In a real DBMS this can be complicated, but we’re going to simplify immensely: you provide both the field labels you wish to search for, and the values you wish to match on. e.g.:
Balls |
||
Id |
Colour |
Radius |
0 |
Red |
2.5 |
1 |
Green |
5 |
2 |
Red |
8 |
3 |
Blue |
5 |
4 |
Red |
5 |
5 |
Blue |
6 |
l [{Colour:Red}] yields [{0,Red,2.5}, {2,Red,8}]
l [{Radius:5},{Colour:Blue}] yields [{3,Blue,5}]
l [{Colour:Red,Radius:6}] yields []
Note that {this:syntax} is not necessarily how you’d specify it in your own implementation. From the perspective of the code itself, just use two String arrays: one for the field labels, and one for the target values. And how you choose to create your user interface is (mostly) up to you. To confirm: even numbers are still stored as text.
Notice that it’s possible for a query to result in one Record, multiple Records, or no Records. So long as you always return an array of the correct length, the usage stays nice and consistent.
Indexes:
Here’s where it gets more interesting. For the example above, the Id field could be the Record’s index within the array, but oftentimes not even that would be the case. Regardless, Records could — for example — still be sorted by their Ids, which would still speedup searching for them.
Two problems with this:
• As we add additional Records, we’d need to keep shifting records back to make room for them
• In none of the three sample queries above did we actually happen to search by Id!
Yes, we could sort their positions within the array according to Colour, but then how would we search by Radius? And we still needed the option of searching by Id. What’s more, Records could have nearly unlimited combinations of fields, not all of which will include overlap across Records.
So then, what’s the solution? The solution is an index!
An index in this context is a separate data structure that focuses on a single key. In a ‘ real’ index, that could be a simple key (using a single field) or a composite key (using a combination of fields to collectively identify each Record), but for the sake of this assignment assume any index will only use simple keys. i.e. any given index will use only one field.
From the sample data above, let’s say we might want to search by Id or Colour. Consider the diagram below:
Here we see the actual data at the bottom, as well as two relaxed BSTs (relaxed to permit duplicates). For each Node, we can see both the key value (the Id in the left tree, and the Colour in the right), as well as the array subscript. where the corresponding Record is kept. For the left tree, that’sa bit redundant. For the right, eventhough the Records aren’t sorted by Colour within the array, we can still (potentially) find Records within logarithmic time!
And what if we wanted to find the Ball with a Radius of 8? We can still find it; we just don’t have an index to speed it up. This still raises some questions, but each has a simple answer:
• How many indexes can a Record Pool have?
o As many as you can think of unique fields. Another way of putting it: 0 or 1 per-field
• When are indexes defined?
o Whenever the user wishes. An index maybe defined at anytime, including after data has been entered. Whenever such a new index is defined, the first step is toreindex the entire array for the new index
o Whenever a new Record is added to the Record Pool, it is also immediately added to any index attached to a field that the Record includes
e.g. for the indexes above, if you were to add a new Ball of {Id:9,Pattern:plaid,Radius:10}, it would get added to the index for Id, but not for Colour
• How do we handle deletion?
o For this assignment, the only deletion permitted is to drop a user-specified index in its entirety
• How many variations are there for queries?
o You do not need to allow for comparisons, but you do need to allow for an arbitrary number of fields (as per the Radius&Colour example above). So Name<"Mildred"? Nah. Name="Mildred"&Height= "72 "? Yah.
o For full marks, you need to allow for both conjunctions and disjunctions, but the latter will only be worth one point. e.g. you must allow for Radius and Colour (conjunction), and you should allow for Radius or
Colour. Or more interestingly, “Colour=Red or Colour=Green” should yield 4 Records above
Again, the precise syntax for how to specify queries is up to you. Please note you aren’ttrying to reimplement SQL or something, and I’ m not concerned how you define your order of operations
Also, again, the disjunction’s only worth a tiny bit
• What data structure(s) should be used for the index?
o That’s for you to figure out!
Client program:
Though the precise format of the program is up to you, here’sastarting point:
• The user must be able to enter new Records
o Remember that Records don’tall have to have the same fields! Refer to the example above that lacked a Colour but had a Pattern. When a particular Record doesn’t have a field interesting to the query, it
simply won’t be included in the results!
• The user may — at anytime — choose to add a new Index on some field, or to drop an existing one
• The user may search for Records
o However many Records match the query is how many should be displayed
o As both conjunction and disjunction should be possible, ensure there’sa clear way to do this
• To make it easier to test on larger sets of data (without requiring you to follow some specific arbitrary data file format) allow for the option of preloading some data file of your own convention, to prepopulate with Records to speed upgrading
• You must include some mechanism for basic time trials. Note that these don’t necessarily have to be fully integrated with the rest of the user interface; it’s just something to allow for comparing searching with and without an index
Documentation and analysis:
Writeup a simple report detailing:
• How your index works
• The results of basic time trial analysis to compare query speeds with/without an index (This part is not very large; it’s just a bit of reflection on what you’ve done)
Tips:
First and foremost, you should realize what this assignment actually is. Some assigned tasks are all about the precise specifications of the data structures. Others are about extracting information and developing a solution to meet the requirements. This is the latter.
There is no sample execution.
There is no single specific instruction on precisely what to implement. This requires thought and planning.
This also requires quite a bit of decomposition. This is not one large system, but rather several small pieces.
And lastly, note that eventhough you haven’t been told precisely what to write, that doesn’tmean that ‘correct’ and ‘incorrect’ have suddenly ceased existing. You’ re expected to be applying what you’ve learned, to write reasonable solutions; while adhering to reasonable conventions.
Submission:
This course (and by extension, this assignment) uses IntelliJ. Do not use any other IDE. JDK 11.
• As a clarification, you aren’t writing in something else then importing into IntelliJ. You aren’texpecting the
marker to fix your project for you. You’re including run configuration/s that explicitly include the main class(es), saved within the project
You are expected to adhere to certain minimal standards before true evaluation of your work even starts:
• Comment your code
o Comment what it does; not what it is!
o e.g. indicating that a ‘for loop’ is ‘a loop’ is worse than nothing. Don’tdo that
• Document your code
o No, this is not the same thing
• Follow basic standards
o Avoiding hardcoded literals
o Obviously nothing ‘static’ other than the main methods of the main classes
o (You get the idea!)
• Use packages where appropriate
• You are making all data structures yourself
o In this case, it’s perfectly fine to reuse code you might have written for A1/A2, if applicable. But that’sit. No, don’t bother sending emails asking “can Iuse X?” No, you probably can’t use X. Or Twitter
• Useful variable names. Organized. Indented correctly.
o i.e. your submission will be readable, or you did not submit
Include a sample output, along with your entire IntelliJ project and writeup/summary. Bundle/compress everything into a single .zip file.
• Not a .rar
• Not a .7z
• Not a .tar.gz
• Pleasedon’t decide to earn a zero from something so trivial Submit your .zip in Brightspace.
版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:821613408 微信:horysk8 电子信箱:[email protected]
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。