Motivation
Virtual memory provides memory for processes above the limit of physical
memory in a computer by swapping unused pages to a secondary storage
device, such as a disk (mechanical or solid-state). This type of hierarchical
memory is also used for CPU instruction caching and in database systems. In
this assignment, you will explore some aspects of constructing hierarchical
memory structures by building a simulation, and then further explore page
replacement algorithms.
Background: Hierarchical Memory
In the context of computer systems, the term "memory hierarchy" refers to the
organization and structure of various types of memory storage devices within
a computer, arranged in a hierarchy based on their proximity to the CPU
(Central Processing Unit) and their speed, capacity, and cost characteristics.
Memory hierarchy is a crucial concept in computer architecture and plays a
vital role in determining the overall performance of a computer system.
The memory hierarchy typically consists of several levels, each with different
properties:
1. Registers: These are the fastest and smallest storage units directly
accessible by the CPU. Registers store data that the CPU is currently working
on. Data can be read from or written to registers in a single clock cycle.
Registers provide the highest performance but have the smallest capacity.
2. Cache Memory: Cache memory is a small-sized, high-speed memory that
sits between the CPU and main memory (RAM). It stores frequently used data
and instructions to reduce the time it takes to access them. Cache memory is
divided into multiple levels (L1, L2, L3), with L1 being the smallest and fastest,
and L3 being larger but slightly slower. Caches use a mechanism called caching
to manage data movement between levels.
3. Main Memory (RAM): Main memory is the primary storage used to store
data and programs that the CPU is actively using. While it is larger in capacity
compared to cache memory, it is slower in terms of access times. Data must be
moved between main memory and cache when needed. The CPUs memory
management unit (MMU) does this automatically.
4. Secondary Storage: This includes non-volatile storage devices like hard disk
drives (HDDs), solid-state drives (SSDs), and optical drives. Secondary storage
provides much larger storage capacity but is significantly slower in terms of
access times compared to main memory. Data from secondary storage must be
loaded into main memory before the CPU can work on it. The virtual memory
subsystem manages the movement of pages from secondary storage to main
memory.
5. Tertiary Storage: This level includes slower and high-capacity storage
solutions, often used for long-term archival and backup purposes. Examples
include magnetic tapes and cloud storage.
The memory hierarchy is designed to optimize the trade-off between speed
and capacity. Data that is frequently accessed is kept closer to the CPU in
faster memory levels (e.g., registers and cache), while less frequently used data
is stored in slower, higher-capacity memory levels (e.g., main memory and
secondary storage). This organization aims to minimize data access times and
maximize overall system performance while balancing cost.
Efficient management of the memory hierarchy is a critical aspect of computer
architecture, and memory hierarchies are designed to ensure that the CPU can
access data and instructions for processing as quickly as possible while
maintaining a balance between cost and performance.
Tasks
Part 1 (30 Points) / Memory Hierarchy Simulation
The tasks below are a simulation and explore hierarchical memory, page faults,
page swapping, and page replacement algorithms. The tasks below explore
only some aspects of the simulation. Subsequent assignments explore
additional details. The assignments are intentionally decomposed to allow for
flexibility and level the effort across all assignments in this course.
You will implement a cache mechanism for a message-oriented data store.
Messages need to be retrieved rapidly but due to the volume of messages not
all of them will fit into main memory and need to be stored in secondary
memory (on disk through the file system).
You may set up the source code in any way you deem reasonable, although we
do expect that you use reasonable software engineering practices, follow
naming conventions, appropriately split code into source files and use header
files as necessary. All code must be written in C.
1. (15 min / 5 pts) Set up a data structure for a "message". The
message must minimally contain a unique identifier, time sent, the
sender (right now that is just some text), a receiver (again, for now,
just some text), content (text), and a flag indicating whether the
message was delivered.
2. (20 min / 10 pts) Write a function create_msg() that creates a new
message with the fields appropriately set and returns a dynamically
allocated message "object". Define arguments and return types as
necessary. You may discuss the design of the function signature and
behavior (but not the implementation) with your peers.
3. (90 min / 35 pts) Write a function store_msg() that stores the
message in a message store on disk. The design of the message
store is up to you, but you should consider how you can quickly
retrieve messages from disk when needed. Define arguments and
return types as necessary and appropriate. You may discuss the
design of the function signature and behavior (but not the
implementation) with your peers.
4. (90 min) / 30 pts) Write a function retrieve_msg() that finds and
returns a message identified by its identifier. You may discuss the
design of the function signature and behavior (but not the
implementation) with your peers.
Part 2 (30 Points) / Caching
Currently, messages are stored on disk (in either a file per message stored in a
folder, or a block per message in a single file). Finding a message therefore
requires access to disk. In this part, you will add a "cache" so that some
number of messages are stored in a paged structure in memory. Your
messages must be a fixed size; the choice of size is yours (but should be a
power of 2, e.g., 256, 512, or 1024 bytes). Of course, part of the message's
bytes are used for "house keeping" such as identification of the sender and the
receiver, time sent, and perhaps priority, whether it is encrypted, etc. Modify
your code from Part 1 as needed.
For testing purposes, a smaller cache and message size is recommended, e.g.,
1024 bytes per message and 16 message cache. Of course, you should adjust
these limits and sizes as you see fit for development and testing.
Having a cache means that every messages that is stored ("sent") is stored in
the cache and also written to disk. When a message is read (found), it should
first be looked for in cache. If the message cannot be found in the cache, the
message needs to be loaded from disk and place in the cache.
Create appropriate lookup data structures to facilitate finding messages in the
cache. In your code, thoroughly describe your strategy and design and why
you chose it. Mention alternative designs and why you did not consider them.
You may discuss your caching strategy and implementation approach with
your colleagues on Piazza, but you may not share code.
Part 3 (20 Points) / Page Replacement
If a message is placed into the cache but no "slot" for a message is open in the
cache, then an existing message in the cache must be replaced. Devise and
then implement two page (a message in our case) replacement algorithms:
1. Random Replacement: a replacement page is chosen at random
2. LRU: the least recently used page is replaced
Part 4 (10 Points) / Evaluation
Design and implement code that will test that the cache mechanism and
calculate the following cache metrics for each page replacement algorithm:
1. number of cache hits per 1000 random message accesses
2. number of cache misses per 1000 random message accesses
3. cache hit ratio per 1000 random message accesses
You need to create a representative message set for testing. You may share
your testing approach, test message mix, and metrics with your colleagues on
Piazza and discuss. You may discuss the calculation algorithms and formulas
and share test messages but not code.
Quality (10 Points)
The remaining points are awarded based on the quality, documentation,
thoroughness, and professionalism of your code.
Simplifying Assumptions, Hints, and Tips
1. You do not need to be concerned about internal fragmentation
within messages.
2. Additional hints and clarifications will be posted on Piazza, so keep
checking Piazza regularly, discuss and participate until the due date.
Submission and next steps
In addition to submitting your code, there are two next steps involved with
grading your practicum: First, we want you to review your own work perform
a self evaluation of you or your team's work. Next, you will have an
opportunity to present your work to us in a review session, where we will take
into account your work, your explanation of design choices and walkthroughs,
and how your self-evaluation lines up with your work.
Here are the next steps:
• Submit a .tar file containing all of your code, notes, explanations,
metrics, and test cases. You may embed everything in your code or
write a separate report. For those working in teams, one
submission per pair is sufficient, but please include a quick note of
the team members in the submission, to help us keep track.
• Schedule time for your review and attend as scheduled
版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:821613408 微信:horysk8 电子信箱:[email protected]
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。