CS 3410: Computer System Organization and Programming

CS 3410, “Computer System Organization and Programming,” is your chance to learn how computers really work. You already have plenty of experience programming them at a high level, but how does your code in Java or Python translate into the actual operation of a chunk of silicon? We’ll cover systems programming in C, assembly programming in RISC-V, the architecture of microprocessors, the way programs interact with operating systems, and how to correctly and efficiently harness the power of parallelism.

Lecture

Tuesdays and Thursdays 10:10am–11:25am in Uris Hall G01

Lab Sections

When you registered for CS 3410, you also registered for a Lab Section. Please attend the Lab Section that you are enrolled in; this is the only way to get credit for the lab attendance. If you need to change Lab Sections, do so officially on Student Center, but use the swap feature so as not to lose your spot in the lecture.

Section | Day | Time | Location
201 | Thursday | 8:40am–9:55am | Phillips Hall 318
202 | Thursday | 11:40am–12:55pm | Snee Hall Geological Sci 1150
203 | Thursday | 1:25pm–2:40pm | Carpenter Hall 104 Blue
206 | Thursday | 1:25pm–2:40pm | Snee Hall Geological Sci 1150
204 | Thursday | 2:25pm–4:10pm | Snee Hall Geological Sci 1150
205 + 210 | Friday | 8:40am–9:55am | Snee Hall Geological Sci 1150
208 | Friday | 10:10am–11:25am | Phillips Hall 318
207 | Friday | 11:40am–12:55pm | Snee Hall Geological Sci 1150
209 | Friday | 2:55pm–4:10pm | Snee Hall Geological Sci 1150

Syllabus

Communications


Announcements and Q&A: Ed

We will be using Ed for all announcements and communication about the course. Each assignment will also have a pinned post at the top of the Ed Discussion forum which you should check regularly, especially before you begin work on an assignment. We recommend checking Ed often, and don’t miss the announcement emails.

For time-sensitive matters, please email cs3410-staff@cornell.edu. This is the fastest way to get a response, as it goes straight to many inboxes.

For sensitive topics that need to be handled exclusively by the instructor(s), please email cs3410-prof@cornell.edu or meet with the instructor(s) during their bookable office hours. Please do not email the instructor(s) directly using a netID email address; it is important to keep all 3410 communication in one place.

Accessing Ed

Log in to Ed with your netid@cornell.edu email address. You can also access the Ed Discussion through the link on Canvas.

How to use Ed

99% of all matters can be handled on Ed. Do not reach out to the instructor or a TA if your question/problem is one other students might have. Asking on Ed will get you your answer faster and also help others benefit from your asking. Additionally, if you can answer someone else’s question yourself, please do (but be careful not to post solutions)!

If you’re not sure whether something is OK to post, contact the course staff privately. You can do that by marking your question as “Private” when you post it.

Never post screenshots of code.

Screenshots are inaccessible, hard to copy and paste, and hard to read on small screens (e.g., phones). Use Ed’s “code block” feature and paste the actual code.

Assignments: Gradescope

You will submit your solutions to assignments and receive feedback and grades through Gradescope. The weekly topic mastery quizzes will also be posted on Gradescope, as well as graded exams.

We try to grade anonymously, i.e., the course staff won’t know whose work they are grading. So please do not put your name or NetID anywhere in the files you upload to Gradescope. (Gradescope knows who you are!)

Accessing Gradescope

Log in to Gradescope with your netid@cornell.edu email address. You can also access Gradescope through the link on Canvas.

Textbooks


This course does not closely follow any one text. You will be responsible for understanding the material presented in lecture and the lecture notes. You can find the lecture notes that correspond to each lecture on the schedule.

That said, we will post readings to accompany each lecture (also found on the schedule page). We will be using three textbooks:

  1. Computer Organization and Design RISC-V Edition: The Hardware Software Interface, 2nd Edition by David A. Patterson and John L. Hennessy (ISBN: 9780128245583)
  2. Modern C by Jens Gustedt
  3. Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau

Course Policies


Lectures

PollEverywhere

For in-class activities and polls, we will be using PollEverywhere instead of iClickers. Your participation using PollEverywhere factors into your semester grade. PollEverywhere requires you to bring an Internet-connected device, preferably one that can scan a QR code.

Typically, answering all but 1 of the questions for a given day will give you full points for the day. (This doesn’t really work when there is only one question, obviously.) There will often be a question at the very start of class. Because of the leniency baked into the scoring, we will not manually adjust your clicker score if you are late to class, must leave early, your car battery dies, you were in the bathroom for a question, you are feeling ill, you have to quarantine, etc. We know there are very good reasons to miss a PollEverywhere activity, but if we adjust scores by hand the software recognizes the inconsistency and refuses to sync future scores.

Electronic Devices

Electronic devices are known distractors for users and those nearby.

  • Phones: Phone use is only allowed to participate in PollEverywhere activities.

  • Laptops: Laptop use is allowed only in the left-hand part of the lecture hall, facing the front.

  • Tablets: If you use your tablet like a laptop (propped up so you type on it), please use it in the laptop section. If you use your tablet like a notebook (writing on it with a stylus, kept at an angle such that those behind you cannot see what you are writing), you may use it in any section.

Labs

Your physical and mental attendance at the Lab Section that you are enrolled in is required. If you work on a lab for the entire lab section, you will get credit regardless of how far you get. If you show up and do non-lab work (even if it is work for CS 3410), or if you don’t show up to lab but argue that you did the work on your own, you will not get credit for the lab.

You are responsible for ensuring that your attendance was recorded properly before the end of your lab. You can check your lab attendance grade on Canvas under the grades tab or from the lab itself. We are unable to retroactively change your lab attendance grade.

Missing (i.e., not getting credit for) more than 3 labs will lower your final grade by one grade step for each missed lab beyond the third. For example, if you earn an A- in the class but you miss 5 labs, you will receive a B in the class. This flexibility is there to account for unavoidable absences. Furthermore, due to the add/drop period, the first lab is optional (but strongly recommended!).

To help maintain a high staff-to-student ratio we require that you attend the lab section that you are enrolled in. If you need to change Lab Sections, do so officially on Student Center, but use the swap feature so as not to lose your spot in the lecture.

Office Hours

  • TA office hours are a great place to get help with assignments, weekly topic mastery questions, and technical support (e.g., setting up the course infrastructure, VS Code, using Git). See Office Hours for details.

  • Instructor office hours are for lecture material, conceptual questions, and sensitive issues. For debugging and assignment help, please use TA office hours instead (or post on Ed)!

Students with Disabilities

Your access in this course is important to us. Please register with Student Disability Services (SDS) to document your eligibility early in the semester and let us know so that we have adequate time to arrange your approved academic accommodations.

Exam Accommodations

If you have an accommodation of extended time or access to a low-distraction room, we have mechanisms in place to meet your needs. If your letter has been sent at least 2 weeks prior to the prelim, you can verify our awareness of your needs by checking your “score” on the Exam Accommodation Assignment. You will also receive an email confirming our awareness of your needs no later than the Friday before the exam. If, however, your accommodation is granted within 2 weeks of the prelim or you have a unique exam accommodation (for example, you need the exam to be printed in a larger font), please email cs3410-prof@cornell.edu to make sure we accommodate you in a proper and timely manner.

Lecture Accommodations

If you have an accommodation that has to do with the lecture (you need a particular seat or require that the instructor wear a particular mic), please send an email to cs3410-prof@cornell.edu to make us aware of your needs.

Efforts have been made to comply with all accessibility requirements. If you experience any access barriers in this course, such as with printed content, graphics, online materials, or any communication barriers, please reach out to the instructor or your SDS counselor right away. If you need an immediate accommodation, please speak with the instructor after class or email the instructor and SDS at sds_cu@cornell.edu. If you have or think you may have a disability, please contact SDS for a confidential discussion: sds_cu@cornell.edu, 607-254-4545, https://sds.cornell.edu.

If you experience personal or academic stress or need to talk to someone who can help, please contact the instructors.

Please also explore other mental health resources available at Cornell.

Academic Integrity

All submitted work must be completed exclusively by you. Please adhere to the following rules of collaboration:

  • Do not look at or be in possession of other students’ (current or former) solutions.
  • Do not look at code that you did not write (including code online or generated by an AI tool).
  • Do not show other students your work or (screen) share solutions, not even to help each other.
  • Do not write documentation together.
  • Do not design or write a test suite together.
  • Cite your sources.
  • Definitely ask the course staff if you’re not sure whether or not something is OK.

Discussing an assignment with others is fine as long as you do not actually look at each other’s work or discuss matters in such detail that the implementation is essentially finished. As a general rule, if you walk away from your discussion without any written (or snapshotted) notes and then start working on the assignment later on your own, you should be fine.

Most academic integrity violations occur in a moment of panic and stress. If you are tempted to make a bad choice, please do not. The grade penalty for cheating is typically a -100% on the entire assignment, which is significantly worse than simply not turning it in. Other repercussions are detailed on the official university page on Academic Integrity. (As a side note, many academic integrity violations come about when students code up an assignment for hours at the same time, sitting right next to each other. This level of fine-grain interaction usually produces effectively one submission produced by two people. This is not okay; your source code must not bear remarkable syntactic similarity to someone else’s because of your collaboration.)

The academic integrity rules above do not apply to the weekly Topic Mastery Quizzes. You may help each other with these as much as you like. The goal is to learn the material. If you don’t, that will be obvious when assignments and prelims are graded.

Accepting Responsibility (AR)

This course is participating in Accepting Responsibility (AR), which is a pilot supplement to the Cornell Code of Academic Integrity (AI). For details about the AR process and how it supplements the AI Code, see the AR website.

Use of Generative Artificial Intelligences (GenAI)

Mastering the essential, foundational concepts of this course takes effort and practice. Accordingly, the use of generative artificial intelligence (GenAI) tools is generally discouraged in this course, but will be allowed as an experiment for Spring 2025 under the following conditions:

  • Be careful about any use of GenAI. It is known to produce incorrect responses. You are responsible for the correctness of all your work. Although GenAI could be useful as a tutor or helper in programming, it must not become the sole creator of your work.
  • You may only use Microsoft Copilot Enterprise using Cornell’s institutional license. You can log in using your NetID.
    • This policy is in place for your protection. When you use Copilot Enterprise under Cornell’s license, Microsoft cannot view your conversations with Copilot, and your prompts, answers, and viewed content are not used to train the underlying large language models. Another side benefit of using Cornell’s paid license is that the answers provided are likely to be of higher quality than those from other, free GenAI tools (e.g., a personal ChatGPT or Copilot account).
    • More details about using Microsoft Copilot Enterprise at Cornell can be found here.
  • If you use GenAI on an assignment, you must cite it by providing the following information:
    • the prompt you used,
    • the answer provided by Copilot,
    • a short statement about how useful the interaction was to you.

Warning

Failure to follow this GenAI policy will constitute a violation of the academic integrity policy.

Late Policy for Assignments

TL;DR

  • Assignments 1-10 can be submitted up to three days late.
    • Late submissions to Assignment 11 will not be accepted.
  • You are given ten (10) free slip days (i.e., penalty-free 24-hour extensions).
  • Each slip day used beyond your initial 10 will cost you 0.25% of your semester grade.

Gradescope will accept each assignment (with the exception of the final assignment) up to 3 days late. For each day you submit an assignment late, Gradescope records that you have used a slip day. Each slip day allows you to submit an assignment 24 hours later without penalty. You may never submit an assignment more than 3 days late.

5% of your semester grade is for Punctuality Points. You earn these points by not using more than 10 slip days across assignments 1-10. No slip days will be accepted for the last assignment. At the end of the semester, we will use your slip day usage to calculate your Punctuality Points using the following formula: \[ 5 - \frac{1}{4} \max(0, \mathit{slip~days~used} - 10) \] Here are some possible scenarios:

Slip Days Used | Scenario | Punctuality Points
0 | You submitted each assignment on time. | 5 (no advantage for using < 10)
10 | You submitted each assignment one day late. | 5
10 | You submitted two assignments one day late, one assignment two days late, and two assignments three days late. | 5
14 | You submitted six assignments one day late and four assignments two days late. | 4
20 | You submitted each of the ten assignments two days late. | 2.5
30 | You submitted each of the ten assignments three days late. | 0

Note

Gradescope knows no mercy. If an assignment is due on Tuesday at 11:59pm and you turn it in at 11:59:20pm (yes, still before midnight!), you just used a slip day.

Pro Tip

Be sure to download your assignment once it is uploaded to Gradescope to verify that it’s the file you meant to submit. Last semester we had an inordinate number of students who submitted the release code instead of their assignment code. They had to submit their actual work as a regrade, with the associated penalties, weeks later.

Late Policy for Topic Mastery Quizzes

There is a 48-hour grace period for all online exercises and Topic Mastery Quizzes. Submissions within the grace period incur no late penalty. Many surveys will not have that grace period, either because we do not control them (TA assessments, course evaluations) or because your on-time response is required to effectively manage the course (e.g., prelim conflicts). After the grace period, the quizzes become unavailable and there is no way to view or submit them.

Regrade Policy

If you feel that your project or exam was not graded according to the stated rubrics, you may submit a regrade request within one week of the project or exam’s return. Regrade requests are submitted via Gradescope. Please note, regrade requests are a venue for discussion about the application of rubrics, not the rubrics themselves.

If your regrade involves us grading different files than your original submission, please send the changed file(s) to cs3410-staff@cornell.edu. In your regrade request, be sure to mention that the file(s) to be graded are in the staff inbox. Also please explain how these files differ from the original ones you submitted and whether you deem these changes to be significant or minor. (Course staff will assume the changes are major unless you convince them that the changes are minor.) All regrades that involve us grading new files will incur a blanket 25/100 point deduction. If we deem your changes significant (more than just changing a few lines of C code), the regrade will incur an additional 15/100 point deduction. This will be applied even if you submitted wrong/release/corrupted/empty files the first time around. (Again, we strongly suggest you download and check the files you submit to Gradescope at submission time.)

Inclusiveness

You should expect and demand to be treated by your classmates and the course staff with respect. You belong here, and we are here to help you learn and enjoy this course. If any incident occurs that challenges this commitment to a supportive and inclusive environment, please let the instructors know so that the issue can be addressed. We are personally committed to this and subscribe to the Computer Science Department’s Values of Inclusion.

Assessment


Grading

Your semester grade will be calculated approximately as follows:

  • Assignments: 35%
  • Exams (Prelim1, Prelim2, Final): 45%
  • Weekly Topic Mastery Quizzes: 10%
  • Assignment Punctuality Points: 5%
  • Surveys: 2%
  • Online Exercises: 3%
  • Grade adjustments:
    • up to 3% bonus for up to ~22 Poll Everywhere attendance points (capped at 100% total)
    • possible grade deductions for excessive Lab absences

Assignments

Generally, assignments are released weekly on Thursdays and are due on Wednesdays at 11:59PM. See the course schedule. All assignments are to be done individually. You’ll turn in assignments via Gradescope. You may use generative AI on all assignments as long as you follow our policy.

NEW: Grade Cap

In terms of your final course grade, assignment scores are capped at 90%: any score of 90% or above counts as “full credit” and an A average, and scores below 90% are scaled accordingly (e.g., 85% on an assignment maps to a final-grade value of about 94.4%). This policy is meant to help you focus holistically on learning what each assignment is trying to teach you, not on maximizing individual points.
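One way to express the cap as a formula: if \( s \) is your raw assignment score as a percentage, the value that counts toward your final grade is \[ \min\!\left(\frac{s}{90},\, 1\right) \times 100\% \] so, for instance, \( s = 85 \) yields \( 85/90 \approx 94.4\% \).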

Exams

There are two preliminary examinations and a final exam. See the course schedule.

Bring your student ID to all of your exams. We will be taking attendance by having you swipe it through a card reader upon arrival.

Makeup Exams

Makeup exams must be scheduled within the first three weeks of class. Check the exam schedule now to see if you have a conflict with another class.

Please register your conflict by completing the corresponding survey found on Gradescope so we can schedule a makeup exam. Specifically, register

  • conflicts with Prelim 1 here, and
  • conflicts with Prelim 2 here.

Topic Mastery Quizzes

Weekly topic mastery quizzes (TMQs) will help reinforce the lessons from a given week’s lectures. We’ll release each quiz on Sunday; it covers the material from that week’s lectures and is due the following Friday. These quizzes are also distributed on Gradescope.

As the goal of these quizzes is to give you practice with the lecture material, the grading scheme is very forgiving:

  • Don’t like your score? You are welcome to retake the quiz as many times as you like before the due date. We’ll keep your best attempt.

  • Your lowest quiz score is dropped, so one quiz in the semester is a “freebie.” Also, your quiz grade will be capped at 90%, meaning if you get 9/10 you do not have to retake the quiz to receive a “perfect” quiz score. Note: this cap will be implemented via post-processing by the instructor, so you will not see it reflected on Gradescope.

  • You may submit each quiz up to 48 hours late without penalty. See the relevant late policy here.

Spring 2025 Course Schedule

Week Day Date Topic Lecture Slides Lecture Notes Readings Lab/Assignment
1 Tu 1/21 Intro and 1+1=2
Th 1/23 Numbers and C Intro
  • [P&H] 2.4, 3.2
  • [C] Ch. 1
2 Tu 1/28 Float, Types
  • [P&H] 3.5
A2: Minifloat (Due: 2/5)
Th 1/30 Arrays & Pointers Arrays & Pointers (notes) Arrays & Pointers Arrays, Pointers, Bit Packing
3 Tu 2/4 Heap & Allocation The Stack & Heap (notes) The Stack & Heap Strings, Memory Allocation A3: Huffman (Due: 2/12)
Th 2/6 Gates, Logic Gates (notes) Gates [P&H] A.1-A.3, A.5-A.6 (Appendix)
4 Tu 2/11 State State (notes) State [P&H] A.7-A.8,A.11 (Appendix) Lab 4: GDB (Due: 2/14)
Th 2/13 RISC-V ISA (1) RISC-V (1) (notes) RISC-V [P&H] 2.1-2.3, 2.5-2.6
5 Tu 2/18 February Break! A5: CPU Simulation (Due: 2/26)
Th 2/20 RISC-V ISA (2), CPU Stages, & Prelim 1 RISC-V (2), CPU Stages CPU Stages [P&H] 2.7, 2.20
6 Tu 2/25 RISC-V: Data Memory & Control Flow RISC-V: Data Memory & Control Flow RISC-V: Data Memory & Control Flow [P&H] 2.3-2.4, 2.7, 2.14, 5.1-5.2 A6: Assembly (Due: 3/5)
Th 2/27 Pipelining & Performance Pipelining & Performance Pipelining & Performance [P&H] 4.1 - 4.3
7 Tu 3/4 Calling Conv. (1) Calling Conv (notes) [P&H] 2.8 A7: Functions (Due: 3/12)
Th 3/6 Calling Conv. (2) [P&H] 2.13
8 Tu 3/11 Compiling; RISC, CISC, & ISAs
  • [P&H] 2.12
  • [P&H] 2.16-2.18, 2.22
Buffer Overflow
Th 3/13 Caches (1)
9 Tu 3/18 Caches (2) Blocking
Th 3/20 Caches (3)
10 Tu 3/25 Processes Happy Spring Break!
Th 3/27 System Calls
Tu 4/1 Spring Break
Th 4/3 Spring Break
11 Tu 4/8 Virtual Memory Shell
Th 4/10 Prelim 2
12 Tu 4/15 Threads Concurrent Hash Table
Th 4/17 Atomics
13 Tu 4/22 Synchronization (1) Raycasting
Th 4/24 Synchronization (2)
14 Tu 4/29 Parallelism
Th 5/1 Races & Deadlock
15 Tu 5/6 Parallelism

Reading Abbreviations

  • [P&H]: Computer Organization and Design RISC-V Edition: The Hardware Software Interface, 2nd Edition by David A. Patterson and John L. Hennessy (ISBN: 9780128245583)
  • [C]: Modern C by Jens Gustedt
  • [OSTEP]: Operating Systems: Three Easy Pieces by Remzi H. Arpaci-Dusseau and Andrea C. Arpaci-Dusseau

Lab Sections

Lab sections are split 50/50 between Thursday and Friday. The work in each lab is meant to help you get started on the assignment that is out that week. There is nothing separate to turn in from lab; the work you do in lab will get turned in as part of that week’s assignment.

Exams

There are three exams:

  • Two preliminary exams:
    • Prelim 1 on February 20 at 7:30pm
    • Prelim 2 on April 10 at 7:30pm
  • The final exam (date and time TBA)

Office Hours

We look forward to seeing you in office hours! Check out the schedule of available office hours in this Google Calendar, which is also embedded below.

With TAs

Check the calendar below for the locations of office hours (e.g., Rhodes 529, Ives 107, or a Zoom link).

In-person office hours use a simple whiteboard queueing mechanism; Zoom office hours use Queue Me In.

Office hours do not happen on official Cornell days off and breaks. (We will attempt to make the calendar reflect this fact, but please trust this statement over the calendar.)

With the Instructors

Instructor office hours are appropriate for discussing technical content and course logistics. They are less appropriate for getting help with a specific assignment; please see TAs for that.

Meet the Course Staff

Instructors

Hakim Weatherspoon
(he/him)
Professor
Hometown
Ithaca, NY
Ask me about
sports, entrepreneurship, finding a major, finding a career
Zach Susag
(he/him)
CS PhD
Hometown
St. Paul, MN
Ask me about
graduate school, Linux, programming languages, barbeque

Graduate TAs

Keting Chen
(he/him)
CS PhD
Hometown
Chengdu, China
Ask me about
tennis, calligraphy
Salman Abid
(he/him)
CS PhD
Hometown
Karachi, Pakistan
Ask me about
futsal, LitRPGs, the history of chai
Jiahan Xie
(he/him)
CS MS
Hometown
Ningbo, China
Ask me about
violin, movies
Melissa Reifman
(she/her)
CS MEng
Hometown
Upper Saddle River, NJ
Ask me about
cooking, comedy, theater
Kevin Cui
(he/him)
CS MEng
Hometown
Philadelphia, PA
Ask me about
climbing, laufey, chess

Returning Undergraduate TAs

Michael Avellino
(he/him)
CS
Hometown
East Lansing, MI
Ask me about
F1, skiing, and Mario Kart
Angelica Borowy
(she/her)
CS
Hometown
Lake Worth, FL
Ask me about
game development, rock/grunge music, guitar
Serena Duncan
(she/her)
CS
Hometown
State College, PA
Ask me about
Broadway shows, traveling, food
Peter Engel
(he/him)
Math
Hometown
Madison, WI
Ask me about
twitter.com, McDonald's
Alan Han
(he/him)
CS MEng
Hometown
Cleveland, OH
Ask me about
music, swimming, football
David Suh
(he/him)
CS & Archaeology
Hometown
Rochester, NY
Ask me about
skiing, video games
Reese Thompson
(he/him)
CS
Hometown
Horseheads, NY
Ask me about
skiing, hiking, my project team
Jake Berko
(he/him)
CS
Hometown
Cherry Hill, NJ
Ask me about
skiing, pickleball, music
Santiago Blaumann
(he/him)
CS (minor in Physics)
Hometown
Piscataway, NJ
Ask me about
Skiing, Snowboarding, Wine
Will Bradley
(he/him)
CS & Math
Hometown
Rochester, MN
Ask me about
music theater, politics, Lean 4
Caitlyn Cahill
(she/her)
CS
Hometown
Canton, MA
Ask me about
traveling, skiing, Boston sports
Edward Duan
(he/him)
CS
Hometown
Syosset, NY
Ask me about
Chinese yoyo
Alex Koiv
(he/him)
CS
Hometown
Brooklyn, NY
Ask me about
photography, bowling
Andy Li
(he/him)
CS
Hometown
Nanjing, China
Ask me about
databases, hiking, South Park
Timmy Li
(he/him)
CS
Hometown
Gainesville, FL
Ask me about
Brandon Sanderson, Riot Games :(, skiing
Ryan Mistretta
(he/him)
CS
Hometown
Goshen, NY
Ask me about
skiing, tennis, football
Tawakalt Bisola Okunola
(she/her)
CS
Hometown
Mansfield, TX
Ask me about
linguistics, Afrobeats, rap
Noah Plant
(he/him)
CS
Hometown
Sendai, Japan
Ask me about
tennis, board games, Japanese food.
Savitta Sivapalan
(she/her)
CS
Hometown
Bronx, NY
Ask me about
art, movies
Ilya Strugatskiy
(he/him)
CS & Math
Hometown
Larchmont, NY
Ask me about
hiking, Durak
Melvin Van Cleave
(he/him)
CS (minors in Math & Physics)
Hometown
Cincinnati, OH
Ask me about
physics, music, fitness

New Undergraduate TAs

Omar Abuhammoud
(he/him)
CS
Hometown
Brooklyn, NY
Ask me about
games, guitar
Galiba Anjum
(she/her)
CS & IS
Hometown
Bronx, NY
Ask me about
cats, manhwas, gacha games
Bhuwan Bhattarai
(he/him)
CS
Hometown
Columbia, MD
Ask me about
soccer, skiing, Nepal
Luciano Bogomolni
(he/him)
CS
Hometown
Miami, FL
Ask me about
skiing, technology, astrophysics
Nathan Chu
(he/him)
CS
Hometown
Los Angeles, CA
Ask me about
philosophy, League of Legends
Ozan Ersöz
(he/him)
CS
Hometown
Istanbul, Turkey
Ask me about
traveling, pipe organs, skiing
Maximilian Fanning
(he/him)
CS & Plant Science
Hometown
Seattle, WA
Ask me about
music (banjo!), theater tech
Srija Ghosh
(she/her)
CS
Hometown
Lexington, MA
Ask me about
sitcoms, mystery books, chocolate
Andrew Hu
(he/him)
CS
Hometown
Columbus, NJ
Ask me about
cats, fencing, video games
Yunoo Kim
(she/her)
CS
Hometown
Austin, TX
Ask me about
writing, linguistics, Stardew Valley
Alex McGowan
(he/him)
CS
Hometown
Denver, CO
Ask me about
fantasy/sci-fi novels, video games, Christianity
Michael Micalizzi
(he/him)
CS
Hometown
Oceanside, NY
Ask me about
music, volleyball, food
Sharafa Mohammed
(she/her)
CS
Hometown
Edison, NJ
Ask me about
food, event planning, sewing
Kayla Ng
(she/her)
CS
Hometown
Tappan, NY
Ask me about
coffee, apple cider, snowboarding
Asen Ou
(he/him)
CS
Hometown
Seoul, South Korea
Ask me about
Cars, Food, Dogs
John Palsberg
(he/him)
CS
Hometown
Los Angeles, CA
Ask me about
piano, video games
Analeah Real
(she/her)
CS
Hometown
Brooklyn, NY
Ask me about
Animal Crossing, snowboarding, dance
Kelly Yue
(she/her)
CS
Hometown
New York City, NY
Ask me about
steak, xiaolongbao
Vivian Zhou
(she/her)
CS
Hometown
Brooklyn, NY
Ask me about
movies, music

Resources

RISC-V Infrastructure

Tools

C Programming

RISC-V Assembly

Using the CS 3410 Infrastructure

The coursework for CS 3410 mainly consists of writing and testing programs in C and RISC-V assembly. You will need to use the course’s provided infrastructure to compile and run these programs.

Course Setup Video

We have provided a video tutorial detailing how to get started with the course infrastructure. Feel free to read the instructions below instead—they are identical to what the video describes.

Setting Up with Docker

This semester, you will use a Docker container that comes with all of the infrastructure you will need to run your programs.

The first step is to install Docker. Docker has instructions for installing it on Windows, macOS, and on various Linux distributions. Follow the instructions on those pages to get Docker up and running.

For Windows users: to type the commands in these pages, you can choose to use either the Windows Subsystem for Linux (WSL) or PowerShell. PowerShell comes built in, but you have to install WSL yourself. On the other hand, WSL lets your computer emulate a Unix environment, so you can use more commands as written. If you don’t have a preference, we recommend WSL.

Check your installation by opening your terminal and entering:

docker --version

Now, you’ll want to download the container we’ve set up. Enter this command:

docker pull ghcr.io/sampsyo/cs3410-infra

If you get an error like this: “Cannot connect to the Docker daemon at [path]. Is the docker daemon running?”, you need to ensure that the Docker desktop application is actively running on your machine. Start the application and leave it running in the background before proceeding.

This command will take a while. When it’s done, let’s make sure it works! First, create the world’s tiniest C program by copying and pasting this command into your terminal:

printf '#include <stdio.h>\nint main() { printf("hi!\\n"); }\n' > hi.c

(Or, you can just use a text editor and write a little C program yourself.)

Now, here are two commands that use the Docker container to compile and run your program.

docker run -i --init --rm -v ${PWD}:/root ghcr.io/sampsyo/cs3410-infra gcc hi.c
docker run -i --init --rm -v ${PWD}:/root ghcr.io/sampsyo/cs3410-infra qemu a.out

If your terminal prints “hi!” then you’re good to go!

You won’t need to learn Docker to do your work in this course. But to explain what’s going on here:

  • docker run [OPTIONS] ghcr.io/sampsyo/cs3410-infra [COMMAND] tells Docker to run a given command in the CS 3410 infrastructure container.
  • Docker’s -i option makes sure that the command is interactive, in case you need to interact with whatever’s going on inside the container, and --rm tells it not to keep the stopped container around after the command finishes (which we definitely don’t need).
  • --init ensures that certain basic responsibilities are handled inside the container; in particular, signal handling and reaping of zombie processes (which you’ll learn about in a few weeks).
  • -v ${PWD}:/root uses a Docker volume to give the container access to your files, like hi.c.

After all that, the important part is the actual command we’re running. gcc hi.c compiles the C program (using GCC) to a RISC-V executable called a.out. Then, qemu a.out runs that program (using QEMU).
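For instance, here is what those same two steps might look like with compiler warnings turned on and a more descriptive output name than a.out (the -Wall and -o gcc flags and the name hi here are just illustrative choices, not something the course requires):

docker run -i --init --rm -v ${PWD}:/root ghcr.io/sampsyo/cs3410-infra gcc -Wall -o hi hi.c
docker run -i --init --rm -v ${PWD}:/root ghcr.io/sampsyo/cs3410-infra qemu hi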

Make an rv Alias

The Docker commands above are a lot to type every time, and worse, they don’t even include everything you’ll need to invoke our container! To make this easier, we can use a shell alias.

On macOS, Linux, and WSL

Try copying and pasting this command:

alias rv='docker run -i --init -e NETID=<YOUR_NET_ID> --rm -v "$PWD":/root ghcr.io/sampsyo/cs3410-infra'

Now you can use much shorter commands to compile and run code. Just put rv before the command you want to run, like this:

rv gcc hi.c
rv qemu a.out

NOTE: For the -e NETID=<YOUR_NET_ID> option, use your actual Cornell NetID for the NETID value.

Unfortunately, this alias will only last for your current terminal session. To make it stick around when you open a new terminal window, you will need to add the alias rv=... command to your shell’s configuration file.

First type this command to find out which shell you’re using:

echo $SHELL

It’s probably bash or zsh, in which case you need to edit the shell preferences file in your home directory. Here is a command you can copy and paste, but fill in the appropriate file name (.bashrc or .zshrc) according to your shell:

echo "alias rv='docker run -i --init -e NETID=<YOUR_NET_ID> --rm -v "$PWD":/root ghcr.io/sampsyo/cs3410-infra'" >> ~/.bashrc

Change that ~/.bashrc at the end to ~/.zshrc if your shell is zsh.
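To pick up the new alias in your current terminal without opening a new window, you can reload the configuration file and then repeat the earlier test (swap in ~/.zshrc if that is your shell’s file):

source ~/.bashrc
rv gcc hi.c
rv qemu a.out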

On Windows with PowerShell (Not WSL)

(Remember, if you’re using WSL on Windows, please use the previous section.)

In PowerShell, we will create a shell function instead of an alias.

We assume that you have created a cs3410 directory on your computer where you’ll be storing all your code files.

First, open Windows PowerShell ISE (not the plain PowerShell) by typing it into the Windows search bar. There will be an editor component at the top, right under Untitled1.ps1.

There, paste the following (with an appropriate value for NETID, as above):

Function rv_d {
   if ($args.Count -eq 0) {
      # No arguments: open an interactive bash shell inside the container.
      docker run -i --init -e NETID=<YOUR_NET_ID> --rm -v "${PWD}":/root ghcr.io/sampsyo/cs3410-infra
   }
   else {
      # Join everything after the first argument into a single string.
      $app_args=""
      foreach ($a in $args[1..($args.Count-1)]) {
         $app_args = $app_args + $a + " "
      }
      $app_args = $app_args.Substring(0,$app_args.Length-1);
      # Run the first argument (e.g., gcc or qemu) inside the container.
      docker run -i --init -e NETID=<YOUR_NET_ID> --rm -v "${PWD}":/root ghcr.io/sampsyo/cs3410-infra $args[0] $app_args
   }
}

This will create a function called rv_d that takes zero, one, or more arguments (we’ll see what those are in a bit). We’re naming it rv_d and not just rv (as done in the previous section) because PowerShell already has a definition for rv. The “d” stands for Docker.

Then, in the top left corner, click “File → Save As” and name your creation. Here, we’ll use function_rv_d. Finally, navigate to the cs3410 folder that stores all your work and once you’re there, hit “Save.”

Assuming you don’t delete it, that file will forever be there. This is how we put it to work:

Every time you’d like to run those long docker commands, open PowerShell (the plain one, not the ISE) and navigate to your cs3410 folder. Then, enter the following command:

. .\function_rv_d.ps1 

This will run the code in that script file, thereby defining the rv_d function in your current PowerShell session. Then, navigate to wherever the .c file you’re working on is located (we assume it’s called file.c); to compile it, simply type rv_d gcc file.c. To run the compiled code, enter rv_d qemu a.out. Try it out with your hi.c file. Finally, though it’s more of a curiosity right now, running just rv_d with no arguments will give you a prompt in a bash shell, within the Docker container itself.

Debugging C Code

GDB is an incredibly useful tool for debugging C code. It allows you to see where errors happen and step through your code one line at a time, with the ability to see values of variables along the way. Learning how to use GDB effectively will be very important to you in this course.

Entering GDB Commandline Mode

First, make sure to compile your source files with the -g flag. This flag adds debugging symbols to the executable, which allows GDB to debug much more effectively. For example:

rv gcc -g -Wall -Wextra -Wpedantic -Wshadow -Wformat=2 -std=c23 hi.c

In order to use GDB in the 3410 container, you need to open two terminals: one for running QEMU in debug mode in the background, and the other for invoking GDB and interacting with it.

  1. First, open a new terminal, and type the following commands:

    • docker run -i --rm -v `pwd`:/root --name cs3410 ghcr.io/sampsyo/cs3410-infra:latest. Feel free to change the “name” from cs3410 to any name you prefer.
    • gcc -g -Wall ... (more flags) -o EXECUTABLE SOURCE.c. Once you have entered the container, compile your source file with the -g flag and any other recommended flags.
    • qemu -g 1234 EXECUTABLE ARG1 ... (more arguments). Now you can start QEMU in debug mode and invoke the executable file EXECUTABLE with any arguments you need to pass in.
  2. Then, open another terminal, and type the following commands:

    • docker exec -i cs3410 /bin/bash, where cs3410 is the placeholder for the name of the container you are running in the background via the first terminal.
    • gdb --args EXECUTABLE ARG1 ... (more arguments) to start GDB.
    • target remote localhost:1234: execute this inside GDB. It instructs GDB to perform remote debugging by connecting to the specified port.
    • Start debugging!
  3. Once you quit a GDB session, you need to go back to the first terminal to spin up QEMU again (Step 1.3) and then invoke GDB again (Step 2.2 and onwards). A minimal end-to-end sketch of this workflow is shown below.
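Here is a minimal sketch of the whole workflow, assuming the hi.c program from earlier, an executable named hi, and a container named cs3410 (the GDB commands after connecting are just examples; use whatever breakpoints and commands fit your program):

# Terminal 1: start a named container, compile with -g, and launch QEMU in debug mode.
docker run -i --rm -v `pwd`:/root --name cs3410 ghcr.io/sampsyo/cs3410-infra:latest
gcc -g -Wall -o hi hi.c
qemu -g 1234 hi

# Terminal 2: open a second shell in the same container and attach GDB.
docker exec -i cs3410 /bin/bash
gdb --args hi

# At the (gdb) prompt, connect to QEMU and debug as usual:
(gdb) target remote localhost:1234
(gdb) break main
(gdb) continue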

Limitations of the PowerShell Function

Here are some important limitations of the rv_d method described earlier:

  1. You’ll have to run that script file every time you open a new PowerShell session.
  2. This function assumes you’ll only be using it to execute rv_d gcc file.c and rv_d qemu a.out (where file.c and a.out are the .c file and corresponding executable in question). For anything else, this rv_d function doesn’t work. For those, you’d have to type in the entire Docker command and then whatever else after. Another incentive to go the WSL route.

Set Up Visual Studio Code

You can use any text editor you like in CS 3410. If you don’t know what to pick, many students like Visual Studio Code, which is affectionately known as VSCode.

It’s completely optional, but you might want to use VSCode’s code completion and diagnostics. Here are some suggestions:

  • Install VSCode’s C/C++ extension. There is a guide to installing it in the docs.
  • Configure VSCode to use the container. Put the contents of this file in .devcontainer/devcontainer.json inside the directory where you’re doing your work for a given assignment.
  • Tell VSCode to use the RISC-V setup. Put the contents of this file in .vscode/c_cpp_properties.json in your work directory.

Unix Shell Tutorial

This is a modified version of Tutorials 1 and 2 of a Unix tutorial from the University of Surrey.

Listing Files and Directories

When you first open a terminal window, your current working directory is your home directory. To find out what files are in your home directory, type:

$ ls

(As with all examples in these pages, the $ is not part of the command. It is meant to evoke the shell’s prompt, and you should type only the characters that come after it.)

There may be no files visible in your home directory, in which case you’ll just see another prompt.

By default, ls will skip some hidden files. Hidden files are not special: they just have filenames that begin with a . character. Hidden files usually contain configurations or other files meant to be read by programs instead of directly by humans. To see everything, including the hidden files, use:

$ ls -a

ls is an example of a command which can take options, a.k.a. flags. -a is an example of an option. The options change the behavior of the command. There are online manual pages that tell you which options a particular command can take, and how each option modifies the behavior of the command. (See later in this tutorial.)

Making Directories

We will now make a subdirectory in your home directory to hold the files you will be creating and using in the course of this tutorial. To make a subdirectory called “unixstuff” in your current working directory type:

$ mkdir unixstuff

To see the directory you have just created, type:

$ ls 

Changing Directories

The command cd [directory] changes the current working directory to [directory]. The current working directory may be thought of as the directory you are in, i.e., your current position in the file-system tree.

To change to the directory you have just made, type:

$ cd unixstuff

Type ls to see the contents (which should be empty).

Exercise. Make another directory inside unixstuff called backups.

The directories . and ..

Still in the unixstuff directory, type

$ ls -a

As you can see, in the unixstuff directory (and in all other directories), there are two special directories, called . (dot) and .. (dot-dot). In UNIX, . means the current directory, so typing:

$ cd .

(with a space between cd and .) means stay where you are (the unixstuff directory). This may not seem very useful at first, but using . as the name of the current directory will save a lot of typing, as we shall see later in the tutorial.

In UNIX, .. means the parent directory. So typing:

$ cd ..

will take you one directory up the hierarchy (back to your home directory). Try it now!

Typing cd with no argument always returns you to your home directory. This is very useful if you are lost in the file system.

Pathnames

Pathnames enable you to work out where you are in relation to the whole file system. For example, to find the absolute pathname of the directory you are in, cd into your unixstuff directory and then type:

$ pwd

pwd means “print working directory”. The full pathname will look something like this:

/home/youruser/unixstuff

which means that unixstuff is inside youruser (your home directory), which is in turn in a directory called home, which is in the “root” top-level directory, called /.

Exercise. Use the commands ls, cd, and pwd to explore the file system.

Understanding Pathnames

First, type cd to get back to your home-directory, then type

$ ls unixstuff

to list the contents of your unixstuff directory.

Now type

$ ls backups

You will get a message like this -

backups: No such file or directory

The reason is, backups is not in your current working directory. To use a command on a file (or directory) not in the current working directory (the directory you are currently in), you must either cd to the correct directory, or specify its full pathname. To list the contents of your backups directory, you must type

$ ls unixstuff/backups

You can refer to your home directory with the tilde ~ character. It can be used to specify paths starting at your home directory. So typing

$ ls ~/unixstuff

will list the contents of your unixstuff directory, no matter where you currently are in the file system.
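For example, these two commands list the same backups directory; the first works only when your current directory is your home directory, while the second works from anywhere:

$ ls unixstuff/backups
$ ls ~/unixstuff/backups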

Summary

Command | Meaning
ls | list files and directories
ls -a | list all files and directories
mkdir | make a directory
cd directory | change to named directory
cd | change to home directory
cd ~ | change to home directory
cd .. | change to parent directory
pwd | display the path of the current directory

Copying Files

cp [file1] [file2] makes a copy of file1 in the current working directory and calls it file2.

We will now download a file from the Web so we can copy it around. First, cd to your unixstuff directory:

$ cd ~/unixstuff

Then, type:

$ curl -O https://www.cs.cornell.edu/robots.txt

The curl command puts this text file into a new file called robots.txt. Now type cp robots.txt robots.bak to create a copy.

Moving Files

mv [file1] [file2] moves (or renames) file1 to file2.

To move a file from one place to another, use the mv command. This has the effect of moving rather than copying the file, so you end up with only one file rather than two. It can also be used to rename a file, by moving the file to the same directory, but giving it a different name.

We are now going to move the file robots.bak to your backups directory.

First, change directories to your unixstuff directory (can you remember how?). Then, inside the unixstuff directory, type:

$ mv robots.bak backups/robots.bak

Type ls and ls backups to see if it has worked.

Removing files and directories

To delete (remove) a file, use the rm command. As an example, we are going to create a copy of the robots.txt file then delete it.

Inside your unixstuff directory, type:

$ cp robots.txt tempfile.txt
$ ls
$ rm tempfile.txt
$ ls

You can use the rmdir command to remove a directory (make sure it is empty first). Try to remove the backups directory. You will not be able to since UNIX will not let you remove a non-empty directory.
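For example, you should see something like the following (the exact wording of the error message varies between systems):

$ rmdir backups
rmdir: failed to remove 'backups': Directory not empty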

Exercise. Create a directory called tempstuff using mkdir, then remove it using the rmdir command.

Displaying the contents of a file on the screen

Before you start the next section, you may like to clear the terminal window of the previous commands so the output of the following commands can be clearly understood. At the prompt, type:

$ clear

This will clear all text and leave you with the $ prompt at the top of the window.

The command cat can be used to display the contents of a file on the screen. Type:

$ cat robots.txt

As you can see, the file is longer than the size of the window, so it scrolls past, making it unreadable.

The command less writes the contents of a file onto the screen a page at a time. Type:

$ less robots.txt

Press the [space-bar] if you want to see another page, and type [q] if you want to quit reading.

The head command writes the first ten lines of a file to the screen.

First clear the screen, then type:

$ head robots.txt

Then type:

$ head -5 robots.txt

What difference did the -5 do to the head command?

The tail command writes the last ten lines of a file to the screen. Clear the screen and type:

$ tail robots.txt

Exercise. How can you view the last 15 lines of the file?

Searching the Contents of a File

Using less, you can search through a text file for a keyword (pattern). For example, to search through robots.txt for the word “jpeg”, type

$ less robots.txt

then, still in less, type a forward slash [/] followed by the word to search

/jpeg

As you can see, less finds and highlights the keyword. Type [n] to search for the next occurrence of the word.

grep is one of many standard UNIX utilities. It searches files for specified words or patterns. First clear the screen, then type:

$ grep jpeg robots.txt

As you can see, grep has printed out each line containing the word “jpeg”.

To search for a phrase or pattern, you must enclose it in single quotes (the apostrophe symbol). For example, to search for the phrase web crawlers, type

$ grep 'web crawlers' robots.txt

Some of the other options of grep are listed below, with a short example after the list:

  • -v: display those lines that do NOT match
  • -n: precede each matching line with the line number
  • -c: print only the total count of matched lines
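For instance, these commands count and locate the matches from the earlier search (the exact output depends on the current contents of robots.txt):

$ grep -c jpeg robots.txt
$ grep -n jpeg robots.txt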

Summary

Command | Meaning
cp file1 file2 | copy file1 and call it file2
mv file1 file2 | move or rename file1 to file2
rm file | remove a file
rmdir directory | remove a directory
cat file | display a file
less file | display a file a page at a time
head file | display the first few lines of a file
tail file | display the last few lines of a file
grep 'keyword' file | search a file for keywords

Don’t stop here! We highly recommend completing the online UNIX tutorial, beginning with Tutorial 3.

Manual Pages

Unix has a built-in “help system” for showing documentation about commands, called man. Try typing this:

$ man grep

That command launches less to read more than you ever wanted to know about the grep command. If you want to know how to use a given command, try man <that_command>.

Saving Time on the Command Line

Tab completion is an extremely handy service available on the command line. It can save you time and frustration by avoiding retyping filenames all the time. Say you want to run this command to find all the occurrences of “gif” in robots.txt:

$ grep gif robots.txt

Try just typing part of the command first:

$ grep gif ro

Then hit the [tab] key. Your shell should complete the name of the robots.txt file.

History

Type history at the command line to see your command history.

$ history

The Up Arrow

Use the up arrow on the command line instead of re-typing your most recent command. Want the command before that? Type the up arrow again!

Try it out! Hit the up arrow! If you’ve been stepping through these tips, you’ll probably see the history command you just ran.

Ctrl+r

If you need to find a command you typed 10 commands ago, instead of typing the up arrow 10 times, hold the [control] key and type [r]. Then, type a few characters contained within the command you’re looking for. Ctrl+r will reverse search your history for the most recent command that has that string.

Try it out! Assuming you’ve been working your way through all these tutorials, typing Ctrl+r and then grep will show you your last grep command. Hit return to execute that command again.

Git

Git is an extremely popular tool for software version control. Its primary purpose is to track your work, ensuring that as you make incremental changes to files, you will always be able to revert to, see, and combine old versions. When combined with a remote repository (in our case GitHub), it also ensures that you have an online backup of your work. Git is also a very effective way for multiple people to work together: collaborators can upload their work to a shared repository. (It certainly beats emailing versions back and forth.)

In CS 3410, we will use git as a way of disseminating assignment files to students and as a way for you to transfer, store, and backup your work. Please work in the class git repository that is created for you and not a repository of your own. (Publishing your code to a public repository is a violation of academic integrity rules.)

A good place to start when learning git is the free Pro Git book. This reference page will provide only a very basic intro to the most essential features of git.

Installing Git

If you do not have git installed on your own laptop, you can install it from the official website. If you encounter any problems, ask a TA.

Activate your Cornell GitHub Account

Before we can create a repository for you in this class, we will need you to activate your Cornell github account. Go to https://github.coecis.cornell.edu and log in with your Cornell NetID and password.

Create a Repository

Create a new repository on GitHub: Go to the top right of the GitHub home page, where you’ll see a bell, a plus sign, and your profile icon (which is likely just a pixely patterned square unless you uploaded your own). Click on the downward pointing triangle to the right of the plus sign, and you’ll see a drop-down menu that looks like this:

New Repository

Click on “New repository” and then create a new repository like this:

Name Repository

Note that the default setting is to make your repository public (visible to everyone). Any repository that contains code for this course should be made private; a public repository shares your code with others, which constitutes an academic integrity violation.

Now click on the green “Create Repository” button.

Set Up Credentials

Before you can clone your repository (get a local copy to work on), you will need to set up SSH credentials with GitHub.

First, generate an SSH key if you don’t already have one. Just type this command:

$ ssh-keygen -C "<netid>@cornell.edu"

and use your NetID. The prompts will let you protect your key with a passphrase if you want.

Next, follow the instructions from GitHub to add the new SSH key to your GitHub account. To summarize, go to Settings -> SSH and GPG Keys -> New SSH key, and then paste the contents of a file named something like ~/.ssh/id_rsa.pub.
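To copy the key, you can print it in your terminal and paste it into the GitHub form (the filename depends on the key type you generated; ~/.ssh/id_ed25519.pub is another common name):

$ cat ~/.ssh/id_rsa.pub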

Clone the Repository

Cloning a git repository means that you create a local copy of its contents. You should clone the repository onto your own local machine (lab computer or laptop).

Find the green button on the right side of the GitHub webpage for your repository that says “Code”. Click it, then choose the “SSH” tab. Copy the URL there, which will look like this:

git@github.coecis.cornell.edu:abc123/play_repo.git

In a terminal, navigate to the folder where you would like to put your repository, and type:

$ git clone <PASTE>

That is, just type git clone (then a space) and paste the URL from GitHub. Run this command to download the repository from GitHub to your computer.

At this point, you’ll get authentication errors if your SSH key isn’t set up correctly. So try that again if you get messages like “Please make sure you have the correct access rights and the repository exists.”

Look Around

Type cd play_repo to enter the repository. Type ls and you’ll see that your repo currently has just one file in it called README.md.

Type git status to see an overview of your repository. This command will show the status of your repository and the files that you have changed. At first, this command won’t show much.

Tracking Files with Git

There are 3 steps to track a file with git and send it to GitHub: stage, commit, and push.

Stage

To try it out, let’s make a new file. Create a new file called <netid>.txt (use your NetID in there) and put something in it, such as your favorite color. Now type git add <netid>.txt from the directory containing the file to stage the file. Staging informs git of the existence of the file so it can track its changes.

Type git status again. You will see the file you added highlighted in green. This means that the file is staged, but we still have two more steps to go to send your changes to GitHub. (You might consider going back to the GitHub web interface to confirm that your new <netid>.txt file doesn’t show up there yet.)

Commit

A commit is a record of the state of the repository at a specific time. To make a commit, run this command:

$ git commit -m "Added my favorite color!"

The message after -m is a commit message, which is an explanation of the changes that you have made since you last committed. Good commit messages help you keep track of the work you’ve done.

This commit is now on your local computer. Try refreshing the GitHub repository page to confirm that it’s still not on the remote repository.

Push

To send our changes to the server, type this:

$ git push

The git push command sends any commits you have on your local machine to the remote machine. You should imagine you are pushing them over the internet to GitHub’s servers. Try refreshing the GitHub repository page again—now you should see your file there!

Pull

You will also want to retrieve changes from the remote server. This is especially helpful if you work on the repository from different machines. Type this command:

$ git pull

For now, this should just say that everything’s up to date. But if there were any new changes on the server, this would download them.

Typical Usage Pattern

Here is a good git workflow you should follow:

  1. git pull: Type this before you start working to make sure you’re working on the most up to date version of your code (also in case the staff had to push any updates to you).
  2. Work on your files.
  3. git add file.txt: Type this for each file you either modified or added to the repo while you were working. Not sure what you touched or what’s new? Type git status and git will tell you!
  4. git commit -m "very helpful commit message": Save your changes in a commit. Write a message to remind your future self what you did.
  5. git push: Remember that, without the push, the changes are only on your machine. If your laptop falls in a lake, then they’re gone forever. Push them to the server for safekeeping.

Git can be a little overwhelming, and sometimes the error messages can be hard to understand. Most of the time, following the instructions git gives you will help; if you run into real trouble, though, please ask a TA. If things get really messed up, don’t be afraid to clone a new copy of your repository and go from there.

It is completely OK to only know a few of the most common git commands and to not really understand how the whole thing works. Many professional programmers get immense value out of git while only ever using add, commit, push, and pull. Don’t worry about learning everything about git up front—you are already ready to use it productively!

Even More Commands

Here are a few other commands you might find useful. This is far from everything—there is a lot more in the git documentation.

Log

Type this command:

$ git log <netid>.txt

You’ll see the history of <netid>.txt. You will see the author, time, and commit message for every commit of this file, along with the commit hash, which is how Git labels your commits and how you reference them if you need to. At this point, you’ll only see a single commit. But if you were to change the file and run git commit again, you would see the new change in the log.

You can also type git log with no filename afterward to get a history of all commits in your entire repository.

Stash

If you want to revert to the state of the last commit after making some new changes, you can type git stash. Stashed changes are retrievable, but it might be a hassle to do so.

git stash only works on changes that have not yet been committed. If you accidentally commit a change and want to wipe it out before pulling work from other machines, use git reset HEAD~1 to undo the last commit (and then stash).

Introduction to SSH

SSH (Secure SHell) is a tool that lets you connect to another computer over the Internet to run commands on it. You run the ssh command in your terminal to use it.

The Cornell CS department has several machines available to you, if you want to use them to do your work. SSH is the (only) way to connect to these machines.

Accessing Cornell Resources from Off Campus

Cornell’s network requires you to be on campus to connect to Cornell machines. (This is a security measure: it is meant to prevent attacks from off campus.)

To access Cornell machines when you’re elsewhere, Cornell provides a mechanism called a Virtual Private Network (VPN) that lets you pretend to be on campus. Read more about Cornell’s VPN if you need it.

Log On

Make sure you are connected to the VPN or Cornell’s WiFi. Open a terminal window and type:

ssh <netid>@ugclinux.cs.cornell.edu

but replace <netid> with your actual NetID (don’t include the <>). Type yes and hit enter to accept the new SSH host key. Now type your NetID password.

You’re in! You should see a shell prompt; you can follow the Unix shell tutorial to learn how to use it.

Here, ugclinux.cs.cornell.edu is the name of a collection of servers that Cornell runs for this purpose. That’s what you’d replace with a different domain name to connect to a different machine.

scp

Suppose you have a file on the ugclinux machines and you want to get a copy locally onto your machine. The scp command can do this. It works like a super-powered version of the cp command that can copy between machines.

Say your file game.c is located at /home/yourNetID/mygame/game.c on ugclinux. On your local machine (i.e., when not connected over SSH already), type:

$ scp yourNetID@ugclinux.cs.cornell.edu:mygame/game.c .  

Here are the parts of that command:

$ scp <user>@<host>:<source> <dest>  

<user> and <host> are the same information you use to connect to the remote machine with the ssh command. <source> is the file on that remote machine that you want to obtain, and <dest> is the place where you want to copy that file to.

Makefile Basics

This document is meant to serve as a very brief reference on how to read the Makefiles provided in this class. It is meant to be just enough to help you read the Makefiles we provide, and is not meant to be a complete overview of Makefiles or enough to help you make your own. If you are interested in learning more, there are some good tutorials online, such as this walkthrough.

A Makefile is often used with C to help automate the (repetitive) task of compiling multiple files. This is especially helpful when there are multiple pieces of your codebase you want to compile separately, such as building a program’s tests as opposed to the program itself.

Variables

To illustrate how this works, let us examine a few lines in the Makefile that will be used for the minifloat assignment. Our first line defines a variable CFLAGS:

CFLAGS=-Wall -Wpedantic -Werror -Wshadow -Wformat=2 -Wconversion -std=c99

As in other settings, defining this variable CFLAGS allows us to use its contents (a string in this case) later in our Makefile. The name CFLAGS indicates that this variable holds the flags we will pass to the C compiler. Later, when we use this variable in-line, make simply replaces the variable with whatever we defined it as, which lets us use the same flags consistently for every command we run.

Commands

The rest of our Makefile for this assignment will consist of commands. A command has the following structure:

name: dependent_files
  operation_to_run

The name of a command is what you run in your terminal after make, such as make part1 or make all (this gets a bit more complicated in some cases). The dependent_files indicate which files this command depends on – the Makefile will only run this command if one of these files changed since the last time we ran it. Finally, the operation is what actually gets run in our console, such as when we run gcc main.c -o main.

Example Command

To make this more concrete, let us examine our first command for part1:

part1: minifloat.c minifloat_test_part1.c minifloat_test_part1.expected
	$(CC) $(CFLAGS) minifloat.c minifloat_test_part1.c -o minifloat_test_part1.out

This command will execute when we run make part1, but only if one of minifloat.c, minifloat_test_part1.c, or minifloat_test_part1.expected has been modified since we last ran this command. What actually runs is the next line, with the $(CC), $(CFLAGS), and a bunch of filenames. $(CC) is a standard Makefile variable that is replaced by our C compiler – in our case, this is gcc. The $(CFLAGS) variable here is what we defined earlier, so we include all of the flags we desired. Finally, the list of files is exactly the same as we might normally run with gcc. In total, then, this entire operation will be translated to:

$(CC) $(CFLAGS) minifloat.c minifloat_test_part1.c -o minifloat_test_part1.out
-->
gcc $(CFLAGS) minifloat.c minifloat_test_part1.c -o minifloat_test_part1.out
-->
gcc -Wall -Wpedantic -Werror -Wshadow -Wformat=2 -Wconversion -std=c99 minifloat.c minifloat_test_part1.c -o minifloat_test_part1.out

This compilation would be a huge pain to type out every time, especially with all of those flags (and easy to mess up), but with the Makefile, we can run all this with just make part1. We can do the same with make part2 to run the next set of commands instead.

Clean

One final note is that it is conventional (though not required) to include a make clean rule that removes any generated files, often to tidy up our folder before pushing our work to a Git repository. In our particular file, we have defined clean to remove the generated .out files and any .txt files that were used for testing:

clean:
	rm -f *.out.stackdump
	rm -f *.out
	rm -f *.txt

Complete Makefile

For reference, the entirety of our Makefile is included here:

CFLAGS=-Wall -Wpedantic -Werror -Wshadow -Wformat=2 -Wconversion -std=c99
CC = gcc

all: part1 part2 part3

part1: minifloat.c minifloat_test_part1.c minifloat_test_part1.expected
	$(CC) $(CFLAGS) minifloat.c minifloat_test_part1.c -o minifloat_test_part1.out

part2: minifloat.c minifloat_test_part2.c
	$(CC) $(CFLAGS) minifloat.c minifloat_test_part2.c -o minifloat_test_part2.out

part3: minifloat.c minifloat_test_part3.c
	$(CC) $(CFLAGS) minifloat.c minifloat_test_part3.c -o minifloat_test_part3.out

clean:
	rm -f *.out.stackdump
	rm -f *.out
	rm -f *.txt

.PHONY: all clean

C Programming

Much of the work in CS 3410 involves programming in C. This section of the site contains some overviews of most of the C features you will need in CS 3410.

For authoritative details on C and its standard library, the C reference on cppreference.com (despite the name) is a good place to look. For example, here’s a list of all the functions in the stdio.h header, and here’s the documentation specifically about the fputs function.

Compiling and Running C Code

Before you proceed with this page, follow the instructions to set up the course’s RISC-V infrastructure.

Your First C Program

Copy and paste this program into a text file called first.c:

#include <stdio.h> 

int main() {
    printf("Hello, CS 3410!\n");
    return 0;
}

Next, run this command:

$ rv gcc -o first first.c

Here are some things to keep in mind whenever these pages ask you to run a command:

  • The $ is not part of the command. This is meant to evoke the command-line prompt in many shells, and it is there to indicate to you that the text that follows is a command that you should run. Do not include the $ when you type the command.
  • Our course’s RISC-V infrastructure setup has you create an rv alias for running commands inside the infrastructure container. We will not always include an rv prefix on example commands we list in these pages. Whenever you need to run a tool that comes from the container, use the rv prefix or some other mechanism to make sure the command runs in the container.
  • As with all shell commands, it really matters which directory you’re currently “standing in,” called the working directory. Here, first.c and first are both filenames that implicitly refer to files within the working directory. So before running this command, be sure to cd to the place where your first.c file exists.

If everything worked, you can now run this program with this command:

$ rv qemu first
Hello, CS 3410!

(Just type the rv qemu first part. The next line, without the $, is meant to show you what the command should print as output after you hit return.)

This command uses QEMU, an emulator for the RISC-V instruction set, to run the program we just compiled, which is in the file named first.

Recommended Options

While the simple command gcc -o first first.c works fine for this simple example, we officially recommend that you always use a few additional command-line options that make the GCC compiler more helpful. Here are the ones we recommend:

-Wall -Wextra -Wpedantic -Wshadow -Wformat=2 -std=c23

In other words, here’s our complete recommended command for compiling your C code:

$ rv gcc -Wall -Wextra -Wpedantic -Wshadow -Wformat=2 -std=c23 hi.c

Many assignments will include a Makefile that supplies these options for you.

Checking for Common C Errors

Memory-related bugs in C programs are extremely common! The worst thing about them is that they can cause obscure problems silently, without even crashing with a reasonable error message. Fortunately, GCC has built-in tools called sanitizers that can (much of the time, but not always) catch these bugs and give you reasonable error messages.

To use the sanitizers, add these flags to your compiler command:

-g -fsanitize=address -fsanitize=undefined

So here’s a complete compiler command with sanitizers enabled:

$ rv gcc -Wall -Wextra -Wpedantic -Wshadow -Wformat=2 -std=c23 -g -fsanitize=address -fsanitize=undefined hi.c

Then run the resulting program to check for errors.

LeakSanitizer in RISC-V

Unfortunately, LeakSanitizer, the part of AddressSanitizer that detects memory leaks, does not work properly on RISC-V platforms. As a result, memory leaks will not be caught when using the sanitizers within our infrastructure container.

Instead, we will attempt to provide leak check smoke tests on Gradescope which check for memory leaks when you submit your code.

We recommend trying the sanitizers whenever your code does something mysterious or unpredictable. It’s an unfortunate fact of life that, unlike many other languages, bugs in C code can silently cause weird behavior; sanitizers can help counteract this deeply frustrating problem.

C Basics

This section is an overview of the basic constructs in any C program.

Variable Declarations

C is a statically typed language, so when you declare a variable, you must also declare its type.

int x; 
int y; 

Variable declarations contain the type (int in this example) and the variable name (x and y in this example). Like every statement in C, they end with a semicolon.

Assignment

Use = to assign new values to variables:

int x;
x = 4;

As a shorthand, you can also include the assignment in the same statement as the declaration:

int y = 6;

Expressions

An expression is a part of the code that evaluates to a value, like 10 or 7 * (4 + 2) or 3 - x. Expressions appear in many places, including on the right-hand side of an = in an assignment. Here are a few examples:

int x; 
x = 4 + 3 * 2;
int y = x - 6; 
x = x * y;

Functions

To define a function, you need to write these things, in order: the return type, the function name, the parameter list (each with a type and a name), and then the body. The syntax looks like this:

<return type> <name>(<parameter type> <parameter name>, ...) {
    <body>
}

Here’s an example:

int myfunc(int x, int y) {
  int z = x - 2 * y; 
  return z * x;
}

Function calls look like many other languages: you write the function name and then, in parentheses, the arguments. For example, you can call the function above using an expression like myfunc(10, 4).

The main Function

Complete programs must have a main function, which is the first one that will get called when the program starts up. main should always have a return type of int. It can optionally take parameters that receive command-line arguments (covered later).

Here’s a complete program:

int myfunc(int x, int y) {
  int z = x - 2 * y; 
  return z * x;
}

int main() {
  int z = myfunc(1, 2);
  return 0;
}

The return value for main is the program’s exit status. As a convention, an exit status of 0 means “success” and any nonzero number means some kind of exceptional condition. So, most of the time, use return 0 in your main.

Includes

To use functions declared somewhere else, including in the standard library, C uses include directives. They look like this:

#include <hello.h>
#include "goodbye.h"

In either form, we’re supplying the filename of a header file. Header files contain declarations for functions and variables that C programs can use. The standard filename extension for header files in C is .h. You should use the angle-bracket version for library headers and the quotation-mark version for header files you write yourself.

Printing

To print output to the console, use printf, a function from the C standard library which takes:

  • A string to print out, which may include format specifiers (more on these in a moment).
  • For each format specifier, a corresponding value to fill in.

The first string might have no format specifiers at all, in which case the printf only has a single argument. Here’s what that looks like:

#include <stdio.h>

int main() {
  printf("Hello, world!\n");
}

The \n part is an escape sequence that indicates a newline, i.e., it makes sure the next thing we output goes on the next line.

Format specifiers start with a % sign and include a few more characters describing how to print each additional argument. For example, %d prints a given argument as a decimal integer. Here’s an example:

#include <stdio.h> 

int main() {
  int x = 3; 
  int y = 4; 
  printf("x + y = %d.\n", x + y);
}

Here are some format specifiers for printing integers in different bases:

Base          Format Specifier    Example
decimal       %d                  printf("%d", i);
hexadecimal   %x                  printf("%x", i);
octal         %o                  printf("%o", i);

And here are some common format specifiers for other data types:

Data Type     Format Specifier    Example
string        %s                  printf("%s", str);
char          %c                  printf("%c", c);
float         %f                  printf("%f", f);
double        %lf                 printf("%lf", d);
long          %ld                 printf("%ld", l);
long long     %lld                printf("%lld", ll);
pointer       %p                  printf("%p", ptr);

See the C reference for details on the full set of available format specifiers.
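
As a quick illustration, here is a small standalone sketch (not tied to any assignment) that uses several of the specifiers from the tables above:

#include <stdio.h>

int main() {
  int i = -42;
  unsigned int u = 255;
  double d = 2.5;
  const char* s = "CS 3410";
  printf("decimal: %d\n", i);
  printf("hex: %x, octal: %o\n", u, u);
  printf("double: %lf, string: %s, char: %c\n", d, s, s[0]);
  return 0;
}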

Basic Types in C

Some Common Data Types

Type      Common Size in Bytes    Interpretation
char      1                       one ASCII character
int       4                       signed integer
float     4                       single-precision floating-point number
double    8                       double-precision floating-point number

A surprising quirk about C is that the sizes of some types can be different in different compilers and platforms! So this table lists common byte sizes for these types on popular platforms.
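
If you’re curious what the sizes are on your platform, you can check them with the sizeof operator (a small sketch; the numbers it prints may differ from the table above):

#include <stdio.h>

int main() {
  printf("char:   %zu bytes\n", sizeof(char));
  printf("int:    %zu bytes\n", sizeof(int));
  printf("float:  %zu bytes\n", sizeof(float));
  printf("double: %zu bytes\n", sizeof(double));
  return 0;
}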

Characters

Every character corresponds to a number. The mapping between characters and numbers is called the text encoding, and the ubiquitous one for basic characters in the English language is called ASCII. Here is a table with some of the most common characters in ASCII:

ASCII Mappings

For all the characters in ASCII (and beyond), see this ASCII table.

Booleans

C does not have a bool data type available by default. Instead, you need to include the stdbool.h header:

#include <stdbool.h>

That lets you use the bool type and the true and false expressions. If you get an error like unknown type name 'bool', just add the include above to fix it.
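
Here’s a small sketch showing bool in action (note that a bool prints as 0 or 1 with %d):

#include <stdio.h>
#include <stdbool.h>

int main() {
  bool done = false;
  if (!done) {
    printf("still working\n");
  }
  done = true;
  printf("done is now %d\n", done);
  return 0;
}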

Prototypes and Headers

Declare Before Use

In C, the order of declarations matters. This program with two functions works fine:

#include <stdio.h>

void greet(const char* name) {
  printf("Hello, %s!\n", name);
}

int main() {
  greet("Eva");
  return 0;
}

But what happens if you just reverse the two function definitions?

#include <stdio.h>

int main() {
  greet("Eva");
  return 0;
}

void greet(const char* name) {
  printf("Hello, %s!\n", name);
}

The compiler gives us this somewhat confusing error message:

error: implicit declaration of function 'greet'

The problem is that, in C, you have to declare every name before you can use it. So the declaration of greet has to come earlier in the file than the call to greet("Eva").

Declarations, a.k.a. Prototypes

This declare-before-use rule can make it awkward to define functions in the order you want, and it seems to be a big problem for mutual recursion. Fortunately, C has a mechanism to let you declare a name before you define what it means. All the functions we’ve seen so far have been definitions (a.k.a. implementations), because they include the body of the function. A function declaration (a.k.a. prototype) looks the same, except that we leave off the body and just write a semicolon instead:

void greet(const char* name);

A declaration like this tells the compiler the name and type of the function, and it amounts to a promise that you will later provide a complete definition.

Here’s a version of our program above that works and keeps the function definition order we want (main and then greet):

#include <stdio.h>

void greet(const char* name);

int main() {
  greet("Eva");
  return 0;
}

void greet(const char* name) {
  printf("Hello, %s!\n", name);
}

By including the declaration at the top of the file, we are now free to call greet even though the definition comes later.

Header Files

It is so common to need to declare a bunch of functions so you can call them later that C has an entire mechanism to facilitate this: header files. A header is a C source-code file that contains declarations that are meant to be included in other C files. You can then “copy and paste” the contents of header files into other C code using the #include directive.

Even though the C language makes no formal distinction between what you can do in headers and in other files, it is a universal convention that headers have the .h filename extension while “implementation” files use the .c extension. For example, we could put our greet declaration into a utils.h header file:

void greet(const char* name);

Then, we might put this in main.c:

#include <stdio.h>
#include "utils.h"

int main() {
  greet("Eva");
  return 0;
}

void greet(const char* name) {
  printf("Hello, %s!\n", name);
}

The line #include "utils.h" instructs the C preprocessor to look for the file called utils.h and paste its entire contents in at that location. Because the preprocessor runs before the compiler, this two-file version of our project looks exactly the same to the compiler as if we had merged the two files by hand. You can read more about #include directives, including about the distinction between angle brackets and quotation marks.

Multiple Source Files

Eventually, your C programs will grow large enough that it’s inconvenient to keep them in one .c file. You could distribute the contents across several files and then #include them, but there is a better way: we can compile source files separately and then link them.

To make this work in our example, we will have three files. First, our header file utils.h, as before, just contains a declaration:

void greet(const char* name);

Next, we write an accompanying implementation file, utils.c:

#include <stdio.h>
#include "utils.h"

void greet(const char* name) {
  printf("Hello, %s!\n", name);
}

As a convention, C programmers typically write their programs as pairs of files: a header and an implementation file, with the same base name and different extensions (.h and .c). The idea is that the header declares exactly the set of functions that the implementation file defines. So in that way, the header file acts as a short “table of contents” for what exists in the longer implementation file.

Let’s call the final file main.c:

#include "utils.h"

int main() {
  greet("Eva");
  return 0;
}

Notably, we use #include "utils.h" to “paste in” the declaration of greet, but we don’t have its definition here.

Now, it’s time to compile the two source files, utils.c and main.c. Here are the commands to do that:

$ gcc -c utils.c -o utils.o
$ gcc -c main.c -o main.o

(Remember to prefix these commands with rv to use our RISC-V infrastructure.)

The -c flag tells the C compiler to just compile the single source file into an object file, not an executable. An object file contains all the compiled code for a single C source file, but it is not directly runnable yet—for one thing, it might not have a main function. Using -o utils.o tells the compiler to put the output in a file called utils.o. As a convention, the filename extension for object files is .o.

You’ll notice that we only compiled the .c files, not the .h files. This is intentional: header files are only for #includeing into other files. Only the actual implementation files get compiled.

Finally, we need to combine the two object files into an executable. This step is called linking. Here’s how to do that:

$ gcc utils.o main.o -o greeting

We supply the compiler with two object files as input and tell it where to put the resulting executable with -o greeting. Now you can run the program:

$ ./greeting

(Use rv qemu greeting to use the course RISC-V infrastructure.)

Control Flow

Logical Operators

Here are some logical operators you can use in expressions:

Expression          True If…
expr1 == expr2      expr1 is equal to expr2
expr1 != expr2      expr1 is not equal to expr2
expr1 < expr2       expr1 is less than expr2
expr1 <= expr2      expr1 is less than or equal to expr2
expr1 > expr2       expr1 is greater than expr2
expr1 >= expr2      expr1 is greater than or equal to expr2
!expr               expr is false (i.e., zero)
expr1 && expr2      expr1 and expr2 are both true
expr1 || expr2      expr1 or expr2 is true

false && expr2 will always evaluate to false, and true || expr2 will always evaluate to true, regardless of what expr2 evaluates to. This is called “short circuiting”: C evaluates the left-hand side of these expressions first and, if the truth value of that expression means that the other one doesn’t matter, it won’t evaluate the right-hand side at all.
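
Short circuiting is handy for guarding a potentially dangerous expression behind a check. Here’s a small sketch: because x != 0 is false, the division on the right-hand side is never evaluated.

#include <stdio.h>

int main() {
  int x = 0;
  if (x != 0 && (10 / x) > 1) {  // the right-hand side is skipped when x is 0
    printf("big quotient\n");
  } else {
    printf("skipped the division safely\n");
  }
  return 0;
}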

Conditionals

Here is the syntax for if/else conditions:

if (condition) {
  // code to execute if condition is true
} else if (another_condition) {
  // code to execute if condition is false but another_condition is true
} else {
  // code to execute otherwise
}

The else if and else parts are optional.

Switch/Case

A switch statement can be a succinct alternative to a cascade of if/elses when you are checking several possibilities for one expression.

switch (expression) {
  case constant1:
    // code to execute if expression equals constant1
    break;
  case constant2:
    // code to execute if expression equals constant2
    break;
  // ...
  default:
    // code to be executed if expression doesn't match any case
}

While Loop

while (condition) {
  // code to execute as long as condition is true
}

For Loop

for (initialization; condition; increment) {
  // code to execute for each iteration
}

Roughly speaking, this for loop behaves the same way as this while equivalent:

initialization;
while (condition) {
  // code to execute for each iteration
  increment;
}
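
For example, here is a concrete for loop (a small sketch) that adds up the integers from 1 to 10:

#include <stdio.h>

int main() {
  int sum = 0;
  for (int i = 1; i <= 10; i++) {
    sum = sum + i;
  }
  printf("1 + 2 + ... + 10 = %d\n", sum);  // prints 55
  return 0;
}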

break and continue

To exit a loop early, use a break; statement. A break statement jumps out of the innermost enclosing loop or switch statement. If the break statement is inside nested contexts, then it exits only the most immediately enclosing one.

To skip the rest of a single iteration of a loop, but not cancel the loop entirely, use continue.
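
Here’s a small sketch that uses both: continue skips the even numbers, and break stops the loop once i passes 7, so it prints 1, 3, 5, and 7.

#include <stdio.h>

int main() {
  for (int i = 0; i < 10; i++) {
    if (i % 2 == 0) {
      continue;  // skip the rest of this iteration for even numbers
    }
    if (i > 7) {
      break;     // exit the loop entirely
    }
    printf("%d\n", i);
  }
  return 0;
}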

Declaring Your Own Types in C

Structures

The struct keyword lets you declare a type that bundles together several values, possibly of different types. To access the fields inside a struct variable, use dot syntax, like thing.field. Here’s an example:

struct rect_t {
  int left;
  int bottom;
  int right;
  int top;
};

int main() {
  struct rect_t myRect;
  myRect.left = -4;
  myRect.bottom = 1;
  myRect.right = 8;
  myRect.top = 6;

  printf("Bottom left = (%d,%d)\n", myRect.left, myRect.bottom);
  printf("Top right = (%d,%d)\n", myRect.right, myRect.top);

  return 0;
}

This program declares a type struct rect_t and then uses a variable myRect of that type.

Enumerations

The enum keyword declares a type that can be one of several options. Here’s an example:

enum threat_level_t {
  LOW,
  GUARDED,
  ELEVATED,
  HIGH,
  SEVERE
};

void printOneLevel(enum threat_level_t threat) {
  switch (threat) {
    case LOW:
      printf("Green/Low.\n");
      break;
    // ...omitted for brevity...
    case SEVERE:
      printf("Red/Severe.\n");
      break;
  }
}

void printLevels() {
  printf("Threat levels are:\n");
  for (int i = LOW; i <= SEVERE; i++) {
    printOneLevel(i);
  }
}

This code declares a type enum threat_level_t that can be one of 5 values.

Type Aliases

You can use the typedef keyword to give new names to existing types. Use typedef <old type> <new name>;, like this:

typedef int whole_number;

Now, you can use whole_number to mean the same thing as int.

Short Names for Structs and Enums

You may have noticed that struct and enum declarations make types that are kind of long and hard to type. For example, we declared a type enum threat_level_t. Wouldn’t it be nice if this type could just be called threat_level_t?

typedef is also useful for defining these short names. You could do this:

enum _threat_level_t { ... };
typedef enum _threat_level_t threat_level_t;

And that does work! But there’s also a shorter way to do it, by combining the enum and the typedef together:

typedef enum {
  ...
} threat_level_t;

That defines an anonymous enumeration and then immediately gives it a sensible name with typedef.

Below is a summary of the different ways that you can declare a struct (the same options apply to an enum) and then declare and initialize a variable of that type.

1. Define a type struct rect_t only:

struct rect_t {
  int left;
  int bottom;
  int right;
  int top;
};

struct rect_t myRect;  // myRect has type struct rect_t
myRect.left = 1;
...

2. Define a type struct _rect_t and then define its type alias rect_t:

struct _rect_t {
  int left;
  int bottom;
  int right;
  int top;
};
typedef struct _rect_t rect_t;

struct _rect_t myRect;
myRect.left = 1;
...

// OR
rect_t myRect;
myRect.left = 1;
...

3. Define a type struct _rect_t and its type alias rect_t in the same statement:

typedef struct _rect_t {
  int left;
  int bottom;
  int right;
  int top;
} rect_t;

struct _rect_t myRect;
myRect.left = 1;
...

// OR
rect_t myRect;
myRect.left = 1;
...

4. Define a type rect_t (the struct itself has no separate name):

typedef struct {
  int left;
  int bottom;
  int right;
  int top;
} rect_t;

rect_t myRect;
myRect.left = 1;
...

Bit Packing

Structs work well when you want to combine several types that have “nice” sizes: 1, 4, or 8 bytes, for example. But they can waste space if you actually only need a few bits for your values. For example, we learned that the float type is 32 bits: 1 sign bit, 8 exponent bits, and 23 significand bits. If we wanted to “fake” a floating-point number with a struct, there are no 1-bit or 23-bit types to use for the fields. The best we can do is to use 8 bits, 8 bits, and 32 bits:

#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint8_t sign;
    uint8_t exponent;
    uint32_t significand;
} fake_float_t;

int main() {
    printf("size: %lu\n", sizeof(fake_float_t));
}

That struct uses a total of 6 bytes for its fields. But compilers often need to insert padding to make sure values are aligned for efficient memory access, so the struct can be bigger than that. Here, we use sizeof to measure the actual total size of the struct, which is 8 bytes—twice as big as a real 4-byte float!

This section will show you how to pack these irregularly-sized values into integers—a trick that you can call bit packing. The big idea is to treat integer types like uint32_t just as sequences of bits rather than as actual integers, and to use C’s built-in bit-manipulation operations to insert and extract ranges of bits. The key operations are:

  • Masking, with the bitwise “and” operator, &.
  • Combining, with the bitwise “or” operator, |.
  • Shifting, with the bitwise shift operators >> and <<.

You may find it helpful to look over the full list of arithmetic and bit manipulation operators in C.

Shifting

In C, i << n shifts the bits in an integer i leftward by n places, filling in the bottom n bits with zeroes. Mathematically, this has the effect of multiplying i by \(2^n\):

#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t n = 21;
    printf("double n: %u\n", n << 1);
}

Similarly, i >> n shifts the bits rightward by n places, which divides i by \(2^n\) (discarding any remainder).

These shift operations are useful for moving bit patterns around within the range of bits in the value. Let’s try moving a value around in a uint32_t and printing out the bits:

#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t n = 21;
    printf("%032b\n", n);
    printf("%032b\n", n << 8);
    printf("%032b\n", n << 16);
    printf("%032b\n", n << 24);
}

That %032b specifier tells printf to pad the value out to 32 bits for consistency. If you run this program, you can see the bit-pattern for the value 21 moving around within the range of 32 bits:

00000000000000000000000000010101
00000000000000000001010100000000
00000000000101010000000000000000
00010101000000000000000000000000

Combining

The bitwise “or” operator, written in C with a single |, is useful for combining different values that have been shifted to different places. The insight is that x | 0 == x for any bit x, and our shifted values have zeroes wherever they are “inactive.” Let’s try shifting two different small values to two different positions and then combining them:

#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t x = 21;
    uint32_t y = 17;
    printf("x:      %032b\n", x);
    printf("y<<8:   %032b\n", y << 8);
    printf("x|y<<8: %032b\n", x | (y << 8));
}

If you run this program, you can see the bit patterns for 21 and 17 coexisting happily, side-by-side. Because we know these values fit in 8 bits, we can think of the first value occupying bits 0 through 7 (numbered from the least significant bit) and the next one occupying bits 8 through 15 in the combined value.

Masking

Next, we want a way to extract bits out of one of these combined values. The idea is to use the bitwise “and” operator, &, together with a mask value that has ones exactly where the bits are that we’re interested in. We’ll use this property of the & operator:

  • Wherever mask is 1, mask & x == x for any bit x.
  • Wherever mask is 0, mask & x == 0 for any bit x.

So a mask value has the effect of preserving the bits of x where it’s 1 and ignoring them (turning them to 0) where it’s 0.

Let’s construct a mask to separate the two packed values from last time:

#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t x = 21;
    uint32_t y = 17;
    uint32_t comb = x | (y << 8);
    printf("comb:        %032b\n", comb);

    uint32_t x_mask = 0b00000000000000000000000011111111;
    uint32_t y_mask = 0b00000000000000001111111100000000;

    printf("comb&x_mask: %032b\n", comb & x_mask);
    printf("comb&y_mask: %032b\n", comb & y_mask);
}

Running this program will show how we’ve “separated” the combined value back into its constituent parts.

When writing masks, it can get really tiresome to write all those ones and zeroes out. It’s often more practical to write them as hexadecimal literals, remembering that every hex digit corresponds to 4 bits (a nibble): hex 0 is binary 0000, and hex F is binary 1111. So this program is equivalent:

#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t x = 21;
    uint32_t y = 17;
    uint32_t comb = x | (y << 8);
    printf("comb:        %032b\n", comb);

    uint32_t x_mask = 0x000000FF;
    uint32_t y_mask = 0x0000FF00;

    printf("comb&x_mask: %032b\n", comb & x_mask);
    printf("comb&y_mask: %032b\n", comb & y_mask);
}

Putting it All Together

Now that we’ve separated the two values out by masking the combined value, there is one more step to recover the original values. We just need to shift them right with >> back to their original positions. Actually, x is already in its original position, so we don’t have to do anything to it. But y was shifted left by 8 bits originally, so to get its original value, we’ll shift the masked-out value right again by the same amount.

Here’s a complete program that shows the combination and extraction together:

#include <stdio.h>
#include <stdint.h>

uint32_t pack(uint8_t x, uint8_t y) {
    return x | (y << 8);
}

uint8_t get_x(uint32_t comb) {
    return comb & 0x000000FF;
}

uint8_t get_y(uint32_t comb) {
    return (comb & 0x0000FF00) >> 8;
}

int main() {
    uint32_t comb = pack(34, 10);
    printf("recovered x: %hhd\n", get_x(comb));
    printf("recovered y: %hhd\n", get_y(comb));
}

The pack function combines x and y into a single uint32_t. Then, the get_x and get_y functions use masking and shifting to undo this combination and extract the original values.

Bit packing is a superpower that you have unlocked by understanding how values are represented at the level of bits. Use it to save space when ordinary structs won’t cut it!

Pointers!

Pointers are central to programming in C, yet are often one of the most foreign concepts to new C coders.

A Motivating Example

Suppose we want to write a swap function that will take two integers and swap their values. With the programming tools we have so far, our function might look something like this:

void swap(int a, int b) {
  int temp = a;
  a = b;
  b = temp;
}

This won’t work how we want it to! If we call swap(foo, bar), the swap function gets copies of the values in foo and bar. Reassigning a and b just affects those copies—not foo and bar themselves! (This behavior is called call by value (or pass by value) because the values of the variables, rather than references to them, are passed as function arguments.)

How can we give swap direct access to the places where the arguments are stored so it can actually swap them? Pointers are the answer.

Pointers

Pointers are addresses in memory, and you can think of them as referring to a value that lives somewhere else.

Declaring a Pointer

For any type T, the type of a pointer to a value of that type is T*: that is, the same type with a star after it. For example, this code:

char* my_char_pointer;

(pronounced “char star my char pointer”) declares a variable with the name my_char_pointer. This variable is not a char itself! Instead, it is a pointer to a char.

Confusingly, the spaces don’t matter. The following three lines of code are all equivalent declarations of a pointer to an integer:

int* ptr;
int *ptr;
int * ptr;

ptr has the type “pointer to an integer.”

Initializing a Pointer

int* ptr = NULL;

The line above initializes the pointer to NULL, or zero. It means the pointer does not point to anything. This is a good idea if you don’t plan on having it point to something just yet. Initializing to NULL helps you avoid uninitialized pointers, which can point to arbitrary memory locations that you wouldn’t want to access unintentionally. C will not do this initialization for you.

You can check if a pointer is NULL with the expression ptr == NULL.

New in C23: nullptr!

The current C programming language standard, C23, introduces a new nullptr keyword which denotes a null pointer constant. The type of nullptr is also new, the aptly named nullptr_t. In fact, nullptr is the only valid value of type nullptr_t.

For compatibility with older C language standards, we recommend still using NULL to check for null pointers, even though in most cases the two are interchangeable. Indeed, it is likely that on your machine NULL is defined to be nullptr!

Assigning to a Pointer, and Getting Addresses

In the case of a pointer, changing its value means changing where it points. For example:

void func(int* x) {
  int* y = x;
  // ...
}

The assignment in that code makes y and x point to the same place.

But what if you want to point to a variable that already exists? C has an & operator, called the “address-of” (or “reference-of”) operator, that gets the pointer to a variable. For example:

int x = 5;
int* xPtr = &x;

Here, xPtr now points to x.

You can’t assign to the address of things; you can only use & in expressions (e.g., on the right-hand side of an assignment). So:

y = &x;  // this is fine
&x = y;  // will not compile!

This rule reflects the fact that you can get the location of any variable, but it is never possible to change the location of a variable.

Dereferencing Pointers

Once you have a pointer with a memory location in it, you will want to access the value that is being pointed at—either reading or changing the value in the box at the end of the arrow. For this, C has the * operator, known as the “dereference” operator because it follows a reference (pointer) and gives you the referred-to value.

You can both read from and write to a dereferenced pointer, so * expressions can appear on either side of an assignment. For example:

int number = *xPtr;  // read the value xPtr points to
printf("the number is %d\n", *xPtr);  // read it and then print it
*xPtr = 6;  // write the value that xPtr points to

Common Confusion with the * Operator

Do not be confused by the two contexts in which you will see the star (*) symbol:

  • Declaring a pointer: int* p;
  • Dereferencing a pointer (RHS): r = *p;
  • Dereferencing a pointer (LHS): *p = r;

The star is part of the type name when declaring a pointer and is the dereference operator when used in assignments.

Swap with Pointers

Now that we have pointers, we can correctly write that swap function we wanted! The new version of swap uses a “pass by reference” model in which pointers to arguments are passed to the function.

void swap(int* a, int* b) {
  int temp = *a;
  *a = *b;
  *b = temp;
}
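
At the call site, the caller uses & to pass the addresses of its variables. Here’s a small complete sketch (foo and bar are the same illustrative names as above):

#include <stdio.h>

void swap(int* a, int* b) {
  int temp = *a;
  *a = *b;
  *b = temp;
}

int main() {
  int foo = 1;
  int bar = 2;
  swap(&foo, &bar);  // pass pointers to foo and bar
  printf("foo = %d, bar = %d\n", foo, bar);  // prints foo = 2, bar = 1
  return 0;
}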

The Arrow Operator

Recall that we used the “dot” operator to access elements within a struct, like myRect.left. If you instead have a pointer to a struct, you need to dereference it first before you can access its fields, like (*myRect).left.

Fortunately, C has a shorthand for this case! You can also write myRect->left to mean the same thing. In other words, the -> operator works like the . operator except that it also dereferences the pointer on the left-hand side.
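
Here’s a small sketch, reusing the struct rect_t type from earlier (the helper name print_left is just for illustration), showing that (*r).left and r->left mean the same thing:

#include <stdio.h>

struct rect_t {
  int left;
  int bottom;
  int right;
  int top;
};

void print_left(struct rect_t* r) {
  printf("left = %d\n", (*r).left);  // dereference, then use the dot operator
  printf("left = %d\n", r->left);    // the arrow operator does both at once
}

int main() {
  struct rect_t myRect = {-4, 1, 8, 6};
  print_left(&myRect);
  return 0;
}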

Pointer Arithmetic

If pointers are just addresses in memory, and addresses are just integers, you might wonder if you can do arithmetic on them like you can with ints. Yes, you can!

Adding n to a pointer to any type T causes the pointer to point n Ts further in memory. For example, the expression ptr + offset might compute a pointer that is “four ints later in memory” or “six chars later in memory.”

int x = 5;
int *ptr = ...;

x = x + 1;
ptr = ptr + 1;

In this code:

  • x + 1: adds 1 to the integer x, producing 6
  • ptr + 1: adds the size of an int in bytes to ptr, shifting to point to the next integer in memory

Printing Pointers

You can print the value stored in a pointer to see what memory location it is pointing to. For example:

printf("Pointer address: %p\n", (void*)ptr);

This will output the memory address the pointer ptr is currently holding.

Arrays

An array is a sequence of same-type values that are consecutive in memory.

Declaring an Array

To declare an array, specify its type and size (i.e., the number of items in the sequence). For example, an array of four integers can be declared as follows:

int myArray[4];

A few variations on this declaration are:

int myArray[4] = {42, 45, 65, -5}; // initializes the values in the array
int myArray[4] = {0};              // initializes all the values in the array to 0
int myArray[] = {42, 45, 65, -5};  // initializes the values in the array, compiler intuits the array size

Accessing an Array

To refer to an element, specify the array name (e.g., my_array) and the position number (e.g., 0):

// Declare an array of five `int`s called `my_array`.
int my_array[5];
// Store the integer `8` at position `0` in array `my_array`.
my_array[0] = 8;
printf("I just initialized the element at index 0 to %d!\n", my_array[0]);

After executing the above code, my_array would look like this in memory (where larger addresses are higher on the screen):

(Diagram: the five elements of my_array laid out consecutively in memory.)

Ex: Compute the sum of an array

To sum the elements of an array, we can use a for loop to iterate over the array’s indices, adding the elements together as we go:

int sum_array(int *array, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i) {
    sum += array[i];
  }
  return sum;
}

int main() {
  int data[4] = {4, 6, 3, 8};
  int sum = sum_array(data, 4);
  printf("sum: %d\n", sum);
  return 0;
}

Accessing an Array using Pointer Arithmetic

In C, you can treat arrays as pointers: namely, to the first element in the sequence.

This means that, perhaps surprisingly, the syntax array[i] is shorthand for *(array + i): that is, a combination of pointer arithmetic and dereferencing. So you can think of array[i] as treating array as a pointer to the first element, then shifting the pointer over by i slots, and then dereferencing the pointer to that shifted location.
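
For example, these two ways of reading the same element are interchangeable (a quick sketch):

#include <stdio.h>

int main() {
  int data[4] = {4, 6, 3, 8};
  printf("%d\n", data[2]);      // prints 3
  printf("%d\n", *(data + 2));  // also prints 3
  return 0;
}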

Passing Arrays as Parameters

You can also treat arrays as pointers when you pass them into functions. You already saw this above; we declared a function this way:

int sum_array(int *array, int n) { ... }

and then called it like sum_array(data, 4). Even though we declared data as an array, C lets you treat it as a pointer to the first element.

Keep track of the size of your arrays!

C does not know the size of an array. As with many things in C, the language entrusts the programmer (i.e., you!) with that responsibility.

The rule of thumb is to pass the length of the array as a separate parameter whenever you pass an array into a function, so the function knows how big the array is!

Common Pitfalls

  • C has no array-bound checks. You won’t even get a warning! If you write past the end of an array, you will simply start overwriting the values of other data in memory.
  • sizeof(array) will return a different value based on how the variable array was declared. If array is declared as int *array, then array will be considered the size of a pointer. If it was declared as int array[100] then it will be considered the size of 100 ints.
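
Here’s a small sketch demonstrating that second pitfall (the helper name measure is just for illustration): sizeof reports the whole array inside main, but only the size of a pointer inside the function.

#include <stdio.h>

void measure(int *array) {
  // Here, array is just a pointer, so sizeof gives the pointer's size (often 8 bytes).
  printf("as a parameter: %zu bytes\n", sizeof(array));
}

int main() {
  int array[100];
  // Here, sizeof gives the whole array: 100 * sizeof(int), often 400 bytes.
  printf("as an array:    %zu bytes\n", sizeof(array));
  measure(array);
  return 0;
}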

Multidimensional Arrays

C lets you declare multidimensional arrays, like int matrix[4][3]. However, it still lays everything out sequentially in memory. Here’s a visualization of what that matrix looks like conceptually and in memory:

(Diagram: the 4×3 matrix, shown conceptually and as laid out sequentially in memory.)

This array occupies (4 * 3 * sizeof(int)) bytes of memory.
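
Here’s a small sketch that fills the matrix with nested loops and checks its total size:

#include <stdio.h>

int main() {
  int matrix[4][3];
  for (int row = 0; row < 4; row++) {
    for (int col = 0; col < 3; col++) {
      matrix[row][col] = row * 3 + col;  // number the cells 0 through 11
    }
  }
  printf("matrix[2][1] = %d\n", matrix[2][1]);        // prints 7
  printf("total size: %zu bytes\n", sizeof(matrix));  // 4 * 3 * sizeof(int)
  return 0;
}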

Strings

A string is an array of characters (chars), terminated by the null terminator character, '\0'. In general, the type of a string in C is char*.

String Literals

We have seen string literals so far—a sequence of characters written down in quotation marks, such as "Hello World\n".

The type of a string literal is const char*, so this is valid C:

const char* str = "Hello World\n";

The const shows up here because the characters in a string literal cannot be modified.

Mutable Strings

A mutable string has type char*, without the const. How can you declare a mutable string with a string literal, if string literals are always const? Here’s a trick you can use: remember that, in C, an array is like a pointer to its first element. So let’s declare the string as an array and give it an initializer:

char str[] = "Hello World\n";

This code behaves exactly as if we wrote:

char str[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\n', '\0'};

It declares a variable str which is an array of 13 characters (remember that the size of an array may be implicit if we provide an initializer from which the compiler can determine the size), and initializes it by copying the characters of the string "Hello World\n" (including the null terminator) into that array.

String Equality

The expression str1 == str2 doesn’t check whether str1 and str2 are the same string! Remember, since both of these have a pointer type (char*), C will just compare the pointers.

Instead, if you want to check whether two strings contain equal contents, you will need to use a function like strcmp from the string.h header.
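
Here’s a small sketch: the == comparison is false because str1 and str2 point to two different arrays, but strcmp reports that their contents match.

#include <stdio.h>
#include <string.h>

int main() {
  char a[] = "hello";
  char b[] = "hello";
  char* str1 = a;
  char* str2 = b;
  if (str1 == str2) {
    printf("same pointer\n");     // not printed: the two arrays live at different addresses
  }
  if (strcmp(str1, str2) == 0) {  // strcmp returns 0 when the contents are equal
    printf("equal contents\n");   // printed
  }
  return 0;
}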

String Copying

Similarly, an assignment like str1 = str2; does not copy strings! It just does pointer assignment, so now str1 points to the same region of memory as str2.

Use a function like strcpy if you need to copy characters.
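
Here’s a small sketch using strcpy; note that the destination array must be big enough to hold the copied characters plus the null terminator.

#include <stdio.h>
#include <string.h>

int main() {
  char src[] = "copy me";
  char dest[20];                   // plenty of room for "copy me" and its '\0'
  strcpy(dest, src);               // copies the characters into dest
  dest[0] = 'C';                   // changing the copy does not affect the original
  printf("%s / %s\n", src, dest);  // prints: copy me / Copy me
  return 0;
}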

C Macros

Let’s say you have a program that works with arrays of a certain size: say, 100 elements. The number 100 will show up in different parts of the code:

float stuff[100];

// ... elsewhere ...

for (int i = 0; i < 100; ++i) {
  do_something(stuff[i]);
}

Repeating the number 100 in multiple locations is not great for multiple reasons:

  • It is not maintainable. If you ever need to change the size of the array, you need to carefully look for all the places where you mentioned 100 and change it to something else. If you happen to miss one, subtle bugs will arise.
  • It is not readable. Writing code is as much about communicating with other programmers as it is about communicating with the machine! When a human sees the number 100 appear out of nowhere, it can be mysterious and worrisome. For this reason, programmers often call these arbitrary-seeming constants magic numbers (in a derogatory way).

C has a feature called the preprocessor that can cut down on duplication, eliminate magic numbers, and make code more readable. In particular, you can use a macro definition to give your constant a name:

#define NUMBER_OF_THINGS 100

The syntax #define <macro> <expression> defines a new name, the macro, and instructs the preprocessor to replace that name with the given expression. (Notably, there is no semicolon after preprocessor directives like #define.) It is a convention to always use SHOUTY_SNAKE_CASE for macro names to help visually distinguish them from ordinary C variable names.

In this example, the C preprocessor will “find and replace” all occurrences of NUMBER_OF_THINGS in our program and replace it with the expression 100. So it means exactly the same thing to rewrite our program above like this:

#define NUMBER_OF_THINGS 100

float stuff[NUMBER_OF_THINGS];

// ... elsewhere ...

for (int i = 0; i < NUMBER_OF_THINGS; ++i) {
  do_something(stuff[i]);
}

The C preprocessor runs before the actual compiler, so you can think of it as doing a textual “find and replace” operation before compiling your code.

Dynamic Memory Allocation

Motivation

Suppose we wanted to write a function that takes an integer, creates an array of the size specified by the integer, initializes each field, and returns the array back to the caller. Given the tools we have thus far, our code might look like this:

// Broken code! Do not do this!
int *initArray(int howLarge) {
  int myArray[howLarge];
  for (int i = 0; i < howLarge; i++) {
    myArray[i] = i;
  }
  return myArray;
}

The reason this code will not work is that the array is created on the stack. Variables on the stack exist only until the function ends, at which point the stack frame is popped. You can’t use the memory for that stack frame anymore, and it will get reused for some other data.

Dynamic memory allocation lets you obtain memory on the heap instead of the stack. Unlike stack frames, the heap is forever: it remains even when the function returns. Instead, you have to remember to explicitly free the memory when you are done using it.

Both the stack and the heap can grow and shrink over time, as the program creates and destroys stack frames and heap-allocated memory regions. Typically, systems lay out the stack at higher addresses in memory and the heap at lower addresses in memory; as they grow, the stack grows “down” and the heap grows “up.” Here’s a diagram that depicts this growth in the address space:

(Diagram: the address space, with the stack growing down from high addresses and the heap growing up from low addresses.)

The diagram also includes static data (globals and constants) and code, which are other memory regions distinct from the heap and stack.

malloc

To use dynamic memory allocation functions, #include <stdlib.h>. Check out the reference for the stdlib.h header.

To allocate memory on the heap, use the malloc function. Here’s its declaration:

void* malloc(size_t size);

The return type of malloc is void*, which looks a little weird, but it means “a pointer to some type but I’m not sure which.” The only argument is a size: the number of bytes you want to allocate. (size_t is an unsigned integer type.)

How do you know how many bytes you need? The best way is to use C’s sizeof operator. Use sizeof(int), for example, to get the number of bytes that an int occupies. For example, here’s how to allocate space for an int on the heap:

int* intPtr = malloc(sizeof(int));

If you want to get fancy, you can even avoid repeating the int type by using sizeof’s ability to get the type of a variable for you:

int* intPtr = malloc(sizeof(*intPtr));

And here’s how to allocate space for an array of 500 floats:

float* floatArray = malloc(500 * sizeof(*floatArray));

(Please use sizeof instead of guessing the sizes of things, even if you think you know that an int occupies 4 bytes. Because types can be different sizes on different platforms, using sizeof will make your code portable.)

free

Unlike stack variables, you are responsible for freeing memory that you malloc! You do that with the free function. free just takes one argument: the pointer to some memory previously allocated with malloc.

Remember this rule: every time you call malloc, remember to put a free somewhere to balance it out.

initArray Revisited

Here’s a fixed version of the code above:

int *initArray(int howLarge) {
  int *array = malloc(howLarge * sizeof(*array));
  if (array != NULL) {
    for (int i = 0; i < howLarge; i++) {
      array[i] = i;
    }
  }
  return array;
}

Of course, the caller of initArray will need to call free when it is finished with the memory.

Notice how the above code checks whether malloc returns NULL. It is possible that the heap could run out of space and that there is not enough memory to fulfill the current request. In such cases, malloc will return NULL instead of a valid pointer to a location on the heap. It is a good idea to check the value returned by malloc and make sure that it is not NULL before trying to use the pointer. If the value is NULL, the program should gracefully abort with an error message explaining that a call to malloc failed (or if it can recover from the situation and continue—that is even better).
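
For completeness, here is one way a caller might use initArray and balance the malloc with a free (a sketch; the size 10 and the variable name numbers are arbitrary):

#include <stdio.h>
#include <stdlib.h>

int *initArray(int howLarge);  // defined above

int main() {
  int *numbers = initArray(10);
  if (numbers == NULL) {
    fprintf(stderr, "initArray failed: out of memory\n");
    return 1;
  }
  printf("numbers[9] = %d\n", numbers[9]);  // prints 9
  free(numbers);  // balance out the malloc inside initArray
  return 0;
}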

realloc

The realloc function can reallocate a block of memory at a different size. In general, realloc might allocate a new (larger or smaller) block of memory, copy the contents of the original to the new one, and free the old one. (But it might do something faster if it can avoid it, e.g., if there is room to expand the allocated region “in place.”)
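
Here’s a small sketch of one common pattern for growing an allocation with realloc. Note that we keep the old pointer until we know the call succeeded, because realloc returns NULL on failure and leaves the original block untouched.

#include <stdio.h>
#include <stdlib.h>

int main() {
  int *data = malloc(4 * sizeof(*data));  // room for 4 ints
  if (data == NULL) {
    return 1;
  }

  int *bigger = realloc(data, 8 * sizeof(*bigger));  // grow to room for 8 ints
  if (bigger == NULL) {
    free(data);  // realloc failed; data is still valid and must still be freed
    return 1;
  }
  data = bigger;

  data[7] = 42;  // the new slots exist now (but start out uninitialized)
  printf("data[7] = %d\n", data[7]);
  free(data);
  return 0;
}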

RISC-V Assembly Resources

CS 3410 uses the 64-bit RISC-V (pronounced risk-five) instruction set architecture (ISA). RISC-V is a modern reduced instruction set computer (RISC) architecture. RISC-V is unique because it’s an open instruction set that anyone can implement without any kind of licensing. (That’s in contrast to the two most popular ISAs, Arm and x86, which both require expensive licenses to implement in hardware.)

Here are some references you might find helpful when writing and reading RISC-V assembly code.

Reference Materials

  • This short reference sheet contains instruction encodings for RISC-V 32, RISC-V 64, and beyond.
  • For the definitive description of what every instruction does and how it’s encoded, see the official ISA manual. It’s long, though, and can get a little bit technical.

Online Tools

  • Cornell’s new experimental RISC-V interpreter supports 64-bit RISC-V, and replaces the previous 32-bit interpreter. Note that the old interpreter, which is now deprecated, was designed for the 32-bit ISA, while the new version more closely aligns with the 64-bit ISA taught in class.
  • Venus is a powerful interactive RISC-V simulator. It is more complicated to use, but it supports more RISC-V instructions.

Introduction

Syllabus and Setup

Please carefully read over the syllabus. Seriously! There is a lot in there that you will want to know.

CS 3410 has made some significant changes compared to prior years. We have updated the curriculum to focus on the essential topics we believe are critical to anyone studying computer science. Among many other changes, this means that there is more focus on programming in C and assembly, we regretfully needed to sacrifice all the digital-design assignments that used Logisim for visual circuit design, and there is much more of an emphasis on parallelism (because, in the modern era, all computers are parallel).

There are two things you need to do this week:

  • An introductory survey on Gradescope. This is due on Friday.
  • Set up the RISC-V infrastructure that you will need for all assignments. Please do your best to do this before your first lab section; we will also work through it during this week’s lab section. If you need help, please post on Ed or find a TA in office hours.

This week’s lab is setting up the infrastructure: this is lab 0! Once your infrastructure is set up, the assignment is printf. The printf assignment serves as an introduction to the C programming language and lets you exercise your skills with numerical representation, binary, and other bases.

As with every assignment in this class, the lab is there to help you get started on the assignment. The lab instructors will help guide you through “step 0” for the printf assignment; then, the rest is up to you.

Course Overview

CS 3410 is about how computers actually work. That puts it in contrast to other kinds of courses at other “levels” in the computer science stack:

  • Classes like CS 1110, CS 2110, and CS 3110 are all about how to make computers do things. You used programming languages (Python, Java, and OCaml) to write programs without worrying too much about how those languages actually do what they do.
  • Classes on application topics like robotics, machine learning, and graphics are all about things computers can do. These are important, of course, because they are the reason we study computing in the first place.
  • Outside of CS, and below the 3410 “level,” there are many classes at Cornell on topics like electronics, chemistry, and physics that can tell you physical details of how computers work. That’s not what 3410 is about either: we will build abstractions over those physical phenomena to understand how computers work in the realm of logic.

Switches

The fundamental computational building block in the physical world is a switch. What we mean by a “switch” is: something that controls a physical phenomenon that you can abstractly think of as being in an “on” or “off” state. Some examples of switches include:

  • A valve controls hydraulic states, i.e., whether water is flowing or not.
  • A vacuum tube controls an electronic signal.
  • The game Turing Tumble controls signals in the form of marbles. Yes, you can build real computers out of little plastic levers.
(Image: a relay.)

What you think of as a “real” computer controls electronic signals. Aside from vacuum tubes, a particularly easy-to-understand type of electronic switch is a relay. To make a relay, you need:

  • An electromagnet (i.e., a magnet controlled by an electronic signal).
  • A bendy piece of metal that can be attracted or repelled by that magnet.
  • Another piece of metal next to that one. You position it carefully so there’s a tiny gap between the two pieces of metal. When the electromagnet is on, it either closes or opens that gap (depending on whether it attracts or repels the bendy piece of metal).
  • Wires hooked up to the two pieces of metal. This way, you can think of the relay as a wire that is either connected or disconnected, depending on whether the electromagnet is charged.

The point is that a relay is a switch that both controls an electronic signal and is controlled by an electronic signal. That’s a really powerful idea, because it means you can wire up a whole bunch of relays to make them control each other! And that is basically what you need to build a computer.

Transistors

Computers today are universally built out of transistors. Transistors work like relays, in the sense that they let one electronic signal control another one. The difference is that they are solid-state devices, relying on the chemistry of the materials inside of them to do the current control instead of a physically moving bendy piece of metal. But abstractly, they do exactly the same thing.

The first transistor was built in Bell Labs in 1947. These days, you can buy them on Amazon for a few pennies apiece. You can build computers “from scratch” by buying a bunch of transistors on Amazon and wiring them up carefully.

Modern computers consist of billions of transistors, manufactured together in an integrated circuit. For example, Apple’s M4 is made up of 28 billion transistors. There is an entire industry of silicon manufacturing that is dedicated to building chunks of silicon with many, many tiny transistors and wires on them.

Abstractly speaking, however, these integrated circuits are no different from a bunch of transistors you can buy on Amazon, wired up very carefully. Which are in turn (abstractly!) the same as relays, or valves, or Turing Tumble marble levers: they are all just a bunch of switches that control each other in careful ways.

One Plus One

Bits

Because computers are made of switches, data is made of bits. A bit is an abstraction of a physical phenomenon that can either be “on” or “off.” The mapping between the physical phenomenon and the 0 or 1 digit is arbitrary; this is just something that humans have to make up. For example:

  • In a hydraulic computer, maybe 0 is “no water” and 1 is “water is flowing.”
  • In Turing Tumble, perhaps 0 is “marble goes left” and 1 is “marble goes right.”
  • In an electronic computer, let’s use 0 to mean “low voltage” and 1 to mean “high voltage.”

Binary Numbers

Armed with switches and a logical mapping, computers have a way to represent numbers! Just really small numbers: a bit suffices to represent all the integers in the interval [0, 1]. It would be nice to be able to represent numbers bigger than 1.

We do that by combining multiple bits together and counting in binary, a.k.a. “base 2.”

In elementary school math class, you probably learned about “place values.” The rightmost digit in a decimal number is for the ones, the next one is for tens, and the next one is for hundreds. In other words, if you want to know what the string of decimal digits “631” means, you can multiply each digit by its place value and add the results together:

\[ 631_{10} = 1 \times 10^0 + 3 \times 10^1 + 6 \times 10^2 \]

We’ll sometimes use subscripts, like \( n_{b} \), to be explicit when we are writing a number in base \( b \).

That’s the decimal, a.k.a. “base 10,” system for numerical notation. Base 2 works the same way, except all the place values are powers of 2 instead of powers of 10. So if you want to know what the string of binary digits “101” represents, we can do the same multiply-and-add dance:

\[ 101_2 = 1 \times 2^0 + 0 \times 2^1 + 1 \times 2^2 \]

That’s five, so we might write \( 101_2 = 5_{10} \).

Some Important Bases

We won’t be dealing with too many different bases in this class. In computer systems, only three bases are really important:

  • Binary (base 2).
  • Octal (base 8).
  • Hexadecimal (base 16), affectionately known as hex for short.

Octal works exactly as you might expect, i.e., we use the digits 0 through 7. For hexadecimal, we run out of normal human digits at 9 and need to invent 6 more digits. The universal convention is to use letters: A has value 10 (in decimal), B has value 11, and so on up to F, which has value 15.

Converting Between Bases

Here are two strategies for converting numbers between different bases. In both algorithms, it can be helpful to write out the place values for the base you’re converting to. We’ll convert the decimal number 637 to octal as an example. In octal, the first few place values are 1, 8, 64, and 512.

Left to Right

First, compute the first digit (the most significant digit) by finding the biggest place value that is less than or equal to your number. Then, find the largest digit you can multiply by that place value while staying at or below your number. That’s your converted digit. Take that product (the place value times that digit) and subtract it from your value. Now you have a residual value; start from the beginning of these instructions and repeat to get the rest of the digits.

Let’s try it by converting 637 to octal.

  • The biggest place value under 637 is 512. \( 512 \times 2 \) doesn’t stay “under the limit,” so we have to settle for \( 512 \times 1 \). That means the first digit of the converted number is 1. The residual value is \( 637 - 512 \times 1 = 125 \).
  • The value that “fits under” 125 is \( 64 \times 1 \). So the second digit is also 1. The residual value is \( 125 - 64 \times 1 = 61 \).
  • We’re now at the second-to-least-significant digit, with place value 8. The largest multiple that “fits under” 61 is \( 8 \times 7 \), so the next digit is 7 and the residual value is \( 61 - 8 \times 7 = 5 \).
  • This is the ones place, so the final digit is 5.

So the converted value is \( 1175_8 \).

Right to Left

First, compute the least significant digit by dividing the number by the base, \(b\). Get both the quotient and remainder. The remainder is the number of ones you have, so that’s your least significant digit. The quotient is the number of \(b\)s you have, so that’s the residual value that we will continue with.

Next, repeat with that residual value. Remember, you can think of that as the number of \(b\)s that remain. So when we divide by \(b\), the remainder is the number of \(b\)s and the quotient is the number of \(b^2\)s. So the remainder is the second-to-least-significant digit, and we can continue around the loop with the quotient. Stop the loop when the residual value becomes zero.

Let’s try it again with 637.

  • \( 637 \div 8 = 79 \) with remainder 5. So the least significant digit is 5.
  • \( 79 \div 8 = 9 \) with remainder 7. So the next-rightmost digit is 7.
  • \( 9 \div 8 = 1 \) with remainder 1. The next digit is 1.
  • \( 1 \div 8 = 0 \) with remainder 1. So the final, most significant digit is 1.

Fortunately, this method gave the same answer: \( 1175_8 \).
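Once we get to C programming (introduced later in these notes), the right-to-left method takes only a few lines of code. Here is a small sketch, using a helper function of our own invention purely for illustration, which assumes a nonnegative value:

#include <stdio.h>

// Print `value` in base `base` (2 through 16) by repeatedly dividing:
// each remainder is one digit, from least to most significant.
void print_in_base(int value, int base) {
    char digits[64];
    int count = 0;
    do {
        digits[count++] = "0123456789abcdef"[value % base];
        value = value / base;
    } while (value > 0);

    // The digits came out least-significant first, so print them in reverse.
    for (int i = count - 1; i >= 0; --i) {
        putchar(digits[i]);
    }
    putchar('\n');
}

int main() {
    print_in_base(637, 8);  // prints 1175, matching the example above
    return 0;
}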

Programming Language Notation

When writing, we often use the notation \( 1175_8 \) to be explicit that we’re writing a number in base 8 (octal). Subscripts are hard to type in programming languages, so they use a different convention.

In many popular programming languages (at least Java, Python, and the language we will use in 3410: C), you can write:

  • 0b10110 to use binary notation.
  • 0x123abc to use hexadecimal notation.

Octal literals are a little less standardized, but in Python, you can use 0o123 (with a little letter “o”).
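For example, here’s a tiny C program (C is introduced properly later in these notes; this is just a preview) showing that these notations all produce ordinary integer values:

#include <stdio.h>

int main() {
    // Only the notation differs; both variables hold plain int values.
    int b = 0b10110;   // binary notation for 22
    int x = 0x123abc;  // hexadecimal notation for 1194684
    printf("%d %d\n", b, x);
    return 0;
}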

Addition

To add binary numbers, you can use the elementary-school algorithm for “long addition,” with carrying the one and all that. Just remember that, in binary, 1+1 = 10 and 1+1+1 (i.e., with a carried one) is 11.
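For example, let’s add \(0110_2\) (six) and \(0011_2\) (three). The ones column is \(0 + 1 = 1\). The twos column is \(1 + 1 = 10\): write down 0 and carry 1. The fours column, with the carry, is \(1 + 1 + 0 = 10\) again: write 0, carry 1. That carry lands in the eights column, so the result is \(1001_2\), which is nine.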

Numbers


Signed Numbers

This is all well and good for representing nonnegative numbers, but what if you want to represent \( -10110 \)? Remember, everything must be a bit, so we can’t use the \( - \) sign in our digital representation of negative numbers.

There is an “obvious” way that turns out to be problematic, and a less intuitive way that works out better from a mathematical and hardware perspective. The latter is what modern computers actually use.

Sign–Magnitude

The “obvious” way is sign–magnitude notation. The idea is to reserve the leftmost (most significant) bit for the sign: 0 means positive, 1 means negative.

For example, recall that \( 7_{10} = 111_{2} \). In a 4-bit sign–magnitude representation, we would represent positive \(7\) as 0111 and \(-7\) as 1111.

Sign–magnitude was used in some of the earliest electronic computers. However, it has some downsides that mean that it is no longer a common way to represent integers:

  • It leads to more complicated circuits to implement fundamental operations like addition and subtraction. (We won’t go into why—you’ll have to trust us on this.)
  • Annoyingly, it has two different zeros! There is a “positive zero” (0000 in 4 bits) and a “negative zero” (1000). That just kinda feels bad; there should only be one zero, and it should be neither positive nor negative.

Two’s Complement

The modern way is two’s complement notation. In two’s complement, there is still a sign bit, and it is still the leftmost (most significant) bit in the representation. 1 in the sign bit still means negative, and 0 means positive or zero.

For the positive numbers, things work like normal. In a 4-bit representation, 0001 means 1, 0010 means 2, 0011 means 3, and so on up to 0111, which means positive 7.

The key difference is that, in two’s complement, the negative numbers grow “up from the bottom.” (In the same sense that they grow “down from zero” in sign–magnitude.) That means that 1000 (and in general, “one followed by all zeroes”) is the most negative number: with 4 bits, that’s \(-8\). Then count upward from there: so 1001 is \(-7\), 1010 is \(-6\), and so on up to 1111, which is \(-1\).

Here’s another way to think about two’s complement: start with a normal, unsigned representation and negate the place value of the most significant bit. In other words: in an unsigned representation, the MSB has place value \(2^{n-1}\). In a two’s complement representation, all the other place values remain the same, but the MSB has place value \(-2^{n-1}\) instead.
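For example, reading the 4-bit pattern 1010 with this rule gives

\[ 1010 = 1 \times (-2^3) + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = -8 + 2 = -6 \]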

Here are some cool facts about two’s complement numbers, when using \(n\) bits:

  • The all-zeroes bit string always represents 0.
  • The all-ones bit string always represents \(-1\).
  • The biggest positive value, sometimes known as INT_MAX, is 0 followed by all ones. Its value is \(2^{n-1}-1\).
  • The biggest negative value, sometimes known as INT_MIN, is 1 followed by all zeroes. Its value is \(-2^{n-1}\).
  • Addition works the same as for normal, unsigned binary numbers. You can just ignore the fact that one of the bits is a sign bit, add the two numbers as if they were plain binary values, and you get the right answer in a two’s complement representation! (See the worked example just after this list.)
  • To negate a number i, you can compute ~i + 1, where ~ means “flip all the bits, so every zero becomes one and every one becomes zero.”
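As a quick check of the addition fact: with 4 bits, \(1101\) represents \(-3\) and \(0110\) represents \(6\). Adding them as plain binary gives \(10011\); the carry out of the top bit doesn’t fit in 4 bits, so it is dropped, leaving \(0011\). That’s \(3\), which is indeed \(-3 + 6\).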

Two’s Complement Example

Let’s use a six-bit two’s complement representation. What numbers (in standard decimal notation) do these bit patterns represent?

  • 011000
  • 111111
  • 111011

The answers are:

  • \(24\). For positive numbers (where the sign bit is 0), you don’t have to think much about two’s complement; just read the remaining bits as a normal binary number.
  • \(-1\). Remember the tip from last time: the all-ones bit pattern is always \(-1\).
  • \(-5\). There are many ways to get here. One option is to notice that this number is exactly \(100_2\) less than the all-ones bit pattern, so it’s \(-1 - 4\).

Introduction to C

Hello, C!

Much of the work for CS 3410 will consist of programming in C. If you have mainly programmed in the other Cornell-endorsed languages (Python, Java, and OCaml), the main difference you’ll notice in C is that it operates at a much lower level of abstraction. It gives you a far greater level of control over exactly what the computer does.

While this kind of low-level control is undeniably inconvenient and verbose, it has some extremely important advantages. The most common reasons to use a low-level language like C are:

  • Performance. Higher-level languages trade off convenience for speed. Often, programming in a low-level language is the only way to get the kind of efficiency you need.
  • Interactions with hardware. When you’re writing an operating system, a device driver, or anything else that interacts with hardware directly, you really need a low-level language.

There are other low-level languages that have the same advantages, such as C++ and Rust. However, C is unique because of its central position in the modern computing landscape. We can confidently say that almost everything you’ve ever done with a computer has eventually relied on software written in C. As just a few examples:

  • The Linux kernel is written in C.
  • The primary implementation of Python is written in C.
  • The C standard library is the de facto standard way that software interacts with operating systems. Even Rust programs rely on C’s standard library for things like printing to the console and opening files.
  • In general, whenever two different languages want to talk to each other, they go through C (via a foreign function interface).

Getting Started

Let’s write the smallest possible C program:

int main() {
    return 0;
}

Even this minimal program brings up a few basic things about C:

  • In basic ways, the syntax looks a little like Java. There are curly braces and semicolons. There is even a type called int. (This is because the designers of Java based its syntax on C.)
  • Unlike Java, however, there is no class definition here. You just write a main function at the top level; it’s not a method on some class. In fact, C doesn’t have classes or objects at all.
  • C is a statically typed language (like Java but not like Python). This means that C makes you declare the types of everything you write down. This example shows one type: the return type of the main function is int.
  • That return 0 for main determines the exit status for your program.

Let’s run our program. The commands you see here will assume you have followed our guide to setting up 3410’s RISC-V infrastructure, including setting up the rv alias. The rv alias works as a prefix that gives you access to the tools you need, so you can type any command you like after it. For example, you can type:

$ rv ls

and you’ll see similar results to running plain old ls.

Let’s compile the program, like this:

$ rv gcc minimal.c

where minimal.c is the name of the source file. GCC is the name of the compiler we’ll be using in this course.

That worked, but we actually recommend providing some more command-line options to the compiler whenever you use it. You can copy and paste our recommended options from the C compilation page. Then, add -o minimal to tell GCC where to put the output file (if you don’t, GCC picks the name a.out). So here’s a complete command:

$ rv gcc -Wall -Wextra -Wpedantic -Wshadow -Wformat=2 -std=c23 -o minimal minimal.c

That produces an executable file, minimal. Now let’s run it:

$ rv qemu minimal

That runs the QEMU emulator to execute the compiled minimal program. It won’t print anything at all!

Printing

Here’s a slightly more exciting program:

#include <stdio.h>

int main() {
    printf("Hello, 3410!\n");
    return 0;
}

We’ve added two lines:

  • The #include <stdio.h> line pulls in the declarations for C’s standard I/O library, which is where printf comes from. (We’ll say more about #include later.)
  • The printf call prints its string argument to the console.

The \n in the string is an escape sequence that means a newline character. That’s the same as in Java.

Now let’s declare and print a variable:

#include <stdio.h>

int main() {
    int n = 3410;
    printf("Hello, %d!\n", n);
    return 0;
}

We added a variable declaration of n, with type int. Read more about the basic types in C.

To print out the number, printf exploits format specifiers in the string that you pass to it. Format specifiers look like %d: they always start with %, followed by a few characters that tell printf how to format stuff. The d in this one stands for decimal, because that’s the base it uses.

If you have n format specifiers in your printf string, you should pass n extra arguments after the string to printf. It will print each extra argument using each specified format, in order.
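For example, here’s a small variation on the program above with two format specifiers in one string, and therefore two extra arguments:

#include <stdio.h>

int main() {
    int n = 3410;
    // Two format specifiers, so two extra arguments, formatted in order.
    printf("n = %d and n + 1 = %d\n", n, n + 1);
    return 0;
}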

Let’s try some other format specifiers. %b prints ints in binary, and %x prints them in hex:

#include <stdio.h>

int main() {
    int n = 3410;
    printf("Decimal: %d\n", n);
    printf("Binary: %b\n", n);
    printf("Hexadecimal: %x\n", n);
    return 0;
}

Read more about format specifiers for printf.

Playing with Numbers

C makes it easy to put our new knowledge about binary numbers and two’s complement into practice. We’ll use the int8_t type, which is an integer with exactly 8 bits. (In lots of “normal” code, you can just use int to get a default-sized integer—but for these examples, we really want to use just 8 bits.)

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t n = 7;
    printf("n = %hhd\n", n);
    return 0;
}

The %hhd format specifier is for printing the int8_t type in decimal. We also need to #include the stdint.h library to get the int8_t type.

We can also write our 8-bit number in binary notation:

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t n = 0b00000111;
    printf("n = %hhd\n", n);
    return 0;
}

This should also print 7. An important thing to reassure yourself is that, in the two programs above, the variable n contains exactly the same value. There is no difference between the same number specified in decimal notation and binary notation; the choice is just a convenience for the programmer, and the compiler will translate either one into exactly the same value for the computer. (And that value will be in binary because, of course, everything is bits.)

We can also use the sign bit. What’s this value if we flip the top bit of 7 from 0 to 1?

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t n = 0b10000111;
    printf("n = %hhd\n", n);
    return 0;
}

That prints -121. You can convince yourself this is correct by remembering that, in an 8-bit two’s complement number, the sign bit has place value \(-2^{7} = -128\); the remaining bits here contribute 7, and \(-128 + 7 = -121\).

A Little More C

Let’s try the inversion trick from last time: the identity that, in two’s complement, ~x + 1 is equal to -x.

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t n = 7;
    printf("n (binary) = %hhb\n", n);
    printf("n (decimal) = %hhd\n", n);

    int8_t flipped = ~n + 1;
    printf("flipped (binary) = %hhb\n", flipped);
    printf("flipped (decimal) = %hhd\n", flipped);

    return 0;
}

That worked for 7. To see a little more of C, let’s try checking that this works for every number we can represent with an int8_t.

#include <stdio.h>
#include <stdint.h>

int8_t flip(int8_t num) {
    return ~num + 1;
}

int main() {
    for (int8_t i = -128; i < 127; ++i) {
        //printf("i = %hhd\n", i);
        int8_t negated = -i;
        int8_t flipped = flip(i);
	printf("i = %hhd, neg = %hhd, flip = %hhd\n", i, negated, flipped);

        if (negated != flipped) {
            printf("mismatch!\n");
        }
    }
    return 0;
}

This example shows off C’s for loops and if conditions. If you’re familiar with Java, these should look pretty familiar. Read more about control flow in C.

It also demonstrates function definitions in C.

If you run this program, there are no mismatches! So we can be pretty sure this trick works for all the int8_t values, even if you don’t want to try doing the math.

Overflow

Computer representations of integers (usually) have a fixed width, i.e., the number of bits they use: for example, int8_t always has 8 bits. This has some fun consequences.

In our last example, we had to think through the minimum and maximum values you can store in an int8_t. What happens if you exceed those limits?

The C language has pretty annoying rules about this. For signed numbers, it is actually a silent error (a concept known as undefined behavior) to exceed the maximum, e.g., to add 1 to the biggest possible signed number. But it’s legal to do this for unsigned numbers. So we’ll try it out with the type uint8_t, which is the unsigned (only-nonnegative) version of our friend int8_t. Here’s a loop that just adds 1 to a uint8_t value many times:

#include <stdio.h>
#include <stdint.h>

int main() {
    uint8_t num = 0;
    for (int i = 0; i < 500; ++i) {
        num += 1;
        printf("num = %hhu\n", num);
    }
    return 0;
}

If you run this program, you’ll see the number counting up from 1. When we reach 255, adding 1 takes us right back down to 0.

It can be helpful to think about the bits. 255 is the all-ones bit pattern: in 8 bits, 1111 1111. (Sometimes it’s helpful to put spaces in your binary numbers to group together 4 bits, just for legibility.) Adding one to this will “carry” all the way across, setting every bit to zero. The last carry bit would go in position 9, but because this is an 8-bit representation, the computer just drops that bit. And so, the result of the addition 1111 1111 + 0000 0001 is 0000 0000.

This behavior is called integer overflow and it is the source of many fun bugs in all kinds of software. Memorably, YouTube originally used a signed 32-bit number (i.e., an int) to represent the number of views for a video. That meant that the largest number of views that any video could have was \(2^{31} - 1\), or 2,147,483,647 views. The first video to exceed this number of views was Psy’s “Gangnam Style”. YouTube made a cute announcement when they had to change that value to a 64-bit integer. That should be plenty of views for a long time (more than 9 quintillion views).

Prototypes, Headers, Libraries, and Linking

There is a lot more to explore about C programming that you will learn through doing assignments in 3410. But here is one more concept I think will be helpful to see early.

Declarations Must Precede Uses

Here’s a tiny program with one function call:

#include <stdio.h>

void greet(const char* name) {
    printf("Hello, %s!\n", name);
}

int main() {
    greet("3410");
}

(As an aside, void is the “return type” you use for functions that don’t return anything, and const char* is the type of a string literal. We’ll learn more about why the * is in there later in the course.)

A fun quirk about C is that it wants declarations to come before uses. That means that it won’t work to call greet before we define it, like in this broken program:

#include <stdio.h>

int main() {
    greet("3410");
}

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

Prototypes, a.k.a. Declarations

As you can imagine, this restriction can get frustrating, and it becomes unworkable if you need mutual recursion. The way to fix it is to use a prototype, a.k.a. a declaration. A function declaration looks a lot like a function definition but omits the body. So this program works:

#include <stdio.h>

void greet(const char *name);

int main() {
    greet("3410");
}

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

We just need to copy and paste the “signature” part of the function definition, put it at the top of the file, and add a semicolon. That makes it a declaration that means that the call to greet is legal.

Header Files

The need for these declarations is so common that programmers typically put them in a whole separate C source code file, called a header file. Header files are C files that, by convention, end with a .h instead of a .c and mostly just contain declarations. So we might put the declaration in greet.h:

void greet(const char *name);

We can use this declaration by #include-ing it:

#include <stdio.h>
#include "greet.h"

int main() {
    greet("3410");
}

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

Notice the difference between the #include <stdio.h> line and the #include "greet.h" line. The angle brackets search for built-in library headers; the quotation marks are for header files you write yourself and tell the compiler to look in the same directory as the source file.

In either case, #include works a lot like just “copying and pasting” the entire text of the file into your source program. So #include-ing greet.h looks the same to the compiler as a version that just includes the declaration right there.

Separating Source Files

Headers are also part of the mechanism that lets you break up long .c source files. Let’s say we want to create a separate greet.c library that just contains our greeting function:

#include <stdio.h>
#include "greet.h"

void greet(const char *name) {
    printf("Hello, %s!\n", name);
}

Then, our main.c can use the library like this:

#include <stdio.h>
#include "greet.h"

int main() {
    greet("3410");
}

By “copying and pasting” the contents of greet.h here, the #include sorta works as a way to “import” the greet function so we can use it in main.

Linking Multiple Files

Now, however, we need a way to combine the two .c files into a single executable. One option is to just give both source files on the command line:

$ rv gcc main.c greet.c -o main

Notice that we don’t list header files when compiling the whole thing: only .c files, not .h files. Header files are just for #include-ing into other files, so the compiler already sees the contents of those files implicitly.

There’s another way too: it can be useful to compile the .c files separately and then link them together. Here’s what that looks like:

$ rv gcc -c main.c -o main.o
$ rv gcc -c greet.c -o greet.o
$ rv gcc main.o greet.o -o main

The first two lines, with -c, compile the source files to object files that end in .o. Then, the last command links the two object files together into an executable.

Separating it out this way can save you time. If you only change greet.c, for example, then you only need to re-compile that file and then re-link; you can skip re-compiling the unchanged main.c.

Floating Point

Like other languages you’ve used before, C has a float type that works for numbers with a decimal point in them:

#include <stdio.h>

int main() {
    float n = 8.4f;
    printf("%f\n", n * 5.0f);
    return 0;
}

But how does float actually work? How do we represent fractional numbers like this at the level of bits? The answers have profound implications for the performance and accuracy of any software that does serious numerical computation.

For example, see if you can predict what the last line of this example will print:

#include <stdio.h>

int main() {
    float x = 0.00000001f;
    float y = 0.00000002f;

    printf("x = %e\n", x);
    printf("y = %e\n", y);
    printf("y - x = %e\n", y - x);

    printf("1+x = %e\n", 1.0f + x);
    printf("1+y = %e\n", 1.0f + y);
    printf("(1+y) - (1+x) = %e\n", (1.0f + y) - (1.0f + x));

    return 0;
}

Understanding how float actually works is the key to avoiding surprising pitfalls like this.

Real Numbers in Binary

Before we get to computer representations, let’s think about binary numbers “on paper.” We’ve seen plenty of integers in binary notation; we can extend the same thinking to numbers with fractional parts.

Let’s return to elementary school again and think about how to read the decimal number 19.64. The digits to the right of the decimal point have place values too: those are the “tenths” and “hundredths” places. So here’s the value that decimal notation represents:

\[ 19.64_{10} = 1 \times 10^1 + 9 \times 10^0 + 6 \times 10^{-1} + 4 \times 10^{-2} \]

Beyond the decimal point, the place values are negative powers of ten. We can use exactly the same strategy in binary notation, with negative powers of two. For example, let’s read the binary number 10.01:

\[ 10.01_2 = 1 \times 2^1 + 0 \times 2^0 + 0 \times 2^{-1} + 1 \times 2^{-2} \]

So that’s \( 2 + \frac{1}{4} \), or 2.25 in decimal.

The moral of this section is: binary numbers can have points too! But I suppose you call it the “binary point,” not the “decimal point.”

Fixed-Point Numbers

Next, computers need a way to encode numbers with binary points in bits. One way, called a fixed-point representation, relies on some sort of bookkeeping on the side to record the position of the binary point. To use fixed-point numbers, you (the programmer) have to decide two things:

  • How many bits are we going to use to represent our numbers? Call this bit count \(n\).
  • Where will the binary point go? Call this position \(e\) for exponent. By convention, \(e=0\) means the binary point goes at the very end (so it’s just a normal integer), and \(e=-1\) means there is one bit after the binary point.

The idea is that, if you read your \(n\) bits as an integer \(i\), then the number those bits represent is \(i \times 2^{e}\). (This should look a little like scientific notation, where you might be accustomed to writing numbers like \(34.10 \times 10^{-5}\). It’s sort of like that, but with a base of 2 instead of 10.)

For example, let’s decide we’re going to use a fixed-point number system with 4 bits and a binary point right in the middle. In other words, \(n = 4\) and \(e = -2\). In this number system, the bit pattern 1001 represents the value \(10.01_2\) or \(2.25_{10}\).

It’s also possible to have positive exponents. If we pick a number system with \(n = 4\) and \(e = 2\), then the same bit pattern 1001 represents the value \(1001_2 \times 2^2 = 100100_2\), or \(36_{10}\). So positive exponents have the effect of tacking \(e\) zeroes onto the end of the binary number. (Sort of like how, in scientific notation, \(\times 10^e\) tacks \(e\) zeroes onto the end.)

Let’s stick with 4 bits and try it out. If \(e = -3\), what is the value represented by 1111? If \(e = 1\), what is the value represented by 0101?
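If you want to check your answers, here’s a small C sketch (a helper function of our own, just for illustration) that applies the \(i \times 2^{e}\) rule to a bit pattern:

#include <stdio.h>
#include <stdint.h>

// Interpret `bits` as a fixed-point number with exponent `e`:
// read the bits as an unsigned integer i, then scale by 2^e.
double fixed_point_value(uint8_t bits, int e) {
    double value = bits;
    while (e > 0) { value *= 2.0; e--; }
    while (e < 0) { value /= 2.0; e++; }
    return value;
}

int main() {
    // With n = 4 and e = -2, the bit pattern 1001 represents 2.25.
    printf("%f\n", fixed_point_value(0b1001, -2));
    return 0;
}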

The best and worst thing about fixed-point numbers is that the exponent \(e\) is metadata and not part of the actual data that the computer stores. It’s in the eye of the beholder: the same bit pattern can represent many different numbers, depending on the exponent that the programmer has in mind. That means the programmer has to be able to predict the values of \(e\) that they will need for any run of the program.

That’s a serious limitation, and it means that this strategy is not what powers the float type. On the other hand, if programs can afford the complexity to deal with this limitation, fixed-point numbers can be extremely efficient—so they’re popular in resource-constrained application domains like machine learning and digital signal processing. Most software, however, ends up using a different strategy that makes the exponent part of the data itself.

Floating-Point Numbers

The float type gets its name because, unlike a fixed-point representation, it lets the binary point float around. It does that by putting the point position right into the value itself. This way, every float can have a different \(e\) value, so different floats can exist on very different scales:

#include <stdio.h>

int main() {
    float n = 34.10f;
    float big = n * 123456789.0f;
    float small = n / 123456789.0f;
    printf("big = %e\nsmall = %e\n", big, small);
    return 0;
}

The %e format specifier makes printf use scientific notation, so we can see that these values have very different magnitudes.

The key idea is that every float actually consists of three separate unsigned integers, packed together into one bit pattern:

  • A sign, \(s\), which is a single bit.
  • The exponent, an unsigned integer \(e\).
  • The significand (also called the mantissa), another unsigned integer \(g\).

Together, a given \(s\), \(e\), and \(g\) represent this number:

\[(-1)^s \times 1.g \times 2^{e-127}\]

…where \(1.g\) is some funky notation we’ll get to in a moment. Let’s break it down into the three terms:

  • \((-1)^s\) makes \(s\) work as a sign bit: 0 for positive, 1 for negative. (Yes, floating point numbers use a sign–magnitude strategy: this means that +0.0 and -0.0 are distinct float values!)
  • \(1.g\) means “take the bits from \(g\) and put them all after the binary point, with a 1 in the ones place.” The significand is the “main” part of the number, so (in the normal case) it always represents a number between 1.0 and 2.0.
  • \(2^{e-127}\) is a scaling term, i.e., it determines where the binary point goes. The \(-127\) in there is a bias: this way, the unsigned exponent value \(e\) can work to represent a wide range of both positive and negative binary-point position choices.

The float type is actually an international standard, universally implemented across programming languages and hardware platforms. So it behaves the same way regardless of the language you’re programming in and the CPU or GPU you run your code on. It works by packing the three essential values into 32 bits. From left to right:

  • 1 sign bit
  • 8 exponent bits
  • 23 significand bits

To get more of a sense of how float works at the level of bits, now would be a great time to check out the amazing tool at float.exposed. You can click the bits to flip them and make any value you want.

Conversion Examples

As an exercise, we can try converting decimal numbers to floating-point representations by hand and using float.exposed to check our work. Let’s try representing the value 8.25 as a float:

  1. First, let’s convert it to binary: \(1000.01_2\)
  2. Next, normalize the number by shifting the binary point and multiplying by \(2^{\text{something}}\): \(1.00001 \times 2^3\)
  3. Finally, break down the three components of the float:
    • \(s = 0\), because it’s a positive number.
    • \(g\) is the bit pattern starting with 00001 and then a bunch of zeroes, i.e., we just read the bits after the “1.” in the binary number.
    • \(e = 3 + 127\), where the 3 comes from the power of two in our normalized number, and we need to add 127 to account for the bias in the float representation.

Try entering these values (0, 00001000…, and 130) into float.exposed to see if it worked. It’s easiest to enter the exponent in the little text box and the significand by clicking bits in the bit pattern.

Can you convert -5.125 in the same way?

Checking In with C

To prove that float.exposed agrees with C, we can use a little program that reinterprets the bits it produces as a float and prints the result:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main() {
    uint32_t bits = 0x41040000;

    // Copy the bits to a variable with a different type.
    float val;
    memcpy(&val, &bits, sizeof(val));

    // Print the bits as a floating-point number.
    printf("%f\n", val);
    return 0;
}

The memcpy function just copies bits from one location to another. Don’t worry about the details of how to invoke it yet; we’ll cover that later in 3410.

We can also use bit operations, such as shifts and a bitwise AND with a mask, to isolate the sign, exponent, and significand from the 32 bits of a float:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main() {
    uint32_t bits = 0x41040000;
    uint32_t significand = bits & 0x007fffff; // mask and isolate the mantissa
    uint32_t exponent = (bits & 0x7f800000) >> 23; // mask and bit shift
    uint32_t sign = (bits & 0x80000000) >> 31; // mask and bit shift

    // Print the components of a floating-point number.
    printf("s = %b, e = %b, g = %b \n", sign, exponent, significand);
    return 0;
}

Special Cases

Annoyingly, we haven’t yet seen the full story for floating-point representations. The above rules apply to most float values, but there are a few special cases:

  • To represent +0.0 and -0.0, you have to set both \(e = 0\) and \(g = 0\). (That is, use all zeroes for all the bits in both of those ranges.) We need this special case to “override” the significand’s implicit 1 that would otherwise make it impossible to represent zero. And requiring that \(e=0\) ensures that there are only two zero values, not many different zeroes with different exponents.
  • When \(e = 0\) but \(g \neq 0\), that’s a denormalized number. The rule is that denormalized numbers represent the value \((-1)^s \times 0.g \times 2^{-126}\). The important difference is that we now use \(0.g\) instead of \(1.g\). These values are useful to eke out the last drops of precision for extremely small numbers.
  • When \(e\) is “all ones” and \(g = 0\), that represents infinity. (Yes, we have both +∞ and -∞.)
  • When \(e\) is “all ones” and \(g \neq 0\), the value is called “not a number” or NaN for short. NaNs arise to represent erroneous computations.

The rules around infinity and NaN can be a little confusing. For example, dividing zero by zero is NaN, but dividing other numbers by zero is infinity:

#include <stdio.h>

int main() {
    printf("%f\n", 0.0f / 0.0f);  // NaN
    printf("%f\n", 5.0f / 0.0f);  // infinity
    return 0;
}

Other Floating-Point Formats

All of this so far has been about one (very popular) floating-point format: float, also known as “single precision” or “32-bit float” or just f32. But there are many other formats that work using the same principles but with different details. A few to be aware of are:

  • double, a.k.a. “double precision” or f64, is a 64-bit format. It offers even more accuracy and dynamic range than 32-bit floats, at the cost of taking up twice as much space. There is still only one sign bit, but you get 11 exponent bits and 52 significand bits.
  • Half-precision floating point goes in the other direction: it’s only 16 bits in total (5 exponent bits, 10 significand bits).
  • The bfloat16 or “brain floating point” format is a different 16-bit floating-point format that was invented recently specifically for machine learning. It is just a small twist on “normal” half-precision floats that reallocates a few bits from the significand to the exponent (8 exponent bits, 7 significand bits). It turns out that having extra dynamic range, at the cost of precision, is exactly what lots of deep learning models need. So it has very quickly become implemented in lots of hardware.

Some General Guidelines

Now that you know how floating-point numbers work, we can justify a few common pieces of advice that programmers often get about using them:

  • Floating-point numbers are not real numbers. Expect to accumulate some error when you use them.
  • Never use floating-point numbers to represent currency. When people say $123.45, they want that exact number of cents, not $123.40000152. Use an integer number of cents: i.e., a fixed-point representation with a fixed decimal point.
  • If you ever end up comparing two floating-point numbers for equality, with f1 == f2, be suspicious. For example, try 0.1 + 0.2 == 0.3 to be disappointed. Consider using an “error tolerance” in comparisons, like abs(f1 - f2) < epsilon (see the sketch just after this list).
  • Floating-point arithmetic is slower and costs more energy than integer or fixed-point arithmetic. You get what you pay for: the flexibility of floating-point operations means that they are fundamentally more complex for the hardware to execute. That’s why many practical machine learning systems convert (quantize) models to a fixed-point representation so they can run efficiently.
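Here’s the sketch promised above for the equality-comparison advice. (The unsuffixed literals 0.1 and 0.2 are doubles in C, and the tolerance below is an arbitrary choice for illustration, not a universally right value.)

#include <stdio.h>
#include <math.h>

int main() {
    double a = 0.1 + 0.2;
    double b = 0.3;

    // Exact equality fails: a is actually 0.30000000000000004...
    printf("a == b? %d\n", a == b);

    // A tolerance-based comparison instead.
    double epsilon = 1e-9;
    printf("close enough? %d\n", fabs(a - b) < epsilon);

    return 0;
}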

For many more details and much more advice, I recommend “What Every Computer Scientist Should Know About Floating-Point Arithmetic” by David Goldberg.

Data Types in C

Type Aliases

Don’t like the names of types in C? You can create type aliases to give them new names:

#include <stdio.h>

typedef int number;

int main() {
    number x = 3410;
    int y = x / 2;
    printf("%d %d\n", x, y);
}

Use typedef <old type> <new type> to declare a new name.

This admittedly isn’t very useful by itself, but it will come in handy as types get more complicated to write. See the C reference pages on typedef for more.

Structures

In C, you can declare structs to package up multiple values into a single, aggregate value:

#include <stdio.h>

struct point {
    int x;
    int y;
};

void print_point(struct point p) {
    printf("(%d, %d)\n", p.x, p.y);
}

int main() {
    struct point location = {4, 10};
    location.y = 2;
    print_point(location);
}

Structs are a little like objects in other languages (e.g., Java), but they don’t have methods—only fields. You use “dot syntax” to read and write the fields. This example also shows off how to initialize a new struct, with curly brace syntax:

struct point location = {4, 10};

You supply all the fields, in order, in the curly braces of the initializer.

Again, there is a section in the C reference pages for more on struct declarations.

Short Names for Structs

The type of the struct in the previous example is struct point. It’s common to give structs like these short names, for which typedef can help:

#include <stdio.h>

typedef struct {
    int x;
    int y;
} point_t;

void print_point(point_t p) {
    printf("(%d, %d)\n", p.x, p.y);
}

int main() {
    point_t location = {4, 10};
    location.y = 2;
    print_point(location);
}

This version uses a typedef to give the struct the shorter name point_t instead of struct point. By convention, C programmers often use <something>_t for custom type names to make them stand out.

Enumerations

There is another kind of “custom” data type in C, called enum. An enum is for values that can be one of a short list of options. For example, we can use it for seasons:

#include <stdio.h>

typedef enum {
    SPRING,
    SUMMER,
    AUTUMN,
    WINTER,
} season_t;

int main() {
    season_t now = WINTER;
    season_t next = SPRING;
    printf("%d %d\n", now, next);
    return 0;
}

We’re using the same typedef trick as above to give this type the short name season_t instead of a longer name like enum season.

Enums are useful to avoid situations where you would otherwise use a plain integer. They’re more readable and maintainable than trying to keep track of which number means which season in your head.

There is a reference page on enums too.

Arrays & Pointers

Arrays

Like other languages you have used before, C has arrays. Here’s an example:

#include <stdio.h>

int main() {
    int courses[7] = {1110, 1111, 2110, 2112, 2800, 3110, 3410};

    int course_total = 0;
    for (int i = 0; i < 7; ++i) {
        course_total += courses[i];
    }
    printf("the average course is CS %d\n", course_total / 7);

    return 0;
}

You declare an array of 7 ints like this:

int courses[7];

And you can also, optionally, provide an initial value for all of the things in the array, as we do in the example above:

int courses[7] = {1110, 1111, 2110, 2112, 2800, 3110, 3410};

You access arrays like courses[i]. This works for both reading and writing. You can read more about arrays in the C reference pages.

Pointers

Pointers are (according to me) the essential feature of C. They are what make it C. They are simultaneously dead simple and wildly complex. They can also be the hardest aspect of C programming to understand. So forge bravely on, but do not worry if they seem weird at first. Pointers will feel more natural with time, as you gain more experience as a C programmer.

Memory

Pointers are a way for C programs to talk about memory, so we first need to consider what memory is.

It’s helpful to think of a simplified computer architecture diagram, consisting of a processor and a memory. The processor is where your C code runs; it can do any computation you want, but it can’t remember anything. The memory is where all the data is stored; it remembers a bunch of bits, but it doesn’t do any computation at all. They are connected—imagine wires that allow them to send signals (made of bits) back and forth. There are two things the CPU can do with the memory: it can load the value at a given address of its choosing, and it can store a new value at an address.

Abstractly, we can think of memory as a giant array of bytes. Metaphorically speaking (not actually!), it might be helpful to imagine a C declaration like this:

uint8_t mem[SIZE];

where SIZE is the total number of bytes in your machine. Several billion, surely. In this metaphor, the processor reads from memory by doing something like mem[123], and it writes by doing mem[123] = 45 in C. The “address” works like an index into this metaphorical array of bytes.

Maybe the most important thing to take away from this metaphor is that an address is just bits. Because, after all, everything is just bits. You can think of those bits as an integer, i.e., the index of the byte you’re interested in within the imaginary mem array.

A Pointer is an Address

In C, a pointer is a value that holds a memory address. You can think of a pointer as logically pointing to the value stored at that address, hence the name.

But I’ll say it again, because it’s important: pointers are just bits. Recall that a double variable and an int64_t variable are both 64-bit values—from the perspective of the computer, there is no difference between these kinds of values. They are both just groups of 64 bits, and only the way the program treats these bits makes them an integer or a floating-point number. Pointers are the same way: they are nothing more than 64-bit values, treated by programs in a special way as addresses into memory.

The size of pointers (the number of bits) depends on the machine you’re running on. In this class, all our code is compiled for the RISC-V 64-bit architecture, so pointers are always 64 bits. (If you’ve ever heard a processor called a “32-bit” or “64-bit” architecture, that number probably describes the size of pointers, among other values. Most modern “normal” computers (servers, desktops, laptops, and mobile devices) use 64-bit processors, but 32-bit and narrower architectures are still commonplace in embedded systems.)

Pointer Types and Reference-Of

In C, the type of a pointer to a value of type T is T*. For example, a pointer to an integer has type int*, a pointer to a floating-point value has type float*, and a pointer to a pointer to a character has type char**.

To reiterate, all of these types are nothing more than 64-bit memory addresses. The only difference is in the way the program treats those addresses: e.g., the program promises to only store an int in memory at the address contained in an int*.

In C, you can think of all data in the program as “living” in memory. So every variable and every function argument exists somewhere in the giant metaphorical mem array we imagined above. That means that every variable has an address: the index in that huge array where it lives.

C has a built-in operator to obtain the address for any variable. The & operator, called the reference-of operator, takes a variable and gives you a pointer to the variable. For example, if x is an int variable, then &x is the address where x is stored in memory, with type int*.

Here’s an example where we use & to get the address of a couple of variables:

#include <stdio.h>

int main() {
    int x = 34;
    int y = 10;

    int* ptr_to_x = &x;
    int* ptr_to_y = &y;

    printf("ints are %lu bytes\n", sizeof(int));
    printf("pointers are %lu bytes\n", sizeof(int*));
    printf("x is located at %p\n", ptr_to_x);
    printf("y is located at %p\n", ptr_to_y);

    return 0;
}

We’re also using the %p format specifier for printf, which prints out memory addresses in hexadecimal format. (By convention, programmers almost always use hex when writing memory addresses.) Here’s what this program printed once on my machine:

ints are 4 bytes
pointers are 8 bytes
x is located at 0x1555d56bbc
y is located at 0x1555d56bb8

The built-in sizeof operator tells us that pointers are 8 bytes (64 bits) on our RISC-V 64 architecture, which makes sense. ints are 4 bytes, as they are on many modern platforms. The system is free to choose different addresses for variables, so don’t worry if the addresses are different when you run this program—that’s perfectly normal.

In this output, however, the system is telling us that it chose very nearby addresses for the x and y variables: the first 60 bits of these addresses are identical. The address of x ends in the 4 bits corresponding to the hex digit c (12 in decimal), and y lives at an address ending in 8. That means that x and y are located right next to each other in memory: y occupies the 4 bytes at addresses …6bb8, …6bb9, …6bba, and …6bbb, and then the 4 bytes for x begin at the very next address, …6bbc.

Whitespace Insensitivity

In C, it doesn’t matter where you put the whitespace in a pointer declaration. int* x, int *x, and int * x all mean exactly the same thing. We will tend to write declarations like int* x, although you’ll often see int *x in real-world C code. You can use whichever you prefer.

Everything Has an Address, Including Pointers

Just to emphasize the idea that, in C, all variables live somewhere in memory, let’s take a moment to appreciate that ptr_to_x and ptr_to_y are themselves variables. So they also have addresses:

#include <stdio.h>

int main() {
    int x = 34;
    int y = 10;

    int* ptr_to_x = &x;
    int* ptr_to_y = &y;

    printf("ints are %lu bytes\n", sizeof(int));
    printf("pointers are %lu bytes\n", sizeof(int*));
    printf("x is located at %p\n", ptr_to_x);
    printf("y is located at %p\n", ptr_to_y);
    printf("ptr_to_x is located at %p\n", &ptr_to_x);
    printf("ptr_to_y is located at %p\n", &ptr_to_y);

    return 0;
}

Always remember: pointers are just bits, and pointer-typed variables follow the same rules as any other variables.

Pointers as References, and Dereferencing

While pointers are (like everything else) just bits, what makes them useful is that it’s also possible to think of them in a different way: as references to other values. From this perspective, pointers in C resemble references in other languages you have used: it is the power you need to create variables that refer to other values.

The key C feature that makes this view possible is its * operator, called the dereference operator. The C expression *p means, roughly, “take the pointer p and follow it to wherever it points in memory, so I can read or write that value (not p itself).”

You can use the * operator both to load from (read) and store to (write) memory. Imagine a pointer p of type int*. Here’s how you read from the place where p points:

int value = *p;

And here’s how you write to that location where p points:

*p = 5;

When you’re reading, *p can appear anywhere in a larger expression too, so you can use *p + 5 to load the value p points to and then add 5 to that integer.

All this means that you can use pointers and dereferencing to perform “remote control” accesses to other variables, in the same way that references work in other programming languages. Here’s an example:

#include <stdio.h>

int main() {
    int x = 34;
    int y = 10;

    int* ptr = &x;

    printf("initially, x = %d and y = %d and ptr = %p\n", x, y, ptr);
    *ptr = 41;
    printf("afterward, x = %d and y = %d and ptr = %p\n", x, y, ptr);

    return 0;
}

The point of this example is that modifying *ptr changes the value of x. It does not, however, change the value of ptr itself: that still points to the same place.

To emphasize that pointer-typed variables behave like any other variable, we can also try assigning to the pointer variable. It is absolutely critical to recognize the subtle difference between assigning to *ptr and assigning to ptr:

#include <stdio.h>

int main() {
    int x = 34;
    int y = 10;

    int* ptr = &x;

    printf("0: x = %d and y = %d and ptr = %p\n", x, y, ptr);
    *ptr = 41;
    printf("1: x = %d and y = %d and ptr = %p\n", x, y, ptr);
    ptr = &y;
    printf("2: x = %d and y = %d and ptr = %p\n", x, y, ptr);
    *ptr = 20;
    printf("3: x = %d and y = %d and ptr = %p\n", x, y, ptr);

    return 0;
}

The thing to pay attention to here is that assigning to ptr just changes ptr itself; it does not change x or y. (That’s the rule for assigning to any variable, not just pointers!) Then, when we assign to *ptr the second time, it updates y, because that’s where ptr now points.

I hope this kind of “variables that reference other variables” thinking is familiar from using other languages, where references are extremely common. The difference in C is that there is no magic: we get reference behavior out of the “raw materials” of bits, by treating some 64-bit values as addresses in memory. Under the hood, this is how references in other languages are implemented too—but in C, we get direct access to the underlying bits.

Arrays are Mostly Just Pointers

Now that we know about pointers, let’s revisit arrays. In C, an array is a sequence of values all laid out next to each other in memory. We can use the & reference-of operator to check out the addresses of the elements in an array:

#include <stdio.h>

int main() {
    int courses[7] = {1110, 1111, 2110, 2112, 2800, 3110, 3410};

    printf("first element is at %p\n", &courses[0]);
    printf(" next element is at %p\n", &courses[1]);
    printf(" last element is at %p\n", &courses[6]);

    return 0;
}

When I ran this program on my machine once, it told me that the first element of the array was located at address 0x1555d56b90, the next element was at 0x1555d56b94, and so on, with each address increasing by 4 with each element. Remember that ints are 4 bytes on our platform, so these addresses mean that the elements are packed densely, each one next to the other.

You can think of the array as having a base address \(b\). Then, the element at index \(i\) has this address:

\[ b + s \times i \]

where \(s\) is the size of the elements, in bytes.

Treat an Array as a Pointer to the First Element

In fact, C lets you treat an array itself as if it were a pointer to the first element: i.e., the base address \(b\). This works, for example:

#include <stdio.h>

int main() {
    int courses[7] = {1110, 1111, 2110, 2112, 2800, 3110, 3410};

    printf("first element is at %p\n", &courses[0]);
    printf("the array itself is %p\n", courses);

    return 0;
}

And C tells us that, if we treat courses as a pointer, it has the same address as its first element. From that perspective, it is helpful to think of an array variable as storing the address of the first element of the array. One important takeaway from this realization is that C does not store the length of your array anywhere—just a pointer to the first element. It’s up to you to keep track of the length yourself somehow.

This means that, if you want to pass an array to a function, you can use a pointer-typed argument:

#include <stdio.h>

int sum_n(int* vals, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += vals[i];
    }
    return total;
}

int main() {
    int courses[7] = {1110, 1111, 2110, 2112, 2800, 3110, 3410};

    int sum = sum_n(courses, 7);
    printf("the average course is CS %d\n", sum / 7);

    return 0;
}

If you do, it is always a good idea to pass the length of the array in a separate argument. The subscript syntax, like vals[i], works the same way for pointers as it does for arrays.

Function Parameters: int arr[] or int* arr?

C also lets you declare function parameters with actual array types (e.g., int arr[]) instead of pointer types (e.g., int* arr). This can quickly get confusing, however, and it has very few benefits over just using pointers—so we recommend against it in essentially every case. Just use pointer types whenever you need to pass an array as an argument to a function.

Pointer Arithmetic

Since we’ve seen that the elements of an array exist right next to each other in memory, can we access them by computing their addresses ourselves? Absolutely! C supports arithmetic operators like + and - on pointers, but they follow a special rule you will need to remember. Here’s an example:

#include <stdio.h>

void experiment(int* courses) {
    printf("courses     = %p\n", courses);
    printf("courses + 1 = %p\n", courses + 1);
}

int main() {
    int courses[7] = {1110, 1111, 2110, 2112, 2800, 3110, 3410};
    experiment(courses);
    return 0;
}

The important thing to notice here is that adding 1 to courses increased its value by 4, not by 1. That’s because the rule in C is that pointer arithmetic “moves” pointers by element-sized chunks. So because courses has type int*, its element size is 4 bytes. The rule says that, if you write the expression courses + n, that will actually add \(n \times 4\) bytes to the address value of courses.

This may seem odd, but it’s extremely useful: it means that pointer arithmetic stays pointing to the first byte of an element. If you think of courses itself as a pointer to the first int in the array, then courses + 1 points to the (first byte of) the second int in the array. It would be inconvenient and annoying if doing +1 just took us to the second byte in the first element; nobody wants that.

A consequence is that we can use pointer arithmetic directly, along with the dereferencing operator *, to access the elements of an array:

#include <stdio.h>

void experiment(int* courses) {
    printf("courses[0] = %d\n", *(courses + 0));
    printf("courses[5] = %d\n", *(courses + 5));
}

int main() {
    int courses[7] = {1110, 1111, 2110, 2112, 2800, 3110, 3410};
    experiment(courses);
    return 0;
}

Now that you know how arrays and pointer arithmetic work, you don’t actually need the subscripting operator! Instead of writing arr[idx], you can always just use *(arr + idx). It means the same thing.

Here’s a fun but mostly useless fact about C programming. Since arr[idx] means exactly the same thing as *(arr + idx), and because + is commutative, this also means the same thing as *(idx + arr), which can—by the same rules—also be written as idx[arr]. So if you really want to confuse the people reading your code, you can always write your array indexing expressions backward:

#include <stdio.h>

void experiment(int* courses) {
    printf("courses[0] = %d\n", 0[courses]);
    printf("courses[5] = %d\n", 5[courses]);
}

int main() {
    int courses[7] = {1110, 1111, 2110, 2112, 2800, 3110, 3410};
    experiment(courses);
    return 0;
}

But this is, uh, not a great idea in the real world, where your code will actually be read by humans with thoughts and feelings.

Strings are Null-Terminated Character Arrays

Our new knowledge about pointers and arrays now lets us revisit another concept we’ve already been using in C: strings. You may recall that we previously told you not to worry about why strings in C have the type char*. Now we can demystify this fact: strings in C are arrays of char values, each of which is a single character.

On most modern systems (including our RISC-V target), char is a 1-byte (8-bit) type. So each char in a string is a number between 0 and \(2^8-1\), i.e., 255. Programs use a text encoding to decide which number represents which textual character. An extremely popular encoding that includes the basic English alphabet is ASCII. But C saves you the trouble of looking up characters in the ASCII table; you can use a literal 'q' (note the single quotes!) to get a char with the numeric value corresponding to a lower-case q character.

As with any other array in C, a string just consists of a pointer to the first element (the first character in this case). So when you see char* str, you can think either “str is a string” or “str is the address of the first element of a string.”

Also as with any other array, we need a way to know how many elements there are in the array. Instead of keeping track of the length as an integer, as we have so far, C strings use a different convention: they use a null character, with value 0, to indicate the end of a string. You can write this special character as '\0'. This means that various functions that process strings work by iterating through all the characters and then stopping when the character is '\0'.
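
For example, here is a minimal sketch of a length function that works this way (the standard library’s strlen, declared in string.h, does essentially the same job; my_strlen is just an illustrative name):

#include <stdio.h>

// Count characters by walking the array until we hit the null terminator.
int my_strlen(char* s) {
    int len = 0;
    while (s[len] != '\0') {
        ++len;
    }
    return len;
}

int main() {
    printf("%d\n", my_strlen("Hello!"));  // prints 6; the '\0' is not counted
    return 0;
}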

All this means that you can take everything you know about C arrays and apply it to strings. For example:

#include <stdio.h>

void print_line(char* s) {
    for (int i = 0; s[i] != '\0'; ++i) {
        fputc(s[i], stdout);
    }
    fputc('\n', stdout);
}

int main() {
    char message[7] = {'H', 'e', 'l', 'l', 'o', '!', '\0'};
    print_line(message);
    return 0;
}

This shows several C array features that are equally useful for strings (character arrays) as they are for any other array:

  • Array initialization, with curly braces.
  • Treating arrays as pointers to their first element, so we can pass our char array to a function expecting a char*.
  • Using array subscript notation, like s[i], on the pointer to access the array’s elements.

One important thing to realize here is that, when we initialize this array “manually” using the array initialization syntax, we have to remember to include the null terminator '\0' ourselves. Ordinary string literals, like "Hello!", include a null terminator automatically. So these lines are roughly equivalent:

char message[7] = {'H', 'e', 'l', 'l', 'o', '!', '\0'};
char* message = "Hello!";

If you go the manual route and forget the null terminator, bad things will happen. Try to imagine what might go wrong in this program if we left off the '\0', for example. There are many possibilities, and none of them are good. (This is an example of undefined behavior in C, so there is no single answer.)

Fun Pointer Tricks

Here are some useful things you can do with pointers.

Pass by Reference

Pointers are useful for passing parameters by reference. C doesn’t actually have native pass-by-reference; everything is passed by value. But you can pass pointers as values and use those to refer to other values.

For example, this swap function doesn’t work because a and b are passed by value:

#include <stdio.h>

void swap(int x, int y) {
    int tmp = x;
    x = y;
    y = tmp;
}

int main() {
    int a = 34;
    int b = 10;
    printf("%d %d\n", a, b);
    swap(a, b);
    printf("%d %d\n", a, b);
}

But if we pass pointers instead, we can dereference those pointers so we modify the original variables in place. So this version works:

#include <stdio.h>

void swap(int* x, int* y) {
    int tmp = *x;
    *x = *y;
    *y = tmp;
}

int main() {
    int a = 34;
    int b = 10;
    printf("%d %d\n", a, b);
    swap(&a, &b);
    printf("%d %d\n", a, b);
}

Null Pointers

Because pointers are just integers, you can set them to zero. Zero isn’t actually a valid memory address. That makes the zero value useful for signaling the absence of data. It’s particularly useful for writing functions with optional parameters.

In C, you can use NULL to get a pointer with value zero. Here’s an example that extends our swap function to optionally also produce the sum of the values:

#include <stdio.h>

void swap_and_sum(int* x, int* y, int* sum) {
    int tmp = *x;
    *x = *y;
    *y = tmp;

    if (sum != NULL) {
        *sum = *x + *y;
    }
}

int main() {
    int a = 34;
    int b = 10;
    printf("%d %d\n", a, b);
    int sum;
    swap_and_sum(&a, &b, &sum);
    swap_and_sum(&a, &b, NULL);
    printf("%d %d\n", a, b);
    printf("sum = %d\n", sum);
}

When a pointer might be null, always remember to include a != NULL check before using it. The possibility of accidentally dereferencing a null pointer is Sir Tony Hoare’s “billion-dollar mistake.”

Pointers to Pointers

The type of a pointer to a value of type T is T*. That includes when T itself is a pointer type! So you can create pointers to pointers, and so on. For example, int** is a pointer to a pointer to an int. (It’s not common to go any deeper than two levels, but nothing stops you…)

It’s a silly example, but we can make our swap function swap int*s instead of actual ints:

#include <stdio.h>

void swap(int** x, int** y) {
    int* tmp = *x;
    *x = *y;
    *y = tmp;
}

int main() {
    int a = 34;
    int b = 10;

    int* a_ptr = &a;
    int* b_ptr = &b;

    printf("%d %d\n", a, b);
    swap(&a_ptr, &b_ptr);
    printf("%d %d\n", a, b);
}

Pointers to Functions

Maybe you have taken CS 3110, so you know it’s cool to pass functions into other functions. C can do that too, kind of! By creating pointers to functions.

The syntax admittedly looks really weird. You write T1 (*name)(T2, T3) for a pointer to a function that takes argument types T2 and T3 and returns a type T1.

Here’s an example in action:

#include <stdio.h>

int incr(int x) {
    return x + 1;
}

int decr(int x) {
    return x - 1;
}

int apply_n_times(int x, int n, int (*func)(int)) {
    for (int i = 0; i < n; ++i) {
        x = func(x);
    }
    return x;
}

int main() {
    int n = 20;
    n = apply_n_times(n, 5, &incr);
    n = apply_n_times(n, 2, &decr);
    printf("n = %d\n", n);
}

Pointers to Anything

Remember that pointers are bits, and all pointers look the same: they are just memory addresses. So, if you just look at the bits, there is no difference between an int* and a float* and a char*. They are all just addresses.

For this reason, C has a special type that means “a pointer to something, but I don’t know what.” The type is spelled void*. It is useful in situations where you don’t care what’s being pointed to.

Here’s a simple program that uses a void* to wrap up a call to printf for showing addresses:

#include <stdio.h>

void print_ptr(void* p) {
    printf("%p\n", p);
}

int main() {
    int x = 34;
    float y = 10.0f;
    print_ptr(&x);
    print_ptr(&y);
}
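
One detail this example skips: you can’t dereference a void* directly, because C doesn’t know how many bytes the pointed-to value occupies. To use the value, you first convert back to a concrete pointer type, which requires knowing (by some other means) what the pointer really points to. A small sketch, with names of my own invention:

#include <stdio.h>

// Print an int through a void*, by converting back to int* first.
void print_int_through_void(void* p) {
    int* ip = (int*)p;  // we have to know this really points to an int
    printf("%d\n", *ip);
}

int main() {
    int x = 34;
    print_int_through_void(&x);
    return 0;
}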

The Stack, the Heap, and Dynamic Memory Allocation

The Stack

So far, all the data we’ve used in our C programs has been stored in local variables. These variables exist for the duration of the function call—and as soon as the function returns, the variables disappear. All this per-call local-variable storage is part of the function call stack, also known as just the stack.

Don’t confuse the stack with the abstract data type (ADT) that is also called a stack. The call stack does work like a stack ADT, in the sense that you push and pop frames at one end. But it’s not just any stack; it’s a specific one that the compiler manages for you.

You may have visualized the function call stack when you learned other programming languages. You can draw it with a box for every function call, which gets created (pushed) when you call the function and destroyed (popped) when the function returns. These boxes are called stack frames, or just frames for short (or sometimes, an activation record). For reasons that will become clear soon, when thinking about C programs, it’s important that we draw the stack growing “downward,” so the first call’s frame is at the top of the page.

Here is a mildly interesting C program that uses the stack:

#include <stdio.h>

const float EULER = 2.71828f;
const int COUNT = 10;

// Fill an array, `dest`, with `COUNT` values from an exponential series.
void fill_exp(float* dest) {
    dest[0] = 1.0f;
    for (int i = 1; i < COUNT; ++i) {
        dest[i] = dest[i - 1] * EULER;
    }
}

// Print the first `n` values in a float array.
void print_floats(float* vals, int n) {
    for (int i = 0; i < n; ++i) {
        printf("%f\n", vals[i]);
    }
}

int main() {
    float values[COUNT];
    fill_exp(values);
    print_floats(values, COUNT);
    return 0;
}

The values array is part of main’s stack frame. The calls to fill_exp and print_floats have pointer variables in their stack frames that point to the first element of this array.

Limitations of the Stack

The key limitation of putting your data on the stack comes from this observation: variables only live as long as the function call. So if you want data to remain after a function call returns, local variables (data in stack frames) won’t suffice.

The consequence of this observation is the following rule: never return a pointer to a local variable. When you do, you’re returning a pointer to data that is about to be destroyed. So it will be a mistake (undefined behavior in C) to use that pointer.

On the other hand, both of these things are perfectly safe:

  • Passing a pointer to a local variable as an argument to a function. Our example above does this. This is fine because the data exists in the caller’s stack frame, which still exists as long as the callee is running (and longer).
  • Returning a non-pointer value stored in a local variable. The compiler takes care of copying return values into the caller’s stack frame if necessary.

To get a sense for why this is limiting, consider our example above. It’s inconvenient that we have to write a fill_exp function that fills in an exponential series into an array that already exists. It seems more natural to instead write a create_exp function that returns an array populated with an exponential series. Something like this:

#include <stdio.h>

const float EULER = 2.71828f;
const int COUNT = 10;

// This function has a bug! Do not return pointers to local variables!
float* create_exp() {
    float dest[COUNT];
    dest[0] = 1.0f;
    for (int i = 1; i < COUNT; ++i) {
        dest[i] = dest[i - 1] * EULER;
    }
    return dest;
}

// Print the first `count` values in a float array.
void print_floats(float* vals, int count) {
    for (int i = 0; i < count; ++i) {
        printf("%f\n", vals[i]);
    }
}

int main() {
    float* values = create_exp();
    print_floats(values, COUNT);
    return 0;
}

That API looks cleaner; we can rely on the create_exp function to both create the array and to fill it up with the values we want. But this program has a serious bug—in C, it has undefined behavior. When I ran it on my machine, it just hung indefinitely. Of course, subtler and worse consequences are also possible.

To see what’s wrong, let’s think about what might happen with the stack in memory. All the stack frames, and all the local variables, exist at addresses in memory. When the call create_exp returns, its memory doesn’t literally get destroyed; the memory, literally speaking, still exists in my computer. But when we call print_floats on the following line, its stack frame takes the space previously occupied by the create_exp frame! So its local variables (vals and count) take up the same space that was previously occupied by the dest array.

The Heap

This create_exp example is not an edge case; in practice, real programs often need to store data that “outlives” a single function call. C has a separate region of memory just for this purpose. This region is called the heap.

As above, don’t confuse the heap with the data structure called a heap, which is useful for implementing priority queues. The heap is not a heap at all. It is just a region of memory.

The key distinction between the heap and the stack is that you, the programmer, have to manage data on the heap manually. The compiler takes care of managing data on the stack: it allocates space in stack frames for all your local variables automatically. Your code, on the other hand, needs to explicitly allocate and deallocate regions of memory on the heap whenever it needs to store data that lasts beyond the end of a function call.

C comes with a library of functions for managing memory on the heap, which live in a header called stdlib.h. The two most important functions are:

  • malloc (short for memory allocate): Allocate a new region of memory on the heap, consisting of a number of bytes that you choose. Return a pointer to the first byte in the newly allocated region.
  • free: Take a pointer to some memory previously allocated with malloc and deallocate it, freeing up the memory for use by some future allocation.

Here’s a version of our create_exp program that (correctly) uses the heap:

#include <stdio.h>
#include <stdlib.h>

const float EULER = 2.71828f;
const int COUNT = 10;

// Allocate a new array containing `COUNT` values from an exponential series.
float* create_exp() {
    float* dest = malloc(COUNT * sizeof(float));  // New!
    dest[0] = 1.0f;
    for (int i = 1; i < COUNT; ++i) {
        dest[i] = dest[i - 1] * EULER;
    }
    return dest;
}

// Print the first `count` values in a float array.
void print_floats(float* vals, int count) {
    for (int i = 0; i < count; ++i) {
        printf("%f\n", vals[i]);
    }
}

int main() {
    float* values = create_exp();
    print_floats(values, COUNT);
    free(values);  // Also new!
    return 0;
}

Let’s look at the new lines in more detail. First, the allocation:

float* dest = malloc(COUNT * sizeof(float));

The malloc function takes one argument: the number of bytes of memory you want to allocate. We want COUNT floating-point values, so we can compute that size in bytes by multiplying that array length by sizeof(float) (which gives us the number of bytes occupied by a single float). You almost always want to use sizeof in the argument of your malloc calls; this is clearer and more portable than trying to remember the size of a given type yourself.

Next, the deallocation:

free(values);

The free function also takes one argument: a pointer to memory that you previously allocated with malloc. This illustrates the cost of manual memory management: whenever you allocate memory, you take responsibility for deallocating it. That’s unlike the stack, where the compiler takes care of managing the life-cycle of the memory for you. (By the way, you should never call free on a pointer to the stack.)

The Heap Laws

Because you manually manage the memory on the heap, it’s possible to make mistakes. There are four big things you must avoid:

  • Use after free. After you free memory, you are no longer allowed to use it. Your program may not load or store through any pointers into the freed memory.
  • Double free. You may only free memory once. Do not call free on already-freed memory.
  • Memory leak. You must pair every call to malloc with a corresponding call to free. Otherwise, your program will never “recycle” its memory, so the data will grow until you run out of memory.
  • Out-of-bounds access. You must only use the pointer returned from malloc to access data inside the allocated range of bytes. You can use pointer arithmetic (or array subscripting) to read and write bytes in the range, but nothing before the beginning or after the end of the range.

Even if they seem simple, C programmers find in practice that these rules are extremely hard to follow consistently. As software gets more complex, it can be hard to keep track of when memory has been freed, when it still needs to be freed, and what to check to ensure that accesses are within bounds. Personally, I think following these rules is the hardest part of programming in C (and C++). And these problems, because they trigger undefined behavior in C, can have extremely serious consequences—not just crashes and misbehavior, but security vulnerabilities.

As an example to illustrate the severity of the problem, a 2019 study by Microsoft found that 70% of all the security vulnerabilities they tracked in their software stemmed from these kinds of memory bugs.

If you still aren’t convinced, you may recall the CrowdStrike outage of July 2024. Across the globe, approximately 8.5 million machines running Windows crashed and were unable to restart. Core industries such as airlines, banks, hospitals, and payment systems were affected, with estimated damages of around $10 billion. Ultimately, the root cause of the outage was an out-of-bounds read.

Please reflect on the fact that these problems are really only possible in languages like C and C++, where you are responsible for managing the heap yourself. In contrast, Python, Java, OCaml, Rust, and Swift are all memory-safe languages, meaning that they manage the heap automatically for you. This is not just a convenience; these languages can rule out these extremely dangerous memory bugs altogether. While they give up some performance or control to do so, programmers in these languages find that downside to be an acceptable trade-off to avoid the extreme challenge posed by memory bugs.

Catching Memory Bugs

Let’s try writing a program that intentionally violates the laws. Specifically, let’s try adding out-of-bounds reads to our create_exp program:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

const float EULER = 2.71828f;
const int COUNT = 10;

// Allocate a new array containing `COUNT` values from an exponential series.
float* create_exp() {
    float* dest = malloc(COUNT * sizeof(float));
    dest[0] = 1.0f;
    for (int i = 1; i < COUNT; ++i) {
        dest[i] = dest[i - 1] * EULER;
    }
    return dest;
}

// Print the first `count` values in a float array.
void print_floats(float* vals, int count) {
    for (int i = 0; i < count; ++i) {
        printf("%f\n", vals[i]);
    }

    // Let's see what's nearby...
    char* ptr = (char*)vals;
    for (int j = 0; j < 100; ++j) {
        char* byte = ptr - j;
        printf("%p: %d %c\n", byte, *byte, *byte);
    }
}

// Generate a secret.
char* gen_secret() {
    char* secret = malloc(16);
    strcpy(secret, "seekrit!");
    return secret;
}

int main() {
    char* password = gen_secret();
    float* values = create_exp();

    print_floats(values, COUNT);

    free(values);
    free(password);
    return 0;
}

This program takes a pointer to our values array, and it first safely walks forward from there to print out the floats it contains. Then, it does something sneaky: it starts walking backward from the beginning of the array, immediately leaving the range of legal bytes it’s allowed to read.

Because this program violates the laws, it might do anything: it might crash, corrupt memory, or just give nonsense results. But when I ran this on my machine once, it walked all the way into the memory pointed to by password and printed out its contents. Spooky! This kind of out-of-bounds read is the basis for many real-world security vulnerabilities.

Since I’m telling you that these bugs are extremely easy to create, is there any way of catching them? Fortunately, GCC has a built-in mechanism for catching some memory bugs, called sanitizers. To use them, compile your program with the flags -g -fsanitize=address -fsanitize=undefined:

$ gcc -Wall -Wextra -Wpedantic -Wshadow -Wformat=2 -std=c23 -g -fsanitize=address -fsanitize=undefined heap_bug.c -o heap_bug

Sanitizers check your code dynamically, so this won’t print an error at compile time. Try running the resulting code:

$ qemu heap_bug

If everything works, the sanitizer will print out a long, helpful message telling you exactly what the program tried to do.

Crashing with a useful error is a much more helpful thing to do than behave unpredictably. So whenever you suspect your program might have a memory bug, try enabling the sanitizers to check.

Memory Layout

The stack and the heap are both regions in the giant metaphorical array that is memory. Both of them need to grow and shrink dynamically: the program can always malloc more memory on the heap, or it can call another function to push a new frame onto the stack. Computers therefore need to choose carefully where to put these memory segments so they have plenty of room to grow as the program executes.

In general:

  • The heap starts at a low memory address and grows upward as the program allocates more memory.
  • The stack starts at a high memory address and grows downward as the program calls more functions.

By starting these two segments at opposite “ends” of the address space, this strategy maximizes the amount of room each one has to grow.

There are also other common memory segments. These ones typically have a fixed size, so “room to grow” is not an issue:

  • The data segment holds global variables and constants, which exist for the entire duration of the program. Aside from the global variables you declare yourself, string literals from your program go here.
  • The text segment contains the program, as machine code instructions. Much more discussion of these instructions is coming in a couple of weeks.
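
To get a rough feel for this layout, you can print a few addresses yourself. Here is a small sketch; the exact numbers vary between machines and runs (especially with address space layout randomization), but you can often see the stack addresses sitting far above the heap addresses:

#include <stdio.h>
#include <stdlib.h>

int a_global = 3410;                     // global variables live in the data segment

int main() {
    int a_local = 34;                    // locals live on the stack, in main's frame
    int* on_heap = malloc(sizeof(int));  // malloc'd memory lives on the heap
    char* a_literal = "hello";           // string literals also live in the data segment

    printf("stack (a local):        %p\n", (void*)&a_local);
    printf("heap  (malloc'd):       %p\n", (void*)on_heap);
    printf("data  (a global):       %p\n", (void*)&a_global);
    printf("data  (string literal): %p\n", (void*)a_literal);

    free(on_heap);
    return 0;
}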

Gates & Logic

Our goal over the next couple of lectures is to build a computer.

Let’s take it back to the beginning: computers are made out of logical switches. In the modern era, these switches are implemented using transistors. But let’s start with relays instead, because they’re easier to think about.

We won’t build a computer in one step. We’re going to use relays to build bigger components, and then think abstractly about what those components do. Then we can forget about the internals, i.e., how we built the thing, and we can build something even bigger out of that. Step by step, we will climb up the ladder of abstraction and build a computer.

Truth Tables

To climb the abstraction ladder, we need an abstract way to write down the behavior of a circuit element. Our tool for this is a truth table, which exhaustively describes how the circuit’s input and output signals behave in terms of bits.

Logical AND and OR gates have two inputs, A and B, and one output, out.

Recall how relays have a “default on” and a “default off” variant. (The electromagnet repels or attracts the bendy piece of metal, respectively.) Truth tables are a good way to write down the difference between the variants.

Here is the truth table for a logical OR gate:

A  B  out
0  0  0
0  1  1
1  0  1
1  1  1

Truth tables have one column per input and one column for each output, plus one row for every combination of input values.

Here’s the truth table for a logical AND gate:

A  B  out
0  0  0
0  1  0
1  0  0
1  1  1

Building Not

Let’s build a not function next. Here’s the truth table:

in  out
0   1
1   0

This circuit is also called an inverter.

Level Up: Building NAND and NOR

It’s important to write down the specification for the function we want. Our specifications will be truth tables. Here’s the truth table for NAND:

A  B  AND  NAND
0  0  0    1
0  1  0    1
1  0  0    1
1  1  1    0

There are two inputs, A and B, and one output, NAND; the AND column is included only for comparison. NAND is the inversion of AND: its output is 1 exactly when AND’s output is 0.

A  B  OR  NOR
0  0  0   1
0  1  1   0
1  0  1   0
1  1  1   0

Similarly, NOR has two inputs, A and B, and one output, NOR; it is the inversion of OR.

A  B  XOR  XNOR
0  0  0    1
0  1  1    0
1  0  1    0
1  1  0    1

Finally, XOR and XNOR also take inputs A and B. The output XOR is 1 when A and B are not equal, and XNOR is 1 when A and B are equal; i.e., XNOR is the inversion of XOR.

Keep Leveling Up

We’re going to keep building larger and more interesting circuits out of smaller ones. This “leveling up” sort of feels like a video game. In fact, people have made video games out of this process! A cool one is Nandgame.

Try using Nandgame to build the circuits we already made. Then, try going farther and making AND and OR circuits.

Logic Notation

It’s going to be helpful to have a notation to write down these logic circuits as we make them more complicated. Here is some common mathy notation that people use to write these operators.

name  C bitwise op  mathy
not   ~a            \( \overline{a} \) or \( \neg a \) or \( a' \)
and   a & b         \( a \wedge b \) or \( a \cdot b \) or just \( ab \)
or    a | b         \( a \vee b \) or \( a + b \)
xor   a ^ b         \( a \oplus b \)

Each of these operators has a visual representation for wiring schematics, but they are too hard to include here. You can see them all on the Wikipedia page for logic gate.

Universal Gates, and a Recipe for Building Anything

Nandgame encourages you to be creative: to think carefully about how to use your “inventory” efficiently to build a new circuit. But there is an easier, more mechanical method that can build anything: given an arbitrary truth table, it produces a circuit that implements it.

Here are the steps:

  1. Start with a truth table.
  2. For every row where the output is 1, write out the minterms. The minterm is the logical expression that is an “and” of all the input variables, either with or without negation, according to the truth value of the given input. For example, if the row in the truth table has \(a = 1\) and \(b = 0\), then the minterm is \(a\overline{b}\). The idea is that the minterm completely describes the input condition where that row is active.
  3. Join all the minterms for those output-1 rows with “ors.” This is the sum-of-products expression.

That gives you a logical expression consisting only of not, and, and or that is 1 when the output in the truth table is 1 and 0 otherwise. You can construct a circuit out of these three gates to match the expression.
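
To make the recipe concrete, here is a small check written with C’s bitwise operators (a check of the logic on 0/1 values, not a circuit). For xor, the output-1 rows are 01 and 10, giving minterms \( \overline{a}b \) and \( a\overline{b} \), so the sum-of-products expression is \( \overline{a}b \vee a\overline{b} \):

#include <stdio.h>

int main() {
    for (int a = 0; a <= 1; ++a) {
        for (int b = 0; b <= 1; ++b) {
            // Minterm for row a=0, b=1 is (!a & b); for row a=1, b=0 it is (a & !b).
            int sum_of_products = (!a & b) | (a & !b);
            printf("a=%d b=%d: sum-of-products=%d, a^b=%d\n",
                   a, b, sum_of_products, a ^ b);
        }
    }
    return 0;
}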

Because this sum-of-products process works for any truth table, and it only uses those three gates, you can conclude that the combination of and, not and or is all you really need: if you just have those three functions, you can build any other function.

It gets better: you can build each of and, or, and not through a clever combination of only nand gates. You can also build any of them out of just nor gates. (Try it in Nandgame if you want!) That means that, transitively, you can build any circuit out of just nand or just nor. People call these gates universal for that reason.
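
If you’d rather see the universality claim demonstrated in code than discover it in Nandgame (fair warning: this gives away part of the construction), here is a small sketch that treats each signal as a 0-or-1 int and builds everything from a single nand function; the function names are mine:

#include <stdio.h>

// Everything below nand_gate is built from nand_gate alone.
int nand_gate(int a, int b) { return !(a && b); }

int not_gate(int a)        { return nand_gate(a, a); }
int and_gate(int a, int b) { return not_gate(nand_gate(a, b)); }
int or_gate(int a, int b)  { return nand_gate(not_gate(a), not_gate(b)); }

int main() {
    for (int a = 0; a <= 1; ++a) {
        for (int b = 0; b <= 1; ++b) {
            printf("a=%d b=%d: not(a)=%d and=%d or=%d\n",
                   a, b, not_gate(a), and_gate(a, b), or_gate(a, b));
        }
    }
    return 0;
}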

Practicing Sum-of-Products Constructions

Here are two functions you can build to try out your newfound skills in building arbitrary circuits out of and, or, and not:

  1. Try building xnor, i.e., “not xor,” using this technique.
  2. A multiplexer (aka a mux or a selector) has three inputs: s for “select,” in₀, and in₁. It has one output, out. When s is 0, out is equal to in₀. When s is 1, out is equal to in₁.

Because the multiplexer has 3 inputs, you will want to use 3-input and and or gates. You can, of course, implement these with a cascade of 2-input gates.

Arithmetic

If this technique really works to build “everything,” let’s try using it to build some math, starting with addition.

Half Adder

To keep the circuit small, let’s add two 1-bit numbers.

Let’s start by writing out all the possible combinations, and the sum as a binary value. This is not quite a truth table, because the output is a 2-bit number and not a truth value, but it’s close:

a  b  a+b
0  0  0
0  1  1
1  0  1
1  1  10

To make this into a truth table, let’s separate the two bits of the output sum—and fill in the implicit 0 in the most significant bit. The normal way to do this is to label the two bits c, for the carry bit, and s, for the sum. The truth table looks like this:

a  b  c  s
0  0  0  0
0  1  0  1
1  0  0  1
1  1  1  0

Remember that a and b are the input columns, and c and s are the output columns.

This truth table is a little different from the other ones on this page because it has two outputs. But we can still use the same approach, just one output at a time. That is, we can write the logical formulas for the two outputs separately: \( c = ab \) and \( s = \overline{a}b \vee a\overline{b} \).

It is “fun” to notice that we have already built a gate whose truth table matches the sum output: \( s = a \oplus b \). So we can use two of the gates we built above to make this one-bit adder: an and gate for c and an xor gate for s.

This circuit is usually called a half adder. Why “half”? It’s missing an important feature that we’ll add next.

Full Adder

Adding one-bit numbers is nice, but we would like to add bigger numbers. The insight that will get us there is that, when we do “long addition” of binary numbers, we add up one bit at a time—and possibly “carry the one” to the next column. At each step in this process, we actually need to add three one-bit numbers together: each of the two input bits and—for every bit except the first—the carried bit from the previous column (which may be zero).

So the key to implementing a circuit that does “long addition” is to extend our one-bit adder above to take three inputs instead of two. This thing will be called a full adder. It has three one-bit inputs: \(a\), \(b\), and \(c_{\mathrm{in}}\) for the carry-in bit. Just like the half adder, it has two one-bit outputs: the sum \(s\) and the carry-out bit \(c_{\mathrm{out}}\).

Try writing out a truth table for this circuit. One useful thing to remember is that, despite \(c_{\mathrm{in}}\) having a different-looking name, the three inputs are really indistinguishable: we’re just adding up 3 one-bit numbers here.

We could absolutely use the sum-of-products approach to build the circuit for the full adder. But it turns out that there is a much simpler way to do it by using two half adders and some other logic. Can you build this circuit? You can try skipping to the “full adder” level in Nandgame to try it out.

n-Bit Adder

The full adder is the building block we need to construct an \(n\)-bit adder, for any \(n\): a circuit that takes two \(n\)-bit numbers and adds them together, producing an \((n+1)\)-bit result. You can make this circuit by chaining together a series of \(n\) full adders, hooking the \(c_{\mathrm{out}}\) of one to the \(c_{\mathrm{in}}\) of the next.
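
Here is a behavioral sketch of that chaining in C, handling one bit per loop iteration. It uses the standard full-adder formulas (\( s = a \oplus b \oplus c_{\mathrm{in}} \), and carry-out is the majority of the three inputs) rather than the two-half-adder construction, which is left for the Nandgame exercise above; extracting bit i with shifts is a software convenience, not something the hardware does, and the name ripple_add is mine:

#include <stdio.h>

// Add two n-bit numbers one bit at a time, like a chain of full adders.
// (n must be smaller than the number of bits in unsigned.)
unsigned ripple_add(unsigned a, unsigned b, int n) {
    unsigned result = 0;
    int carry = 0;
    for (int i = 0; i < n; ++i) {
        int ai = (a >> i) & 1;
        int bi = (b >> i) & 1;
        int s = ai ^ bi ^ carry;                          // sum bit
        carry = (ai & bi) | (ai & carry) | (bi & carry);  // carry-out
        result |= (unsigned)s << i;
    }
    result |= (unsigned)carry << n;  // the (n+1)-th bit of the result
    return result;
}

int main() {
    printf("%u\n", ripple_add(11, 6, 4));  // 11 + 6 = 17, using a 4-bit adder
    return 0;
}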

By climbing the abstraction ladder, we have gradually gotten from relays, something we can physically understand, all the way to a binary calculator. We don’t have a computer yet, exactly, but we do have something pretty cool.

Binary Subtraction

Two’s complement subtraction works with the same n-bit adder circuit! Subtraction is just addition with a negated operand, and two’s complement negation is done by inverting all the bits and adding one: \( A - B = A + (-B) = A + (\overline{B} + 1) \).

Thus, the n-bit adder can subtract by setting the carry-in input to 1 (that’s the “+ 1”) and inverting the B operand bits.

An n-Bit Adder That Can Add or Subtract

Lastly, the n-bit adder can be modified so that it can either add or subtract. Set the carry-in input to 0 to add or 1 to subtract, and place an XOR gate on each bit of operand B so that B gets inverted exactly when the carry-in says to subtract.

sub?  \(B_0\)  new \(B_0\)
0     0        0
0     1        1
1     0        1
1     1        0

if subtracting, invert \(B_0\)
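
Here is the same trick sketched in C on whole words rather than individual gates: XOR each bit of B with the sub signal to conditionally invert it, and feed sub in as the carry-in. The function name is mine:

#include <stdio.h>

// One adder, two operations: compute a + b when sub is 0, a - b when sub is 1.
unsigned add_or_sub(unsigned a, unsigned b, unsigned sub) {
    unsigned mask = sub ? ~0u : 0u;  // in hardware, this is each bit of b XORed with sub
    return a + (b ^ mask) + sub;     // sub also serves as the carry-in
}

int main() {
    printf("%u\n", add_or_sub(34, 10, 0));  // prints 44
    printf("%u\n", add_or_sub(34, 10, 1));  // prints 24
    return 0;
}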

Stateful Logic

The Need for State

So far, we have climbed up the abstraction ladder to build circuits that can do lots of interesting computations on bits. We have an n-bit adder, for example, so maybe you can believe that—using the same principles—we can build more complicated operations: multiplication and even division. But I contend that the principles we’ve been using have a fundamental limitation: they are stateless. To build a real computer, we will need a way to store and retrieve information.

To see what I mean by stateless, try inputting a bunch of numbers into an adder (or whatever) in Nandgame. Then, reset all the inputs back to zero. The circuit’s outputs also go back down to zero, because they are a function of the current values of the inputs. The circuit has no memory of what happened in the past.

The reason this is a problem is that computers work by iteratively updating stored values, one step at a time. Extending our simplified view of computer architecture, let’s imagine a computer made of three parts:

  • The processor logic, with circuits for addition and such.
  • The data memory: a mapping from memory addresses to values.
  • An instruction: a string of bits that encodes some operation for the processor to take, such as “read the values from the data memory at addresses 0xaf and 0x1c, add them, and put the result at address 0xe9.”

If the bits of the instruction were exposed via buttons on your machine, you could do computations by sequentially keying in different instructions. The data memory itself clearly needs to be stateful, i.e., it needs to keep data around over time, which is something our circuits so far cannot do. But let’s pretend that’s someone else’s problem and focus just on the processor for now. Even so, this setup leaves something to be desired: a human would have to manually key in each instruction in sequence. That’s of course not how programs work in real computers; somehow, there’s a way to write a program down up front and then let the computer run through the instructions of its own accord.

Let’s extend our architecture diagram with another memory: the instruction memory. This will contain a bunch of bit-strings like our example above, laid out in order. Again, I know this memory itself needs state, but let’s ignore that for now. To make the whole machine work, we will also need a way to keep track of the current instruction we are executing. In real machines, this thing is called the program counter (PC): a stateful element that holds the address in the instruction memory of the currently-executing instruction. This might start out at zero, so we read out the value of the 0th instruction; then, when that instruction is done doing all of its work, we need to increment it to 1 to run the next instruction, and so on.

This program counter needs to be stateful. It needs to keep track of the current value and hold it over time until we decide to change it. Today, we will build circuits that can work like this.

The Clock

Stateful circuits are all about doing things over time: i.e., taking different actions at one point in time vs. another. But how do we define “time”? Stateful circuits usually use a special signal, called a clock, to keep track of “logical time.” By “logical time,” we mean time measured in an integer number of clock cycles, as opposed to the continuous world of real time measured in seconds and minutes.

A clock is an input signal to our circuits that oscillates between 0 and 1 in a regular pattern. You can imagine a person with a button just continuously toggling the signal on and off. We will simply assume the clock signal is provided as an input—in practice, people implement it with special analog circuits that we won’t cover in this class.

Here is some terminology about clocks:

  • The clock is high when the value is 1 and low when the value is 0.
  • Accordingly, a rising edge is the moment when the clock goes from low to high. A falling edge is when it goes from high to low. It can help to visualize these moments in a timing diagram, with real time on the x-axis and the clock value on the y-axis.
  • The clock period is the time between two adjacent rising edges (or between two falling edges—it’s the same). So during one clock period, the clock is high for half the time and low for half the time. The period is measured in real time, i.e., in seconds.
  • The clock frequency is the reciprocal of the clock period. It’s measured in hertz (Hz).

For examples of the latter two, one nanosecond is one billionth of a second. So a system with clock period 1 ns has a frequency of 1 GHz.

SR Latch

Let’s build our first stateful circuit. It’s called an SR latch, named after its two inputs: S for “set” and R for “reset.” It has one output, traditionally named Q.

The circuit is made of two NOR gates. Most of it will look familiar, but there’s one tricky aspect: one gate feeds back into itself, via the other gate. (See the visual notes associated with this lecture for the circuit diagram.)

Let’s attempt to analyze this circuit by thinking through its truth table:

S  R  Q
0  0  ?
0  1  ?
1  0  ?
1  1  ?

The middle two rows are not too hard. When only one of S and R is 1, the NOR gates seem to “ignore” the feedback path. We can fill in those rows by propagating the signals through the wires:

S  R  Q
0  0  ?
0  1  0
1  0  1
1  1  ?

Now let’s try the first row, where both S and R are 0. The “feedback” path seems to actually matter in this case. One way to analyze the circuit is to assume the value for Q and then try to confirm. If you try this for both possible values of Q, something strange happens: we can “confirm” either assumption! It turns out that this circuit preserves the old value of Q. So while we’re definitely violating the rules of truth tables (so this is not really a truth table anymore), we can record a note about what happens here:

S  R  Q
0  0  keep the old value
0  1  0
1  0  1
1  1  ?

Finally, there’s the last case: where both S and R are 1. I would actually like to avoid talking too much about this case because it’s not part of the “spec” of what we want out of an SR latch. Now is a good time to talk about that spec—here’s how it’s supposed to behave:

  • When S is 1, that’s a set, and we set the stored value to 1.
  • When R is 1, that’s a reset, and we set the stored value to 0.
  • Otherwise, when the circuit is “at rest” and neither input is 1, the value stays what it was, and Q outputs the stored value.
  • Please don’t set S and R to 1 simultaneously.

The annoying thing about the “both 1” case is that, after you do this, you probably want to lower both inputs back to 0 (to return to the “at rest” state). But the final value of Q depends on the (real-time) order in which these signals change, which is weird. So the “spec” for SR latches usually just says “please don’t do this.” It’s a little bit like undefined behavior!

D Latch

The SR latch, while an amazing first attempt at putting state into circuits, has two shortcomings, both of which stem from having separate S and R inputs:

  • It’s kind of weird that there are two different wires for encoding the state that we want to store. Can’t we just have one, that is 0 when we want to store 0 and 1 when we want to store 1?
  • There’s the uncomfortable business of the case where both S and R are 1 simultaneously. Can we prevent this?

We will now build a more sophisticated stateful circuit that solves both problems. It’s called a D latch. The key idea is to have a single data input (named D) that is 0 when we want to store 0 and 1 when we want to store 1. However, we also need a way to tell the circuit whether we are currently trying to store something, or whether the value should just stay the same. For that, we’ll wire up a clock signal (named C), and use the convention that the data can only get stored when the clock is high.

You can make a D latch by adding a couple of AND gates and an inverter “in front” of an SR latch. (Again, see the visual notes accompanying this lecture for the diagram.) It is useful to think again about the not-quite-truth-table for the circuit:

C  D  Q
0  0  ?
0  1  ?
1  0  ?
1  1  ?

When C is 0 (the clock is low), notice that both AND gates are inactive, in the sense that they ignore their other input and output zero. So regardless of the value of D, both the S and R inputs to the SR latch are zero. That’s the case where the SR latch keeps its current value. So, in our table for the D latch, the same thing happens to Q:

C  D  Q
0  0  keep
0  1  keep
1  0  ?
1  1  ?

Now let’s think about the rows where the clock is high. Now, one input to both AND gates is 1, so their output behaves like the other input (remember that \(b \wedge 1 = b\) for any bit \(b\)).

So what’s going on with those other inputs to the ANDs? D goes straight into the S input of the SR latch, and it is inverted on its way into the R input. So in this setting, S and R are always opposites of each other: either S is 1 or R is 1, but not both. (Which is great, because we avoid the weird both-are-1 case.) The consequence is that:

  • When D is 1, we set the SR latch.
  • When D is 0, we reset the SR latch.

So let’s complete our not-quite-truth-table:

C  D  Q
0  0  keep
0  1  keep
1  0  0 (and store 0)
1  1  1 (and store 1)

The parentheticals there are meant to convey that we update the state that this circuit stores. So you can also think of the D latch’s “spec” this way:

  • Q is always the current stored value.
  • When the clock is low, ignore D and keep the current stored value.
  • When the clock is high, store D and immediately start outputting it via Q.

D Flip-Flop

The D latch has simplified the interface quite a bit, but it still has a shortcoming that we’d like to fix. In complex circuits, it can be inconvenient that the Q output changes immediately with the D input. The problem is that, in the real world, circuits can take (real) time to determine the value of D that they want to store—and, during that time, the value of the D input might change. We would like to hide those transient changes and define a specific moment where we capture and store the value of D. That’s what our next circuit will do.

The idea is to only pay attention to D in the moment where the clock signal changes: the rising edge or the falling edge. We’ll use the rising edge, but the technique easily generalizes to using the falling edge. We want our new circuit, called a D flip-flop, to keep Q stable for entire clock periods, and to only change its value (to match the D input) at the moment of the rising clock edge.

You can make a D flip-flop by wiring up two D latches in series and inverting the first one’s C input. (Again, see the wiring diagram in the accompanying visual notes.) The way to analyze this circuit is to realize that only one of the two D latches is “awake” at a given time. The first is active when the clock is low, and the second is active when the clock is high. So it takes half the clock period for the new data value to make it halfway through the circuit, and the entire clock period to finally reach the Q output.

The D flip-flop is the fundamental building block for stateful circuits that we will use in this class.
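
Here is a behavioral sketch of that spec in C, just to build intuition about the interface (real flip-flops are built from the gates above, not from if statements, and the type and function names here are made up for illustration):

#include <stdio.h>

// Behavioral model of a D flip-flop: capture d on the rising clock edge, hold q otherwise.
typedef struct {
    int q;         // the stored bit, always visible on the output
    int prev_clk;  // previous clock value, used to detect a rising edge
} dff;

void dff_step(dff* ff, int clk, int d) {
    if (clk == 1 && ff->prev_clk == 0) {  // rising edge: capture d
        ff->q = d;
    }
    ff->prev_clk = clk;
}

int main() {
    dff ff = {0, 0};
    // At i=2, d changes while the clock is high; a flip-flop (unlike a latch) keeps q unchanged.
    int clk[] = {0, 1, 1, 0, 1, 0};
    int d[]   = {1, 1, 0, 0, 0, 1};
    for (int i = 0; i < 6; ++i) {
        dff_step(&ff, clk[i], d[i]);
        printf("clk=%d d=%d -> q=%d\n", clk[i], d[i], ff.q);
    }
    return 0;
}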

Register

A register is the computer-science name for when you wire up \(n\) flip-flops in parallel and treat them as a single unit that can store \(n\) bits. When you use 64 of these together, all wired up to the same clock signal, we’ll call that a 64-bit register.

Abstractly speaking, you can think of a register as behaving the same way as a D flip-flop, but storing an \(n\)-bit number instead of a single bit. That is, think of the register as having two inputs (a 1-bit clock signal and an \(n\)-bit data signal) and one output (also \(n\) bits); the register captures a new stored value on the rising edge of the clock and keeps its output stable for the entire following clock period.

Register File

A Register File has N read/write registers, indexed by register number.

For 64-bit RISC-V, the register file holds 32 64-bit registers. It has two read ports, \(Q_A\) and \(Q_B\), and one write port, \(D_W\). Each port is addressed by a 5-bit register number (\(R_A\), \(R_B\), and \(R_W\)), since \(2^5 = 32\). In a single clock cycle, the two registers indexed by \(R_A\) and \(R_B\) can be read as inputs to an arithmetic logic unit (ALU), and the ALU’s output can then be stored in the register indexed by \(R_W\).
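
Here is a behavioral C sketch of that interface (a model for intuition, not the hardware; the names are made up, and the register-0-is-always-zero rule anticipates a RISC-V convention covered below):

#include <stdint.h>
#include <stdio.h>

// 32 64-bit registers; register 0 is hard-wired to zero.
typedef struct {
    uint64_t regs[32];
} regfile;

// Read the two source registers in the same cycle.
void regfile_read(regfile* rf, int ra, int rb, uint64_t* qa, uint64_t* qb) {
    *qa = rf->regs[ra];
    *qb = rf->regs[rb];
}

// Write the destination register (on the clock edge, in real hardware).
void regfile_write(regfile* rf, int rw, uint64_t dw) {
    if (rw != 0) {  // writes to register 0 are ignored
        rf->regs[rw] = dw;
    }
}

int main() {
    regfile rf = {{0}};
    regfile_write(&rf, 10, 34);
    regfile_write(&rf, 11, 10);

    uint64_t a, b;
    regfile_read(&rf, 10, 11, &a, &b);
    regfile_write(&rf, 10, a + b);  // read two registers, write one: like an add
    printf("%llu\n", (unsigned long long)rf.regs[10]);  // prints 44
    return 0;
}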

The RISC-V ISA

So far, we have used the raw materials of switches and transistors to build circuits that can do arithmetic and store state. At this point I think it’s interesting to ask yourself a philosophical question: what is a “computer”? It’s clearly a subjective definitional question, so you can decide for yourself. Take a minute or two to ponder!

I would argue that we do not yet have a computer as it is missing a key aspect: programmability. One definition of a computer is a machine that can be programmed to automatically execute sequences of arithmetic or logical operations. But before we can program our processor, we need a language.

Instructions

Recall that we can manually control our arithmetic and state circuits by turning on certain bits/wires. For example, registers have an enable input that decides whether or not to store the new input. Multiplexers have a select bit which determines which input to output. Even the inputs to adders are simply sequences of bits. Ultimately, what the circuit does is wholly determined by which of these bits are set and which ones are not.

As you know by now, if we collected all of the “control” bits together we would get a number in binary. However, this number is special—it means something to our circuit. We call this special number an instruction as it tells the circuit what to do.

Machine Code

Instructions encode a single action: “add 2 to the value in register 1”, “store 42 in register 5”, etc. In a weird way, this view means we’ve defined a programming language. A really bad, primitive programming language.

This bit-level “programming language” exists in every processor in existence. It is called machine code, and it is how all software on the computer works. Every program you’ve ever run, and every program you’ve ever written in every language, eventually translates down to machine code for your processor.

Instruction Set Architecture

A machine code language is called an instruction set architecture (ISA). Some popular ISAs for “real” computers include:

  • RISC-V, which we are using in this course.
  • ARM, which your phone almost certainly uses and your laptop might use.
  • Intel’s x86, which your laptop might use.

Each of these ISAs defines a “meaning” for strings of bits. Then, processors interpret those bits to decide which actions to take.

RISC-V

We will now take a leap to a full-featured processor and a standard, popular ISA: RISC-V.

Like all ISAs, RISC-V is an extremely primitive programming language made of bits, and it has a textual assembly format that makes it easier to read and write than entering binary values manually. Each instruction is like an extremely simple statement in a different programming language, and it describes a single small action that the processor can take.

As a general-purpose ISA, RISC-V has enough instructions so that arbitrary C programs can be translated to RISC-V code. In fact, that’s what happened every time you typed gcc during this whole semester.

Why Learn Assembly Programming?

Understanding assembly is important because it is the language that the computer actually speaks. So while it would be infeasible in the modern age to write large software projects entirely in assembly, it remains relevant for the small handful of exceptional cases where higher levels of abstraction obscure important information. Here are some examples:

  • People hand-write assembly for extremely performance-sensitive loops. A classic example is audio/video encoding/decoding: the popular FFmpeg library, for example, is mostly written in C but contains hand-written RISC-V assembly for performance-critical functions. While modern compiler optimizations are amazing, humans can still sometimes beat them.
  • Operating system internals typically need some platform-specific assembly to deal with the edge cases that arise with controlling user processes.
  • Code that must be secure, such as encryption and decryption routines, is often written directly in assembly to avoid timing channels. If an encryption routine takes different amounts of time depending on the key, an attacker can learn the key by repeatedly measuring the time taken to encrypt or decrypt. By taking direct control over which instructions get executed, humans can sometimes ensure that the code takes a constant amount of time, so that the attacker can’t learn anything by timing it. This is hard to do by writing C because the compiler tries to be clever: by optimizing your code, it can “accidentally” make its timing input-dependent.
  • Even more commonly: reading assembly is an important diagnostic skill. When something goes wrong, sometimes reading the assembly is the only way to track down the root cause. If it’s a performance problem, for example, understanding the source code only gets you so far. If it’s a compiler bug (and compilers do have bugs!), then debugging is hopeless unless you can read assembly.

For these reasons and others, it is important to know how to read and write assembly code. We will program in RISC-V during this semester, but the skills you learn as a RISC-V programmer will translate to other ISAs such as ARM and x86.

Let’s See Some RISC-V Assembly

To get started, let’s look at some RISC-V assembly code. I mentioned already that, every time you have typed gcc so far this semester, you have been invoking a compiler whose job it is to translate your C into machine code. We can ask it to instead stop at the assembly and print that out using the -S command-line flag.

Let’s start with an extremely simple C program:

unsigned long mean(unsigned long x, unsigned long y) {
    return (x + y) / 2;
}

To see the assembly code, try a command like this:

$ rv gcc -O1 -S mean.c -o mean.s

The -S tells GCC to emit assembly, and -o mean.s determines the output file. I’m also using some optimizations, with -O1, that clean up the code somewhat (in addition to making the code faster, it also makes the assembly more readable). This is just a text file, so you can open it in the same editor you use to write C code. Try opening it up.

There’s a lot going on in this output, but let’s zoom in on these 3 lines:

add     a0,a0,a1
srli    a0,a0,1
ret

This is a sequence of 3 assembly instructions. Each one works like a statement in a “real” programming language, and it describes a single, small action for the program to take. Even though we don’t know what these instructions do, we can puzzle through what this code does:

  • add probably adds two numbers together. Which is good, because that’s what our original C program does first.
  • srli is a little more mysterious. It turns out that this mnemonic stands for shift right logical immediate. The important part is that this is a bitwise right shift. So the compiler has cleverly decided to use something like >> 1 instead of / 2.
  • ret returns from the function.

The takeaway here is that our “second interpretation” of assembly code works for RISC-V too. We can think of it as an extremely primitive programming language and understand the code that way, forgetting about the fact that each instruction corresponds to some control bits that orchestrate the circuitry in a processor.

A Look at the Bits

Now let’s return to the first interpretation of assembly code: it’s a roughly 1-1 reflection of the (binary) machine code for a program that actually executes. Let’s look at those bits.

Object Files and Disassembly

We can translate our .s assembly code into machine code by assembling it. Try this command:

$ rv gcc -c mean.s -o mean.o

The -c flag instructs GCC to just compile the code to an object file (with the .o extension), and not to link the result into an executable. (You can also ask GCC to go all the way from C to a .o in one step if you want; just provide the .c file as the input and remember to use -c.)

You could look directly at this object file with xxd mean.o if you want, but that’s not very informative. It’s more useful to disassemble the code in this file so you can see the text form of the instructions. (Disassembling is the opposite of assembling: it’s a translation from machine code back to assembly code.) Our container comes with a tool called objdump that can do this:

$ rv objdump -d mean.o

The important part of the output is:

0000000000000000 <mean>:
   0:   00b50533                add     a0,a0,a1
   4:   00155513                srli    a0,a0,0x1
   8:   00008067                ret

Here’s how to read this output:

function address <function name>:
 addr:  machine code           assembly instruction

On the right, we see the same three instructions in the textual assembly format. On the left the tool is also printing out the hex form of the machine code (and the corresponding address). For example, the first instruction consists of the bytes 00b50533, starting at address 0. In RISC-V, every instruction is exactly 4 bytes long, so the next instruction starts at address 4.

Raw Machine Code

The .o object files that our compiler produces don’t just contain machine code; they also contain other metadata to make linking possible. Sometimes (like on this week’s assignment), it is useful to have a “raw” binary file just containing the instructions. In the CS 3410 container, we have provided a convenient command that makes it easy to produce these raw files, called asbin.

Let’s put just the instructions we want into a new file:

add a0, a0, a1
srli a0, a0, 1
ret

Try this command:

$ rv asbin mean.s

Then take a look at the bytes:

$ xxd mean.bin
00000000: 3305 b500 1355 1500 6780 0000            3....U..g...

You can see the bits for the same 4-byte instructions here, with a twist: the bytes within each instruction appear backward, for a reason we’ll explain next, called endianness.

For the curious only: our little asbin script just runs a couple of commands. You can run them yourself too:

$ as something.s -o something.o
$ objcopy something.o -O binary something.bin

The objcopy command is a powerful tool for converting between binary file formats, but we just need it to do this one thing. We just thought this was common enough in CS 3410 that it would be handy to have a single command to do it all.

Endianness

The reason the instruction bytes appear backward in the file comes from a concept called endianness or byte order. Different computers have different conventions for how to order the bytes within a multi-byte value. For example, in RISC-V, both ints and instructions are 4 bytes—which order should we put those bytes into memory?

The options are:

  • Big endian: The “obvious” order. The most-significant byte goes at the lowest address.
  • Little endian: The other order. The least-significant byte goes at the lowest address.

Fortunately or unfortunately, most modern computers use little endian. That includes x86, ARM, and RISC-V (in their most common modes). That’s why the lowest-order byte of each instruction appears first when we look at the binary file with xxd. Ordinary loads and stores hide this difference from you: if your program stores an int and later reads it back, the bytes come back in the right order. You mostly notice endianness when you inspect memory or a file byte by byte, as xxd does.
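
If you want to see your own machine’s byte order, here is a small C sketch that examines a 4-byte value one byte at a time (assuming unsigned int is 4 bytes, as it is on our targets):

#include <stdio.h>

int main() {
    unsigned int value = 0x00b50533;  // the machine code for our add instruction
    unsigned char* bytes = (unsigned char*)&value;

    // On a little-endian machine this prints 33 05 b5 00, matching the xxd output.
    for (int i = 0; i < 4; ++i) {
        printf("%02x ", bytes[i]);
    }
    printf("\n");
    return 0;
}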

Why are these called big and little “endian”? It’s one of the all-time great examples of computer scientists being terrible at naming things: these names come from the 1726 novel Gulliver’s Travels by Jonathan Swift, from a part about a war between people who believe you should crack an egg on the big end or the little end.

RISC-V Assembly Basics

Let’s cover a few fundamental concepts that RISC-V will use for every instruction. We will break down this instruction from our example:

add a0, a0, a1

Registers

There are 32 registers. RISC-V names them x0 through x31. We’re using the 64-bit version of the RISC-V ISA, so each register holds a 64-bit value.

Alternative Names for Registers

While all the registers just hold bits, there are conventions about how each one is usually used. To help remind you of these purposes, RISC-V also gives the registers alternative symbolic names. Wikipedia has a detailed table with all of these names that I won’t reproduce here. Here are some register names that will be relevant immediately:

  • x0 is also known as zero. It is unique among all RISC-V registers because it cannot be written: it always holds the all-0s value. If you try to update this register, the write is ignored. Having quick access to “64 zeroes” turns out to be useful for many programs.
  • x10 through x17 are also known as a0 through a7.
  • x5, x6, x7, and x28 through x31 are also known as t0 through t6.
  • x8, x9, and x18 through x27 are also known as s0 through s11.

The latter 3 sets of registers (aN, tN, and sN) have subtly different conventions that have to do with function calls, which we’ll cover later. For now, however, you can think of them as interchangeable places to put values when we’re operating on them. You absolutely do not need to memorize the alternative names for every register—you just need to know that there are multiple names. This way, you know that our instruction above is exactly equivalent to:

add x10, x10, x11

…because it just uses different names for the same registers. These alternate names are just an assembly language phenomenon (i.e., for human readability), and the machine code for these two versions looks exactly the same.

Three-Operand Form

Most RISC-V instructions take three operands, so they look like this:

<name> <operand>, <operand>, <operand>

The name tells us what operation the instruction should do, and the three operands tell us what values it will operate on. So our example is an add instruction, with three register operands: a0, a0, and a1.

In these three-operand instructions, the first one is the destination register and the second two are the source registers. You’ll sometimes see the format of the add instruction written like this:

add rd, rs1, rs2

The mnemonic is that r* are register operands, d means destination, and s means source. So our instruction add a0, a0, a1 adds the values in a0 and a1 and puts the result in a0. It is allowed, and extremely common, for the same register to be used both as a source and a destination.

Using the Manual

Working with assembly code entails reading the manual. A lot. In other languages, you can quickly build up an intuition for what all the basic components mean. In assembly languages, there are usually so many instructions that you need to look them up continuously. Expect to work with assembly with your code in one hand and the ISA manual in the other.

Navigate to this site’s RISC-V Assembly resource page. I recommend using the RISC-V reference card linked there all the time. In rare circumstances where you need more details, you can use the (very long) specification document. I’ll refer to the reference card here.

The first page of the reference card tells us what each instruction means. To understand our add instruction, we can find it on the list to see the format, a short English description, and a somewhat cryptic pseudocode description of the semantics.

The second page tells us how to encode the instruction as actual machine-code bits. We’ll cover the encoding strategy next.

Instruction Encodings

Every assembly instruction corresponds to a 32-bit value. This correspondence is called the instruction encoding.

For example, we know that the add instruction we’re working with, when assembled, encodes to the value 0x00b50533. Why those particular bits?

In RISC-V, instruction encodings use one of a few different formats, which it calls “types.” You can see a list of all the formats on the second page of the reference card: R-, I-, S-, B-, U-, and J-type (another list that you should not attempt to memorize). Each format comes with a little diagram mapping out the purpose of each bit in the 32-bit range.

Add Instruction

add is an R-type instruction (so named because all the operands are registers). Reading from the least-significant to most-significant bits, the map of the bits in an R-type instruction consists of:

  • 7 bits for the opcode. The opcode determines which instruction this is. The reference card tells us that the opcode for add is 0110011, in binary.
  • 5 bits for rd, the destination register. It makes sense that the register is 5 bits because there are a total of \(2^5=32\) possible registers. So to use destination register x10, we’d put the binary value 01010 into this field.
  • 3 function bits. (We’ll come back to this in a moment.)
  • The first source register operand, rs1. Also 5 bits.
  • The second source register, rs2. 5 bits again.
  • 7 more function bits.

In RISC-V, the function bit fields—labeled funct3 and funct7—specify more about how the instruction should work. They’re kind of a supplement to the opcode. For example, the table tells us that add and sub (and many others) actually share an opcode, and the bits in funct3 and funct7 tell us which operation to perform. To encode an add, all of these bits are zero.

So now we can describe exactly how to encode our example instruction, add x10, x10, x11. Again starting with the least-significant bits:

  • The opcode (7 bits): 0110011.
  • rd (5 bits): decimal 10, binary 01010.
  • funct3 (3 bits): 000.
  • rs1 (5 bits): decimal 10, binary 01010 (again).
  • rs2 (5 bits): decimal 11, binary 01011.
  • funct7 (7 bits): 0000000.

Try stringing these bits together and converting to hex. You should get the hex value the assembler produced for us, 0x00b50533. Some handy tools for doing these conversions include:

  • Bitwise, an interactive tool that runs in your terminal for experimenting with data encodings.
  • The macOS Calculator app. Press ⌘3 to switch to “programmer mode.”
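
If you’d rather let the computer do the bit-stringing, here’s a minimal C sketch (not part of the course tooling) that packs the R-type fields above and prints the resulting machine code:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    /* Fields for add x10, x10, x11, as listed above. */
    uint32_t opcode = 0x33;  /* 0110011 in binary */
    uint32_t rd     = 10;
    uint32_t funct3 = 0;
    uint32_t rs1    = 10;
    uint32_t rs2    = 11;
    uint32_t funct7 = 0;

    /* Shift each field into its position (bit 0 is the least-significant bit). */
    uint32_t insn = opcode
                  | (rd     << 7)
                  | (funct3 << 12)
                  | (rs1    << 15)
                  | (rs2    << 20)
                  | (funct7 << 25);

    printf("0x%08" PRIx32 "\n", insn);  /* prints 0x00b50533 */
    return 0;
}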

Add-Immediate Instruction

To try another format, consider this instruction:

addi a0, a1, 42

This add-immediate instruction is different from add because one of the operands isn’t a register; it’s an immediate integer. The reference card tells us that this instruction uses a different format: I-type (the I is for immediate). The distinguishing feature in this format is that the most-significant 12 bits are used for this immediate value. (This field replaces the funct7 and rs2 fields from the R-type format.)

If we assemble this instruction, we get the 32-bit value 0x02a58513. The interesting part is the top 12 bits, which are 0000 0010 1010 or, in decimal, 42.

Let’s Write an Assembly Program

Let’s try out our new reading-the-manual skills to write an assembly program from scratch. Our program will compute \( (34-13) \times 2 \). We’ll implement the multiplication with a left shift, so our program will work like the C expression (34 - 13) << 1.

When writing assembly, it can help to start by writing out some pseudocode where each statement is roughly the complexity of an instruction and all the variables are named like registers. Here’s a Python-like reformatting of that expression:

a0 = 34
a1 = a0 - 13
a2 = a1 << 1

I’ve used three different registers just for illustrative purposes; we could definitely have just reused a0.

Let’s translate this program to assembly one line at a time:

  1. We need to put the constant value 34 into register a0. Remember the add-immediate instruction? And remember the special x0 register that is always zero? We can combine these to do something like a0 = 0 + 34, which works just as well. The instruction is addi a0, x0, 34.
  2. Now we need to subtract 13. Let’s look at the reference card. There is no subtract-immediate instruction… but we can add a negative number. Let’s try the instruction addi a1, a0, -13.
  3. Finally, let’s look for a left-shift instruction in the reference card. We can find slli, for shift left logical immediate. The final instruction we need is slli a2, a1, 1.

Here’s our complete program:

addi a0, x0, 34
addi a1, a0, -13
slli a2, a1, 1

To try this out, we could compile it to machine code, but this would be a little hard to work with because we’d need to craft the assembly code to print stuff out. (We’ll cover more about how to do this over the coming weeks.) Instead, a handy resource that you can find linked from our RISC-V assembly resources page is this online RISC-V simulator. Try pasting this program into the web interface and clicking the “Run” or “Step” buttons to see if we got it right: i.e., that the program puts the result \( (34-13) \times 2 \) into register a2.

Logical Operations in RISC-V

RISC-V has a full complement of instructions to do bitwise logical operations. Remember using &, |, <<, and >> for masking and combining in bit packing code? These instructions implement those C-level constructs.

Basic Logic

To start with:

  • Bitwise and: and, andi
  • Bitwise or: or, ori
  • Bitwise exclusive or (xor): xor, xori

These are all three-operand instructions. All of these instructions operate on all 64 bits in the registers at once. They also all have a register version and an immediate version; the latter one has the i suffix. The forms of the instructions are like:

xor rd, rs1, rs2
xori rd, rs1, imm

So the first version takes two register inputs, while the second takes a register and an immediate.

What About Not?

There is no (real) bitwise “not” instruction. The reason is that ~x is equivalent to x ^ -1, i.e., XORing the value with the all-ones value. If you spend some quality time with the XOR truth table, you’ll notice that you can think of it this way:

  • The first input to the XOR is a bunch of bits. You want to flip some of these bits.
  • The second input contains 1s in all the places where you want to flip the bit in the first input. Where this input is zero, leave the other bits alone.

So XORing with an all-ones value means “flip all the bits.” Instead of a proper “not” instruction, you can use xori:

xori rd, rs1, -1

In fact, RISC-V has made your life somewhat easier: it lets you write a pseudo-instruction to mean this. So in assembly code, you can actually pretend there is a not instruction:

not rd, rs1

But there is no separate opcode for not; it is not a real instruction. The assembler will translate the line of assembly above into an xori instruction for you. Keeping the number of “real” instructions small—by eliminating needless instructions that can be easily implemented with other instructions—keeps processors small, simple, and efficient. This is the reduced instruction set computer (RISC) philosophy.
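
Here’s a tiny C check of the identity the assembler is relying on (just a sketch; you won’t need this for any assignment):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t x = 0x00000000000034ff;
    /* ~x and x ^ (all ones) produce exactly the same bits. */
    printf("%d\n", (~x) == (x ^ UINT64_MAX));  /* prints 1 */
    return 0;
}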

Aside: Extension and Truncation

We will frequently need to change the size (the number of bits) of various values. For example, we’ll need to take an 8-bit value and treat it as a 64-bit value, and we’ll need to take a 64-bit value and treat it as a 32-bit value. When you increase the number of bits, that’s called extension, and when you decrease the size, that’s called truncation. The goal in both situations is to avoid losing information whenever possible: that is, to keep the same represented integer value when converting between sizes.

Truncation

Truncation from \(m\) bits to \(n\) bits works by extracting the lowest (least significant) \(n\) bits from the value. There is, sadly, no way to avoid losing information in some cases. Here are some examples:

  • Let’s truncate the 64-bit value 0x00000000000000ab to 32 bits. In decimal, this number has the value 171. Truncating to 32 bits yields 0x000000ab. That’s also 171. Awesome!
  • Let’s truncate 0xffffffffffffffab to 32 bits. That’s the value -85 in two’s complement. Truncating yields 0xffffffab. That’s still -85. Excellent!
  • Now let’s truncate the bits 0x80000000000000ab (note the 8 in the most-significant hex digit). That’s a really big negative value, because the leading bit is 1. Truncating yields 0x000000ab, which represents 171. That’s bad—we now have a different value. But losing some information is inevitable when you lose some bits.

Extension

There are two modes for extending from \(m\) bits to \(n\) bits. Both work by putting the value in the \(m\) least-significant bits of the \(n\)-bit output. The difference is in what we do with the extra \(n-m\) bits, which are the most-significant (upper) bits in the output.

  • Zero extension fills the upper bits with zeroes.
  • Sign extension fills them with copies of the most-significant bit in the input. (That is, the sign bit.)

Let’s see some examples.

  • Let’s zero-extend 0xffffffab (remember, that’s -85) to 64 bits. The result is 0x00000000ffffffab, a pretty big positive number (4294967211 in decimal). So we didn’t preserve the value.
  • Now let’s sign-extend the same value. Because the most significant bit in the 32-bit input is 1, we fill in the upper 32 bits with 1s. The output is 0xffffffffffffffab in hex, or -85 in decimal. So we preserved the value!

The moral of the story is: when extending unsigned numbers, use zero extension; when extending signed numbers, use sign extension.
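
In C, these conversions happen whenever you cast between integer sizes, which makes it easy to experiment. Here’s a small sketch reproducing the examples above:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    int32_t v = -85;  /* 32-bit two's complement bit pattern: 0xffffffab */

    uint64_t zext = (uint64_t)(uint32_t)v;  /* zero extension, via the unsigned type */
    int64_t  sext = (int64_t)v;             /* sign extension, via the signed type */

    printf("zero-extended: 0x%016" PRIx64 " = %" PRIu64 "\n", zext, zext);
    printf("sign-extended: 0x%016" PRIx64 " = %" PRId64 "\n", (uint64_t)sext, sext);
    return 0;
}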

Shifts

RISC-V has bit-shifting instructions to implement C’s << and >>. Here are the ones for shifting left:

  • slli rd, rs1, imm: Shift left by an immediate amount.
  • sll rd, rs1, rs2: Shift left by an amount in a register.

No surprises here. But for rightward shifts, RISC-V has twice as many versions:

  • srl and srli: Shift right logical.
  • sra and srai: Shift right arithmetic.

What is the difference between an arithmetic and a logical shift? It’s similar to the deal with sign extension and zero extension: the difference is in what you do with the most-significant \(n\) bits that weren’t there before. That is, if you shift right by \(n\) bits, you just drop the original value’s least significant \(n\) bits, but what should you put in the output value’s most significant \(n\) bits? The two versions differ in their answer:

  • Logical shift right: Fill in those \(n\) most-significant bits with 0s.
  • Arithmetic shift right: Fill them in with copies of the sign bit.

Say, for example, that you have a register containing the negative number -3410, in two’s complement.

  • If you use srai to do an arithmetic shift right, you fill in the top bit with a copy of the original number’s sign bit, which is a 1. So the result is still negative: -1705.
  • If you instead use srli to do a logical shift right, the most-significant bit of the output will be a 0. So the result will be a very large positive number.

As with sign- and zero-extension, you want to use logical right shifts for unsigned numbers and arithmetic right shifts for signed numbers.
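
C’s shift operators behave the same way: shifting an unsigned value right is a logical shift, and shifting a negative signed value right is (on essentially every compiler, though the C standard technically leaves it implementation-defined) an arithmetic shift. A small sketch using the -3410 example:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    int64_t  s = -3410;
    uint64_t u = (uint64_t)s;  /* same bits, interpreted as unsigned */

    printf("%" PRId64 "\n", s >> 1);  /* arithmetic shift: prints -1705 */
    printf("%" PRIu64 "\n", u >> 1);  /* logical shift: a huge positive number */
    return 0;
}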

Consider asking yourself: why is there no separate arithmetic left shift?

An Example

Imagine that x10 contains the value 0x34ff. What does x12 contain after you run these instructions?

slli x12, x10, 0x10
srli x12, x12, 0x08
and  x12, x12, x10

Try working through the instructions one step at a time. It can save time to write the values in the registers in hex, if you can imagine the corresponding binary in your head.

The result value is 0x3400.
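
If you want to double-check the answer (or your own work), here’s a quick C sketch mirroring the three instructions, treating the registers as 64-bit unsigned values:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint64_t x10 = 0x34ff;
    uint64_t x12;

    x12 = x10 << 0x10;  /* slli: 0x34ff0000 */
    x12 = x12 >> 0x08;  /* srli: 0x0034ff00 */
    x12 = x12 & x10;    /* and:  0x00003400 */

    printf("0x%" PRIx64 "\n", x12);  /* prints 0x3400 */
    return 0;
}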

RISC-V: Data Memory & Control Flow

The Memory Hierarchy

So far, we have seen a bunch of RISC-V instructions that access the 32 registers, but we haven’t accessed memory yet. Registers are fine as long as your data fits in 31 64-bit values, but real software needs “bulk” storage, and that’s what memory is for.

In general, computer architects think of these different ways of storing data as tiers in an organization called the memory hierarchy. You can imagine an entire spectrum of different ways of storing data, all of which trade off between different goals:

  • Smaller memories that are closer to the processor and faster to access.
  • Larger memories that are farther from the processor and slower to access.

Registers are toward the first extreme: in 64-bit RISC-V, there is only a total of \(31 \times 8 = 248\) bytes of mutable storage, and it usually takes around 1 cycle (less than a nanosecond) to access a register.

Modern main memory is at the opposite extreme: even cheap phones have several gigabytes of main memory, and it typically takes hundreds of cycles (hundreds of nanoseconds) to access it.

You might reasonably ask: why not make the whole plane out of registers? There are two big answers to this question.

  • In real computers, these different memories are made out of different memory technologies. The physical details of how to construct memories are out of scope for CS 3410, but registers are universally made from transistors (like the flip-flops we built in class) and integrated with the processor, while main memory is made of DRAM, a memory-specific technology that uses tiny capacitors to store bits. DRAM requires different manufacturing processes than logic; it is much cheaper per bit than integrated-with-logic storage, but it is also much slower.
  • There is a fundamental trade-off between capacity and latency. In any memory technology you can think of, building a larger memory makes it take longer to access.

Registers and main memory are two points in the memory-hierarchy spectrum. There are other points too: later in the semester, we will learn much more about caches, which fill in the space in between registers and main memory. You can also think of persistent storage (magnetic hard drives or flash memory SSDs) or even the Internet as further tiers beyond main memory.

Extension and Truncation

When we access memory, we will often need to change the size (the number of bits) of values: for example, taking an 8-bit value from memory and treating it as a 64-bit value in a register, or storing only part of a 64-bit register. The rules are the ones from the extension and truncation aside above: truncation keeps only the least-significant bits, zero extension fills the new upper bits with zeroes, and sign extension fills them with copies of the sign bit. When extending unsigned numbers, use zero extension; when extending signed numbers, use sign extension.

Load and Store Instructions

The 64-bit RISC-V instruction set gives you several instructions for loading from and storing to memory. They are very similar; the only difference is the size of the load or store: the number of bits we’re reading or writing.

Let’s start with ld and sd. The mnemonics use l and s for load and store, and the d means double word, which means they load/store 64 bits at a time.

The format looks like this:

ld rd, offset(rs1)
sd rs2, offset(rs1)

In both cases, the second operand is the address. This operand uses the funky-looking offset(rs1) syntax. This means “get the value from register rs1, and add the constant value offset to it; treat the result as the address.” The reason these instructions have a built-in constant offset is because it is so incredibly common for code to need to add a small constant value to an address before doing the access. If you don’t need this offset, you can always use 0 for the offset.

The ld instruction puts the value into rd. The sd instruction takes the value from rs2 and stores it to memory at the computed address.
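
As a rough C analogue of this addressing mode (just a sketch to illustrate the arithmetic; the base register is modeled as a byte pointer, and the function names are made up for illustration):

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* ld rd, offset(rs1): rd = the 64-bit value at address rs1 + offset */
static int64_t load_doubleword(const char *rs1, int64_t offset) {
    return *(const int64_t *)(rs1 + offset);
}

/* sd rs2, offset(rs1): store the 64-bit value rs2 at address rs1 + offset */
static void store_doubleword(char *rs1, int64_t offset, int64_t rs2) {
    *(int64_t *)(rs1 + offset) = rs2;
}

int main(void) {
    int64_t array[2] = {0, 0};
    store_doubleword((char *)array, 8, 42);                      /* like sd with offset 8 */
    printf("%" PRId64 "\n", load_doubleword((char *)array, 8));  /* prints 42 */
    return 0;
}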

Accessing Different Widths

The instruction set gives you several other load and store operations for different widths. Here is a non-exhaustive list:

  • ld and sd: Load or store a double word (64 bits).
  • lw, lwu, and sw: Load or store a word (32 bits).
  • lb, lbu, and sb: Load or store a byte (8 bits).

Recall that our registers are all 64 bits. So what happens when you use a smaller-width load or store?

  • When storing, you truncate (take the lowest \(n\) bits from the register).
  • When loading, you extend. The instruction tells you whether you zero-extend or sign-extend:
    • The instructions with the u suffix are for unsigned numbers, and they zero-extend.
    • The instructions without this suffix are for signed numbers, and they sign-extend.

So, for example, lb loads a single byte and sign-extends it to 64 bits to put it in a register. lbu does the same thing, but it zero-extends instead.

Example: Store Word, Load Byte

Consider this short program:

addi x11, x0, 0x49C
sw x11, 0(x5)
lb x12, 0(x5)

What is the value of x12 at the end?

As always, it helps to translate the assembly to pseudocode to understand it. Here’s one attempt:

x11 = 0x49c;
store_word(x11, x5);
x12 = load_byte(x5);

So we don’t know what address x5 holds, but that’s the memory address. We’re storing the value 0x49c as a word (32 bits) to that address, and then loading the byte at that address. Let’s look at the two steps:

  1. First, we store the value 0x49c as a 32-bit word (the sw instruction truncates the 64-bit register to its lowest 32 bits). Since we use little endian, the least-significant byte goes at the smallest address. Let’s say x5 holds the address \(a\). Then address \(a\) will hold the byte 0x9c, \(a+1\) holds the byte 0x04, and addresses \(a+2\) and \(a+3\) both hold zero.
  2. Next, we load the byte at the same address. The load instruction gets the byte 0x9c, and it sign-extends it to 64 bits, so the final value is 0xffffffffffffff9c, or -100 in decimal if we interpret it as a signed number.
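
You can watch the same thing happen in C (a sketch that assumes a little-endian machine; the signed char load mirrors lb’s sign extension):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int32_t word = 0x49c;
    /* On a little-endian machine, the first byte of `word` in memory is 0x9c. */
    int8_t byte = *(int8_t *)&word;  /* like lb: load one byte, sign-extend it */
    printf("%d\n", byte);            /* prints -100 */
    return 0;
}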

Example: Translating from C

How would you translate this C program to assembly?

void mystery(int* x, int* y) {
    *x = *y;
}

Assume (as is the case on our RISC-V target) that int is a 32-bit type. Assume also that the pointers x and y are stored in registers x3 and x5, respectively.

Here’s a reasonable translation:

lw x8, 0(x5)
sw x8, 0(x3)

Here are some salient observations about this code:

  • It makes sense that this is a load instruction followed by a store instruction, because we need to read the value at y and write it back to address x.
  • It also makes sense that we are using word-sized accesses (lw and sw) because that’s how you access 32 bits.
  • We use the signed version of the load (lw instead of lwu) to get sign-extension, not zero-extension. (If we used unsigned int instead, you would want lwu.)
  • The offset is zero in both instructions, because we want to use the addresses in x5 and x3 unmodified.

Control Flow in Assembly

So far, all the assembly programs we’ve written have been straight-line code, in the sense that they always run one instruction after the other. That’s like writing C without any control flow: no if, for, while, etc. The remainder of this lecture is about the instructions that exist in RISC-V to implement control-flow constructs.

Branch If Equal

For most instructions, when the processor is done running that instruction, it proceeds onto the next instruction (incrementing the program counter by 4 on RISC-V, because every instruction is 4 bytes). A branch instruction is one that can choose whether to do that or to execute some other instruction of your choosing instead. One example is the beq instruction, which means branch if equal:

beq rs1, rs2, label

The first two operands are registers, and beq checks whether the values are equal. The third operand is a label, which we’ll look closer at in a moment, but it refers to some other instruction. Then:

  • If the two registers hold equal values, then go to the instruction at label.
  • If they’re not equal, then just go to the next instruction (add 4 to the PC) as usual.

Labels appear in your assembly code like this:

my_great_label:

That is, just pick a name and put a : after it. This labels a specific instruction so that a branch can refer to it.

Here’s an example:

  beq x1, x2, some_label
  addi x3, x3, 42
some_label:
  addi x3, x3, 27

This program checks whether x1 == x2. If so, then it immediately executes the last instruction, skipping the second instruction. Otherwise, it runs all 3 instructions in this listing in order (it adds 42 and then adds 27 to x3).

In other words, you can imagine this assembly code implementing an if statement in C:

if (x1 != x2) {
  x3 += 42;
}
x3 += 27;

Labels in Machine Code

As shown above, in assembly code we can define labels like

my_great_label:

by simply picking a name and putting a : after it. However, these labels are symbolic and only appear in assembly code, not machine code.

When assembling the machine code, the assembler converts each label into a signed offset. This offset is then added to the program counter (PC) to find the instruction to run next if the branch is taken.

For example, consider the assembly program from the previous section annotated with the memory address (in instruction memory) of each instruction:

0:  beq x1, x2, some_label
4:  addi x3, x3, 42
some_label:
8:  addi x3, x3, 27

The assembler would remove the label some_label: and replace each occurrence with the appropriate offset:

0:  beq x1, x2, 8
4:  addi x3, x3, 42
8:  addi x3, x3, 27

Use Labels!

When writing assembly code by hand, use labels! Labels exist largely to make it easier (or possible) for programmers to read and write assembly code by hand. Replacing labels with offsets is a job better left to the assembler.

Other Branches and Jumps

You should read the RISC-V spec to see an exhaustive list of branch instructions it supports. Here are a few, beyond beq:

  • bne rs1, rs2, label: Branch if the registers are not equal.
  • blt rs1, rs2, label: Branch if rs1 is less than rs2, treated as signed (two’s complement) integers.
  • bge rs1, rs2, label: Like that, but with “greater than or equal.”
  • bltu and bgeu are similar but do unsigned integer comparisons.

You will also encounter unconditional jumps, written j label. Unlike branches, j doesn’t check a condition; it always immediately transfers control to the label.

Implementing Loops

We have already seen how branches in assembly can implement the if control-flow construct. They are also all you need to implement loops, like the for and while constructs in C. We’ll see a worked example in this section.

Consider this loop that sums the values in an array:

int sum = 0;
for (int i = 0; i < 20; i++) {
  sum += A[i];
}

And imagine that A is declared as an array of ints:

int A[20];

Imagine that the A base pointer is in x8. Here’s a complete implementation of this loop in RISC-V assembly:

  add x9, x8, x0         # x9 = &A[0]
  add x10, x0, x0        # sum = 0
  add x11, x0, x0        # i = 0
  addi x13, x0, 20       # x13 = 20
Loop:
  bge x11, x13, Done
  lw x12, 0(x9)          # x12 = A[i]
  add x10, x10, x12      # sum += x12
  addi x9, x9, 4         # &A[i+1]
  addi x11, x11, 1       # i++
  j Loop
Done:

The important instructions for implementing the loop are the bge (branch if greater than or equal to) and j (unconditional jump) instructions. The former checks the loop’s exit condition i >= 20 (the negation of i < 20), and the latter jumps back to start the next iteration of the loop.

We have included comments to indicate how we implemented the various changes to variables. Here are some observations about this implementation:

  • We have chosen to put sum in register x10 and i in x11.
  • The x13 register just holds the number 20. We need it in a register so we can compare i < 20 with the bge instruction.
  • The x9 register is a little funky. It starts out storing the A base address, but then the pointer moves forward by 4 bytes on every loop iteration (with addi). The idea is that it always stores the address &A[i], i.e., a pointer to the \(i\)th element of the A array on the \(i\)th iteration. So to load the value A[i], we just need to load from this address with lw.

The 5 Classic CPU Stages

Consider the following diagram of our RISC-V processor datapath.

A diagram of a single-cycle RISC-V datapath annotated with the five CPU stages: fetch, decode, execute, memory, and writeback

We can break down all the things that a CPU needs to do for every instruction into stages:

  • Fetch the instruction from the instruction memory.
  • Decode the instruction bits, producing control signals to orchestrate the rest of the processor. Read the operand values from the register file. For example, this stage needs to convert from a binary encoding of each register index into a “one-hot” signal to read from the appropriate register.
  • EXecute the actual computation for the instruction, using the arithmetic logic unit (ALU): add the numbers, shift the values, whatever the instruction requires.
  • Access Memory, reading or writing an address in the external data memory. Only some instructions need this stage—just loads and stores.
  • Write results back into the register file. The result could come from the ALU or from memory, if it’s a load instruction.

As the bolding in this list implies, computer architects often abbreviate these stages with a single letter: F, D, X, M, or W.

Pipelining & Performance

In this lecture we will consider the massively important topic of processor performance. We’ll first learn how to quantitatively estimate performance. Afterwards, we will analyze the performance of three architecture styles: single-cycle, multi-cycle, and pipelined CPUs.

Iron Law of Processor Performance

First, let’s define what we mean by processor performance. The performance of a processor is simply the amount of time it takes to execute a program, denoted by \(\frac{\mathrm{Time}}{\mathrm{Program}}\). The Iron Law of Processor Performance breaks this down into three parts:

\[ \frac{\mathrm{Time}}{\mathrm{Program}} = \frac{\mathrm{Instructions}}{\mathrm{Program}} \times \frac{\mathrm{Cycles}}{\mathrm{Instruction}} \times \frac{\mathrm{Time}}{\mathrm{Cycle}}\]

In English, the performance of a processor is the product of:

  • the number of instructions in the program,
  • the number of clock cycles it takes to execute a single instruction (a.k.a., cycles per instruction or CPI),
  • and how long a clock cycle is (a.k.a., the clock period [1]).
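
For a quick sanity check with made-up numbers: a hypothetical program that compiles to \(10^9\) instructions, running on a processor with an average CPI of 2 and a 1 ns clock period (a 1 GHz clock), takes

\[ 10^9~\mathrm{instructions} \times 2~\frac{\mathrm{cycles}}{\mathrm{instruction}} \times 1~\frac{\mathrm{ns}}{\mathrm{cycle}} = 2 \times 10^9~\mathrm{ns} = 2~\mathrm{s}. \]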

With the Iron Law of Processor Performance in mind, how can we make a processor that runs programs faster?

We can’t usually change the number of instructions in a program, as that is largely determined by the ISA and the compiler. We do have some control over the CPI and the clock period, but there is a trade-off. We can do more work in a given cycle, decreasing the CPI, but this inevitably makes the clock period longer. Alternatively, we could make the clock period shorter, but this generally means we do less work each cycle. There is also a third option.

Architecture Styles

Recall our processor schematic depicting the five stages of a CPU: Fetch, Decode, EXecute, Memory, and Writeback. To design a processor, we have to decide how to map these stages for each instruction onto clock cycles.

There are three main architecture styles: single-cycle, multi-cycle, and pipelined.

Single-Cycle Processors

This is the most obvious approach to designing a processor: all the work for a single instruction is done in one cycle. Because there’s a lot of work that needs to be done, the clock period is long. In fact, the clock period must be long enough such that the slowest instruction can complete in a single cycle. As we saw in the last lecture, data transfer instructions take the longest to execute, in particular load instructions [2].

Let’s analyze the performance of a single-cycle CPU. Since each instruction takes one cycle to execute, the CPI for single-cycle processors is \(1\). This means that we can execute \(n\) instructions in \(n\) (long) cycles.

Multi-Cycle Processors

The key downside to single-cycle processors is that the clock period is tied to the latency [3] of the slowest instruction (e.g., load instructions). This means that relatively fast instructions (e.g., instructions that don’t access memory) take the same amount of time as the slowest instruction.

Multi-cycle processors get around this restriction by running just one stage per cycle instead of one instruction per cycle. In this setup, one instruction executes over multiple cycles. To facilitate this, registers must be inserted at the end of each stage to hold control signals and values between cycles [4].

These registers allow instructions to take a different number of cycles to execute, depending on which stages they need to run. For example, the ld instruction has work to do in each of the five stages, so it will take five cycles to execute. On the other hand, the add instruction can skip the Memory stage and so will only take four cycles to run.

Regarding performance, multi-cycle processors are the opposite of single-cycle processors. Multi-cycle processors boast a very short clock period, but a high CPI, since instructions now take multiple cycles to execute.

Single-Cycle vs. Multi-Cycle

Let’s now compare the performance of single-cycle and multi-cycle processors by comparing their clock periods and CPIs.

The clock period of a single-cycle processor is equal to the time it takes to run each of the five CPU stages (i.e., the latency of the slowest instruction). In comparison, the clock period of a multi-cycle processor is equal to the time it takes to run the longest CPU stage plus some \(\epsilon\) to account for the overhead of accessing the registers between stages.

The CPI of single-cycle processors is always \(1\) as each instruction takes one cycle to execute. For multi-cycle processors, the CPI is wholly dependent on what programs are run as different instructions take a different number of cycles to run. Since each program is different, we often use the average CPI to estimate the performance of multi-cycle CPUs.

For example, suppose that we have a program that consists of 20% branch instructions, 20% load instructions, and 60% ALU instructions. On a multi-cycle processor, branch instructions take three cycles, load instructions take five cycles, and ALU instructions take four cycles. The average CPI of a multi-cycle processor given this workload would be

\[ 0.2 \times 3 + 0.2 \times 5 + 0.6 \times 4 = 4 \]

Pipelined Processors

For most workloads, multi-cycle processors are faster than single-cycle processors. But can we do better?

If you build a multi-cycle processor, you quickly notice that much of your circuit remains idle most of the time. For example, the part of the processor for the Fetch stage is only active every ~5th cycle. We can exploit that idle time using pipelining.

The general idea behind pipelining is to overlap the executions of different tasks. In fact, you all likely use pipelining when you do laundry. There are three “stages” to doing laundry: washing, drying, and folding. Let’s assume that it takes 20 minutes for the washing machine to run, 30 minutes for the dryer to run, and 10 minutes for you to fold the dry clothes. A single load of laundry then takes 60 minutes: we first wash the clothes for 20 minutes, move the wet clothes to the dryer to dry for 30 minutes, and lastly spend 10 minutes folding the clothes once the dryer finishes.

Suppose you’re backed up and need to do multiple loads of laundry. You start the same by putting the first load of laundry into the washer. After 20 minutes, you move the wet clothes into the dryer as before. However, at this point you probably put the second load of laundry in the washing machine so that the washing machine and the dryer are running at the same time. It would be inefficient if you waited until after you folded the first load of laundry to start the next load of laundry.

Pipelined processors do very nearly the same thing! While we Decode one instruction, we can simultaneously Fetch the next instruction. Then in the next cycle, we can eXecute the instruction we just Decoded and Decode the instruction we just Fetched, all while Fetching the next instruction.

We can build pipelined processors in a similar way to multi-cycle ones. Like multi-cycle processors, pipelined processors break instruction execution into multiple stages, where each stage completes in one cycle. We also need to add pipeline registers between the stages.

Pipelining is such a useful idea that the vast majority of real processors use it. Real processors actually tend to break instruction processing into many more than 5 stages. It’s difficult to find public information about the specifics, but, as one data point, this reliable source claims that an oldish Intel processor had somewhere between 14 and 19 stages.

Performance of Pipelined Processors

Now let’s consider the performance of a pipelined processor.

Suppose that all of the instructions overlap perfectly in a 5-stage pipeline. In this scenario, the first instruction finishes after the 5th cycle. The second instruction then finishes after the 6th cycle, the third instruction finishes after the 7th cycle, and so on. So, on average, an instruction finishes executing every cycle, resulting in a CPI of 1! More precisely, it takes only \(4 + n\) cycles to execute \(n\) instructions.

The clock period of pipelined processors can be nearly as short as a multi-cycle processor’s, too! Again, this is because the clock period only needs to be long enough for the slowest stage to complete, plus some additional time to account for the overhead of accessing the pipeline registers.

The table below compares the clock period and the CPI of single-cycle, multi-cycle, and pipelined processors.

Metric | Single-Cycle | Multi-Cycle | Pipelined
Clock Period | \(\mathbf{F}+\mathbf{D}+\mathbf{X}+\mathbf{M}+\mathbf{W}\) | \(\mathrm{max}(\mathbf{F},\mathbf{D},\mathbf{X},\mathbf{M},\mathbf{W})+\epsilon_M\) | \(\mathrm{max}(\mathbf{F},\mathbf{D},\mathbf{X},\mathbf{M},\mathbf{W})+\epsilon_P\)
Cycles Per Instruction (CPI) | 1 | It depends! | 1

As you can see, pipelined processors are the best of both worlds! They have the clock period of multi-cycle processors with the CPI of single-cycle ones!

Single-Cycle vs. Multi-Cycle vs. Pipelined

To drive home the point, let’s see a concrete example!

Suppose that you stumble upon a mysterious program alongside a README containing the following table:

Instruction Type | Stages | Percentage of Program
Branches | F, D, X | 20%
Memory | F, D, X, M, W | 20%
Arithmetic & Logical | F, D, X, W | 60%

Something compels you to estimate the performance (\(\frac{\mathrm{Time}}{\mathrm{Instruction}}\)) of this mystery program. Luckily, you have single-cycle, multi-cycle, and pipelined versions of the same base processor with the following stage latencies:

Stage | Latency
Fetch | 170 ns
Decode | 180 ns
EXecute | 200 ns
Memory | 200 ns
Writeback | 150 ns

In the multi-cycle and pipelined versions, let the overhead of the registers between the stages be 5 nanoseconds (\(\epsilon_M = \epsilon_P = 5~\mathrm{ns}\)). We now have everything we need to estimate the performance of our mystery program on each architecture style!

Metric | Single-Cycle | Multi-Cycle | Pipelined
Clock Period | 900 ns | 205 ns | 205 ns
Cycles Per Instruction (CPI) | 1 | 4 | 1
Performance (\(\frac{\mathrm{Time}}{\mathrm{Instruction}}\)) | 900 ns | 820 ns | 205 ns

Notice how the pipelined processor is 4X faster than the multi-cycle processor and ~4.39X faster than the single-cycle processor! Wow!!
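
If you like, you can reproduce these numbers with a few lines of C (just a sketch, using the hypothetical latencies and instruction mix from the tables above):

#include <stdio.h>

int main(void) {
    /* Hypothetical stage latencies from the example above, in nanoseconds. */
    double fetch = 170, decode = 180, execute = 200, memory = 200, writeback = 150;
    double epsilon = 5;  /* register overhead for the multi-cycle and pipelined designs */

    double single_period = fetch + decode + execute + memory + writeback;  /* 900 ns */
    double slowest = 200;                      /* EXecute and Memory are the longest stages */
    double multi_period = slowest + epsilon;   /* 205 ns */
    double pipe_period = slowest + epsilon;    /* 205 ns */

    /* Average CPI for the mystery program on the multi-cycle machine:
       20% branches (3 cycles), 20% memory (5 cycles), 60% ALU (4 cycles). */
    double multi_cpi = 0.2 * 3 + 0.2 * 5 + 0.6 * 4;  /* 4.0 */

    printf("single-cycle: %g ns per instruction\n", single_period);             /* 900 */
    printf("multi-cycle:  %g ns per instruction\n", multi_period * multi_cpi);  /* 820 */
    printf("pipelined:    %g ns per instruction\n", pipe_period);               /* 205 */
    return 0;
}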

Latency vs. Throughput

It is important to note that pipelined processors don’t execute any one instruction faster than a multi-cycle processor. In fact, the instruction latency of a pipelined processor is generally worse than that of a multi-cycle processor. What makes pipelined processors fast is their high throughput: they execute multiple instructions in parallel, each in a different stage.

Hazards

This is the part of the lecture where I have to come clean and admit that I lied to you. Unfortunately, pipelining isn’t that straightforward.

To see why, suppose that our program contained the following two RISC-V assembly instructions:

j EXIT
addi x10, x11, 1

After j EXIT is done, the next instruction that should run is not addi x10, x11, 1; it should be whatever instruction comes after the EXIT label. But a pipelined processor will already have Fetched (and perhaps Decoded) the addi instruction by the time the jump resolves! Now all the work that has been done on it needs to be thrown away, and we need to start again by Fetching the instruction at EXIT.

This is just one of the many ways in which pipelining can go wrong; such situations are appropriately named hazards! However, they are out of scope for this class. If you’re interested, see sections 4.8–4.9 in [P&H].


[1] The clock period is the inverse of the clock frequency or clock speed. That is, the clock period is how long a single clock cycle takes, whereas the clock frequency is how many cycles can be run during a fixed unit of time. Clock frequency is often used as a measure of how fast a CPU is, usually in GHz.

[2] Load instructions take the longest because the processor needs to do work in every stage to execute a load. On the other hand, the processor doesn’t need to do any work in the writeback stage for store instructions, which shaves off a couple of nanoseconds.

[3] The latency of an instruction is the time it takes to execute that instruction.

[4] What would go wrong if we omitted the registers at the end of each stage? Why don’t we need a register at the end of the writeback stage?

A0: Infrastructure

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Late Submissions

This assignment is due on Monday, 1/27 at 11:59pm. You may submit the assignment up to three days late (i.e., until 1/30 at 11:59pm) without using slip days.

Submission Requirements

You will submit your completed solution to this assignment to Gradescope. You must submit:

  • lab1.c, which will be modified with your solution for print_digit and print_string

Restrictions

  • You may not include any libraries beyond what is already included in stdio.h
  • Your solution should use constant space (you should not use arrays, either dynamically or statically)

Provided Files

There is no release code for this assignment. You will create your own file, as described below.

Implementation

View the lab slides here.

Before coming to lab, go through the course setup materials for Git and the RISC-V Infrastructure. The lab tasks will assume you have at least set up your Cornell GitHub credentials and have your favorite text editor, such as Visual Studio Code, ready to go.

Step 1: Compiling and running C programs

Course Docker Container

Follow these instructions to set up Docker and obtain CS 3410’s Docker container. To summarize, you will need to:

  • Install Docker itself.
  • Download the image with docker pull ghcr.io/sampsyo/cs3410-infra.
  • Consider setting up an rv alias to make the container easy to use.

If you don’t already have a favorite text editor, now would also be a good time to install VSCode.

C Programming

Next, follow these instructions for writing, compiling, and running your first C program.

When your program runs, show the result to a TA. Congratulations! You’re now a C programmer.

Git

Now, we’ll get some experience with Git! If you haven’t already, be sure to follow our guide to setting up your credentials on GitHub so you have an SSH key in place.

Go to the Cornell GitHub website and create a repository called “lab1”. This repository can be public, but for assignments all of your repositories must be private.

Now, clone your repository from within the cs3410 directory you made earlier:

$ git clone git@github.coecis.cornell.edu:abc123/lab1.git

replacing abc123 with your actual NetID. If this doesn’t work, ask a TA for assistance. There is probably something wrong with your GitHub configuration.

Before changing directories into the repo, you should move your hi.c file that you created during the Docker setup step into the lab1 folder and clean up the executables we made earlier:

$ mv hi.c lab1
$ rm a.out
$ cd lab1
$ ls

If you haven’t created hi.c yet, you can run:

$ cd lab1
$ printf '#include <stdio.h>\nint main() { printf("hi!\\n"); }\n' > hi.c

You should see the file hi.c in your repository. Enter:

$ git status

The following should appear (or something like it):

On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        hi.c

Now, you should add the file hi.c to stage it, make a commit, and then push to the remote repository:

$ git add hi.c
$ git commit -m "Initial commit"
$ git push

This is commonly the GitHub workflow for a single person working on an assignment. You’ll make some changes, commit them, and push them, over and over until you finish the assignment.

Git

To learn more about Git, consider following our complete git tutorial!

Step 2: print_digit and print_string

For this next task, you are going to write two helper functions to help you in Assignment 1:

  • print_digit(int digit): Given an integer digit between 0 and 15, print digit as a hexadecimal digit using lowercase letters to the terminal (without using printf)
  • print_string(char* s): Given a string, print it to the terminal (without using printf)

First, cd into your lab1 repository. Then, make a file called lab1.c, and copy/paste the following code:

#include <stdio.h>

// LAB TASK: Implement print_digit
void print_digit(int digit) {
}

// LAB TASK: Implement print_string
void print_string(char* s) {
}

int main(int argc, char* argv[]) {
  printf("print_digit test: \n"); // Not to use this in A1
  for (int i = 0; i <= 16; ++i) {
    print_digit(i);
    fputc(' ', stdout);
  }
  printf("\nprint_string test: \n"); // Not to use this in A1

  char* str = "Hello, 3410\n";
  print_string(str);
  return 0;
}

fputc

fputc (defined in stdio.h) writes a single character to a given output stream (e.g., stdout). See more here.

Hint

For print_digit, you’ll want to use an ASCII table.

Save the file and exit the editor. Now is a good time to commit and push your changes to your repository. Once you’ve pushed, try to implement the functions print_digit and print_string. The TAs are available for help should you need it.

Once you’ve implemented the functions, you can run the program:

$ rv gcc -Wall -Wextra -Wpedantic -Wshadow -std=c17 -o test_lab1 lab1.c
$ rv qemu test_lab1

Warning

Like many commands on this page, this assumes you have the rv aliases set up as described in our RISC-V Infrastructure setup guide.

Remember, if you change lab1.c between runs, you need to recompile the program. That’s all for Assignment 0!

Submission

Submit lab1.c to Gradescope. Upon submission, we will provide a smoke test to ensure your code compiles and passes the public test cases.

A1: Implementing printf

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Submission Requirements

You will submit your completed solution to this assignment to Gradescope. You must submit:

  • my_printf.c, which will be modified with your solution for Task 1 and Task 2
  • test_my_printf.c, which will contain your tests for your solution for Task 1 and Task 2

Restrictions

  • You may not include any libraries beyond what is already included in my_printf.h
  • Your solution should use constant space (you should not use arrays, either dynamically or statically)
  • You may add as many helper functions as you would like in my_printf.c (including those you wrote in Assignment 0!), but you must leave the function signatures for my_printf and print_integer unchanged. You may not change my_printf.h, as we will be using our own header file for grading.

Provided Files

The provided release code contains four files:

  • my_printf.h, which is a header file that contains the required function definitions and some useful include statements. You may not modify this file. You may also not include any libraries in your implementation beyond what is already included in this file.
  • my_printf.c, which contains the function definitions for your implementation. This is where you will write your code for my_printf and print_integer.
  • test_my_printf.c, which is a test file with a couple test cases to get you started. You must add more tests to receive full credit for this assignment.
  • test_my_printf.txt, which is a text file that you can use to compare your outputs to by “diff” testing. See more in Running and Testing.

Getting Started

To get started, obtain the release code by cloning the a1 repository from GitHub:

$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<YOUR NET ID>_printf.git
  • Note: Please replace <YOUR NET ID> with your NetID. For example, if your NetID is zw669, then the clone command would be git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/zw669_printf.git

Overview

In this assignment you will implement your own version of printf (see the documentation here) called my_printf without relying on the C standard library. Recall that printf works by taking in a format string that contains various format codes, in addition to a variable number of other arguments. The format codes specify how to “plug in” the arguments into the format string, to get the final result. For example:

printf("I love %d!", 3410); // prints "I love 3410!"
printf("Hello, %s", "Alan"); // prints "Hello, Alan"
printf("Hello %s and %s!", "Alan", "Alonzo"); // prints "Hello Alan and Alonzo!"

You will implement two key functions:

  • print_integer(int n, int radix, char *prefix): Print the integer n to stdout in the specified base (radix), with prefix immediately before the first digit.
  • my_printf(char *format, ...): Print a format string with any format codes replaced by the respective additional arguments.

Your implementation will be contained in my_printf.c. We’ve provided you with the function signatures to get you started. You should look at my_printf.h for detailed function specifications.

Assignment Outline

  • Task 1: You will implement the print_integer function
  • Task 2: You will implement the my_printf function

Implementation

Task 1: print_integer

Starter Code & A0

For Task 1 and Task 2, all your code should be in the “a1” Git repository. See the Getting Started section for how to retrieve the starter code. Your implementation will be contained in my_printf.c and test_my_printf.c.

If you would like to use the print_digit and print_string functions that you wrote in Assignment 0, you should copy and paste them into my_printf.c from your lab1.c file that you submitted for Assignment 0.

The print_integer function takes a number, a target base, and a prefix string and prints the number in the target base with the prefix string immediately before the first digit to stdout. radix may be any integer between 2 and 16 inclusive. For values of radix above 10, use lowercase letters to represent the digits following 9 (since bases higher than 10 canonically use lowercase letters as well).

This function should not print a newline. Here are some examples:

  • print_integer(3410, 10, "") should print “3410”
  • print_integer(-3410, 10, "") should print “-3410”
  • print_integer(-3410, 10, "$") should print “-$3410”
  • print_integer(3410, 16, "") should print “d52”
  • print_integer(3410, 16, "0x") should print “0xd52”
  • print_integer(-3410, 2, "0b") should print “0b11111111111111111111001010101110”
  • print_integer(-3410, 16, "0x") should print “0xfffff2ae”

For the radix 10, negative numbers should be printed with a negative sign (-). All other bases should use the 2’s complement representation from lecture. In other words, it should not print a negative sign, and instead just print an unsigned integer representing a 2’s complement number. This is exactly what printf from the standard library does when you pass in negative integers for bases other than 10. You can try this on your own:

#include <stdio.h>

int main() {
    printf("-10 in hex is: %x\n", -10);
    printf("-10 in binary is: %b\n", -10); // Note: requires C23
}

The above code outputs:

-10 in hex is: fffffff6
-10 in binary is: 11111111111111111111111111110110

which is the 2’s complement representation of -10 in hex and binary, respectively.

You can only use fputc.

You are not allowed to call any functions from the C standard library except for fputc anywhere in your implementation. You should print a character to the console using fputc(c, stdout), where c is the character you want to print.

Tip: In addition to the documentation on cppreference.com, you can also find documentation for many standard library functions in C through the manual pages (“manpages”) in your terminal. Simply type:

$ man fputc

to pull it up. You can scroll through it and then type q to exit.

You must not make any assumptions about the size of an integer on a given platform. On our platform, an integer is 32 bits, but C allows int to be different sizes on different platforms. For example, on some architectures int is 64 bits. Thus, you cannot store the new representation of the integer as a string or in a buffer of any size, as this would make assumptions about how big an integer is on your platform. Calling malloc is also prohibited (by extension of the fact that stdlib.h is prohibited). In other words, you should figure out how to do this without using any additional memory.

Warning

Storing characters or integers in an array (dynamically or statically) will result in a significant deduction.

You’ll also need to figure out how to print the integer from left-to-right instead of right-to-left without using additional memory. One of the algorithms you might recall from class for changing the base of a number would give you the digits from right-to-left, so it can seem tempting to try to use this as a starting point. Be warned that this will not work, as any tricks such as “reversing” the output or storing the digits would violate the constraints of this assignment (i.e. no standard library usage and no storing values in an array). Instead, think of how you can work backwards from the methods you’ve learned in class.

Task 2: my_printf

This function prints format with any format codes replaced by the respective additional arguments, as specified below:

Your my_printf function is required to support the following format codes:

  • %d: integer (int, short, or char), expressed in decimal notation, with no prefix.
  • %x: integer (int, short, or char), expressed in hexadecimal notation with the prefix “0x”. Lowercase letters are used for digits beyond 9
  • %b: integer (int, short, or char), expressed in binary notation with the prefix “0b”.
  • %s: string (char*)
  • %c: character (int, short, or char, between 0 and 127) expressed as its corresponding ASCII character
  • %%: a single percent sign (no parameter)

For each occurrence of any of the above codes, your program shall print one of the arguments (after the format) to my_printf(...) in the specified format. Anything else in the format string should be expressed as is. For example, if the format string included "%z", then "%z" would be printed. Likewise, a lone “%” at the end of the string would also be printed as is (note that this differs slightly from the behavior of printf).

Note that strings in C can be NULL. If my_printf is passed a null string as an argument, it should not crash, but instead print (null) to represent the would-be string:

#include <stdio.h>

int main(int argc, char* argv[]) {
  my_printf("Null string: %s", NULL); // Prints: "Null string: (null)"
}

Again, you are not allowed to call any C standard library functions. You should print to stdout only using fputc (documentation for fputc is here).

For any format codes relating to numbers, your program should handle any valid int values between INT_MIN and INT_MAX, inclusive.

Note that my_printf is a variadic function, meaning it takes in a variable number of arguments. You don’t need to know this deeply, but you will need to look up the syntax, and also understand how a program determines the number of arguments.

A variadic function is any function that takes in an unknown number of optional parameters. The optional parameters are represented by three dots (e.g. int foo(int n, ...)). The dots are a part of the C language. The optional arguments are accessed using va_arg from stdarg.h. You must call va_start at the start of your variadic function before the first use of va_arg. You must call va_end once at the end of your variadic function, after the last use of va_arg. There is no way to know from va_arg how many optional arguments there are, so you need to use some other information to determine how many times to call va_arg. In this case, it is the format string. Here’s an example from the GNU documentation:

#include <stdarg.h>
#include <stdio.h>

int add_em_up(int count,...) {
  va_list ap;
  va_start (ap, count);         /* Initialize the argument list. */

  int sum = 0;
  for (int i = 0; i < count; i++)
    sum += va_arg (ap, int);    /* Get the next argument value. */

  va_end (ap);                  /* Clean up. */
  return sum;
}

int main(int argc, char* argv[]) {
  /* This call prints 16. */
  printf("%d\n", add_em_up (3, 5, 5, 6));

  /* This call prints 55. */
  printf("%d\n", add_em_up (10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10));

  return 0;
}

Here are some examples to help you understand the spec:

  • my_printf("3410") should print “3410”
  • my_printf("My favorite class is %d", 3410) should print “My favorite class is 3410”
  • my_printf("%d in hex is %x", 3410, 3410) should print “3410 in hex is 0xd52”
  • my_printf("The pass rate in 3410 is 100%%") should print “The pass rate in 3410 is 100%”
  • my_printf("Professor %s and Professor %s are the instructors", "Weatherspoon", "Susag") should print “Professor Weatherspoon and Professor Susag are the instructors”

Note that insufficient parameters could lead to undefined behavior (i.e. when the number of arguments is less than the number of format codes). You do not have to handle this case. Similarly, mismatched parameters (when the format code does not match the given argument’s type) can also lead to undefined behavior, but you do not need to handle this.

You are encouraged to use print_integer in my_printf. Nonetheless, these functions will be tested independently.

Running and Testing

RISC-V Infrastructure

Like many commands on this page, this assumes you have the rv aliases set up as described in our RISC-V Infrastructure setup guide.

To compile your code, run:

rv gcc -Wall -Wextra -Wpedantic -Wshadow -std=c17 -o test_my_printf test_my_printf.c my_printf.c

Then, to run your code:

rv qemu test_my_printf

We will be testing your code by comparing the output of your program to a test file. You will extend the file test_my_printf.txt with your own test cases. You are required to write more tests, and the quality of the tests will be graded. Feel free to use the examples in this handout as a starting point.

To receive full credit for testing, you should have at least 10 test cases each for print_integer and my_printf. Test cases should cover as many paths through your code as possible. To receive full credit for testing print_integer, you should have at least:

  • One test representing integers for each base from 2-16
  • One or more tests for different prefixes
  • One or more tests with no prefixes

To receive full credit for testing my_printf you should have at least:

  • One test for each format code
  • One test for no format codes
  • One test that contains multiple format codes

To compare the output of your program with the test file, run:

rv qemu test_my_printf > out.txt && diff out.txt test_my_printf.txt

If you don’t see any output from this command, your tests are passing. Note, for each test you add in test_my_printf.txt, you must call the corresponding function (either print_integer or my_printf) in test_my_printf.c. You should insert newlines between your test cases for readability. You may use printf in your test file, if you wish.
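
For example, a matched pair of entries might look like the following (an illustrative sketch; the exact strings are up to you):

// In test_my_printf.c:
my_printf("decimal: %d, hex: %x\n", 3410, 3410);

// The corresponding line in test_my_printf.txt would then be:
// decimal: 3410, hex: 0xd52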

Don’t forget to recompile your code between different runs of your program.

Note: you can do this all in one command, like so:

rv gcc -Wall -Wextra -Wpedantic -Wshadow -std=c17 -o test_my_printf test_my_printf.c my_printf.c && \
    rv qemu test_my_printf > out.txt && \
    diff out.txt test_my_printf.txt

Submission

Submit my_printf.c and test_my_printf.c to Gradescope. Upon submission, we will provide a smoke test to ensure your code compiles and passes the public test cases.

Rubric

  • 40 points: print_integer correctness
  • 50 points: my_printf correctness
  • 10 points: test quality

A2: Minifloat

A2 Megathread

For answers to frequently asked questions regarding this assignment, please see the A2 Megathread on Ed.

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Submission Requirements

For this assignment, you will need to submit the following five files:

  • minifloat.c, with your written implementation for the missing functions.
  • minifloat_test_part1.expected, to match additional tests added in minifloat_test_part1.c
  • Some additional tests, in:
    • minifloat_test_part1.c
    • minifloat_test_part2.c
    • minifloat_test_part3.c

Restrictions

For this assignment, you will build your own floating-point representation.

  • You may not use built-in C operations for floating-point arithmetic.
  • You may not cast data to float or double, or create variables with these types.

Provided Files

The provided release code contains seven files:

  • minifloat.c, which includes some completed functions and some functions you are expected to implement
  • minifloat.h, which provides declarations and comments for the functions in minifloat.c, including those you are to implement
  • minifloat_test_part1.c, minifloat_test_part2.c, minifloat_test_part3.c, which provide some tests for you to get started. You are expected to add more tests of your own to each of these test suites
  • minifloat_test_part1.expected, which provides a baseline file to help with testing part 1. You are expected to add more lines to this file as part of testing part 1.
  • Makefile, which provides structure to compile your code (see our brief tutorial on Makefiles)

Getting Started

To get started, obtain the release code by cloning your assignment repository from GitHub:

$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_minifloat.git

Replace <NETID> with your NetID. For example, if your NetID is zw669, then this clone statement would be git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/zw669_minifloat.git

Overview

In this assignment, you will develop a custom minifloat data format in C. You will be expected to reason about floating-point details and implement operations over your custom floating-point data type in C.

Background

In class, we learned about floating-point numbers, which represent decimals with some number of bits. C has built-in float and double types, which use (on modern hardware) 32 bits and 64 bits, respectively. Increasing the number of bits in a floating-point representation gives it more precision and more dynamic range, at the expense of less efficient arithmetic. It can also be useful, however, to perform operations with smaller floating-point representations—trading off precision for potentially faster calculations.

In this assignment, you will implement functions for a specialized 8-bit floating-point type. We’ll call these 8-bit numbers minifloats. Minifloats have severely limited precision, but such tiny floating-point values are useful for situations where errors matter less and data sizes are enormous: most prominently, in machine learning. See, for example, this paper and this other paper that both show serious efficiency advantages from using 8-bit minifloats. While most floating-point formats enjoy built-in hardware support, we can also implement minifloats in software with bit packing tricks.

Minifloats follow a similar representation strategy to the standard IEEE floating-point types that we learned about in lecture. However, they differ in a few important ways to make the implementation simpler, which we will summarize as well.

Minifloat Specification

  • Minifloats use 8 bits in total: 1 sign bit, 3 exponent bits, and 4 significand bits. The layout of a minifloat looks like this, with s for sign, e for exponent, and g for significand:
Minifloat Layout
  • As in standard formats, a sign bit of 0 indicates a positive number, and a sign bit of 1 indicates a negative number.

  • Minifloats have a bias of 3. In other words, we subtract 3 from the bit-representation of a minifloat exponent. In comparison, single-precision floating-point numbers (i.e., float) have a bias of 127.

  • Unlike standard floating-point formats, wherein we usually append a leading 1 to the significand bits with the \(1.g\) notation, minifloats use the significand directly, with the binary point after the first digit. So if the four significand bits are \(g_3 g_2 g_1 g_0\), then the “base” part of the represented value is the binary number \(g_3 . g_2 g_1 g_0\). Or, in other words, the value is \(g \times 2^{-3}\), where \(g\) is the unsigned integer value of those 4 bits.

  • Also unlike standard floating-point formats, our minifloats do not use special values: not a number (NaN) and infinity (+∞ and -∞).

All together, the value represented by a minifloat with sign \(s\), exponent \(e\), and significand \(g\) is:

\[ (-1)^s \times (g \times 2^{-3}) \times 2^{e - 3} \]

Or, equivalently, if you prefer to think of the significand’s representation in terms of bits:

\[ (-1)^s \times (g_3.g_2g_1g_0) \times 2^{e - 3} \]

where \(g_3\) is the significand’s most significant bit, \(g_0\) is the least significant bit, and so on.

Examples

Now that we have defined our minifloat specification, let’s see some examples!

Example 1: 10111100

We have a sign of 1, an exponent of 011, and a significand of 1100.

  • Our sign bit 1 corresponds to \(-1\).
  • Our exponent 011 corresponds to a decimal exponent of \(3-3 = 0\). (We’re applying our \(-3\) bias here.)
  • Our significand 1100 corresponds to the decimal \(12 \times 2^{-3}=\frac{12}{8}=1.5\). (Or, equivalently, the significand corresponds to the binary number \(1.100_2\), which is \(1.5\) in decimal.)

Altogether, 10111100 is \(-1 \times 1.5 \times 2^0 = -1 \times 1.5 \times 1 = -1.5\) in base-10.

Example 2: 00010010

We have a sign of 0, an exponent of 001, and a significand of 0010.

  • Our sign 0 corresponds to \(+1\).
  • Our exponent 001 corresponds to a decimal exponent of \(1-3 = -2\).
  • Our significand 0010 corresponds to the binary value \(0.010_{2}\), which equals \(0.25_{10}\).

Altogether, 00010010 is \(1 \times 0.25 \times 2^{-2} = \frac{1}{16} = 0.0625\) in base-10.

Converting between Minifloats and Decimals

Decimal to Minifloat

To convert a decimal number into a minifloat:

  1. Convert the integer and fractional parts into binary.
  2. Normalize to match the format \( g_3.g_2g_1g_0 \times 2^e \).
  3. Convert exponent into biased form (i.e., add 3).
  4. Set the sign bit accordingly.

Example: Converting 2.25 into an 8-bit float

Step 1: Convert the integer and fractional parts to binary.

Converting the integer portion into binary yields 10.

Our fractional part is 0.25. To convert, multiply the fractional part by 2, record the integer part of the result (which will be 0 or 1), and repeat with the new fractional part until the fractional part becomes 0 or the precision limit is reached (4 digits for our minifloat format). The recorded integer parts then form the binary representation of the original fractional part.

  • \( 0.25 \times 2 = 0.50 \). Record 0.
  • \( 0.50 \times 2 = 1.00 \). Record 1.

Thus our binary representation of 0.25 is 01. Together with the integer portion, our binary representation of 2.25 is 10.01.

Step 2: Normalize to match the format \( g_3.g_2g_1g_0 \times 2^e \).

Now we normalize our result so that it fits the format \(g_3.g_2g_1g_0 \times 2^e\). In this case, we shift to the left by one place: \(1.001 \times 2^1\). From this we can see that our significand is 1001.

Step 3: Convert exponent into biased form (i.e., add 3).

Next, we need to apply our format’s exponent bias, which for minifloats is 3. To bias the exponent, we add the bias to our original exponent \(e\). So, \(1 + 3 = 4\) (100 in binary).

Step 4: Set the sign bit accordingly.

Lastly, because 2.25 is positive, the sign bit should be set to 0.

Thus the minifloat representation of 2.25 is 01001001.
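
As a sanity check, plugging the fields of 01001001 back into the value formula recovers the original number:

\[ (-1)^0 \times (1001_2 \times 2^{-3}) \times 2^{4-3} = 1.125 \times 2 = 2.25 \]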

Minifloat to Decimal

To convert from a floating-point number into a decimal number:

  1. Extract the sign, exponent, and significand.
  2. Normalize the significand to the format \( g_3.g_2g_1g_0 \) and remove trailing zeros.
  3. De-normalize to make the exponent 0.
  4. Convert the integer and fractional parts to decimals.
  5. Add a negative sign if necessary.

Example: Converting 11011100 into a Decimal

Step 1: Extract the sign, exponent, and significand.

  • Sign bit: 1 (negative)
  • Exponent: 101
  • Significand: 1100

Step 2: Normalize the significand to the format \( g_3.g_2g_1g_0 \) and remove trailing zeros.

Our significand 1100 becomes 1.1.

Step 3: De-normalize to make the exponent 0.

We first convert our binary exponent 101 into base-10, yielding 5. We then subtract our bias (which is 3 for minifloats) from our exponent to get \( 5-3=2 \).

Since our exponent is 2, we shift our binary point 2 places to the right, yielding 110.0.

Step 4: Convert the integer and fractional parts to decimals

Next, we convert the integer and fractional parts of 110.0 into base-10. Since \(110_2 = 6_{10}\) and \(0_2 = 0_{10}\), \(110.0_{2} = 6.0_{10}\).

Step 5: Set the sign according to sign bit

Since the sign bit is 1, the final value is: \(-6.0\).
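
We can double-check this result with the value formula: the significand bits 1100 have unsigned value 12, so

\[ (-1)^1 \times (12 \times 2^{-3}) \times 2^{5-3} = -1.5 \times 4 = -6.0 \]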

Adding Minifloats

To perform addition with floating-point numbers:

  1. Rewrite the number with the smaller exponent so that the two exponents are equal, shifting its mantissa to the right accordingly.
  2. Add the mantissas together.
  3. Recombine and renormalize the result if necessary.

Example: \(1.5 + 0.5\)

First, we need to convert 1.5 and 0.5 into their minifloat representations. For 1.5 this is \(1.1 \times 2^0\), and for 0.5 this is \(1.0 \times 2^{-1}\).

Step 1: Adjust the mantissa

Because the exponents differ, we shift 0.5’s mantissa to the right by one: \( 1.0 \rightarrow 0.10 \)

Now both numbers have an exponent of 0.

Step 2: Add the mantissas together.

  • \( 1.1_2 + 0.10_2 = 10.0_2\)

Step 3: Recombine and renormalize the result if necessary

  • \( 10.0_2 \times 2^0 = 1.0 \times 2^1 \)

Thus the answer is 0 100 1000 which is equivalent to 2.0 in base-10.
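
Verifying with the value formula: the result 0 100 1000 has significand \(1000_2 = 8\) and biased exponent \(100_2 = 4\), so its value is

\[ (-1)^0 \times (8 \times 2^{-3}) \times 2^{4-3} = 1 \times 2 = 2.0 \]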

Bit size in C

We want to ensure that the type we use to represent a minifloat is exactly 8 bits. We will use the uint8_t type from C’s stdint.h header. (We will avoid char, even though char is 8 bits on most platforms, because C unhelpfully does not guarantee that it is exactly 8 bits everywhere.) To break down this type’s name: uint means that bit-level operations behave as on an unsigned integer, 8 means the type is exactly 8 bits wide, and _t is a common naming convention indicating that this is a type. The stdint.h header defines many similar types, like these:

Type        Description
uint8_t     unsigned integer with 8 bits
uint16_t    unsigned integer with 16 bits
int8_t      signed integer with 8 bits

Your Task

This assignment is divided into three parts: displaying minifloats as decimals, implementing operations on minifloats, and using minifloats. Each part will have you implementing 1–3 functions, and adding test cases to help convince yourself these functions are correct. You must add at least 4 new test cases per function to what we have provided, though you may add more.

Warning

For all of your C implementations, you may not include any constants or variables of type float, double, or long double. You may not use C’s built-in floating-point operations, such as + on floating-point values.

This is not an arbitrary restriction. Using a larger floating-point representation in your implementation would defeat the purpose of minifloats, which is that they are smaller and faster than “normal” floating-point types. Because of floating-point error, it is also very likely to introduce incorrect results.

We have provided a mini_to_double utility function to help you with debugging and testing. You may not use this function in any of your submitted implementations, but you may use this function for writing test cases for any of your functions.

Part 1: Lab

View the lab slides here.

Review

If you need to, look over the lecture notes on standard floating-point types to remind yourself of the basic principles. And try out float.exposed to get hands-on practice!

Read over the background above and especially the specification for minifloats. To briefly summarize the minifloat format:

  • Bit 7 is the sign bit
  • Bits 6–4 are the exponent bits
  • Bits 3–0 are the fraction bits

(Bits are numbered from the right, so 0 is the least significant bit.)
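
As a starting point, here is one way you might pull these fields out of a uint8_t with shifts and masks. The helper names are purely illustrative and are not part of the release code:

#include <stdint.h>

// Illustrative helpers for extracting the three minifloat fields,
// following the bit layout described above.
static uint8_t mini_sign(uint8_t m)        { return (m >> 7) & 0x1; } // bit 7
static uint8_t mini_exponent(uint8_t m)    { return (m >> 4) & 0x7; } // bits 6-4
static uint8_t mini_significand(uint8_t m) { return m & 0xF; }        // bits 3-0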

Displaying Minifloats

In this lab, your task is to implement a function for displaying minifloats in C, named print_mini. This function takes in a minifloat and must print the sign, whole number, and fractional part associated with this minifloat as a base-10 value. The exact specification, with examples, is given in minifloat.h. Your implementation should be filled into minifloat.c.

To make your task somewhat easier, we have written a concrete call to printf at the end of each function that you may use as a guide for what to implement. Note that print_mini requires that we write 6 decimal digits: the provided printf specifier %06d will pad the integer with leading zeros so that the printed integer has 6 digits. To provide two concrete examples:

  • printf("%06d", 123) will print 000123
  • printf("%06d", 100000) will print 100000

Warning

Remember, you may not include any constants or variables of type float, double, or long double, and you may not use any floating-point operations. You may, however, use any integer arithmetic operation (including integer division and modulus). In C, dividing two integers with i / j produces an integer. But be sure not to include a double constant (such as 1.0) by accident.

Hint

You may find it useful to observe that \(1/64=0.015625\), and that, with integer division, \(1000000 / 64 = 15625\).

Testing Part 1

A test script to help guide your development can be found in minifloat_test_part1.c. You can build this test with the following command:

rv make part1

To test this code, you must execute the resulting .out file and redirect your printed output to a file, such as with the following command:

rv qemu minifloat_test_part1.out > minifloat_test_part1.txt

Reminder: Use the rv alias!

Reminder: use the rv aliases for each command if you have them set up!

Finally, you must compare the resulting prints to our expected results using diff:

diff minifloat_test_part1.txt minifloat_test_part1.expected

If you observe any differences between the two, a printing test failed.

You can also combine these operations into a single bash command:

rv make part1 && rv qemu minifloat_test_part1.out > minifloat_test_part1.txt && diff minifloat_test_part1.txt minifloat_test_part1.expected

Reminder: You must add 4 new printing tests (which means modifying both minifloat_test_part1.c and minifloat_test_part1.expected).

Part 2: Minifloat Operations

Your second task is to implement an equality check, addition, and multiplication between minifloats. Specifically, you will be implementing mini_eq, mini_add, and mini_mul, each of which takes in two minifloats; mini_add and mini_mul produce a new minifloat. As before, the specifications for each function can be found in minifloat.h, and your implementation should be written in minifloat.c.

The arithmetic operations mini_add and mini_mul must produce the minifloat value closest to the result of the corresponding operation on the real numbers. If two minifloat values are equally close, your implementation must return the one farther from zero. For example, we would round 2.125 to 2.25, and similarly -1.0625 to -1.125.

If there are multiple possible minifloat representations of the resulting real number, you must return the minifloat with the smallest exponent. For example, the minifloat value 0 011 0010 could be equivalently represented as 0 001 1000, and only the latter is considered correct for these arithmetic operations. Additionally, if an arithmetic operation would return 0, you must return exactly 00000000.
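
To see why those two encodings denote the same value, plug each into the value formula: both evaluate to 0.25.

\[ (-1)^0 \times (2 \times 2^{-3}) \times 2^{3-3} = 0.25 = (-1)^0 \times (8 \times 2^{-3}) \times 2^{1-3} \]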

If applying addition or multiplication would result in a real number larger or smaller than can be represented by a minifloat, the result of these operations is undefined, and need not be tested.

Hint: If you get stuck on any of these functions, consider attempting another; the details of each often become clearer while working on the others.

Testing Part 2

Testing minifloat operations is more straightforward than testing the printing implemented earlier. We can simply run each test file and compare the resulting minifloats to expected values. To test part 2, you can directly build and execute part2:

rv make part2 && rv qemu minifloat_test_part2.out

Reminder: You must add 4 new tests per function.

Hint: Write as many edge-case tests as you can think of; there are many potential pitfalls with negative numbers and very small or very large minifloats.

Part 3: Using Minifloats

Your third task is a straightforward example use of the minifloats you have implemented. Specifically, you’ll implement functions to calculate the volume and surface area of a cylinder, titled cylinder_volume and cylinder_area.

The volume and surface area of a cylinder depends on two variables, the radius r and height h of the cylinder, by the following equations:

  • \( \text{volume} = \pi \times r \times r \times h \)
  • \( \text{surface area} = 2 \times \pi \times r \times (h + r) \)

For reference and comparison, we have also written double-based implementations of these functions, double_cylinder_volume and double_cylinder_area. These may be useful to refer to while implementing your own functions, and they are also used for the testing described below.

For these implementations, you are expected to use the provided constant PI, whose minifloat representation is 01001101 (representing 3.25), the closest minifloat to the decimal \(\pi \approx 3.14159\). We have included this constant definition in minifloat.c for your convenience.

Testing Part 3

To test part 3, you can directly build and execute part3:

rv make part3 && rv qemu minifloat_test_part3.out

We have only provided you with a single simple test for each, and you should write at least 4 new tests. We test these particular functions by comparing our minifloat calculation to the result produced by calculating the same value with a double. We expect that the minifloat result (being less accurate) will have some error compared to the double representation, which in the test is represented by the threshold parameter.

We recommend trying out a few operations, seeing how much difference there is between the minifloat and double calculations, and adjusting your threshold accordingly. To help with comparing these operations, we use the provided mini_to_double utility function to calculate a double value before and after computing the minifloat equivalent. (We do not define a double_to_mini conversion.)

Warning

The mini_to_double utility is only for testing. Do not use it in your main implementation.

Remember that your goal is to implement minifloat operations “from scratch,” using only integer arithmetic. This is what makes minifloats more efficient than float or double.

Your tests should not include cases where the minifloat arithmetic would overflow (produce a result larger than the maximum minifloat or more negative than the minimum minifloat). We do not define the results of these overflowing operations.

Submission

Submit minifloat.c, minifloat_test_part1.expected, minifloat_test_part1.c, minifloat_test_part2.c, and minifloat_test_part3.c to Gradescope. Upon submission, we will provide a smoke test to ensure your code compiles and passes the public test cases.

Rubric

  • 16 points: print_mini correctness
  • 18 points: mini_eq correctness
  • 16 points: mini_add correctness
  • 19 points: mini_mul correctness
  • 8 points: cylinder_area correctness
  • 8 points: cylinder_volume correctness
  • 15 points: test quality

A3: Huffman Compression

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Submission Requirements

You will submit your completed solution to this assignment on Gradescope. You must submit:

  • huffman.c, which will contain part of your work for Task 0 and all of your work for Tasks 1 and 2.
  • priority_queue.c, which will contain part of your work for Task 0.

Restrictions

  • You may not modify any files other than huffman.c and priority_queue.c (i.e., the files you will submit).

Provided Files

  • priority_queue.h, which is a header file that defines the specification for the priority queue.
  • priority_queue.c, which will contain your implementation of a priority queue and stack. You will modify this file.
  • huffman.h, which is a header file that defines the types and functions you will need to implement Huffman compression.
  • huffman.c, which will contain your implementation for the Huffman compression system. You will also modify this one.
  • bit_tools.h, which is a header file that defines the BitWriter and BitReader structs and their respective functions for reading and writing binary values from files.
  • bit_tools.c, which contains the implementation of the functions for BitWriter and BitReader.
  • utils.h, which contains utility functions for printing lists and tree nodes.
  • utils.c, which contains the implementation for the utility functions.
  • Makefile, which contains the build tools for this assignment.
  • test_priority_queue.c, which contains functions to test your implementation for Task 0. You may add tests here, but you will not turn this file in.
  • test_huffman.c, which contains functions to test your implementation for Task 1. You may also modify this file, as above.
  • cu_unit.h, which contains the macro definitions that you’ll use for unit testing.
  • compress.c, which contains the compression program’s command line interface.
  • decompress.c, which contains the decompression program’s command line interface.

Remember, do not modify other source files except the ones containing your implementation. We will grade your submission with “stock” versions of the starter code.

Getting Started

To get started, obtain the release code by cloning your assignment repository from GitHub:

$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_huffman.git

Replace <NETID> with your NetID. All the letters in your NetID should be in lowercase.

Overview

In this assignment you will implement a data compression system using Huffman coding. Huffman compression is an encoding scheme that uses fewer bits to encode more frequently appearing characters and more bits to encode less frequently appearing characters. It is used by ZIP files, among many other things. The high-level overview of the algorithm is:

  1. Calculate the frequency of each character in the data. (Task 0)
  2. Build a Huffman tree using the frequencies. (Task 1)
  3. Build an encoding table using the Huffman tree. (Task 2)
  4. Encode each character in the data using your encoding table. (Task 2)

In the lab, you will implement a priority queue in C. You’ll use this to build your Huffman tree. The bulk of the work for this assignment will come from understanding the Huffman coding algorithm and manipulating data structures in C using pointers.

Huffman Compression Algorithm

Your implementation will read a single text file as input and produce two output files: a compressed data file and a coding table file that encodes enough information to allow decompression. (This assignment does not include decompression; we have given you a decompressor implementation.) Task 2 describes the format for these files.

Before moving on to the tasks, let’s break down the Huffman compression algorithm. You may recall that ASCII is a straightforward way to represent characters. In ASCII, every character is encoded with 8 bits (1 byte), so there are 256 possible values that can be represented. This means that if we use standard ASCII encodings to represent a text file, each character in the file requires exactly 1 byte. This is inefficient, as most text streams don’t actually use all 256 possible characters. The basic idea behind Huffman encoding is as follows: use fewer bits to represent characters that occur more frequently.

For example, consider the string go go gophers. Notice how g and o appear three times more often than the remaining letters. It would be nice if we could construct an encoding which uses fewer bits for g and o and (possibly) more bits for the remaining characters (e.g., h, r). That’s the goal with Huffman coding.

At the heart of Huffman coding is the Huffman tree data structure. A Huffman tree is a binary tree with characters at its leaves. Each edge in the tree corresponds to a bit: a left edge corresponds to 0 and a right edge corresponds to 1. To get the encoding for a character, follow the path from the root node to the character’s leaf node and concatenate all the corresponding bits.

Here’s a Huffman tree that contains all the characters in our string, go go gophers:

Huffman Tree

We have labeled each leaf with the frequency of that character. Internal nodes also have a frequency number that is the sum of all the frequencies of the children.

Here’s a table that shows the binary code for each character, according to this tree:

Character    Binary code
(space)      101
e            1100
g            00
h            1101
o            01
p            1110
r            1111
s            100

Remember, you get the encoding by traversing the path from the root to the character, using a 0 for every left edge and a 1 for every right edge.

The Huffman tree ensures that characters that are more frequent in the input receive shorter encodings, and characters that are less frequent receive longer encodings. Our goal is to construct the Huffman tree, write the coding table, and write the compressed file using these shorter encodings.

Assignment Outline

  • Task 0: You will complete Task 0 in lab. You will implement a priority queue in C as well as the calc_frequencies function in huffman.c.
  • Task 1: You will implement the algorithm to create a Huffman tree.
  • Task 2: You will implement the functions write_coding_table and write_compressed to write the coding table and compressed bytes to distinct files.

Implementation

(Lab) Task 0: Implementing a priority queue and frequency counter

View the lab slides here.

Before starting, make sure you’ve cloned the release code by following the instructions in Getting Started.

Step 1: Implement a priority queue

The code for this portion is located in priority_queue.c, which is provided to you in the release code. In this step, you’ll build a priority queue that accepts a “generic” data type. This is accomplished by storing a pointer to an arbitrary piece of memory that can store anything by using void*. We’ve provided a header file that defines the PQNode type as well as the function declarations for the functions you are required to implement.

Your implementation will go in priority_queue.c. We’ve provided a basic test suite in test_priority_queue.c. You will implement the following functions:

  • PQNode *pq_enqueue(PQNode **a_head, void *a_value, int (*cmp_fn)(const void *, const void *)): Add a new node with value a_value to a priority queue, using function cmp_fn(...) to determine the ordering of the priority queue.
  • PQNode *pq_dequeue(PQNode **a_head): Detach and return the head. Note, the caller is responsible for freeing the detached node, and any memory it refers to. Do not call free.
  • void destroy_list(PQNode **a_head, void (*destroy_fn)(void *)): Deallocates the priority queue. This should call the destroy_fn function on every data element, and it should free the list nodes.
  • PQNode *stack_push(PQNode **stack, void *a_value): Add a new node with value a_value to the front of the list.
  • PQNode *stack_pop(PQNode **stack): Detach and return the head of the list. Note, this function is extremely similar to pq_dequeue.

The last two functions enable us to use the same data structure as a stack when needed. You probably will not make use of this for your Huffman compression system, but the decompression system needs a stack to work properly. If you can implement pq_enqueue and pq_dequeue, implementing stack_push and stack_pop should be very easy.

We’ve provided a test file called test_priority_queue.c. Running rv make pqtest from the command line will build an executable called test_priority_queue, which you can then run by typing rv qemu test_priority_queue.

The tests use the header file cu_unit.h, which defines various macros that help you write unit tests. In general, tests should be structured like so:

static int _test_name() {
    cu_start();
    //-------------------
    // Setup code - build a list, declare a variable, call a function, etc. 
    cu_check(/*condition you want to check*/);
    // ... add as many checks as you want
    //-------------------
    cu_end();
}

int main(int argc, char* argv[]) {
    cu_start_tests(); // Indicate start of test suite
    cu_run(_test_name); // Don't forget to run the test in `main`
    cu_end_tests(); // Indicate end of the test suite
} 

Upon running the test, you’ll see one of the two following messages:

Test passed: _test_name

which will be displayed in green, or:

Test failed: _test_name at line x

which will be printed in red, and give the line that failed. We’ve provided two simple tests in the release code that check the behavior of your priority queue and stack. You are encouraged to add more tests to verify the functionality of your implementation. You will not be turning in test_priority_queue.c, however, so this will not be graded.
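
For example, a small additional test that exercises only documented behavior (dequeueing from an empty list returns NULL) might look like this; the test name is just illustrative, and remember to register it with cu_run in main:

static int _test_dequeue_empty() {
    cu_start();
    //-------------------
    PQNode* head = NULL;                  // an empty list
    cu_check(pq_dequeue(&head) == NULL);  // nothing to dequeue, so NULL is returned
    cu_check(head == NULL);               // the list is still a valid (empty) list
    //-------------------
    cu_end();
}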

Generic data types

You might notice some strange looking syntax in these function declarations. This is to enable generic data types. The PQNode struct contains a void*, which you can think of as a memory address to any type. This allows you to use the same code for linked lists of any type.

You can assign a void* to an address of any type. This is why you can write code like:

char* s = malloc(...);

even though malloc(...) returns a void*, not a char*. This is also similar to the way functions such as qsort(...) allow you to sort arrays of any type.

Function addresses

Code that deals with generic data types often needs to pass functions as parameters. To do this, you need to specify the address to a function as an argument. In other words, you are declaring the parameter of the function (in this case cmp_fn) as the address to a function that takes in some parameter(s) of specified types and returns a value of a specified type. For the compare function, you’ll always return an integer, and the arguments to the compare function can be anything, depending on the underlying data in the nodes of the priority queue.

Let’s look at an example:

void _print_square(int n) {
    printf("%d squared is %d\n", n, n * n);
}

void _print_cube(int n) {
    printf("%d cubed is %d\n", n, n * n * n);
}

void _call_print_fn(int n, void(*print_fn)(int)) {
    print_fn(n);
}

int main(int argc, char* argv[]) {
    _call_print_fn(4, _print_square); // Prints 16
    _call_print_fn(4, _print_cube); // Prints 64
}

In the above code, the type of parameter print_fn is void(*)(int). In other words, print_fn is the address to a function taking an int and returning void. Generalizing this to our priority queue, notice that the type of parameter cmp_fn is int(*)(const void*, const void*). This is the address to a function taking two addresses to memory locations of any type and returning an int.

Similarly, destroy_list also takes a function address. This is because beyond freeing the node itself, you also need to potentially free whatever the node stores (e.g., if you have a priority queue of dynamically allocated strings).
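
For instance, a comparison function for a priority queue of int values might look like this sketch (the name _cmp_int is just an example):

// Compare two int values stored behind generic pointers.
// Returns a negative number, zero, or a positive number, like strcmp.
int _cmp_int(const void* a_lhs, const void* a_rhs) {
    int lhs = *(const int*)a_lhs; // recover the int behind each void*
    int rhs = *(const int*)a_rhs;
    return (lhs > rhs) - (lhs < rhs); // avoids the overflow that lhs - rhs could cause
}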

Implementing pq_enqueue

You might recall from CS 2110 that priority queues can be implemented with binary heaps. In our implementation, however, we will be implementing our priority queue as a linked list that we will keep sorted by priority. This means that inserting a node will be an \(O(n)\) time operation, and removing from the priority queue will be a constant time operation. This is fine for our purposes.

In pq_enqueue, *a_head refers to the head of the linked list. If *a_head is NULL, then the list is empty. a_value is the address of whatever value is associated with this node. Allocate a new PQNode and insert it into the list in sorted order, according to the cmp_fn function. That is, everything before the new PQNode should be less than it, and everything after it should be greater than or equal to it.

*a_head should be updated if the new node becomes the first item in the list. The function should return the address of the new node.

This function should call malloc exactly once. You should not call free in this function.

We recommend you test your implementation for your priority queue as you go in test_priority_queue.c. You should also test your implementation for types other than integers, including dynamically allocated types such as strings. You will need to write your own comparison function to do this, and potentially your own print function if you want to be able to print your list.

Implementing pq_dequeue

Like the previous function, *a_head refers to the head (first node) of a valid linked list. If the list is empty, return NULL (since there is nothing to dequeue). Upon return, *a_head must be a valid linked list (although possibly empty). For our purposes, NULL is a valid linked list of size 0; thus, upon removing the last node, you should set *a_head to NULL.

You must also set the next field of the removed node to NULL. The caller is responsible for freeing the detached node, and any memory it refers to. For this reason, this function should not call free, directly or indirectly.

Again, you should test this by adding more statements to test_priority_queue.c and printing the list to observe the behavior of your function.

Implementing destroy_list

This function should completely destroy the linked list referred to by *a_head, freeing any memory that was allocated for it. destroy_fn(...) is a function that deallocates *a_value as needed (if for example, the nodes of the priority queue had values that were themselves dynamically allocated). This function should set the head to NULL in the caller’s stack frame (i.e. *a_head = NULL).

This is a good point to check to make sure that your code does not leak memory. Suppose you have the following code in test_priority_queue.c:

#include "priority_queue.h"
#include "cu_unit.h"

int _cmp_int(const void *a, const void *b) {...}

void _print_int(void *a_n) {...}

int _test_destroy() {
    cu_start(); 
    // ------------------
    PQNode* head = NULL;
    int n1 = 5, n2 = 7, n3 = 6;
    pq_enqueue(&head, &n1, _cmp_int);
    pq_enqueue(&head, &n2, _cmp_int);
    pq_enqueue(&head, &n3, _cmp_int);
    destroy_list(&head, NULL);
    cu_check(head == NULL);
    //--------------------
    cu_end();
}

int main(int argc, char* argv[]) {
    cu_start_tests();
    cu_run(_test_destroy);
    cu_end_tests();
    return 0;
}

This code should contain no memory leaks, i.e., it should eventually free everything that it mallocs.

You will likely want to use the sanitizers to check for memory bugs. Running rv make pqtest also enables the sanitizers so you don’t have to write out the command-line flags yourself.

Implementing stack_push and stack_pop

In stack_push, *stack stores the address of the first node in the linked list. a_value stores the address of the generic type. The newly allocated node should become the first node of the list, and *stack should be updated. The function returns the address of the new node.

In this function, you will call malloc exactly once, and you will not call free. This function is extremely similar to pq_enqueue, except you don’t need to think about where in the list the node should go. It always goes in the front of the list.

For stack_pop, you should simply detach and return the node from the head of the linked list. Note that this is incredibly similar to the specification for pq_dequeue.

Again, make sure you thoroughly test this code, as it will be used extensively in Task 1 and Task 2. If you are confident your code is correct, now would be a good time to commit and push your work to GitHub.

Step 2: Implementing calc_frequencies

The code for this task is located in huffman.c. You will be implementing the following function:

  • calc_frequencies(Frequencies freqs, const char* path, const char** a_error): Open a file at path and either store the character frequencies in freqs or set *a_error to strerror(errno).

Before getting started, we recommend you take a look at the type definitions and function specification located in huffman.h. In particular, pay careful attention to these two lines:

typedef unsigned char uchar; 
typedef uint64_t Frequencies[256];

The first line tells us that uchar is simply an alias for unsigned char. Similarly, the second line tells us that Frequencies is an alias for an array of 256 unsigned 64-bit integer values.
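
For example, here is a minimal, hypothetical snippet showing how these aliases behave (the caller zero-initializes the array, as described below):

#include <stdint.h>

typedef unsigned char uchar;
typedef uint64_t Frequencies[256];

int main(void) {
    Frequencies freqs = {0}; // all 256 counters start at zero
    uchar ch = 'a';
    freqs[ch]++;             // an unsigned char can index the array directly
    return 0;
}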

For the function calc_frequencies, the caller is responsible for initializing freqs[ch] to 0 for all ch from 0 through 255. The function should behave as follows:

  • If the file is opened correctly, then set freqs[ch] to \(n\), where \(n\) is the number of occurrences of the character ch in the file at path. Note that a char is an integer type, so it can be used to index directly into an array; but just like other integer types, we need to be careful about whether it is signed or unsigned.

    After this, return true. Do not modify a_error.

  • If the file could not be opened (i.e., fopen returned NULL), set *a_error to strerror(errno) and return false. Do not modify freqs.

You only need to check for errors related to failure to open the file. This function should not print anything, nor should you call malloc or free. You do not need them.

This function will need to use file input/output functions from the stdio.h header. In particular, use the documentation for fopen, fgetc, and fclose. Working with files in C can be confusing at first. Let’s look at some of the basic syntax:

#include <stdio.h>
#include <stdlib.h>

void print_first_character(char const* path) {
    FILE* stream = fopen(path, "r"); // this opens the file in reading mode 
    int ch = fgetc(stream); // read one character from the file (fgetc returns an int so it can also signal EOF)
    fputc(ch, stdout); // write that character to stdout
    fclose(stream); // always call fclose() if you call fopen()
}

int main(int argc, char* argv[]) {
    print_first_character("animal.txt");
    return 0;
}

In the fopen function, the second argument indicates the mode the file should be opened in. "r" is for reading, "w" is for writing, and "a" is for appending. If you wanted to write a function to print out every character in a file (and not just the first), you’d write something like this:

void cat(char const* path) {
    FILE* stream = fopen(path, "r"); 

    for (int ch = fgetc(stream); !feof(stream); ch = fgetc(stream)) {
        fputc(ch, stdout);
    }

    fclose(stream);
}

Be sure to use the stdio.h documentation to find the I/O functions you need.

Again, we recommend testing your code for calc_frequencies before moving on. Create a file called test_frequencies.c, and an example file such as animals.txt. Try calling your function and seeing if it correctly obtains the frequencies of each character in the text file using cu_unit.

That’s all for Task 0 and the lab! Don’t forget to commit and push your code to GitHub.

Task 1: Building a Huffman Tree

In lab we created a priority queue that accepts a “generic” data type. We will use the priority queue in this task to build our Huffman tree.

Finish Task 0 Before Continuing

If you missed lab or you don’t have a working priority queue or calc_frequencies function, go back and finish that first. Your code for this task will rely on the previous task.

The implementation for the Huffman tree will be contained in huffman.c. Look carefully first at huffman.h to ensure you understand the functions you are required to implement. In this task you will be implementing two functions:

  • TreeNode* make_huffman_tree(Frequencies freq): Given an array freq which contains the frequency of each character, create a Huffman tree and return the root.
  • void destroy_huffman_tree(TreeNode** a_root): Given the address of the root of a Huffman tree created by make_huffman_tree(...), deallocate and destroy the tree.

Recall that freq is an array with 256 values. Each index of the array is an ASCII character (recall that characters in C are just small integers, which we treat here as unsigned bytes). The value of freq[c] is the frequency of character c in the input file.

Also important in the header file is the definition of the TreeNode struct. A Huffman tree node contains the character, the frequency of the character in the input, and two child nodes. Huffman’s algorithm assumes that we’re building a single tree from a set (or forest) of trees. Initially, all the trees have a single node containing a character and the character’s weight. Iteratively, a new tree is formed by picking two trees and making a new tree whose child nodes are the roots of the two trees. The weight of the new tree is the sum of the weights of the two sub-trees. This decreases the number of trees by one in each iteration. The process iterates until there is only one tree left. The algorithm is as follows:

  1. Begin with a forest of trees. All trees have just one node, with the weight of the tree equal to the weight of the character in the node. Characters that occur most frequently have the highest weights. Characters that occur least frequently have the smallest weights. These nodes will be the leaves of the Huffman tree that you will be building.
  2. Repeat this step until there is only one tree: Choose two trees with the smallest weights; call these trees T1 and T2. Create a new tree whose root has a weight equal to the sum of the weights T1 + T2 and whose left sub-tree is T1 and whose right sub-tree is T2.
  3. The single tree left after the previous step is an optimal encoding tree.

To implement this strategy, use your priority queue to store your tree nodes. You want all the nodes to be ordered by their weights, so you can easily find the two trees with the smallest weights (at the front of the queue). You will need to write your own comparison function to implement this policy. To break ties when two tree-nodes have the same frequency, you can order them lexicographically by the ASCII value of the character.

We will not pay particular attention to tie-breaking between a leaf node and a non-leaf node, since non-leaf nodes do not hold a character; adding a tie-breaking rule there would make your implementation unnecessarily complex. Although a fully specified tie-breaking rule would pin down a single tree, our looser rules mean the tree you build can take on multiple forms. That’s fine; we will not grade based on the exact structure of your Huffman tree, but on the properties delineated below.

When you test your code, you should make sure that calling destroy_huffman_tree(TreeNode** a_root) ensures that your code has no memory leaks.

For testing, there are a few properties of Huffman trees we would like to verify:

  1. The weight of an internal node is equal to the sum of the weights of its children.
  2. The sum of the weights of the leaf nodes is equal to the number of characters in the uncompressed text.
  3. If the number of distinct leaf nodes is \(n\), then the number of total nodes in the Huffman tree is \(2n - 1\).

The last property follows from the fact that if you start with \(n\) leaf nodes, you need \(n - 1\) internal nodes to connect them.

We’ve provided you with a file test_huffman.c, which defines functions that verify the aforementioned properties using cu_unit.h. We’ve provided three test functions: one for each file given to you in the tests directory. You are encouraged to add more thorough tests yourself; however, you do not need to turn in test_huffman.c. Once you are confident your implementation is correct, move on to the next task.

To compile and run this program, you’ll run:

$ rv make hufftest
$ rv qemu test_huffman

Task 2: Writing the compressed file and coding table

Now we have all of the pieces we need to write the compressed file and the coding table. For this task, you must implement two functions, found in huffman.c:

  • void write_coding_table(TreeNode* root, BitWriter* a_writer): Write the code table to a_writer->file. This function writes to a file called coding_table.bits.
  • void write_compressed(TreeNode* root, BitWriter* a_writer): Write the encoded data to a_writer->file. This function writes to a file called compressed.bits

The above functions make use of the BitWriter struct, which is defined in bit_tools.h. The BitWriter allows us to write data to a file in increments of bits instead of bytes. (Normal file writing APIs, including C’s standard stdio.h, only support writing entire bytes at a time.) You are not responsible for fully understanding the inner workings of BitWriter, but you do need to know how to use it to write data to the file.

The BitWriter struct contains a file that is already opened in "w" mode. To write bits to the file, you must call the function write_bits(BitWriter* a_writer, uint8_t bits, uint8_t num_bits_to_write). It takes three parameters:

  • a_writer: The address of a BitWriter that contains a file which is open for writing
  • bits: The bits you want to write, stored in a uint8_t
  • num_bits_to_write: The number of bits you want to write, which must be between 0 and 8 inclusive

For both the compressed file and the coding table, you should only need to write bits to the file in 1-bit and 8-bit increments. The following program may help in understanding the behavior of the BitWriter more clearly:

int main(int argc, char* argv[]) {
    BitWriter writer = open_bit_writer("new_file.bits");
    write_bits(&writer, 0x05, 3);  // 0x05 ↔ 00000101₂ ⋯ writes just 101₂
    write_bits(&writer, 0xf3, 3);  // 0xf3 ↔ 11110011₂ ⋯ writes just 011₂
    write_bits(&writer, 0x01, 2);  // 0x01 ↔ 00000001₂ ⋯ writes just 01₂
    write_bits(&writer, 0x20, 6);  // 0x20 ↔ 00100000₂ ⋯ writes just 100000₂
    write_bits(&writer, 0x13, 5);  // 0x13 ↔ 00010011₂ ⋯ writes just 10011₂
    write_bits(&writer, 0x05, 5); // 0x05 ↔ 00000101₂ ⋯ writes just 00101₂ 
    close_bit_writer(&writer);
    return 0;
}

After running this code, you can inspect the new_file.bits file using the following command:

$ xxd -b -g 1 new_file.bits

The xxd tool prints out files in binary, hex, and ASCII formats so you can see exactly what you have written.

Be careful when writing characters whose encodings are greater than 8 bits. write_bits can only write at most 8 bits at a time as bits is an 8-bit unsigned integer (uint8_t). One way to get around this restriction is to iteratively print the number one bit at a time. See below for an example of how to do this:

int main(int argc, char* argv[]) {
    BitWriter writer = open_bit_writer("new_file.bits");

    uint32_t bits = 0x107; // 0x107 ↔ 100000111₂ --> more than 8 bits long
    uint8_t num_bits_to_write = 9;

    // THIS WOULD NOT WORK: bits has more than 8 bits, so it would be
    // truncated when converted to write_bits's uint8_t parameter.
    // write_bits(&writer, bits, num_bits_to_write);

    // THIS LINE WORKS because we write the encoding bit-by-bit.
    for(int i = 0; i < num_bits_to_write; ++i){
        write_bits(&writer, bits >> (num_bits_to_write - i - 1), 1); // write the encoded bits one at a time
    }

    close_bit_writer(&writer);
    return 0;
}

Implementing write_coding_table

The coding table is a file that encodes the structure of your Huffman tree in a text file. It is an important utility for the decompression algorithm, as it allows you to recover the structure of the Huffman tree without needing the original uncompressed text. In this step, we will write the encoded Huffman tree to a file called coding_table.bits.

To write the coding table, you do a post-order traversal of your Huffman tree.

  1. Traverse the left subtree of the root (i.e., encode it to the file).
  2. Traverse the right subtree of the root (i.e., encode it to the file).
  3. Visit the root.

Every time you “visit” a node (including the root of a subtree):

  • If it is a leaf (i.e., character), you write one bit: 1. Then, you write the entire character (8 bits). Example: If the character is A, you will write 0b101000001. The 1 is to signify that it is a leaf. The 0b01000001 is to specify the character itself.
  • If it is a non-leaf (i.e., an internal node), you write one bit: 0.

To write out the bits for a character, you can pass a char value directly to write_bits. For example, use write_bits(my_writer, 'A', 8) to write out the binary encoding of the character A.

Your code will write the bits for the coding table using BitWriter. To make the coding table more explicit, consider the following Huffman tree for go go gophers again:

huffman tree

If we provide this tree as an input to write_coding_table, the coding table representation should look like 1g1o01s1 01e1h01p1r00000, and in complete binary (as formatted by xxd), it would be represented as:

00000000: 10110011 11011011 11010111 00111001 00000010 11001011
00000006: 01101000 01011100 00101110 01000000

Notice that the first bit is a 1, indicating a leaf, followed by the byte 01100111, which represents the character g in ASCII. Write the bits of the coding table to the file only. Do not write anything before or after the encoding of the Huffman tree.

Before we move on, here’s another reminder that the Huffman tree you build in make_huffman_tree can take on various forms depending on how you tiebreak the non-leaf nodes; there is no single “correct” Huffman tree for the purpose of this assignment. This means your binary representation generated by the compression driver below for go go gophers might not match the example above; in fact, in our implementation we got:

00000000: 10111001 11011001 01101101 00000101 10011101 01101111  ..m..o
00000006: 10111000 01011100 10010010 00000000                    .\..

So if your coding table for the gophers example does not match the examples in these instructions, there is no need to fret. Just make sure to verify that your coding table matches your own Huffman tree, and run some tests.

You can verify the functionality of your write_coding_table by running the compression driver:

$ rv make 
$ rv qemu compress tests/ex.txt
$ xxd -b -g 1 coding_table.bits

Running the compress binary will produce two files: coding_table.bits and compressed.bits. You can inspect each of these files to verify the correctness of the write_coding_table and write_compressed functions, respectively.

Implementing write_compressed

In this step, we will write the compressed data to compressed.bits. The argument a_writer to the function points to a BitWriter that has compressed.bits open for writing. To write the compressed data, you will need to traverse your Huffman tree to recover the encodings, and then use the encodings to write the compressed data. How you accomplish this is largely up to you—there are many valid approaches here. Just make sure that there are no memory leaks and that your compressed data file actually represents the Huffman encodings. Again, write the bits of the compressed data only—do not write any bits before or after the compressed bits.
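If you choose to recover the per-character encodings up front, one possible approach (among many) is a recursive walk that records the path taken to reach each leaf. The sketch below reuses the hypothetical TreeNode layout from the coding-table section and assumes left edges are 0 and right edges are 1; adjust it to match your own structures and conventions.

// Record each leaf character's code bits and code length, indexed by character.
void build_codes(TreeNode* node, uint32_t path, uint8_t depth,
                 uint32_t code_bits[256], uint8_t code_lengths[256]) {
    if (node == NULL) {
        return;
    }
    if (node->left == NULL && node->right == NULL) {
        unsigned char c = (unsigned char)node->character;
        code_bits[c] = path;       // the bits of this character's code
        code_lengths[c] = depth;   // how many bits the code uses
        return;
    }
    build_codes(node->left,  (path << 1) | 0, depth + 1, code_bits, code_lengths);
    build_codes(node->right, (path << 1) | 1, depth + 1, code_bits, code_lengths);
}

With a table like this in hand, you can write each character of the original text by emitting its code_lengths[c] bits with write_bits, for example one bit at a time as in the bit-by-bit loop shown earlier.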

When you go to inspect the file, you may notice that there are an additional four bytes written to compressed.bits before the compressed data itself. These bytes represent the size of the original uncompressed text in bytes. Integers are typically four bytes, so we use four bytes to write this information to the file. This is written for you by the compression driver (do not write this yourself). The reason it’s there is for decompression—the decompression program needs to know how big the original text file was to recover the uncompressed text.

Using the go go gophers example, the compressed data should look something like (where there are four additional bytes at the beginning):

00000000: 00001101 00000000 00000000 00000000 01101110 11011101  ....n.
00000006: 10110000 11001011 01000000                             ..@

Notice that if you use the command ls -l, you can see the sizes of your files in the directory in bytes. The original file was 13 bytes but the compressed file is 9 bytes—our compression was successful!

Running and Testing

To make it easier to compile and run your code, we’ve provided a Makefile. To build your program, simply type rv make. rv make will build two executables: a compression program and a decompression program. To run the compression program, type:

rv qemu compress <filename>

This will produce two output files: compressed.bits and coding_table.bits. If you run the compression program on another input file, the two output files will be overwritten with the new results.

To run the decompression program, type:

rv qemu decompress compressed.bits coding_table.bits <uncompressed_filename>

This produces a file called <uncompressed_filename>. To see if your compression was successful, you can try comparing the result of the decompression to the original unencoded file by running:

diff <original_file> <uncompressed_file> 

For example, if you were trying this on the cornell.txt file in the tests directory, you’d run:

$ rv qemu compress tests/cornell.txt
$ rv qemu decompress compressed.bits coding_table.bits uncompressed_cornell.txt
$ diff tests/cornell.txt uncompressed_cornell.txt

If you see nothing when running this, that means the files are identical and decompressing your compressed file was successful. Good work!

Note that the decompression tool is based on your implementation of the coding table and the Huffman tree. In other words, you might be able to decompress your file correctly, but that does not mean your Huffman tree is correct.

Round-trip compression and decompression is necessary for the correctness of the entire system, but it is not sufficient to guarantee that all of the functions from Task 1 and Task 2 are correct. You are strongly encouraged to use cu_unit.h (described in Task 0) to more thoroughly test your code for Task 0 and Task 1. You can add tests directly to test_priority_queue.c and test_huffman.c. You are not required to submit these files, but we strongly encourage you to test each task separately, as that is how your code will be graded.

To build the test executables, you can run:

$ rv make pqtest
$ rv make hufftest

which will generate test_priority_queue and test_huffman, respectively.

Submission

Submit huffman.c and priority_queue.c to Gradescope. Upon submission, we will provide a smoke test to ensure your code compiles and passes the public test cases. The public test cases will only test for round-trip compression and decompression, and not intermediate functions.

Rubric

  • Task 0: 30 points
  • Task 1: 30 points
  • Task 2: 40 points

Code that contains memory leaks will be subject to a flat 5-point deduction.

Lab 4: Address Sanitizer & GDB

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

In this lab we will introduce two tools for debugging C code: AddressSanitizer (ASan) and the GNU Debugger (GDB). ASan is useful for catching many common memory bugs. GDB allows you to step through your code one line at a time, with the ability to see the values of variables along the way.

In this lab, you are given two programs, sel_sort.c and meal_count.c, each one containing multiple bugs. Your job is to find these bugs, using the capabilities of GDB and ASan.

Important

To get credit for this lab, you must follow along and complete the Gradescope Lab 4 assignment.

ASIDE: Working with Docker + QEMU + GDB

As with other assignments in this course, you should carry out all of your work within the Docker container that is distributed as part of the course infrastructure. The combination of Docker, QEMU, and GDB appears in several real-world applications (for example, kernel debugging), so beyond the standardization it offers for our class assignments, being able to use GDB in this way will turn out to be a useful skill for you.

However, the combination of these three adds some additional complexity to the use of GDB:

  • Because it needs to work at the level of the target machine’s ISA (i.e., RISC-V), you can’t just run a compiled program directly with GDB. Instead, you will need to use GDB’s remote-connection facility.

  • The remote-connection facility requires that you have two open terminal windows: one for the executable being run under QEMU and the other for GDB to connect to that process. Unfortunately, the fact that we are running QEMU in a Docker container adds even more complication:

    • Because you are running everything in a Docker container, you need to make sure that both terminal windows are invoking the exact same container instance.

Adding Debugging Support To The CS3410 Container

The CS3410 course infrastructure document suggests that you define an alias (or, on Windows, an equivalent PowerShell function):

alias rv='docker run -i --init -e NETID=<YOUR_NET_ID> --rm -v "$PWD":/root ghcr.io/sampsyo/cs3410-infra'

where <YOUR_NET_ID> should be replaced with your actual Cornell NetID.

We’ll use this as the basis for an invocation that adds two additional pieces of functionality, control of the container image’s name and support for core dumps in the current working directory:

alias rv-debug='docker run -it --rm --init -e NETID=<YOUR_NET_ID> --name testing --ulimit core=-1 --mount type=bind,source="$PWD"/,target="$PWD"/ -v "$PWD":/root ghcr.io/sampsyo/cs3410-infra'

To make the alias stick around when you open a new terminal shell, you will need to add it to your shell's configuration file. You can do this by pasting the alias at the end of the configuration file yourself, or by running the commands below in your terminal, substituting the configuration file appropriate for your shell.

echo "alias rv='docker run -i --init -e NETID=<YOUR_NET_ID> --rm -v "$PWD":/root ghcr.io/sampsyo/cs3410-infra'" >> ~/.bashrc
echo "alias rv-debug='docker run -it --rm --init -e NETID=<YOUR_NET_ID> --name testing --ulimit core=-1 --mount type=bind,source=\"\$PWD\"/,target=\"\$PWD\"/ -v \"\$PWD\":/root ghcr.io/sampsyo/cs3410-infra'" >> ~/.bashrc

As before, you don’t really need to understand the details of Docker to use this in your work, but for the curious:

  • --name testing gives the running container the name “testing”; you can choose any other name, so long as it begins with an upper- or lowercase letter. This is useful for situations in which you need to run multiple terminal windows with access to the same container, as you will in the next section of this assignment.

  • --ulimit core=-1 --mount <etc.> enables support for core dumps, which are created when a program crashes. The specific form used here ensures that a core file is always created in the current working directory.

Like rv, you can run rv-debug with zero, one, or more arguments. With zero arguments, you’ll get a bash prompt in the Docker container itself. Any arguments that are supplied are considered to be an execution of an application within the container itself.

As before, there is a similar PowerShell function that you can define if you’re working on a Windows system:

Function rv_debug {
   if (($args.Count) -eq 0) {
      docker run -i --init --rm -e NETID=<YOUR_NET_ID> --name testing --ulimit core=-1 --mount type=bind,source="$PWD"/,target="$PWD"/ -v ${PWD}:/root ghcr.io/sampsyo/cs3410-infra
   }
   else {
      $app_args=""
      foreach ($a in $args[1..($args.count-1)]) {
         $app_args = $app_args + $a + " "
      }
      $app_args = $app_args.Substring(0,$app_args.Length-1);
      docker run -i --init --rm -e NETID=<YOUR_NET_ID> --name testing --ulimit core=-1 --mount type=bind,source="$PWD"/,target="$PWD"/ -v ${PWD}:/root ghcr.io/sampsyo/cs3410-infra $args[0] $app_args
   }
}

Try adding this to the file in which you have already defined your rv PowerShell function. As with the Linux/macOS version, you should be able to run rv_debug just like rv, with or without additional arguments.

See the course infrastructure document for details on making this and the rv alias a permanent part of your working environment.

Getting Started

To get started, obtain the release code by cloning your assignment repository from GitHub:

$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_gdb.git

Replace <NETID> with your NetID. All the letters in your NetID should be in lowercase.

Part 1: Memory Bugs in sel_sort.c

Now that you have the debugging aliases set up, compile sel_sort.c using the command below:

$ rv gcc -g -std=c23 -Wall -Werror sel_sort.c -o sel_sort

And run your code:

$ rv bash # Enter the interactive rv bash shell
# qemu sel_sort
Segmentation fault (core dumped)
# Your code may also hang, in that case press ^C three times in a row to exit.

Tip

Seeing the words “Segmentation fault,” “double free,” code freezing, or print statements not printing should immediately tell you to add AddressSanitizer to your code. In later assignments, approximately half of the bugs you encounter can be solved using ASan, so use it!

Now add -fsanitize=address,undefined to the compile command, like so:

$ rv gcc -g -std=c23 -Wall -fsanitize=address,undefined -Werror sel_sort.c -o sel_sort

Running your code using qemu should give you something similar to this output:

$ rv qemu sel_sort
sel_sort.c:28:10: runtime error: load of misaligned address 0x000000000001 for type 'long int', which requires 8 byte alignment
0x000000000001: note: pointer points here
<memory cannot be printed>
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x000000010eec bp 0x001555d569d0 sp 0x001555d56990 T0)
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
    #0 0x10eee in swap /root/sel_sort.c:28
    #1 0x11182 in selection_sort /root/sel_sort.c:40
    #2 0x11582 in main /root/sel_sort.c:69
    #3 0x1556ace922 in __libc_start_call_main (/lib/libc.so.6+0x2b922)
    #4 0x1556acea0e in __libc_start_main@GLIBC_2.27 (/lib/libc.so.6+0x2ba0e)
    #5 0x10bda in _start (/root/sel_sort+0x10bda)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /root/sel_sort.c:28 in swap
==1==ABORTING

The important line to focus on here is:

#0 0x10eee in swap /root/sel_sort.c:28

It tells us that line 28 in sel_sort.c caused the segmentation fault. Can you figure out what is wrong on line 28? ASan output can be confusing at times; if you are struggling, do not be afraid to ask course staff for help.

Hint

There are two memory-related bugs in sel_sort.c; repeat the procedure above to fix both.

After fixing both bugs, you might notice that your code does not print the correct output. Unfortunately, ASan cannot help find logic bugs in your code. For those, GDB is needed.

Part 2: Logic Bugs in Selection Sort

Introduction

The file sel_sort.c contains an implementation of the selection sort algorithm, with a main procedure that tests it on two different arrays. A version that passes its tests will display each array in ascending order. Sadly, it does not pass. In fact, trying to run it results in an unsorted array:

# qemu sel_sort
Test array #1:
[an unsorted array]

Test array #2:
[another unsorted array]

First, let's get GDB set up for sel_sort.c.

Building Source Files for Debugging

In order to debug a program with GDB, you must first compile its source code with debugging symbols that allow GDB to inspect the resulting executable and display information such as program execution and variable values in terms of the original C code. To do this, compile the source file with the additional -g flag. This flag will add debugging symbols to the executable that will allow GDB to debug much more effectively.

Unlike previous assignments, we will often recommend here that you execute commands within a running CS3410 container, instead of using rv (or rv-debug/rv_debug) to run each command as a standalone process. To do this, simply type rv or rv-debug without any additional arguments. This will give you a shell prompt in the container itself, in which you can explore GDB and other utilities. For example, you can compile sel_sort.c for debugging with GDB either like this:

$ rv-debug gcc -g -std=c23 -Wall -Werror sel_sort.c -o sel_sort

or like this:

$ rv-debug
root@738c193ce5cb:~# gcc -g -std=c23 -Wall -Werror sel_sort.c -o sel_sort

Note

To help make clear when you’re running a command in your computer’s native terminal window versus the terminal window in the CS3410 container, we’re including the prompts for each one in the commands you’ll type below. Those that begin with $ are prompts in your native terminal app, while prompts that look like “root@738c193ce5cb:~#” are in the container terminal shell. The 738c193ce5cb component of the prompt is the ID of the running container, so this value will likely vary between runs.

Using GDB’s Remote Debugging

To use GDB in the Docker+QEMU environment, you will need to run your application and GDB as separate processes that communicate on the same port number. Assuming you have already compiled the sel_sort.c code, here are the basic steps:

  1. Open a second window in your terminal app; ideally, this will be a split view window. The details vary, but most terminal applications have this capability.

  2. In one window, start a shell prompt in the CS3410 container (rv-debug), and type the following:

    $ rv-debug
    root@fc4d619a76a4:~# qemu -g 1234 sel_sort # The fc4d619a76a4 value will vary from run to run
    

    This will appear to hang, which is what you want. QEMU has loaded the application and is waiting for GDB to connect before it starts executing.

  3. In the other terminal window, type the following, using the container ID shown in the prompt from the previous step:

    $ docker exec -it fc4d619a76a4 bash
    root@fc4d619a76a4:~# gdb  -ex 'target remote localhost:1234' -ex 'set sysroot /opt/riscv/sysroot' -ex 'file /root/sel_sort' -ex 'set can-use-hw-watchpoints 0' sel_sort
    

    You should see several lines of output, ending in a warning about changing the file. Answer “y” to both prompts, and you’ll get the GDB prompt, (gdb):

    gdb remote

    • The fc4d619a76a4 value in the docker exec command is the ID of the Docker container where exec will run its command. This ID needs to match the ID of the container you started in Step 2. Since we defined the rv-debug shortcut to include an explicit container name of our choice (“--name testing”), you can avoid having to copy/paste the container ID every time by typing instead:

      docker exec -it `docker ps -f name=testing -q` bash
      
    • If you were using GDB on a compiled program that was running on native rather than emulated hardware, you could just invoke GDB like this:

      gdb sel_sort
      

      If you try that with the RISCV-64 executable you just compiled, it will load GDB and give you the GDB prompt, but you won’t be able to actually run the program.

GDB Basics

Once you are in GDB, there are several commands you can use to help narrow down problems. We briefly introduce some of them below. With the exception of run, all of these commands work the same way whether you’re using GDB in our CS3410 container or natively.

Run

In the remote debugging you’ll use for this assignment and others in the class, you won’t ever use this command (the qemu -g 1234 <etc.> invocation is already running the program you’re debugging). In other settings, however, run is a fundamental part of the basic GDB toolbox. The command runs your program until a breakpoint or crash is encountered. If you are not using GDB remotely, run is the command you would type to begin execution of your program. You can also pause your program by pressing Control-C (useful for finding infinite loops). When execution stops, you will be able to inspect the state of your program with any of the commands below.

Breakpoints, next, step, continue, finish

If we want to stop and see what is going on at a particular point in our program, we can use breakpoints. To do this in GDB, type break, followed by the line number in the source file where you want to stop. For example, break 64 will set a breakpoint at the beginning of main in sel_sort.c (i.e., on line 64). If you want to set a breakpoint at the entry to a procedure, without reference to a line number, you can type break <procedure name> instead.

If the program is already running but paused, continue will resume execution. It will stop at the next breakpoint if there is one, and otherwise run to the end. If you only want to run to the end of the current procedure, you can use the finish command instead.

After the program stops at a breakpoint, you can use either next or step to execute the program line by line.

Note

(The difference between them is that next will skip over execution of the body of a called procedure and just go to the instruction after the procedure returns, while step will pause at the first instruction of the procedure body.)

(gdb) break main
Breakpoint 1 at 0x10860: file sel_sort.c, line 60.
(gdb) continue
Continuing.
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?

Breakpoint 1, main (argc=1, argv=0x1555d56d18) at sel_sort.c:60
60          long test_array[5] = {1,4,2,0,3};
(gdb) continue
Continuing.
[Inferior 1 (process 9) exited normally]

Note

If the program you are debugging closes or crashes, you will need to restart the remote debugging process: exit GDB, restart your program with QEMU waiting on GDB, then re-launch GDB in the other terminal window.

Disable/delete breakpoints

Use the delete <N> command to delete breakpoint N, or disable <N> if you only want to disable it. Its reverse, enable <N>, re-enables breakpoint N. Typing either delete or disable with no arguments will delete/disable all breakpoints at once.

Backtrace

When GDB reaches an error or a breakpoint, it will only tell you the line of code where it occurred. To see the whole backtrace (the full set of stack frames at that moment), type backtrace. Use this to find which function called the current function, and so on up the stack. Each sel_sort.c:<line number> entry tells you the file and line of the instruction that was running in that frame when the breakpoint was triggered.

(gdb) break swap
Breakpoint 1 at 0x106b8: file sel_sort.c, line 28.
(gdb) continue
Continuing.
Breakpoint 1, swap (a=0x1555d56b58, b=0x1555d56b70) at sel_sort.c:28
28          long tmp = *a;
(gdb) backtrace
#0  swap (a=0x1555d56b58, b=0x1555d56b70) at sel_sort.c:28
#1  0x000000000001077c in selection_sort (arr=0x1555d56b58, len=5) at sel_sort.c:40
#2  0x00000000000108c4 in main (argc=1, argv=0x1555d56d18) at sel_sort.c:69

This gives the state of the call stack and program execution point at the moment that the breakpoint was triggered. This output tells us that the last instruction to run was line 28 of a call to swap, which itself was called on line 40 of selection_sort, and so on.

Print

While having this much information about the call stack is helpful, we will often want a more detailed view of what’s going on in the program. We can see the value of any variable that is in scope in the current stack frame by using the commands print and display. These commands print the value of any expression that is semantically valid at the current line of execution; in particular, they are useful for seeing the current values of declared variables. The difference between them is that display will show the value of its expression argument after every instruction step, while print displays it just once.

Breakpoint 1, selection_sort (arr=0x1555d56b58, len=5) at sel_sort.c:38
38          for (int i = 0; i < len; i++)
(gdb) print (i < len)
$1 = 1
(gdb) print a
No symbol "a" in current context.
(gdb) display i
1: i = 0
(gdb) step
39              int swap_idx = smallest_idx(&arr[i], len - i);
1: i = 0
(gdb) display (i < len)
2: (i < len) = 1
(gdb) s
smallest_idx (arr=0x1555d56b58, len=5) at sel_sort.c:10
10          int smallest_i = 0;

Notice how the displays of both i and (i < len) cease when execution steps into the body of smallest_idx. Once smallest_idx returns, the display of these expressions will resume. You can cancel an ongoing display with undisplay.

(gdb) finish
Run till exit from #0  smallest_idx (arr=0x1555d56b58, len=5) at sel_sort.c:13
0x0000000000010748 in selection_sort (arr=0x1555d56b58, len=5) at sel_sort.c:39
39              int swap_idx = smallest_idx(&arr[i], len - i);
1: i = 0
2: (i < len) = 1
Value returned is $3 = 3
(gdb) undisplay 2
(gdb) s
42              swap((long *)arr[i], (long *)arr[swap_idx]);
1: i = 0
(gdb)

Finally, a related command, x, gives a more low-level version of this same feature by showing the contents of memory at a given address. See https://visualgdb.com/gdbreference/commands/x, among other resources, for a detailed explanation.

Info

The info command provides brief summaries of important program information:

  • info locals—displays the values of every local variable in the current stack frame
  • info args—displays the values of every parameter in the current stack frame
  • info stack—displays the current call stack
  • info break—displays all currently-defined breakpoints, whether they are enabled or not.

Some Advanced GDB Features: Watchpoints And Conditional Breakpoints

Watchpoints

Watchpoints break the program execution whenever the value of an expression changes, and the value changes will be displayed. To set a new watchpoint, you need to invoke watch with either an expression or a raw memory address. If you watch an expression, it must be semantically valid for the current execution point (i.e. all variables in scope, etc.); the watchpoint will be deleted when execution leaves the block in which the expression is meaningful. To watch the contents of a memory address regardless of the program’s block structure, use the -location (or -l) flag. For example, you could set a watchpoint on index 0 of the array test_array.

Breakpoint 1, main (argc=1, argv=0x1555d56d18) at buggy_sel_sort.c:64
64          long test_array[5] = {1,4,2,0,3};
(gdb) watch test_array[0]
Watchpoint 2: test_array[0]
(gdb) watch -location test_array[0]
Watchpoint 3: -location test_array[0]
(gdb) continue
Continuing.

Watchpoint 2: test_array[0]

Old value = 0
New value = 1

Watchpoint 3: -location test_array[0]

Old value = 0
New value = 1
0x000000000001088c in main (argc=1, argv=0x1555d56d18) at buggy_sel_sort.c:64
64          long test_array[5] = {1,4,2,0,3};
(gdb) continue
Continuing.

Watchpoint 2 deleted because the program has left the block in
which its expression is valid.
(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x000000000001086c in main at buggy_sel_sort.c:64
        breakpoint already hit 1 time
3       watchpoint     keep y                      -location test_array[0]
        breakpoint already hit 1 time
4       breakpoint     keep y   0x0000000000010710 in selection_sort at buggy_sel_sort.c:38

The command info break will show watchpoints as well as breakpoints. To disable a watchpoint, type disable <watchpoint_num>.

Conditional Breakpoints

Conditional breakpoints let you break execution on a line of code only when an expression evaluates to true. To set a new conditional breakpoint, type break <line> if <condition>. For example, to break at line 17 when smallest is not equal to arr[0], you can type break 17 if smallest != arr[0]. Conditional breakpoints allow you to debug specific scenarios and limit the output you would otherwise have to sift through when debugging without specific conditions.

(gdb) break 17 if smallest != arr[0]
Breakpoint 1 at 0x1065c: file buggy_sel_sort.c, line 17.
(gdb) continue
Continuing.
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?

Breakpoint 1, smallest_idx (arr=0x1555d56b58, len=5) at buggy_sel_sort.c:17
17                  smallest_i = i;
(gdb) print smallest_idx
$1 = {int (long *, int)} 0x105f0 <smallest_idx>

Fix the Sorting

Now, use GDB to see what is causing your selection sort to fail.

Hint

What does smallest_idx do?

Part 3: meal_count Problems (Optional)

Now let’s do the slightly harder but more interesting challenge: meal_count! This one requires you to use two new features of GDB: conditional breakpoints and watchpoints.

Wrong Orders!

A new bagel store called Computer System Bagel (CSB) has just opened. Unlike CTB, where you can buy a bagel and a coffee separately, CSB sells them as a meal: you must buy one bagel plus one coffee! On CSB’s menu there are three types of bagel: MIPS (#0), ARM (#1), and x86 (#2) (sorry, no RISC-V); and three types of coffee: HDL (#0), C (#1), and assembly (#2). The meal_count program is used by CSB to track which bagel and which coffee sell best. When you run the program, it produces output like the following:

2022-10-08, Saturday      # Date
Bagel count: 510 488 2    # MIPS bagel was sold 510 times, ARM bagel 488 times, and x86 bagel 2 times.
Coffee count: 504 494 2   # HDL coffee was sold 504 times, C coffee 494 times, and assembly 2 times.

The manager thinks something is wrong with the output, because neither the x86 bagel nor the assembly coffee is sold on Saturdays (yes, they’re too complicated to make).

Your job is to debug meal_count.c. Fortunately, there are no bugs in the program logic (let us know if you find one, though …). But there are issues with the order history, like a wrong item number. The order history is stored in struct Order order_history[NUM_ORDER]. The format is {<BAGEL_NUMBER>, <COFFEE_NUMBER>}. For example, {0, 1} means one customer ordered a MIPS bagel and a C coffee.

There are two wrong orders in the order history. Please try to identify the indices (starting from 0) of these two wrong orders. For example, if the order history is {{0, 0}, {2, 1}}, then the order with index 1 is invalid, since #2 (the x86 bagel) is not sold on Saturdays. Let your TA know the indices when you find them!

Questions

  1. What are the wrong indices?
  2. Where are they in the source code?
  3. What GDB commands did you use to find them?

Hints:

  • In gdb, you can use p order_idx to print the order index.
  • You can easily find one wrong order using a conditional breakpoint.
  • You may need a watchpoint to find the other one.

Invariants And Assertions

We hope you found the wrong indices! But the reality is that sometimes you don’t even know that your program is misbehaving. For example, if your order history is {{4, 0}, {0, -2}}, the meal_count program generates a totally reasonable-looking report:

2022-10-08, Saturday
Bagel count: 1 1 0
Coffee count: 1 1 0

The report looks good, but it really isn’t, since the 4 and -2 in the order history are invalid. One thing that can help is to think about the invariants of programs and use assertions to detect any unexpected behaviors.

An assertion is a simple expression that will raise an error when its condition doesn’t hold during execution. In C, we write these as ordinary statements of the form “assert(<condition>);”, where <condition> is any boolean-valued expression. For example, you can write this in C:

#include <assert.h>
struct Queue {
  // Assume we have a Queue specification saying that when a Queue is created, it must be empty.
  //  ...
};
int isEmpty(struct Queue q) {
    return ... ;  // return 0 if not empty
}
int main() {
    struct Queue q;
    assert(isEmpty(q) != 0);  // This asserts that q must be empty.
}

Using assertions is a good way to reason about whether your program is implemented as the specification says. For example, an ill-implemented Queue may not be empty when it’s created. This violates the specification and can be easily caught by the “assert(isEmpty(q) != 0)”.

Some useful cases, among many others, include (1) checking whether an int that is expected to be positive actually is, and (2) checking whether an index used to access an array is out of bounds. Also, for our CSB bagel case, both the bagel and the coffee number must be 0, 1, or 2 (and on Saturdays, either 0 or 1).
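Here is a tiny illustration of those first two cases. The variable names (count, idx, len) are placeholders for this sketch and do not come from meal_count.c.

#include <assert.h>

void check_values(int count, int idx, int len) {
    assert(count > 0);              // (1) a value that is expected to be positive
    assert(idx >= 0 && idx < len);  // (2) an array index that must stay in bounds
}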

Now, try to add assertions in the meal_count program, and see whether it can catch the invalid order history.

Questions

  1. What is the invariant that fails here?
  2. What causes the failure?
  3. What assertion(s) did you add to detect the failure, and where did you put them?

A5: CPU Simulation

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Submission Requirements

For this assignment, there are two files to submit to Gradescope:

  • logic.c with the listed functions all implemented.
  • tests.txt with all of your test cases in the correct format.

Restrictions

  • You may not use any additional include directives.
  • Please only change logic.c. Changes to other files will not be reflected in your submission.

Provided Files

The following files are provided in the release:

  • logic.c, which includes the five functions you will implement in this assignment.
  • runner.c, which handles I/O and the structure of the simulator.
  • hash_table.c, an implementation of a simple hash table.
  • hash_table.h, which is the above’s associated header file.
  • sol.h, which includes the signatures of the functions in logic.c and hash_table.c, as well as useful define macros and variable declarations.
  • Makefile, which will appropriately compile and link the above to produce an executable, runner.
  • check.s, a simple assembly program to be used as a smoke test.
  • check.bin, the input to the program, which is the result of assembling check.s.

The only file among these that you will modify is logic.c.

Getting Started

To get started, obtain the release code by cloning your assignment repository from GitHub:

$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_cpusim.git

Replace <NETID> with your NetID. All the letters in your NetID should be in lowercase.

Overview

In this assignment, you will implement a subset of the RISC-V 64 instruction set. In order to gain a better understanding of control logic, processor architecture, and how assembly language functions, you will simulate the steps—Fetch, Decode, eXecute, Memory, Writeback—of a simple single-cycle processor.

The program takes assembled RISC-V machine code on standard input. We handle the I/O and break the instructions down into an array of uint32_t values, named instructions. instructions[0] holds the 32-bit encoding of the first instruction and, in general, instructions[PC / 4] holds the 32-bit encoding of the (PC / 4 + 1)-th instruction (i.e., the instruction at address PC in the input file). The instruction encodings follow the standard specified in the RISC-V ISA manual.

After the instructions are read in, the program repeatedly calls the functions fetch(), decode(), execute(), memory(), and writeback(), in that order, for as long as the program counter (divided by 4) is less than the static instruction count.

Each of these 5 functions passes information to the next stage. fetch() will pass the current instruction to decode(), which will pass relevant information to execute(), which will pass other information to memory(), which will pass more information to writeback(), which will update the registers and the program counter. The relevant information is stored in a struct called info, which has 4 integers. It is up to you to decide exactly what information to store in the info struct, and not every stage will need all the bits.

The info struct

The info struct is meant as a container for arbitrary bits. There is no single correct way to use its fields to represent the relevant state. You will use the info struct in entirely different ways for each of the four stage → stage communication steps.

The 32 general-purpose registers are simulated as an array of 32 uint64_ts. The starter code initializes all of these to 0.

Memory is simulated as a hash table, data, that maps from uint64_t to uint64_t. The keys are addresses, and the values are the data stored in memory. We suggest mapping an address to one byte of data, but an alternative such as mapping addresses to four or eight bytes is also acceptable.

An implementation of a hash table is provided in hash_table.c and hash_table.h. All key (address) → value (data) mappings are effectively initialized to 0, as the ht_get() function returns 0 when the key is not found.

Use little-endian!

Use the little-endian byte order for your simulated memory. For example, when storing an 8-byte value to address a, store the least-significant byte at a and the most-significant byte at address a+7.
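As a concrete illustration of this convention, here is a small, self-contained sketch of a little-endian store and load. It uses a plain array as a stand-in for the simulated memory; in the assignment, each byte would instead go into the provided hash table.

#include <stdint.h>
#include <stdio.h>

// A plain array standing in for simulated memory (one byte per address).
static uint8_t fake_memory[64];

// Store an 8-byte value at address a: least-significant byte at a,
// most-significant byte at a + 7.
void store_doubleword(uint64_t a, uint64_t value) {
    for (int i = 0; i < 8; i++) {
        fake_memory[a + i] = (uint8_t)(value >> (8 * i));
    }
}

// Load an 8-byte value from address a, reassembling the bytes in the
// same little-endian order.
uint64_t load_doubleword(uint64_t a) {
    uint64_t value = 0;
    for (int i = 0; i < 8; i++) {
        value |= (uint64_t)fake_memory[a + i] << (8 * i);
    }
    return value;
}

int main(void) {
    store_doubleword(0, 0x0123456789abcdefULL);
    printf("byte at address 0: 0x%02x\n", fake_memory[0]); // 0xef, the least-significant byte
    printf("byte at address 7: 0x%02x\n", fake_memory[7]); // 0x01, the most-significant byte
    printf("reloaded value: 0x%llx\n", (unsigned long long)load_doubleword(0));
    return 0;
}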

Assignment Outline

  1. Work out a high-level plan and implement addi, andi, ori, and xori detailed in Task 0.
  2. Implement the rest of the instruction subset, detailed in Task 1.
  3. Create a thorough test suite that you will submit, specified in Task 2.

Implementation

(Lab) Task 0: Getting Started in Lab

View the lab slides here.

Task 0.0: Design Plan

As stated in the overview, one of the goals of the assignment is to familiarize yourself with the important steps in a simple five-stage processor. The figure below may be used as reference.

Processor diagram

The five stages of the processor that you simulate are:

  1. Fetch an instruction from main memory. The PC (Program Counter) holds the address of the instruction to fetch.

  2. Decode the instruction into relevant parts that the processor understands, and read the requested register(s). Things to consider: What info is important to extract from an instruction? How should we generate the correct immediate value from the bits in the instruction? How do we single out bits that differentiate instructions—what makes lw different from sw or from sb?

  3. Execute the instruction to determine its result value.

  4. Access memory (here simulated as a hash table) to load/store data, if necessary. Things to consider: How should the stage differentiate bytes vs. words vs. double words? When should this stage sign-extend or zero-extend values when loading and storing?

  5. Write back a new value for the PC, which should—except in the case of a branch—increment by 4 after every cycle, since each instruction is expressed with 4 bytes. Also, write back a newly computed value to the register file, if necessary. Things to consider: When should we write to the register file at all? What should we increment the PC by?

Create a high-level plan for what each function should do and what information it should pass to the next stage. For example, the Memory stage is the only one that accesses memory, and the Decode stage will be the only one that deals with bit-level slicing of the actual instruction word.

Warning

While it would certainly be possible to simulate everything in one function, implementations that are not faithful to the purpose of each stage will incur penalties.

Task 0.1: Implementing four I-type Instructions

Now that you have a plan, let’s walk through four instructions.

  • addi rd, rs1, imm is implemented as:
    Registers[rd] = Registers[rs1] + Sign-extend(imm)

  • andi rd, rs1, imm is implemented as:
    Registers[rd] = Registers[rs1] & Sign-extend(imm)

  • ori rd, rs1, imm is implemented as:
    Registers[rd] = Registers[rs1] | Sign-extend(imm)

  • xori rd, rs1, imm is implemented as:
    Registers[rd] = Registers[rs1] ^ Sign-extend(imm)

Consult the RISC-V reference card to see the encodings for these instructions. Since all of these are I-type instructions, they share the same encoding structure:

Bits:    31–20       19–15   14–12   11–7   6–0
Field:   imm[11:0]   rs1     funct3  rd     opcode

The reference also tells us the values of the opcode and funct3 fields:

Instruction   opcode    funct3
addi          0010011   000
andi          0010011   111
ori           0010011   110
xori          0010011   100

The fetch stage will get the instruction at index PC / 4. Then, for addi, andi, ori, and xori instructions, the argument to the decode stage will be a uint32_t whose binary representation takes one of the following forms:

0b[XXXXXXXXXXXX][XXXXX][000][XXXXX][0010011]  // addi
0b[XXXXXXXXXXXX][XXXXX][111][XXXXX][0010011]  // andi
0b[XXXXXXXXXXXX][XXXXX][110][XXXXX][0010011]  // ori
0b[XXXXXXXXXXXX][XXXXX][100][XXXXX][0010011]  // xori

Using bitwise operators, differentiate between these four instructions and extract the relevant pieces of information to send to the execute stage.
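For instance, here is a minimal sketch of how the I-type fields could be extracted with shifts and masks. The variable names are purely illustrative, and how you package these values into the info struct (and which of them you actually need) is up to you.

#include <stdint.h>

void decode_itype_sketch(uint32_t insn) {
    uint32_t opcode = insn & 0x7f;          // bits 6-0
    uint32_t rd     = (insn >> 7) & 0x1f;   // bits 11-7
    uint32_t funct3 = (insn >> 12) & 0x7;   // bits 14-12
    uint32_t rs1    = (insn >> 15) & 0x1f;  // bits 19-15
    uint32_t uimm   = insn >> 20;           // raw imm[11:0], bits 31-20

    // Sign-extend the 12-bit immediate: if bit 11 is set, the value is negative.
    int64_t imm = (int64_t)uimm - ((uimm & 0x800) ? 0x1000 : 0);

    if (opcode == 0x13) {  // 0b0010011: the addi/andi/ori/xori family
        switch (funct3) {
            case 0x0: /* addi */ break;
            case 0x7: /* andi */ break;
            case 0x6: /* ori  */ break;
            case 0x4: /* xori */ break;
        }
    }
    (void)rd; (void)rs1; (void)imm;  // these would be passed along via the info struct
}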

Hint

Consider using one of the integers in info to communicate which instruction it is. We provide a mapping from instructions to integers via the #define macros in sol.h.

Now, in execute, we will use the operands to compute the result. Since none of these instructions should use the memory stage, think about what information the writeback stage will need, and send this to memory, which will be a no-op.

After using the memory stage to send the information from execute to writeback, consider how your writeback stage should update the state of the program to prepare it for the next instruction.

Trying It Out

To test your implementation, we can write a simple assembly program, prog.s. We will use addi and andi for now. You will later expand testing to include ori and xori in Task 2. Your assembly program can look something like this:

addi ra,zero,0x155
andi sp,ra,0x1b9

In order to obtain the binary to be used as standard input, run either of the two following equivalent commands that assemble prog.s to machine code and copy its contents as raw binary to prog.bin:

  • Option 1: asbin prog.s
  • Option 2: as prog.s -o tmp.o && objcopy tmp.o -O binary prog.bin && rm tmp.o

(Option 1 works because we have provided, in the CS 3410 container, a shorthand script asbin that just runs the commands in Option 2.)

Compile your simulator with make, producing an executable named runner. Now you can run the program with prog.bin as standard input with:

qemu runner < prog.bin

Upon successful execution of runner, the values of the 32 general purpose registers will be printed in hexadecimal.

Testing Routine

To summarize, here are the commands to run if you want to execute your simulator on an assembly program:

$ rv make
$ rv asbin your_great_test_program.s
$ rv qemu runner < your_great_test_program.bin

As always, you can use the rv alias to run commands in the official CS 3410 container.

Task 1: Simulating a RISC-V CPU

Now that you have addi and andi working, implement the remainder of the RISC-V 64 subset listed in the table:

Format    Instructions
R-type    ADD, SUB, AND, SLT, SLL, SRA
I-type    ADDI, ANDI, ORI, XORI, LD, LW, LB
S-type    SD, SW, SB
U-type    LUI
B-type    BEQ

In the official RISC-V ISA manual, these instructions are part of the RV64I Base Integer Instruction Set, a superset of RV32I (Chapters 2 and 4). A table with the encodings is in Chapter 19. You can also use the reference card.

For the purposes of testing, command line arguments of the form <register number>@<hexadecimal value> set the starting values of individual registers. For example, to set the initial value of register 5 to 0xbeefdeadbeef and the initial value of register 12 to 0xc, the command would be

qemu runner 5@0xbeefdeadbeef 12@0xc < prog.bin

In the release files, we provide a basic test, check.s, and the output of asbin check.s, check.bin. This is also the smoke test that the autograder will run upon submission.

Behavior of BEQ

The RISC-V assembler lets you write beq instructions in two different ways: with labels or with immediate addresses. Because of an assembler quirk, we recommend that you only use labels.

Here’s some more detail. The assembler will convert an instruction of the form beq rs1, rs2, z, where z is an immediate address, into a sequence of two instructions: a bne followed by a jal. This behavior allows assembly programmers to use beq as a pseudoinstruction for jumps beyond what can be reached with one actual machine beq instruction. (The addresses of the instructions are not known until linking, so the assembler does not know whether the immediate in the beq instruction is within range.) We do not expect you to implement bne or jal in this assignment, so we need to write assembly programs that avoid this “convenient” behavior.

Instead, to ensure that the assembler encodes an actual beq instruction, we can use labels with optional offsets. Write your beq instructions in one of these forms:

  • beq rs1, rs2, L1 where L1 is a label at the instruction you want to jump to.
  • beq rs1, rs2, start + imm where start is a label at the very start of the program and imm is the offset (in bytes) of the instruction you want to jump to.

The two following assembly programs, for example, are equivalent and use beq in the correct manner:

Option 1:

addi t0,zero,1
addi t1,zero,2
equal:
addi t0,t0,2
addi t1,t1,1
beq t0,t1,equal
add t2,t1,t0

Option 2:

start:
addi t0,zero,1
addi t1,zero,2
addi t0,t0,2
addi t1,t1,1
beq t0,t1,start + 8
add t2,t1,t0

The label at equal points to the same location as an offset of 8 bytes (2 instructions) from a label at start.

Task 2: Test Case Submission

Even with this reduced subset of the RISC-V 64 instruction set, there is still plenty of complicated behavior. We suggest writing many test cases to ensure the correctness of your program.

In addition to your implementation in logic.c, you will submit a test suite in tests.txt.

Each test should begin with a line for the additional command-line arguments: CMDS: <arg_0> ... <arg_n>, followed by the assembly for the test case. The last line should have the non-zero outputs in the same format as the command-line arguments: OUTS: <out_0> ... <out_n>.

For example, the following adheres to this format:

CMDS:
addi ra,zero,0x155
andi sp,ra,0x1b9
OUTS: 1@0x155 2@0x111

CMDS: 8@0xbeef 2@0xbee 9@0xef
addi x8,  x8, 9
add x1, x8, x9
add x1, x1, x2
OUTS: 1@0xcbd5 2@0xbee 9@0xef 8@0xbef8

Your tests should cover both basic and edge cases for all of the required instructions. You should have at least 15 tests.

Submission

Submit logic.c and tests.txt on Gradescope. Upon submission, we will provide a smoke test to ensure your code compiles and passes the public test cases.

Rubric

  • logic.c: 75 points
  • tests.txt: 25 points

A6: Assembly Programming

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Submission Requirements

You will submit the following files to Gradescope. From the lab:

  • lab.txt: Contains all work from the lab exercises.

From Part I:

  • arrays.s: your translation of the array-access code
  • mult.s: your translation of the multiplication function
  • prime.s: your translation of the primality test

From Part II:

  • mystery1.c: your C translation of the first mysterious function
  • mystery2.c: your C translation of the second mysterious function

Getting Started

There is no starter code for this assignment.

However, we still encourage you to use Git to keep track of your solution. An assignment repository has been created for you on GitHub:

$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_asm.git

Replace <NETID> with your NetID. All the letters in your NetID should be in lowercase.

Overview

This assignment will level up your skills as an assembly language programmer through reading and writing RISC-V assembly.

Part 0: Lab

View the lab slides here.

During lab section, we will start with some warm-up exercises to get you familiar with writing RISC-V assembly and to help you start the assignment. To familiarize yourself with the available instructions, see the RISC-V instruction set manual. As you write assembly, you will also likely find it helpful to use the 3410 RISC-V interpreter to execute and validate your code.

Submit all your answers to this part as a text file: lab.txt. This does not need to be formatted in any specific way; just make it readable to a human. We are just looking for complete answers in this part.

Writing Assembly Programs

Your task in lab is to write RISC-V assembly programs to implement several functions.

1. Arithmetic

We begin with implementing arithmetic functions. The binomial theorem lets you expand the powers of a binomial as the following sum of terms:

\[ (x + y)^n = \sum_{k=0}^{n}{n\choose k}x^{k}y^{n-k} \]

We’ll implement both the right- and the left-hand side of this equation for \(n = 4\).

Let’s consider what these programs might look like in C. The LHS would look like:

z = pow(x + y, 4);

And you could write the RHS as:

z = 1 * 1 * pow(y, 4) + 4 * x * pow(y, 3) + 6 * pow(x, 2) * pow(y, 2) + 4 * pow(x, 3) * y + 1 * pow(x, 4) * 1;

Write two RISC-V assembly programs: one that computes the value of the LHS of the equation and another that computes the RHS. Then, check that the values given by both are the same for x = 5 and y = 7.

For each program, assume that:

  • register x1 holds the value of x
  • x2 holds y
  • x3 holds z, the final value of the expression

Hint

You can use the mul instruction to implement the calls to pow in the code above. As an even better alternative, you can use shift instructions to multiply by a number that is a power of two. So when you need to multiply by a constant, see if you can instead write it as a sum of shifts.
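For instance, multiplying by 6 can be rewritten as a sum of two shifts, since 6x = 4x + 2x. Written in C purely to illustrate the identity (in your assembly this would become two slli instructions and an add):

int times_six(int x) {
    return (x << 2) + (x << 1);  // 4*x + 2*x == 6*x
}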

2. Load and Store

Consider this function in C, which swaps the values at indices 1 and 3 in an array of ints:

void swap(int* arr) {
    int temp = arr[1];
    arr[1] = arr[3];
    arr[3] = temp; 
}

Assume that the arr pointer is in register x1. (Also, don’t worry about out-of-bounds accesses: assume that we allocated enough space for the arr array). Write the RISC-V assembly code to implement this swap.

3. Conditional Control Flow

Consider this code with a simple if statement:

if (x < y)
    y = (x - y) * 2;
else
    y--;

Assume that:

  • register x16 holds x
  • x17 holds y

You may use all other registers to store temporary values if you like. Write a RISC-V assembly program to implement this code.

4. Loops

Consider this for loop in C:

for (int i = 0; i < y; i++) {
  x = x + 2;
}
return x;

Assume that x and i start at 0, and that we use these register mappings:

  • y is in register a0
  • x is in register a1
  • i is in register t0

Which of these RISC-V assembly translations are correct? For the incorrect translations, write a brief explanation of why they are incorrect.

Option 1:

for:
blt t0, a0, end
body:
addi a1, a1, 2
addi t0, t0, 1
beq x0, x0, for
end:

Option 2:

for:
beq t0, a0, end
addi a1, a1, 2
addi t0, t0, 1
beq x0, x0, for
end:

Option 3:

bge x0, a0, end
for:
bge t0, a0, end
addi a1, a1, 2
addi t0, t0, 1
beq x0, x0, for
end:

Option 4:

bge x0, a0, end
for:
bge t0, a0, end
body:
addi a1, a1, 2
addi t0, t0, 1
end:

Option 5:

ble a0, x0, end
for:
addi a1, a1, 2
addi t0, t0, 1
blt t0, a0, for
end:

5. Putting Everything Together

Finally, let’s translate the following C program that calculates the product of an array:

void product(int* arr, int size) {
  int product = 1;
  // --- START HERE ---
  for (int i = 0; i < size; i++) {
    product *= arr[i];
  }
  // --- END HERE ---
  printf("The product is %d\n", product);
}

Translate the indicated section of code—just the loop—to RISC-V assembly. Assume that:

  • x1 holds arr pointer
  • x2 holds size
  • x3 holds product, and it is already initialized to 1 (outside of your code)
  • x4 is uninitialized, but will hold i

Feel free to use any other registers as you see fit.

Reading Assembly

Next, we’ll try understanding assembly code. A good strategy for understanding assembly code is to try reverse translation: write out a C program (or a “pseudo-C program”) that corresponds to the assembly code and then try to understand that code.

6. Branches

Consider the following RISC-V assembly:

addi t0, x0, 0
addi t1, x0, 5
blt t1, x0, label
addi t0, t0, 5
label:
addi t0, t0, 6

What is the value of register t0 after running this code? To answer this question, you can try writing out the corresponding C program.

If blt were replaced by bge, what would the value of register t0 be?

7. Accessing Memory

Consider the following assembly:

addi t1, x0, 4
addi s2, x0, 7
sw s2, 8(t1)
lw s3, 12(x0)

What is the value of s3 after this code runs?

Again, it can be very helpful to first write the corresponding pseudo-C code. Here’s one way to do that:

int* t1 = 4;
int s2 = 7;
*(t1 + 2) = s2;
int s3 = *(3 + ((int*)0));

Why are the constants in those last two lines 2 and 3? You may want to refresh your memory about the rules of pointer arithmetic in C.

8. Loop to C

Let’s translate this assembly code back to C:

addi t0, x0, 7
addi t1, x0, 0
loop: 
bge x0, t0, end
addi t0, t0, -1
add t1, t1, t0
beq x0, x0, loop
end:

Assume that the value of variable x is held in register t0 and y is held in register t1. Here’s a partial translation:

int x = 7;
int y = 0;
while (A) {
  x = B;
  y = C;
}

The placeholders A, B, and C mark expressions for you to fill in. All of these should be C expressions.

Part I: From C to RISC-V

In this first part, you’ll translate three C programs to RISC-V assembly. Consider trying out your implementations using the online RISC-V simulator to check that they behave like the original C.

Array Accesses

Imagine we have variables of these types:

int x;   // x10
int y;   // x11
int* A;  // x12
int* B;  // x13

Assume that the two pointer variables, A and B, point to large arrays of ints. The code you need to translate is:

x += (x + y) * 2 - A[4];
B[3] = x;

Assume:

  • x is stored in register x10
  • y is in x11
  • the base address of array A is in register x12
  • B is in x13

Use x5 and x6 (and no more) as the temporary registers. Write your assembly code in a file named arrays.s.

Multiplication

Let’s implement the integer multiplication instruction in RISC-V using other instructions! The instruction mul rd, rs1, rs2 multiplies rs1 and rs2 and stores the result in rd. Here is an implementation in C for 64-bit integers:

unsigned long intmul(unsigned long rs1, unsigned long rs2) {
  unsigned long rd = 0;
  for (int i = 0; i < 64; i++) {
    if (rs2 & 0x1) {
      rd += rs1;
    }
    rs1 <<= 1;
    rs2 >>= 1;
  }
  return rd;
}

Translate the above code to assembly. Do not use the mul instruction. Assume:

  • the variable rs1 is stored in register a0
  • rs2 is in register a1
  • the return value rd goes in t0

Use t0, t1, and t2 for any temporary values. Please name your submission file mult.s.

Primality Test

The following function prime gives a rudimentary algorithm for checking whether a number (p) is prime:

bool prime(int p) {
  if (p < 2) {
    return false;
  }

  for (int i = 2; i < p; i++) {
    int rem = p % i;
    if (rem == 0) {
      return false;
    }
  }
  return true;
}

Translate this function to RISC-V. Submit your file as prime.s.

Please label the entry block to your assembly with .prime.

Imagine that there are two labels .ret_tru and .ret_fls that already exist; translate the return true and return false lines into jumps to these labels.

Assume p is stored in a2 (a.k.a. x12).

To implement the % operation, you will need to use mul and div instructions. Please use t3–t6 (a.k.a. x28–x31) for temporary values, and try to minimize how many of these you use.

Part II: Mysterious RISC-V

Your friend, Sia, is a great C programmer. But she doesn’t understand RISC-V assembly, unfortunately. She is trying to understand some mysterious RISC-V programs so she comes to find you, a RISC-V assembly programmer, to help her translate those RISC-V programs to C so that she can understand what they do.

Mysterious Function 1

Here’s one assembly program Sia is trying to understand:

loop:
  lw   x5, 0(x11)
  add  x5, x5, x15
  lw   x6, 0(x12)
  mul  x6, x6, x5
  sw   x6, 0(x13)
  addi x11, x11, 4
  addi x12, x12, 4
  addi x13, x13, 4
  addi x14, x14, -1
  bne  x14, x0, loop
  ret

Sia has already written a function signature:

void mystery1(int *arr1, int *arr2, int *arr3, int size, int num) {
  // ???
}

Assume that the function arguments are in registers x11 through x15, a.k.a. a1 through a5. Also assume that any array length given as an input is greater than zero. Complete this C function so it behaves the same way as the above assembly.

Follow these guidelines in your translation:

  • Prioritize readability. Comments are optional, but use them if you think it makes the code easier to understand.
  • Do not use goto. Use C’s if, for, while, etc. instead.
  • Prefer for loops over while loops. It is always possible to use while to implement any loop, but we want you to use for if the control flow fits the typical for (i = 0; i < max; i++) pattern.

It is possible to implement this function in only 2 lines of straightforward, readable C. Your solution does not need to be that short, but try to make it reasonably compact and understandable. (Sia will be grateful!)

Submit your completed implementation of the mystery1 function in mystery1.c.

Hint

Once you have a working C program, consider writing some tests for it. You can write a main function that calls the mystery1 function a few times on different inputs, for example, so you can compare the results to running the original RISC-V code. But please only submit the mystery1 function alone.

Mysterious Function 2

Sia asks you about a second mysterious assembly program:

addi x10, x0, 0

loop:
  lw x6, 0(x11)
  bne x6, x0, foo
  j bar

foo:
  sw x6, 0(x12)
  addi x12, x12, 4
  addi x10, x10, 1

bar:
  addi x11, x11, 4
  addi x13, x13, -1
  bne x13, x0, loop

ret

She already has this function signature:

int mystery2(int* arr1, int* arr2, int size) {
  // ...
}

The function arguments are again in registers a1 through a3 (a.k.a. x11 through x13). Register x10 is used to store the result of mystery2. Complete this function body. Use the same guidelines as in the previous part. You can also assume that any array length given as an input is greater than zero. It is possible to implement this code in about 6 lines of readable C but, again, your solution does not need to be that short.

Submit your solution in a file named mystery2.c.

Rubric

We will test all submitted code by running it on several test cases to check that it behaves correctly, i.e., equivalently to the original code. We will also manually read the assembly code to check that the required registers are used, and we’ll read the C to see that it obeys the guidelines.

  • lab.txt: 20 points
  • arrays.s: 16 points
  • mult.s: 16 points
  • prime.s: 16 points
  • mystery1.c: 16 points
  • mystery2.c: 16 points

A7: Functions in Assembly

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Submission Requirements

Please submit the following files.

From the in-lab work:

  • addone.s: your first assembly function, which increments the integer it’s given
  • recsum.s: a recursive summation function

From Part 1:

  • recursive.s: a straightforward recursive Fibonacci function
  • memoization.s: a memoized Fibonacci function
  • tail_recursive.s: a tail-recursive Fibonacci function
  • opt_tail.s: a tail-call-optimized Fibonacci function

Provided Files

We have provided you the following:

  • recursive.c, memoization.c, tail_recursive.c, and opt_tail.c, as the starter code for Part 1
  • compare.c, a program that compares the performance of the different versions of the Fibonacci function
  • Makefile, which you can use to build executables for the above programs

Getting Started

To get started, obtain the release code by cloning your assignment repository from GitHub:

$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_asmfunc.git

Replace <NETID> with your NetID. All the letters in your NetID should be in lowercase.

Overview

This assignment will expand your understanding of RISC-V assembly programming with a focus on managing the call stack. You will get direct experience with defining and calling functions and adhering to RISC-V calling conventions. You will learn about optimizing recursive functions for performance.

Part 0: Lab Section

View the lab slides here.

During this lab section, you’ll get some initial experience with writing functions in assembly. The key challenge is in following the RISC-V standard calling conventions. The calling conventions are a set of rules that both the caller code and the callee code (the function being called) must follow.

All RISC-V functions can be broken up into three parts (in order):

  1. Prologue. Located at the beginning of the function, the prologue constructs the stack frame.
  2. Body. Located after the prologue, the body contains the instructions of what the function actually does.
  3. Epilogue. Located at the end of the function, the epilogue releases the stack frame before returning control to the caller.

We recommend starting by writing the body of the function, noting which callee- and caller-saved registers you use. Afterwards, you can write the prologue and epilogue to properly save and restore the registers that you used.

Warm Up: add_one

Let’s start simple by implementing a function that adds 1 to its argument. You can imagine this C function:

int add_one(int i) {
  return i + 1;
}

Let’s start by compiling the body of the function. If you refer to the RISC-V calling conventions, you’ll notice that the first argument and the return value go in register a0. This makes the body pretty simple — we just have to add 1 to a0!

addi a0, a0, 1

Next, the prologue. First, we need to determine how large the stack frame must be; let’s call this number SIZE. SIZE must be big enough to hold the return address, any callee-saved registers, and any local variables that don’t fit in registers. Here’s a compact “to-do” list for what the prologue must do:

  1. Move the stack pointer down by the size of the stack frame.
  2. Push the return address onto the stack.
  3. Push any callee-saved registers that our body uses onto the stack.
  4. If needed, push any local variables that don’t fit into registers onto the stack.

The epilogue does the opposite; it must release the stack frame (i.e., clean up!):

  1. Restore any callee-saved registers by popping them from the stack.
  2. Pop the return address from the stack.
  3. Move the stack pointer back to its original position.
  4. Use ret (a.k.a. jr ra) to return, jumping back to the instruction after the function call.

As we’ve written it, our function body doesn’t use any callee-saved registers (s0–s11), nor does it require any stack space for local variables. That means we just have to store the return address on the stack. In RISC-V 64-bit (what we’re using), memory addresses are 8 bytes (64 bits).

Putting it all together, here’s an implementation of add_one:

add_one:
  # Prologue.
  addi sp, sp, -8  # Push the stack frame.
  sd   ra, 0(sp)   # Save return address.

  # Body.
  addi a0, a0, 1

  # Epilogue.
  ld   ra, 0(sp)   # Restore return address.
  addi sp, sp, 8   # Pop the stack frame.
  ret

The key difficulty of writing the prologue and epilogue is deciding where in the stack frame to store what. In other words, what is stored at which offset from sp? add_one only needs to store one value, the return address, so that just goes at 0(sp). However, in general you must determine the layout of the stack frame.

Copy the RISC-V assembly of the add_one function above into a file called addone.s.

Trying It Out: Calling Your Function From C

We can’t run this assembly program yet as it lacks a main function. It also doesn’t print anything out, which makes it hard to tell what it’s doing (if anything). One way to test your assembly functions is to write a C program that calls your assembly function.

Make sure that your addone.s implementation has an add_one: label. Now, at the top of the file add the following line:

.global add_one

This directive tells the assembler that the add_one label is a global symbol, meaning it’s accessible to other code.

Then, in a separate file (e.g., addone_test.c) copy the C program below:

#include <stdio.h>

int add_one(int i);

int main() {
  int res = add_one(42);
  printf("%d\n", res);
}

That add_one declaration is called a prototype, which means it doesn’t have a function body. It just tells the C compiler that the function is implemented elsewhere—in your case, in an assembly file.

Now, let’s compile and link these two files together.

$ rv gcc addone.s addone_test.c -o addone_test

Then run the program with rv qemu addone_test.

This works thanks to the magic of calling conventions! You and GCC are both “assembly programmers”, and because you agree on the standard way to invoke functions, the assembly code you both write can interoperate.

Recursive Sum

Next, we’ll write a recursive function that sums the integers from \(1\) through \(n\). The function we want to implement would look something like this in C:

int sum(int n) {
  if (n == 0)
    return n;
  return n + sum(n - 1);
}

In assembly, recursive function calls work exactly the same way as any other function call—the caller and callee just happen to be the same function. We’ll follow the RISC-V calling conventions in both roles.

Start by writing the function body. The interesting part is implementing the function call. Take note of which caller-saved registers you need to save before the jal instruction and restore after the function returns.

Next, write the prologue and epilogue. You’ll want to start by making a list of all the values this function will ever need to store in its stack frame. Then determine the stack frame layout, i.e., the offset at which you’ll store each value. Lastly, follow the recipe from the add_one step above to write the prologue and epilogue.

Once you’re done, you can test your sum function by writing a main wrapper in C, as we did for add_one. You’ll want to try calling sum on several different inputs.
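
For instance, a small wrapper (hypothetical file name recsum_test.c) could call sum on a handful of inputs and check each result against the closed form \(n(n+1)/2\). Remember to add a .global sum directive to recsum.s so the C code can see the label. This is just a sketch, not something to submit:

#include <assert.h>
#include <stdio.h>

int sum(int n);  // implemented in recsum.s

int main(void) {
  // Check a handful of inputs against the closed form n * (n + 1) / 2.
  int inputs[] = {0, 1, 2, 5, 10, 100};
  for (int i = 0; i < 6; i++) {
    int n = inputs[i];
    int result = sum(n);
    assert(result == n * (n + 1) / 2);
    printf("sum(%d) = %d\n", n, result);
  }
  return 0;
}

Build and run it the same way as before, e.g., rv gcc recsum.s recsum_test.c -o recsum_test followed by rv qemu recsum_test.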

To finish, put your assembly implementation of the sum function in a file called recsum.s.

Part 1: Optimizing Fibonacci

In this assignment, you will implement several different versions of a function that computes the \(n\)th number of the Fibonacci sequence. We’ll start with a straightforward recursive implementation and then explore various performance optimizations.

Version A: Recursive Fibonacci

Here’s a straightforward recursive implementation of a Fibonacci function in C:

unsigned long r_fibonacci(int n) {
  if (n == 0)
    return 0;
  else if (n == 1)
    return 1;
  else
    return r_fibonacci(n - 2) + r_fibonacci(n - 1);
}

Your task is to translate this code into RISC-V assembly.

Put your implementation in a file called recursive.s. We have provided a main function you can use to test your code in recursive.c. To test your code, type:

$ rv make recursive     # Build the `recursive` executable.
$ rv qemu recursive 10  # Run it.

The recursive executable takes a command-line argument: the index of the Fibonacci number to calculate. So rv qemu recursive 10 should print the 10th Fibonacci number, which is 55.

Version B: Memoized Fibonacci

The recursive implementation works, but it is very slow. Try timing the execution of a few Fibonacci calculations:

$ time rv qemu recursive 35
$ time rv qemu recursive 40
$ time rv qemu recursive 42

On my machine, calculating the 40th Fibonacci number took 4 seconds, and calculating the 42nd took 11 seconds. That suggests that the asymptotic complexity is pretty bad (in fact, the running time grows exponentially with \(n\)).

Part of the problem is that the recursive version recomputes the same answer many times. For example, if you call r_fibonacci(4), it will eventually call r_fibonacci(2) twice: once directly, and once indirectly via the recursive call to r_fibonacci(3). This redundancy can waste a lot of work.
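
To see the blow-up concretely, here is an optional, hedged C sketch (not part of the assignment) that counts how many calls the naive recursion makes for a few inputs:

#include <stdio.h>

static unsigned long calls;  // incremented on every entry to fib()

unsigned long fib(int n) {
  calls++;
  if (n == 0) return 0;
  if (n == 1) return 1;
  return fib(n - 2) + fib(n - 1);
}

int main(void) {
  for (int n = 10; n <= 30; n += 5) {
    calls = 0;
    unsigned long result = fib(n);
    printf("fib(%d) = %lu, computed with %lu calls\n", n, result, calls);
  }
  return 0;
}

The call count grows exponentially with \(n\), and that wasted work is exactly what memoization eliminates.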

A popular way to avoid wasteful recomputation is memoization. The idea is to maintain a memo table of previously-computed answers and to reuse them whenever possible. For our function, the memo table can just be an array, where the \(i\)th index holds the \(i\)th Fibonacci number. Here’s some Python code that illustrates the idea:

def m_fibonacci(n, memo_table, size):
    # Check the memo table. A nonzero value means we've already computed this.
    if n < size and memo_table[n] != 0:
        return memo_table[n]

    # We haven't computed this, so do the actual recursive computation.
    if n == 0:
        return 0
    elif n == 1:
        return 1
    answer = (m_fibonacci(n - 2, memo_table, size) + 
        m_fibonacci(n - 1, memo_table, size))

    # Save the answer in the memo table before returning.
    if n < size:
        memo_table[n] = answer

    return answer

In C, the type of memo_table will be unsigned long*, i.e., an array of unsigned (non-negative) numbers. size is the length of that array. Here’s the function signature for our new function:

unsigned long m_fibonacci(int n, unsigned long* memo_table, int size);
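
If it helps to see the algorithm in C before tackling the assembly, here is a hedged, line-by-line translation of the Python above, plus a tiny main for experimentation (for reference only; your submission is memoization.s, and the provided memoization.c wrapper is what you will actually use to test it):

#include <stdio.h>

// Reference-only C translation of the Python sketch above.
// A zero entry in memo_table means "not computed yet".
unsigned long m_fibonacci(int n, unsigned long *memo_table, int size) {
  // Check the memo table first.
  if (n < size && memo_table[n] != 0) {
    return memo_table[n];
  }

  // Base cases and the actual recursive computation.
  if (n == 0) return 0;
  if (n == 1) return 1;
  unsigned long answer = m_fibonacci(n - 2, memo_table, size) +
                         m_fibonacci(n - 1, memo_table, size);

  // Save the answer in the memo table before returning.
  if (n < size) {
    memo_table[n] = answer;
  }
  return answer;
}

int main(void) {
  unsigned long table[64] = {0};  // zero-initialized memo table
  printf("%lu\n", m_fibonacci(42, table, 64));
  return 0;
}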

Implement this m_fibonacci function in RISC-V assembly. Put your code in memoization.s.

We have provided a memoization.c wrapper that you can use to test your code. You can use the same procedure as above to try your implementation: rv make memoization followed by rv qemu memoization <number>.

Notice how much faster the new implementation is! Take some number that was especially slow in the recursive implementation and time it using your memoized version:

$ time rv qemu memoization 42

On my machine, that takes just 0.5 seconds. That’s 22× faster!

Version C: Tail Recursive Fibonacci

While the new version is a lot faster, it still makes a lot of function calls. Some of those function calls turn out to be fast, because they just look up the answer in the memo table. But we can do better by changing the algorithm to need only one recursive call.

Again using Python syntax, here’s the algorithm for a faster recursive version:

def tail_r_fibonacci(n, a, b):
    if n == 0:
        return a
    if n == 1:
        return b
    return tail_r_fibonacci(n - 1, b, a + b)

This version is called tail-recursive because the recursive call is the very last thing the function does before returning. Marvel at the fact that this version makes only \(n\) recursive calls to calculate the \(n\)th Fibonacci number!

Here’s the function signature for this version:

unsigned long tail_r_fibonacci(int n, unsigned long a, unsigned long b);
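
For reference (again hedged, not the deliverable), the same algorithm in C. Mathematically, seeding the accumulators as tail_r_fibonacci(n, 0, 1) produces the \(n\)th Fibonacci number:

// Reference-only C version of the tail-recursive algorithm above.
unsigned long tail_r_fibonacci(int n, unsigned long a, unsigned long b) {
  if (n == 0) return a;
  if (n == 1) return b;
  return tail_r_fibonacci(n - 1, b, a + b);  // the tail call
}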

Implement this tail_r_fibonacci function in tail_recursive.s. As usual, we have provided a C wrapper so you can test your implementation: rv make tail_recursive followed by rv qemu tail_recursive <number>.

Version D: Tail-Call Optimized Fibonacci

Making \(n\) recursive calls is pretty good, but is it possible to optimize this code to do no recursion at all? That would mean that the algorithm uses \(O(1)\) stack space instead of \(O(n)\).

That’s the idea in tail-call optimization. The plan is to exploit the fact that, once the recursive call to tail_r_fibonacci is done, the caller has nothing more to do. The callee puts its return value in a0, and that is exactly what the caller wants to return too. Because there is no more work to do after the tail call, we don’t need to waste time maintaining the stack frame for the caller. We can just reuse the same stack frame for the recursive call!
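
One way to convince yourself this works: in C terms, reusing the stack frame amounts to replacing the tail call with a jump back to the top of the function with updated arguments, which is exactly a loop. Here is a hedged sketch of that view (your actual submission is still the assembly version built around j):

// Reference-only sketch: what the tail call becomes once the stack frame
// is reused. Each "call" just updates the arguments and jumps back to the
// top, so no new frame is ever pushed.
unsigned long opt_tail_fibonacci(int n, unsigned long a, unsigned long b) {
  while (n > 1) {
    unsigned long next = a + b;  // new value of b
    a = b;
    b = next;
    n = n - 1;
  }
  return (n == 0) ? a : b;
}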

Implement an optimized version of the tail-recursive Fibonacci algorithm in opt_tail.s. Instead of using a jal (or call) instruction for the recursive call, you can just use a plain unconditional jump (j in RISC-V). Be sure to carefully think through when and where you need to save and restore the return address to make this work.

Your function should be named opt_tail_fibonacci, and it should have the same function signature as the previous version. As usual, opt_tail.c can help you test your implementation: rv make opt_tail followed by rv qemu opt_tail <number>.

Compare Performance

We have provided a program, in compare.c, that can compare the performance of these various optimizations more precisely than the time command. (That was also measuring the time it takes to start the executable up, which can be slow, especially when it entails launching a Docker container.) Build the tool and invoke it like this:

$ rv make compare
$ rv qemu compare <method> <n>

You can give it the name of a method (recursive, memoization, tail_recursive, or opt_tail) and a number \(n\) to measure the time taken to compute the \(n\)th Fibonacci number. Or use the all method to compare all the implementations.

When I ran this once on my machine with \(n=20\), it reported that the recursive implementation took about 2.6 seconds, memoization brought this down to just 7 milliseconds, tail recursion was even faster at 3 ms, and the optimized tail call version was blazingly fast at only half a millisecond. Every computer is different, so your numbers will vary, but see if you observe the same overall performance trend.

There is nothing to turn in for this part—it’s just cool!

Submission

Submit all the files listed in Submission Requirements to Gradescope. Upon submission, a smoke test will check that your code compiles and passes the public test cases.

Rubric

  • Part 0:
    • addone.s: 5 points
    • recsum.s: 5 points
  • Part 1:
    • recursive.s: 10 points
    • memoization.s: 15 points
    • tail_recursive.s: 10 points
    • opt_tail.s: 15 points

Prelim 1

Time & Location

Thursday, February 20 at 7:30pm in Statler Auditorium (STL185).

Scope

  • All lectures up to and including week 4 (L01–L08). This includes material presented in lecture, lecture slides, and lecture notes (but not textbook readings).
  • Assignments 0–3
  • Topic Mastery Quizzes (TMQs) 1–4
  • Online exercises E0–E4

Review Sessions

There will be two in-person review sessions:

  • Thursday, February 13 from 7–8pm in Gates G01
  • Wednesday, February 19 from 7–8pm in Malott 251

Both review sessions will cover the same material. Slides from the review sessions are located here.

Past/Practice Exams

Browse the table below for links to past exams and solutions. Please note that these files are hosted on Canvas and will require you to log in using your NetID.

2025SP Solutions

You can find links to the two versions of the prelim, along with solutions, in the table below. These files are hosted on Canvas and will require you to log in using your NetID.

Lab 202

Snee Hall Geological Sci 1150

TA 1: Sharafa Mohammed
TA 2: Alex McGowan
TA 3: Vivian Zhou
TA 4: Reese Thompson
TA 5: Serena Duncan
TA 6: Salman Abid

Students:
Arman M., Fiifi B., Andrew K., Tim H., Nicole S., Simon I.
Samuel T., Muhammad H., Xueqing T., Joseph W., Amelia Z., Rahul R.
Ambrose B., Nolan B., Oleksandr B., Jack Z., Meg I., Ethan C.
Rich H., Sam C., Carter T., Aidan W., Markus B., Madeline O.
Jenna I., Damon H., Nikita D., William L., Pablo R., Kevin B.
Najiullah B., Yoojung J., Huajie Z., Timothy L., Grace S.
Matthew J., Sam S.

Lab 203

Carpenter Hall 104 Blue

TA 1: Keting Chen
TA 2: Analeah Real
TA 3: Caitlyn Cahill
TA 4: Edward Duan
TA 5: Omar Abuhammoud

Students:
Salem A., Jonah H., Fudayl N., Linda S., Justin X.
Abrar A., Noah H., Andrew P., Amrita T., Harry Y.
Tanya A., Logan H., Phoebe Q., Mericel T., Michael Z.
Jay B., Grace K., David R., Armaan T., Joey Z.
Aiden C., Ryan K., Jordan S., Eva V., Elaine W.
Zhuo C., Xiaoxin L., Sophie S., Esha V.
Grant H., Kaustav M., Shriya S.

Lab 204

Snee Hall Geological Sci 1150

TA 1: Michael Avellino
TA 2: Peter Engel
TA 3: Noah Plant
TA 4: Maximilian Fanning
TA 5: Luciano Bogomolni
TA 6: Jiahan Xie

Students:
Lauren B., David C., Sowoon C., Joshua D., Tamer G., Arnab G.
Edward H., Andrew H., Jerry J., John K., Ben K., Kelly L.
Gabriel L., Timothy L., William L., Winnie L., Nolan L., Farhan M.
Timothy N., Kea-Roy O., Andrew Q., Falak R., Jacob R., Niko R.
Andres R., Anderson S., Joel S., Amy W., Adelynn W., Austin W.
Jinzhou W., Ethan X., Jingyu X., Eric Y., Zhijia Y., Harvey Z.

Lab 205 & Lab 210

Snee Hall Geological Sci 1150

TA 1: Melissa Reifman
TA 2: Galiba Anjum
TA 3: Kayla Ng
TA 4: Kelly Yue

Students:
Aryan A., James C., Ivan D., Andy M.
Aghamatlab A., Rishika C., Ryan F., Rithikh P.
Pierre A., Jonathan C., Kai G., Jonathan S.
Adam C., Sylvia H., Anthony K., Robert T.
Winnie L., Casper L.

Lab 206

Snee Hall Geological Sci 1150

TA 1: Bisola Okunola
TA 2: Angelica Borowy
TA 3: Alex Koiv
TA 4: Michael Micalizzi
TA 5: Jake Berko

Students:
Vishu A., Ahan M., Frank D., Naveen R., Alex I.
Keya A., Ikechi N., Harry G., Marco R., Leon J.
Matthew A., Benjamin N., Eric G., Sanjum S., Matteo J.
Joanna A., Dylan O., Aaron G., Arjun S., Joshua K.
Cole B., Kira P., Max G., Johnson W., Sridula K.
Michelle C., Bassem Q., Lawrence G., Eric W., Srivatsa K.
Jeremy C., Leon H., Jenna M.

Lab 207

Snee Hall Geological Sci 1150

TA 1: Bhuwan Bhattarai
TA 2: Ryan Mistretta
TA 3: Srija Ghosh
TA 4: Timmy Li
TA 5: Ozan Ersöz

Students:
Sreya J., Cougar H., David K., Steven Y., Ian K.
Eric Y., Andrew A., Mohammed A., Dorothy H., Sorong D.
Anna L., Lucas S., Ellyn H., Wilson C., Shubham M.
Abigail K., Adam H., Alan C., Julian P., Julia K.
Jon S., Bryant H., Selina L., Dylan K., Jay J.
Aarsha J., Jerry J., Ignacio C., Eric J., Alexis L.
Megan Y., Niti G., Mikko L., Aadi S.

Lab 208

Phillips Hall 318

TA 1: Ilya Strugatskiy
TA 2: Alan Han
TA 3: John Palsberg
TA 4: Will Bradley

Students:
Tanvi B., Bella F., Trevor L., Teg S.
Helen B., Jason F., Ronald L., Martin S.
Saarang B., Ryan F., William L., Yihun S.
Kayton B., Meris G., Aiden M., Jay T.
Vail C., Jeana H., Skai N., Grace W.
Andrei C., Mishita K., Jo R., Nicholas Y.
Jaden C., Arnav K., Ganesh R., Kelly Z.
Andrew D., Marc K., Nikil S., Alan Z.

Lab 209

Snee Hall Geological Sci 1150

TA 1: David Suh
TA 2: Yunoo Kim
TA 3: Nathan Chu
TA 4: Melvin Van Cleave
TA 5: Kevin Cui

Students:
Najiullah B., Ravnoor B., Elli B., Evan C., Paul F.
Maia F., Maggie G., Bradley G., Jeffrey H., Aaron L.
Charles L., Elliott L., Raymond L., Thomas L., Krish M.
Sameer M., Kiyam M., Aakaash N., Alexander N., Cedric O.
Christian P., Jonathan L., Razika R., Kevin R., Davey S.
Stephan V., Daniel W., Daniel X., Firdavs Y., Ashlie Z.
Cici Z., Ethan K., Aadi S.