A2: Minifloat
For answers to frequently asked questions regarding this assignment, please see the A2 Megathread on Ed.
Instructions:
Remember, all assignments in CS 3410 are individual.
You must submit work that is 100% your own.
Remember to ask for help from the CS 3410 staff in office hours or on Ed!
If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.
The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.
Submission Requirements
For this assignment, you will need to submit the following five files:
- minifloat.c, with your written implementation for the missing functions.
- minifloat_test_part1.expected, to match additional tests added in minifloat_test_part1.c.
- Some additional tests, in:
  - minifloat_test_part1.c
  - minifloat_test_part2.c
  - minifloat_test_part3.c
Restrictions
For this assignment, you will build your own floating-point representation.
- You may not use built-in C operations for floating-point arithmetic.
- You may not cast data to float or double, or create variables with these types.
Provided Files
The provided release code contains seven files:
- minifloat.c, which includes some completed functions and some functions you are expected to implement.
- minifloat.h, which provides declarations and comments for the functions in minifloat.c, including those you are to implement.
- minifloat_test_part1.c, minifloat_test_part2.c, and minifloat_test_part3.c, which provide some tests for you to get started. You are expected to add more tests of your own to each of these test suites.
- minifloat_test_part1.expected, which provides a baseline file to help with testing part 1. You are expected to add more lines to this file as part of testing part 1.
- Makefile, which provides structure to compile your code (see our brief tutorial on Makefiles).
Getting Started
To get started, obtain the release code by cloning your assignment repository from GitHub:
$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/<NETID>_minifloat.git
Replace <NETID> with your NetID. For example, if your NetID is zw669, then this clone command would be:
$ git clone git@github.coecis.cornell.edu:cs3410-2025sp-student/zw669_minifloat.git
Overview
In this assignment, you will develop a custom minifloat data format in C. You will be expected to reason about floating-point details and implement operations over your custom floating-point data type in C.
Background
In class, we learned about floating-point numbers, which represent decimals with some number of bits.
C has built-in float and double types, which (on modern hardware) use 32 bits and 64 bits, respectively.
Increasing the number of bits in a floating-point representation gives it more precision and more dynamic range, at the expense of less efficient arithmetic.
It can also be useful, however, to perform operations with smaller floating-point representations—trading off precision for potentially faster calculations.
In this assignment, you will implement functions for a specialized 8-bit floating-point type. We’ll call these 8-bit numbers minifloats. Minifloats have severely limited precision, but such tiny floating-point values are useful for situations where errors matter less and data sizes are enormous: most prominently, in machine learning. See, for example, this paper and this other paper that both show serious efficiency advantages from using 8-bit minifloats. While most floating-point formats enjoy built-in hardware support, we can also implement minifloats in software with bit packing tricks.
Minifloats follow a similar representation strategy to the standard IEEE floating-point types that we learned about in lecture. However, they differ in a few important ways to make the implementation simpler, which we will summarize as well.
Minifloat Specification
- Minifloats use 8 bits in total: 1 sign bit, 3 exponent bits, and 4 significand bits. The layout of a minifloat looks like this (most significant bit on the left), with s for sign, e for exponent, and g for significand:
  s eee gggg
- As in standard formats, a sign bit of 0 indicates a positive number, and a sign bit of 1 indicates a negative number.
- Minifloats have a bias of 3. In other words, we subtract 3 from the unsigned value of a minifloat's exponent bits. In comparison, single-precision floating-point numbers (i.e., float) have a bias of 127.
- Unlike standard floating-point formats, wherein we usually append a leading 1 to the significand bits with the 1.g notation, minifloats use the significand directly, with the binary point after the first digit. So if the four significand bits are $g_3 g_2 g_1 g_0$, then the “base” part of the represented value is the binary number $g_3.g_2 g_1 g_0$. In other words, the value is $g \times 2^{-3}$, where $g$ is the unsigned integer value of those 4 bits.
- Also unlike standard floating-point formats, our minifloats do not use the special values not a number (NaN) and infinity (+∞ and -∞).
All together, the value represented by a minifloat with sign $s$, exponent $e$, and significand $g$ is:
$$(-1)^s \times (g \times 2^{-3}) \times 2^{e-3}$$
Or, equivalently, if you prefer to think of the significand's representation in terms of bits:
$$(-1)^s \times (g_3.g_2 g_1 g_0)_2 \times 2^{e-3}$$
where $g_3$ is the significand's most significant bit, $g_0$ is the least significant bit, and so on.
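The field layout above maps directly onto shifts and masks. The following C sketch shows one way to pull the three fields out of a uint8_t; the helper names mini_sign_bit, mini_exp_bits, and mini_sig_bits are hypothetical and are not declared in minifloat.h.

```c
#include <stdint.h>

/* Hypothetical helpers (not part of the provided minifloat.h) that extract
   the fields of an 8-bit minifloat laid out as s eee gggg. */

/* Bit 7: sign (0 = positive, 1 = negative). */
static inline unsigned mini_sign_bit(uint8_t m) {
    return (m >> 7) & 0x1u;
}

/* Bits 6-4: biased exponent e, in the range 0..7. */
static inline unsigned mini_exp_bits(uint8_t m) {
    return (m >> 4) & 0x7u;
}

/* Bits 3-0: significand g, in the range 0..15 (worth g * 2^-3). */
static inline unsigned mini_sig_bits(uint8_t m) {
    return m & 0xFu;
}
```

For instance, 10111100 (0xBC) splits into sign 1, biased exponent 3 (binary 011), and significand 12 (binary 1100).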
Examples
Now that we have defined our minifloat specification, let’s see some examples!
Example 1: 10111100
We have a sign of 1, an exponent of 011, and a significand of 1100.
- Our sign bit 1 corresponds to −1.
- Our exponent 011 corresponds to a decimal exponent of $3 - 3 = 0$. (We’re applying our −3 bias here.)
- Our significand 1100 corresponds to the decimal $12 \times 2^{-3} = \frac{12}{8} = 1.5$. (Or, equivalently, the significand corresponds to the binary number $1.100_2$, which is 1.5 in decimal.)
Altogether, 10111100 is $-1 \times 1.5 \times 2^{0} = -1 \times 1.5 \times 1 = -1.5$ in base 10.
Example 2: 00010010
We have a sign of 0, an exponent of 001, and a significand of 0010.
- Our sign bit 0 corresponds to +1.
- Our exponent 001 corresponds to a decimal exponent of $1 - 3 = -2$.
- Our significand 0010 corresponds to the binary value $0.010_2$, which equals $0.25_{10}$.
Altogether, 00010010 is $1 \times 0.25 \times 2^{-2} = \frac{1}{16} = 0.0625$ in base 10.
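Combining $2^{-3}$ and $2^{e-3}$ gives $(-1)^s \times g \times 2^{e-6} = (-1)^s \times (g \cdot 2^{e})/64$, so every minifloat is an exact integer multiple of 1/64. Here is a minimal integer-only sketch of that observation, using a hypothetical helper named mini_to_64ths that involves no floating-point types:

```c
#include <stdint.h>

/* Hypothetical helper: the exact value of a minifloat as a signed count of
   1/64 units, using (-1)^s * g * 2^(e-6) = (-1)^s * (g << e) / 64. */
static int mini_to_64ths(uint8_t m) {
    unsigned s = (m >> 7) & 0x1u;   /* sign bit */
    unsigned e = (m >> 4) & 0x7u;   /* biased exponent, 0..7 */
    unsigned g = m & 0xFu;          /* significand, 0..15 */
    int magnitude = (int)(g << e);  /* at most 15 << 7 = 1920 */
    return s ? -magnitude : magnitude;
}
```

Example 1 (10111100) yields −96, and −96/64 = −1.5; Example 2 (00010010) yields 4, and 4/64 = 0.0625.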
Converting between Minifloats and Decimals
Decimal to Minifloat
To convert a decimal number into a minifloat:
- Convert the integer and fractional parts into binary.
- Normalize to match the format $g_3.g_2 g_1 g_0 \times 2^{e}$.
- Convert exponent into biased form (i.e., add 3).
- Set the sign bit accordingly.
Example: Converting 2.25 into an 8-bit float
Step 1: Convert the integer and fractional parts to binary.
Converting the integer portion into binary yields 10.
Our fractional part is 0.25. To convert it, multiply the fractional part by 2, record the integer part of the result (which should be 0 or 1), and repeat with the new fractional part until the fractional part becomes 0 or you reach the precision limit (4 digits for our minifloat format). The recorded integer parts, in order, form the binary representation of the original fractional part.
- $0.25 \times 2 = 0.50$. Record 0.
- $0.50 \times 2 = 1.00$. Record 1.
Thus our binary representation of 0.25 is .01. Together with the integer portion, our binary representation of 2.25 is 10.01.
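The repeated-doubling procedure in Step 1 can be carried out with integers alone by representing the fraction as num/den. This is a sketch under that assumption; frac_to_binary is a hypothetical name, and the digit limit is a parameter rather than anything fixed by the assignment.

```c
#include <stddef.h>

/* Hypothetical helper: repeated doubling for the fraction num/den, where
   0 <= num < den. Writes at most max_bits binary digits into out (which must
   hold max_bits + 1 bytes) and stops early once the remaining fraction is 0. */
static void frac_to_binary(unsigned num, unsigned den, size_t max_bits, char *out) {
    size_t i = 0;
    while (num != 0 && i < max_bits) {
        num *= 2;                 /* multiply the fraction by 2 */
        if (num >= den) {         /* integer part of the product is 1 */
            out[i++] = '1';
            num -= den;           /* keep only the new fractional part */
        } else {                  /* integer part of the product is 0 */
            out[i++] = '0';
        }
    }
    out[i] = '\0';
}
```

With num = 1 and den = 4 (that is, 0.25), it records 0, then 1, and stops, producing "01" as in the steps above. A fraction such as 1/3 never reaches 0, so it stops at the digit limit instead.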
Step 2: Normalize to match the format $g_3.g_2 g_1 g_0 \times 2^{e}$.
Now we normalize our result so that it fits the format $g_3.g_2 g_1 g_0 \times 2^{e}$. In this case, we move the binary point one place to the left: $10.01_2 = 1.001_2 \times 2^{1}$. From this we can see that our significand is 1001.
Step 3: Convert exponent into biased form (i.e., add 3).
Next, we need to apply our format’s exponent bias, which for minifloats is 3. To bias the exponent, we add the bias to our original exponent $e$. So, $1 + 3 = 4$ (100 in binary).
Step 4: Set the sign bit accordingly.
Lastly, because 2.25 is positive, the sign bit should be set to 0.
Thus the minifloat representation of 2.25 is 01001001 (sign 0, exponent 100, significand 1001).
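Steps 2 through 4 amount to finding a 4-bit significand and a biased exponent whose combination reproduces the value. The sketch below assumes the value is given as a magnitude counted in 1/64 units (so 2.25 becomes 144) and that it is exactly representable; it does not round. It picks the smallest exponent that works, which matches the convention Part 2 requires. mini_encode_exact is a hypothetical helper, not part of the provided code.

```c
#include <stdint.h>

/* Hypothetical helper: encode an exactly representable value as a minifloat
   using the smallest possible exponent. The magnitude is a count of 1/64
   units (2.25 -> 144). Returns the 8-bit encoding, or -1 if the value cannot
   be represented exactly. No rounding is performed. */
static int mini_encode_exact(unsigned negative, unsigned mag64) {
    for (unsigned e = 0; e <= 7; e++) {
        unsigned g = mag64 >> e;            /* candidate significand */
        if (g <= 0xFu) {                    /* fits in 4 bits at this exponent */
            if ((g << e) != mag64) {
                return -1;                  /* low bits would be lost: inexact */
            }
            unsigned s = (negative && mag64 != 0) ? 1u : 0u;  /* zero is 00000000 */
            return (int)((s << 7) | (e << 4) | g);
        }
    }
    return -1;                              /* exceeds 15 << 7 = 1920 */
}
```

For 144 it settles on biased exponent 4 and significand 9, producing 01001001. Note that 0.0625 (4/64) encodes as 00000100 under this rule rather than as the 00010010 of Example 2; both bit patterns represent the same value.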
Minifloat to Decimal
To convert from a floating-point number into a decimal number:
- Extract the sign, exponent, and significand.
- Normalize the significand to the format $g_3.g_2 g_1 g_0$ and remove trailing zeros.
- De-normalize to make the exponent 0.
- Convert the integer and fractional parts to decimals.
- Add a negative sign if necessary.
Example: Converting 11011100 into a Decimal
Step 1: Extract the sign, exponent, and significand.
- Sign bit: 1 (negative)
- Exponent: 101
- Significand: 1100
Step 2: Normalize the significand to the format $g_3.g_2 g_1 g_0$ and remove trailing zeros.
Our significand 1100 becomes $1.100_2$, which is $1.1_2$ after removing trailing zeros.
Step 3: De-normalize to make the exponent 0.
We first convert our binary exponent 101 into base 10, yielding 5. We then subtract our bias (which is 3 for minifloats) from our exponent to get $5 - 3 = 2$.
Since our exponent is 2, we shift our binary point 2 places to the right, yielding $110.0_2$.
Step 4: Convert the integer and fractional parts to decimals
Next, we convert the integer and fractional parts of $110.0_2$ into base 10. Since $110_2 = 6_{10}$ and $0_2 = 0_{10}$, $110.0_2 = 6.0_{10}$.
Step 5: Set the sign according to the sign bit.
Since the sign bit is 1, the final value is −6.0.
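Steps 2 through 4 can also be done in one move with integer shifts: shifting the significand left by the biased exponent gives the magnitude in 1/64 units, after which dividing by 64 and taking the remainder separate the whole and fractional parts. A sketch, with mini_split as a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: split a minifloat's magnitude into its whole part and
   its fractional part (in 1/64 units) using only integer operations. The
   left shift by e followed by division by 64 plays the role of moving the
   binary point. Returns 1 if the sign bit is set, otherwise 0. */
static unsigned mini_split(uint8_t m, unsigned *whole, unsigned *frac64) {
    unsigned e = (m >> 4) & 0x7u;    /* biased exponent */
    unsigned g = m & 0xFu;           /* significand */
    unsigned scaled = g << e;        /* |value| * 64 */
    *whole = scaled >> 6;            /* integer part: scaled / 64 */
    *frac64 = scaled & 0x3Fu;        /* fractional part: scaled % 64 */
    return (m >> 7) & 0x1u;          /* sign bit */
}
```

For 11011100 this gives 12 << 5 = 384, so the whole part is 6 and the fractional part is 0, with the sign bit set: −6.0, as computed above.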
Adding Minifloats
To perform addition with floating-point numbers:
- Rewrite the number with the smaller exponent so that both exponents are equal, shifting its mantissa (significand) to the right by the difference between the exponents.
- Add the mantissas together.
- Recombine and renormalize the result if necessary.
Example: 1.5+0.5
First, we need to convert 1.5 and 0.5 into their minifloat representations. For 1.5 this is $1.1_2 \times 2^{0}$, and for 0.5 this is $1.0_2 \times 2^{-1}$.
Step 1: Adjust the mantissa
Because the exponents differ, we shift 0.5’s mantissa to the right by one: $1.0_2 \to 0.10_2$.
Now both numbers have an exponent of 0.
Step 2: Add the mantissas together.
- $1.1_2 + 0.10_2 = 10.0_2$
Step 3: Recombine and renormalize the result if necessary
- $10.0_2 \times 2^{0} = 1.0_2 \times 2^{1}$
The biased exponent is $1 + 3 = 4$ (100 in binary), so the answer is 0 100 1000, which is equivalent to 2.0 in base 10.
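The three addition steps can be sketched in integer arithmetic. This version makes simplifying assumptions that a real mini_add cannot: both inputs are positive (their sign bits are ignored) and the sum must be exact, so no rounding is attempted. It also swaps in a variant of Step 1: rather than shifting the smaller-exponent significand right, it shifts the larger-exponent significand left into a wider unsigned int, which aligns the exponents without discarding bits. mini_add_positive_exact is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical sketch: add two POSITIVE minifloats whose sum is exactly
   representable. Sign bits are ignored and no rounding is done. Returns the
   smallest-exponent encoding, or -1 if the sum is inexact or too large. */
static int mini_add_positive_exact(uint8_t a, uint8_t b) {
    unsigned ea = (a >> 4) & 0x7u, ga = a & 0xFu;
    unsigned eb = (b >> 4) & 0x7u, gb = b & 0xFu;
    unsigned emin = ea < eb ? ea : eb;

    /* Steps 1-2: align both significands to exponent emin (shifting left,
       so no bits are lost), then add them. */
    unsigned sum = (ga << (ea - emin)) + (gb << (eb - emin));
    unsigned e = emin;

    /* Step 3a: if the sum needs more than 4 bits, raise the exponent. */
    while (sum > 0xFu) {
        if (sum & 0x1u) {
            return -1;      /* a 1 bit would be shifted out: inexact */
        }
        sum >>= 1;
        e++;
    }
    /* Step 3b: prefer the smallest exponent while the doubled significand
       still fits in 4 bits. */
    while (e > 0 && sum <= 0x7u) {
        sum <<= 1;
        e--;
    }
    if (e > 7) {
        return -1;          /* exponent out of range */
    }
    return (int)((e << 4) | sum);
}
```

Adding 1.5 (00111100) and 0.5 (00101000) aligns both to biased exponent 2, sums 24 + 8 = 32, and renormalizes to 01001000, matching the worked example.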
Bit size in C
We want to ensure that the type we are using to represent a minifloat is exactly 8 bits. We will use the uint8_t type from C’s stdint.h header. (We will avoid char, even though char is 8 bits on most platforms, because C unhelpfully does not guarantee that it is exactly 8 bits everywhere.)
To break down this type’s name: uint means that the value is an unsigned integer (so bit-level operations behave as on an unsigned integer), 8 means that the value occupies exactly 8 bits, and _t is a common naming convention that indicates that this is a type.
The stdint.h header defines many similar types, like these:
Type | Description |
---|---|
uint8_t | unsigned integer with 8 bits |
uint16_t | unsigned integer with 16 bits |
int8_t | signed integer with 8 bits |
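One subtlety when bit packing with uint8_t: in C expressions, uint8_t operands are promoted to int before arithmetic and shifts, so intermediate results can be wider than 8 bits and are only truncated when converted back to uint8_t. The two hypothetical functions below illustrate the difference.

```c
#include <stdint.h>

/* In C, uint8_t operands are promoted to int before arithmetic and shifts,
   so an intermediate result may need more than 8 bits. */
static int shift_left_promoted(uint8_t m) {
    return m << 1;                 /* computed as int: 0x80 << 1 == 0x100 */
}

/* Converting back to uint8_t keeps only the low 8 bits of the result. */
static uint8_t shift_left_truncated(uint8_t m) {
    return (uint8_t)(m << 1);      /* 0x100 becomes 0x00 */
}
```

When a shift or sum might carry past bit 7, decide deliberately whether you want the wider int result or the truncated 8-bit one.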
Your Task
This assignment is divided into three parts: displaying minifloats as decimals, implementing operations on minifloats, and using minifloats. Each part has you implement 1–3 functions and add test cases to help convince yourself that these functions are correct. You must add at least 4 new test cases per function to what we have provided, though you may add more.
For all of your C implementations, you may not include any constants or variables of type float, double, or long double. You may not use C’s built-in floating-point operations, such as + on floating-point values.
This is not an arbitrary restriction. Using a larger floating-point representation in your implementation would defeat the purpose of the smaller representation, which is that it is smaller and faster than “normal” floating-point types. Because of floating-point rounding error, it is also very likely to introduce incorrect results.
We have provided a mini_to_double utility function to help you with debugging and testing. You may not use this function in any of your submitted implementations, but you may use this function for writing test cases for any of your functions.
Part 1: Lab
View the lab slides here.
Review
If you need to, look over the lecture notes on standard floating-point types to remind yourself of the basic principles. And try out float.exposed to get hands-on practice!
Read over the background above and especially the specification for minifloats. To briefly summarize the minifloat format:
- Bit 7 is the sign bit
- Bits 6–4 are the exponent bits
- Bits 3–0 are the significand bits
(Bits are numbered from the right, so 0 is the least significant bit.)
Displaying Minifloats
In this lab, your task is to implement a function for displaying minifloats in C, named print_mini. This function takes in a minifloat and must print the sign, whole number, and fractional part associated with this minifloat as a base-10 value. The exact specification, with examples, is given in minifloat.h. Your implementation should be filled into minifloat.c.
To make your task somewhat easier, we have written a concrete call to printf at the end of each function that you may use as a guide for what to implement. Note that print_mini requires that we write 6 decimal digits: the provided printf specifier %06d pads any integer with leading zeros so that at least 6 digits are printed. To provide two concrete examples:
- printf("%06d", 123) will print 000123
- printf("%06d", 100000) will print 100000
Remember, you may not include any constants or variables of type float, double, or long double, and you may not use any floating-point operations.
You may, however, use any integer arithmetic operation (including integer division and modulus). In C, dividing two integers with i / j produces an integer (the quotient is truncated toward zero). But be sure not to include a double constant (such as 1.0) by accident.
You may find it useful to observe that 1/64=0.015625, and that, with integer division, 1000000/64=15625.
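Putting the 1/64 observation to work: every minifloat magnitude is an integer number of 1/64 units (the significand shifted left by the biased exponent), so the fractional part in millionths is the remainder modulo 64 times 15625. The sketch below writes the result into a buffer with snprintf so it can be checked. The "%s%u.%06u" layout and the format_mini name are assumptions for illustration; the exact output print_mini must produce is the one specified in minifloat.h.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: format a minifloat in base 10 using only integer
   arithmetic. The "%s%u.%06u" layout is an illustrative assumption, not the
   required print_mini output (see minifloat.h for that). */
static void format_mini(uint8_t m, char *buf, size_t len) {
    unsigned negative = (m >> 7) & 0x1u;
    unsigned e = (m >> 4) & 0x7u;
    unsigned g = m & 0xFu;
    unsigned scaled = g << e;               /* |value| in units of 1/64 */
    unsigned whole = scaled / 64;           /* integer part */
    unsigned frac = (scaled % 64) * 15625;  /* millionths: 1000000 / 64 == 15625 */
    snprintf(buf, len, "%s%u.%06u", negative ? "-" : "", whole, frac);
}
```

For example, 10111100 formats as -1.500000 and 00010010 as 0.062500; the %06u specifier supplies the leading zero in 062500.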
Testing Part 1
A test script to help guide your development can be found in minifloat_test_part1.c. You can build this test with the following command:
rv make part1
To test this code, you must execute the resulting .out file and redirect its printed output to a file, such as with the following command:
rv qemu minifloat_test_part1.out > minifloat_test_part1.txt
Reminder: use the rv aliases for each command if you have them set up!
Finally, you must compare the resulting prints to our expected results using diff:
diff minifloat_test_part1.txt minifloat_test_part1.expected
If you observe any differences between the two, a printing test failed.
You can also combine these operations into a single bash command:
rv make part1 && rv qemu minifloat_test_part1.out > minifloat_test_part1.txt && diff minifloat_test_part1.txt minifloat_test_part1.expected
Reminder: You must add 4 new printing tests (which means modifying both minifloat_test_part1.c and minifloat_test_part1.expected).
Part 2: Minifloat Operations
Your second task is to implement an equality check, addition, and multiplication between minifloats. Specifically, you will be implementing mini_eq, mini_add, and mini_mul, each of which takes in two minifloats; mini_add and mini_mul each produce a new minifloat. As before, the specifications for each function can be found in minifloat.h, and your implementation should be written in minifloat.c.
The arithmetic operations mini_add and mini_mul must produce the minifloat value closest to the exact result of adding (or multiplying) the corresponding real numbers. If two minifloat values are equally close to the exact result, your implementation must return the one further from zero. For example, we would round 2.125 to 2.25, and similarly -1.0625 to -1.125.
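The tie-breaking rule can be implemented on magnitudes with integer shifts: dropping k low bits while rounding to nearest with ties going up is the same as adding half of $2^k$ before shifting. A minimal sketch, assuming the sign is handled separately and using the hypothetical name shift_round_half_away:

```c
/* Hypothetical helper: divide a nonnegative magnitude by 2^k, rounding to
   the nearest integer with ties going up. Because the sign is handled
   separately, "up" on the magnitude means "away from zero". */
static unsigned shift_round_half_away(unsigned mag, unsigned k) {
    if (k == 0) {
        return mag;                         /* nothing is discarded */
    }
    return (mag + (1u << (k - 1))) >> k;    /* add half of 2^k, then truncate */
}
```

Here 2.125 is 136/64. At biased exponent 4, one significand step is 16/64, so the significand is 136 / 16 = 8.5, which rounds to 9, giving 9 × 16/64 = 2.25. Likewise, 1.0625 (68/64) at biased exponent 3 gives 68 / 8 = 8.5, rounding to 9, or 1.125. If rounding pushes the significand up to 16, it no longer fits in 4 bits, and the exponent must be increased.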
If there are multiple possible minifloat representations of the resulting real number, you must return the minifloat with the smallest exponent. For example, the minifloat value 0 011 0010 could be equivalently represented as 0 001 1000, and only the latter is considered correct for these arithmetic operations. Additionally, if an arithmetic operation would return 0, you must return exactly 00000000.
If applying addition or multiplication would result in a real number larger than the largest minifloat or smaller than the most negative minifloat, the result of these operations is undefined, and need not be tested.
Hint: If you become stuck on any of these functions, consider attempting another—each requires detail that can become more obvious while working on another.
Testing Part 2
Testing minifloat operations is more straightforward than testing the printing implemented earlier. We can simply run each test file and compare the resulting minifloats to expected values. To test part 2, you can directly build and execute part2:
rv make part2 && rv qemu minifloat_test_part2.out
Reminder: You must add 4 new tests per function.
Hint: Write as many edge-case tests as you can think of; there are many potential tricks with negative numbers and with very small or very large minifloats.
Part 3: Using Minifloats
Your third task is a straightforward example use of the minifloats you have implemented. Specifically, you’ll be implementing functions that calculate the volume and surface area of a cylinder, named cylinder_volume and cylinder_area.
The volume and surface area of a cylinder depend on two variables, the radius $r$ and height $h$ of the cylinder, by the following equations:
- $\text{volume} = \pi \times r \times r \times h$
- $\text{surface area} = 2 \times \pi \times r \times (h + r)$
For reference and comparison, we have also written implementations of these functions, double_cylinder_volume and double_cylinder_area. These may be useful to refer to while implementing your own functions, but are also used for the written task below.
For these implementations, you are expected to use 01001101 (representing 3.25) as the minifloat constant for π, since it is the closest minifloat to the decimal π ≈ 3.14159. We have included this constant definition in minifloat.c for your convenience.
Testing Part 3
To test part 3, you can directly build and execute part3:
rv make part3 && rv qemu minifloat_test_part3.out
We have only provided you with a single simple test for each function, and you should write at least 4 new tests. We test these particular functions by comparing our minifloat calculation to the result produced by calculating the same value with a double. We expect that the minifloat result (being less accurate) will have some error compared to the double result; the test tolerates this error up to the threshold parameter.
We recommend trying out a few operations, seeing how much difference there is between minifloat and double calculations, and adjusting your threshold accordingly. To help with comparing these operations, we use the provided mini_to_double utility function to calculate a double value before and after computing the minifloat equivalent.
(We do not define a double_to_mini conversion.)
The mini_to_double utility is only for testing. Do not use it in your main implementation. Remember that your goal is to implement minifloat operations “from scratch,” using only integer arithmetic. This is what makes minifloats more efficient than float or double.
Your tests should not include cases where the minifloat arithmetic would overflow (produce a result larger than the largest minifloat or more negative than the most negative minifloat). We do not define the results of these overflowing operations.
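When choosing inputs for these tests, it can help to know the representable range. In 1/64 units, the largest minifloat, 0 111 1111, is $15 \times 2^{7} = 1920$, i.e., 30.0, and the smallest positive one, 0 000 0001, is 1, i.e., 0.015625. A small sketch with hypothetical constant and function names:

```c
/* Hypothetical constants, in units of 1/64:
   largest minifloat        0 111 1111 = 15 * 2^(7-6) = 30.0
   smallest positive one    0 000 0001 =  1 * 2^(0-6) = 0.015625 */
#define MINI_MAX_64THS 1920u      /* 15 << 7 */
#define MINI_MIN_POS_64THS 1u     /* 1 << 0 */

/* Conservative check for picking test inputs: returns 1 if an exact result
   (a magnitude in 1/64 units) is within the representable range. */
static int result_in_range(unsigned mag64) {
    return mag64 <= MINI_MAX_64THS;
}
```

The check is conservative: an exact result slightly above 30 (but below 31) still rounds down to 30 under the tie rule from Part 2, but keeping exact results at or below 30 stays well clear of undefined overflow.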
Submission
Submit minifloat.c, minifloat_test_part1.expected, minifloat_test_part1.c, minifloat_test_part2.c, and minifloat_test_part3.c to Gradescope.
Upon submission, we will provide a smoke test to ensure your code compiles and passes the public test cases.
Rubric
- 16 points: print_mini correctness
- 18 points: mini_eq correctness
- 16 points: mini_add correctness
- 19 points: mini_mul correctness
- 8 points: cylinder_area correctness
- 8 points: cylinder_volume correctness
- 15 points: test quality