Assignment 3: Spreadsheet Join Utility
Due: Thursday, September 29
Learning Objectives
- Critique and fix method specifications
- Implement a pointer-rich data structure (linked list)
- Implement a main program from scratch
- Use Java I/O
- Thoughtfully design test cases with good coverage
Introduction
Spreadsheets are often used as a way to store and manipulate information. Sometimes information ends up spread across multiple spreadsheets, making it useful to be able to combine information from them. For example, if we had a spreadsheet listing the Gross Domestic Product (GDP) of all 50 states, and another spreadsheet containing median income of the residents of those states (not necessarily listed in the same order), it would be handy if we could easily generate a single spreadsheet containing both kinds of information for each state.
Joins
This operation corresponds to what is called a join in the database area. Like a spreadsheet, a database is made of tables with rows and columns. A join combines two tables to produce a new table, based on information matching in one or more columns. There is more than one kind of join, but we will be interested in computing a left join using the first column of each spreadsheet, under the assumption that the first column of each spreadsheet represents the same data item. For each row in the first table, the left join identifies every row in the second table with a matching first column. Then for each matching row, the new table contains a row that concatenates the row from the first table with the matching row's contents except for the first column. If there is no such row in the second table, a single row is added to the new table, but with empty entries for columns that would have come from the second table.
For example, consider the following two tables. The first one records some (past and present) capitals of states; the second has some population and economic data:
State | Capital |
---|---|
New York | Albany |
California | Sacramento |
Florida | Tallahassee |
Texas | Austin |
Texas | Houston |
Vermont | Montpelier |
State | Population | GDP |
---|---|---|
New York | 19.5 | 1500 |
Florida | 21.2 | 1150 |
California | 39.4 | 3400 |
Texas | 28.6 | 2000 |
North Dakota | 0.8 | 56 |
The left join of these two tables is the following table:
State | Capital | Population | GDP |
---|---|---|---|
New York | Albany | 19.5 | 1500 |
California | Sacramento | 39.4 | 3400 |
Florida | Tallahassee | 21.2 | 1150 |
Texas | Austin | 28.6 | 2000 |
Texas | Houston | 28.6 | 2000 |
Vermont | Montpelier |
The order of rows in the result table always matches the order in the first table. There is an incomplete Vermont row because there is no Vermont row in the second table; there is no North Dakota row because North Dakota does not occur in the first table. Note that if there had been two “Texas” rows in the second table, the result table would contain four Texas rows: two Austin, Texas rows followed by two Houston, Texas rows.
CSV Input Format
Your program will need to be able to read in such tables in a simplified CSV (comma-separated values) format that is compatible with many spreadsheet programs. In this format, values on each row are separated by commas. For example, the first table above would look like the following in CSV format:
State,Capital New York,Albany California,Sacramento Florida,Tallahassee Texas,Austin Texas,Houston Vermont,Montpelier
Note that values may contain space characters, as in “New York”. The
first row of this table is serving as a header row, but it will not be
treated specially by your program; it will be just like any other row.
For simplicity, we will require the CSV input to only
contain values that do not use any special characters such as commas
or newline characters. This restriction will allow you to break a line
into a sequence of values just using the method String.split()
.
You will need to read this kind of data from the input file using the Java I/O library and to extract the data into a list of rows, each of which is a list of values. You must use the linked list data structure that you are implementing as part of this assignment to store the table data.
Command-Line Syntax
Your program should take the names of two files as command-line
arguments, which can be set in the IntelliJ
IDEA run configuration. For example, if you set the program arguments
to “file1.csv file2.csv
”, the strings
"file1.csv"
and "file2.csv"
will then be
available in the arguments array passed to the
a3.Main.main
method, at positions 0 and 1.
The new table that results from a join of the two tables should be
output to the console, and this should be the only output to System.out
.
The output must be in the format of a valid CSV file like the ones your program reads.
Notice that you should not try to use the toString()
method to generate
output, because it does not produce output in the format of a CSV file.
If the program is run with the wrong number of arguments, or if one of the two specified files does not exist, an appropriate helpful error message should be printed and the program should exit cleanly. A stack trace is not considered a helpful error message.
Optionally, you can run your program outside IntelliJ using a shell like bash or PowerShell. Run your program in the top top directory of your IntelliJ project with a command like the following:
java -classpath out/production/a3 a3.Main file1.csv file2.csv
Output redirection can be used to write the program output to a file:
java -classpath out/production/a3 a3.Main file1.csv file2.csv > output.csv
Linked Lists
In this assignment you will be implementing a list abstraction whose
interface is called LList
. The implementation will be in
the class SLinkedList
. To promote loose coupling,
client code should mostly use lists through the LList
interface.
The abstraction and implementation are both parameterized over an
element type T
, so they will work on any boxed type
T
. Inside the implementations of LList
and
SLinkedList
, the type T
stands for an unknown
type that is determined when these abstractions are instantiated on a
particular type. For example, the signature of
LList<Integer>.head()
looks just like the signature
of LList<T>.head()
except that T
is
replaced by Integer
. Your code for SLinkedList
will be written in terms of an unknown type T
, and therefore
needs to be written in a way that works for any legal choice of type
T
.
This list abstraction will be implemented using a linked-list data
structure whose nodes are objects of the class Node<T>
,
which has been provided for you. Your main implementation challenge is
to implement most of the methods of the SLinkedList
class.
You will notice that SLinkedList
already has an
iterator()
method implemented for you. This method makes
it possible to use an SLinkedList
or LList
in
an “enhanced for-loop”, which will be helpful to client code such as your
main program. Enhanced for-loops support easy iteration over arrays and
other data structures. For example, you can write a loop like the following to
iterate over and printout all the integers in an array:
int[] a = ... for (int s : a) { System.out.println(s); }Similarly, you can iterate over all the integers in a
LList<Integer>
:
LList<Integer> lst = .. for (Integer s : lst) { System.out.println(s); }
You may not use any classes from the java.util
package in your
implementation of SLinkedList
. Importing interfaces and exceptions
is fine, however.
Academic Integrity and Collaboration Policy
You may not collaborate with anyone on the code for the classes in this assignment— this includes looking at anyone else’s code or showing anyone else your code in any form. While you can talk to others, this should not be code specific but rather general strategies on how to tackle a problem. You may not look at solutions to similar previous assignments in 2110.
In this assignment, as in A2, you may work with another student with whom you will share test cases; however, for A3 it is optional to have a testing partner. You and this student can work together to write test cases for testing the classes. However, you and your partner may not share any code for the classes themselves.
If you do not know where to start, are stuck, do not understand testing, or are lost, please see someone on course staff immediately. This can be in the form of an instructor, TA, or consultant.
How to do this Assignment
To set up your project, download the provided file a3.zip and unzip it into a directory named a3, which will be the root of your IntellJ project. You can import it into IntelliJ by either dragging the folder onto the IntelliJ program or by using IntelliJ's “New Project from Existing Sources”, and pointing it at this folder. After it is set up, you should have a top-level folder in your IntelliJ project, with src and tests folders inside it. Inside the src folder, there should be a folder named a3 that contains Java source files. The src folder should be colored blue to indicate that it is a source folder.
a3 ├─src │ └─a3 │ ├─Main.java │ ├─LList.java │ └─... │ └─tests └─a3 └─SLinkedListTest.java
If your folders look more like the picture below, it won't work! Everything needs to be
moved up a level to get rid of the extra a3
layer.
a3 └─a3 ├─src │ └─a3 │ └─... │ └─tests └─...
Scan the entirety of these assignment instructions before starting. The three assignment classes are commented with various TODO statements numbered 1 through 9. We recommend designing your classes and tests in this order to avoid confusion.
Testing will be an important part of this assignment. An important part of design is to develop your classes and test them incrementally. This means you should test each part of the code you write before continuing to the next implementation task.
Since you are building your CSV processor on top of your linked-list implementation, bugs in the linked list are likely to show up in the CSV processor, making the program hard to debug. Careful testing of the linked list is advised so that you have a solid foundation to work on.
We will pin a post on Ed with FAQs about this assignment. This post will be updated regularly with answers to questions including notes about what you can or can't use in your solution. Please read this post regularly to stay up to date on this information.
- At this point, you should have all of the release code classes in your project. The first step is to read the method specifications in the interface LList.java. Two of the methods, remove and contains, do not have specifications. You are expected to write specifications for these methods, using your engineering judgment. A description of the methods can be found below.
- Once you have identified and fixed the specifications, you are now ready to set up your test suite. Please see the section on testing for detailed instructions. You should write thorough tests for each method. You will need to use toString() to test some methods.
-
Once your test suite is set up, it is time to start implementing
the methods in SLinkedList.java. The methods that you have seen
in lecture are provided to you here. You will implement the following
additional methods of an
SLinkedList
:classInv()
: A partial implementation of this standard method for checking the class invariant is already provided, but it is incomplete. Think carefully about what your class invariant is and complete the necessary checks. You should useclassInv()
in your other method implementations.toString()
: You must use a StringBuilder, which is already initialized for you.append(T v)
: This method must run in constant time rather than iterating through the list.insertBefore(T x, T y)
: You must use a while-loop.get(int k)
: You must use a for-loop.contains(T elem):
This method should return true if elem is in the linked list, and false otherwise.remove(T x)
: This method returns whether x was removed or not. If x is successfully removed, the method should return true, otherwise it should return false. If there are more than one items in the list that have value x, the method should remove the first instance of x from the list.
-
You are now ready to start coding your Main class. In this class
you will do 3 things:
-
Write a method called
csvToList
that takes in a CSV file name as a string and returns aLList<LList<String>>
. This method should use aFileReader
to read each line of the CSV and separate it by commas. -
Write a method called join that takes in 2 CSV tables of
type
LList<LList<String>>
. The method should then return a singleLList<LList<String>>
that is a table that combines the two input tables, retaining the row order of the first table. -
Write your main method so that it uses the two previous methods,
and perhaps other methods you define, and outputs the
resulting
LList<LList<String>>
table in CSV format to standard output (System.out
).
-
Write a method called
Testing
Unlike A1 where you tested using print statements, in A3, you will
use JUnit tests to test your SLinkedList
methods.
- There should be a folder called “tests” outside the a3 folder. Right-click the tests folder and select from the dropdown: Mark directory as → Test Sources Root
- In the SLinkedListTest.java file, create at least 3 different tests for each method that you implement.
Notes on testing:
-
Make sure that you include
@Test
directly above each test method. Test methods that do not have@Test
will not be run by JUnit, which might lead you to falsely think that all your test cases are passing! - Do not use assert statements and/or try-catch statements in testing.
- You must test all the methods that you implement with at least 3 different test cases. Think about what tests are needed to achieve coverage in both a black-box and a glass-box sense.
How to Test CSV Left Join
There are six CSV files provided to you in the release code (in the inputs-tests/example and input-tests/states folders). In each folder you will find two files named input1.csv and input2.csv, and a file output.csv showing the expected output of the program on those input files.
To create your own CSV files that you can submit as test cases, open a Google Sheets file or Microsoft Excel file. Write name columns on the first row and fill in the corresponding data in the subsequent rows. To save as a CSV file:
- For Google Sheets: File → Download → Comma Separated Values (.csv)
- For Microsoft Excel: File → Save As; File Format: Comma Separated Values (.csv)
Once you have your CSV files, drag them into a new folder under the input-tests folder. Each such folder should include input files input1.csv and input2.csv, and the expected output output.csv. You can create the output file by editing it manually, by copying the console output from running your program, or by using output redirection as described above.
Submission
Please make sure to fill in the time you spent using the timeSpent
field in the Main
class. Additionally, be sure to fill in your
NetID and other information in the comment at the top of the SLinkedList
class. Please also provide your testing partner’s NetID after your own.
Then upload your files LList.java, SLinkedList.java, and Main.java to
Assignment 3 on CMSX before the deadline.
You should also submit your tests to Assignment 3 Tests on CMSX before the deadline in cooperation with your partner if any. Your tests consist of the unit tests you have written in SLinkedListTest.java, plus some system tests, which are CSV files that should be placed in the input-tests folder. Each subfolder should contain files named input1.csv, input2.csv, and output.csv, where the file output.csv contains the expected output from running your program on the inputs input1.csv and input2.csv. The current release of the files already shows this format for two tests.
To locate these files on your computer, you can right click on any of the classes and select “Open In” and then select your file explorer. For Windows, that may be “Explorer” and for Mac, that may be “Finder.” Do not submit any files with different extensions at the end. To check if you submitted the correct files, download the files from CMSX after you have submitted them.