Assignment 3: Kendall, Histogram, CountingSort
Deadline
5 October, 2018 (Friday), 6:00pm.
Prerequisite
- You are able to access the CS1010 programming environment.
- You are familiar with basic UNIX CLI and using terminal-based editor
vim
. - You have gone through Exercise 1 and have already set up your
.gitconfig
. - You are familiar with the steps of accepting, downloading, and submitting your exercise and assignment.
- You are familiar with the directory structure and the common files included in your assignment skeleton.
Learning Outcomes
- Be comfortable writing C programs that involve arithmetic operations,
long
,double
,bool
, andchar
types, conditionalif
/else
statements, loops withwhile
/for
/do-while
statements, and arrays. - Be comfortable breaking down a problem into smaller subproblems which can be resolved using functions, including reusing existing functions written for other programs (with a tweak), writing a function that calls itself, designing what should the inputs and return values/types of a function be.
Setup
- Click on this link to accept the assignment.
- Login to one of the hosts of CS1010 programming environment
- Run:
~cs1010/get-as03
- You should see the folder
as03-<github id>
in your home directory with the assignment skeleton inside.
Solving The Assignments
- Edit the files
kandall.c
,histogram.c
,countingsort.c
, to solve the corresponding question as described below. - You should break down the problem into smaller subproblems, and write one function for each of the subproblem to solve the problem.
- To compile and run tests with the sample inputs and outputs:
make
- The test cases are given in the
inputs
andoutputs
subdirectory. You can usecat
orless
to look at the content of these test cases. You can add more test cases or edit the given ones if needed.
Submission
When you are ready, run the following command to submit:
~cs1010/submit-as03
The four files kendall.c
, histogram.c
, countingsort.c
, will be uploaded to GitHub. You can submit multiple times, but only the last submission will be graded.
Editing Your Files in Multiple Locations
You should edit your code only on the CS1010 PE hosts. If you choose to edit your code in other places, such as directly on Github or in a second location (such as your own laptop), you need to be comfortable with various git
command to synchronize your code across the different locations, possibly needing to resolve synchronization conflicts.
Only the four C files listed above will be submitted. You can create additional C files for testing different solutions, but eventually, you must put your solution into the corresponding C file provided as the skeleton.
Identifying Yourself
In every C file that you submit to CS1010, you need to identify yourself by writing your name and tutorial group. Marks will be deducted if you fail to do so. You need to edit the line:
@author XXXX (Group YYYY)
and change it to something like:
@author Peter Parker (Group 9)
Grading
This assignment contributes towards 3% of your final grade. The total mark for this assignment is 30 marks. There are two marking criteria: correctness and style.
For each question, 2 marks are allocated for coding style. Please refer to the CS1010 C Style Guide and follow the recommended guideline.
The rest of the marks are allocated to correctness. Note that passing the tests does not guarantee that your code is correct. Even if your code produces the correct output all the time, you may still lose marks if you do not write your code properly (e.g., using the most appropriate type, violating the constraints set by the question).
We reserve the right to penalize students for using banned C syntax in the assignments. In addition, each grader at his or her own discretion can penalze students for repeating mistakes / bad practices from the student's past assignments (even if it was just a warning with no marks deducted in the earlier assignments).
Question 1: Kendall (10 marks)
Suppose that we are given a set of items and we ask two different parties to rank the items according to some order. We may get two different order of the items. How do we measure how similar (or dissimilar) the two rankings are?
For example, consider a search engine that returns a list of web pages ranked by their relevance to the search query. A user may not always agree with the ranking of the search engine and may judge the relevance of the search result differently, i.e., the user may have his or her own ranking. This measurement of similarity between the ranking by the search engine and the ranking by the user gives us a metric on how good the search engine result is. The more similar it is to the ranking of the user, the better the search engine is in ranking in the search results.
One way to measure the similarity of two rankings is the Kendall tau distance. You will write a program kendall
that calculates the normalized Kendall tau distance for this question.
We will represent a ranking by the order of the items. The first item is ranked 1, second is ranked 2, and so on. To simplify the problem, we take one of the ranking that we want to calculate the Kendall tau distance on, and label the items sequentially, as the sequence 1, 2, 3, 4, 5, ...n, where n is the number of items. We call this the base ranking. The other ranking will then be a permutation of the numbers 1 to n.
For example, suppost we have three items A, B, C. the first ranking ranks the items as B, C, A. The second ranking ranks the items C, A, B. After relabelling the first ranking as 1, 2, 3, the second ranking becomes 2, 3, 1.
The Kendall tau distance counts the number of pairs of items in one ranking that are ranked in different order in the other ranking. In the example above, we have three possible pairs:
Pair | Ranking 1 | Ranking 2 |
---|---|---|
A-B | B then A | A then B |
A-C | C then A | C then A |
B-C | B then C | C then B |
Out of the three pairs, the pair A-B and B-C are ordered differently in the two rankings, so that Kendall tau distance is 2.
The normalized Kendall tau distance is the ratio of the number of pairs ranked in different order to all possible number of pairs.
In the example above, the normalized Kendall distance is ⅔ = 0.6666.
Your program should read the following from the standard input:
- The first positive integer, n, is the number of items (n > 1).
- The next n numbers is a permutation of integers between 1 to n. This corresponds to the ranking of the items from 1 to n.
Your program should print the normalized Kendall tau distance between the ranking read above and the base ranking (1, 2, 3, .. n) to the standard output.
Sample Run
ooiwt@pe118:~/as03-skeleton$ ./kendall
3
2 3 1
0.6667
ooiwt@pe118:~/as03-skeleton$ ./kendall
10
1 2 3 4 5 6 7 8 9 10
0.0000
ooiwt@pe118:~/as03-skeleton$ ./kendall
6
6 5 4 3 2 1
1.0000
Question 2: Histogram (10 marks)
It is often useful to visualize the distribution of numerical data using a histogram. In this question, you will write a program called histogram
that plots the histogram for real numbers ranged between 0 and 100 (inclusive).
Our histogram will have 10 buckets, b_0, b_1, ... b_9. The bucket b_i corresponds to the interval [10i, 10(i+1)) (includes 10i, but excludes 10(i+1)), except b_9, which includes 100.
To plot the histogram, we count how many percent of the data falls into each bucket. Let the percentage of data that falls into bucket b_i be k_i%. We set the length of each bar in histogram to be at most 10 characters, so we scale down k_i by 10 to get the length l_i.
We then draw on the screen using character "█" and "▌", according to the following rules:
- If l_i is an integer, then we draw "█" l_i times.
- Otherwise, if x < l_i \le x + 0.5 for some integer x, then we draw "█" x times followed by a "▌".
- Otherwise, if x + 0.5 < l_i < x + 1 for some integer x, then we draw "█" x+1 times.
Your program should read the following from the standard input:
- a positive integer n, followed by
- n real numbers, corresponding to data to plot the histogram with.
Your program should print the axis and labels, as well as the bars for the histogram as shown in the same run.
The code to print both axis and labels are already given in the skeleton histogram.c
. The string that corresponding to "█" and "▌" are also given in the code, defined as BLOCK
and HALF_BLOCK
.
Sample Run
ooiwt@pe118:~/as03-skeleton$ ./histogram
10
25 35 35 45 45 45 45 55 55 65
┴┴┴┴┴┴┴┴┴┴
0 - 10
10 - 20
20 - 30 █
30 - 40 ██
40 - 50 ████
50 - 60 ██
60 - 70 █
70 - 80
80 - 90
90 - 100
ooiwt@pe118:~/as03-skeleton$ ./histogram
1
100
┴┴┴┴┴┴┴┴┴┴
0 - 10
10 - 20
20 - 30
30 - 40
40 - 50
50 - 60
60 - 70
70 - 80
80 - 90
90 - 100 ██████████
ooiwt@pe118:~/as03-skeleton$ ./histogram < inputs/histogram.5.in
┴┴┴┴┴┴┴┴┴┴
0 - 10 ▌
10 - 20 ▌
20 - 30
30 - 40 ▌
40 - 50
50 - 60 ▌
60 - 70 █
70 - 80 ██
80 - 90 █████▌
90 - 100 ██
The last histogram above plots the actual data from your Assignment 1 marks (scaled up 3x to have the range of 0 - 90).
Question 3: CountingSort (10 marks)
Sorting is a fundamental computational problem: given a list of items, we want to rearrange the items in some order.
In this question, you will write you first sorting algorithm, called counting sort. This is an extremely fast algorithm for sorting positive integers if the range of the integers are limited.
The idea of counting sort is that, given the list of integers (each guranteed to be between 1 to k) to sort, we count how many times 1 appear in the list, how many times 2 appears in the list, etc. Finally, we print out each number between 1 to k according to how many times they appear in the list, skipping those numbers who do not appear.
For instance, suppose we have 6 integers between 1 to 9: 5 5 3 2 8 2
. We first count how many times each number appears. Then we print the sorted list the following way: 2
appears twice, so we print
2
2
The number 3 appears once, we print
3
The number 5 appears twice, we print
5
5
Finally 8 appears once, we print
8
The printed output is thus
2
2
3
5
5
8
which is the numbers sorted in increasing order.
Write a program countingsort.c
that reads the following in order from the standard input:
- n the number of integers to sort
- k the maximum value of the integers to sort
- The next n numbers are the integers to be sorted, each guaranteed to be between 1 and k.
Sort the integers using the algorithms above and print them in increasing order to the standard output, one integer on each line. Note that if you use any other algorithms to sort the numbers, you will be penalized heavily.
Input Validation: For this question, you need to validate that the numbers are actually ranged between 1 and k. Any inputs that are not in this range must be omitted in the output.
Sample Run
ooiwt@pe118:~/as03-skeleton$ ./countingsort
6 9
5 5 3 2 8 2
2
2
3
5
5
8
ooiwt@pe118:~/as03-skeleton$ ./countingsort
3 1000
256 872 112
112
256
872
ooiwt@pe118:~/as03-skeleton$ ./countingsort
4 3
3 2 -3 2
2
2
3