Introduction to Cryptography

Project Assignment: Introduction to Cryptography Instructions

Overview

Two of the Modules include projects that, along with the single Discussion Assignment, are the only parts of the course outside of WebAssign. This class was originally created in large measure to support Liberty’s popular Cybersecurity program. While most of the course material is quite general, each project assignment focuses on statistical applications that are at least moderately related to this field.

Each project assignment takes a deeper dive into course topics that are introduced in the WebAssign assignments. In each case, the topics are at least broadly related to the field of Cybersecurity. While these assignments are by no means intended to give realistic current applications in this area, they do illustrate how course topics impact two specific types of security-related issues: passwords and encryption. Project assignments demonstrate the relevance of exponential growth, of estimation strategies and of enumerative techniques such as tree diagrams in the analysis of these types of problems. While the first project is quite similar to problems encountered in the second homework and could be completed using hand calculations, the second project assumes that students are able to use Excel formulas to work through what would otherwise be a very heavy computational burden.

Instructions

The data file includes text taken from three books of the Bible (Joshua, Jonah and Philippians) using the ESV translation. While these are all great books, our only interest for this project is how often each letter is used.

1) In the Word file containing the Biblical text, use the “Find” feature to identify how many times each letter occurs (i.e. the letter’s frequency). Create an Excel spreadsheet to display the number of occurrences of each letter in the English alphabet. (10 points)
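For readers who want to cross-check their manual "Find" counts, the tally can be sketched in Python. This is only an illustration, not part of the required Excel workflow: the function name `letter_counts` and the sample string are hypothetical, and the sketch assumes that upper- and lower-case letters are counted together (as Word's default case-insensitive search would do).

```python
from collections import Counter
import string

def letter_counts(text):
    """Count occurrences of each letter a-z, ignoring case and non-letters."""
    counts = Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)
    # Report all 26 letters, including any that never occur (count 0).
    return {letter: counts.get(letter, 0) for letter in string.ascii_lowercase}

# Placeholder text for illustration only; this is NOT the assignment's data file.
sample = "The quick brown fox jumps over the lazy dog."
for letter, count in letter_counts(sample).items():
    print(letter, count)
```

To use it on the real data, the text of the three books would first have to be saved as plain text and read into a string.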

2) In the Excel spreadsheet, sum your frequencies to compute the total number of letters in the 3 books (this is sample size n).

  a) In your spreadsheet, use a formula to compute the sample proportion of each letter: its number of appearances divided by the total number of letters n (i.e. the relative frequency of each letter). Use the Excel sorting function to sort the letters in order of their frequencies. (6 points)
  b) Use the simple Confidence Interval (CI) formula to find a 95% CI on the proportion of how often each letter is used in English text in general. Enter the lower limit in the first Excel column and the upper limit in the next column, using the corresponding formula for each. (8 points)
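The calculations in question 2 can be sketched as follows. The assignment's exact CI formula is not reproduced in this text, so the sketch assumes the common large-sample interval for a proportion, p̂ ± 1.96·√(p̂(1 − p̂)/n); if the course uses a different "simple" formula, the margin line would need to change. The function name `proportion_ci` and the example numbers are hypothetical.

```python
import math

def proportion_ci(count, n, z=1.96):
    """Return (p_hat, lower, upper) for a letter seen `count` times out of n.

    ASSUMPTION: uses the large-sample (Wald) interval with z = 1.96 for 95%.
    """
    p_hat = count / n                                # relative frequency
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)  # half-width of the CI
    return p_hat, p_hat - margin, p_hat + margin

# Made-up numbers for illustration: a letter seen 50 times in 1000 letters.
p_hat, lower, upper = proportion_ci(50, 1000)
print(f"p_hat={p_hat:.4f}, CI=({lower:.4f}, {upper:.4f})")
```

Under the same assumption, an Excel equivalent for the lower limit might look like `=C2-1.96*SQRT(C2*(1-C2)/$B$28)`, where `C2` (the relative frequency) and `$B$28` (the total n) are hypothetical cell locations; the upper limit replaces the minus sign with a plus.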

3) Identify those letters whose CIs do not overlap with the CIs of any of the other letters. (For example, the CI (0.042, 0.052) overlaps with (0.050, 0.060) because the upper limit of the first CI is greater than the lower limit of the second CI.) List the letters with non-overlapping CIs and specify how many such letters there are. (6 points)
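The overlap test described in question 3 can be written compactly. The sketch below is one possible way to automate the check; the function names and the three intervals (labelled with placeholder keys "p", "q", "r") are hypothetical and are not results from the data file. The first two intervals are the ones from the example above.

```python
def overlaps(a, b):
    """True if intervals a=(lo, hi) and b=(lo, hi) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def isolated_letters(intervals):
    """Return the keys whose interval overlaps no other interval."""
    return [
        letter
        for letter, ci in intervals.items()
        if not any(overlaps(ci, other)
                   for name, other in intervals.items() if name != letter)
    ]

# Hypothetical intervals for illustration only.
example = {"p": (0.042, 0.052), "q": (0.050, 0.060), "r": (0.100, 0.110)}
print(isolated_letters(example))  # prints ['r']
```

Because the spreadsheet is sorted by frequency, the same check can be done by eye: a letter qualifies only if its CI clears the CIs of both neighbours in the sorted list.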

4) The previous analysis could be useful if our goal were to decipher an encrypted message in which each letter is replaced by another (for example, each “a” might become a “g”, while each “b” might become an “o” and so forth).

  a) Assume that the letter “z” in the encrypted message has a relative frequency of 0.06 (it accounts for 6% of the total number of letters). Which letters’ CIs (from question 2) contain 0.06, making those letters the most likely candidates to have been encrypted as “z”? (4 points)
  b) Further assume that “y” in the encrypted message has a relative frequency of 0.04 (4%). Which letters’ CIs contain 0.04? (4 points)
  c) If “x” in the encrypted message has a relative frequency of 0.025 (2.5%), which letters’ CIs contain 0.025? (4 points)
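Each part of question 4 asks the same thing: which CIs contain a given observed relative frequency. A minimal sketch of that lookup is shown here; the function name `candidates` and the intervals (with placeholder keys "p", "q", "r") are hypothetical and are not derived from the data file.

```python
def candidates(intervals, observed):
    """Return the keys whose interval (lo, hi) contains the observed value."""
    return [letter for letter, (lo, hi) in intervals.items()
            if lo <= observed <= hi]

# Hypothetical intervals for illustration only.
example = {"p": (0.042, 0.052), "q": (0.050, 0.060), "r": (0.100, 0.110)}
print(candidates(example, 0.051))  # prints ['p', 'q']
```

When more than one letter is returned, the frequency alone does not settle the question and all returned letters remain candidates.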

5) a) How many possible ways are there to assign the actual letters of the alphabet to the encrypted letters in a message? (Hint: “A” could be assigned to any one of the 26 letters, including itself.  Once “A” has been assigned, “B” can be assigned to any letter except the letter that corresponds to “A”). (4 points)

  b) As your answer to part (a) makes clear, there is an extremely large number of possible ways all the letters could be assigned. Knowing something about each letter’s relative frequency dramatically reduces the number of likely combinations. For example, if there were only 3 possible options for half of the encrypted letters (i.e. 13 letters) in the message and only 2 possible options for the remaining 13 letters, then how many possible ways would there be to assign real letters to the letters in the encrypted message? (4 points)
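Both parts of question 5 rest on the multiplication principle: when each choice is made in turn, the total number of assignments is the product of the number of options available at each step. The sketch below shows the idea; the function name `count_assignments` is hypothetical, and part (b) is computed exactly as the question states it (3 options for 13 letters, 2 options for the other 13).

```python
import math

def count_assignments(options_per_letter):
    """Multiply the number of options available for each letter in turn."""
    return math.prod(options_per_letter)

# Part (a): 26 options for the first letter, 25 for the next, ..., 1 for the last.
full = count_assignments(range(26, 0, -1))

# Part (b): 3 options for each of 13 letters, 2 options for each of the other 13.
reduced = count_assignments([3] * 13 + [2] * 13)

print(f"unrestricted: {full:.3e}")
print(f"restricted:   {reduced:.3e}")
print(f"reduction factor: {full / reduced:.3e}")
```

Printing in scientific notation makes the contrast easier to see, since the unrestricted count is far too large to read comfortably as a plain integer.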

Note: Your assignment will be checked for originality via the Turnitin plagiarism tool.
