Technology2 mins ago
Decrypting Vigenere Cipher
21 Answers
I have a question from my coursework I am stuck on.
Two 10 letter words w1 and w2 were chosen from the list at https:/ /www.be stwordl ist.com /10lett erwords .htm and encrypted using a keyword k a random string of 10 characters. Two ciphertexts are produced c1: NXPRKTZPM and c2:ZQTRRYVNPM. The same keyword k is used for both encryptions. Find the two plaintexts wi an w2 and the original key k.
I have tried to figure it out but got stuck. There must be an easier way than going through every single 10 letter word.
Please Help
The coursework is due in on Tuesday 17th
Two 10 letter words w1 and w2 were chosen from the list at https:/
I have tried to figure it out but got stuck. There must be an easier way than going through every single 10 letter word.
Please Help
The coursework is due in on Tuesday 17th
Answers
Best Answer
No best answer has yet been selected by CleoZ. Once a best answer has been selected, it will be shown here.
For more on marking an answer as the "Best Answer", please visit our FAQ.Your codeword c1 is only nine characters. This will make it difficult to explain the process and also illustrate it, as there's an error in the ciphertext.
A good first step is to notice that both ciphertexts end PM. This suggests that the codeword used is four, five or ten, letters long, with a preference for five letters.
A good first step is to notice that both ciphertexts end PM. This suggests that the codeword used is four, five or ten, letters long, with a preference for five letters.
The "by hand" attacks to the Vigenere cipher rely on (a) a short(ish) keyword, so that regular patterns can be used to deduce the length; (b) a reasonable about of Ciphertext, so that frequency analysis is a useful attack; and, to a lesser extent, (c) a keyword that isn't completely random. None of these conditions is met, so you will be reduced to some level of guessing fairly quickly.
I wonder if the NZ pair in first and eighth places is suggestive that the keyword has at least one repeated letter.
I'd like to point out that if, say, w1 = aaaaaaaaaa, then w2 = mteahfloaa , which at least helps to give an idea of the shape of the second word as you cycle through possibilities in the first word. The way I've got this is by treating c1 as the keyword and seeing its effect on c2, and I think it may help to provide a solution strategy, although at the moment this is just a rough sketch rather than a full solution.
I wonder if the NZ pair in first and eighth places is suggestive that the keyword has at least one repeated letter.
I'd like to point out that if, say, w1 = aaaaaaaaaa, then w2 = mteahfloaa , which at least helps to give an idea of the shape of the second word as you cycle through possibilities in the first word. The way I've got this is by treating c1 as the keyword and seeing its effect on c2, and I think it may help to provide a solution strategy, although at the moment this is just a rough sketch rather than a full solution.
Just to add to this, because I'm having too much fun trying to break a cipher for the first time in a while:
1. The key thing to note is that we can write w1 + k = c1 mod 26 and w2 + k = c2 mod 26, so that in particular w1 + (c2 -c1) = w2 mod 26. Hence we can eliminate the keyword for the time being and pretend that the difference c2-c1 is a keyword that will encode w1 to w2. This reduces the search space to just real words that map into each other and are separated by an "effective keyword" k_eff = c2 - c1 = mteahfloaa.
2. For the next step, I think you can make some limited progress by considering all possible two-letter combinations that start 10-letter words. Eg, let's assume that w1 begins with b. Then, firstly, the second letter can only be one of {aeiloruy}, but then w2 would begin n{txbehknr}, and we can check that only ne is a valid starting sequence for 10-letter words. Hence w1 can only begin bl, if it begins with b at all. This reduced the number of words beginning with b we need to check from about 1600 to about 180, ie by about 85%. A similar trick for words beginning with A more than halves the list, so overall I think we can reduce the search space by around two-thirds just by focusing on the first two letters.
Looking at common word endings, it's obviously annoying that ed codes to ed, etc, but it may help to note that *ing encodes to *WNG, ie no -ing words work (approx 10% of all 10-letter words on the list I'm using!). Similarly we can remove the -lly ending (about 200 words), as this would encode to *ZLY.
All of this should sieve things down enough to make a final exhaustive search plausible, as you might get down from 35,000 words to around 3,000. But at some point you will just have to check everything.
1. The key thing to note is that we can write w1 + k = c1 mod 26 and w2 + k = c2 mod 26, so that in particular w1 + (c2 -c1) = w2 mod 26. Hence we can eliminate the keyword for the time being and pretend that the difference c2-c1 is a keyword that will encode w1 to w2. This reduces the search space to just real words that map into each other and are separated by an "effective keyword" k_eff = c2 - c1 = mteahfloaa.
2. For the next step, I think you can make some limited progress by considering all possible two-letter combinations that start 10-letter words. Eg, let's assume that w1 begins with b. Then, firstly, the second letter can only be one of {aeiloruy}, but then w2 would begin n{txbehknr}, and we can check that only ne is a valid starting sequence for 10-letter words. Hence w1 can only begin bl, if it begins with b at all. This reduced the number of words beginning with b we need to check from about 1600 to about 180, ie by about 85%. A similar trick for words beginning with A more than halves the list, so overall I think we can reduce the search space by around two-thirds just by focusing on the first two letters.
Looking at common word endings, it's obviously annoying that ed codes to ed, etc, but it may help to note that *ing encodes to *WNG, ie no -ing words work (approx 10% of all 10-letter words on the list I'm using!). Similarly we can remove the -lly ending (about 200 words), as this would encode to *ZLY.
All of this should sieve things down enough to make a final exhaustive search plausible, as you might get down from 35,000 words to around 3,000. But at some point you will just have to check everything.
Other useful aspects of the sieving:
eb maps to QU, so is the only allowed combination starting E, but there are only ten such words, and they all encode to nonsense, meaning that we can discard E as a starting letter for w1 altogether. Likewise, L maps to X, but decoding X-words shows that you likewise get nonsense for w1, meaning that we can discard L for a starting letter. Together, those two checks remove about 2,300 words as w1 candidates from the list with minimal effort.
eb maps to QU, so is the only allowed combination starting E, but there are only ten such words, and they all encode to nonsense, meaning that we can discard E as a starting letter for w1 altogether. Likewise, L maps to X, but decoding X-words shows that you likewise get nonsense for w1, meaning that we can discard L for a starting letter. Together, those two checks remove about 2,300 words as w1 candidates from the list with minimal effort.
I have managed to crack it via brute force - I wrote a small program to find w1 and w2. Given that the OP posted this in the technology section I presume that the setter of the homework expected the answer to be obtained programmatically. I can’t see any other practical way of doing it, although Jim’s suggestion of trialling word endings is spot on. I’m happy to post the answer if anyone wants to see it, or provide some hints if anyone wants to work it out manually.
I'll be interested in seeing the answer at some point. I was getting a little tired of sorting through two-letter matchings, although it's clearly fairly productive, reducing the necessary word list for words w1 beginning A-F from about 12,300 down to less than 1,600, and for T-Z from around 4,500 down to 341, which is a whopping 88% reduction. But once I've done the letters G-S I'm assuming that still leaves around 4,000-5,000 words to compile (which doesn't take into account the further reductions from removing *ing or *ess or *lly etc. words, which will remove several more).
This would presumably have been a lot easier if I'd exported the list into a single document that I could then gradually reduce this way. At some point, learning how to code up the brute-force search probably becomes easier!
Since it provides at least some help, what's the last letter of w1?
This would presumably have been a lot easier if I'd exported the list into a single document that I could then gradually reduce this way. At some point, learning how to code up the brute-force search probably becomes easier!
Since it provides at least some help, what's the last letter of w1?
Here's the link to the wordlist:
https:/ /drive. google. com/dri ve/fold ers/1bs VqzTtql k17GArL s4zz6Hv i8llbTw d0?usp= sharing
https:/
Scott,
I won't post the code because it is written in a language you've probably never heard of! But the basic idea is this:
We know (as Jim has explained above) that the first letters of w1 and w2 are 12 letters apart, the second letters are 19 letters apart, and so on. So if w1 was the first word in the list (AARDWOLVES), w2 would be MTVDDTWJES.
First, I read all the words from the word list into an array.
Then, for each word in the list, I call a function to transform w1 to w2 by moving the first letter 12 forward in the alphabet, the 2nd letter 19 forward, and so on. (If this takes us past the end of the alphabet, we count back so that Z+1 becomes A, for example.)
Then, we see if w2 exists in our word list, i.e. we look up the word list array using w2 as a search argument. If we find a match, we print w1 and w2 and stop; if we don't, we carry on with the next word in the list.
You could make the process more efficient by using Jim's suggestions above to reduce the number of possible words in the wordlist.
I won't post the code because it is written in a language you've probably never heard of! But the basic idea is this:
We know (as Jim has explained above) that the first letters of w1 and w2 are 12 letters apart, the second letters are 19 letters apart, and so on. So if w1 was the first word in the list (AARDWOLVES), w2 would be MTVDDTWJES.
First, I read all the words from the word list into an array.
Then, for each word in the list, I call a function to transform w1 to w2 by moving the first letter 12 forward in the alphabet, the 2nd letter 19 forward, and so on. (If this takes us past the end of the alphabet, we count back so that Z+1 becomes A, for example.)
Then, we see if w2 exists in our word list, i.e. we look up the word list array using w2 as a search argument. If we find a match, we print w1 and w2 and stop; if we don't, we carry on with the next word in the list.
You could make the process more efficient by using Jim's suggestions above to reduce the number of possible words in the wordlist.
Related Questions
Sorry, we can't find any related questions. Try using the search bar at the top of the page to search for some keywords, or choose a topic and submit your own question.