Decrypting Vigenere Cipher

CleoZ | 18:49 Sun 15th Nov 2020 | Technology

21 Answers

I have a question from my coursework I am stuck on.

Two 10 letter words w1 and w2 were chosen from the list at https://www.bestwordlist.com/10letterwords.htm and encrypted using a keyword k a random string of 10 characters. Two ciphertexts are produced c1: NXPRKTZPM and c2:ZQTRRYVNPM. The same keyword k is used for both encryptions. Find the two plaintexts wi an w2 and the original key k.

I have tried to figure it out but got stuck. There must be an easier way than going through every single 10 letter word.

Please Help

The coursework is due in on Tuesday 17th

Printer Problems.

Apple Id

Answers

1 to 20 of 21

1 2 Next Last

Best Answer

No best answer has yet been selected by CleoZ. Once a best answer has been selected, it will be shown here.

For more on marking an answer as the "Best Answer", please visit our FAQ.

jim360

Your codeword c1 is only nine characters. This will make it difficult to explain the process and also illustrate it, as there's an error in the ciphertext.

A good first step is to notice that both ciphertexts end PM. This suggests that the codeword used is four, five or ten, letters long, with a preference for five letters.

jim360

(Four letters only works if the encoding was done separately, which is why I'm inclined to discard this choice, as I suspect they'll have been encoded in sequence, which would break the pattern).

ichkeria

The code word is a string of 10 characters apparently.

jim360

Oh yes, a fair point.

Still, That doesn't change the observation that both end in the same two-letter string PM (hopefully!), which supports trialling es/ed/ly/ng*/th as possible word endings.

*In particular if the missing character in c1 is N in position 8, then NPM might well decode to ing.

jim360

Given that k is a random string, I'm wondering if this is part of a computer coding assignment, which changes the approach significantly. In those cases a brute-force approach may be the only one really possible.

ichkeria

Yes I think I you’re right Jim

CleoZ

Question Author

c1 should be NXPRKTKZPM. I worked out that the last two letters and the fourth letter of both words must be the same but I don't know where to go from there.

jim360

The "by hand" attacks to the Vigenere cipher rely on (a) a short(ish) keyword, so that regular patterns can be used to deduce the length; (b) a reasonable about of Ciphertext, so that frequency analysis is a useful attack; and, to a lesser extent, (c) a keyword that isn't completely random. None of these conditions is met, so you will be reduced to some level of guessing fairly quickly.

I wonder if the NZ pair in first and eighth places is suggestive that the keyword has at least one repeated letter.

I'd like to point out that if, say, w1 = aaaaaaaaaa, then w2 = mteahfloaa , which at least helps to give an idea of the shape of the second word as you cycle through possibilities in the first word. The way I've got this is by treating c1 as the keyword and seeing its effect on c2, and I think it may help to provide a solution strategy, although at the moment this is just a rough sketch rather than a full solution.

jim360

Just to clarify, the point I'm making is that we can at least deduce that the first letters of w1 and w2 are twelve apart, the third are four apart, and so on.

jim360

Just to add to this, because I'm having too much fun trying to break a cipher for the first time in a while:

1. The key thing to note is that we can write w1 + k = c1 mod 26 and w2 + k = c2 mod 26, so that in particular w1 + (c2 -c1) = w2 mod 26. Hence we can eliminate the keyword for the time being and pretend that the difference c2-c1 is a keyword that will encode w1 to w2. This reduces the search space to just real words that map into each other and are separated by an "effective keyword" k_eff = c2 - c1 = mteahfloaa.

2. For the next step, I think you can make some limited progress by considering all possible two-letter combinations that start 10-letter words. Eg, let's assume that w1 begins with b. Then, firstly, the second letter can only be one of {aeiloruy}, but then w2 would begin n{txbehknr}, and we can check that only ne is a valid starting sequence for 10-letter words. Hence w1 can only begin bl, if it begins with b at all. This reduced the number of words beginning with b we need to check from about 1600 to about 180, ie by about 85%. A similar trick for words beginning with A more than halves the list, so overall I think we can reduce the search space by around two-thirds just by focusing on the first two letters.

Looking at common word endings, it's obviously annoying that ed codes to ed, etc, but it may help to note that *ing encodes to *WNG, ie no -ing words work (approx 10% of all 10-letter words on the list I'm using!). Similarly we can remove the -lly ending (about 200 words), as this would encode to *ZLY.

All of this should sieve things down enough to make a final exhaustive search plausible, as you might get down from 35,000 words to around 3,000. But at some point you will just have to check everything.

jim360

Other useful aspects of the sieving:

eb maps to QU, so is the only allowed combination starting E, but there are only ten such words, and they all encode to nonsense, meaning that we can discard E as a starting letter for w1 altogether. Likewise, L maps to X, but decoding X-words shows that you likewise get nonsense for w1, meaning that we can discard L for a starting letter. Together, those two checks remove about 2,300 words as w1 candidates from the list with minimal effort.

scrunge999

I have managed to crack it via brute force - I wrote a small program to find w1 and w2. Given that the OP posted this in the technology section I presume that the setter of the homework expected the answer to be obtained programmatically. I can’t see any other practical way of doing it, although Jim’s suggestion of trialling word endings is spot on. I’m happy to post the answer if anyone wants to see it, or provide some hints if anyone wants to work it out manually.

jim360

I'll be interested in seeing the answer at some point. I was getting a little tired of sorting through two-letter matchings, although it's clearly fairly productive, reducing the necessary word list for words w1 beginning A-F from about 12,300 down to less than 1,600, and for T-Z from around 4,500 down to 341, which is a whopping 88% reduction. But once I've done the letters G-S I'm assuming that still leaves around 4,000-5,000 words to compile (which doesn't take into account the further reductions from removing *ing or *ess or *lly etc. words, which will remove several more).

This would presumably have been a lot easier if I'd exported the list into a single document that I could then gradually reduce this way. At some point, learning how to code up the brute-force search probably becomes easier!

Since it provides at least some help, what's the last letter of w1?

scrunge999

Jim,
It’s S.
I’ve exported the word list to a text file - I can make this publicly available if that’s any use to you?

jim360

Might be useful for the future as well to have a complete list of ten-letter words :)

scrunge999

Here's the link to the wordlist:

https://drive.google.com/drive/folders/1bsVqzTtqlk17GArLs4zz6Hvi8llbTwd0?usp=sharing

THECORBYLOON

JIM, as the keyword is a random set of ten characters, does that not mean the outputs are not shifted by twenty-six characters?

scott23

scrunge999, please would you be able to share the brute-force implementation?

scrunge999

Scott,

I won't post the code because it is written in a language you've probably never heard of! But the basic idea is this:

We know (as Jim has explained above) that the first letters of w1 and w2 are 12 letters apart, the second letters are 19 letters apart, and so on. So if w1 was the first word in the list (AARDWOLVES), w2 would be MTVDDTWJES.

First, I read all the words from the word list into an array.

Then, for each word in the list, I call a function to transform w1 to w2 by moving the first letter 12 forward in the alphabet, the 2nd letter 19 forward, and so on. (If this takes us past the end of the alphabet, we count back so that Z+1 becomes A, for example.)

Then, we see if w2 exists in our word list, i.e. we look up the word list array using w2 as a search argument. If we find a match, we print w1 and w2 and stop; if we don't, we carry on with the next word in the list.

You could make the process more efficient by using Jim's suggestions above to reduce the number of possible words in the wordlist.

outzam

Hey did you ever get the solution to this I have a Coursework with the exact same Qs?

1 to 20 of 21

1 2 Next Last

Do you know the answer?