Contents

Writeup for ORGCTF2: Crypto.my_own_cryptography

./assets/binary_orange.jpg

This is a cryptography challenge from ORGCTF2 at Harbin Engineering University. We will use this challenge as a starting point to explore the security vulnerabilities of stream ciphers.

1 Challenge

After completing a cryptography course, Yang decided to design a secure and computationally efficient encryption algorithm. He boasted that this algorithm was invulnerable and openly shared his source code and encryption examples with the public. Now, there is a copy of this example here, and we hope you can show him the power of a skilled hacker.

from debidong import story, flag

def encrypt(plaintext, key):
    plaintext += (len(key) - len(plaintext) % len(key)) * " "
    ciphertext = ""
    for i in range(len(plaintext) // len(key)):
        for j in range(len(key)):
            ciphertext += hex(ord(plaintext[i * len(key) + j]) ^ ord(key[j]))[2:].zfill(2)
    return ciphertext

print(encrypt(story, flag))
# 201c040654130b1f5f45005311360755454e361a48047f232c3b2430294c5d0c130b0f11025b387430222723734a44010b2d114809363e20316335665615060602431c070f505f040c16017f26591d1a33114832372131306d740a48091b1e0243230e120454451612167f1c551b177f0705042d3c65342d30664d121917034337323d5e112d04530a391e55074e2f151a11362b2c25222023455d061c472020205b135e08111611361e5906002c54090b3b6824362b3d2357180b520a021a1f5b154906041f093a0444491c3a071d092b3b6b750c3a2301190e0b4b43000e1e507920343031194a5b0000381007087f3f2426633d28571c0b170343161f5b1850060a16172c44103d063a5400043c232027307422440e1b00081a11025b04590041180c310d5406037807480b3a3c323a313f6652041c06020e540715141116151c093a4a44010b7f1f010b382c2a3864276656180e1e130b5a46371945110d16450802591d0b7f100d06362c2031632029010e1b13090754130b50500b0553013a0c55081a7f0000007f202436283134525d1b1d471015101e50450d04530e3604570d01325a4829363c313926741149141b17470111011a1e110d0800453e0e460c002b011a0071680d3063322f530e1b52010c01081f50450d045306331f551a4e33110e117f2a3c75373c2301150e110c0606155750500b055311370f5e491a2d150b0e3a2c65312c232801090717470b15051015431646530d360e55061b2b5409063c2737312a3a2101090052130b1146181c4400125d451604101d063a5400043c23202730736649140b170816004a5b3c5811151f007f3d58001a3a5400043b682475253d23531e0a520502001217151112080070d7f1e580c4e37150b0e3a3a367b63152055181d5206431c0709141107000711330f1c492236001c093a68123d2a2023011b061c060f181f5b1454030412113a0e101d063a5400043c2320273074274f194f01061511025b04590041180c310d54060371543c0d3a6835302c242a445d001447171c035b1b580b06170a324a470c1c3a541e002d316532313532441b1a1e47171b46371945110d16450802591d0b7f1506017f2b24392f31220115061f4702540e1e025e4b413f0c2b1e5c0c4e081c01113a6f367530202953044f01171111071f50450d131c1038025f1c1a7f0000007f232c3b2430294c514f1b0910040f09195f024116133a184906003a5a4820292d372c2c3a23011500020210541213114545151b00264a5308007f15041630682730203b2b445d0e520f0606095b1c580e045329361e44050b7f23000c2b2d65342d30664714081a1343120909505b1012070c3c0f1e492731541c0d3a68203b277429475d1b1a0243071214024849413f0c2b1e5c0c4e081c01113a68273020352b445d1b1a024313131a02550c001d45300c101d063a5420200a0b1113633f2f4f1a0b1d0a4d542e1e50520a0f070c311f550d4e2b1b48162b3d212c633c2753194f130907540f1600430a171645370343491d341d04092c682c3b633b3445181d52130c54041e0445001353152d05440c0d2b541c0d3a682e3c2d33224e104152474354465b5011454153457f4a10494e7f5448457f686575637466015d

2 Solution

2.1 Analyzing the encrypt Function

First, let’s focus on the encrypt function. By manually renaming variable names to make the code clearer, we can deduce the approximate code of the original function:

def encrypt(plaintext: str, key: str):
    plaintext += (len(key) - len(plaintext) % len(key)) * " "
    ciphertext = ""
    for i in range(len(plaintext) // len(key)):
        for j in range(len(key)):
            ciphertext += hex(ord(plaintext[i * len(key) + j]) ^ ord(key[j]))[2

:].zfill(2)
    return ciphertext

In this function, the following steps are performed:

  1. The plaintext is padded to ensure that its length is a multiple of the key length. It adds spaces to the end of the plaintext to achieve this.
  2. The plaintext is divided into segments with a length equal to the key’s length. Then, for each segment, each character is XORed with the corresponding character from the key.
  3. The XOR result is converted to a hexadecimal string with leading zeros if necessary and added to the ciphertext.

It’s clear that this function is using a simple XOR-based encryption mechanism. However, there is a significant security flaw in this encryption approach (as discussed in the next section).

2.2 About XOR

Let’s briefly recall the truth table for the XOR operation:

01
001
110

XOR returns true (1) when the two bits being compared are different; otherwise, it returns false (0). One important property of XOR is that XORing a number with itself results in 0, and XOR also obeys the commutative law. Therefore, the XOR operation has the following property:

The result of XORing two numbers bitwise is the same as XORing each number with a common number and then XORing the results together:

That is:

$$ a \oplus b = (a \oplus c) \oplus (b \oplus c) $$

This property is straightforward to prove because the first two properties hold:

$$ (a \oplus c) \oplus (b \oplus c) = (a \oplus b) \oplus (c \oplus c) = a \oplus b $$

Now, let’s consider what information can be revealed by XORing plaintexts.

2.3 ASCII Codes and XOR

Do you remember what ASCII codes are? Nowadays, ASCII codes are a subset of UTF and are used to represent English characters and control symbols using 8-bit characters.

./assets/1280px-USASCII_code_chart.png

Now, let’s focus on the ASCII code for the space character, which is 0x20. In binary representation, it is 00100000. Let’s also consider the letter A, whose corresponding ASCII code is 01000001. What happens when we XOR A with a space?

First, let’s XOR the lower four bits. Since 0 XOR any number is 0, the lower four bits of the result will be the same as the lower four bits of A. Then, let’s XOR the upper four bits. The space’s sixth bit is 1, which means it will make the result’s sixth bit opposite to A, while the other bits remain unchanged. The final result is 01100001, which corresponds to a.

With this property, we can determine that when we XOR the ciphertext pairwise, and if the XOR result contains a letter, it implies that the corresponding position in the plaintext should be a character within the range of A-Z or a-z. (Don’t forget that ciphertext XOR is equal to plaintext XOR.)

2.4 Final Solution

After breaking the plaintext into n pairs (with a length equal to the key, denoted as l), it forms an n*l matrix. XORing it with the key results in the ciphertext.

$$ \begin{bmatrix} m_{00} & m_{01} & … & m_{0l} \\ m_{10} &&& …\\ … &&& …\\ m_{n0} &… &… & m_{nl} \end{bmatrix}\oplus \begin{bmatrix} k_{0} & 0 & … & 0\\ 0& k_{1} & 0 &… \\ … & 0 & … &0 \\ 0 & … &0&k_{l} \end{bmatrix}= \begin{bmatrix} c_{00} & c_{01} & … & c_{0l} \\ c_{10} &&& …\\ … &&& …\\ c_{n0} &… &… & c_{nl} \end{bmatrix} $$

For each column in the ciphertext, we XOR the bits with each other. If the result of the XOR operation is a letter, it indicates that the corresponding characters in the plaintext contain a space character. For the character that meets this condition the most times in a column, we can consider it as a space character. Therefore, while traversing the operations, we assign weights to different rows in each column. Each time a space appears in the XOR operation, we increase the weight of these two rows by 1. After the traversal, we set the character in the row with the highest weight as a space and use that value to restore the character in the other positions in that row. By repeating this process for each column, we can obtain the entire plaintext. Once we have the plaintext, it becomes easy to obtain the key (flag).

There is one final issue to address: how to determine the value of n. We can use a brute-force method to determine this by manually observing whether the decrypted plaintext appears as “normal text.”

We can write a simple script to find the greatest common divisor of the padded plaintext length to estimate possible key lengths. For this problem, the possible key lengths are 2, 19, 31, 38, 62 and 589. Here is the decryption script.

# c: ciphertext
# l: length of the key
def decode(c: str, l: int):
    rows = []
    plain = [[0 for _ in range(l)] for _ in range(len(c)//(l*2))]
    for i in range(len(c)//(l*2)):
        rows.append(c[2*i*l:(i+1)*2*l])
    # for every column of the matrix
    for col in range(l):
        # weight of the space for every element
        blanks = {}
        # for every row
        for i in range(len(rows)):
            # possibility of being '0x20'
            p = 0
            # xor with other rows
            for j in range(len(rows)):
                if i == j:
                    continue
                # check the result of xor
                ascii =  int("0x"+rows[i][col*2:col*2+2], 16) ^ int("0x"+rows[j][col*2:col*2+2], 16)
                if ord('a') <= ascii <= ord('z') or ord('A') <= ascii <= ord('Z'):
                    p += 1
            blanks[i] = p
        # get the row of char that supposed to be 0x20 in this column
        row = max(zip(blanks.values(), blanks.keys()))[1]
        # restore plaintext on this column
        plain[row][col] = ' '
        for i in range(len(plain)):
            if i == row:
                continue
            plain[i][col] = chr(int("0x"+rows[i][col*2:col*2+2],16) ^ int("0x"+rows[row][col*2:col*2+2], 16) ^ ord(" "))

    # print the result
    for row in plain:
        for i in row:
            print(i, end="")
        print("")

Call the function.

c = "201c...015d" # ciphertext omitted
decode(c, 31)

The result obtained is:

Once upon a time, in a kingdom 
called HEUCTF, there lived a wh
ite hat named Little White. Lit
tle White was very smart and lo
ved CTF. He often participated 
in CTF competitions and achieve
d many excellent results. One d
ay, the HEUCTF kingdom was inva
ded by hackers. The hackers des
troyed the kingdom's network sy
stem and stole the kingdom's we
alth. Little White decided to s
tand up and defeat the hackers 
to save the kingdom. Little Whi
te began his adventure. He firs
t found the clues left by the h
ackers, and then tracked down t
he hackers' hideout according t
o the clues. In the hackers' hi
deout, Little White had a fierc
e battle with the hackers. Afte
r a hard battle, Little White f
inally defeated the hackers and
 saved the kingdom. The people 
of the kingdom were very gratef
ul to Little White and called h
im a hero. Little White's story
 spread throughout the kingdom,
 inspiring everyone. Everyone h
opes that they can also become
a hero like Little White and fi
ght for justice. In the end of
the story, Little White became
the guardian of the HEUCTF king
dom. He continued to study hard
 and improve his skills in orde
r to better protect the kingdom
.

A very reasonable plaintext indeed. From this, we can XOR any row with the corresponding row in the ciphertext to obtain the flag orgctf{please_j0in_the_HEUCTF!}.

2.5 Using Hamming Distance to Estimate Key Length

As the length of the plaintext increases, brute-forcing the key length becomes highly inefficient. We can calculate the Hamming distance to estimate the key length. The Hamming distance between two ASCII-encoded English characters is usually between 2 and 3, meaning the average Hamming distance for normal English letters is 2-3. For any characters (non-pure letters), the average Hamming distance between them is 4.

In this problem, since the ciphertext is obtained by XORing the plaintext with a key, under the assumption that the key length is correct, the Hamming distance between different ciphertext groups should be in the range of 2-3. We can write a script to calculate the Hamming distance between ciphertext groups for each possible key length.

def get_avg_hamming_weight(c):
    avgs = {}
    # l stands for the number of characters in ciphertext
    l = len(c) // 2
    # i stands for the length of the key
    for i in range(1, l):
        # Make sure i is a divisor of l
        if l % i != 0:
            continue
        rows = []
        for j in range(l // i):
            rows.append("0x" + c[j * i : j * i + i])
        avg = 0
        for j in range(len(rows)):
            for k in range(len(rows)):
                avg += bin(int(rows[j], 16) ^ int(rows[k], 16)).count("1")
        avg = avg / (i * len(rows) * (len(rows) - 1) / 2)
        avgs[i] = avg
    return avgs

print(get_avg_hamming_weight(c))

The output is:

{
    1: 3.568754841306132,
    2: 3.3118626058233143,
    19: 3.5943666675944224,
    31: 3.2528793649336945,
    38: 3.358234295415959,
    62: 2.6361063950198074,
    589: 3.7928692699490663
}

In this problem, the lengths of the plaintext and the key are not large, so the Hamming distance may not be highly indicative. However, when the plaintext and ciphertext lengths are significantly larger, Hamming distance can be a valuable method for estimating the key length.

3 Conclusion

A stream cipher must ensure one-time-only encryption to prevent leakage of information. Pseudo-random key stream generation algorithms must be unpredictable; otherwise, under sufficient redundancy, the security issue demonstrated in this problem may occur.

4 Postscript

Some participants in solving this problem successfully deduced the entire flag by noticing that the last group of plaintext was almost entirely padding. They achieved this by XORing it with spaces and the last row. This is indeed a good approach and serves as a reminder not to create padding carelessly, as it could potentially make the encryption algorithm more vulnerable in specific situations.