HardEncrypt Documentation -- Tips for good key seed files

Tips for good HardEncrypt key seed files

As stressed in the other sections of this documentation, HardEncrypt's uncrackable encryption relies on the quality of the key files, and therefore on the quality of the seed files used to generate them. In the usage guide, we advocate using audio files as key seeds. Any personalized audio file is sufficient for the novice or casual user. In fact, we're almost certain that any audio file will provide a completely secure key seed as long as it contains some irreproducible sound (e.g., the user's voice saying even something as simple as "hello, how are you?", since the recording will be different every time the user says the phrase).

However, certain specialized attacks may be attempted against keys made from ordinary audio data. For one thing, audio data is often periodic and contains sinusoidal components. If attackers suspect that a human voice recording was used to make the key file, they could narrow their search to key files containing periodic data with frequencies between 0 and 10,000 Hz. Even so, this space is still enormous: it is essentially the space of every possible recording of a human voice. If the attackers could get a sample of the user's voice, they could narrow the search further by looking only at the space of the user's vocal spectrum. If they knew what words the user said into the mic when making the seed, they could narrow it even more. But even if the attackers obtained a recording of the user saying the exact words that were used to make the key (as in the movie Sneakers), they would still face an enormous search space unless they had the *exact* recording that was used.

The point is that if the file doesn't contain truly random data, the key file space is much smaller. It may still be large enough to provide excellent, uncrackable encryption, but it is theoretically weaker. Despite these drawbacks of a casual user's seed file, audio data can be a great source of truly random data when it is created properly.

A good first step is to move away from the human voice. Try recording other sounds, especially ones that sound close to white noise. A noise generator in a digital sound editing program is no good, for the same reason that a software rand() function is no good: neither actually generates random data. However, we are surrounded by natural noise generators. The chatter in a crowded room, wind blowing into a mic, a radio tuned to an unused frequency band, a waterfall: all are excellent sources of good, random data.
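To see why a software generator is no good, here is a minimal Python sketch that uses Python's standard `random` module as a stand-in for rand(); the helper name `prng_noise` is hypothetical. Because the output is fully determined by the seed, anyone who learns or guesses the seed regenerates the identical "noise", so the attacker's search space shrinks to the space of seeds.

```python
import random

def prng_noise(seed: int, count: int) -> list[int]:
    """Generate `count` signed 16-bit 'noise' samples from a seeded software PRNG."""
    rng = random.Random(seed)
    return [rng.randint(-32768, 32767) for _ in range(count)]

# Two runs with the same seed produce exactly the same samples.
first = prng_noise(seed=1234, count=8)
second = prng_noise(seed=1234, count=8)
print(first == second)  # True
```

A recording of natural noise has no such seed to recover, which is why the sources listed above are preferred.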

Certainly, though, all of the above sources have their limitations: each has a signature frequency range. The ultimate key seed file can be created by recording many different random sound sources and mixing them together in a digital sound editor. When the mix starts to sound unrecognizable, you're almost there. Even mixing 30 different recordings of your own voice saying different phrases will produce very good random data.
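Mixing in a sound editor amounts to adding the tracks' samples together. The sketch below assumes each recording has already been decoded to a list of signed 16-bit integer samples at the same sample rate; `mix_tracks` is a hypothetical helper, not part of HardEncrypt. It sums the tracks and then normalizes the result so the loudest sample reaches full scale, which keeps the mix from either clipping or shrinking into the low-order bits.

```python
def mix_tracks(tracks: list[list[int]]) -> list[int]:
    """Sum 16-bit tracks sample by sample, then normalize the peak to full scale."""
    # Only mix the span where every track has audio.
    length = min(len(t) for t in tracks)
    summed = [sum(t[i] for t in tracks) for i in range(length)]
    peak = max((abs(s) for s in summed), default=0)
    if peak == 0:
        return [0] * length  # all silence: nothing to normalize
    # Scaling by 32767 / peak keeps every result inside the signed 16-bit range.
    scale = 32767 / peak
    return [round(s * scale) for s in summed]

voice_1 = [1000, -2000, 3000]
voice_2 = [500, 500, -500]
print(mix_tracks([voice_1, voice_2]))  # [19660, -19660, 32767]
```

Normalizing rather than simply averaging is a deliberate choice here: averaging many tracks lowers the overall level, and a quiet signal leaves its high-order bytes mostly at 0x00 or 0xFF, which is the weakness discussed in the next paragraph.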

One other rule of thumb when making a mix of sounds is to avoid any sound sources that are periodic (sine waves, pure tones, etc.) and any other sounds that have "smooth" wave forms. Smooth wave forms don't change much from sample to sample. If you are using 16-bit samples, such a small change may affect only the low-order byte of each sample (if the amplitude of the wave changes from 31,000 to 31,200, the high 8 bits of the sample remain unchanged: 31,000 is 0x7918 and 31,200 is 0x79E0, so the high byte stays 0x79). Since ASCII characters in a text message are each represented by 8 bits, a smooth, slowly changing wave form could expose every other character of your message to a sophisticated attack, because the nearly constant high-order bytes make every other byte of the key predictable. If you're using good, random noise, your wave form will not be smooth at all: it will jump by a random amount from sample to sample. If you're worried that your sound source is too smooth, sampling it at 8 bits may improve its security properties: there are no high-order bytes that can remain unchanged between samples. In any case, after mixing random sources to make a seed file, you can zoom in on the wave form in a sound editor. If the wave form looks smooth at maximum zoom, you need better random noise sources.
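The high-byte effect can be measured directly. This Python sketch (the helper names are hypothetical) splits signed 16-bit samples into bytes and reports how often neighboring samples share the same high byte. A quiet 50 Hz hum serves as the smooth example. For the noisy example, Python's seeded `random` module is only a stand-in for a good noise recording, chosen so the run is repeatable; per the advice above, it should never be used as a real seed.

```python
import math
import random

def high_byte(sample: int) -> int:
    """High-order byte of a signed 16-bit sample, as stored in two's complement."""
    return (sample & 0xFFFF) >> 8

def unchanged_high_byte_fraction(samples: list[int]) -> float:
    """Fraction of neighboring sample pairs that share the same high byte."""
    pairs = list(zip(samples, samples[1:]))
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if high_byte(a) == high_byte(b))
    return same / len(pairs)

# The example from the text: 31,000 is 0x7918 and 31,200 is 0x79E0.
print(hex(high_byte(31000)), hex(high_byte(31200)))  # 0x79 0x79

RATE = 44100  # samples per second (assumed CD-quality rate)

# Smooth source: a quiet 50 Hz hum, one second long.
hum = [round(3000 * math.sin(2 * math.pi * 50 * n / RATE)) for n in range(RATE)]

# Stand-in for a good noise recording (seeded only to make the run repeatable;
# a software generator like this must NOT be used as a real seed).
rng = random.Random(0)
noise = [rng.randint(-32768, 32767) for _ in range(RATE)]

print(unchanged_high_byte_fraction(hum))    # high: most high bytes repeat
print(unchanged_high_byte_fraction(noise))  # near 1/256
```

For the hum, the fraction comes out around 0.95, meaning most high-order key bytes would be predictable; for good noise it should sit near 1/256. This is only a rough numeric counterpart to zooming in on the wave form, not a general test of randomness.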

Given the perfect audio file, be it AIFF, WAV, or some other format, there is one remaining weak point in the file as a seed: it has a header and potentially a footer. All files of the same type have headers containing much of the same data: file type, sample rate, and so on. If attackers know that you used an AIFF file, they can potentially decrypt part of your encrypted messages. If you used the sound file as a key directly (which is possible, since HardEncrypt can use anything as a key), attackers would be able to decrypt the beginning of your message. However, if you used GenKeyFile to make a key from your audio file, the file contents are pseudo-randomly mixed, and the header data is spread throughout the first block of the key. If attackers figured out which pseudo-random mixing pattern was used, they could still decrypt portions of your file, but the bytes they could decrypt would be scattered throughout the file. This sort of partial decryption is not useful if they're trying to make sense of a complete message: a byte here and another byte there doesn't piece the puzzle together very well. But imagine a scenario in which you've got a file full of secret numbers, all of which need to be transmitted safely. Even with the header pseudo-randomly mixed into the key, attackers could read parts (or all) of some of the numbers.
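This section doesn't spell out how HardEncrypt combines a key with a message, so the sketch below assumes a byte-by-byte XOR (one-time-pad style) purely for illustration. It relies on a real property of WAV files: they begin with the bytes "RIFF", then a 4-byte size, then "WAVE" and, in the common layout, "fmt ". An attacker who knows those 12 key bytes recovers the matching message bytes. The `mix` function is a hypothetical stand-in for GenKeyFile's mixing (a seeded shuffle), not GenKeyFile's actual algorithm.

```python
import os
import random

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Combine data with key byte by byte (assumed XOR; see the note above)."""
    return bytes(d ^ k for d, k in zip(data, key))

# Key bytes an attacker can predict in a typical WAV file:
# offsets 0-3 are b"RIFF", 4-7 the file size (unknown here), 8-15 b"WAVEfmt ".
KNOWN = {i: b for i, b in enumerate(b"RIFF")}
KNOWN.update({8 + i: b for i, b in enumerate(b"WAVEfmt ")})

def recover(ciphertext: bytes, known_key: dict[int, int]) -> dict[int, str]:
    """Plaintext characters revealed wherever the key byte is known."""
    return {pos: chr(ciphertext[pos] ^ k)
            for pos, k in known_key.items() if pos < len(ciphertext)}

def mix(raw: bytes, seed: int) -> tuple[bytes, list[int]]:
    """Hypothetical mixing step: mixed[j] = raw[order[j]] for a seeded shuffle."""
    order = list(range(len(raw)))
    random.Random(seed).shuffle(order)
    return bytes(raw[i] for i in order), order

message = b"MEET AT PIER 9 AT 0600"
# Raw WAV file used directly as a key; random filler stands in for the size
# field and the audio samples, which the attacker cannot predict.
raw_key = b"RIFF" + os.urandom(4) + b"WAVEfmt " + os.urandom(64)

# Case 1: raw file used as the key. The start of the message leaks.
direct = recover(xor_bytes(message, raw_key), KNOWN)
shown = "".join(direct.get(i, "?") for i in range(16))
print(shown)  # MEET????PIER 9 A

# Case 2: key mixed first. If the attacker learns the mixing order,
# the same known header bytes reveal only scattered message positions.
mixed_key, order = mix(raw_key, seed=42)
known_mixed = {j: KNOWN[p] for j, p in enumerate(order) if p in KNOWN}
scattered = recover(xor_bytes(message, mixed_key), known_mixed)
print(sorted(scattered.items()))
```

In the first case the beginning of the message comes out readable; in the second, only whichever message positions the shuffled header bytes happen to land on are exposed, which is exactly what makes scattered leaks dangerous for a file of short secret numbers.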

The pseudo-random mixing feature was provided primarily for novice and casual users. To avoid these scattered "holes" in a key, an advanced user can simply remove the header (and possibly the footer) before running GenKeyFile. This can be done with a hex editor by deleting a generous portion of the beginning and end of the file. If you delete a few pages' worth of hex from each end, you should be left with pure data. Of course, after you delete the header and save the file, you won't be able to use the file as audio anymore, but this shouldn't be a problem. You might also want to make the recording slightly longer than needed in the first place, so that the file is still as long as you need after the header is removed.
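The hex-editor step can also be scripted. Below is a minimal Python sketch that assumes a "page" of 4096 bytes (a hypothetical figure; hex editors differ in how much they show per page) and uses hypothetical helper names `strip_ends` and `strip_file`. It writes the trimmed bytes to a new file rather than overwriting the original, so the recording stays playable.

```python
from pathlib import Path

PAGE = 4096  # hypothetical "page" size in bytes; adjust to taste

def strip_ends(data: bytes, head: int, tail: int) -> bytes:
    """Drop `head` bytes from the start and `tail` bytes from the end."""
    if head < 0 or tail < 0:
        raise ValueError("byte counts must be non-negative")
    if head + tail >= len(data):
        raise ValueError("file too short to strip that much")
    # Slice to len(data) - tail (not -tail) so that tail == 0 keeps the end.
    return data[head:len(data) - tail]

def strip_file(src: str, dst: str, pages: int = 3) -> int:
    """Write a header/footer-free copy of `src` to `dst`; return its length."""
    trimmed = strip_ends(Path(src).read_bytes(), pages * PAGE, pages * PAGE)
    Path(dst).write_bytes(trimmed)
    return len(trimmed)
```

For WAV files specifically, Python's standard `wave` module offers a more precise route: `wave.open(path, "rb").readframes(n)` returns only the sample frames, with no header bytes.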