By themselves, checksums are often used to verify data integrity but are not relied upon to verify data authenticity. The actual procedure which yields the checksum from a data input is called a checksum function or checksum algorithm. Depending on its design goals, a good checksum algorithm will usually output a significantly different value, even for small changes made to the input.
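A minimal sketch of this behavior, using the standard-library CRC-32 in Python as the checksum function:

```python
import zlib

# Flipping a single bit of the input ('T' -> 'U') produces a
# completely different CRC-32 value.
data = b"The quick brown fox jumps over the lazy dog"
tweaked = bytes([data[0] ^ 0x01]) + data[1:]

print(f"{zlib.crc32(data):08x}")     # checksum of the original
print(f"{zlib.crc32(tweaked):08x}")  # differs substantially
```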
This is especially true of cryptographic hash functions, which may be used to detect many data corruption errors and verify overall data integrity; if the computed checksum for the current data input matches the stored value of a previously computed checksum, there is a very high probability the data has not been accidentally altered or corrupted.
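A minimal sketch of this comparison, using SHA-256 from Python's hashlib:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Digest computed earlier and stored alongside the data.
stored_digest = sha256_hex(b"important payload")

# Later, recompute over the current data and compare: a match means
# the data has very probably not been accidentally altered.
current = b"important payload"
print(sha256_hex(current) == stored_digest)  # True -> presumed intact
```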
Checksum functions are related to hash functions, fingerprints, randomization functions, and cryptographic hash functions. However, each of those concepts has different applications and therefore different design goals. For instance, a function returning the start of a string can provide a hash appropriate for some applications but will never be a suitable checksum.
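A short sketch of this failure mode, using a hypothetical prefix_hash function:

```python
# A function returning only the start of its input can act as a crude
# hash (e.g. for bucketing), but it is useless as a checksum: any
# change beyond the prefix is invisible to it.
def prefix_hash(data: bytes, n: int = 4) -> bytes:
    return data[:n]

print(prefix_hash(b"GET /index.html") == prefix_hash(b"GET /evil.html"))
# True -> the modification goes undetected
```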
Checksums are used as cryptographic primitives in larger authentication algorithms. For cryptographic systems designed to provide both integrity and authenticity, see HMAC.
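A minimal sketch of such authentication, using Python's standard hmac module with a placeholder key and message:

```python
import hashlib
import hmac

key = b"shared-secret-key"  # placeholder; in practice from a key store
message = b"transfer 100 units to account 42"

# The sender attaches an HMAC tag; only holders of the key can
# produce or verify it, giving authenticity as well as integrity.
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# The verifier recomputes the tag and compares in constant time.
recomputed = hmac.new(key, message, hashlib.sha256).hexdigest()
print(hmac.compare_digest(tag, recomputed))  # True
```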
Check digits and parity bits are special cases of checksums, appropriate for small blocks of data such as Social Security numbers, bank account numbers, computer words, single bytes, etc. Some error-correcting codes are based on special checksums which not only detect common errors but also allow the original data to be recovered in certain cases.
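A minimal sketch of an even-parity bit over a single byte:

```python
# With even parity, the parity bit is chosen so that the total number
# of 1-bits (data plus parity) is even; any single-bit error flips it.
def even_parity_bit(byte: int) -> int:
    return bin(byte & 0xFF).count("1") % 2

data = 0b1011001              # four 1-bits -> parity bit is 0
assert even_parity_bit(data) == 0

corrupted = data ^ 0b0000100  # a single flipped bit
assert even_parity_bit(corrupted) == 1  # parity mismatch exposes it
```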
The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or (XOR) of all those words.
The result is appended to the message as an extra word. To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word consisting of n zeros, the receiver knows a transmission error occurred. With this checksum, any transmission error which flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error which affects two bits will not be detected if those bits lie at the same position in two distinct words.
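A sketch of this scheme over 8-bit words, including the undetectable two-bit error described above:

```python
# Longitudinal parity check over 8-bit words: the checksum is the XOR
# of all data words; XOR-ing everything, checksum included, gives zero.
def xor_checksum(words: bytes) -> int:
    result = 0
    for w in words:
        result ^= w
    return result

message = b"HELLO"
sent = message + bytes([xor_checksum(message)])
assert xor_checksum(sent) == 0          # receiver's check passes

# A single flipped bit is detected ...
one_bit = bytes([sent[0] ^ 0x01]) + sent[1:]
assert xor_checksum(one_bit) != 0

# ... but the same bit flipped in two different words cancels out.
two_bits = bytes([sent[0] ^ 0x01, sent[1] ^ 0x01]) + sent[2:]
assert xor_checksum(two_bits) == 0      # error goes undetected
```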
Swapping two or more complete words will also go undetected. A variant of the previous algorithm is to add all the "words" as unsigned binary numbers, discarding any overflow bits, and append the two's complement of the total as the checksum. To validate a message, the receiver adds all the words in the same manner, including the checksum; if the result is not a word full of zeros, an error must have occurred.
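A sketch of this sum-complement variant, again over 8-bit words:

```python
# Sum-complement checksum over 8-bit words: add all words modulo 256
# and append the two's complement of the total, so that everything,
# checksum included, sums to zero modulo 256.
def sum_complement(words: bytes) -> int:
    return (-sum(words)) % 256

message = b"HELLO"
sent = message + bytes([sum_complement(message)])
assert sum(sent) % 256 == 0  # receiver's check passes
```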
This variant, too, detects any single-bit error; a modular sum of this kind is used, for example, in SAE J1708. The simple checksums described above fail to detect some common errors which affect many bits at once, such as changing the order of data words, or inserting or deleting words with all bits set to zero.
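A short demonstration of that order-insensitivity for both schemes:

```python
from functools import reduce

# Both schemes above are order-insensitive: reordering the data words
# changes neither the XOR nor the modular sum, so the error passes.
original = b"HELLO"
reordered = b"OLLEH"

assert reduce(lambda a, b: a ^ b, original) == reduce(lambda a, b: a ^ b, reordered)
assert sum(original) % 256 == sum(reordered) % 256
```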
Verifying the integrity of a file can be done by comparing two copies bit-by-bit, but this requires a second copy of the file and may miss systematic corruption that affects both copies. A more popular approach is to store the checksums (hashes) of files, also known as message digests, for later comparison. File integrity can be compromised, which is usually referred to as the file becoming corrupted.
A file can become corrupted in a variety of ways: faulty storage media, errors in transmission, write errors during copying or moving, software bugs, and so on. Hash-based verification ensures that a file has not been corrupted by comparing the file's hash value to a previously calculated value. If these values match, the file is presumed to be unmodified. Due to the nature of hash functions, hash collisions may result in false positives (a corrupted file whose recomputed hash still matches), but the likelihood of collisions is often negligible with random corruption.
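A hedged sketch of this comparison, with a placeholder file name and expected digest:

```python
import hashlib

def file_sha256(path: str) -> str:
    """Compute a file's SHA-256 digest, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder path and digest; the expected value would come from a
# record made when the file was known to be good.
expected = "0123...placeholder...cdef"
if file_sha256("download.iso") == expected:
    print("file presumed unmodified")
else:
    print("file corrupted or altered")
```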
It is often desirable to verify that a file has not been modified in transmission or storage by untrusted parties, for example, to include malicious code such as viruses or backdoors. To verify authenticity, a classical checksum or non-cryptographic hash function is not enough: such functions are not designed to be collision resistant, so it is computationally trivial for an attacker to cause deliberate hash collisions, meaning that a malicious change in the file is not detected by a hash comparison.
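For example, an additive checksum collides deliberately under a simple byte reordering:

```python
# An additive checksum is trivial to defeat on purpose: rearranging
# the same bytes changes the content but not the sum, producing a
# deliberate collision.
benign = b"pay alice 100"
tampered = b"pay 100 alice"  # same bytes, different meaning

assert sum(benign) % 256 == sum(tampered) % 256  # checksums collide
```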
In cryptography, this attack is called a preimage attack. For this purpose, cryptographic hash functions are often employed. As long as the hash sums cannot be tampered with (for example, if they are communicated over a secure channel), the files can be presumed to be intact. Alternatively, digital signatures can be employed to ensure tamper resistance. There are a few well-known checksum file formats. Several utilities, such as md5deep, can use such checksum files to automatically verify an entire directory of files in one operation.
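A sketch of such directory-wide verification, assuming the common "<digest>  <filename>" line format produced by tools like sha256sum (the listing file name is hypothetical):

```python
import hashlib

# Each line of the listing is assumed to look like
# "<hex digest>  <filename>", as produced by sha256sum.
def verify_listing(listing_path: str) -> None:
    with open(listing_path) as listing:
        for line in listing:
            digest, name = line.split(maxsplit=1)
            name = name.strip().lstrip("*")  # '*' marks binary mode
            with open(name, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
            print(f"{name}: {'OK' if actual == digest else 'FAILED'}")

verify_listing("SHA256SUMS")  # hypothetical checksum file
```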
The particular hash algorithm used is often indicated by the file extension of the checksum file. Current best-practice recommendations are to use SHA-2 or SHA-3 to generate new file integrity digests, and to accept MD5 and SHA-1 digests for backward compatibility only if stronger digests are not available.