Principle introduction and implementation of SHA256 algorithm

Recently, I was learning some knowledge about algorithm encryption and decryption. I didn't have a special understanding of SHA256 algorithm before. I saw the detailed explanation and implementation process of SHA256 algorithm by many other leaders, and finally understood it a little. Thank you very much for integrating these materials here. The purpose of writing this learning note is to record the process of learning SHA256 algorithm for next time. Of course, it would be great if we could provide some ideas and inspiration to our partners in need.

1. Overview of Sha algorithm

SHA (Secure Hash Algorithm) is a family of cryptographic hash functions. It is FIPS (Federal Information Processing Standards)
The authenticated secure hash algorithm. An algorithm that can calculate a fixed length string (also known as message digest) corresponding to a digital message.

An n-bit hash function is a mapping from an arbitrarily long message to an n-bit hash value. An n-bit encrypted hash function is a one-way, collision free n-bit hash function. Such a function is a very important means in digital signature and password protection.

At present, the popular hash functions mainly include 128 bit MD4 and MD5 and 160 bit (20 byte) SHA-1. The SHA-2 family introduced today has more output hash values, which is more difficult to crack and can improve higher security.

SHA-2, whose name comes from the abbreviation of Secure Hash Algorithm 2 (English: Secure Hash Algorithm 2), is a cryptographic hash function algorithm standard developed by the national security administration and released by the National Institute of standards and Technology (NIST) in 2001. It is one of the Sha algorithms and the successor of SHA-1. It can be further divided into six different algorithm standards, including SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224 and SHA-512/256.
In addition to some minor differences in the length of the generated summary and the number of cycles, the basic structure of the algorithm is the same.

2. Introduction to sha256 algorithm

When it comes to SHA256, it literally means that for messages of any length, SHA256 will generate a 256 bit hash value called message digest. This summary is equivalent to an array with a length of 32 bytes, a total of 256 bits. It is usually represented by a hexadecimal string with a length of 64, in which 1 byte = 8 bits and a hexadecimal character is 4 bits long.

SHA256 hashes the message, as shown in the following example:

1122334455667788    //news

The message summary obtained by the hash function SHA256 is:

1DCE6604591EFB439D5E87418A1D00DBFD014327D8C4DEA862815714B76AE9A5    //Hash value

Here, the original 8-byte message "1122334455667788" is calculated by SHA256 algorithm to obtain a 32-byte message summary, and small changes are made to the message, and the generated Hash will change greatly, which is completely different from the original value, as follows:

0122334455667788    //news
2B85738907AB2C4C39DFFFDD5328A694F4DF04B75E6F482F832279C6BBFE8530   //Hash value

Only by changing the value of one bit, the Hash value has changed greatly.

3. Detailed description of sha256 algorithm principle

In order to better understand the principle of SHA256, we first introduce the modules that can be extracted separately from the algorithm, including constant initialization, information preprocessing and used logic operation. After getting rid of these obstacles in understanding, let's explore the main part of SHA256 algorithm, that is, how to calculate the message summary.

3.1 constant initialization

SHA256 algorithm uses 8 initial hash values and 64 hash constants. 64 hash constants participate in the subsequent hash value calculation.
The 8 initial hash values of SHA256 algorithm are:

H1 := 0x6a09e667
H2 := 0xbb67ae85
H3 := 0x3c6ef372
H4 := 0xa54ff53a
H5 := 0x510e527f
H6 := 0x9b05688c
H7 := 0x1f83d9ab
H8 := 0x5be0cd19

The initial hash value H(1-8) is taken from the decimal part of the square root of the first 8 prime numbers (2,3,5,7,11,13,17,19) in the natural number, and takes the first 32 bits As an example, the decimal part of [Formula] is about 0.414213562373095048, where

Therefore, the first 32 bits of the decimal part of the square root of prime 2 correspond to 0x6a09e667. And so on, you can get 8 initial hash values.

The 64 hash constants of SHA256 algorithm are:


Similar to the 8 initial hash values, the 64 hash constants are obtained from the decimal part of the cube root of the first 64 prime numbers (2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97...) in the natural number, and the first 32 bits are taken.

3.2 information preprocessing

The preprocessing in SHA256 algorithm is to supplement the required information after the message that wants to Hash, so that the whole message meets the specified structure.

The preprocessing of information is divided into two steps: additional padding bits and additional length

STEP1: additional padding bits

Fill in the end of the message so that the remainder of the message length after taking the modulus of 512 is 448

The filling is carried out in this way: first fill the first bit as 1, and then fill 0 until the length meets the modulo of 512, and the remainder is 448.

It should be noted that the information must be filled, that is, even if the length has met the requirement that the remainder after 512 modulo is 448, the bit filling must also be carried out. At this time, 512 bits must be filled.

Therefore, filling is to fill in at least one digit and up to 512 digits.

**Example: * * take the information "abc" as an example to display the filling process.
a. The ASCII codes corresponding to B and C are 97, 98 and 99 respectively

The original binary code of 100000111 is 10000111

The first step is to fill in a "1": 011000101100010 01100011 1

The second step is to fill in 423 "0": 0110001 01100010 01100011 10000000 0000000... 0000000

The data after filling is as follows (expressed in hexadecimal):

61626380 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000

Why 448?

Because after the preprocessing in the first step, a 64bit data will be added in the second step to represent the length information of the original message. And 448 + 64 = 512, just put together a complete structure.

STEP2: additional length value

The additional length value is to add the length information of the original data (the message before the first filling step) to the message that has been filled.

SHA256 uses a 64 bit data to represent the length of the original message.

Therefore, the message length calculated by SHA256 must be less than $2 ^ 64 $, which is large enough in most cases.

The encoding method of length information is 64 bit big endian integer

The meaning of Big endian is supplemented at the end of the paper

Back to the example just now, the message "abc" has three characters and occupies 24 bit s

Therefore, after the complement length operation is performed, the whole message becomes as follows (hexadecimal format)

61626380 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000018

3.3 logic operation

The operations involved in SHA256 hash function are all logical bit operations

It includes the following logic functions:

Here are some operations:

Logical operationmeaning
Bitwise "and"
¬Bitwise "and"
Bitwise XOR
SnRotate n bit s to the right
RnShift n bit s right

3.4 logic operation

Now let's introduce the main part of SHA256 algorithm, that is, how to calculate the message digest.

First: decompose the message into 512 bit blocks

Assuming that the message M can be decomposed into n blocks, what the whole algorithm needs to do is to complete n iterations, and the result of N iterations is the final hash value, that is, the digital summary of 256bit.

The initial value H0 of a 256 bit summary is calculated by the first data block to obtain H1, that is, the first iteration is completed

H1 passes through the second data block to obtain H2,..., which is processed in turn to obtain Hn, which is the final 256 bit message summary

Use $map (H {I-1}) = h for the mapping of each iteration_ {i} $indicates that the iteration can be more vividly displayed as:

The 256 bit Hi in the figure is described as eight small blocks, because the smallest operation unit in SHA256 algorithm is called "Word", and one Word is 32 bits.

In addition, in the first iteration, the initial value of the mapping is set to the 8 hash initial values described above, as shown in the following figure:

Let's start with the content of each iteration, that is, mapping $map (H {I-1}) = H_ {i} Specific algorithm of $

STEP1: construct 64 word s

break chunk into sixteen 32-bit big-endian words w[0], ..., w[15]

For each block, the block is decomposed into 16 32-bit big endian words, which are recorded as w[0],..., w[15]

That is, the first 16 words are directly decomposed by the ith block of the message

The remaining words are obtained by the following iterative formula:

STEP2: perform 64 cycles

Map $map (H #u{i-1}) = h_ {i} $contains 64 encryption cycles

That is, 64 encryption cycles can complete one iteration

Each encryption cycle can be described by the following figure:

It can be seen from the above figure that the eight word s ABCDEFGH are being updated according to certain rules. The rules are as follows:

If the original data block is ABCDEFGH, one cycle is:


The dark blue box is a pre-defined nonlinear logic function, which has been paved above
The red field square represents mod $2 ^ {32} $addition, that is, add the two numbers together. If the result is greater than $2 ^ {32}, you must divide by 2 ^ {32} $and find the remainder. Of course, if we define the data as 32 bits, that is, four bytes, we don't have to deliberately take the remainder. The previous loop represents A '- H'.

The initial values of ABCDEFGH are $h at the beginning_ {i-1}(0),H_ {i-1}(1),…,H_{i-1}(7) $.

Kt is the t-th key, corresponding to the 64 constants mentioned above.

Wt is the t-th word generated in this block. The original message is cut into 512 bit blocks with a fixed length. For each block, 64 words are generated. The eight words ABCDEFGH are cyclically encrypted by repeating the cycle n times.

The eight words generated in the last loop together are the hash string $h corresponding to the ith block_ {i} $.

4.SHA256 algorithm implementation

4.1 logic function

typedef unsigned char BYTE;             // 8 bits
typedef unsigned int  WORD;             // 32-bit data, 4 bytes represent a word

typedef struct {
	BYTE data[64];  // current 512-bit chunk of message data, just like a buffer
	WORD datalen;   // sign the data length of current chunk
	unsigned long long bitlen;  // the bit length of the total message
	WORD state[8];  // store the middle state of hash abstract
} SHA256_CTX;
//The logic functions are as follows
∧: Bitwise "and"
!: Bitwise "complement"
⊕: Bitwise XOR
Sn: Rotate n bit s to the right
Rn: Shift n bit s right
#define ROTRIGHT(a,b) (((a) >> (b)) | ((a) << (32-(b))))  			// rotate right
#define MAJ(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))	    //Ma(x,y,z)=(x∧y)⊕(x∧z)⊕(y∧z)
#define EP0(x) (ROTRIGHT(x,2) ^ ROTRIGHT(x,13) ^ ROTRIGHT(x,22))	//Σ0(x)=S2(x)⊕S13(x)⊕S22(x)
#define EP1(x) (ROTRIGHT(x,6) ^ ROTRIGHT(x,11) ^ ROTRIGHT(x,25))	//Σ1(x)=S6(x)⊕S11(x)⊕S25(x)
#define SIG0(x) (ROTRIGHT(x,7) ^ ROTRIGHT(x,18) ^ ((x) >> 3))		//σ0(x)=S7(x)⊕S18(x)⊕R3(x)
#define SIG1(x) (ROTRIGHT(x,17) ^ ROTRIGHT(x,19) ^ ((x) >> 10))		//σ1(x)=S17(x)⊕S19(x)⊕R10(x)

4.2 Hash value obtained by data block processing

//Operate a data block with a length of 512bit and convert it into Hash value
void sha256_transform(SHA256_CTX *ctx, const BYTE data[])
	//Abcdefgh represents eight words of abcdefgh, and t1t2 stores the intermediate operation value
	WORD a, b, c, d, e, f, g, h, i, j, t1, t2, m[64];//m here represents 64 words (32 bits per word)

	// initialization 
	for (i = 0, j = 0; i < 16; ++i, j += 4)
		m[i] = (data[j] << 24) | (data[j + 1] << 16) | (data[j + 2] << 8) | (data[j + 3]);//Divide the data block into 16 32bit words and store them in the first 16 words of m
	for ( ; i < 64; ++i)
		m[i] = SIG1(m[i - 2]) + m[i - 7] + SIG0(m[i - 15]) + m[i - 16];//Other words: 16-63, according to the formula Mt= σ 1(M(t-2))+M(t-7)+ σ 0(M(t-15))+M(t-16) calculation	
	//In the first iteration, the initial mapping value is set to 8 hash initial values
	a = ctx->state[0];
	b = ctx->state[1];
	c = ctx->state[2];
	d = ctx->state[3];
	e = ctx->state[4];
	f = ctx->state[5];
	g = ctx->state[6];
	h = ctx->state[7];
		If the original data block is ABCDEFGH, one cycle is:
	//64 encryption cycles
	for (i = 0; i < 64; ++i) 
		//There is no need to remove the remainder of 2 ^ 32, because the length of a-t2 is 32bit, and if it exceeds 2 ^ 32, the excess will be discarded
		t1 = h + EP1(e) + CH(e,f,g) + k[i] + m[i];//An operation required by SHA256
		t2 = EP0(a) + MAJ(a,b,c);
		h = g;
		g = f;
		f = e;
		e = d + t1;
		d = c;
		c = b;
		b = a;
		a = t1 + t2;
	//Integrate hash
	ctx->state[0] += a;
	ctx->state[1] += b;
	ctx->state[2] += c;
	ctx->state[3] += d;
	ctx->state[4] += e;
	ctx->state[5] += f;
	ctx->state[6] += g;
	ctx->state[7] += h;
void sha256_final(SHA256_CTX *ctx, BYTE hash[])
	WORD i;
	i = ctx->datalen;
	// Pad whatever data is left in the buffer.
	if (ctx->datalen < 56) {
		ctx->data[i++] = 0x80;  // pad 10000000 = 0x80
		while (i < 56)
			ctx->data[i++] = 0x00;
	else {
		ctx->data[i++] = 0x80;
		while (i < 64)
			ctx->data[i++] = 0x00;
		sha256_transform(ctx, ctx->data);
		memset(ctx->data, 0, 56);
	//Append the total length of the message (in bits) to the padding and convert it
	//56-638 bytes represent the bit length and are stored according to the big end mode (low address put high)
	ctx->bitlen += ctx->datalen * 8;
	ctx->data[63] = ctx->bitlen;
	ctx->data[62] = ctx->bitlen >> 8;
	ctx->data[61] = ctx->bitlen >> 16;
	ctx->data[60] = ctx->bitlen >> 24;
	ctx->data[59] = ctx->bitlen >> 32;
	ctx->data[58] = ctx->bitlen >> 40;
	ctx->data[57] = ctx->bitlen >> 48;
	ctx->data[56] = ctx->bitlen >> 56;
	sha256_transform(ctx, ctx->data);
	// copying the final state to the output hash(use big endian).
	//hash[i] type is Byte, STX - > state [i] type is WORD(4 bytes)
	for (i = 0; i < 4; ++i) 
		//Take the upper left 8 bits, press the big end mode, hash[0] is the low address, take the high bit, and so on
		hash[i]      = (ctx->state[0] >> (24 - i * 8)) & 0x000000FF;
		hash[i + 4]  = (ctx->state[1] >> (24 - i * 8)) & 0x000000FF;
		hash[i + 8]  = (ctx->state[2] >> (24 - i * 8)) & 0x000000FF;
		hash[i + 12] = (ctx->state[3] >> (24 - i * 8)) & 0x000000FF;
		hash[i + 16] = (ctx->state[4] >> (24 - i * 8)) & 0x000000FF;
		hash[i + 20] = (ctx->state[5] >> (24 - i * 8)) & 0x000000FF;
		hash[i + 24] = (ctx->state[6] >> (24 - i * 8)) & 0x000000FF;
		hash[i + 28] = (ctx->state[7] >> (24 - i * 8)) & 0x000000FF;

5. Reference

The main reference materials of this study note are as follows:
Detailed explanation of SHA256 algorithm principle
C language SHA256 encryption algorithm based on STM32
Understand the principle and implementation of SHA256 algorithm

Posted by chadrt on Fri, 15 Apr 2022 11:23:40 +0930