Shader compression scheme based on ZSTD dictionary

As a project grows, the number of shader variants in UE grows with it, often reaching into the millions. UE does provide the Share Material Shader Code option, which avoids storing duplicate shaders by serializing them into a standalone ushaderbytecode file, but that file can still take up hundreds of MB in the package. There are only two ways to attack the problem:

  1. Reduce the number of variants in the project: dump the project's shader information and analyze which variants are unnecessary;
  2. Compress the ShaderCode with a higher-ratio algorithm; by default the engine compresses with LZ4.

The first approach requires cooperation between TA and the art team, and it is hard to achieve a significant win that way. So this article starts from the second approach and implements a compression scheme specialized for shaders, which effectively reduces the size of ushaderbytecode and greatly improves the shader compression ratio.

Foreword

Enable Share Material Shader Code under Project Settings - Packaging:

By default, once this is enabled, UE generates two ushaderbytecode files (metalmap and metallib files instead, if iOS is compiled as Native) under ../../Cooked/<PROJECT_NAME>/Content/:

These two files store all shaders in the project.

The engine also ships with a default shader compression mechanism based on LZ4, which can be toggled with a console variable:

r.Shaders.SkipCompression=0

A value other than 0 turns off compression.

After a shader is compiled, compression happens when the shader is added to the ShaderMap:

void FShaderMapResourceCode::AddShaderCode(EShaderFrequency InFrequency, const FSHAHash& InHash, TConstArrayView<uint8> InCode)
{
	const int32 Index = Algo::LowerBound(ShaderHashes, InHash);
	if (Index >= ShaderHashes.Num() || ShaderHashes[Index] != InHash)
	{
		ShaderHashes.Insert(InHash, Index);

		FShaderEntry& Entry = ShaderEntries.InsertDefaulted_GetRef(Index);
		Entry.Frequency = InFrequency;
		Entry.UncompressedSize = InCode.Num();
		
		bool bAllowShaderCompression = true;
#if !(UE_BUILD_SHIPPING || UE_BUILD_TEST)
		static const IConsoleVariable* CVarSkipCompression = IConsoleManager::Get().FindConsoleVariable(TEXT("r.Shaders.SkipCompression"));
		bAllowShaderCompression = CVarSkipCompression ? CVarSkipCompression->GetInt() == 0 : true;
#endif

		int32 CompressedSize = InCode.Num();
		Entry.Code.AddUninitialized(CompressedSize);

		if (bAllowShaderCompression && FCompression::CompressMemory(GetShaderCompressionFormat(), Entry.Code.GetData(), CompressedSize, InCode.GetData(), InCode.Num()))
		{
			// resize to fit reduced compressed size, but don't reallocate memory
			Entry.Code.SetNum(CompressedSize, false);
		}
		else
		{
			FMemory::Memcpy(Entry.Code.GetData(), InCode.GetData(), InCode.Num());
		}
	}
}

However, even after LZ4 compression, the ushaderbytecode is still very large (about 91 MB for roughly 1,000,000 shaders):

Moreover, if the target platform supports multiple shader formats, such as SM5 and ES3.1, the size doubles.

ZSTD

In a previous blog post I described integrating ZSTD into UE as the Pak compression algorithm to improve the Pak compression ratio:

In game projects, ZSTD compresses slightly worse than Oodle from RAD Game Tools, which was integrated into UE after Epic acquired RAD and became the default compression algorithm in 4.27+.

Each individual shader's code is only a few KB to around 100 KB. Small data like this is generally hard to compress, because compression algorithms predict upcoming data from data they have already seen, and a small file simply does not contain enough repetition to achieve a high ratio.

However, ZSTD has a special compression mode: train a dictionary from existing data, then compress small payloads using that dictionary. This is exactly the situation shader compression is in.
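
As a rough illustration of what this looks like in code (not the engine integration itself), here is a minimal sketch using the zstd C API's dictionary functions, assuming a dictionary blob has already been trained:

// Minimal sketch: compress one shader blob with a pre-trained zstd dictionary.
// Illustrative only; not the engine-side implementation.
#include <zstd.h>
#include <cstdio>
#include <vector>

std::vector<char> CompressWithDict(const std::vector<char>& ShaderCode,
                                   const std::vector<char>& DictBlob)
{
	// Build a reusable compression dictionary; level 19 is an arbitrary example.
	ZSTD_CDict* CDict = ZSTD_createCDict(DictBlob.data(), DictBlob.size(), 19);
	ZSTD_CCtx* CCtx = ZSTD_createCCtx();

	std::vector<char> Compressed(ZSTD_compressBound(ShaderCode.size()));
	const size_t CompressedSize = ZSTD_compress_usingCDict(
		CCtx,
		Compressed.data(), Compressed.size(),
		ShaderCode.data(), ShaderCode.size(),
		CDict);

	if (ZSTD_isError(CompressedSize))
	{
		std::fprintf(stderr, "zstd error: %s\n", ZSTD_getErrorName(CompressedSize));
		Compressed.clear();
	}
	else
	{
		// Shrink to the actual compressed size.
		Compressed.resize(CompressedSize);
	}

	ZSTD_freeCCtx(CCtx);
	ZSTD_freeCDict(CDict);
	return Compressed;
}

In practice the CDict and the compression context would be created once and reused across all shaders, since dictionary setup is far more expensive than compressing a single small blob.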

The benchmark numbers published by zstd for this mode also look very good:

zstd provides commands for training a dictionary, compressing with it, and decompressing:

# Train a dictionary
$ zstd --train ./DumpShaders/PCD3D_SM5/* -r -o PCD3D_SM5.dict
# Compress using the dictionary
$ zstd -D PCD3D_SM5.dict ./PCD3D_SM5/* -o PCD3D_SM5.compressed
# Decompress using the dictionary
$ zstd -D PCD3D_SM5.dict -d PCD3D_SM5.compressed -o ./PCD3D_SM5

See zstd --help for more usage options.

So how do we integrate this approach into UE?

Integrating into UE

First, as described in the previous section, dictionary-based compression needs a dictionary trained from existing data; in UE that means training on the uncompressed ShaderCode.

The integration process requires several steps:

  1. Turn off the engine's default shader compression
  2. Dump all ShaderCode during shader serialization (a sketch of this step follows the list)
  3. Use zstd to create a dictionary, with the dumped shader files as the training set
  4. Integrate zstd into UE and compress the ShaderCode with the dictionary
  5. Serialize the compressed ShaderCode into ushaderbytecode
  6. Modify the part of the engine that reads shaders back from ushaderbytecode, so the shader code can be decompressed correctly
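
As a sketch of step 2, a hypothetical helper like the following could be called from FShaderMapResourceCode::AddShaderCode before compression; the DumpShaders directory, the ShaderFormat parameter, and the hook point are my own illustration, not engine code:

// Hypothetical helper: save the raw shader bytes to disk so they can later be
// used as a zstd training set. Names and paths here are illustrative only.
#include "Misc/FileHelper.h"
#include "Misc/Paths.h"
#include "Misc/SecureHash.h"
#include "HAL/FileManager.h"

static void DumpShaderCodeForTraining(const FSHAHash& InHash, TConstArrayView<uint8> InCode, const FString& ShaderFormat)
{
	const FString DumpDir = FPaths::ProjectSavedDir() / TEXT("DumpShaders") / ShaderFormat;
	IFileManager::Get().MakeDirectory(*DumpDir, /*Tree=*/ true);

	// One file per shader, named by its hash, containing the uncompressed bytecode.
	const TArray<uint8> RawCode(InCode.GetData(), InCode.Num());
	FFileHelper::SaveArrayToFile(RawCode, *(DumpDir / InHash.ToString()));
}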

There are a few things to pay attention to when creating the dictionary from the training set with ZSTD:

  1. Make sure the training set is large enough: it should be at least 10-100 times the size of the dictionary. The larger the training set, the better the dictionary will compress.
  2. zstd's default maximum dictionary size is 110 KB. You can estimate a dictionary size from the training set size at roughly a 100:1 ratio and specify it with --maxdict (an in-process alternative is sketched after this list).
  3. Make sure the dumped shader data is uncompressed.
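
If you would rather train the dictionary in-process than through the CLI, zstd also exposes training via zdict.h. A minimal sketch follows; the 1 MB budget is just an example following the ~100:1 rule above, and assembling the sample buffer is left to the caller:

// Sketch: train a zstd dictionary from in-memory shader samples via zdict.h.
// SamplesBuffer holds all samples concatenated back-to-back;
// SampleSizes lists each sample's length in order.
#include <zdict.h>
#include <cstdio>
#include <vector>

std::vector<char> TrainDictionary(const std::vector<char>& SamplesBuffer,
                                  const std::vector<size_t>& SampleSizes,
                                  size_t MaxDictSize = 1024 * 1024) // example budget
{
	std::vector<char> Dict(MaxDictSize);
	const size_t DictSize = ZDICT_trainFromBuffer(
		Dict.data(), Dict.size(),
		SamplesBuffer.data(),
		SampleSizes.data(), static_cast<unsigned>(SampleSizes.size()));

	if (ZDICT_isError(DictSize))
	{
		std::fprintf(stderr, "dictionary training failed: %s\n", ZDICT_getErrorName(DictSize));
		return {};
	}

	Dict.resize(DictSize);
	return Dict;
}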

There is no open-source implementation of this at the moment; the points above are the core steps. At runtime the shaders load normally.
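
On the read side, decompression mirrors compression. Below is a minimal sketch with zstd's dictionary decompression API (again an illustration, not the engine's actual read path); the uncompressed size is assumed to be stored next to the compressed data, as UE already does in FShaderEntry::UncompressedSize:

// Sketch: decompress a dictionary-compressed shader blob at load time.
#include <zstd.h>
#include <vector>

std::vector<char> DecompressWithDict(const std::vector<char>& Compressed,
                                     size_t UncompressedSize,
                                     const std::vector<char>& DictBlob)
{
	ZSTD_DDict* DDict = ZSTD_createDDict(DictBlob.data(), DictBlob.size());
	ZSTD_DCtx* DCtx = ZSTD_createDCtx();

	std::vector<char> Out(UncompressedSize);
	const size_t Written = ZSTD_decompress_usingDDict(
		DCtx,
		Out.data(), Out.size(),
		Compressed.data(), Compressed.size(),
		DDict);

	// Treat any error or size mismatch as failure.
	if (ZSTD_isError(Written) || Written != UncompressedSize)
	{
		Out.clear();
	}

	ZSTD_freeDCtx(DCtx);
	ZSTD_freeDDict(DDict);
	return Out;
}

As with compression, the DDict would be created once when the shader library is opened and shared across all decompressions.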

Compression effect and performance

Compressed shader data sizes under different compression algorithms:

Category (KB)      NoCompress    LZ4 (default)    zstd + dict
Global             20151         5304             1997
StarterContent     16972         6353             1443
Dict               -             -                501
Total              37123         11657            3941

Compressing shaders with ZSTD plus a dictionary reduces the compressed size by about 66% compared with the engine's default LZ4 (11657 KB down to 3941 KB, dictionary included).

Decompression time at runtime:

Epilogue

Testing shows that, compared with LZ4, the ZSTD + dictionary approach improves the shader compression ratio and shrinks the ushaderbytecode files in the package.

However, the dictionary still has to be trained manually from the dumped shaders before compression, and UE's stock packaging pipeline is not easy to hook into for automating this. The future direction is to automate the whole flow during packaging: the HotPatcher framework can conveniently hook into shader serialization, train the dictionary transparently, and apply it to shader compression as part of the HotPatcher workflow.
