Fast and Clean: Custom Compute Shader and Raster Shader in UE5

Tutorial / 15 September 2023

In this part of the tutorial, I introduce a clean way to add compute or fullscreen pixel passes with the RDG API, without touching UE5 engine code at all. Everything lives inside our own module.

Adding a module 

  1. Follow Unreal's official guide to set up a custom module: https://docs.unrealengine.com/5.2/en-US/how-to-make-a-gameplay-module-in-unreal-engine/
  2. In MyProject.uproject, set the module's "LoadingPhase" to "PostConfigInit", because we want the shader module to load right after the config system has been initialized and before the engine is fully initialized.
  3. In MyModule.h, make sure the FMyModule class publicly inherits from IModuleInterface, so that you can override the essential functions StartupModule and ShutdownModule.
  4. In MyModule.cpp, inside FMyModule::StartupModule(), map the actual shader folder on your disk to a virtual path.
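    // Note: FPaths::EngineDir() resolves to the engine's own Shaders folder. To keep the .usf files inside the project, build the path from FPaths::ProjectDir() instead.
    FString ShaderDirectory = FPaths::Combine(FPaths::EngineDir(), TEXT("Shaders/"));
    AddShaderSourceDirectoryMapping(TEXT("/CustomShaders"), ShaderDirectory);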

  5. Don't forget the IMPLEMENT_MODULE(FMyModule, ModuleName) macro in MyModule.cpp.

Compute pass example

  Prepare shader class

  1. Create another pair of .cpp and .h files (MyShaderHandler.cpp and MyShaderHandler.h). We need an FGlobalShader class, which you can think of as the binding bridge between UE5 and HLSL.
    // interface between engine and HLSL shader
    class FMyShaderClass : public FGlobalShader
  2. Create a shader parameter struct that mirrors the parameters declared in the shader. For example, let's look at a simple compute shader that just writes to a debug texture:

    ExampleCS.usf
    RWTexture2D<float4> OutputTexture;
    float DebugValue;

    // these 3 values are set from the C++ side in our global shader class
    [numthreads(THREADGROUPSIZE_X, THREADGROUPSIZE_Y, THREADGROUPSIZE_Z)]
    void MainCS(uint3 Gid : SV_GroupID, // index of the current thread group within the dispatch (dispatched by C++)
     uint3 DTid : SV_DispatchThreadID, // "global" thread id across the whole dispatch
     uint3 GTid : SV_GroupThreadID, // "local" thread id within the current group
     uint GI : SV_GroupIndex) // "flattened" index of the thread within its group
    {
        OutputTexture[DTid.xy] = float4(0, DebugValue, 0, 1);
    }
    In our shader class, the following macros declare parameters equivalent to those in the shader example, so that we can bind them and set their values later. In Unity we would instead rely on a string name or its integer ID. If you are unsure what a macro means, check its declaration in ShaderParameterMacros.h.

    public:
       DECLARE_GLOBAL_SHADER(FMyShaderClass);
       SHADER_USE_PARAMETER_STRUCT(FMyShaderClass, FGlobalShader);

       BEGIN_SHADER_PARAMETER_STRUCT(FParameters, )
          SHADER_PARAMETER(float, DebugValue)
          SHADER_PARAMETER_RDG_TEXTURE_UAV(RWTexture2D, OutputTexture)
       END_SHADER_PARAMETER_STRUCT()
    
    
  3. Restrict shader permutations to the platforms you want to run on (the same concept as shader variants in Unity). Note: RHI is the abstraction layer UE5 places on top of the various lower-level graphics APIs.

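    static bool ShouldCompilePermutation(const FGlobalShaderPermutationParameters& Parameters)
    {
       // TYPE is a placeholder: substitute the minimum feature level you target, for example ERHIFeatureLevel::SM5
       return IsFeatureLevelSupported(Parameters.Platform, ERHIFeatureLevel::TYPE);
    }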
  4. Pass the thread group size (called work group size in some APIs) from our shader class to HLSL.
    static inline void ModifyCompilationEnvironment(const FGlobalShaderPermutationParameters& Parameters, FShaderCompilerEnvironment& OutEnvironment)
    {
       FGlobalShader::ModifyCompilationEnvironment(Parameters, OutEnvironment);
       OutEnvironment.SetDefine(TEXT("THREADGROUPSIZE_X"), NUM_THREADS_PER_GROUP_DIMENSION); // NUM_THREADS_PER_GROUP_DIMENSION is a user-defined macro
       OutEnvironment.SetDefine(TEXT("THREADGROUPSIZE_Y"), NUM_THREADS_PER_GROUP_DIMENSION);
       OutEnvironment.SetDefine(TEXT("THREADGROUPSIZE_Z"), 1);
    }
  5. Associate the shader class with its HLSL file and the entry point (kernel) to execute. This line goes right outside of our shader class. Now our shader class is ready!
    IMPLEMENT_GLOBAL_SHADER(FMyShaderClass, "/CustomShaders/ExampleCS.usf", "MainCS", SF_Compute);
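The four system values that MainCS receives in ExampleCS.usf are related by simple arithmetic. The sketch below models that relation in plain C++, outside the engine. UInt3, DispatchThreadId and GroupIndex are hypothetical helper names made up for this illustration; only the formulas come from the HLSL semantics.

```cpp
#include <cassert>
#include <cstdint>

// Plain C++ model of the HLSL compute-shader system values (not engine code).
struct UInt3 { std::uint32_t x, y, z; };

// SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID, per component.
inline UInt3 DispatchThreadId(UInt3 groupId, UInt3 groupThreadId, UInt3 numThreads)
{
    // A local thread id is always smaller than the group size on each axis.
    assert(groupThreadId.x < numThreads.x);
    assert(groupThreadId.y < numThreads.y);
    assert(groupThreadId.z < numThreads.z);
    return { groupId.x * numThreads.x + groupThreadId.x,
             groupId.y * numThreads.y + groupThreadId.y,
             groupId.z * numThreads.z + groupThreadId.z };
}

// SV_GroupIndex flattens SV_GroupThreadID: z * X * Y + y * X + x.
inline std::uint32_t GroupIndex(UInt3 groupThreadId, UInt3 numThreads)
{
    return groupThreadId.z * numThreads.x * numThreads.y
         + groupThreadId.y * numThreads.x
         + groupThreadId.x;
}
```

For example, with an assumed group size of 32 x 32 x 1, local thread (5, 7, 0) in group (2, 1, 0) has the dispatch thread id (69, 39, 0) and the group index 229. This is why the shader can index OutputTexture directly with DTid.xy.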

Bind Parameters and attach pass to unreal renderer

  1. Bind parameters. Don't worry about PostOpaqueRenderParameters for now; it is a built-in UE5 render event parameter that we will get to later.
    To configure and dispatch our shader, let's create another class, FCSDispatcher, exported with our module's API macro MODULENAME_API.
    There are a few things I would like to explain for those who get irritated when they don't understand what happens behind the scenes (me too). If you're comfortable with a black box, feel free to skip this:
    First, MODULENAME_API is generated automatically by the Unreal build tool (UEBuildModuleCPP.cs) in Definition.MyProject.h, for example as #define COMPUTESHADER_API DLLIMPORT. (Somehow Rider's search doesn't work for DLL files; if you know how to configure it, please teach me!)
    Second, as you can see in the code below, we use FRDGBuilder and FRDGTexture. RDG is the render graph API, one level above RHI. It records passes into a graph data structure instead of submitting them immediately, which brings advantages such as executing passes in parallel. The linked source has a good, brief diagram of this.
    Third, the delegate in our dispatcher class hooks into a callback event of Unreal's renderer. This way we don't pollute engine code with our passes. If you don't fully understand this yet, don't worry, more details follow later. BeginPass registers our delegate and EndPass removes it again.
    class MODULENAME_API FCSDispatcher
    {
    public:
       FCSDispatcher();
    
    
       FRDGTextureUAV* ComputeOutput;
       
       void BeginPass();
       void EndPass();
    
    
    private:
       FDelegateHandle OnPostOpaqueRenderDelegate;
       void AddComputePass(FPostOpaqueRenderParameters& PostOpaqueRenderParameters);
    };
  2. In AddComputePass, bind objects and values to the shader parameters:
    FRDGBuilder& GraphBuilder = *PostOpaqueRenderParameters.GraphBuilder;
    TShaderMapRef<FMyShaderClass> ComputeShader(GetGlobalShaderMap(GMaxRHIFeatureLevel)); // get a reference to our shader in the global shader map
    // create the output texture and its UAV
    // (four-channel format so the shader's float4 write is kept; a single-channel format such as PF_R8 would store only the red channel)
    FRDGTexture* Texture = GraphBuilder.CreateTexture(FRDGTextureDesc::Create2D(FIntPoint(1920, 1080), PF_FloatRGBA, FClearValueBinding::Black, TexCreate_UAV), TEXT("ComputeOutput"));
    ComputeOutput = GraphBuilder.CreateUAV(FRDGTextureUAVDesc(Texture, 0));
    // bind pass parameters (FParameters is the struct declared inside FMyShaderClass)
    FMyShaderClass::FParameters* PassParameters = GraphBuilder.AllocParameters<FMyShaderClass::FParameters>();
    PassParameters->OutputTexture = ComputeOutput;
    PassParameters->DebugValue = 1.0f;
  3. Then attach our pass to the RDG graph builder with AddPass. The render graph is rebuilt every frame, so AddPass is called every frame; passes do not pile up across frames as the name AddPass might suggest. Another thing to note is ERDGPassFlags::NeverCull, which is normally unnecessary. In Unreal's RenderGraphBuilder.cpp, a pass is culled automatically if none of its output resources is used anywhere (for example as a fullscreen output or attached to the material of some object). If a pass is culled, the renderer will throw an assertion error and crash immediately. Here, for demonstration purposes, I just temporarily tell the renderer not to cull our pass.
    FComputeShaderUtils::AddPass(
       GraphBuilder,
       RDG_EVENT_NAME("My COMPUTE PASS"),
       ERDGPassFlags::Compute| ERDGPassFlags::NeverCull,
       ComputeShader,
       PassParameters,
       FIntVector(FMath::DivideAndRoundUp(1920, NUM_THREADS_PER_GROUP_DIMENSION), FMath::DivideAndRoundUp(1080, NUM_THREADS_PER_GROUP_DIMENSION), 1)
       );
  4. Register and unregister the render pass at a suitable point in Unreal's renderer, using FPostOpaqueRenderParameters and OnPostOpaqueRenderDelegate. PostOpaqueRender is a render event provided by Unreal's renderer. You can go through Unreal's render pipeline to find the right point in the sequence for your pass. Once we add our delegate to the PostOpaqueRender delegate list, our function is called every time RenderPostOpaqueExtensions runs, and the renderer passes PostOpaqueRenderParameters to it. See the internal delegate declaration of FPostOpaqueRenderDelegate below. Through PostOpaqueRenderParameters you can access many useful screen render resources.
    FPostOpaqueRenderDelegate
    DECLARE_MULTICAST_DELEGATE_OneParam(FOnPostOpaqueRender, class FPostOpaqueRenderParameters&)
    With that in mind, BeginPass and EndPass should be easy to follow:
    void FCSDispatcher::BeginPass()
    {
       // already registered, nothing to do
       if (OnPostOpaqueRenderDelegate.IsValid()) return;

       OnPostOpaqueRenderDelegate = GetRendererModule().RegisterPostOpaqueRenderDelegate(FPostOpaqueRenderDelegate::CreateRaw(this, &FCSDispatcher::AddComputePass));
    }

    void FCSDispatcher::EndPass()
    {
       if (!OnPostOpaqueRenderDelegate.IsValid()) return;

       GetRendererModule().RemovePostOpaqueRenderDelegate(OnPostOpaqueRenderDelegate);
       OnPostOpaqueRenderDelegate.Reset(); // invalidate the handle so BeginPass can register again
    }
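The group count passed to FComputeShaderUtils::AddPass above is a ceiling division of the texture size by the thread group size. Here is a minimal standalone C++ sketch of that arithmetic. The value 32 for kThreadsPerGroupDim is an assumption standing in for the tutorial's user-defined NUM_THREADS_PER_GROUP_DIMENSION, and GroupCount, ComputeGroupCount and IsInsideTexture are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Stand-in for the tutorial's user-defined macro; 32 is an assumed value.
constexpr int kThreadsPerGroupDim = 32;

// Integer ceiling division, the same arithmetic FMath::DivideAndRoundUp performs.
constexpr int DivideAndRoundUp(int dividend, int divisor)
{
    return (dividend + divisor - 1) / divisor;
}

struct GroupCount { int x, y, z; };

// Number of thread groups needed so that every texel gets at least one thread.
inline GroupCount ComputeGroupCount(int width, int height)
{
    assert(width > 0 && height > 0);
    return { DivideAndRoundUp(width, kThreadsPerGroupDim),
             DivideAndRoundUp(height, kThreadsPerGroupDim),
             1 };
}

// Rounding up can launch threads past the texture edge; this is the usual guard.
inline bool IsInsideTexture(int threadX, int threadY, int width, int height)
{
    return threadX < width && threadY < height;
}
```

Under that assumption, a 1920 x 1080 texture needs 60 x 34 x 1 groups. The 34 rows of groups cover 1088 threads vertically, so the last 8 rows of threads have no texel to write. A shader that reads from other resources would typically return early for those threads with a check like IsInsideTexture.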

Full-screen rasterization pass example

Prepare shader class

This step is the same as building the shader class in the compute pass section. The only special thing is that, to bind our pixel shader output to the renderer, we add the following macro when declaring the shader parameters.

RENDER_TARGET_BINDING_SLOTS()

Add a full screen pass 

  1. Bind parameters: nothing special here, except that the macro we added to the shader parameter declaration gives us access to the RenderTargets array, which we bind to the screen color texture that PostOpaqueRender provides.
    FMyPostProcessingShader::FParameters* PixelPassParameters = GraphBuilder.AllocParameters<FMyPostProcessingShader::FParameters>();
    PixelPassParameters->Test = 1;
    PixelPassParameters->RenderTargets[0] = FRenderTargetBinding(PostOpaqueRenderParameters.ColorTexture, ERenderTargetLoadAction::ENoAction);
  2. Attach the fullscreen pass:
    TShaderMapRef<FMyPostProcessingShader> PixelShader(GetGlobalShaderMap(GMaxRHIFeatureLevel));
    
    
    FPixelShaderUtils::AddFullscreenPass(
       GraphBuilder,
       GetGlobalShaderMap(GMaxRHIFeatureLevel),
       RDG_EVENT_NAME("My PIXEL PASS"),
       PixelShader,
       PixelPassParameters,
       PostOpaqueRenderParameters.ViewportRect
       );

Handy Notes:

  • To hot-reload shaders after modifying them, run recompileshaders changed in the console.

References

https://medium.com/realities-io/using-compute-shaders-in-unreal-engine-4-f64bac65a907

https://sites.google.com/view/arnauaguilar/projects/computeshader-ue5

https://dev.epicgames.com/community/learning/tutorials/WkwJ/unreal-engine-simple-compute-shader-with-cpu-readback

https://www.froyok.fr/blog/2021-09-ue4-custom-lens-flare/#step_1_setting_up_a_plugin

https://itscai.us/blog/post/ue-view-extensions/#mycustommodulecpp-and-mycustommoduleh

https://mcro.de/c/rdg#pass-parameters

https://thegraphicguysquall.wordpress.com/2022/04/23/ue4-27-hello-ray-tracing/

GPU Particle Synchronization System

General / 21 June 2021

Summary: a firefly-inspired GPU particle synchronization engine for Teamlab's worldwide exhibitions, which can also be adapted to simulate other grouping behaviors.
I created a high-performance firefly simulation system, adapting the synchronization math and coupling behavior to a compute shader. An efficient pipeline lets a million fireflies interact seamlessly with sound and human motion on giant projection setups in San Francisco, Miami, Tokyo, Shanghai and other cities.

The system is written in C# and HLSL compute shaders in Unity.


Features:

  1. Synchronization patterns: creates the desired syncing modes through simple parameter adjustments.
    (drastic full-range coupling)

    (fast spiral coupling)

    (fruit flies coupling) 

    (noise tornado coupling)

    (far range natural coupling)

  2. Grouping map: a small shader-based system that controls the grouping behavior of particles.

    grouping map modifying demonstration  

    process of making grouping map in hlsl shader

  3. Secondary sync map control: An interactive layer on top of synchronization system to control sync range and spreading speed.

    A: Interactive coupling map

    B: Base Kuramoto synchronization layer
      

    A+B: Coupling map controlled synchronization

    Kuramoto synchronization algorithm in pseudo code:

    For index = 0 -> N
     If (particle at index is within coupling range)
      phaseInRad = phaseBuffer[index] * PI * 2;
      sumInXAxis += cos(phaseInRad);
      sumInYAxis += sin(phaseInRad);
      num++;

    note:
    R = sqrt((sumInXAxis / num)^2 + (sumInYAxis / num)^2), the coherence of the phases in range
    ψ = atan2(sumInYAxis, sumInXAxis), the mean phase angle
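The pseudo code above can be sketched on the CPU in plain C++. This is a minimal illustration, not the production compute shader. It assumes phases are stored normalized to [0, 1) like phaseBuffer, and it skips the coupling-range test by treating every oscillator as in range. StepPhase uses the textbook mean-field Kuramoto update dθ/dt = ω + K R sin(ψ − θ), which is an assumption about the form of the update, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

constexpr float kTwoPi = 6.28318530718f;

struct OrderParameter
{
    float R;   // coherence: 0 = fully spread out, 1 = fully in sync
    float psi; // mean phase angle in radians
};

// Averages the unit vectors of all phases, as in the pseudo code above.
inline OrderParameter ComputeOrderParameter(const std::vector<float>& phases01)
{
    assert(!phases01.empty());
    float sumX = 0.0f;
    float sumY = 0.0f;
    for (float phase01 : phases01)
    {
        const float phaseInRad = phase01 * kTwoPi;
        sumX += std::cos(phaseInRad);
        sumY += std::sin(phaseInRad);
    }
    const float num = static_cast<float>(phases01.size());
    const float meanX = sumX / num;
    const float meanY = sumY / num;
    return { std::sqrt(meanX * meanX + meanY * meanY), std::atan2(meanY, meanX) };
}

// One explicit Euler step of the assumed mean-field update for a single oscillator.
// omega is the natural frequency in radians per second, K the coupling strength.
inline float StepPhase(float phase01, float omega, float K, const OrderParameter& mean, float dt)
{
    const float theta = phase01 * kTwoPi;
    const float dTheta = omega + K * mean.R * std::sin(mean.psi - theta);
    const float next01 = (theta + dTheta * dt) / kTwoPi;
    return next01 - std::floor(next01); // wrap back into [0, 1)
}
```

When all phases are identical, R comes out close to 1 and ψ equals that shared phase; two opposite phases give R close to 0. In a shader version, each thread would compute the sums over its own neighbors in coupling range and then apply the step to its own phase.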

Kuramoto Synchronization close-up


Resources:

 https://en.wikipedia.org/wiki/Kuramoto_model

 http://go.owu.edu/~physics/StudentResearch/2005/BryanDaniels/intro.html

https://www.nationalgeographic.com/animals/article/watch-how-mexican-fireflies-synchronize-light-shows