Visualization of Unity perspective projection matrix transformation


The demo for this article has been uploaded to GitHub: CameraProjectionMatix

In the 3D rendering pipeline, mapping a point on an object from 3D space to the 2D screen usually involves the MVP transformation. The three letters refer to three matrices, each converting between different coordinate spaces:

  • M (Model): convert from local space to world space
  • V (View): convert world space to camera space
  • P (Projection): convert camera space to the canonical view volume (CVV)

The previous article described in detail the matrices that convert between the camera's coordinate spaces. Its discussion of converting an object's local coordinates to world coordinates applies equally to converting world space to camera space. If you are interested, see that article.

Building on the previous article, this article covers the most important step in the rendering process, the P transformation: converting camera space into the canonical view volume (CVV), which can also be understood as converting the camera's perspective space into an orthographic one. The process is shown in the figure (image sourced from the internet):

Many experts have explained the derivation of the projection matrix in detail, but usually as formula-heavy graphics theory that is hard to follow, and its use in real engineering projects is rarely covered. To make the projection transformation easy to understand and apply in practice, this article visualizes each step in the Unity engine, breaking the whole process down as simply as possible.

The significance of the camera projection matrix

1. The concept of camera perspective projection:

In painting theory, perspective is the technique of representing the spatial relationships of objects on a flat or curved surface. Generally, the sense of space and depth on a flat surface is conveyed through three attributes:

  • The perspective shape (contour) of an object: shapes change and shrink with distance in every direction (up, down, left, right, front and back)
  • Color changes caused by distance, which color perspective and aerial perspective describe systematically
  • The blurring of objects at different distances, known as "invisible perspective"

The above comes from Baidu Encyclopedia's entry on perspective. Put simply, because of the viewing angle of our eyes, objects look larger when near and smaller when far. Over time, accumulated experience of this effect helps humans perceive three-dimensional space.

In a game engine, there is no better approach than simulating how the human eye forms images. To let the computer mimic human vision and render a perspective image, a camera in perspective mode marks its viewing range with a pyramid that has an opening angle. The tip of the pyramid is usually cut off, and the resulting shape is called the view frustum.

Although the frustum solves the perspective problem well, it complicates later calculations. A box can easily have its range defined by its center and side lengths, and projecting it from 3D to 2D only requires collapsing one axis. A frustum, however, has cross-sections whose size changes along its axis, so its range is hard to define.

Because working with the frustum directly is difficult, programmers and mathematicians devised a transformation that regularizes the frustum into a box and expresses it as a matrix: the camera's perspective projection matrix.

2. Visualizing the camera frustum with Gizmos:

As mentioned earlier, a perspective camera works differently: to make near objects look large and far objects small, it scales each planar cross-section according to its depth, and these cross-sections stacked together form the camera's view frustum. The camera's rendering volume can be drawn with Gizmos, the debug drawing tool Unity provides. The drawing code is:

    // Assumes this method lives in a MonoBehaviour with a `Camera cam` field
    public void OnDrawGizmos()
    {
        // Draw the frustum in the camera's local space
        Matrix4x4 start = Gizmos.matrix;
        Gizmos.matrix = Matrix4x4.TRS(transform.position, transform.rotation, Vector3.one);
        // Full pyramid from the camera position (minRange = 0) to the far plane
        Gizmos.color = Color.yellow;
        Gizmos.DrawFrustum(Vector3.zero, cam.fieldOfView, cam.farClipPlane, 0, cam.aspect);
        // Actual view frustum: from the near plane to the far plane
        Gizmos.color = Color.red;
        Gizmos.DrawFrustum(Vector3.zero, cam.fieldOfView, cam.farClipPlane, cam.nearClipPlane, cam.aspect);
        Gizmos.matrix = start;

        // Draw the camera's axes (red/green/blue = right/up/forward, by convention)
        Gizmos.color = Color.red;
        Gizmos.DrawLine(cam.transform.position, cam.transform.position + cam.transform.right * 10);
        Gizmos.color = Color.green;
        Gizmos.DrawLine(cam.transform.position, cam.transform.position + cam.transform.up * 10);
        Gizmos.color = Color.blue;
        Gizmos.DrawLine(cam.transform.position, cam.transform.position + cam.transform.forward * 10);
    }

With the code above in place, the helper lines shown in the following figure appear in the Scene view of the Unity editor. The 3D shape outlined in red is the camera's view frustum:

Projection matrix transformation process

The frustum outlines the camera's rendering volume, but how can it be expressed mathematically so a machine can work with it? Before deriving anything, we need to establish what is known. The official Unity documentation lists the camera parameters involved:

  • Near Clip Plane: the closest distance from the camera at which objects are rendered
  • Far Clip Plane: the farthest distance from the camera at which objects are rendered
  • Field of View (FOV): the camera's viewing angle in degrees; in Unity, Camera.fieldOfView is the vertical angle
  • Aspect ratio: the screen's width divided by its height; combined with the FOV, it gives the camera's field of view in the other (horizontal) direction

1. Finding the eight vertices of the view frustum:

To mark out a region of space, line segments are usually combined into a wireframe of the corresponding shape, as in a model's mesh. To determine a segment's length and position, you first need its endpoints. So the first step is to compute, from the camera parameters above, the eight vertices of the camera's frustum, which lays the groundwork for defining the CVV later.

To simplify the derivation, reduce the 3D problem to 2D. Viewed along the Y axis, the camera frustum looks like the figure below: the lines mark the frustum's extent in the plane formed by the X and Z axes, and letters label the key points. From the camera parameters above, we know:

  • Length of AB: the camera's near clip distance
  • Length of AC: the camera's far clip distance
  • Angle BAD: half of the camera FOV in radians (horizontal or vertical, depending on the plane)

Take point D as an example. In the triangle constructed in the figure above, AB is the distance from the camera to the near clip plane and angle BAD is half the FOV, so trigonometry gives BD = AB · tan(∠BAD). AB and BD are then D's coordinates along the Z and X axes.

The Y coordinate is found the same way, from the near clip distance and the FOV along the other axis. Because Unity's Camera.fieldOfView is the vertical angle, the horizontal angle needed for the X coordinate can be obtained with the static method Camera.VerticalToHorizontalFieldOfView.
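The same trigonometry can be checked outside the engine. This plain C# sketch (no UnityEngine; `FrustumMath` and its methods are hypothetical names) computes a frustum corner in camera-local coordinates with +Z forward. For the horizontal angle it uses the standard relation hfov = 2 · atan(tan(vfov / 2) · aspect), which is what `Camera.VerticalToHorizontalFieldOfView` is expected to compute.

```csharp
using System;

// Hypothetical helpers: plain-C# versions of the frustum-vertex trigonometry.
public static class FrustumMath
{
    const double Deg2Rad = Math.PI / 180.0;

    // Horizontal FOV (degrees) from vertical FOV (degrees) and aspect = width / height.
    public static double HorizontalFov(double verticalFovDeg, double aspect)
    {
        double halfV = verticalFovDeg * 0.5 * Deg2Rad;
        return 2.0 * Math.Atan(Math.Tan(halfV) * aspect) / Deg2Rad;
    }

    // Corner of the plane at distance `dist` in front of the camera, in camera-local
    // coordinates with +Z forward. sx, sy are -1 or +1 (left/right, bottom/top).
    public static (double x, double y, double z) Corner(
        double verticalFovDeg, double aspect, double dist, int sx, int sy)
    {
        double halfV = verticalFovDeg * 0.5 * Deg2Rad;
        double halfH = HorizontalFov(verticalFovDeg, aspect) * 0.5 * Deg2Rad;
        // BD = AB * tan(angle BAD), applied once per axis
        double x = dist * Math.Tan(halfH) * sx;
        double y = dist * Math.Tan(halfV) * sy;
        return (x, y, dist);
    }
}
```

With a 90° vertical FOV and aspect 2, the top-right corner at distance 1 is (2, 1, 1): the half-width along X is exactly the aspect ratio times the half-height along Y.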

The following code computes the positions of the frustum's eight vertices:

    // Returns the eight frustum vertices: near plane first, then far plane
    List<Vector3> GetPosLocation()
    {
        List<Vector3> backList = new List<Vector3>();
        for (int z = 0; z < 2; z++)                 // 0 = near plane, 1 = far plane
        {
            for (int i = -1; i < 2; i += 2)         // -1 = left, +1 = right
            {
                for (int j = -1; j < 2; j += 2)     // -1 = bottom, +1 = top
                {
                    Vector3 pos = GetPos(z, new Vector2Int(i, j), cam);
                    backList.Add(pos);
                }
            }
        }
        return backList;
    }

    Vector3 GetPos(int lenType, Vector2Int dirType, Camera cam)
    {
        Vector3 cPos = cam.transform.position;
        // Half of the vertical and horizontal FOV in radians (degrees * PI / 360)
        float vecAngle = (cam.fieldOfView * Mathf.PI) / 360;
        float horAngle = Camera.VerticalToHorizontalFieldOfView(cam.fieldOfView, cam.aspect) * Mathf.PI / 360;
        // Distance along the view direction: near or far clip plane
        float zoffset = lenType == 0 ? cam.nearClipPlane : cam.farClipPlane;
        float vecOffset = zoffset * Mathf.Tan(vecAngle);
        float horOffset = zoffset * Mathf.Tan(horAngle);
        Vector3 offsetV3 = new Vector3(horOffset * dirType.x, vecOffset * dirType.y, zoffset);
        // Rotate the local offset into world space so a rotated camera also works
        return cPos + cam.transform.rotation * offsetV3;
    }

After computing the positions of the eight vertices with trigonometry, instantiate a Sphere at each one to mark it. The result is shown in the figure:

2. Converting camera space to the CVV with the projection matrix:

To mark the frustum after the projection transformation, the eight vertices obtained above are used as input to the MVP calculation. Since this article transforms points that are already in world coordinates and has no notion of local coordinates, the local-to-world step can be skipped.

Skipping M, transform the eight points computed above directly into camera space with the V matrix (Camera.worldToCameraMatrix) to get their camera-space coordinates, as shown in the figure:

As the figure shows, with the camera at the origin, the transformed vertices keep their X and Y coordinates, but their Z coordinates flip sign. This is because Unity's world and local spaces use a left-handed coordinate system, while camera space uses a right-handed one (the OpenGL convention, in which the camera looks down the negative Z axis).
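For an unrotated camera, the setup used in the figures, the world-to-camera step reduces to subtracting the camera position and negating Z. The C# sketch below is plain math, not Unity's implementation; `ViewSpace.WorldToView` is a hypothetical helper that handles translation only, whereas Unity's `worldToCameraMatrix` also undoes the camera's rotation.

```csharp
using System;

// Hypothetical sketch: world -> camera (view) space for an unrotated camera.
// Camera space follows the OpenGL convention (camera looks down -Z), so after
// undoing the camera's translation the Z coordinate changes sign.
public static class ViewSpace
{
    public static (double x, double y, double z) WorldToView(
        (double x, double y, double z) world, (double x, double y, double z) camPos)
    {
        // Flip Z: left-handed world space -> right-handed camera space
        return (world.x - camPos.x, world.y - camPos.y, -(world.z - camPos.z));
    }
}
```

A point at (1, 2, 3) seen from a camera at the origin becomes (1, 2, -3), matching the sign flip in the figure.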

Once the vertices are in camera space, the camera's projection matrix transforms the frustum into the CVV. Note that the projection is computed in homogeneous coordinates, so to get CVV coordinates you must divide the resulting Vector4 by its W component (the perspective divide). The result looks like this:

As the figure shows, after multiplication by the projection matrix, points inside the camera frustum are mapped into a regular volume spanning -1 to 1 along each axis. The extents are hard to read in the 3D view, so mapping them onto 2D shows the lengths more clearly:

Once projection has produced the coordinates in the CVV, discarding one axis reduces the 3D space to a 2D plane, which gives the camera's image. To get correct occlusion, the Z coordinate is kept and used to decide which surface is in front; this is the idea behind the depth buffer.

Camera projection matrix:

The visualization above shows what the camera projection matrix means: it converts the camera-space frustum into the CVV. The matrix itself is simply this transformation written as a set of formulas.

The projection matrix can be found in the Unity documentation. Note that it does not use the camera's FOV directly; instead, it uses the distances from the center of the near clip plane to its four edges, which makes the expression a little clearer. The matrix is:

$$
P = \begin{bmatrix}
\dfrac{2\,\mathit{near}}{\mathit{right}-\mathit{left}} & 0 & \dfrac{\mathit{right}+\mathit{left}}{\mathit{right}-\mathit{left}} & 0\\[2ex]
0 & \dfrac{2\,\mathit{near}}{\mathit{top}-\mathit{bottom}} & \dfrac{\mathit{top}+\mathit{bottom}}{\mathit{top}-\mathit{bottom}} & 0\\[2ex]
0 & 0 & -\dfrac{\mathit{far}+\mathit{near}}{\mathit{far}-\mathit{near}} & -\dfrac{2\,\mathit{far}\,\mathit{near}}{\mathit{far}-\mathit{near}}\\[2ex]
0 & 0 & -1 & 0
\end{bmatrix}
$$

The meaning of each parameter:

  • near: the distance from the camera to its near clip plane
  • far: the distance from the camera to its far clip plane
  • right: the X coordinate of the right edge of the near clip plane, measured from its center
  • left: the X coordinate of the left edge of the near clip plane, measured from its center (negative for a centered frustum)
  • top: the Y coordinate of the top edge of the near clip plane, measured from its center
  • bottom: the Y coordinate of the bottom edge of the near clip plane, measured from its center (negative for a centered frustum)
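For a centered (symmetric) frustum, the four edge values follow from the FOV and aspect ratio: top = near · tan(FOV / 2), right = top · aspect, left = -right and bottom = -top. The C# sketch below is plain math with no UnityEngine dependency; the class `Projection` and its methods are hypothetical names. It builds the matrix above, applies it to camera-space points (camera space looks down -Z), and performs the divide by W. For a symmetric frustum the result should correspond to Unity's `Matrix4x4.Perspective`, but treat that as an assumption to verify in the editor.

```csharp
using System;

// Hypothetical sketch: the perspective matrix above for a symmetric frustum,
// built from a vertical FOV, then applied to camera-space points.
public static class Projection
{
    public static double[,] Perspective(double vFovDeg, double aspect, double near, double far)
    {
        double top = near * Math.Tan(vFovDeg * Math.PI / 360.0); // near * tan(fov / 2)
        double bottom = -top;
        double right = top * aspect;
        double left = -right;

        var m = new double[4, 4];
        m[0, 0] = 2 * near / (right - left);
        m[0, 2] = (right + left) / (right - left);
        m[1, 1] = 2 * near / (top - bottom);
        m[1, 2] = (top + bottom) / (top - bottom);
        m[2, 2] = -(far + near) / (far - near);
        m[2, 3] = -(2 * far * near) / (far - near);
        m[3, 2] = -1;
        return m;
    }

    // Multiply the column vector (x, y, z, 1) by the matrix, then divide by w.
    public static (double x, double y, double z) ToNdc(double[,] m, double x, double y, double z)
    {
        double[] v = { x, y, z, 1 };
        var r = new double[4];
        for (int row = 0; row < 4; row++)
            for (int col = 0; col < 4; col++)
                r[row] += m[row, col] * v[col];
        return (r[0] / r[3], r[1] / r[3], r[2] / r[3]);
    }
}
```

With a 90° vertical FOV, aspect 2, near 1 and far 10, the near-plane corner (2, 1, -1) lands at (1, 1, -1) and the far-plane corner (-20, -10, -10) at (-1, -1, 1), two opposite corners of the CVV.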

Although the idea behind the derivation is easy to grasp, the formulas are lengthy. If you are interested, you can read this article:

Explore perspective projection transformation

Projection matrix extensions

1. Using the camera projection matrix to check whether a point is in the camera's field of view

Because the frustum has an irregular shape, checking directly in world coordinates whether a point lies in the camera's field of view is cumbersome. As shown above, once the frustum is transformed into the CVV, its boundary is a regular cube of known size, and checking whether a point lies inside it is easy:

    public static bool CheckPointIsInCamera(Vector3 worldPoint, Camera camera)
    {
        // World -> camera space -> clip space (homogeneous, before dividing by w)
        Vector4 projectionPos = camera.projectionMatrix * camera.worldToCameraMatrix * new Vector4(worldPoint.x, worldPoint.y, worldPoint.z, 1);
        // Inside the CVV means -w <= x, y, z <= w
        if (projectionPos.x < -projectionPos.w) return false;
        if (projectionPos.x > projectionPos.w) return false;
        if (projectionPos.y < -projectionPos.w) return false;
        if (projectionPos.y > projectionPos.w) return false;
        if (projectionPos.z < -projectionPos.w) return false;
        if (projectionPos.z > projectionPos.w) return false;
        return true;
    }
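The six comparisons above test -w ≤ x, y, z ≤ w on the clip-space point before dividing by W. A sketch of the same test on plain numbers (hypothetical `ClipTest.InsideClip`, no UnityEngine) shows one benefit of comparing against W: a point behind the camera has w < 0, and dividing first would flip its signs and could wrongly place it inside the [-1, 1] cube.

```csharp
using System;

// Hypothetical sketch of the clip-space test used by CheckPointIsInCamera,
// applied to an already-projected point (x, y, z, w) before the divide.
public static class ClipTest
{
    public static bool InsideClip(double x, double y, double z, double w)
    {
        // For w > 0 this equals -1 <= x/w, y/w, z/w <= 1, without any division.
        // For w < 0 (point behind the camera), -w > w, so no coordinate can
        // satisfy both bounds and the point is rejected.
        return -w <= x && x <= w
            && -w <= y && y <= w
            && -w <= z && z <= w;
    }
}
```

For example, (0.5, 0.5, 0.5, -1) is rejected, even though dividing by w would give (-0.5, -0.5, -0.5), which looks inside.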

In practice, though, you rarely need to test a single point; more often you test objects in the scene, and to keep the cost low, you test the object's bounding box. Here is a method, written by a more experienced developer, that checks whether an object's Bounds intersects the camera frustum:

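    // Extension method: must be declared inside a static class
    public static bool CheckBoundIsInCamera(this Bounds bound, Camera camera)
    {
        // Out-code: one bit for each CVV plane the clip-space point lies outside of
        System.Func<Vector4, int> ComputeOutCode = (projectionPos) =>
        {
            int _code = 0;
            if (projectionPos.x < -projectionPos.w) _code |= 1;
            if (projectionPos.x > projectionPos.w) _code |= 2;
            if (projectionPos.y < -projectionPos.w) _code |= 4;
            if (projectionPos.y > projectionPos.w) _code |= 8;
            if (projectionPos.z < -projectionPos.w) _code |= 16;
            if (projectionPos.z > projectionPos.w) _code |= 32;
            return _code;
        };

        Matrix4x4 worldToClip = camera.projectionMatrix * camera.worldToCameraMatrix;
        Vector4 worldPos = Vector4.one;   // w = 1 for a position
        int code = 63;                    // start with all six bits set
        for (int i = -1; i <= 1; i += 2)
        {
            for (int j = -1; j <= 1; j += 2)
            {
                for (int k = -1; k <= 1; k += 2)
                {
                    // One of the eight corners of the bounding box
                    worldPos.x = bound.center.x + i * bound.extents.x;
                    worldPos.y = bound.center.y + j * bound.extents.y;
                    worldPos.z = bound.center.z + k * bound.extents.z;
                    // Keep only the bits shared by every corner
                    code &= ComputeOutCode(worldToClip * worldPos);
                }
            }
        }
        // Non-zero: every corner is outside the same plane, so the box is not visible
        return code == 0;
    }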
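The bit trick works because each bit stands for one of the six CVV planes: ANDing the out-codes of all eight corners leaves a bit set only if every corner is outside that same plane, in which case the box is certainly invisible. The sketch below (hypothetical `OutCodes` class, plain C#, corners given directly in clip space) isolates that logic.

```csharp
using System;

// Hypothetical sketch of the out-code idea behind CheckBoundIsInCamera,
// applied to clip-space corners (x, y, z, w) that are already projected.
public static class OutCodes
{
    public static int Compute(double x, double y, double z, double w)
    {
        int code = 0;
        if (x < -w) code |= 1;   // left of the frustum
        if (x > w) code |= 2;    // right
        if (y < -w) code |= 4;   // below
        if (y > w) code |= 8;    // above
        if (z < -w) code |= 16;  // in front of the near plane
        if (z > w) code |= 32;   // beyond the far plane
        return code;
    }

    // AND the codes of all corners: a non-zero result means every corner is
    // outside the same plane, so the box is certainly outside the frustum.
    public static bool MayBeVisible((double x, double y, double z, double w)[] corners)
    {
        int code = 63; // all six bits set
        foreach (var c in corners)
            code &= Compute(c.x, c.y, c.z, c.w);
        return code == 0;
    }
}
```

The test is conservative: a box whose corners lie outside different planes (for example, one corner to the left and another above) yields 0 and is reported as visible even if it actually misses the frustum. That costs an occasional unnecessary draw but never culls a visible object.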

2. Depth buffer basics

The depth buffer stores a depth value for each pixel. With it, a depth test can be performed to resolve which pixels occlude which, so the scene renders correctly. When an object is transformed into CVV space through MVP, its distance from the camera is encoded in the Z coordinate, which becomes the pixel's depth.
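A minimal sketch of the depth test described above, assuming a hypothetical single-array `DepthBuffer` class, smaller NDC z meaning closer (as with the matrix above), and a "less than" comparison. Real GPUs make the comparison configurable (in Unity shaders, through the ZTest state), so this is an illustration rather than how any particular pipeline is implemented.

```csharp
using System;

// Hypothetical sketch of a depth test on a tiny depth buffer. Depth values are
// NDC z in [-1, 1] (smaller = closer); the buffer starts cleared to the far value.
public sealed class DepthBuffer
{
    readonly double[] depth;
    readonly int[] color;

    public DepthBuffer(int pixelCount)
    {
        depth = new double[pixelCount];
        color = new int[pixelCount];
        Array.Fill(depth, 1.0); // nothing drawn yet: everything at the far plane
    }

    // Write the fragment only if it is closer than what is already stored.
    public bool TryWrite(int pixel, double z, int fragmentColor)
    {
        if (z >= depth[pixel]) return false; // occluded by an earlier fragment
        depth[pixel] = z;
        color[pixel] = fragmentColor;
        return true;
    }

    public int ColorAt(int pixel) => color[pixel];
}
```

Drawing a far fragment, then a nearer one, then a farther one to the same pixel leaves the nearer fragment's color, so opaque objects can be drawn in any order.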

Because of the perspective projection, depth buffer precision is nonlinear: generally, the closer to the camera, the higher the precision. As the example in the following figure shows, when a point in world space (the yellow ball on the right) moves away from the camera at constant speed, the corresponding change in its CVV Z coordinate becomes smaller and smaller:
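The nonlinearity can be computed directly from the third and fourth rows of the projection matrix. For a point at distance d in front of the camera (camera-space z = -d), clip z = (far + near) / (far - near) · d - 2 · far · near / (far - near) and clip w = d, so the CVV depth is clip z / d, a function of 1/d. The C# sketch below (hypothetical `DepthCurve.NdcDepth`) assumes the OpenGL-style matrix that `camera.projectionMatrix` returns; the matrix Unity actually sends to the GPU can differ per platform (see `GL.GetGPUProjectionMatrix`), so the absolute values may not match a given platform's depth buffer.

```csharp
using System;

// Hypothetical sketch: CVV (NDC) depth of a point at distance d in front of the
// camera, using the third and fourth rows of the projection matrix above.
public static class DepthCurve
{
    public static double NdcDepth(double d, double near, double far)
    {
        // Camera-space z = -d, so clip z = (far + near)/(far - near) * d - 2*far*near/(far - near)
        double clipZ = (far + near) / (far - near) * d - 2 * far * near / (far - near);
        double clipW = d; // fourth row: w = -z_camera
        return clipZ / clipW;
    }
}
```

With near = 1 and far = 100, moving from d = 1 to d = 2 already takes the depth from -1 to 1/99, about half of the [-1, 1] range, while the last unit before the far plane changes it by only about 0.0002.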


Posted by df75douglas on Tue, 19 Apr 2022 06:37:36 +0930