Technology and Methods for Deferred Synthesis of 4K Stereo Clips for Complex Dynamic Virtual Scenes

We consider the task of capturing the result of a researcher-driven stereo visualization of a complex dynamic virtual scene into a video sequence of stereo pairs (a stereo clip) of ultrahigh resolution. An efficient technology of deferred synthesis of stereo clips is proposed that allows one to create stereo clips without interfering with real-time visualization. The technology includes real-time construction of a visualization scenario and offline transformation of the scenario into a stereo clip. Methods for implementing these stages are considered for the task of stereo visualization of the saturation isosurface of the displacing fluid. For this purpose, a special scenario file format «scr» is developed on the basis of chunk data structures; it provides a compact representation of adjacent repeated frames. The scenario file is transformed into a sequence of 4K stereo pairs by means of offscreen rendering of the virtual scene, and the stereo pairs are added to the stereo clip using the open-source FFmpeg libraries for processing digital video content. The generated stereo clip is placed within an MP4 container, and the video compression standard H.264 is used. The proposed technology and methods of deferred 4K stereo clip synthesis are implemented in software designed for visualizing simulation results of the unstable displacement of oil from porous media. Using this software, a 4K stereo clip is created that illustrates the evolution of the saturation isosurface during unstable oil displacement. Testing confirmed the validity of the solution. The software can be used in virtual laboratories, for designing virtual environment systems and scientific visualization systems, in educational applications, etc.


INTRODUCTION
Presently, real-time 3D visualization of complex dynamic virtual scenes [1,2] (with a frame generation rate of at least 25 frames per second) is required in many scientific experiments. This is especially important in domains where investigating an object entails its destruction and the cost of a new specimen is high. An example is the oil and gas industry, in particular, virtual experiments on unstable displacement of oil from porous media [3-5].
The informativeness and quality of virtual experiments can be significantly improved using stereo visualization of virtual scenes [6] in ultrahigh resolution (4K, Ultra HD). Such visualization in real time is a computationally costly task: it requires high-end graphics cards and specialized software, which hinders information sharing in the scientific community. An efficient solution is to write the visualization process of a virtual scene to a file as a video sequence of stereo pairs (a stereo clip). Stereo clips can be easily shared among researchers and replayed using stereo players [7] on personal computers and mobile devices. The difficulty is that, when external video capturing software [8] is used, the rendering of each frame is effectively suspended for the time needed to read and encode the image, which violates real-time image generation and causes stuttering in the video clip. This is especially noticeable when the researcher drives the visualization process, e.g., rotates or zooms the virtual scene, switches between viewpoints, etc. To reduce the time needed to capture images, hardware-based video encoders (NVIDIA NVENC [9]) have been developed in which compression algorithms are executed in parallel on multicore GPUs. Compared with encoding video on the CPU, this gives a significant speed increase when compressing ready-to-use video sequences [10]; however, in the visualization of complex dynamic virtual scenes, where the GPU is intensively used [4], a mutual deceleration of the encoding and visualization processes is observed. Hence, there is a need to develop technologies for synthesizing video clips in situ, i.e., directly within the visualization system. These technologies should be based on capturing the primary information (the dynamic visualization parameters of the virtual scene), since the amount of such data is significantly smaller than the pixel data of the synthesized images.
In this paper, we propose a technology of deferred synthesis of 4K stereo video clips based on the real-time construction of a scenario of the researcher-driven virtual scene visualization and on the offline transformation of this scenario into a stereo clip. The proposed solution is implemented in C++ using the open-source Qt software development toolkit, the OpenGL library, and the FFmpeg set of open-source libraries for processing digital video.

THE TECHNOLOGY OF DEFERRED STEREO CLIP SYNTHESIS
Suppose we have a visualization system capable of real-time rendering of virtual scenes in mono and stereo modes (side-by-side stereo pair [6]). Initially, a virtual scene is loaded into the system, and the mono mode is set. The proposed technology of synthesizing stereo clips consists of two phases: (a) capturing the visualization parameters and (b) generating the 4K stereo clip.
In the first phase, sets of dynamic parameters of the virtual scene visualization are read from the visualization system in real time (see Fig. 1). Given such a set, the state of the virtual scene at the time of the corresponding image synthesis can be unambiguously reconstructed. In this paper, the sequence of all captured sets of dynamic parameters, with identical adjacent sets removed, is called the visualization scenario. After the first phase has been completed, a file containing the generated scenario is formed.
In the second phase (4K stereo clip generation, see Fig. 2), the visualization system plays back the scenario file frame by frame; during this process, each 4K stereo pair is rendered into a special offscreen (invisible) frame buffer, and its downscaled copy, fitted to the main visualization window width, is placed into the onscreen (visible) frame buffer. Each generated 4K stereo pair is read from the offscreen buffer, encoded, and added to the stereo clip file, while the downscaled copy is rendered to the screen to enable visual control of the clip generation process. Depending on the selected video encoder and on the CPU and GPU performance, this process may run slower than real time.
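A minimal sketch of this offscreen rendering step, assuming Qt's QOpenGLFramebufferObject wrapper (the paper names only Qt and OpenGL; the scene-drawing call is hypothetical, and a current OpenGL context is required):

```cpp
#include <QOpenGLFramebufferObject>
#include <QImage>

// Render the side-by-side stereo pair into an invisible FBO and read it back;
// ws and hs are the width and height of the 4K frame in mono mode.
QImage renderStereoPairOffscreen(int ws, int hs)
{
    QOpenGLFramebufferObject fbo(2 * ws, hs,
        QOpenGLFramebufferObject::CombinedDepthStencil);
    fbo.bind();                      // subsequent GL draw calls go offscreen
    // drawSceneSideBySideStereo();  // hypothetical scene rendering call
    fbo.release();                   // return to the onscreen buffer
    return fbo.toImage();            // read the stereo pair into CPU memory
}
```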
Below, we consider the implementation of these phases using the visualization of the results of unstable oil displacement from porous media [6] as an example and, in particular, the visualization of the saturation isosurface of the displacing liquid.

Method for Constructing the Visualization Scenario
In the task under examination, the parameters of the visualization process are as follows:
• position and orientation of the isosurface model;
• parameters of the virtual camera;
• parameters of the light source;
• parameters of the isosurface model material;
• constant value of the saturation field (isolevel);
• the running number of the simulation step to be visualized.
To capture these parameters, a special visualization scenario file format (scr file) was developed, which includes a header (containing general information) and a data block containing the visualization parameters (see Fig. 3). The scr-file header contains the following data:
• the identifier scrId = 0x00726373 (the letters s, c, and r in ASCII);
• the number numFrames of frames in the scenario;
• the interval queryInterval between queries of the visualization parameters, in ms;
• the size scrSize of the data block, in bytes.
The scr-file data block stores the visualization parameters only for nonrepeated frames. If a frame repeats several times, then the visualization parameters are written only for the first frame, and the subsequent identical frames are simply counted. For this purpose, the «frame packet» structure is used. It consists of the following fields:
• the size blockSize of the packet data block, in bytes;
• the number numRepeats of repetitions of the current frame in succession;
• the packet data block with the values of the visualization parameters.
The packet data block consists of chunks. A chunk is a data structure that contains the object identifier id, the size size of the data field (in bytes), and the data field with the visualization parameter values (one parameter or a group of related parameters). The use of such structures makes it possible to easily add new visualization parameters to the scr format and to modify and delete them while maintaining backward compatibility (if the visualization system cannot recognize a chunk identifier, its data field is ignored).
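As an illustration, a minimal C++ sketch of the layouts described above (the paper specifies only the sizes and order of the fields; the exact integer types are our assumption):

```cpp
#include <cstdint>

// scr-file header
struct ScrHeader {
    uint32_t scrId;         // 0x00726373, the letters "scr" in ASCII
    uint32_t numFrames;     // number of frames in the scenario
    uint32_t queryInterval; // parameter query interval, ms
    uint32_t scrSize;       // size of the data block, bytes
};

// Frame packet header; followed by blockSize bytes of chunks
struct PacketHeader {
    uint32_t blockSize;     // size of the packet data block, bytes
    uint32_t numRepeats;    // repetitions of the current frame in succession
};

// Chunk header; followed by `size` bytes of parameter data
struct ChunkHeader {
    uint32_t id;            // visualization parameter identifier
    uint32_t size;          // size of the data field, bytes
};
```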
For the task under examination, the following types of chunk structures were implemented:
• SMVChunk contains the modelview matrix specifying the position and orientation of the isosurface model in the camera coordinate system (id = 0, size = 64, data contains the 16 elements of the matrix);
• SCamChunk determines the camera parameters (id = 1, size = 16, data determines the camera vertical field of view, the ratio of the frame width to its height (aspect), and the distances to the near and far clipping planes);
• SLightChunk determines the parameters of the directional light source (id = 2, size = 64, data determines the light source direction and the intensities of the ambient, diffuse, and specular components of illumination);
• SMaterialChunk determines the parameters of the isosurface model material (id = 3, size = 52, data determines the colors of the ambient, diffuse, and specular properties of the material and the shininess coefficient);
• SIsoLevelChunk determines the isolevel (id = 4, size = 4, data determines the constant value of the scalar saturation field of the displacing liquid);
• SStepChunk determines the currently visualized simulation step (id = 5, size = 4, data is the running number of the visualized simulation step).
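For instance, a hedged sketch of serializing one of the listed chunks (SIsoLevelChunk, id = 4, size = 4) into a packet byte array; the helper name and the use of QByteArray are our assumptions:

```cpp
#include <QByteArray>
#include <cstdint>

// Append an SIsoLevelChunk (id = 4, 4-byte float payload) to a packet being built.
void appendIsoLevelChunk(QByteArray& packet, float isoLevel)
{
    const uint32_t id = 4;
    const uint32_t size = sizeof(float);
    packet.append(reinterpret_cast<const char*>(&id), sizeof(id));
    packet.append(reinterpret_cast<const char*>(&size), sizeof(size));
    packet.append(reinterpret_cast<const char*>(&isoLevel), size);
}
```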
The visualization parameters are captured with a query frequency ν1 (times per second), where 25 < ν1 ≤ ν2 and ν2 is the lowest frequency of virtual scene image generation, which depends on the computational power of the graphics processor. During each query, the values of all visualization parameters are read and compared with the values obtained in the preceding query. If the current value of a visualization parameter repeats the preceding one, it is not written to the scenario file. Figure 4 shows the querying scheme using as an example the modelview matrix of the isosurface model, which is calculated on the GPU for each frame. Note that, in more complicated multiobject virtual scenes, the positions and orientations of virtual objects may be written in terms of quaternions in order to reduce the size of the scenario file.
In this work, the visualization parameter querying is implemented using a timer (the QTimer class from the Qt library). The timer's periodic timeout signal is connected, via the Qt connect function, to the function that forms the frame packet (Algorithm 1). The frame packets produced by this function are accumulated in the dynamic byte array V, as sketched below.
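A minimal sketch of this timer-driven capture (the parameter-serialization helper is hypothetical; the packet layout follows the scr format above, and the numRepeats of a finished packet is written when the next distinct packet arrives):

```cpp
#include <QTimer>
#include <QByteArray>
#include <QObject>

QByteArray readCurrentParametersAsChunks(); // hypothetical: serializes the
                                            // current parameter values as chunks

QByteArray V;            // dynamic byte array accumulating frame packets
QByteArray lastPacket;   // chunk data of the previously captured frame
quint32 numRepeats = 0;  // repetitions of the previous frame in succession

void startCapture(QTimer& timer, int queryInterval)
{
    QObject::connect(&timer, &QTimer::timeout, [] {
        QByteArray packet = readCurrentParametersAsChunks();
        if (packet == lastPacket) {   // identical adjacent frame:
            ++numRepeats;             // count it, do not store it
            return;
        }
        if (!lastPacket.isEmpty()) {  // flush the finished packet to V
            quint32 blockSize = quint32(lastPacket.size());
            V.append(reinterpret_cast<const char*>(&blockSize), sizeof(blockSize));
            V.append(reinterpret_cast<const char*>(&numRepeats), sizeof(numRepeats));
            V.append(lastPacket);
        }
        lastPacket = packet;
        numRepeats = 0;
    });
    timer.start(queryInterval);       // e.g., 10 ms for a 100 Hz query rate
}
```

When capture stops, the last pending packet is flushed in the same way; the scr header (with numFrames, queryInterval, and scrSize) is then written to the file, followed by the contents of V.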
To view the captured visualization process, a scenario playback tool was developed. As in Algorithm 1, the playback mode is implemented using a timer whose periodic timeout signal is connected to the developed scenario playback function. Before playback, all packets are loaded from the scenario file into a byte array K of size scrSize (taken from the scr-file header). The scenario playback is performed by Algorithm 2:
1. packetsCnt = 0, frameCnt = 0, gotFrame = false.
2. Loop while packetsCnt < scrSize (one iteration per timer tick):
3. If gotFrame ≠ true, then:
3.1. Read blockSize and numRepeats of the current packet from array K; curSize = 0.
3.2. Loop on the chunks in the packet while curSize ≠ blockSize: read id and size of the current chunk; read the value of the visualization parameter (identified by its id) from the data field of the current chunk; update the value of this parameter in the visualization system; curSize = curSize + CHUNK_HEADER + size. End of loop.
3.3. packetsCnt = packetsCnt + PACKET_HEADER + blockSize.
3.4. If numRepeats ≠ 0, then gotFrame = true.
4. Otherwise, frameCnt = frameCnt + 1; if frameCnt == numRepeats, then gotFrame = false and frameCnt = 0.
5. Visualize the scene with the new parameters.
End of loop.
Algorithm 2. Visualization scenario playback.
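A corresponding C++ sketch of one playback step, executed on each timer tick (the parameter-applying and rendering helpers are hypothetical):

```cpp
#include <QByteArray>
#include <cstring>

constexpr quint32 PACKET_HEADER = 8;  // blockSize + numRepeats
constexpr quint32 CHUNK_HEADER  = 8;  // id + size

void applyParameter(quint32 id, const char* data, quint32 size); // hypothetical
void renderScene();                                              // hypothetical

bool gotFrame = false;
quint32 packetsCnt = 0, frameCnt = 0, numRepeats = 0;

void playbackStep(const QByteArray& K)
{
    if (!gotFrame) {
        const char* p = K.constData() + packetsCnt;
        quint32 blockSize;
        std::memcpy(&blockSize,  p,     sizeof(blockSize));
        std::memcpy(&numRepeats, p + 4, sizeof(numRepeats));
        p += PACKET_HEADER;
        quint32 curSize = 0;
        while (curSize != blockSize) {                // loop on the chunks
            quint32 id, size;
            std::memcpy(&id,   p + curSize,     sizeof(id));
            std::memcpy(&size, p + curSize + 4, sizeof(size));
            applyParameter(id, p + curSize + CHUNK_HEADER, size);
            curSize += CHUNK_HEADER + size;
        }
        packetsCnt += PACKET_HEADER + blockSize;
        if (numRepeats != 0)
            gotFrame = true;                          // replay this frame
    } else if (++frameCnt == numRepeats) {
        gotFrame = false;                             // repeats exhausted
        frameCnt = 0;
    }
    renderScene();  // visualize the scene with the (possibly updated) parameters
}
```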

Method for Transforming the Scenario into a 4K Stereo Clip
To create a 4K stereo clip, an efficient media container and a video encoding algorithm that support large amounts of video data (about 16 million pixels per frame) are required. In this work, the MP4 container is used, which is part of the international MPEG-4 standard (MPEG-4 Part 14, ISO/IEC 14496-14:2003) for compressing digital audio and video. Table 1 compares MP4 with the other widespread containers AVI (Microsoft), MOV (Apple), and MKV (open source). It is seen from this table that MP4 outperforms AVI and MKV in functional capabilities, in particular due to its support of B-frames (which make it possible to considerably reduce the size of the encoded video sequence). Another important advantage of MP4 is its support of the edit-in-place technology, which allows one to modify certain parts of the video (e.g., cut and paste) without recompressing the entire video sequence. The capabilities of MOV are close to those of MP4; however, MOV is primarily designed for Apple operating systems, while MP4 is an industry standard with wider support. Another significant feature of MP4 is its support of online broadcasting.
The MP4 container supports a number of modern video codecs, among which H.264 was selected. This codec is also part of the international MPEG-4 standard (MPEG-4 Part 10); even though its compression ratio is lower than that of the H.265/HEVC and AV1 codecs, it is the most widespread because it has been in the industry for many years; during this time, the standard has been updated, has gained hardware decoding in the majority of players, and has proved to be effective. A feature of H.264 is that it works with macroblocks (groups of pixels) of sizes from 16 × 16 down to 4 × 4 [11], which imposes certain requirements on the width and height of the video to be encoded. In the task under examination, the dimensions of the 4K stereo pair are multiples of 16 (7680 × 2160 pixels).
In this work, the recording of stereo pairs into the stereo clip was implemented using the FFmpeg (Fast Forward MPEG) set of open-source libraries [12]. We used the following libraries from this set: libavcodec (audio and video encoders and decoders), libavformat (muxers and demuxers of media containers), libswscale (image scaling functions and converters of color spaces and pixel formats), and libavutil (random number generators, mathematical and multimedia utilities, etc.). The H.264 codec has a large number of settings that make it possible to control the quality and performance of video sequence encoding, among them the bitrate, the number of frames in each group of pictures (GOP) within the coded video stream, the motion estimation complexity, the video stream compression ratio, the video codec profile, and others. In our work, these parameters are specified in the AVCodecContext structure of the FFmpeg library.
To begin encoding a video with FFmpeg, a number of preliminary actions must be performed: specify the output video format (AVOutputFormat), create the input-output context for writing the video (AVFormatContext), create the video stream (AVStream), connect the stream to the H.264 encoder (AVCodec), and initialize the AVFrame, SwsContext, and AVPicture structures for processing video frames and the AVPacket structure for adding frames to the MP4 container. A more detailed description of the implementation of these stages can be found in [12].
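A hedged sketch of these preliminary steps using current FFmpeg API names (the paper relies on the older url_fopen/avcodec_encode_video interface; the bitrate and GOP values below are illustrative assumptions, and error checks are omitted):

```cpp
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}

// Global state kept for brevity in this sketch.
AVFormatContext* fmt = nullptr;
AVCodecContext*  enc = nullptr;
AVStream*        stream = nullptr;

void openStereoClip(const char* path)
{
    avformat_alloc_output_context2(&fmt, nullptr, "mp4", path);

    const AVCodec* codec = avcodec_find_encoder(AV_CODEC_ID_H264);
    stream = avformat_new_stream(fmt, nullptr);

    enc = avcodec_alloc_context3(codec);
    enc->width     = 7680;               // side-by-side 4K stereo pair,
    enc->height    = 2160;               // both dimensions multiples of 16
    enc->pix_fmt   = AV_PIX_FMT_YUV420P;
    enc->time_base = AVRational{1, 25};  // 25 frames per second
    enc->bit_rate  = 40000000;           // assumed bitrate, 40 Mbit/s
    enc->gop_size  = 25;                 // assumed GOP length
    if (fmt->oformat->flags & AVFMT_GLOBALHEADER)
        enc->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
    avcodec_open2(enc, codec, nullptr);
    avcodec_parameters_from_context(stream->codecpar, enc);

    avio_open(&fmt->pb, path, AVIO_FLAG_WRITE);  // modern analog of url_fopen
    avformat_write_header(fmt, nullptr);
}

void closeStereoClip()
{
    av_write_trailer(fmt);
    avio_closep(&fmt->pb);               // modern analog of url_fclose
}
```

The bitrate and GOP length are chosen according to the desired quality/size trade-off; they are not prescribed by the method itself.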
The process of writing a 4K stereo clip is a modified playback of the scenario file (see Algorithm 2) in which the virtual scene is visualized in the side-by-side stereo format [6]. As in Algorithm 2, the input data is the array K of frame packets loaded from the scenario file. The transformation of the scenario into a stereo clip is implemented by Algorithm 3.
1. Create an offscreen frame buffer F of size (2w_s) × h_s pixels, where w_s and h_s are the width and height of the 4K frame in the mono mode.
2. Create a buffer I (of type QImage from the Qt library) of size (2w_s) × h_s pixels for storing the stereo pair in memory.
3. Specify a rectangle P0P1P2P3 fitted to the width of the main visualization window of size w_f × h_f:
3.1. Calculate the height h_i of the rectangle P0P1P2P3 in the normalized device coordinate system (NDCS): h_i = 2a_f / a_s, where a_s = 2w_s / h_s is the aspect ratio of the 4K stereo pair and a_f = w_f / h_f is the aspect ratio of the main visualization window.
3.2. Write the coordinates of the P0P1P2P3 vertices in the NDCS: P0 = (-1, 0.5h_i), P1 = (-1, -0.5h_i), P2 = (1, 0.5h_i), P3 = (1, -0.5h_i).
4. Open the output video file using the FFmpeg function url_fopen.
5. Loop on i from 1 to numFrames:
5.1. Read the ith frame packet from array K and update the values of the parameters in the visualization system (see items 3.1 and 3.2 in Algorithm 2).
5.2. Synthesize the texture S with the stereo pair image: activate the offscreen frame buffer F; visualize the virtual scene in the side-by-side stereo mode; deactivate the offscreen frame buffer F.
5.3. Add the stereo pair to the MP4 container: unload the texture S from video memory into the stereo pair buffer I; convert the RGB stereo pair (buffer I) to a YUV image (the AVFrame structure) using the FFmpeg function sws_scale, where Y is the luma component and U and V are the chroma (color difference) components; encode the YUV image into an H.264 byte array using the FFmpeg function avcodec_encode_video; add the H.264 data array to the compressed frame structure AVPacket (container MP4).
5.4. Render the stereo pair in the main visualization window: set the viewport using glViewport(0, 0, w_f, h_f); visualize the rectangle P0P1P2P3 with the superimposed texture S.
End of loop.
6. Close the output video file using the function url_fclose.
Algorithm 3. Transforming the scenario to a stereo clip.
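For reference, a hedged sketch of item 5.3 using current FFmpeg API names (avcodec_send_frame/avcodec_receive_packet supersede the avcodec_encode_video call used above; img is the RGBA stereo pair read back into buffer I; fmt, stream, and enc are the structures from the setup sketch):

```cpp
extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include <libswscale/swscale.h>
}
#include <QImage>

void encodeStereoPair(AVFormatContext* fmt, AVStream* stream,
                      AVCodecContext* enc, const QImage& img, int64_t pts)
{
    // Allocate the destination YUV frame
    AVFrame* yuv = av_frame_alloc();
    yuv->format = enc->pix_fmt;
    yuv->width  = enc->width;
    yuv->height = enc->height;
    av_frame_get_buffer(yuv, 0);

    // RGB -> YUV conversion with sws_scale (Y: luma; U, V: chroma)
    SwsContext* sws = sws_getContext(enc->width, enc->height, AV_PIX_FMT_RGBA,
                                     enc->width, enc->height, enc->pix_fmt,
                                     SWS_BILINEAR, nullptr, nullptr, nullptr);
    const uint8_t* src[1]  = { img.constBits() };
    const int srcStride[1] = { int(img.bytesPerLine()) };
    sws_scale(sws, src, srcStride, 0, enc->height, yuv->data, yuv->linesize);
    yuv->pts = pts;

    // Encode and write the compressed packet(s) to the MP4 container
    avcodec_send_frame(enc, yuv);
    AVPacket* pkt = av_packet_alloc();
    while (avcodec_receive_packet(enc, pkt) == 0) {
        av_packet_rescale_ts(pkt, enc->time_base, stream->time_base);
        pkt->stream_index = stream->index;
        av_interleaved_write_frame(fmt, pkt);
        av_packet_unref(pkt);
    }
    av_packet_free(&pkt);
    sws_freeContext(sws);
    av_frame_free(&yuv);
}
```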
Algorithm 3 produces an MP4 file containing the 4K stereo clip in the basic side-by-side stereo format, which can be played back by means of polarized stereoscopic equipment using a stereo player (e.g., Stereoscopic Player) and converted (in the same player) to other popular stereo formats (anaglyph, interlaced, etc.).

RESULTS
The technology and methods of deferred synthesis of 4K stereo clips were implemented in software for visualizing the results of simulation of unstable displacement of oil from porous media [13]. Using this software, we investigated the evolution of the instability of oil displacement by water and, in particular, the changes in the shape of the isosurface of the displacing liquid. The input data consist of a sequence of 65 3D arrays of saturation values of the displacing liquid obtained by the step-by-step simulation of unstable oil displacement on a grid of 100^3 cells. For each of the 65 simulation steps, we constructed and visualized a 3D polygonal isosurface model. The study included dynamic changes in the orientation and scaling of the isosurface model and its reconstruction for various constant saturation values. The visualization of the isosurface model was carried out at Ultra HD 4K resolution (3840 × 2160) on a personal computer (Intel Core i7 950, 3.06 GHz, 12 GB RAM, NVIDIA GeForce GTX 1080 Ti with 11 GB VRAM and 3584 cores). The average visualization framerate was about 100 frames per second.
A part of the study lasting about two minutes was chosen, for which a visualization scenario with a query interval of 10 ms was recorded. The time cost of capturing the dynamic visualization parameters was extremely low (less than 1 ms per frame) and did not affect the visualization framerate. The size of the resulting scr scenario file was about 370 KB. Based on the created scenario file, we synthesized a 4K stereo clip that demonstrates the changes in the shape of the isosurface of the displacing liquid. Figure 5 shows the process of synthesizing the 4K stereo clip in the visualization system. Figure 6 shows the playback of the resulting 4K stereo clip in the Stereoscopic Player in the anaglyph stereo mode.

CONCLUSIONS
We considered the task of capturing the stereo visualization of a complex dynamic virtual scene into a 4K stereo clip. The use of external video capturing programs is problematic because, as each frame is visualized, a significant amount of time is spent on reading and encoding the stereo pair, which results in stuttering in the stereo clip. For this reason, it is reasonable to synthesize stereo clips directly within the visualization system (in situ) based on capturing the values of the dynamic visualization parameters.
In this paper, a technology for the deferred synthesis of video clips was proposed that makes it possible to write 4K stereo clips without interfering with real-time visualization. The technology includes the phase of capturing all sets of dynamic visualization parameters that are not repeated in succession (the scenario) and the phase of transforming this scenario into a stereo clip, which can be done offline. Methods for implementing these phases are proposed using the stereo visualization of the saturation isosurface of the displacing liquid as an example. The implementation includes an original «scr» scenario file format developed by us. This format uses chunk data structures, which make it possible to effectively extend and modify the format while maintaining backward compatibility.
Efficient algorithms for constructing and playing back scenarios in the visualization system are described; in these algorithms, successive identical packets of visualization parameter values are counted rather than recorded. The phase of transforming the scenario file into a 4K stereo clip is implemented using offscreen rendering of the virtual scene and the FFmpeg set of open-source libraries for processing digital video. A number of media containers are compared, and the MP4 container, which outperforms the other containers in several respects, is chosen together with the video compression standard H.264. The paper describes an algorithm for transforming the scenario into a stereo clip that provides visual control of the stereo clip synthesis process.
The solution proposed in this paper was implemented in the system for visualizing the simulation results of unstable displacement of oil from porous media. A 4K stereo clip was created that demonstrates the changes in the shape of the saturation isosurface of the displacing liquid during the evolution of the unstable oil displacement process. The results confirmed the validity of the proposed solution and its usefulness for virtual laboratories, virtual environment systems, scientific visualization, and other purposes.

FUNDING
This publication was prepared within the state task for carrying out basic scientific research (GP 14) on topic (project) "34.9. Virtual environment systems: technologies, methods and algorithms of mathematical modeling and visualization" (0580-2021-0012).