Sunday 29 December 2013

Timeout Detection and Recovery (Stop 0x116) Internals

Stop 0x116 and Stop 0x117 are largely the same bugcheck. There is also Stop 0x119, which relates to problems with the video scheduler. However, this blog post looks at the internals of Timeout Detection and Recovery, explaining what the recovery process is and how it may lead to a Stop 0x116 or Stop 0x117 bugcheck.

With Windows Vista, Microsoft introduced a feature called TDR (Timeout Detection and Recovery) which, as the name suggests, enables drivers to recover from hardware timeouts instead of the system crashing completely.

The GPU Scheduler first detects that a graphics task is taking longer than it should, and attempts to preempt the task and its associated thread. If the GPU Scheduler is unable to complete or preempt the task within the TDR timeout period, then the GPU is considered to be frozen, and preparation for the recovery process begins.
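The detection step above can be sketched as a small decision function. This is only a model: the type and function names are hypothetical and are not part of the WDDM interface, and the 2 second figure is assumed from the documented default of the TdrDelay registry value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed default timeout, mirroring the TdrDelay default of 2 seconds. */
#define TDR_DELAY_MS_DEFAULT 2000u

/* Hypothetical states for a task that the scheduler has asked to preempt. */
typedef enum {
    GPU_PREEMPT_PENDING, /* request issued, still inside the timeout window */
    GPU_PREEMPT_DONE,    /* task completed or was preempted in time */
    GPU_FROZEN           /* timeout expired: treat the GPU as hung */
} gpu_preempt_state;

/* Decide what the scheduler should conclude about a preemption request,
   given whether it finished and how long the scheduler has waited. */
gpu_preempt_state tdr_classify_preemption(bool finished,
                                          uint32_t waited_ms,
                                          uint32_t tdr_delay_ms)
{
    if (finished)
        return GPU_PREEMPT_DONE;
    if (waited_ms < tdr_delay_ms)
        return GPU_PREEMPT_PENDING;
    return GPU_FROZEN;
}
```

Only the GPU_FROZEN result leads on to the reset stage described next.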


The GPU Scheduler then calls DxgkDdiResetFromTimeout, which informs the graphics driver that the operating system has detected a timeout and that the GPU needs to be reset. The routine also stops the graphics driver from accessing any form of memory. The driver's threads are serialized for the duration of the call, so no other driver threads run at the same time as the DxgkDdiResetFromTimeout routine. Furthermore, access to the frame buffer is not permitted and the PLL is also set for the memory controller. The PLL, or Phase Locked Loop, is used to synchronize digital clock signals for data transfers. A frame buffer, on the other hand, is a region of Video RAM (VRAM) which stores a bitmap of pixels (forming an entire image) to be sent to the monitor for output. The KeSynchronizeExecution routine may be called to synchronize the graphics reset routines with the driver's interrupt service routine (ISR).

If the DxgkDdiResetFromTimeout routine fails, then the system will bugcheck with a Stop 0x116. Otherwise, the recovery stage starts and the graphics stack is reset.
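As a rough illustration of that branch, the sketch below models the reset call as a function pointer and maps its result either to recovery or to a Stop 0x116. Every name here (status_t, tdr_outcome, handle_gpu_timeout and the two stub reset routines) is a hypothetical stand-in, not the real driver interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STOP_VIDEO_TDR_FAILURE 0x116u /* bugcheck code discussed in this post */

typedef int32_t status_t;             /* stand-in for NTSTATUS */
#define STATUS_OK 0                   /* stand-in for STATUS_SUCCESS */

/* Stand-in for the driver's reset-from-timeout entry point. */
typedef status_t (*reset_fn)(void *adapter);

typedef struct {
    int      recovered; /* 1 if the graphics stack proceeds to recovery */
    uint32_t bugcheck;  /* 0 when no bugcheck is raised */
} tdr_outcome;

/* Call the reset routine and translate its result into an outcome. */
tdr_outcome handle_gpu_timeout(reset_fn reset, void *adapter)
{
    tdr_outcome out = { 0, 0 };
    assert(reset != NULL);
    if (reset(adapter) == STATUS_OK)
        out.recovered = 1;
    else
        out.bugcheck = STOP_VIDEO_TDR_FAILURE;
    return out;
}

/* Stub reset routines used only to exercise both branches. */
status_t stub_reset_ok(void *adapter)   { (void)adapter; return STATUS_OK; }
status_t stub_reset_fail(void *adapter) { (void)adapter; return -1; }
```

Calling handle_gpu_timeout(stub_reset_fail, NULL) yields a bugcheck value of 0x116, while the successful stub leaves bugcheck at 0 and sets recovered.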


Once the DxgkDdiResetFromTimeout routine has returned STATUS_SUCCESS, the operating system begins to clean up any resources which are no longer in use. Other driver routines may be called here, which I will explain below.

For example, let's begin with the DxgkDdiBuildPagingBuffer routine. This routine is used if an allocation was paged into a memory segment. A short explanation of memory segments and video memory will help to explain this routine.

Memory segments are used by the miniport driver to describe the GPU's address space to the Video Memory Manager. They are generally used to organize video memory resources. The driver reports a list of supported segment types through the DxgkDdiQueryAdapterInfo routine, and then describes each segment with the DXGK_SEGMENTDESCRIPTOR data structure.

When the Video Memory Manager wishes to place a video resource in a memory segment, the driver indicates (by segment identifier) which segment is most suitable for the resource being requested. An allocation is created with the DxgkDdiCreateAllocation routine, and each allocation is then described with the DXGK_ALLOCATIONINFO data structure.
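To make the idea of matching an allocation to a segment by identifier more concrete, here is a heavily simplified sketch. The structures are hypothetical and far smaller than the real DXGK_SEGMENTDESCRIPTOR and DXGK_ALLOCATIONINFO, the first-fit policy is an assumption made for illustration, and an identifier of 0 is used here to mean that no segment fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down description of one memory segment. */
typedef struct {
    uint32_t id;          /* segment identifier, starting at 1 */
    uint64_t size;        /* capacity in bytes */
    int      cpu_visible; /* non-zero if the CPU can access the segment */
} segment_desc;

/* Hypothetical, cut-down description of a requested allocation. */
typedef struct {
    uint64_t size;             /* bytes required */
    int      needs_cpu_access; /* non-zero if the CPU must reach it */
} allocation_req;

/* Example segment list: a local VRAM segment and a CPU-visible aperture. */
const segment_desc demo_segments[2] = {
    { 1u, 256u * 1024u * 1024u, 0 },
    { 2u,  64u * 1024u * 1024u, 1 },
};

/* Return the identifier of the first segment that satisfies the request,
   or 0 if none does. */
uint32_t pick_segment(const segment_desc *segs, size_t count,
                      const allocation_req *req)
{
    assert(segs != NULL && req != NULL);
    for (size_t i = 0; i < count; ++i) {
        if (segs[i].size < req->size)
            continue;
        if (req->needs_cpu_access && !segs[i].cpu_visible)
            continue;
        return segs[i].id;
    }
    return 0u;
}
```

With the demo list, a small allocation that needs CPU access lands in segment 2, while one that does not is placed in segment 1.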


The information above should be enough to understand the DxgkDdiBuildPagingBuffer routine and its role in releasing allocations. When the routine is called after a reset, a paging buffer is built, which is a DMA buffer for use by the GPU.


In this situation, the paging buffer is built for a transfer operation, so the Operation member of the DXGKARG_BUILDPAGINGBUFFER data structure is set to DXGK_OPERATION_TRANSFER, which moves the content of one allocation to another.


The Transfer.Size member is set to 0, since the content would have been lost during the reset.

On the other hand, if the memory segment is an aperture segment (physical address space assigned to an external device), then the Operation member of DXGKARG_BUILDPAGINGBUFFER is set to DXGK_OPERATION_UNMAP_APERTURE_SEGMENT, which unmaps the allocation from the aperture.
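The choice between the two operations can be summarised in a few lines. This is a sketch with hypothetical types: the enum values only echo the names of the real DXGK_OPERATION_TRANSFER and DXGK_OPERATION_UNMAP_APERTURE_SEGMENT constants, and paging_request is not the real DXGKARG_BUILDPAGINGBUFFER layout.

```c
#include <stdint.h>

/* Hypothetical mirror of the two operations discussed above. */
typedef enum {
    OP_TRANSFER,
    OP_UNMAP_APERTURE_SEGMENT
} paging_op;

/* Hypothetical classification of the segment holding the allocation. */
typedef enum {
    SEGMENT_MEMORY,
    SEGMENT_APERTURE
} segment_kind;

typedef struct {
    paging_op operation;
    uint64_t  transfer_size; /* bytes to move; only meaningful for transfers */
} paging_request;

/* Build the request issued for an allocation after a GPU reset. */
paging_request build_post_reset_request(segment_kind kind)
{
    paging_request req;
    /* The size is zero because the content was lost during the reset. */
    req.transfer_size = 0;
    req.operation = (kind == SEGMENT_APERTURE) ? OP_UNMAP_APERTURE_SEGMENT
                                               : OP_TRANSFER;
    return req;
}
```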


Additional Reading - Linear Aperture Address Space Segments

The DxgkDdiReleaseSwizzlingRange routine may be called to release a swizzling range for a CPU-based aperture memory segment. The DXGKARG_RELEASESWIZZLINGRANGE data structure stores information about the swizzling range being released. The DxgkDdiAcquireSwizzlingRange routine is used to create a swizzling range.

Swizzling in computer graphics commonly means reorganising data, such as the components of a vector or the layout of a texture in memory, so that the GPU can access it more efficiently.
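One simple way to picture this kind of reorganisation is a tiled surface layout, where nearby pixels are stored close together in memory. The sketch below compares a plain row-major offset with a tiled one. The 4x4 tile size and the function names are assumptions chosen for illustration; real GPU layouts are vendor-specific and usually more elaborate.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 4u /* assumed tile edge length, in pixels */

/* Row-major offset of pixel (x, y) in a surface `width` pixels wide. */
size_t linear_offset(size_t x, size_t y, size_t width)
{
    return y * width + x;
}

/* Offset of pixel (x, y) when the surface is split into TILE x TILE tiles
   that are stored one after another, row-major inside each tile.
   `width` must be a multiple of TILE. */
size_t tiled_offset(size_t x, size_t y, size_t width)
{
    assert(width % TILE == 0);
    size_t tiles_per_row = width / TILE;
    size_t tile_index    = (y / TILE) * tiles_per_row + (x / TILE);
    size_t within_tile   = (y % TILE) * TILE + (x % TILE);
    return tile_index * (TILE * TILE) + within_tile;
}
```

For an 8 pixel wide surface, pixel (1, 1) sits at offset 9 in the linear layout but at offset 5 in the tiled one, so the pixels of a small block end up much closer together.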

Additional Reading - What is Swizzling?


