No. | Slide | Text |
1 |
 |
Behind the Scenes: Optimizing FarCry 3, Left 4 Dead 2 and Assassins Creed 3 using AMD’s GPU PerfStudio2. Presenters: Gordon Selley (AMD, Senior Member of Technical Staff, GPU Developer Tools); Jean-Francois St-Amour (Ubisoft Montreal, Lead Graphics Programmer); Rich Geldreich (Valve Corporation, Software Engineer) |
2 |
 |
Behind the Scenes: Optimizing FarCry 3, Left 4 Dead 2 and Assassins Creed 3 using AMD’s GPU PerfStudio2. GPU PerfStudio2 is AMD's performance and debugging tool for graphics applications. Many game studios currently use it to optimize games for PC-based platforms. It was recently used in the development of FarCry3 and Assassins Creed 3 at Ubisoft, and we will demonstrate how it was used to optimize their DirectX® 11 renderers. In addition, with industry support for OpenGL on the rise, Valve will demonstrate how GPU PerfStudio2 is being used to port and optimize existing games such as Left 4 Dead 2 on OpenGL. |
3 |
 |
GPU PerfStudio 2 | The Presenters. Gordon Selley is a member of AMD’s GPU Developer Tools Team and is the team leader for GPU PerfStudio2. He has developed graphics software for the flight simulation, TV and film FX, PC and online games industries. Jean-Francois St-Amour is currently working as a Lead Graphics Programmer at Ubisoft's Montreal studio. In his time at Ubisoft, he has worked on the graphics technology behind Assassin's Creed and Prince of Persia. Before working on current console hardware, Jean-Francois was working on mobile titles at Gameloft. He has a Master's degree in Computer Graphics from the University of Montreal, where he worked on real-time soft shadow algorithms, as well as GPGPU. Rich Geldreich is a software engineer at Valve Corporation, where he's spent the last N years shipping out labors of love: Portal 2, Defense of the Ancients 2, Counter Strike: Global Offensive, Team Fortress 2. When not at work, Rich works on open source image compression libraries and orders chicken instead of delicious steak. |
4 |
 |
GPU PerfStudio 2 | Presentation Overview. Introduction to GPU PerfStudio 2: what it is and what it does, who uses it, how it works. Assassins Creed 3, Understanding your frame: how and why the tool was used during development; working with DirectX11 command lists; using the Frame Debugger with a deferred renderer; using Direct3D performance markers with your effects. Far Cry 3, Data mine your app using GPU PerfStudio2: understanding the client/server command system; using a web browser to access game data; using a script to automate access to game data. Left 4 Dead 2, How Valve ported Steam to Linux using GPU PerfStudio2: overview of the Win32 to Linux conversion process; why and how GPU PerfStudio2 was used; troubleshooting GPU PerfStudio2 with OpenGL apps. Questions. |
5 |
 |
Introduction to GPU PerfStudio 2 |
6 |
 |
GPU PerfStudio 2 | Introduction. What is GPU PerfStudio 2? GPU PerfStudio 2 is AMD’s current GPU performance analysis and debugging tool for graphics applications. Key features: integrated Frame Profiler, Frame Debugger, and API Trace with CPU timing information; Shader Debugger with support for DirectX® 10 and 11 HLSL and assembly code; client/server model, in which the GPU PerfStudio 2 Client runs locally or remotely over the network; the GPU PerfStudio 2 Server supports 32-bit and 64-bit applications; supports DirectX® 11, DirectX® 10.1, DirectX® 10 and OpenGL 4.2 applications; no special build required for your application; small footprint, no installation; free download: http://developer.amd.com/tools/graphics-development/gpu-perfstudio-2/ |
7 |
 |
GPU PerfStudio 2 | Introduction: The 4 Main Tools. Frame Debugger: capture and play back a single frame; view all game resources and state bound at each draw call. Frame Profiler: identify costly draw calls; detect GPU bottlenecks and investigate at the counter level. Shader Debugger/Editor: edit and debug HLSL and assembly code from inside your app; step, insert breakpoints, inspect all register values; edit GLSL from inside your app. API Trace Viewer: inspect all API calls (with arguments); visualize multi-threaded API usage; CPU timeline information for each API call. |
8 |
 |
GPU PerfStudio 2 | Who uses it. Widely used by internal and external groups. AMD ISV Game Engineers: optimize and debug co-marketed game titles in conjunction with developers. AMD Driver Performance Team: improve GPU benchmarks and titles at the driver level. AMD Driver Team: inspect apps that cause driver problems. Graphics developers: used in the development of DX11 and OpenGL graphics applications. AMD Game Compute Team: debug and optimize game technologies for new GPU hardware (AMD Mecha Demo, Ladybug, Leo demo). |
9 |
 |
GPU PerfStudio 2 | Remote and local debug sessions. Client/server architecture. Remote usage allows the game to be run full screen: higher profiling accuracy, useful during final optimization. Local usage on a single machine (preferred by developers): more suited for use during development. |
10 |
 |
GPU PerfStudio 2 | How it works |
11 |
 |
GPU PerfStudio 2 | How can the architecture affect the developer? GPU PerfStudio2 shares the application’s memory space: the game application and GPU PerfStudio2 contend for the remaining free memory; the remaining free memory needs to support the tool features; memory is required for extraction and transfer of textures and buffers. 32-bit games: larger games near the 32-bit threshold can be unstable due to memory allocation failures; this depends on the use of specific features (Frame Debugger, Profiler, API Trace); CPU-side buffer copies (a client setting for the Frame Debugger) can use a lot of memory; memory failures can occur in the game or in GPU PerfStudio2; few games to date actually have problems in this area. 64-bit games: much less likely to be unstable due to memory contention. |
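To get a feel for the memory pressure described above, the following sketch estimates the size of uncompressed CPU-side copies of a set of render targets. The resolution, pixel size, target count and the 2 GiB figure (the default user address space of a 32-bit Windows process) are illustrative assumptions, not numbers taken from GPU PerfStudio2 or from any of the games discussed.

```python
from typing import Sequence

# Rough estimate of CPU-side copy sizes for uncompressed 2D surfaces.
# All sizes below are illustrative assumptions, not measured tool behaviour.

MIB = 1024 * 1024
GIB = 1024 * MIB


def surface_bytes(width: int, height: int, bytes_per_pixel: int, samples: int = 1) -> int:
    """Size of one uncompressed surface, ignoring padding and mip levels."""
    return width * height * bytes_per_pixel * samples


def copies_fit(free_bytes: int, surfaces: Sequence[int]) -> bool:
    """True if CPU-side copies of all surfaces fit in the remaining free memory."""
    return sum(surfaces) <= free_bytes


if __name__ == "__main__":
    # Hypothetical deferred renderer: five 1080p surfaces at 4 bytes per pixel.
    targets = [surface_bytes(1920, 1080, 4) for _ in range(5)]
    print(f"one target : {targets[0] / MIB:.1f} MiB")
    print(f"all targets: {sum(targets) / MIB:.1f} MiB")
    # Hypothetical 32-bit game that already uses 1.9 GiB of a 2 GiB address space.
    free = 2 * GIB - int(1.9 * GIB)
    print("copies fit :", copies_fit(free, targets))
```

Multiplying the per-target size by an MSAA sample count (the `samples` parameter) shows how quickly the headroom of a 32-bit process can disappear.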
12 |
 |
Assassins Creed III: Understanding your frame |
13 |
 |
GPU PerfStudio 2 | Assassins Creed III. Developed using the Anvil Next rendering engine: based on the previous Anvil and Scimitar engines; same engine lineage as previous AC games, starting with AC1; first AC game to use deferred rendering; first DX11 Scimitar title. |
14 |
 |
GPU PerfStudio 2 | Assassins Creed III. Anvil Next: new rendering effects. Weather: volumetric mist, falling snow/rain, surface wetness/snow, deformable snow. Bulk crowds: thousands of NPCs for large-scale battles. Lighting improvements: new static lighting technique, new large-scale static AO. Ocean simulation. DX11: new hexagonal filtering method, new-ish AO tech (combining MSSAO + low-res HBAO). |
15 |
 |
GPU PerfStudio 2 | Assassins Creed III. How did Ubisoft use GPU PerfStudio2 with Assassins Creed III? It was used by all 5 of the graphics programmers. Quick startup of the tool with the app was important for a fast edit/test cycle. It was used for cross-platform development. Most of the work was done using the Frame Debugger: to understand the frame structure, to identify slow draw calls, and to identify and then optimize shaders. The profiler was used towards the end of the dev cycle. Stability: “This is really the most important thing with this type of tool. The reality is unfortunately that we often need to cycle PIX/GPA/GPS2/Nsight to find one that works for our particular scene. GPS2 works the most often by far.” |
16 |
 |
Assassins Creed III GPU PerfStudio2 Demo |
17 |
 |
FarCry 3: Data mine your app using GPU PerfStudio2 |
18 |
 |
GPU PerfStudio 2 | Far Cry 3. Developed using the Dunia 2 engine: the Dunia engine was originally used in FarCry2; Dunia 2 was switched to a deferred renderer in the middle of FC3 development. 4 out of 10 3D programmers used AMD GPUs: 2 out of the 4 programmers used GPU PerfStudio2; the remaining 2 programmers developed for console. Highly optimized for Xbox 360 and PS3 (SPU); optimized later for PC. |
19 |
 |
GPU PerfStudio 2 | FarCry3. Dunia 2: new rendering effects. Deferred rendering with multi-sample anti-aliasing (MSAA): each scene/level in the game is analyzed, and MSAA is selectively enabled on sections of the scene (like trees and lighting) that would benefit from the technique and avoided on sections that would slow the game with no visual improvement. Comprehensive light-culling system: this mechanism calculates in real time the lighting that would be visible to the player, then lets the GPU reject the material that won’t be seen. Not only is MSAA faster than it would otherwise be (when enabled), but general frame rates are improved versus engines without this technology. http://engineroom.ubi.com/wp-content/bigfiles/farcry3_drtv_lowres.pdf http://blogs.amd.com/play/2012/12/10/far-cry-3-in-depth/ |
20 |
 |
GPU PerfStudio 2 | FarCry3. Dunia 2: new rendering effects. Transparency super-sample anti-aliasing (SSTr); real-time global illumination; DirectCompute-accelerated ambient occlusion; advanced hair and skin shading. |
21 |
 |
GPU PerfStudio 2 | Far Cry 3. What GPU PerfStudio2 was used for on Far Cry 3. The Frame Debugger was mainly used to investigate rendering bugs: issues when activating DX11 multithreaded rendering (bad ordering of sync points); issues with using the wrong render targets with MSAA, for example using MSAA color with non-MSAA depth in the same draw call; stencil issues for skin and hair rendering; CrossFire-related issues. The team used in-house profiling tools. |
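The MSAA mismatch mentioned above (MSAA color bound together with non-MSAA depth) comes down to a rule that is simple to state: every target bound for a draw call must share one sample count. A minimal sketch of such a check follows. The `Attachment` type and the way bindings are described here are hypothetical; they are not part of Dunia 2 or GPU PerfStudio2.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Attachment:
    """Hypothetical description of one bound render target or depth buffer."""
    name: str
    sample_count: int  # 1 means non-MSAA


def mismatched_attachments(bound: List[Attachment]) -> List[Attachment]:
    """Return attachments whose sample count differs from the first one bound."""
    if not bound:
        return []
    expected = bound[0].sample_count
    return [a for a in bound if a.sample_count != expected]


if __name__ == "__main__":
    # Made-up draw call state: 4x MSAA color with a non-MSAA depth buffer.
    draw_call = [Attachment("color0", 4), Attachment("depth", 1)]
    for bad in mismatched_attachments(draw_call):
        print(f"{bad.name}: {bad.sample_count}x, expected {draw_call[0].sample_count}x")
```

In practice the per-draw-call state would come from inspecting the bound resources in a frame debugger rather than from hand-written lists.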
22 |
 |
GPU PerfStudio 2 | Data mine your app using GPU PerfStudio2. Understanding the client/server model: as mentioned at the beginning of this presentation, the client and server exchange data through a web server. You can see the commands arrive at the server in the console window. |
23 |
 |
GPU PerfStudio 2 | Data mine your app using GPU PerfStudio2. Understanding the client/server model: you can view all of the commands sent to the server during your debug session in the client server log. Open the Server Log from the client Help menu. |
24 |
 |
GPU PerfStudio 2 | Data mine your app using GPU PerfStudio2. Understanding the client/server model: the log displays the commands sent from the client to the server; these look like URLs. The log will also contain any error messages generated by the server. |
25 |
 |
GPU PerfStudio 2 | Data mine your app using GPU PerfStudio2. FarCry3 running with GPU PerfStudio2: attach the GPU PerfClient, pause the app, and move to a draw call. Use the command URL in a web browser to request data from the server. We can see the data command requests in the server console window. We can access state data. We can access the shader code. In fact, we can access all data necessary to reconstruct the draw call. |
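The command URLs have a regular shape, so they are easy to build programmatically. The sketch below assembles the two URL patterns that appear in the Perl script from this talk (the pixel shader code viewer and the breakpoint command) and fetches one with Python's standard library. The server identifier `10176` is the example value from that script; the helper names are hypothetical, and no other command paths are assumed.

```python
from urllib.request import Request, urlopen

HOST = "localhost"


def code_viewer_url(app_server: str, host: str = HOST) -> str:
    """URL that returns the pixel shader code for the current draw call."""
    return f"http://{host}/{app_server}/DX11/FD/Pipeline/PS/codeviewer.xml"


def breakpoint_url(app_server: str, breakpoint_id: int, host: str = HOST) -> str:
    """URL that sets the Frame Debugger breakpoint to the given draw call."""
    return f"http://{host}/{app_server}/DX11/FD/BreakPoint={breakpoint_id}"


def fetch(url: str, content_type: str = "text/xml", timeout: float = 10.0) -> str:
    """Send a GET request to the server and return the response body as text."""
    request = Request(url, headers={"Content-Type": content_type})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Printing the URLs needs no server; calling fetch() on them does.
    print(breakpoint_url("10176", 10))
    print(code_viewer_url("10176"))
```

Pasting either printed URL into a browser while the game is paused is the manual equivalent of calling `fetch()`.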
26 |
 |
GPU PerfStudio 2 | Data mine your app using GPU PerfStudio2. Cool, my game is a web server, I can script that! It is possible to use PerfStudio2 web requests in scripts to automate and customize access to your app's data. As part of the work carried out on Far Cry 3, we needed to know where specific sections of HLSL code were being used in a frame. We were able to use a script to retrieve the HLSL code from each draw call in a frame and search the code for keywords that would identify it. Let's look at an example that does something similar. |
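The keyword search described above can be sketched as follows. The sketch assumes the HLSL source for each draw call has already been retrieved into a dictionary keyed by draw call index; the sample shader strings and the keyword are made up for illustration.

```python
from typing import Dict, Iterable, List


def find_draw_calls(shaders: Dict[int, str], keywords: Iterable[str]) -> Dict[str, List[int]]:
    """Map each keyword to the sorted draw calls whose shader source contains it."""
    hits: Dict[str, List[int]] = {}
    for keyword in keywords:
        needle = keyword.lower()  # case-insensitive substring match
        hits[keyword] = sorted(
            draw for draw, source in shaders.items() if needle in source.lower()
        )
    return hits


if __name__ == "__main__":
    # Hypothetical shader sources keyed by draw call index.
    frame = {
        1: "float4 main() : SV_Target { return SampleShadowMap(uv); }",
        2: "float4 main() : SV_Target { return float4(1, 0, 0, 1); }",
        3: "float4 main() : SV_Target { return SampleShadowMap(uv) * tint; }",
    }
    for keyword, draws in find_draw_calls(frame, ["SampleShadowMap"]).items():
        print(f"{keyword}: draw calls {draws}")
```

A plain substring match is usually enough for locating a named helper function; a regular expression could be swapped in if the code of interest varies between shaders.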
27 |
 |
GPU PerfStudio 2 | Data mine your app using GPU PerfStudio2. After a script has run on FarCry3: the render target overlay shows its contents at draw call 10, the breakpoint the script finished at; the server log shows the breakpoint and code viewer commands; the script finds 2 unique shaders in the first 10 draw calls. |
28 |
 |
GPU PerfStudio 2 | Data mine your app using GPU PerfStudio2. Here is the Perl script we used:

use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Create a user agent object
my $ua = LWP::UserAgent->new;
$ua->agent("AgentName/0.1 " . $ua->agent);

my $HTML_Request = "text/html";
my $XML_Request  = "text/xml";
my $Google_URL   = "http://www.google.com";

# The server identifier and breakpoint values are specific to one debug session
my $GPS_APP_SERVER     = "10176";
my $GPS_BreakpointID   = "104";
my $GPS_CodeViewer_URL = "http://localhost/$GPS_APP_SERVER/DX11/FD/Pipeline/PS/codeviewer.xml";
my $GPS_Breakpoint_URL = "http://localhost/$GPS_APP_SERVER/DX11/FD/BreakPoint=$GPS_BreakpointID";
my $GPS_NumBreakpoints = 100;

# Send a GET request and return the response body; exit on failure
sub http_Request
{
    my ($Request_URL, $Request_Type) = @_;

    # Create a request
    my $req = HTTP::Request->new(GET => $Request_URL);
    $req->content_type($Request_Type);
    #$req->content('');

    # Pass request to the user agent and get a response back
    my $res = $ua->request($req);

    # Check the outcome of the response
    if ($res->is_success) {
        #print $res->content;
        #print "$Request_URL @ $Request_Type = Success!\n";
        return $res->content;
    }
    else {
        print "Bad luck this time\n";
        exit;
    }
}

# Set the breakpoint to draw call $i; return 1 if the server reports OK
sub iterate_breakpoint
{
    my ($i) = @_;
    my $content = http_Request("http://localhost/$GPS_APP_SERVER/DX11/FD/BreakPoint=$i", $XML_Request);
    if ($content =~ m/BreakPoint\W{2}OK/i) {
        #print "Breakpoint OK\n";
        return 1;
    }
    else {
        print "Breakpoint Failed\n";
        return 0;
    }
}

# Pixel shader code keyed by the shader's hash
my %HashOfPixelShaderCRCs;

# Retrieve the pixel shader code bound at the current breakpoint
sub get_pixel_shader_code
{
    return http_Request($GPS_CodeViewer_URL, $XML_Request);
}

for (my $i = 1; $i < $GPS_NumBreakpoints; $i++) {
    my $retVal = iterate_breakpoint($i);
    if ($retVal == 1) {
        #print "Breakpoint is OK\n";
        my $psCode = get_pixel_shader_code();
        # Capture the value between the opening and closing Hash tags
        if ($psCode =~ m/(\W{1}Hash\W{1}(\w+)\W{2}Hash\W{1})/i) {
            my $thisHashID = $2;
            print "HashID: $thisHashID\n";
            $HashOfPixelShaderCRCs{$thisHashID} = $psCode;
        }
        else {
            #print "Hash Not Found\n";
        }
    }
    else {
        #print "Breakpoint FAILED!\n";
    }
}

print "\nFound source code for ", scalar keys %HashOfPixelShaderCRCs, " *unique* pixel shaders used in this frame.\n";
print "\nDone!\n";
|
29 |
 |
GPU PerfStudio 2 | Data mine your app using GPU PerfStudio2. How the script works: the GPU PerfClient was connected to the game, the game was paused, and the Frame Debugger was opened. A Perl script was run to send a command that sets the breakpoint to the first draw call. Inside a loop: send a command to retrieve the PS shader code from the current draw call; store the shader code using its unique hash as the key; send a command to advance the breakpoint. When the loop finishes, we can count how many unique shaders there are. |
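The steps above can be sketched in Python as well. This is a loose port of the Perl script, not the authors' code: the HTTP call is passed in as a function so the loop can be exercised without a running server, and the two regular expressions are carried over from the Perl script on the assumption that the server's replies look the way those patterns expect. The fake replies in the demo are invented purely to satisfy the patterns.

```python
import re
from typing import Callable, Dict

# Patterns carried over from the Perl script; the reply format itself is an assumption.
BREAKPOINT_OK = re.compile(r"BreakPoint\W{2}OK", re.IGNORECASE)
SHADER_HASH = re.compile(r"\WHash\W(\w+)\W{2}Hash\W", re.IGNORECASE)


def collect_pixel_shaders(
    fetch: Callable[[str], str],
    app_server: str,
    num_breakpoints: int,
) -> Dict[str, str]:
    """Step through draw calls and return pixel shader code keyed by its hash."""
    base = f"http://localhost/{app_server}/DX11/FD"
    shaders: Dict[str, str] = {}
    # Same range as the Perl loop: 1 up to, but not including, num_breakpoints.
    for index in range(1, num_breakpoints):
        reply = fetch(f"{base}/BreakPoint={index}")
        if not BREAKPOINT_OK.search(reply):
            continue  # breakpoint could not be set; skip this draw call
        code = fetch(f"{base}/Pipeline/PS/codeviewer.xml")
        match = SHADER_HASH.search(code)
        if match:
            shaders[match.group(1)] = code
    return shaders


if __name__ == "__main__":
    # Stand-in for the real server: two made-up shaders alternate across draw calls.
    state = {"draw": 0}

    def fake_fetch(url: str) -> str:
        if "BreakPoint=" in url:
            state["draw"] = int(url.rsplit("=", 1)[1])
            return "BreakPoint: OK"
        shader = "A1" if state["draw"] % 2 else "B2"
        return f"<Hash>{shader}</Hash><Code>...</Code>"

    found = collect_pixel_shaders(fake_fetch, "10176", 11)
    print(f"Found {len(found)} unique pixel shaders")
```

Keying the dictionary on the shader hash is what makes the final count a count of unique shaders: a shader reused by several draw calls simply overwrites its own entry.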
30 |
 |
Left 4 Dead 2: How Valve ported Steam to Linux using GPU PerfStudio2 |
31 |
 |
GPU PerfStudio 2 | Left 4 Dead 2: Porting Steam to Linux. Overview of the Win32 to Linux conversion process. The process of porting our titles to Linux started by porting the Source Engine to GL mode under Windows. We used SDL to abstract away the windowing/input APIs. This resulted in one build that could be compiled to Win32/DX9 or to SDL/GL on Linux, OSX and Windows. We did this to gain access to several key Windows-based OpenGL tools, such as GPS2 and CodeXL. The vast majority of GL or rendering-related bugs we see on Linux can be reproduced on Windows, so the majority of our GL and rendering-related debugging can be done under Windows. |
32 |
 |
GPU PerfStudio 2 | Left 4 Dead 2: Porting Steam to Linux. Why use GPU PerfStudio2? We had numerous glitches and bugs in L4D2 and TF2 Win/Linux GL mode. Many were new bugs introduced by optimizing and fixing the renderer and our D3D->GL translation backend. GPU PerfStudio 2 was the only available GL debugger product we could find with a usable frame debugger that didn't fall over when faced with a “large GL application”. It is not only a useful debugger and profiler, it's also a great way to help you learn OpenGL. |
33 |
 |
GPU PerfStudio 2 | Left 4 Dead 2: Porting Steam to Linux. We first started with PerfStudio2's "API Trace" window. The "Interface" column is especially useful, as it shows the GL version (or the GL extension) used by each API call. This data is particularly helpful while learning GL. The API Trace view is synchronized with the Frame Debugger, i.e. select a draw call in the trace view, then switch to the Frame Debugger to focus on that draw. The trace can be saved to .CSV files. It's easy to diff multiple traces using Beyond Compare, which we used to find several difficult GL bugs. (We saved a trace, exited and tweaked the app, then relaunched the game, placed the camera in exactly the same location/orientation as for the first trace, and saved another trace.) |
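Comparing two saved traces can also be scripted. The sketch below diffs two CSV traces with Python's standard `csv` and `difflib` modules. The column names (`Interface`, `Call`, `CPU Time`) and the idea of leaving timing columns out of the comparison are assumptions about the export, not the documented file layout; adjust them to the header of a real trace.

```python
import csv
import difflib
import io
from typing import Iterable, List, Sequence


def load_calls(lines: Iterable[str], columns: Sequence[str]) -> List[str]:
    """Read a CSV trace and keep only the chosen columns of each row."""
    reader = csv.DictReader(lines)
    return [" | ".join(row.get(col) or "" for col in columns) for row in reader]


def diff_traces(before: Iterable[str], after: Iterable[str], columns: Sequence[str]) -> List[str]:
    """Return unified-diff lines between two traces, compared on the chosen columns."""
    a = load_calls(before, columns)
    b = load_calls(after, columns)
    return list(difflib.unified_diff(a, b, "before", "after", lineterm=""))


if __name__ == "__main__":
    # Hypothetical trace contents; a real run would pass open("trace1.csv", newline="").
    first = io.StringIO(
        "Interface,Call,CPU Time\n"
        "GL 1.0,glEnable(GL_BLEND),0.01\n"
        "GL 1.0,glClear(GL_COLOR_BUFFER_BIT),0.20\n"
    )
    second = io.StringIO(
        "Interface,Call,CPU Time\n"
        "GL 1.0,glDisable(GL_BLEND),0.01\n"
        "GL 1.0,glClear(GL_COLOR_BUFFER_BIT),0.35\n"
    )
    for line in diff_traces(first, second, ["Interface", "Call"]):
        print(line)
```

Dropping the timing column before diffing keeps run-to-run timing noise from hiding the calls that actually changed.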
34 |
 |
GPU PerfStudio 2 | Left 4 Dead 2: Porting Steam to Linux. The Frame Debugger saved our bacon countless times while porting TF2/L4D2 to GL. You can hand-edit GLSL shaders by clicking the "Enable shader editing" button, then tweaking the shader and hitting compile. This is a very powerful way of quickly conducting experiments while debugging. It supports named PIX begin/end markers in GL mode. This is critical for quickly zeroing in on specific parts of the frame for further analysis. The Frame Buffer visualization window is where we do a lot of debugging. In particular, the "description" tab can be very useful. You can scrub through all GL draws/blits in the bottom timeline, and visualize how the frame is composed in the app window. This helps you get a feel for the overall flow of the frame's GL calls. Most of our debugging is done by scrubbing through draw/blit events and examining the state of render targets/textures. For example, we used this to debug particularly tricky GL texture completeness issues, by hacking a shader to only fetch from a single texture (sometimes with explicit LOD). |
35 |
 |
GPU PerfStudio 2 | Left 4 Dead 2: Porting Steam to Linux. The Frame Debugger saved our bacon countless times while porting TF2/L4D2 to GL. PerfStudio2 supports displaying both sampler objects and texture parameters (most other tools I've seen don't support sampler objects yet). We can also run our titles in D3D9 mode, and use PIX for Windows to capture frames. In a few cases, it was invaluable to be able to capture a frame in both PIX and GPS2 and manually compare the states, framebuffers, shaders, etc. |
36 |
 |
Left 4 Dead 2 GPU PerfStudio2 Demo |
37 |
 |
GPU PerfStudio 2 | Left 4 Dead 2: Porting Steam to Linux. Troubleshooting GPU PerfStudio2 with OpenGL apps. If you have problems capturing frames with PerfStudio 2 in GL mode: first try capturing a sample GL app before trying to capture a bigger app; try experimenting with the Server Settings (try "Slow Motion" or "None"); try disabling multithreading in your app. On one new project that didn't use SDL to create the GL context, we had some problems getting GPS2 to connect: try statically linking against opengl32.lib, rather than dynamically loading it and calling GetProcAddress(); check how you're calling SwapBuffers (try SwapBuffers() instead of calling wglSwapBuffers()); double and triple check how you're creating the GL context, what version it is, its attributes, etc. Proper GL context creation is surprisingly tricky. AMD's CodeXL can help you debug context creation code: it checks for a bunch of common GL context creation errors. |
38 |
 |
GPU PerfStudio 2 | Working with Steam, UPlay, and Origin. Using GPU PerfStudio2 with Steam: do not start your game directly with GPU PerfStudio2. Make sure that Steam.exe is NOT already running (check using the Task Manager). Start by dragging and dropping Steam.exe onto the GPU PerfStudio2 server; GPU PerfStudio2 will inject into Steam.exe. In the Steam settings, turn off “Enable Steam Community In-Game”. Start the game using the Steam UI; GPU PerfStudio2 will inject into the game. Start the GPU PerfClient and attach it to the game. Use a similar approach for Origin and UPlay. |
39 |
 |
AMD Developer Tools. Download from AMD Developer Central (http://developer.amd.com/tools/). GPU PerfStudio2: GPU profiling and debugging for DX11, DX10, OpenGL. CodeXL: CPU/GPU profiling, GPU debugging (OpenGL, OpenCL), OpenCL Kernel Analyzer. gDEBugger: OpenCL™ and OpenGL debugger, memory analyzer. |
40 |
 |
GPU PerfStudio 2 | Thank you. To Ubisoft and Valve for supporting this presentation. Thanks to my co-presenters: Jean-Francois St-Amour (Ubisoft), Rich Geldreich (Valve). For FarCry3 information: Jean-Sebastien Guay (Ubisoft). For supporting GPU PerfStudio2: Layla Mah (AMD ISV Engineer, FarCry3), Raul Aguaviva (AMD ISV Engineer). |
41 |
 |
Questions gordonselley@amd.com |
42 |
 |
Disclaimer & Attribution. The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes. NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. © 2012 Advanced Micro Devices, Inc. |
43 |
 |
Trademark Attribution. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names used in this presentation are for identification purposes only and may be trademarks of their respective owners. ©2012 Advanced Micro Devices, Inc. All rights reserved. |