When does the pixels-to-screen pipeline happen? - web-performance

I was reading Rendering Performance by Paul Lewis and learnt about the pixel-to-screen pipeline (shown below).
[diagram: the pixel-to-screen pipeline]
But I started to wonder when this process happens in the whole request/response cycle (shown below).
[diagram: the request/response cycle]
Does it happen after the render tree is completed, or before?

Related

Analysis of canvas rendering performance

I'm developing a renderer for an isometric 3D environment made up of blocks (kind of like Minecraft).
I'm drawing it in a canvas using its 2D context (and doing some math).
On page load a loop is created that adds some blocks each frame (via window.requestAnimationFrame(fn)), but I'm struggling with low FPS when rendering.
This is the first time I've gone this deep into performance analysis, and I'm struggling to understand the Performance view of Chrome DevTools.
Looking at the results:
What I understand is that the frame took 115.9 ms to complete, but looking at the processes it seems it took just ~30 ms to do the calculations using the canvas API; yet in the task bar (under Animation Frame Fired) I see a much longer time for the frame to complete.
Is this common behavior? Have I made some dumb mistake that wastes performance in some way?
(If it is common behavior, what is happening during that time? Is it the actual drawing?)
I'm blocked because I'm wondering whether I should try to improve my drawing algorithm, or look somewhere else to address the bottleneck.
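For context, here is a minimal TypeScript sketch of the kind of loop described above; the block list, the drawBlock helper, and the per-frame timing are assumptions rather than the original code. Timing the scripting portion of each frame is one way to separate JavaScript work from whatever the browser does afterwards (raster, GPU, compositing):

```typescript
// Minimal sketch of an incremental canvas render loop (all names hypothetical).
const canvas = document.querySelector('canvas') as HTMLCanvasElement;
const ctx = canvas.getContext('2d')!;

interface Block { x: number; y: number; z: number; }
const blocks: Block[] = [];

function drawBlock(block: Block): void {
  // Placeholder for the isometric projection math and 2D drawing calls.
  ctx.fillRect(block.x * 10, block.y * 10 - block.z * 5, 10, 10);
}

let lastFrame = performance.now();

function frame(now: DOMHighResTimeStamp): void {
  const frameDelta = now - lastFrame; // total time since the previous frame fired
  lastFrame = now;

  const scriptStart = performance.now();

  // Add a few more blocks this frame, then redraw everything.
  blocks.push({ x: blocks.length % 50, y: Math.floor(blocks.length / 50), z: 0 });
  ctx.clearRect(0, 0, canvas.width, canvas.height);
  blocks.forEach(drawBlock);

  const scriptTime = performance.now() - scriptStart;
  // If scriptTime stays around ~30 ms while frameDelta is ~115 ms, the remainder
  // of the frame is spent outside JavaScript (raster, GPU, compositing).
  console.log(`script: ${scriptTime.toFixed(1)} ms, frame: ${frameDelta.toFixed(1)} ms`);

  window.requestAnimationFrame(frame);
}

window.requestAnimationFrame(frame);
```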
I don't know if you ever got an answer to this, but one thing that jumps out at me is that in your screenshot the green "GPU" bar is nearly solid. As I understand it, this bar indicates that the browser is sending instructions and/or data to the GPU for hardware-accelerated rendering. In my experience this can be a problem if you're using a slow graphics card, depending on what you're trying to do.
The good news is that I would expect testing on a more powerful system to result in an immediate framerate improvement. The bad news is, I'm not sure how to tell exactly which canvas operations put that much load on your (bad) GPU or how to optimize to reduce GPU traffic.

Multithreading with Metal

I'm new to Apple's Metal API and graphics programming in general. I'm slowly building a game engine of sorts, starting with UI. My UI is based on nodes, each with their own list of child nodes. So if I have a 'menu' node with three 'buttons' as children, calling render(:MTLDrawable:CommandQueue) means the menu will render itself to the drawable by committing a command buffer to the queue, and then call the same method on all of its children with the same drawable and queue, until the entire node tree has been rendered from top to bottom.

I want a separate subthread to be spawned for the rendering of every node in the tree -- can I just wrap each render function in a dispatch-async call? Is the command queue inherently thread-safe? What is the accepted solution for concurrently rendering multiple objects to a single texture before presenting it using Metal? All I've seen in any Metal tutorial so far is a single thread that renders everything in order using a single command buffer per frame, calling presentDrawable() and then commit() at the end of each frame.
Edit
When I say I want to use multithreading, it applies only to command encoding, not execution itself. I don't want to end up with the buttons in my theoretical menu being drawn and then covered up with the menu background as a result of bad execution order. I just want each object's render operation to be encoded on a separate thread, before being handed to the command queue.
Using a separate command buffer and render pass for each UI element is extreme overkill, even if you want to use CPU-side concurrency to do your rendering. I would contend that you should start out by writing the simplest thing that could possibly work, then optimize from there. You can set a lot of state and issue a lot of draw calls before the CPU becomes your bottleneck, which is why people start with a simple, single-threaded approach.
Dispatching work to threads isn't free. It introduces overhead, and that overhead will likely dominate the work you're doing to issue the commands for drawing any given element, especially once you factor in the bandwidth required to repeatedly load and store your render targets.
Once you've determined you're CPU-bound (probably once you're issuing thousands of draw calls per frame), you can look at splitting the encoding up across threads with an MTLParallelRenderCommandEncoder, or multipass solutions. But well before you reach that point, you should probably introduce some kind of batching system that removes the responsibility of issuing draw calls from your UI elements, because although that seems tidy from an OOP perspective, it's likely to be a large architectural misstep if you care about performance at scale.
For one example, you could take a look at this Metal implementation of a backend renderer for the popular dear imgui project to see how to architect a system that does draw call batching in the context of UI rendering.
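To illustrate the batching idea in a language-agnostic way (a conceptual sketch in TypeScript rather than Metal, with every type and method name hypothetical): instead of each UI node issuing its own draw calls, the tree is walked once, quads are collected, and they are grouped so that each texture becomes a single draw call:

```typescript
// Conceptual sketch only (TypeScript, not the Metal API); every name here is hypothetical.
interface Quad { x: number; y: number; w: number; h: number; textureId: number; }

interface UINode {
  quads(): Quad[];       // what this element wants drawn
  children: UINode[];
}

class Batcher {
  private quads: Quad[] = [];

  // Walk the whole UI tree once, collecting geometry instead of issuing draw calls.
  collect(node: UINode): void {
    this.quads.push(...node.quads());
    node.children.forEach((child) => this.collect(child));
  }

  // Group the collected quads so that each texture becomes a single draw call.
  flush(draw: (batch: Quad[]) => void): void {
    const byTexture = new Map<number, Quad[]>();
    for (const q of this.quads) {
      const group = byTexture.get(q.textureId) ?? [];
      group.push(q);
      byTexture.set(q.textureId, group);
    }
    byTexture.forEach((batch) => draw(batch));
    this.quads = [];
  }
}
```

The same idea carries over to Metal: the batcher, rather than the individual UI nodes, is the only thing that talks to the command encoder, which keeps the draw-call and command-buffer counts small.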

How to apply a postprocess effect to a UI element

I have a post-process effect that uses Unity's Graphics.Blit to pixelate or apply a CRT-screen effect to a scene. There are some UI elements that display after the fact (basically making it not a true post-process, but let's put that aside for a moment).
Now, I want to apply a second process that performs a screen wipe and transitions out one scene for another. This time I want to include the UI buttons in the effect.
I've looked into using a render texture and then rendering to a second camera, but I feel like there is a smarter/more accurate way to handle this.
First version of this question:
Can I selectively include the screen-space overlay UI in the script that applies the post-process?
Or, second version of this question:
Is there a trick to getting the render texture to preserve resolution and display accurately (i.e. without lost quality) when re-rendering to a second camera?

Timeline Paint Profiler in Devtools suggests everything is being painted

When we use the Paint Profiler in Chrome we can see what's being painted. I created a simple example that adds a new div to the page every 3 seconds and here is what is shown as being painted:
But when I use the paint profiler in the Timeline it looks like everything is being repainted:
As shown in the screenshot, on the fifth paint we have 5 drawTextBlob calls. This suggests that all 5 divs were painted; I was expecting only one.
Can someone shed some light on this?
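For reference, a minimal sketch of the kind of test page described above (the element contents and the exact interval are assumptions):

```typescript
// Append a new <div> every 3 seconds; each append invalidates part of the page
// and produces a Paint event that can be inspected in the Paint Profiler.
let count = 0;

setInterval(() => {
  const div = document.createElement('div');
  div.textContent = `div number ${++count}`;
  document.body.appendChild(div);
}, 3000);
```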
The exact meaning of the "Paint" event has changed over time. It used to be that during Paint the renderer directly updated the layer's bitmap, which was often slow. Back in those days, you would likely have found that the painted rectangle matched the area you actually invalidated (i.e. just the last line in your case), as you probably expect.
The present implementation of Chrome's rendering subsystem performs rasterization either on other threads (in an attempt to keep things off the main thread, which is busy enough with JavaScript execution, DOM layout and lots of other things) or on the GPU (check out "Rasterization" and "Multiple raster threads" in chrome://gpu if you're curious what the current mode is on your platform). So the "Paint" event that you see on the main thread just covers recording a log of paint commands (i.e. what you see in the left pane of the Paint Profiler), without actually producing any pixels. This is relatively fast, and Chrome chooses to re-record the entire layer so it can later pick which part of it to rasterize (e.g. in the anticipated case of scrolling) without going back to the main thread, which is likely to be busy running JavaScript or doing layout.
Now if you switch the Timeline into Flame Chart mode (the right icon near the "View" label in the toolbar), you'll see the "Rasterize Paint" event, which is the actual rasterization: Chrome picks up the paint command log recorded during the Paint event on the main thread and replays it, producing actual pixels for a fragment of the layer. You can see what part of the layer was rasterized, and the Paint Profiler for that part, when you select "Rasterize Paint". Note that there are many small Rasterize Paint events for different fragments, possibly on different threads, but they all still carry the entire log (i.e. 5 drawTextBlob commands in your example). However, the paint commands that do not affect the fragment being rasterized are culled, as they fall outside of the fragment's clip rectangle, and hence won't have a noticeable effect on rasterization time.
Then you'll probably notice that the fragments being rasterized are still larger than the area you actually invalidated. This is because Chrome manages rasterized layers in terms of tiles, small rectangular bitmaps (often 128 x 128, but this may vary by platform), so that for large layers (e.g. pages much longer than the viewport), only the parts visible in the viewport can be stored on the GPU (which often has limited memory), and the parts that suddenly become visible as a result of scrolling can be uploaded quickly.
Finally, the parts that you see highlighted in green as a result of ticking "Show paint rectangles" in the Rendering options are technically "invalidation" rectangles -- i.e. the areas of your page that have really changed as a result of changed layout/styles etc. These areas are what you, as an author, can directly affect, but as you can see, Chrome will likely paint and rasterize more than that, mostly out of concern for managing the scrolling of large pages efficiently.
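As a practical aside, if you want to narrow down what gets re-recorded, one option is to promote the frequently changing content to its own compositor layer. Below is a sketch; the container and the will-change hint are my own assumptions, and layer promotion has a memory cost, so it isn't a free win:

```typescript
// Hint the browser to give the growing list its own compositor layer, so that
// appending a div re-records (and re-rasterizes) this layer's display list
// rather than the page's main layer.
const list = document.createElement('div');
list.style.willChange = 'transform'; // a hint; the browser may or may not promote it
document.body.appendChild(list);

let count = 0;
setInterval(() => {
  const div = document.createElement('div');
  div.textContent = `div number ${++count}`;
  list.appendChild(div);
}, 3000);
```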

In the dev tools timeline, what are the empty green rectangles?

In the Chrome dev tools timeline, what is the difference between the filled, green rectangles (which represent paint operations) and the empty, green rectangles (which apparently also represent something about paint operations...)?
Painting is really two tasks: draw calls and rasterization.
Draw calls. This is a list of things you'd like to draw, and it's derived from the CSS applied to your elements. Ultimately there is a list of draw calls not dissimilar to the Canvas element's: moveTo, lineTo, fillRect (though they have slightly different names in Skia, Chrome's painting backend, it's a similar concept).
Rasterization. The process of stepping through those draw calls and filling out actual pixels into buffers that can be uploaded to the GPU for compositing.
So, with that background, here we go:
The solid green blocks are the draw calls being recorded by Chrome. These are done on the main thread alongside JavaScript, style calculations, and layout. These draw calls are grouped together as a data structure and passed to the compositor thread.
The empty green blocks are the rasterization. These are handled by a worker thread spawned by the compositor.
Essentially, then, both are paint; they just represent different sub-tasks of the overall job. If you're having performance issues (and from the grab you provided you appear to be paint-bound), then you may need to examine what properties you're changing via CSS or JavaScript and see if there is a compositor-only way to achieve the same ends. CSS Triggers can probably help here.
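To make that concrete, here is a sketch of the kind of difference CSS Triggers points out (the .box element is hypothetical, and it needs a non-static position for left to have any effect): animating left forces layout and paint every frame, while animating transform can usually be handled by the compositor alone:

```typescript
const box = document.querySelector<HTMLDivElement>('.box')!; // hypothetical element

// Animating `left` dirties layout and paint on every frame.
box.animate(
  [{ left: '0px' }, { left: '200px' }],
  { duration: 1000, iterations: Infinity }
);

// Animating `transform` can usually run compositor-only: no per-frame layout or paint.
// (In practice you would pick one of these two animations, not run both.)
box.animate(
  [{ transform: 'translateX(0px)' }, { transform: 'translateX(200px)' }],
  { duration: 1000, iterations: Infinity }
);
```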