WebGPU best practices with setPipeline and optimization

I'm learning WebGPU for the first time and, in the tutorials I'm following, I see that setPipeline is called on each rendering pass. I'm wondering if there's a performance hit if the pipeline is changed between passes. Most of the tutorials I'm reading use the same pipeline for every pass and just change the data going to it via writeBuffer, but I don't know if that's intentional. The only thing I've read about pipeline optimization is from this tutorial:
"The configuration of the components of this pipeline (e.g., the shaders, vertex state, render output state, etc.) are fixed, allowing the GPU to better optimize rendering for the pipeline."
That'd lead me to believe that the pipeline shouldn't be changed between passes, but I haven't seen anything stating that explicitly. Thanks in advance for any help!

It's fairly common for applications to use different shaders for different objects in a single render (e.g., see this question). From an optimisation perspective, you'd want to set a pipeline and render all objects that use that pipeline, then set another pipeline and render all objects that use that one, and so on. You'd probably want to look into instancing to minimize the number of draw calls too.
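The batching idea can be sketched without a GPU. WebGPU itself is a JavaScript API, so the Python below is only a language-neutral illustration: `RecordingPass` is a hypothetical stand-in for a render pass encoder that merely logs the calls it receives, and `Drawable` and the pipeline names are invented for the example.

```python
from dataclasses import dataclass, field
from itertools import groupby


@dataclass
class Drawable:
    name: str
    pipeline: str      # hypothetical pipeline identifier
    vertex_count: int


@dataclass
class RecordingPass:
    """Stand-in for a render pass encoder; it only records the calls it receives."""
    calls: list = field(default_factory=list)

    def set_pipeline(self, pipeline):
        self.calls.append(("setPipeline", pipeline))

    def draw(self, vertex_count):
        self.calls.append(("draw", vertex_count))


def encode_sorted_by_pipeline(render_pass, drawables):
    """Issue one set_pipeline per distinct pipeline, then all draws that use it."""
    ordered = sorted(drawables, key=lambda d: d.pipeline)
    for pipeline, group in groupby(ordered, key=lambda d: d.pipeline):
        render_pass.set_pipeline(pipeline)
        for d in group:
            render_pass.draw(d.vertex_count)


scene = [
    Drawable("rock", "opaque", 36),
    Drawable("window", "transparent", 6),
    Drawable("tree", "opaque", 120),
]
render_pass = RecordingPass()
encode_sorted_by_pipeline(render_pass, scene)
pipeline_switches = sum(1 for call in render_pass.calls if call[0] == "setPipeline")
```

With the three objects above, the pipeline is set twice rather than three times, because the two objects sharing a pipeline are drawn back to back.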

Related

How to implement Featuretools into my ML Process?

I am exploring the possibility of implementing Featuretools into my pipeline, to be able to create new features from my Df.
Currently I am using GridSearchCV with a Pipeline embedded inside it. Since Featuretools creates new features by aggregating columns, such as STD(column), I feel it is susceptible to data leakage. Their FAQ gives an example approach to tackle this, but it is not suitable for the Pipeline structure I am using.
Idea 0: I would love to integrate it directly into my Pipeline, but it does not seem to be compatible with Pipelines. It would use the fold's training data to construct features and then transform the fold's test data, K times. At the end, it would use the whole data set to construct features during the refit=True stage of GridSearchCV. If you have an example showing that this is possible, you are very welcome to share it.
Idea 1: I can switch to a manual CV structure that is not embedded in a Pipeline. Inside it, I can use the training data to construct new features and transform the test data with them. This would run K times. At the end, all the data can be used to construct the final model.
It is the safest option, with time and complexity disadvantages.
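Idea 1 can be sketched without any particular library. The following is a minimal illustration and not Featuretools code: a per-customer mean of `amount` stands in for a generated feature such as STD(column), and the row layout, `fit_aggregates`, `transform`, and `k_fold_indices` are hypothetical names. The only point is that the aggregates are computed from the training fold and then looked up for the test fold.

```python
from collections import defaultdict
from statistics import mean


def fit_aggregates(train_rows):
    """Learn the per-customer mean amount from training rows only."""
    by_customer = defaultdict(list)
    for row in train_rows:
        by_customer[row["customer"]].append(row["amount"])
    per_customer = {c: mean(v) for c, v in by_customer.items()}
    fallback = mean(row["amount"] for row in train_rows)  # for unseen customers
    return per_customer, fallback


def transform(rows, aggregates):
    """Attach the learned aggregate to each row; nothing is recomputed here."""
    per_customer, fallback = aggregates
    return [
        {**row, "customer_mean_amount": per_customer.get(row["customer"], fallback)}
        for row in rows
    ]


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for a simple unshuffled k-fold split."""
    for fold in range(k):
        test_idx = [i for i in range(n_rows) if i % k == fold]
        train_idx = [i for i in range(n_rows) if i % k != fold]
        yield train_idx, test_idx


rows = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 5.0},
    {"customer": "b", "amount": 15.0},
]

folds = []
for train_idx, test_idx in k_fold_indices(len(rows), k=2):
    train = [rows[i] for i in train_idx]
    test = [rows[i] for i in test_idx]
    aggregates = fit_aggregates(train)  # construct on the training fold only
    folds.append((transform(train, aggregates), transform(test, aggregates)))
    # A model would be fit on the first element and scored on the second.
```

Each test row receives a value derived purely from the other fold, which is the property the leakage concern is about.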
Idea 2: Use it with the whole data set and ignore the possibility of leakage. I am not in favor of this, of course. But when I look at the project's GitHub page, all the examples combine the train and test data, create these features from the whole data set, and then go on to the train-test split for modeling.
https://github.com/Featuretools/predict-taxi-trip-duration/blob/master/NYC%20Taxi%203%20-%20Simple%20Featuretools.ipynb
Actually, if the developers of the project approach it that way, I could give it a chance with the whole data set.
What do you think? I would love to hear about your experiences with Featuretools.

UE5: import csv for a data driven animation

I was wondering if UE5 can support 50k+ lines of a DB/CSV, as they represent the parameters of the whole animation (coordinates [x, y, z], TimeDelta, Speed, Brake).
Any documentation is very much appreciated
There is no existing functionality in the engine itself for this extremely specific use case. Of course, it can "support" it if you write a custom solution using the many available tools within the engine.
You can use IFileHandle to stream in a file (your CSV).
You can then parse the incoming data to create an FVector from your coordinates, a float from your TimeDelta, etc. For example, FVector::InitFromString may help.
However, this depends very much on the format of your data. Parsing strings/text into values is not specific to Unreal Engine; you can find a lot of information on converting streams of binary/character data to the values you need.
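Since the parsing step is not engine-specific, here is a rough sketch of it in Python rather than engine C++. The column names (`x`, `y`, `z`, `TimeDelta`, `Speed`, `Brake`) and the `Keyframe` type are assumptions based on the fields listed in the question; the real file may be laid out differently.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Keyframe:
    position: tuple   # (x, y, z)
    time_delta: float
    speed: float
    brake: float


def parse_keyframes(stream):
    """Yield one Keyframe per CSV row, reading the stream row by row."""
    for row in csv.DictReader(stream):
        yield Keyframe(
            position=(float(row["x"]), float(row["y"]), float(row["z"])),
            time_delta=float(row["TimeDelta"]),
            speed=float(row["Speed"]),
            brake=float(row["Brake"]),
        )


# A two-row in-memory sample stands in for the real 50k-line file.
sample = io.StringIO(
    "x,y,z,TimeDelta,Speed,Brake\n"
    "0.0,1.5,2.0,0.016,12.0,0.0\n"
    "0.2,1.5,2.1,0.016,11.5,0.3\n"
)
keyframes = list(parse_keyframes(sample))
```

Because `parse_keyframes` is a generator, rows can be consumed as they are read instead of being held in memory all at once.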
Applying the animation as the data is read is a separate, quite big, task. Since you provide no details on what the animation data represents, or what you need to apply it to, I cannot really help.
In general though, it can help you a lot to break down your question into 3-4 separate, more specific, questions. In any case though, this is a task that will require a lot of research and work.
And even before that, it might be good to research alternative approaches and changing the pipeline, to avoid using such non-standard file structures for animation.

Getting Recursive Tasks in Asana with reasonable performance

I'm using the Asana REST API to iterate over workspaces, projects, and tasks. After I achieved the initial crawl over the data, I was surprised to see that I only retrieved the top-level tasks. Since I am required to provide the workspace and project information, I was hoping not to have to recurse any deeper. It appears that I can recurse on a single task with the /subtasks endpoint and re-query... wash/rinse/repeat... but that amounts to a potentially massive number of REST calls (one for each subtask to see if they, in turn, have subtasks to query - and so on).
I can partially mitigate this by adding to the opt_fields query parameter something like:
&opt_fields=subtasks,subtasks.subtasks
However, this doesn't scale well. It means I have to elongate the query for each layer of depth. I suppose I could say "don't put tasks deeper than x layers deep" - but that seems to fly in the face of Asana's functionality and design. Also, since I need lots of other properties, it requires me to make a secondary query for each node in the hierarchy to gather those. Ugh.
I can use the path method to try to mitigate this a bit:
&opt_fields=(this|subtasks).(id|name|etc...)
but again, I have to do this for every layer of depth. That's impractical.
There's documentation about this great REPEATER + operator. Supposedly it would work like this:
&opt_fields=this.subtasks+.name
That is supposed to apply to ALL subtasks anywhere in the hierarchy. In practice, this is completely broken, and the REST API chokes and returns only the ids of the top-level tasks. :( Apparently their documentation is just wrong here.
The only method that seems remotely functional (if not practical) is to iterate first on the top-level tasks, being sure to include opt_fields=subtasks. Whenever this is a non-empty array, I would need to recurse on that task, query for its subtasks, and continue in that manner, until I reach a null subtasks array. This could be of arbitrary depth. In practice, the first REST call yields me (hopefully) the largest number of tasks, so the individual recursion may be mitigated by real data... but it's a heck of an assumption.
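The traversal described above might look roughly like the sketch below. It makes no real API calls: `fetch_subtask_ids` is a hypothetical caller-supplied function that would issue one request per task, and `fake_fetch` with its in-memory data only exists so the sketch runs. A queue is used instead of recursion so that arbitrary depth is not a problem, and the output is the flat list of (task id, parent id) pairs.

```python
from collections import deque


def flatten_tasks(top_level_ids, fetch_subtask_ids):
    """Return (task_id, parent_id) pairs for every task reachable from the top level.

    fetch_subtask_ids is a caller-supplied function (hypothetical here) that makes
    one request per task and returns the ids of its direct subtasks.
    """
    flat = []
    queue = deque((task_id, None) for task_id in top_level_ids)
    while queue:
        task_id, parent_id = queue.popleft()
        flat.append((task_id, parent_id))
        for child_id in fetch_subtask_ids(task_id):
            queue.append((child_id, task_id))
    return flat


# In-memory stand-in for the API so the sketch runs without network access.
FAKE_SUBTASKS = {"t1": ["t1a", "t1b"], "t1a": ["t1a-i"], "t2": []}
requests_made = []


def fake_fetch(task_id):
    requests_made.append(task_id)
    return FAKE_SUBTASKS.get(task_id, [])


flat = flatten_tasks(["t1", "t2"], fake_fetch)
```

Note that this still costs one request per task, which is exactly the overhead being complained about; the sketch only makes the bookkeeping explicit.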
I also noticed that the limit parameter applied ONLY to the top-level tasks. If I choose to expand the subtasks, say, I could get a thousand tasks back instead of 100. The call could time out if the data is too large. The safest thing to do would be to only request the ids of subtasks until recursion, and as always, ask for all the desired top-level properties at that time.
All of this seems incredibly wasteful - what I really want is a flat list of tasks which include the parent.id and possibly a list of subtasks.id - but I don't want to query for them hierarchically. I also want to page my queries with rational data sizes in mind. I'd like to get 100 tasks at a time until Asana runs out - but that doesn't seem possible, since the limit only applies to top-level items.
Unfortunately the repeater didn't solve my problem, since it just doesn't work. What are other people doing to solve this problem? And, secondarily, can anyone with intimate Asana insight provide any hope of getting a better way to query?
While I'm at it, a suggested way to design this: the task endpoint should not require workspace or project predicate. I should be able to filter by them, but not be required to. I am limited to 100 objects already, why force me to filter unnecessarily? In the same vein - navigating the hierarchy of Asana seems an unnecessary tax for clients who are not Asana (and possibly even the Asana UI itself).
Any ideas or insights out there?
Have you ensured that the + you send is URL-encoded? Whatever library you are using should usually handle this (which language are you using, btw? We have some first-party client libraries available)
Try &opt_fields=this.subtasks%2B.name if you're creating the URL manually, or (better yet) use a library that correctly encodes URL query parameters.
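As an illustration of the encoding step, here is how it could be done with Python's standard library (the asker's language is not stated, so Python is only an assumption). A bare `+` in a query string is commonly decoded as a space, which is why it has to be sent as `%2B`.

```python
from urllib.parse import quote, urlencode

raw_value = "this.subtasks+.name"

# Encode a single value by hand; safe="" means no reserved characters are exempted.
encoded_value = quote(raw_value, safe="")

# Or let urlencode build the whole query string from a dict.
query_string = urlencode({"opt_fields": raw_value})
```

Here `encoded_value` is `this.subtasks%2B.name` and `query_string` is `opt_fields=this.subtasks%2B.name`.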

Are Operational Transformation Frameworks only meant for text?

Looking at all the examples of Operational Transformation frameworks out there, they all seem to revolve around the transformation of changes to plain text documents. How would an OT framework be used for more complex objects?
I want to develop a real-time sticky-notes-style app, where people can co-create sticky notes and change their position and text value. Would I be right in assuming that the position values wouldn't be transformed? (I mean, how would they be? You can't merge them, right?) However, I would want to use an OT framework to resolve conflicts with the sticky note's text value, correct?
I do not see any problem with using Operational Transformation to work with complex objects. What you need is to define which operations your OT system supports and how concurrency is resolved for them.
For instance, if you receive two sticky-note "move coordinates" operations from two different users, both issued from the same client state, you need to make both states converge, probably by cancelling out the second operation.
This is exactly the same behaviour as with text when two users generate two updates that delete overlapping text ranges (completely or partially): the second update processed must be transformed against the first, and the resulting operation will effectively delete only a portion of the original range (or be cancelled completely with a no-op).
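The overlapping-delete case can be made concrete with a small sketch. This is an illustrative transform function written for this example, not the algorithm of any particular OT library; operations are assumed to be plain `(start, length)` pairs against the same original text.

```python
def transform_delete(op, applied):
    """Rewrite delete `op` so it can run after concurrent delete `applied`.

    Both are (start, length) pairs against the same original text.
    Returns the adjusted (start, length), or None when nothing is left to delete.
    """
    start, length = op
    a_start, a_length = applied
    end, a_end = start + length, a_start + a_length

    # Characters both operations wanted to delete are already gone.
    overlap = max(0, min(end, a_end) - max(start, a_start))
    # Characters removed before our start shift us to the left.
    shift = max(0, min(start, a_end) - a_start)

    remaining = length - overlap
    if remaining == 0:
        return None  # fully covered by the earlier delete: a no-op
    return (start - shift, remaining)


def apply_delete(text, op):
    """Remove `length` characters from `text`, beginning at `start`."""
    start, length = op
    return text[:start] + text[start + length:]


original = "abcdefghij"
first = (2, 4)    # deletes "cdef"
second = (4, 4)   # deletes "efgh", issued against the same original text

after_first = apply_delete(original, first)
second_transformed = transform_delete(second, first)
converged = apply_delete(after_first, second_transformed)
```

In this example the second delete shrinks to `(2, 2)`, removing only "gh", and the text converges to "abij". A delete that lies entirely inside the first one, such as `(3, 2)`, transforms to `None`.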
You can take a look at this nice explanation of how Google Wave's Operational Transformation works and work out from there how your own implementation should work.
See the following paper for an approach to using OT with trees if you want to go down that route:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.74
However, in your particular case, I would use a separate plain-text OT document for each sticky note and use an existing library, e.g., Etherpad, to do the heavy lifting. The positions of the notes could then be broadcast on a last-committer-wins basis.
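A last-committer-wins rule for positions could be sketched as follows. The `sequence` field is an assumption: it stands for whatever commit order the server assigns, and `MoveOp` and `apply_move` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MoveOp:
    note_id: str
    x: float
    y: float
    sequence: int   # hypothetical server-assigned commit order


def apply_move(positions, last_seen, op):
    """Apply a move only if it was committed later than the one already stored."""
    if op.sequence <= last_seen.get(op.note_id, -1):
        return False  # stale update, dropped
    positions[op.note_id] = (op.x, op.y)
    last_seen[op.note_id] = op.sequence
    return True


positions, last_seen = {}, {}
results = [
    apply_move(positions, last_seen, MoveOp("note-1", 10, 20, sequence=1)),
    apply_move(positions, last_seen, MoveOp("note-1", 50, 60, sequence=3)),
    apply_move(positions, last_seen, MoveOp("note-1", 30, 40, sequence=2)),  # arrives late
]
```

The third move arrives after a later commit has already been applied, so it is discarded and every client ends up with the note at (50, 60).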
Operational Transformation is a general technique; it works for any data type. The point is that you need to define your transformation functions. Also, there are some atomic attributes that you cannot merge automatically (like position and background color); those will mostly be "last update wins", or the user resolves them manually when there is a conflict.
There are some nice libraries and frameworks that already provide OT for complex data:
ShareJS: a library for Node that provides OT operations on JSON objects
DerbyJS: a framework for Node.js that uses ShareJS for its OT
Open Coweb framework: a Dojo Foundation project for cooperative web applications using OT

PowerShell cmdlets development best practices

I'm currently putting together some PowerShell cmdlets. Building them is easy enough but I don't know if I'm building them in an acceptable manner (so to speak).
Are there any guidelines/best practices that one should follow for passing data into the PowerShell pipeline? At the moment I'm actually outputting a single object of type DataSet; if any cmdlet wanted to use it downstream, it would have to loop over the DataTables in that DataSet, then loop over the DataRows in each DataTable.
I guess the question is....am I going to p!ss anyone off by doing this? Or should I be outputting data that is inherently a bunch of rows?
Thanks all in advance
-JT
It's acceptable to output whatever type of object is best used to represent what you're writing out - a DataSet is absolutely fine. The only potential caution is that v2 of PowerShell may find itself running on a reduced version of the .NET Framework (such as on Server Core), so if that's a potential scenario for your cmdlets, you need to use some caution to make sure the object you're outputting exists on every system where your cmdlet might be used.
All that said, the pipeline works best when it contains collections of objects; a DataSet isn't a collection per se. In other words, you want downstream cmdlets to be able to receive one object at a time via the pipeline, so that those cmdlets don't have to manually enumerate through an object. I don't know a lot about exactly what you're doing - it could well be that a DataSet is entirely appropriate - but I'd generally prefer to see a cmdlet loop through the DataSet internally, create its own custom objects (so that each column in the table becomes a property), and output those objects to the pipeline. That simply increases the number of downstream cmdlets that can consume what you're putting out.
A simple test is to pipe your cmdlet's output to Export-CSV. If it works (and it probably wouldn't with a DataSet), then you're doing the right thing generally. Now, you may well need to create a cmdlet which outputs a DataSet and you only intend for certain other cmdlets you've written (which consume DataSets) to operate against that output. Nothing wrong with that. Max flexibility is single objects, though, since it enables all of PowerShell's core cmdlets to work on your output.
Hope that helps.
MSDN has an amazing set of Cmdlet Development Guidelines which I found extremely useful when developing my own. They are broken up into three different sections:
Required Development Guidelines
Strongly Encouraged Development Guidelines
Advisory Development Guidelines