Let's assume I have an AppSync API.
This API has one resolver which consists of a simple GetItem operation on a DynamoDB table.
Now if, for some obscure reason, I wanted to convert that single-operation resolver into a pipeline resolver (still performing the same operation, with nothing significant in the before and after mapping templates), it would probably be fair to assume that there would be some performance hit going from the straight operation resolver to the pipeline resolver.
Now I was wondering about the scale of that performance hit: is it going to be negligible, noticeable, or orders of magnitude?
The performance hit should be linear: if you compare a resolver with a request and response mapping template accessing one data source against a pipeline of 2 functions doing similar work (each with its own request and response mapping templates, and each accessing its own data source), the pipeline should take roughly twice the time the single resolver takes.
Case in point - I have a build that makes a lot of REST API calls and processes the results. I would like to split the monolithic step that does all of that into 3 steps:
initial data acquisition - gets data from the REST API. Plain objects, no reference loops or duplicate references
data massaging - enriches the data from (1) with all kinds of useful information. May result in duplicate references (the same object is referenced from multiple places) or reference loops.
data processing
The catch is that there is a lot of data, and converting it to JSON takes too much time for my taste. I have not checked the Export-Clixml cmdlet, but I suspect it would be slow too.
If I wrote the code in C# I would use some kind of binary serialization, which should be sophisticated enough to handle reference loops and duplicate references.
Please note that the serialized data would be written to the build staging directory and deserialized almost immediately, as soon as the next step runs.
What are my options in PowerShell?
EDIT 1
I would like to clarify what I mean by steps. This is a build running on a CI build server. Each step runs in a separate shell and is reported individually on the build page. There is no memory sharing between the steps. The only way to communicate between the steps is either through build variables or the file system. Of course, using a database is also possible, but that is overkill.
Build variables are set using a certain API and are exposed to the subsequent steps as environment variables. As such they are quite limited in length.
So I am talking about communicating through the file system. I am sacrificing performance here for the sake of build granularity - instead of having one monolithic step I want to have 3 smaller steps. This way the build is more transparent and communicates clearly what it is doing. But I have to temporarily persist payloads between steps. If it is possible to minimize the overhead, then the benefits are worth it. If the performance is going to degrade significantly, then I will keep the monolithic step.
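For what it's worth, below is a minimal sketch of the kind of reference-preserving serialization alluded to above, written in C# using System.Text.Json with ReferenceHandler.Preserve (which emits $id/$ref markers so duplicate references and cycles survive the round trip). The Node type, the file path, and the idea of calling this from PowerShell (directly in PowerShell 7, or via a small helper assembly in Windows PowerShell) are assumptions for illustration; whether it actually beats ConvertTo-Json or Export-Clixml for your payloads would need to be measured.
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Text.Json.Serialization;

// Placeholder payload type containing a reference loop (Parent <-> Children).
public class Node
{
    public string Name { get; set; }
    public Node Parent { get; set; }
    public List<Node> Children { get; set; } = new List<Node>();
}

public static class StagePersistence
{
    // ReferenceHandler.Preserve keeps duplicate references and cycles intact.
    private static readonly JsonSerializerOptions Options = new JsonSerializerOptions
    {
        ReferenceHandler = ReferenceHandler.Preserve
    };

    public static void Save(Node root, string path) =>
        File.WriteAllText(path, JsonSerializer.Serialize(root, Options));

    public static Node Load(string path) =>
        JsonSerializer.Deserialize<Node>(File.ReadAllText(path), Options);
}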
I'm just wondering if the design I will be trying to implement is valid CQRS.
I'm going to have a query handler that itself will send more queries to other sub-handlers. Its main task is going to be aggregating results from multiple services.
Is it OK to send queries from within handlers? I can already think of hierarchies of these three levels deep in my application.
No, MediatR is designed for a single level of requests and handlers. A better design may be to create a service/manager of some kind which invokes multiple, isolated queries using MediatR and aggregates the results. The implementation may be similar to what you have in mind, except that it is not a request handler itself but rather an aggregation of multiple request handlers.
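As a rough sketch of that suggestion (the query types, DTOs, and service name below are made up for illustration and are not part of MediatR), the aggregator is an ordinary service that depends on IMediator and composes single-level handlers:
using System.Threading.Tasks;
using MediatR;

// Hypothetical query and result types; each query keeps its own ordinary,
// single-level IRequestHandler registered with MediatR.
public record GetProfileQuery(int UserId) : IRequest<ProfileDto>;
public record GetOrdersQuery(int UserId) : IRequest<OrderDto[]>;
public record ProfileDto(int UserId, string Name);
public record OrderDto(int Id, decimal Total);
public record DashboardDto(ProfileDto Profile, OrderDto[] Orders);

// The aggregator is a plain service, not a request handler, so no handler
// ever needs to send requests to other handlers.
public class DashboardService
{
    private readonly IMediator _mediator;

    public DashboardService(IMediator mediator) => _mediator = mediator;

    public async Task<DashboardDto> GetDashboardAsync(int userId)
    {
        var profile = await _mediator.Send(new GetProfileQuery(userId));
        var orders = await _mediator.Send(new GetOrdersQuery(userId));
        return new DashboardDto(profile, orders);
    }
}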
This will badly affect the system's resilience and compute time and it will increase coupling.
If any of the sub-handlers fails then the entire handler will fail. If the queries are sent synchronously then the total compute time is the sum of the individual query times.
One way to reuse the sub-handlers is to query them in the background, outside the client's request, if possible. That way, when a client request comes in you already have the data locally, which improves both resilience and compute time. You will be left only with the coupling, but it could be worth it if the benefit of reuse outweighs the coupling.
I don't know how much of this is possible in MediatR; these are just general principles of system architecture.
I need to present ~10 million XML documents to Qlik Sense using the MarkLogic REST interface, with the intention of analyzing the raw data in Qlik.
I'm unable to send that much data using a simple cts:search.
A template view with a SQL call like the one below does not help either, as it is not recognized by Qlik Sense.
xdmp:to-json(xdmp:sql('select * from SC1.V1'))
Is there a better way to achieve this?
I understand it is not usual to load such a huge amount of data into Qlik, but what limitations should I consider?
You are unlikely to be able to transfer that volume of data into or out of ANY system in a single 'transaction' (or request). And if you could, you wouldn't want to, because when it fails it's likely to fail forever, as you have to start all over.
You should 'batch' up the documents into manageable chunks. 100 MB or '1 minute' per batch is a reasonable upper bound -- as size and time increase, the probability of problems goes up (way up) due to timeouts, memory, temp space, and transient internet and network problems.
A simple strategy that often works well is to first produce a 'list' of what to extract (document URIs, primary keys, ...), save that, and then work your way through the list in batches, retrying as needed. Depending on the destination, local storage, and so on, you can either combine the lot to send on to the recipient or, generally better, send the target data in batches as well.
This approach has good transactional characteristics: you effectively 'freeze' the set of data when you make the list, but can take your time collecting and sending it. Depending on your setup, you may be able to do so in parallel.
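As a loose sketch of the list-then-batch idea in C# (the batch size, retry policy, and document-by-URI endpoint shape are assumptions for illustration, not MarkLogic or Qlik specifics):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Rough sketch: walk a previously saved list of document URIs in batches,
// retrying a failed batch instead of restarting the whole export.
class BatchExporter
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task ExportAsync(IReadOnlyList<string> uris, string baseUrl)
    {
        const int batchSize = 500;  // keep each batch well under the timeout/size limits
        for (int offset = 0; offset < uris.Count; offset += batchSize)
        {
            var batch = uris.Skip(offset).Take(batchSize).ToList();
            await FetchBatchWithRetryAsync(batch, baseUrl, maxAttempts: 3);
        }
    }

    private static async Task FetchBatchWithRetryAsync(List<string> batch, string baseUrl, int maxAttempts)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                foreach (var uri in batch)
                {
                    // Hypothetical document-by-URI endpoint; substitute the real REST call.
                    var doc = await Http.GetStringAsync(baseUrl + "/v1/documents?uri=" + Uri.EscapeDataString(uri));
                    // ...append doc to the chunk being handed to Qlik...
                }
                return;  // this batch is done
            }
            catch (HttpRequestException) when (attempt < maxAttempts)
            {
                // simple backoff, then retry just this batch
                await Task.Delay(TimeSpan.FromSeconds(5 * attempt));
            }
        }
    }
}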
I'm using TinkerPop 3 Gremlin Server.
I execute a simple query (using the standard REST API) to get the edges of a vertex:
g.V(123456).outE('label')
When there are many results (about 2000-3000), the query is very slow: it takes more than 20 seconds to get the JSON response.
The interesting thing is that when I run the same query in the Gremlin shell, it takes about 1 second to receive the edge objects!
I'm not sure, but I suspect that the Gremlin Server's JSON serialization (I'm using GraphSON) is the problem (maybe it is very slow).
Any ideas?
Thanks
That does seem pretty slow, but the server is building a potentially large result set in memory, and serializing a full graph element such as an entire Vertex or Edge is a bit "heavy" because the serializer tries to match the general structure of the Vertex/Edge API hierarchy. We've seen that you can get faster serialization times by just changing your query to:
g.V(123456).outE('label').valueMap()
or if you need the id/label as well:
g.V(123456).outE('label').valueMap(true)
This way the Vertex/Edge hierarchy gets flattened to a simple Map, which has less serialization overhead. In short, limit the data to what you actually need on the client side to improve your performance.
I'm setting up a new application using Entity Framework Code First and I'm looking at ways to reduce the number of round trips to the SQL Server as much as possible.
When I first read about the .Local property here I got excited about the possibility of bringing down entire object graphs early in my processing pipeline and then using .Local later without ever having to worry about incurring the cost of extra round trips.
Now that I'm playing around with it, I'm wondering if there is any way to pull down all the data I need for a single request in one round trip. If, for example, I have a web page with a few lists on it - news, events, and discussions - is there a way that I can pull the records of those 3 unrelated source tables into the DbContext in one single round trip? Do you all out there on the interweb think it's perfectly fine when a single page makes 20 round trips to the db server? I suppose with a proper caching mechanism in place this issue could be mitigated.
I did run across a couple of attempts at returning multiple result sets from EF queries in one round trip, but I'm not sure the complexity and maturity of those kinds of solutions are worth the payoff.
In general in terms of composing datasets to be passed to MVC controllers do you think that it's best to simply make a separate query for each set of records you need and then worry about much of the performance later in the caching layer using either the EF Caching Provider or asp.net caching?
It is completely OK to make several DB calls if you need them. If you are afraid of multiple round trips, you can either write a stored procedure and return multiple result sets (doesn't work with default EF features) or execute your queries asynchronously (run multiple disjoint queries at the same time). Loading unrelated data with a single LINQ query is not possible.
Just one more note: if you decide on the asynchronous approach, make sure that you use a separate context instance in each asynchronous execution. Asynchronous execution uses a separate thread, and the context is not thread safe.
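For illustration only (SiteContext, News, and Events are hypothetical names, and the exact query shapes are assumptions), running two disjoint queries at the same time, each on its own context instance, could look like this:
using System;
using System.Linq;
using System.Threading.Tasks;

// Each task creates its own hypothetical SiteContext, because a context
// instance must not be shared across threads.
var newsTask = Task.Run(() =>
{
    using (var ctx = new SiteContext())
        return ctx.News.OrderByDescending(n => n.PublishedOn).Take(10).ToList();
});

var eventsTask = Task.Run(() =>
{
    using (var ctx = new SiteContext())
        return ctx.Events.Where(e => e.Starts > DateTime.UtcNow).Take(10).ToList();
});

Task.WaitAll(newsTask, eventsTask);   // both queries run at the same time
var news = newsTask.Result;
var upcomingEvents = eventsTask.Result;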
I think you are doing a lot of work for little gain if you don't already have a performance problem. Yes, pay attention to what you are doing and don't make unnecessary calls. The actual connection and across the wire overhead for each query is usually really low so don't worry about it.
Remember "Premature optimization is the root of all evil".
My rule of thumb is that executing a call for each collection of objects you want to retrieve is ok. Executing a call for each row you want to retrieve is bad. If your web page requires 20 collections then 20 calls is ok.
That being said, reducing this to one call would not be difficult if you use the Translate method. Code something like this should work:
// GetADataReader stands in for executing a command that returns multiple result sets.
var reader = GetADataReader(sql);
// Translate lives on ObjectContext; with a DbContext use ((IObjectContextAdapter)context).ObjectContext.
var objectContext = ((IObjectContextAdapter)context).ObjectContext;
var firstCollection = objectContext.Translate<Whatever1>(reader).ToList();
reader.NextResult();
var secondCollection = objectContext.Translate<Whatever2>(reader).ToList();
// ...and so on for each additional result set
The big downside to doing this is that if you place your SQL into a stored procedure, then your stored procedures become very specific to your web pages instead of being more general purpose. This isn't the end of the world as long as you have good access to your database. Otherwise you could just define your SQL in code.