So, in DynamoDB the recommended approach to a many-to-many relationship is the adjacency list pattern.
Now, it works great when you need to read the data, because you can easily read several items with one request.
But what if I need to update or delete the data? These operations happen on a specific item rather than on a query result.
So if I have thousands of replicated items there to facilitate a GET operation, how am I going to update all of these replicas?
The easiest way I can think of is to store only an immutable ID instead of duplicating the data, but that's pretty much emulating a relational database and takes at least two requests.
Simple answer: you just update the duplicated items :) AFAIK redundant data is preferred in NoSQL databases, and there are no shortcuts to updating it.
This of course works best when the read/write ratio of the data leans heavily toward reads. In most everyday apps that is the case (a gut feeling that could be wrong), so updates to data are rare compared to queries.
DynamoDB has a couple of utilities that might be applicable here, though both have their shortcomings:
BatchWriteItem lets you put or delete multiple items across one or more tables. Unfortunately, it does not allow updates, so it's probably not applicable to your case. The number of operations is also limited to 25.
TransactWriteItems lets you perform an atomic operation that groups up to 10 action requests across one or more tables. Again, the number of operations is too limited for your case.
My understanding is that both of these should be used with caution and consideration, since they can cause performance bottlenecks, for example. The simple approach of updating each item separately is usually just fine. And since the data is redundant, you can use asynchronous operations to run multiple updates in parallel.
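To make that concrete, here is a minimal sketch of the "update every replica in parallel" approach using the AWS SDK for .NET. The table name, GSI name, and attribute names are only assumptions about how the duplicated data might be modeled, and query pagination is omitted for brevity.

    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using Amazon.DynamoDBv2;
    using Amazon.DynamoDBv2.Model;

    public static class ReplicaUpdater
    {
        // Update every copy of a duplicated attribute, running the writes in parallel.
        public static async Task UpdateAllCopiesAsync(IAmazonDynamoDB db, string entityId, string newName)
        {
            // 1. Find every item that carries a copy of the data, e.g. via a GSI
            //    keyed on the duplicated entity's id. (Assumed names: AppTable, GSI1.)
            var found = await db.QueryAsync(new QueryRequest
            {
                TableName = "AppTable",
                IndexName = "GSI1",
                KeyConditionExpression = "EntityId = :id",
                ExpressionAttributeValues = new Dictionary<string, AttributeValue>
                {
                    { ":id", new AttributeValue { S = entityId } }
                }
            });

            // 2. Update each replica; the items are independent, so the calls can overlap.
            var updates = found.Items.Select(item => db.UpdateItemAsync(new UpdateItemRequest
            {
                TableName = "AppTable",
                Key = new Dictionary<string, AttributeValue>   // primary key of this replica
                {
                    { "PK", item["PK"] },
                    { "SK", item["SK"] }
                },
                UpdateExpression = "SET EntityName = :n",
                ExpressionAttributeValues = new Dictionary<string, AttributeValue>
                {
                    { ":n", new AttributeValue { S = newName } }
                }
            }));

            await Task.WhenAll(updates);
        }
    }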
I work with Classic ASP, and all my pages make multiple calls (stored procedures) to the database to build the page (reports, forms...).
Is it better to make one call that returns multiple recordsets, or to keep doing what I'm doing (multiple calls)?
I know there may be something better in other languages (PHP, C#...), but my app was built entirely in Classic ASP.
Thanks
As always, there is a case for both ways.
To minimize the total amount of work done, as Blam said, you should make one big call to reduce round-trip time: not only the network latency, but also all the overhead of assembling packets and handling sockets.
However, this means your page gets no data until all the database accesses are done. So to improve response time, you may want to consider a pipeline in which some database calls are made while you process the results of earlier ones. This is a fairly unusual case, since most of the time the processing is fairly light.
A common reason to break up a stored procedure is reuse. If you have one big stored procedure, then to reuse any part of it you have to reuse all of it (unless you add messy branches and conditions inside the procedure, which will probably hurt performance because of query plan optimization). If you have multiple pages that can share some of the code, you probably want to break it up.
In a typical web farm, the database and page servers are close enough together that network latency is not too bad. I profiled some of our production loads, and there are several places where we make multiple database calls serially, taking less than 1 ms for 10 database calls.
If your database network latency is significant, it may be worth doing the database calls in parallel. That way you can break your stored procedure up for code reuse without worrying about network latency.
As a general rule, make your code clean and pretty without worrying about performance. Throw more hardware at the problem until you can't make it faster by paying more money. Typically, hardware is a lot cheaper than developers.
I have several performance issues on my website.
I'm using ASP.NET MVC 2 and Entity Framework 4.0. I bought Entity Framework Profiler to see what kind of SQL requests EF generates.
For example, some pages take between 3 and 5 seconds to open. This is too much for my client.
To see whether the problem was the SQL generated by EF, I used the profiler and copied the generated SQL into SQL Server Management Studio to check the execution plan and the SQL statistics. The result came back in less than a second.
Now that I have ruled out the SQL query itself, I suspect the EF query-building step.
I followed the MSDN walkthrough step by step to pre-generate my views, but I didn't see any performance gain.
How can I be sure that my queries use these pre-generated views?
Is there anything else I can do to improve the performance of my website?
Thanks
First of all, keep in mind that pre-compiled queries still take just as long (in fact a little longer) the first time they run, because the query is compiled on its first invocation. After that first invocation, you should see a significant performance increase on the individual queries.
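For reference, a compiled query in EF 4 (ObjectContext API) looks roughly like this; MyEntities, Orders, and the field names are placeholders for whatever your model actually contains:

    using System;
    using System.Data.Objects;   // EF 4 ObjectContext / CompiledQuery
    using System.Linq;

    public static class Queries
    {
        // Compiled once per AppDomain; later invocations skip the LINQ-to-Entities
        // translation step and only pay for execution.
        public static readonly Func<MyEntities, int, IQueryable<Order>> RecentOrdersForCustomer =
            CompiledQuery.Compile((MyEntities ctx, int customerId) =>
                ctx.Orders
                   .Where(o => o.CustomerId == customerId)
                   .OrderByDescending(o => o.CreatedOn)
                   .Take(20));
    }

    // Usage: the first call pays the compilation cost, later calls reuse it.
    //   using (var ctx = new MyEntities())
    //   {
    //       var orders = Queries.RecentOrdersForCustomer(ctx, 42).ToList();
    //   }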
However, you will find the best answer to all performance questions is: figure out what's taking the most time first, then work on improving in that area. Until you have run a profiler and know where your system is blocking, any time you spend trying to speed things up is likely to be wasted.
Once you've determined what's taking the most time, there are a lot of possible techniques to use to speed things up:
Caching data that doesn't change often (a small sketch follows this list).
Restructuring your data accesses so you pull the data you need in fewer round trips.
Ensuring you're not pulling more data than you need when you do your database queries.
Buying better hardware.
... and many others
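For the first item above, a simple in-memory cache goes a long way for data that rarely changes. Here is a rough sketch using System.Runtime.Caching; the key names, lifetime, and loader delegate are placeholders:

    using System;
    using System.Runtime.Caching;

    public static class ReferenceData
    {
        static readonly MemoryCache Cache = MemoryCache.Default;

        // Returns the cached value if present; otherwise loads it (one database
        // hit) and caches it for a few minutes.
        public static T GetOrLoad<T>(string key, Func<T> load, int minutes = 10) where T : class
        {
            var cached = Cache.Get(key) as T;
            if (cached != null)
                return cached;

            var value = load();
            if (value != null)
                Cache.Set(key, value, DateTimeOffset.Now.AddMinutes(minutes));
            return value;
        }
    }

    // Usage (hypothetical repository call):
    //   var countries = ReferenceData.GetOrLoad("countries", () => repository.GetCountries());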
One last note: In Entity Framework 5, they plan to implement automatic query caching, which will make precompiling queries practically useless. So I'd only recommend doing it where you know for sure that you'll get a significant improvement.
I have a script that loops over a set of records, performs some statistical calculations, and updates the records. It's a big cursor: get record, calculate statistics from embedded documents, set fields on record, save record. There are fewer than 5k records in the loop, and each one embeds 90 history entries.
Question: would I get substantially better performance if I did this via JavaScript? The alternative is writing it in Ruby. My (unfounded) opinion is that since this can be done entirely in the database, I will get better performance if I send a chunk of JS to MongoDB instead of adding Ruby into the mix.
Related: is map/reduce appropriate for finding the median and mode of a set of values for many records?
The answer is really "it depends" - if the fields you need to do the calculations are very large, doing the calculation on the server side with JS might be a lot faster simply by cutting down on network traffic.
But, executing JS on the server side also holds a write lock, so depending on how complicated the calculations are, it might be more efficient to just do your calculations on the client side and then simply update the document.
Your best bet is to run a simple benchmark of Ruby vs. server-side JS. If you need to serve other database traffic at the same time, take that into account as well, because your lock % could differ between the two scenarios (you can monitor this with mongostat).
Also, keep in mind that using db.eval will not work with sharding, so avoid it if you are using a sharded environment or plan to in the future.
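To show the shape of the client-side variant: fetch each record, compute the median and mode from the embedded history, and write back only the computed fields. This sketch happens to use the official C# driver, but the Ruby driver version has the same structure; the collection and field names are assumptions:

    using System.Linq;
    using MongoDB.Bson;
    using MongoDB.Driver;

    public static class StatsJob
    {
        public static void Run()
        {
            var client = new MongoClient("mongodb://localhost:27017");
            var records = client.GetDatabase("app").GetCollection<BsonDocument>("records");

            foreach (var doc in records.Find(FilterDefinition<BsonDocument>.Empty).ToEnumerable())
            {
                // Pull the ~90 embedded history values and compute the stats locally.
                var values = doc["history"].AsBsonArray
                    .Select(h => h["value"].ToDouble())
                    .OrderBy(v => v)
                    .ToArray();

                double median = values.Length % 2 == 1
                    ? values[values.Length / 2]
                    : (values[values.Length / 2 - 1] + values[values.Length / 2]) / 2.0;

                double mode = values.GroupBy(v => v)
                    .OrderByDescending(g => g.Count())
                    .First().Key;

                // Write back only the computed fields; each update is a short, independent write.
                records.UpdateOne(
                    Builders<BsonDocument>.Filter.Eq("_id", doc["_id"]),
                    Builders<BsonDocument>.Update.Set("median", median).Set("mode", mode));
            }
        }
    }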
I have an ADO.NET/T-SQL performance question. We have two options in our application:
1) One big database call that returns multiple result sets; in code, I step through each result set and populate my objects. This results in one round trip to the database.
2) Multiple small database calls.
There is much more code reuse with option 2, which is an advantage of that option. But I would like some input on the performance cost: are two small round trips twice as slow as one big round trip to the database, or is it just a small, say 10%, performance loss? We are using C# 3.5 and SQL Server 2008 with stored procedures and ADO.NET.
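For reference, option 1 looks roughly like this in ADO.NET: one stored procedure, one reader, and NextResult() to move between the result sets. The proc name, columns, and types here are placeholders:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public class Customer { public int Id; public string Name; }
    public class Order    { public int Id; public int CustomerId; }

    public static class PageData
    {
        // One round trip: the proc returns two result sets back to back.
        public static void Load(string connectionString, List<Customer> customers, List<Order> orders)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("dbo.GetPageData", conn) { CommandType = CommandType.StoredProcedure })
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())            // first result set
                        customers.Add(new Customer { Id = reader.GetInt32(0), Name = reader.GetString(1) });

                    reader.NextResult();             // advance to the second result set
                    while (reader.Read())
                        orders.Add(new Order { Id = reader.GetInt32(0), CustomerId = reader.GetInt32(1) });
                }
            }
        }
    }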
I would think it would depend in part on when you need the data. For instance, if you return ten datasets in one large process and show all ten on the screen at once, then go for it. But if you return ten datasets and the user may only click through the pages to see three of them, then sending the others is a waste of server and network resources. If you return ten datasets but the user really needs to see sets seven and eight only after making changes to sets five and six, then the user would see the wrong info if you returned it too soon.
If you use separate stored procs for each data set, called from one master stored proc, there is no reason at all why you can't reuse the code elsewhere, so code reuse is not really an issue in my mind.
It sounds a wee bit obvious, but only send what you need in one call.
For example, we have a "getStuff" stored proc for presentation. The "updateStuff" proc calls the "getStuff" proc, and the client wrapper method for "updateStuff" expects type "Thing". So, one round trip.
Chatty servers are one thing you can prevent up front with minimal effort. After that, you can tune the DB or client code as needed, but it's hard to factor out the round trips later, no matter how fast your code runs. In the extreme, what if your web server is in a different country from your DB server?
Edit: it's interesting to note the SQL guys (HLGEM, astander, me) saying "one trip" and the client guys saying "multiple, code reuse"...
I am struggling with this problem myself. And I don't have an answer yet, but I do have some thoughts.
Having reviewed the answers given by others to this point, I think there is still a third option.
In my application, around ten or twelve calls are made to the server to get the data I need. Some of the data fields are varchar(max) and varbinary(max) fields (pictures, large documents, videos, and sound files). All of my calls are synchronous: while the data is being requested, the user (and the client-side program) has no choice but to wait. He may only want to read or view the data, which only makes sense when it is ALL there, not just partially there.
The process, I believe, is slower this way, so I am developing an alternative approach based on asynchronous calls to the server from a DLL library that raises events to report progress to the client. The client handles the DLL's events and sets a client-side variable indicating which calls have completed. The client program can then prepare the data received in call #1 while the DLL proceeds asynchronously to fetch the data for call #2. When the client is ready to process the data of call #2, it checks the status and waits if necessary (I am hoping the wait will be short or nonexistent). In this manner, both the server-side and client-side software get the job done more efficiently.
If you're that concerned with performance, try a test of both and see which performs better.
Personally, I prefer the second method. It makes life easier for the developers, makes code more re-usable, and modularizes things so changes down the road are easier.
I personally like option two for the reason you stated: code reuse.
But consider this: for small requests, the latency might take longer than the work the request actually does. You have to find the right balance.
As the ADO.NET developer, your job is to make the code as correct, clear, and maintainable as possible. This means that you must separate your concerns.
It's the job of the SQL Server connection technology to make it fast.
If you implement a correct, clear, maintainable application that solves the business problems, and it turns out that database access is the major bottleneck preventing the system from operating within acceptable limits, then, and only then, should you start pursuing ways to fix the problem. This may or may not include consolidating database queries.
Don't optimize for performance until a need arises to do so. That means analyzing your anticipated usage patterns, determining the typical frequency of use for this process, and estimating the user-interface latency that will result from the present design. If the user gets feedback from the app in less than a few (2-3) seconds, and the load from this process does not put an inordinate strain on server capacity, then don't worry about it. If, on the other hand, the user is waiting an unacceptable amount of time for a response (subjective, but definitely measurable), or the server is being overloaded, then it's time to begin optimizing. Which optimization techniques make the most sense, or are the most cost-effective, depends on what your analysis of the issue tells you.
So, in the meantime, focus on maintainability. In your case, that means code reuse.
Personally, I would go with one larger round trip.
This will definitely be influenced by the exact reusability of the calling code and how it might be refactored.
But as mentioned, it will depend on your exact situation, where maintainability vs. performance could be a factor.