Did anyone observed slowness when querying data from snowflake(select statement) with OS Windows 2016 + pycharm environment.
Getting result quickly with window 10 OS with same environment.
When checked network performance using wireshark, round trip time is more with windows 10 when compared with window 2016.
Other difference is window 10 is on physical on-prem system and windows 2016 is workspace in AWS.
Any one experienced slowness with similar setup as above.
Any suggestion to troubleshoot further.
Do you think the query execution at Snowflake is slow?
Have you compared the query execution time by keeping other conditions (like warehouse size, load on the warehouse) the same?
If the slowness is in data transfer over network, it could be because of AWS region etc, not to mention several other factors that could impact network data transfer speeds.
Snowflake's query execution performance should not be impacted based on your location or OS etc.
Snowflake is running as a SaaS on the cloud. In Ragesh's case I think it is running on AWS. So, from a query performance perspective on snowflake, it ideally should not matter whether you are initiating the connection from a Windows 2010 or 2016. But the network bandwidth may be a factor to consider. Your bandwidth on premise may not be as good as when you are on AWS. To be sure about that. Can you please do this test.
Run your query with Windows 10 - Before running the query, please set a query tag like
ALTER SESSION SET QUERY_TAG ='ONPREMISE';
Run your query with Windows 16 - Before running the query, please set a query tag like
ALTER SESSION SET QUERY_TAG ='ONAWS';
After this, go to Snwoflake console history view and filter based on query_tag and share the query profile details for both the scenarions
Related
Anybody who can advise or have experience on the possibility to have an SqlBase database in a cloud environment and run a Gupta application which is stored on local PCs?
Thanks.
We have some experience running a SQL-Database (Oracle, SqlServer, SqlBase) on a remote Server connected over WAN. Most often data access is very slow and you have to write your application carefully.
The reason for the slowness is usually not the bandwidth but the number of hops an IP-packet takes. Each hop adds some milliseconds of delay which oftens sums up to a painful experience. So it's ok to get one big blob from a database. It's also ok to fetch large resultsets. But when there are a lot of smaller queries it will get very slow.
There are two solutions to this problem:
1) Use a dedicated line from client to server if possible.
2) Write your application in a way that minimizes the number of queries.
I seem to be having real issues trying to get performance anywhere near that stated in the docs (~700 - 2000 tps with a VM of: 2 vCPUs 4GB RAM). I have tried on a local VM, a local machine and a few AWS VMs and I can't get anywhere close. - The maximum I have achieved is 80 tps on an AWS VM.
I have tried changing the -dbPoolSize and the -reqPoolSize for orion and playing with ulimit to set it to that suggested by MongoDB - but everything I change doesn't seem to get me anywhere close.
I have set indexes on the _id.id, _id.type and _id.servicePath as suggested in the docs - the latter of which gave me an increase from 40 tps to 80 tps.
Are there any config options for Orion or Mongo that I should be setting away from the default which will get me any closer? Are there any other tips for performance? The link in the docs to the test scripts doesn't work so I haven't been able to see the examples.
I have created my own test scripts using Node.js and I have tested update and queries using a variable amount of concurrent connections and between 1 and 2 load injectors.
From looking at the output from "top" the load is with Mongo as it almost maxes out the CPU but adding more cores to the VM doesn't change the stats. The VM has 7.5GB or 15GB of RAM so mongo should be able to put all the data into memory for blazing fast performance?
I have used mongostat to see that the connections from orion to mongo change with the -dbPoolSize option, but this doesn't yield any better performance.
Any help you can provide would be much appreciated.
I have tried using CentOS 6.5 and 6.7 with Orion 0.25 and 0.26 and MongoDB 2.6 with ~500,000 entities
My test scripts and data are on GitHub
I have only tested without subscriptions so far, but I have scripts ready to test with subscriptions - but I wanted to get a good baseline before adding subscriptions.
My data is modeled around parking spaces in the UK countries their regions and their outcodes (first part of the postcode). This is using servicePaths to split them down to parking lot in an outcode.
Here is a gist with the requests and mongo shell output
Performance is a complex topic which depends on many factor (deployment setup of Orion and MongoDB, startup configuration of Orion and MongoDB, hardware profile in the systems hosting the processes, network communications, overprovisioning level in the case of virtualization, injected load, etc.) so there isn't any general answer to deal with this kind of problems. However, I'd try to provide some hints and recommendations that I hope may help.
Regarding versions, Orion 0.26.0 (or 0.26.1) is recommended over 0.25.0. We have included a lot of improvements related with performance in Orion 0.26.0. Regarding MongoDB, we have also found that 3.0 could be much better than 2.6, specially in update intensive scenarios.
Having said that, first of all you should locate the bottleneck. Useful tools to do this are top, mongostat and mongotop. It could be either Orion, MongoDB or the network connecting them. If the bottleneck is CB, maybe the performance tuning hints provided in this document may help. Slow queries information in MongoDB could be also pointing to bottlenecks at Orion. If the bottleneck is MongoDB, taking into account the large number of entities you have (500,000) maybe you should consider to implement sharding. If the bottleneck is the network, colocation both Orion and MongoDB may help.
Finally, some things you can also try in order to get more insight into the problem:
Run some tests outside AWS (i.e. virtual machines in local premises) to compare. I don't know too much about the overprovising policy in AWS but based in my previous experiences with other cloud providers the VM overprovisioning (specially if it varies along time) could impact in performance.
Analyze if the peformance is related with the number of entities. E.g. run test with 500, 5,000, 50,000 and 500,000 entities and get the performance figure in each case.
Analyze if the performance is related with the usage of servicePath, e.g. put all the 500,000 entities in the default service path / (moving the current content of the servicePath to another place, e.g. an entity attribute or part of the entity ID string) and test. Currently Orion uses a regex to filter for servicePath and that could be slow.
I have already combed through this old article:
Why is Entity Framework taking 30 seconds to load records when the generated query only takes 1/2 of a second?
but no success.
I have tested the query:
without lazy loading (not using .Include of related entities) and
without merge tracking (using AsNoTracking)
I do not think I can easily switch to compiled queries in general due to the complexity of queries and using a Code First model, but let me know if you experience otherwise...
Setup
Entity Framework '4.4' (.Net 4.0 with EF 5 install)
Code First model and DbContext
Testing directly on the SQL Server 2008 machine hosting the database
Query
- It's just returning simple fields from one table:
SELECT
[Extent1].[Id] AS [Id],
[Extent1].[Active] AS [Active],
[Extent1].[ChangeUrl] AS [ChangeUrl],
[Extent1].[MatchValueSetId] AS [MatchValueSetId],
[Extent1].[ConfigValueSetId] AS [ConfigValueSetId],
[Extent1].[HashValue] AS [HashValue],
[Extent1].[Creator] AS [Creator],
[Extent1].[CreationDate] AS [CreationDate]
FROM [dbo].[MatchActivations] AS [Extent1]
The MatchActivations table has relationships with other tables, but for this purpose using explicit loading of related entities as needed.
Results (from SQL Server Profiler)
For Microsoft SQL Server Management Studio Query: CPU = 78 msec., Duration = 587 msec.
For EntityFrameworkMUE: CPU = 31 msec., Duration = 8216 msec.!
Does anyone know, besides suggesting the use of compiled queries if there is anything else to be aware of when using Entity Framework for such a simple query?
A number of people have run into problems where cached query execution plans due to parameter sniffing cause SQL Server to produce a very inefficient execution plan when running a query through ADO.NET, while running the exact same query directly from SQL Server Management Studio uses a different execution plan because some flags on the query are set differently by default.
Some people have reported success in forcing a refresh of the query execution plans by running one or both of the following commands:
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
But a more long-term, targeted solution to this problem would be to use Query Hints like OPTIMIZE FOR and OPTION(Recompile), as described in this article, to help ensure that good execution plans are chosen more consistently in the first place.
I think the framework is doing something funky if you what you say is true i.e. running the query in management studio takes half a second while entity framework takes 8.2 seconds. My hunch is that it's trying to do something with those 25K+ records set (perhaps bind to something else).
Can you download NP NET profiler and profile your app once? http://www.microsoft.com/en-in/download/details.aspx?id=35370
This nifty little program is going to record every method call and their execution time and basically give you info from under the hood on where it's spending those 7+ seconds. If that does not help, I also recommend trying out JetBrains .NET profiler. https://www.jetbrains.com/profiler/
Previous answer suggests that the execution plan can be off and that's true in many cases but it's also worth to sometimes look under the hood to determine the cause.
My thanks to Kalagen and others who responded to this - I did come to a conclusion on this, but forgot about this post.
It turns it is the number of records being returned X processing time (LINQ/EF I presume) to repurpose the raw SQL data back into objects on the client side. I set up wireshark on the SQL server to monitor the network traffic between it and client machines post-query and discovered:
There is a constant stream of network traffic between SQL server and
the rate of packet processing varies greatly between different machines (8x)
While that is occurring, the SQL server CPU utilization is < 25% and no resource starvation seems to be happening (working set, virtual memory, thread, handle counts, etc.)
so it is basically the constant conversion of the results back into EF objects.
The query in question BTW was part of a 'performance' unit test so we ended up culling it down to a more reasonable typical web-page loading of 100 records in under 1 sec. which passes easily.
If anyone wants to chime in on the details of how Entity Framework processes records post-query, I'm sure that would be useful to know.
It was an interesting discovery that the processing time depended more heavily on the client machine than on the SQL server machine (this is an intranet application).
The sql timeout expired and operation timeout expired, these 2 error message are mostly pop up in crm 2011.
I have written a plugin which access the NAV webservice and the update the order and order product entity.
The Database size is around 240 gb and around 1000 times the plugin written above process within 2 hours.
Kindly suggest the solution.
Like Nick says we need more details, but this appears that your database operations might be failing under load. A 340 GB database probably has several tables with tens of millions of records, and 500 plugins firing an hour could be a significant amount of concurrency depending on the complexity of what the plugin is doing. In general the solution is to optimize your server infrastructure.
More specifically I would take a look at several potential actions, loosely in order of bang for buck:
Index Maintenance:
Microsoft recommends that indexes with greater than 30% fragmentation be rebuilt, and those with greater than 10% be reorganized.
Blog about CRM Index Maintenance
Indexing:
Creating indexes for your large tables that experience a high level of concurrent access can greatly increase performance as well as reduce table locking. Indexes for CRM must be created in SQL server and this is supported by Microsoft.
Analyze the efficiency of your plugins:
Are you only writing delta data? Are you limiting database reads to only bring back columns you require? Are you caching information that will not change within the scope of the plugin or application?
Database Isolation Level:
Microsoft recommends an isolation level of "Read Committed with Row Versioning" for CRM databases that operate under a high level of concurrency. Here is a related article.
Upgrade Hardware
More hardware power never hurts. Also having your SQL server and CRM application on separate machines is recommended, as is having your database log file on its own physical hard drive.
Not sure if this is a question for here or the DBA forum, but here is some background to my problem. I have an application written in C# which uses the Gentle Framework to interface with our database. The issue I am running into has shown up on two different servers, both running SQL Server 2008 R2. One server is running Windows Server 2003 with 16GB of RAM, and the other is running Windows Server 2008 R2 with 64GB of RAM. Also we are running gigabit intranet so I doubt it is either a resource or network issue.
That being said my issue is... when inserting into the database, each insert is taking a little more time starting with about 30ms and building up to about 1900ms before suddenly taking 189549ms (a little over 3 minutes). After that 3 minute insert, the time drops down to about 10ms and starts building up again. Here is a link to my log file showing the time in ms and the query. Unfortunately, due to what has been called "proprietary", I can't share the exact inserts with you but I can answer general questions about the details.
Some additional details:
The log file linked only shows queries which took over 10ms, there are many more queries in the log between the inserts, however they are only taking 1-2ms. I can link one with all of the queries.
I have looked at other questions regarding a similar issues but they have either been about RAID or multiple inserts vs multiple values statements
These are parameterized queries
The tables are indexed and it is not possible to remove these because other applications are using these tables and depend on those indexes.
I don't think I have much choice in changing how records are inserted because Gentle is generating the SQL.
I found this question which I think is close.
I believe that Gentle is doing all of this in 1 transaction, and I have been told that "it should be done in 1 transaction, and we are not going to break it apart"