I have a question about speeding up an application built in MATLAB. What effect do vectorization and parallel computation have on the application's speed, and is there a better method than either of these in such a case? Thanks.
The first thing you need to do when your MATLAB code runs too slowly is to run it in the profiler. In recent versions of MATLAB, this can be done by pressing the "Run and Time" button on the main toolbar. This way, you will know which functions and which lines in those functions take up the most time. Once you know this, you may do one of the following, depending on your circumstances and the nature of the particular piece of code:
Consider whether your algorithm is optimal in terms of its O() complexity.
Try turning loops into vector operations (see the sketch after this list). The efficacy of this has declined in recent versions of MATLAB because of improvements in how loops are executed.
If you have a multi-core CPU, try using the Parallel Computing Toolbox. If your code parallelizes well, you will get a speed-up nearly equal to the number of cores.
If you have an NVIDIA GPU, try using the GPU support. You can get a speed-up by a factor of 10 or more for some problems, but not all problems are amenable to this sort of optimization.
If everything else fails, you may outsource the slowest piece of your code to a low-level language like C. See here for how to do this. You could then use low-level profiling tools like Intel VTune to get the absolute maximum speed from the low-level code.
If it is still too slow, you may need to buy an FPGA. See here for a brief tutorial.
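To make the vectorization suggestion concrete, here is a minimal sketch. It is written in Python/NumPy purely for illustration; the equivalent MATLAB change would be replacing the element-wise for loop with y = 3*x.^2 + 1. The exact speed-up depends on your machine and MATLAB version, so treat this as a demonstration of the principle rather than a benchmark.

import numpy as np, time

x = np.random.rand(10_000_000)

start = time.perf_counter()
y_loop = np.empty_like(x)
for i in range(x.size):              # explicit element-by-element loop
    y_loop[i] = 3.0 * x[i] ** 2 + 1.0
print("loop      :", time.perf_counter() - start)

start = time.perf_counter()
y_vec = 3.0 * x ** 2 + 1.0           # one whole-array (vectorized) expression
print("vectorized:", time.perf_counter() - start)

assert np.allclose(y_loop, y_vec)    # same result, typically in far less time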
What is the von Neumann bottleneck and how does functional programming reduce its effect? Can someone explain, in a simple way and through a practical, comprehensive example, the advantage (if there is any) of using Scala over Java?
More importantly, why is avoiding imperative control structures and preferring functions so significant to improving performance? Ideally, an actual coding example that shows how a problem solved with and without a functional approach is affected by the von Neumann bottleneck would be very helpful.
Using Scala will not necessarily fix your performance problems, even if you use functional programming.
More importantly, there are many causes of poor performance, and you don't know the right solution without profiling.
The von Neumann bottleneck has to do with the fact that, in a von Neumann architecture, the CPU and memory are separate, and therefore the CPU often has to wait for memory. Modern CPUs mitigate this by caching memory. This isn't a perfect fix, since it requires the CPU to guess correctly about which memory it needs to cache. However, high-performance code makes it easy for the CPU to guess correctly by structuring data efficiently and iterating over data linearly (i.e. good data locality).
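To make the data-locality point concrete, here is a small sketch, written in Python/NumPy for brevity; the effect comes from the hardware cache, not the language, so the same pattern holds in Scala or Java. Traversing the array along its contiguous dimension is typically several times faster than traversing it across strides, even though both loops do the same arithmetic.

import numpy as np, time

a = np.random.rand(5000, 5000)       # C-ordered: each row is contiguous in memory

start = time.perf_counter()
total = 0.0
for i in range(a.shape[0]):
    total += a[i, :].sum()           # row slices: sequential memory reads
print("row-wise   :", time.perf_counter() - start)

start = time.perf_counter()
total = 0.0
for j in range(a.shape[1]):
    total += a[:, j].sum()           # column slices: strided reads, more cache misses
print("column-wise:", time.perf_counter() - start)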
Scala can simplify parallel programming, which is probably what you are looking for. This is not directly related to the von Neumann Bottleneck.
Even so, Scala is not automatically the answer if you want to do parallel programming. There are several reasons for this.
Java is also capable of parallel programming, and has many types of parallel collections for that purpose.
Java 8 Streams are Java's answer to Scala's parallel collections. They can be used for functional programming.
Parallel programming is not guaranteed to improve performance, and can make a program slower on small data sets, due to setup costs.
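A tiny illustration of the setup-cost point, written in Python for brevity (the same pattern holds for Scala parallel collections or Java streams): on a small input, starting workers and shipping data to them usually costs more than the work itself, so the parallel version comes out slower.

import time
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    data = list(range(1000))                   # deliberately small workload

    start = time.perf_counter()
    serial = [square(x) for x in data]
    print("serial  :", time.perf_counter() - start)

    start = time.perf_counter()
    with Pool(4) as pool:                      # process start-up + inter-process traffic
        parallel = pool.map(square, data)
    print("parallel:", time.perf_counter() - start)

    assert serial == parallel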
There is one case where you are correct that Scala overcomes the von Neumann Bottleneck, and that is with big data. When the data won't fit easily on a single machine, you can store the data on many machines, such as a Hadoop cluster. Hadoop's distributed filesystem is designed to keep data and CPUs close together to avoid network traffic. The easiest way to program for Hadoop is currently with Apache Spark in Scala. Here are some Spark examples; as of Spark 2.x, the Scala examples are much simpler than the Java examples.
I need lots of cable pieces of various small lengths (under 100 meters), and cables are only sold in lengths of 100 meters.
So, to optimize my purchase, I would like code where I can input the lengths of all the pieces of cable that I need. The code would group my inputs under the constraint that each group sums to at most 100 meters, while minimizing the total number of 100 m cables that I need to buy.
If anyone could help with code in VBA, MATLAB or Python, I would be very grateful.
This is known as a bin-packing problem, and it's actually very difficult (computationally speaking) to find the optimal solution.
However, it is a problem that is practically useful to solve (as you have seen for yourself) and so there are several approaches that seek to find an approximate solution--one that is "good enough" without guaranteeing that it's the best possible solution. I did a quick search and found this course website, which has some examples that may help you out.
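One classic heuristic of this kind is first-fit decreasing: sort the pieces from longest to shortest and put each one into the first cable that still has room. Here is a rough sketch in Python (one of the languages you mentioned); the piece lengths are placeholders for your own inputs, and the result is usually close to optimal but not guaranteed to be so.

def first_fit_decreasing(pieces, cable_length=100):
    """Assign each required piece to a cable, opening a new cable when none has room."""
    cables = []                                   # each entry is a list of piece lengths
    for piece in sorted(pieces, reverse=True):    # longest pieces first
        for cable in cables:
            if sum(cable) + piece <= cable_length:
                cable.append(piece)
                break
        else:                                     # no existing cable could take it
            cables.append([piece])
    return cables

needed = [40, 60, 30, 70, 20, 80, 55, 45]         # example piece lengths in metres
layout = first_fit_decreasing(needed)
print(f"{len(layout)} cables:", layout)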
If you are looking for an exact solution, you can ask the related question "will I be able to fit the cables I need into N 100-meter cables?". This feasibility problem can be expressed as a "binary program", which is a special case of a "mixed-integer linear program", for which MATLAB has a solver called intlinprog (requires the optimization toolbox).
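If you want the exact optimum, the same model can be written directly as a binary program and handed to any MILP solver. Below is a sketch using the Python PuLP library (chosen here for illustration because it bundles the free CBC solver; MATLAB's intlinprog can express the same model). Variable x[i][j] says "piece i is cut from cable j" and y[j] says "cable j is bought".

import pulp

pieces = [40, 60, 30, 70, 20, 80, 55, 45]   # required lengths in metres (placeholders)
L = 100                                     # length of one purchasable cable
n = len(pieces)
max_cables = n                              # worst case: one cable per piece

prob = pulp.LpProblem("cable_purchase", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (range(n), range(max_cables)), cat="Binary")
y = pulp.LpVariable.dicts("y", range(max_cables), cat="Binary")

prob += pulp.lpSum(y[j] for j in range(max_cables))              # minimise cables bought
for i in range(n):                                               # each piece cut exactly once
    prob += pulp.lpSum(x[i][j] for j in range(max_cables)) == 1
for j in range(max_cables):                                      # pieces on a cable fit in 100 m
    prob += pulp.lpSum(pieces[i] * x[i][j] for i in range(n)) <= L * y[j]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("cables needed:", int(pulp.value(prob.objective)))
for j in range(max_cables):
    if pulp.value(y[j]) > 0.5:
        print([pieces[i] for i in range(n) if pulp.value(x[i][j]) > 0.5])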
I hope that this at least gives you some keywords and starting points to help you find more resources!
I believe this is like the cutting stock problem. There are some very good methods to solve this. Here is an implementation and some background. It is not too difficult to write an Excel front-end for this (see here).
If you google for "cutting stock problem" you will find lots of references.
It would be extremely useful to have some idea of expected performance benchmarks for inserts into a PostgreSQL database. Typically, the answers one gets to this are vague, and in many ways rightly so. For example, answers could range from "every database is different", to "it depends on the number of indexes/columns", to "hardware makes a big difference", to "DB tuning makes a big difference", etc. My goal is to know the general guidelines for insert performance, roughly at the level of an experienced SQL developer's intuition that says "this seems slow, I should try to optimize it".
Let me illustrate: someone could ask, how much does it cost to buy a house? We answer: expensive! And there are many factors that go into the price, such as the size of the house and its location in the country. BUT, to the person asking the question, who might think $20,000 is a lot of money, houses must cost about that much. Saying it's expensive and there are a lot of variables obviously doesn't help them much. It would be MUCH more helpful for someone to say that, in general, the "normal" cost of houses ranges from $100K to $1M, the average middle-class family can afford a house between $200K and $500K, and a normal cost per square foot is $100.
All that to say, I'm looking for ballpark performance benchmarks on inserts for the following factors:
Inserting 1,000, 10,000, or 100,000 rows into an average table of 15 columns
Rough effect of every additional 5 columns added to the table
Rough effect of each index on the table
Effect of special types of indexes
Any other ideas that people have
I'm fine with gut-feel answers on these if you are an experienced PostgreSQL performance tuner.
You cannot get a meaningful figure here for the list of conditions you specified, because you do not even list the types of conditions that would have a profound effect on the speed of the INSERT command:
Hardware capabilities:
CPU speed + number of cores
storage speed
memory speed and size
Cluster architecture, in case the batch is huge and crosses node boundaries
Execution scenario:
text batch, with pre-generated inserts one-by-one
direct stream-based insert
insert via a specific driver, like an ORM
In addition, the insert speed can be:
sustained (consistent or average) speed
single-operation speed, i.e. for a single batch execution
You can always find a combination of such criteria so bad that you would struggle to do 100 inserts a second, and on the other side it is possible to go over 1M inserts per second in a properly set up environment and execution plan.
So you will find the speed of your implementation somewhere in between, but given only the conditions you listed, the speed will be 42 :)
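That said, the quickest way to get a ballpark for your own environment is to measure it. Here is a rough sketch using Python and psycopg2 that compares row-by-row inserts with batched inserts on a throwaway 15-column table; the connection string, row count, and column types are placeholders, and COPY would typically be faster still.

import time
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("dbname=test user=postgres")   # placeholder DSN; adjust for your setup
cur = conn.cursor()
cur.execute("DROP TABLE IF EXISTS insert_bench")
cur.execute("CREATE TABLE insert_bench (id int, "
            + ", ".join(f"c{i} text" for i in range(14)) + ")")
conn.commit()

rows = [(i, *[f"val{i}_{j}" for j in range(14)]) for i in range(10_000)]

start = time.perf_counter()                              # 1) one INSERT per row, one transaction
for r in rows:
    cur.execute("INSERT INTO insert_bench VALUES (" + ", ".join(["%s"] * 15) + ")", r)
conn.commit()
print("row-by-row     :", time.perf_counter() - start)

cur.execute("TRUNCATE insert_bench")
conn.commit()

start = time.perf_counter()                              # 2) many rows per statement
execute_values(cur, "INSERT INTO insert_bench VALUES %s", rows)
conn.commit()
print("execute_values :", time.perf_counter() - start)

cur.close()
conn.close()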
In the meantime, is there a way to tell MATLAB or ParaView, or any other application that uses OpenGL, to do its rendering in double precision? I could use a workaround for my problems, but I would prefer not to :) Thanks!
EDIT:
I will try to be more specific about the problem/issue. First, two images:
The first one is rendered using OpenGL; the second (fine one) is rendered after issuing the "opengl neverselect" command, which switches to another renderer. Since I experience quite similar rendering problems in ParaView as well, I am fairly sure that this is OpenGL-specific and not the "fault" of MATLAB or ParaView. When I shift the values as mentioned in the comment below, I get smoothly rendered images as well. I assume that is because my data range has a huge offset from zero, and the precision in the rendering routine is not high enough, producing serious rounding errors in the rendering calculations.
Thus, I would like to know if there is some way (in MATLAB, ParaView, or the OS settings) to set the rendering precision higher (I read that GPUs/OpenGL usually calculate in single-precision float).
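For what it's worth, the rounding effect described above is easy to reproduce outside of any renderer. A small numeric sketch (Python/NumPy here, but the arithmetic is the same single-precision float a GPU typically uses) shows why shifting the data toward zero helps:

import numpy as np

data = np.array([10_000_000.0, 10_000_000.1, 10_000_000.2])   # large offset, small variations

print(np.diff(data.astype(np.float32)))          # differences collapse to 0: float32 cannot
                                                 # resolve 0.1 on top of 1e7
centred = (data - data.mean()).astype(np.float32)
print(np.diff(centred))                          # ~0.1 steps survive once the offset is removed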
First off, this has nothing to do with OpenGL. The part of MATLAB actually doing the plotting is written in some compiled language, and relies on OpenGL just for displaying stuff to the screen.
The precision used (double/float) is hard-coded into the program. You can't have the OS or anything else force the program to use different data types. In certain cases you might be able to make the relevant changes to the source code of a program and then recompile, but that doesn't sound applicable in your case.
This doesn't mean that there isn't a way to do what you want in MATLAB. In fact, since the program is specifically designed to do numeric computation there almost certainly is a way to specify the precision. You would need to provide more detailed information on your issue (screenshot?) if you want to get further guidance.
Basically I have some hourly and daily data like
Day 1
Hours,Measure
(1,21)
(2,22)
(3,27)
(4,24)
Day 2
Hours,Measure
(1,23)
(2,26)
(3,29)
(4,20)
Now I want to find outliers in the data by considering both the hourly variations and the daily variations, using bivariate analysis on (hour, measure).
So which clustering algorithm is best suited to finding outliers in this scenario?
One piece of 'good' advice (:P) I can give you is that, based on my experience, it is NOT a good idea to treat time like a spatial feature. So beware of solutions that do this. You can probably start by searching the literature on outlier detection for time-series data.
You really should use a different representation for your data.
Why don't you use an actual outlier detection method, if you want to detect outliers?
Other than that, just read through some literature. k-means, for example, is known to have problems with outliers. DBSCAN, on the other hand, is designed to be used on data with "noise" (the N in DBSCAN), which is essentially outliers.
Still, the way you are representing your data will make none of these work very well.
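To make that concrete, here is a minimal sketch of how DBSCAN's noise label can serve as an outlier flag, using scikit-learn and the toy (hour, measure) pairs from the question. The eps and min_samples values are guesses that would need tuning, and, as noted above, treating the hour as an ordinary coordinate is itself questionable.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# (hour, measure) pairs from day 1 and day 2
X = np.array([[1, 21], [2, 22], [3, 27], [4, 24],
              [1, 23], [2, 26], [3, 29], [4, 20]], dtype=float)

X_scaled = StandardScaler().fit_transform(X)        # put both features on a comparable scale
labels = DBSCAN(eps=0.8, min_samples=3).fit_predict(X_scaled)

print("cluster labels:", labels)                    # -1 marks noise points
print("flagged as outliers:", X[labels == -1])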
You should use a time-series-based outlier detection method because of the nature of your data (it has its own seasonality, trend, autocorrelation, etc.). Time-series outliers come in different kinds (AO, IO, etc.), and it is somewhat complicated, but there are packages that make it easy to implement.
Download the latest build of R from http://cran.r-project.org/. Install the packages "forecast" & "TSA".
Use the auto.arima function of the forecast package to derive the best model fit for your data, and pass that model along with your data to the detectAO and detectIO functions of the TSA package. These functions will report any outliers present in the data along with their time indexes.
R is also easy to integrate with other applications, or you can simply run it as a batch job. Hope that helps!
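If you would rather stay in Python, a simpler idea in the same spirit is to fit a time-series model and flag points whose residuals are unusually large. This is only a rough approximation of what detectAO/detectIO do, not the same algorithm; the ARIMA order below is an arbitrary placeholder chosen for this toy data.

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# hourly measurements from the question, days concatenated in time order
measure = np.array([21, 22, 27, 24, 23, 26, 29, 20], dtype=float)

fit = ARIMA(measure, order=(1, 0, 0)).fit()           # placeholder model; pick via AIC in practice
z = (fit.resid - fit.resid.mean()) / fit.resid.std()  # standardised residuals
print("possible outliers at indexes:", np.where(np.abs(z) > 2.0)[0])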