If a Q# function can run classical code that uses bits, will the classical code still run at the speed that quantum computers operate at?

Microsoft states:
A Q# function is a classical subroutine used within a quantum algorithm. It may contain classical code but no quantum operations.
By 'classical code', does that mean 32-bit and 64-bit code and applications? If so, will the classical code run at the extreme speeds that quantum computers operate at?

In Q# there are both functions and operations. Operations can describe both classical and quantum computations; functions, on the other hand, must be deterministic, so they can't describe quantum computations. Only operations will generate instructions for the target quantum devices/simulators. The rest of the application and data is managed by a C# driver, so you can run any other classical program via C# as part of your Q# application. And as Caleb says, quantum computers are not faster in general; they can only provide computational speedups on select types of problems. For more Q# tips check out my post from the Q# Advent Calendar, as well as the Q# docs you already found!

Related

Can CPUs (like Intel/AMD/ARM) do higher math computations beyond addition/subtraction/multiplication/division?

I have been learning about logic circuits and computer architecture, including assembly instruction sets (such as the x86 and ARM instruction sets) and microarchitecture (x86/ARM). I found that both Intel and ARM processors appear to do only addition/subtraction/multiplication/division in hardware, because they only have adder/subtracter/multiplier/divider units.
But do these processors support more advanced math computations like trigonometric functions, exponential functions, power functions, their derivatives/definite integrals, or even matrix computations?
I know these advanced computations can be done in software (like Python's NumPy/SciPy), but can Intel/ARM processors support them in hardware, just like addition/subtraction/multiplication/division?
Generally speaking, you can build hardware structures to help accelerate the calculation of things such as trigonometric functions. In practice, however, it's rarely done, because it's not a good use of hardware resources.
There is a paper from 1983 on how trigonometric functions were implemented on the 8087 floating-point co-processor (Implementation of transcendental functions on a numerics processor). Even there, they rely on a CORDIC implementation, which is a method of calculating trig functions using relatively basic hardware (add/sub/shift/table look-up). You can read more about CORDIC implementations in the following paper: Evaluating Elementary Functions in a Numerical Coprocessor Based on Rational Approximations
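For intuition, here is a minimal Python sketch of CORDIC in rotation mode (not taken from either paper; the iteration count and function name are my own choices). In real hardware the multiplications by 2^-i are just shifts of fixed-point values, so only add/subtract/shift and a small table of angles are needed:

```python
import math

N_ITERS = 32
# Table of rotation angles atan(2^-i); in hardware this is a small ROM.
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITERS)]
# CORDIC gain: the product of the per-iteration scale factors, precomputed.
K = 1.0
for i in range(N_ITERS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_sin_cos(theta):
    """Return (sin(theta), cos(theta)) for |theta| <= pi/2 using CORDIC rotations."""
    x, y, z = K, 0.0, theta
    for i in range(N_ITERS):
        d = 1.0 if z >= 0.0 else -1.0                         # rotate toward the remaining angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i   # these would be shifts in fixed point
        z -= d * ANGLES[i]                                    # subtract the tabulated angle
    return y, x

print(cordic_sin_cos(0.5))  # ~ (0.4794, 0.8776)
```

The same add/sub/shift/table loop, run in a different mode, also yields arctangent, which is part of why CORDIC was attractive for coprocessors of that era.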
On a modern x86 processor, complex instructions like FCOS are implemented using microcode. Intel doesn't like to talk about their microcoded instructions, but there is a paper from AMD that describes this particular use of microcode: The K5 Transcendental Functions
Intel processors do support trigonometric and many other advanced computations
According to "Professional Assembly Language" by Richard Blum
Since the 80486, the Intel IA-32 platform has directly supported floating-point operations.
The FPU (Floating Point Unit) supports many advanced functions besides simple add, subtract, multiply, and divide.
They include:
Absolute value FABS
Change sign FCHS
Cosine FCOS
Partial Tangent FPTAN
etc.

What are the available approaches to interconnecting simulation systems?

I am looking for a distributed simulation algorithm which allows me to couple multiple standalone systems. The systems I am targeting for interconnection use different formalisms, e.g. discrete-time and continuous simulation paradigms. So far, the only algorithms I have found come from the field of parallel discrete event simulation (PDES), such as the classical Chandy/Misra null-message protocol, which has some very undesirable problems. My question is: what other approaches, besides PDES algorithms, are known and can be used for interconnecting simulation systems?
Not an algorithm, but there are two IEEE standards out there that define protocols intended to address your issue: High-Level Architecture (HLA) and Distributed Interactive Simulation (DIS). HLA has a much greater presence in the analytic discrete-event simulation community where I hang out; DIS tends to get more use in training applications. If you'd like to check out some applications papers, go to the Winter Simulation Conference / INFORMS-sponsored paper archive site and search for HLA; you'll get 448 hits.
Be forewarned, trying to make this stuff work in general requires some pretty weird plumbing, lots of kludges, and can be very fragile.

Dedicated distributed system to handle matlab jobs

I'm a software engineer and am currently looking to set up a distributed system at my laboratory so that I can process some MATLAB jobs across it. I have looked into MATLAB MPI, but I want to know if there is some way to set up such a system here without any fee.
I have spent a lot of time looking at that very issue, and the short answer is: nope, not possible.
There are two long answers. First, if you're constrained to using Matlab, then all roads lead back to MathWorks. One possibility is that you could compile your code; you'd need to buy the compiler from MathWorks, though, and then you could run the compiled code on whatever grid infrastructure you wish, such as Hadoop.
Second, for this reason, I have found it much better to simply port code to another language, usually one in open source. For the work I tend to do, Octave is a poor replacement for Matlab. Instead, R and Python are great for most of the same functionality. Personally, I lean a lot more toward R than Python, but that's because R is a better fit for these applications (i.e. they're very statistical in nature).
I've ported a lot of Matlab code to R and it's not too bad. Porting to Python would be easier in general, though, and there is a very large Matlab refugee community that has switched to Python.
Once you're in either Python or R, there are a lot of options for MPI, multicore tools, distributed systems, GPU tools, and more. In fact, you may find the migration easier by writing some of the distributed functions in Python or R, loading up an easy to use grid system, and then have Matlab submit the job to the server. Your local code could be the same, but then you can work on porting only the gridded parts, which you'd probably have to devote some time to write in Matlab, anyway.
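To illustrate the "port only the gridded parts" idea, here is a minimal Python sketch that farms independent jobs out to worker processes using only the standard library; the job function and the parameter sets are hypothetical placeholders, and a real setup would swap multiprocessing for whatever MPI/grid layer you end up using.

```python
from multiprocessing import Pool

def analyze_one(params):
    """Hypothetical per-job analysis: this is the part you would port from Matlab."""
    return sum(p * p for p in params)  # placeholder numerical work

if __name__ == "__main__":
    # e.g. parameter sets exported from Matlab, one entry per independent job
    jobs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    with Pool() as pool:                       # one worker process per CPU core by default
        results = pool.map(analyze_one, jobs)  # scatter the jobs, gather the results
    print(results)                             # [14, 77, 194]
```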
I wouldn't say it's completely impossible; you can use TCP/IP sockets to build a client/server application (you will find many MEX implementations of BSD sockets on the File Exchange).
The architecture is simple: your main MATLAB client script sends jobs (code along with any needed data, serialized) to nodes, which evaluate them and send back results when done. These nodes would be distributed MATLAB instances running the server part, which listens for connections and runs anything it receives through the EVAL function.
Obviously it is up to you to write code that can be divided into separable tasks.
This is not as sophisticated as what is offered by the Distributed Computing Toolbox, but basically does the same thing...
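The MATLAB/MEX version described above is easy to mimic in other languages. Below is a hedged Python sketch of the same pattern, just to show how little plumbing the idea needs; the port number is arbitrary, and using eval() on received text (like EVAL in the MATLAB version) means you must only run this on a trusted network.

```python
import socket

HOST, PORT = "0.0.0.0", 5555  # arbitrary port, chosen for illustration

def serve_forever():
    """Worker node: accept a job, evaluate it, send the result back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen()
        while True:
            conn, _addr = srv.accept()
            with conn:
                expr = conn.recv(65536).decode()     # serialized job from the client
                result = eval(expr)                  # the EVAL step (trusted input only!)
                conn.sendall(repr(result).encode())  # ship the result back

if __name__ == "__main__":
    serve_forever()
```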

Matlab and GPU/CUDA programming

I need to run several independent analyses on the same data set.
Specifically, I need to run batches of 100 GLM (generalized linear model) analyses and was thinking of taking advantage of my video card (GTX 580).
As I have access to Matlab and the Parallel Computing Toolbox (and I'm not good with C++), I decided to give it a try.
I understand that a single GLM is not ideal for parallel computing, but as I need to run 100-200 in parallel, I thought that using parfor could be a solution.
My problem is that it is not clear to me which approach I should follow. I wrote a gpuArray version of the MATLAB function glmfit, but using parfor doesn't give any advantage over a standard for loop.
Has this anything to do with the matlabpool setting? It is not even clear to me how to set this to "see" the GPU card. By default, it is set to the number of cores in the CPU (4 in my case), if I'm not wrong.
Am I completely wrong on the approach?
Any suggestion would be highly appreciated.
Edit
Thanks. I'm aware of GPUmat and Jacket, and I could start writing in C without too much effort, but I'm testing the GPU computing possibilities for a department where everybody uses Matlab or R. The final goal would be a cluster based on C2050 cards and the Matlab Distributed Computing Server (or at least this was the first project).
Reading the ads from MathWorks, I was under the impression that parallel computing was possible even without C skills. It is impossible to ask the researchers in my department to learn C, so I'm guessing that GPUmat and Jacket are the better solutions, even if the limitations are quite big and support for several commonly used routines like glm is non-existent.
How can they be interfaced with a cluster? Do they work with some job distribution system?
I would recommend you try either GPUmat (free) or AccelerEyes Jacket (paid, but with a free trial) rather than the Parallel Computing Toolbox. The toolbox doesn't have as much functionality.
To get the most performance, you may want to learn some C (no need for C++) and code in raw CUDA yourself. Many of these high level tools may not be smart enough about how they manage memory transfers (you could lose all your computational benefits from needlessly shuffling data across the PCI-E bus).
Parfor will help you for utilizing multiple GPUs, but not a single GPU. The thing is that a single GPU can do only one thing at a time, so parfor on a single GPU or for on a single GPU will achieve the exact same effect (as you are seeing).
Jacket tends to be more efficient, as it can combine multiple operations and run them together, and it has more features; but most departments already have the Parallel Computing Toolbox and not Jacket, so that can be an issue. You can try the demo to check.
I have no experience with GPUmat.
The Parallel Computing Toolbox is getting better; what you need is some large matrix operations. GPUs are good at doing the same thing many times over, so you need to either combine your code somehow into one operation or make each operation big enough. We are talking about a need for ~10,000 things in parallel at least, although it's not a set of 1e4 small matrices but rather a large matrix with at least 1e4 elements.
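To make the batching point concrete, here is a rough NumPy sketch (CPU-side; with the toolbox the analogous move is to build one big gpuArray instead of looping). It uses an ordinary least-squares solve as a stand-in for the GLM fits, and the sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_obs, n_pred = 100, 500, 8          # 100 independent regression problems
X = rng.standard_normal((n_models, n_obs, n_pred))
y = rng.standard_normal((n_models, n_obs, 1))

# Loop pattern: one small solve per model (what a plain for/parfor loop does).
betas_loop = [np.linalg.lstsq(X[i], y[i], rcond=None)[0] for i in range(n_models)]

# Batched pattern: one stacked operation covering all 100 models at once,
# which is the kind of "large matrix" work that keeps a GPU busy.
XtX = X.transpose(0, 2, 1) @ X                 # shape (100, 8, 8)
Xty = X.transpose(0, 2, 1) @ y                 # shape (100, 8, 1)
betas_batched = np.linalg.solve(XtX, Xty)      # all 100 solves in one call
```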
I do find that with the parallel computing toolbox you still need quite a bit of inline CUDA code to be effective (it's still pretty limited). It does better allow you to inline kernels and transform matlab code into kernels though, something that

Software simulation of a quantum computer

While we are waiting for our quantum computers, is it possible to write a software simulation of one? I suspect the answer is no, but hope the reasons why not will throw some light on the mystery.
Implementing it isn't that hard. The problem is that the computational and memory complexity is exponential in the number of quantum bits you want to simulate.
Basically a quantum computer operates on all possible n-bit states at once. And those grow like 2^n.
The size of an operator grows even faster since it's a matrix, so it grows like (2^n)^2 = 2^(2*n) = 4^n.
So I expect a good computer to be able to simulate a quantum computer up to about 20 qubits, but it will be rather slow.
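A quick back-of-the-envelope Python sketch of that scaling (the helper function and the 16-byte complex amplitudes are my assumptions):

```python
def simulator_cost(n_qubits, bytes_per_amplitude=16):   # complex128
    state_len = 2 ** n_qubits            # amplitudes in the state vector
    op_entries = state_len ** 2          # entries in a full n-qubit unitary, 4^n
    return state_len * bytes_per_amplitude, op_entries * bytes_per_amplitude

for n in (16, 20, 24):
    state_bytes, op_bytes = simulator_cost(n)
    print(f"{n} qubits: state ~{state_bytes / 2**20:.0f} MiB, "
          f"dense operator ~{op_bytes / 2**30:.0f} GiB")
```

Real simulators avoid building the dense 4^n operator and apply gates directly to the state vector, but the 2^n state itself is already the hard limit.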
They do exist. Here's a browser-based one. Here's one written in C++. Here's one written in Java. But, as stated by CodesInChaos, a quantum computer operates on all probability amplitudes at once. So imagine a 3-qubit quantum register; a typical state for it to be in looks like this:
a1|000> + a2|001> + a3|010> + a4|011> + a5|100> + a6|101> + a7|110> + a8|111>
It's a superposition of all the possible combinations. What's worse is that those probability amplitudes are complex numbers. So an n-qubit register requires 2^n complex amplitudes, i.e. 2^(n+1) real numbers; for a 32-qubit register, that's 2^33 = 8,589,934,592 real numbers.
And as CodesInChaos said, the unitary matrices used to transform those states have (2^n)^2 = 4^n entries, and applying one is a matrix-vector (dot) product... They're computationally costly, to say the least.
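As a concrete sketch of what that dot product looks like, here is a minimal NumPy example that builds the 8-amplitude state of a 3-qubit register and applies a Hadamard to the first qubit via a Kronecker product (the gate choice and qubit ordering are mine, for illustration only):

```python
import numpy as np

# 3-qubit register initialised to |000>: 2^3 = 8 complex amplitudes.
state = np.zeros(8, dtype=complex)
state[0] = 1.0

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # Hadamard gate
I = np.eye(2, dtype=complex)

# Full 8x8 unitary that acts on qubit 0 only: H (x) I (x) I.
U = np.kron(np.kron(H, I), I)

state = U @ state             # applying the gate is a matrix-vector product
print(np.round(state, 3))     # (|000> + |100>) / sqrt(2)
```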
My answer is yes:
You can simulate the behaviour of a quantum machine by simulating the algorithm it runs.
The D-Wave quantum machine uses a technique called quantum annealing. This algorithm can be compared to the classical simulated annealing algorithm (a minimal sketch follows the references below).
References:
1. Quantum annealing
2. Simulated annealing
3. Optimization by simulated annealing: Quantitative studies
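For comparison, here is a minimal Python sketch of classical simulated annealing; the objective function, cooling schedule, and neighbour move are arbitrary choices for illustration, and quantum annealing plays the analogous role in D-Wave hardware rather than being implemented like this.

```python
import math
import random

def simulated_annealing(objective, x0, steps=10_000, t_start=1.0, t_end=1e-3):
    """Minimise `objective`, occasionally accepting worse moves while 'hot'."""
    x, fx = x0, objective(x0)
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / steps)   # geometric cooling schedule
        candidate = x + random.gauss(0.0, 0.1)           # small random neighbour move
        fc = objective(candidate)
        if fc < fx or random.random() < math.exp((fx - fc) / t):
            x, fx = candidate, fc                        # accept the move
    return x, fx

# Toy usage: minimise a bumpy one-dimensional function.
print(simulated_annealing(lambda x: x * x + math.sin(5 * x), x0=3.0))
```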
As Wikipedia states:
A classical computer could in principle (with exponential resources) simulate a quantum algorithm, as quantum computation does not violate the Church–Turing thesis.
There is a very big list of languages, frameworks and simulators.
Some simulate the quantum equations at a low level, others just the gates.
Microsoft Quantum Development Kit (Q#)
Microsoft LIQUi|>
IBM Quantum Experience
Rigetti Forest
ProjectQ
QuTiP
OpenFermion
Qbsolv
ScaffCC
Quantum Computing Playground (Google)
Raytheon BBN
Quirk
It would be great to know your opinions on their capabilities and ease of use.
https://quantumcomputingreport.com/resources/tools/
https://github.com/topics/quantum-computing?o=desc&s=stars
Years ago I attended a talk at a Perl conference where Damian Conway (I believe) was speculating on some of this. A bit later there was a Perl module made available that did some of this stuff. Search CPAN for Quantum::Superpositions.
Yet another reason why classical simulation of quantum computing is hard: you need almost perfect - i.e. as perfect as possible - random number generators to simulate measurement.
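Concretely, simulating a measurement means sampling a basis state with probability |amplitude|^2, so any bias in the generator shows up directly in the statistics. A small Python sketch (NumPy's default generator is just an example choice):

```python
import numpy as np

def measure(state, rng=np.random.default_rng()):
    """Sample one measurement outcome from a normalised state vector (Born rule)."""
    probs = np.abs(state) ** 2              # P(i) = |a_i|^2
    return rng.choice(len(state), p=probs)  # a biased RNG gives biased outcomes

# Example: an equal superposition of |0> and |1> should give roughly 50/50 results.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
print([int(measure(plus)) for _ in range(10)])
```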
Quipper is a full-blown simulation EDSL for quantum computing, implemented in Haskell.
I have experience simulating the behaviour of several QC algorithms, such as the Deutsch, Deutsch–Jozsa, Simon's, and Shor's algorithms, and it's very straightforward.
Another reason why classical simulation of quantum computation is hard: to keep track, you may want to know after each action of an n-qubit gate (n > 1) whether the outgoing qubits are entangled or not. This must be calculated classically but is known to be NP-hard.
See here: https://stackoverflow.com/a/23327816/363429