My OS is Windows 10 x86_64.
I had checked that my CPU supports arm64, so I assumed it was a 64-bit CPU.
But sometimes I get error messages about the OS bitness.
So I ran a CPU bitness test in C:
printf("%zu\n", sizeof(int *));
I expected the result to be 8, but the result was 4.
1. Is my CPU 32-bit or 64-bit?
2. If my CPU is 32-bit, can it use more than 4 GB of memory? My CPU supports arm64.
Please help, I'm very confused.
Your CPU almost certainly can't be both arm64 and run x86_64 Windows, because the Intel and ARM instruction sets are not the same. Perhaps you meant AMD64? If you search the web for your CPU model, you will probably be able to find out how many bits it is.
Further, keep in mind that the C standard only requires that ints be at least 16 bits, not that they match the machine's native word size; likewise, the size of a pointer reflects the target the compiler builds for, not the CPU itself. I suspect that the compiler you were testing with might not have been aware of the 64-bit capabilities of your CPU, and compiled your code as a 32-bit program.
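To see what the compiler actually targeted (rather than what the CPU can do), a quick check like the following can help. This is a minimal sketch, assuming MSVC, GCC, or Clang for the predefined macros:

    #include <stdio.h>

    int main(void)
    {
        /* sizeof(void *) reflects the target the compiler builds for,
           not the width of the CPU itself. */
        printf("pointer size: %zu bytes\n", sizeof(void *));

    #if defined(_WIN64) || defined(__x86_64__) || defined(__aarch64__)
        puts("compiled as a 64-bit program");
    #elif defined(_WIN32) || defined(__i386__) || defined(__arm__)
        puts("compiled as a 32-bit program");
    #else
        puts("unknown target");
    #endif
        return 0;
    }

If this prints 4 and "32-bit program" on a 64-bit CPU, the fix is to select a 64-bit build target, not to change the hardware.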
As for memory support, as far as I know, both the motherboard and the CPU model affect the actual amount of memory your system will support.
Most likely your CPU supports amd64.
The size of the C standard types depends on the data model.
The size of a pointer depends on the execution mode (long-mode vs compatibility-mode) and can be 32-bit even on 64-bit OSes.
Even if your CPU were 32-bit you could use more than 4 GiB of memory (via PAE), but since that premise is almost surely false, the easiest solution is simply to recompile for a 64-bit environment.
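To make the data-model point concrete, a small check like this (a minimal sketch; the "typical" sizes in the comment are the usual conventions, not guarantees) prints the sizes that differ between the common models:

    #include <stdio.h>

    int main(void)
    {
        /* Typical results:
           ILP32 (32-bit build):        int=4  long=4  long long=8  void*=4
           LLP64 (64-bit Windows):      int=4  long=4  long long=8  void*=8
           LP64  (64-bit Linux/macOS):  int=4  long=8  long long=8  void*=8 */
        printf("int       : %zu\n", sizeof(int));
        printf("long      : %zu\n", sizeof(long));
        printf("long long : %zu\n", sizeof(long long));
        printf("void *    : %zu\n", sizeof(void *));
        return 0;
    }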
I have Matlab R2017a installed on a server running MS Windows Server 2008 R2 Enterprise v 6.1 (SP1) and the benchmark results are awful:
bench
3.6424 0.5267 0.2114 5.0303 1.5557 3.4980
[columns = LU, FFT, ODE, Sparse, 2-D, 3-D]
Note that it is particularly slow for LU and Sparse.
The server has this hardware:
CPU: Intel Xeon E7320 @ 2.13GHz (4 physical processors, 16 logical)
128 GB RAM
64-bit operating system
Matlab Version: 9.2.0.556344 (R2017a)
Java version: Java 1.7.0_60-b19 with Oracle corporation Java Hotspot(TM) 64-Bit Server VM mixed mode.
There are also other users that can be online on the server, but I can see that they are not stressing the system, and I have verified that these running times are stable (I have tested multiple times over the past week).
My question is this: is there any other library or something that Matlab relies on that could be "wrong"? I have another similar setup on a similar but slightly newer server that gets bench results much closer to what I'd expect based on the specs. I'm wondering if it's using a "wrong" linear algebra module or something.
Alternative explanation: I know that Matlab ran extremely slowly on a particular AMD Opteron CPU (I happen to have also worked in Matlab on such a server; link: https://se.mathworks.com/matlabcentral/answers/33939-poor-matlab-performance-on-amd-based-computer). Is it possible that it's a similar issue with the Intel Xeon E7320?
Edit: Xeon E7320 as suggested by Peter.
Update: I'm not sure whether Matlab's bench takes advantage of just a single CPU core, multiple CPU cores, or also a GPU (OpenCL / CUDA). If it can use GPU acceleration, that would make a huge difference. (Especially if you don't have one at all in your "slow" server).
As discussed in comments, a dual-core Sandybridge laptop is 10x faster on some of the benchmark components, but only 2x or 1.5x faster on some others. (But I'm not sure the version of Matlab was controlled for; the thread you linked mentions that different versions of Matlab do different amounts of work in their bench.)
The rest of this answer was written with the assumption that your test takes advantage of all your CPU cores (otherwise there's no point using an old many-core machine). But without considering GPU.
I think your CPU is actually a 65nm Core2-based Xeon E7320, not "E3720" (no google hits). What are you comparing against? Your Tigerton CPUs are ancient (about 10 years old), of course they're slow. (Tigerton is the same microarchitecture as Conroe/Merom, first-gen Core2).
You have very low memory bandwidth and cache speeds compared to a modern CPU, as well as only having SSSE3, not AVX or FMA. These CPUs don't have a memory controller built-in, so all 4 sockets are sharing the memory controller hub (MCH) via separate 1066MHz Front-Side Buses. Memory bandwidth doesn't scale with number of sockets, and is not great. Memory bandwidth has grown faster than per-core performance over the years. According to that link, a quad-socket 16-core Tigerton (like you have) is barely better than a quad-socket 8-core Barcelona Opteron. It's not so bad for CPU-bound workloads, but memory-bound workloads will do quite badly.
As well as the low clock speed, it's significantly slower clock-for-clock than a modern CPU. IDK what those times are supposed to be like (I'm here for the [performance] tag, not Matlab), but it's totally plausible that a 3GHz quad-core i5 or i7 Haswell / Skylake desktop or high-power laptop would be faster than your 16-core dinosaur machine.
(Actually, does that benchmark even scale with the number of cores? If not, the single-threaded memory bandwidth is probably really not good.)
A very big jump in performance happened with Sandybridge (for all code, including non-SIMD workloads), and there were several other smaller jumps in between your machine and modern CPUs as well. SnB can run 2 load instructions per clock, vs. 1 for previous Intel (like your Core2).
For FP-specific stuff that optimized libraries will take advantage of, x86 ISA extensions have been important: AVX doubles the SIMD vector width, doubling FLOPS (on Intel CPUs with full-width execution units). FMA does a mul+add in one instruction, potentially doubling FLOPS again. Microarchitectural improvements matter too: Haswell has two FMA units, vs. earlier CPUs having one FP adder and one FP multiplier, again potentially doubling FLOPS. Only workloads with contiguous memory access and a high ratio of computation to memory traffic will fully take advantage of this, e.g. a dense matmul, but in that case one Haswell core is doing as much work as 8 Tigerton cores.
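As a rough back-of-the-envelope comparison (the per-cycle figures below are the commonly cited theoretical double-precision peaks, not measurements, and the Haswell clock speed is an assumed example):

    #include <stdio.h>

    int main(void)
    {
        /* Assumed theoretical peak DP FLOPS per core:
           Tigerton (Core2, SSE2): 1 add + 1 mul per cycle, 2 doubles wide = 4 FLOP/cycle
           Haswell  (AVX2 + FMA):  2 FMAs per cycle, 4 doubles wide, 2 FLOPs each = 16 FLOP/cycle */
        double tigerton = 2.13e9 * 4.0;   /* ~ 8.5 GFLOPS/core at 2.13 GHz */
        double haswell  = 3.00e9 * 16.0;  /* ~48   GFLOPS/core at 3.0 GHz  */
        printf("Tigerton peak: %.1f GFLOPS/core\n", tigerton / 1e9);
        printf("Haswell  peak: %.1f GFLOPS/core\n", haswell  / 1e9);
        printf("per-core ratio: ~%.1fx\n", haswell / tigerton);
        return 0;
    }

That ~5-6x per-core gap is before accounting for memory bandwidth, which favors the newer machine even more.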
I assume Matlab can take advantage of AVX + FMA if the CPU has it.
And BTW, it's not just 16 "logical" processors. You don't have hyperthreading, so you have a 4-socket system with four quad-core CPUs, for 16 physical cores. (And these "quad-core" chips are actually two separate dual-core dies in the same package, according to Wikipedia.)
So as far as the number of physical chips that need to communicate with each other, there are 8 (two in each package). That's a lot of hops to reach other CPUs, so synchronization between cores is more expensive than for a single-die quad-core. (And probably worse even than a modern dual-socket Xeon box with a pair of 18-core CPUs or something).
Note that high latency to memory can also hurt memory bandwidth: see the "latency bound platforms" part of this answer about optimizing memcpy/memset and how store bandwidth works in Intel CPUs.
As I understand it, Intel 64-bit CPUs offer the ability to address a larger address space (>4GB), which is useful for a large simulation. Interesting architectural hardware advantages:
16 general purpose registers instead of 8
Additional SSE registers
A no execute (NX) bit to prevent buffer overrun attacks
BACKGROUND
Historically, the simulations have been performed on 32-bit IA (Intel Architecture) systems. I am wondering where (if anywhere) there is an opportunity to reduce simulation times with 64-bit CPUs. I expect that the software would need to be recompiled to take advantage of 64-bit capability. This type of simulation would not benefit from a MAC (multiply-accumulate), nor does it use floating-point calculations.
QUESTION
That being said, is there an Intel 64-bit instruction or capability that offers an appreciable advantage over the 32-bit instruction set and would accelerate simulation (computationally intensive and lengthy 32-bit algorithms)?
If you have experience implementing simulations and have transitioned from 32-bit to 64-bit CPUs, please state this in your response (relevant experience is important). I look forward to insightful responses from the community.
The most immediate computational benefit regarding CPU instructions I can think of would be AVX, although this is only loosely related to x86_64; it is more of a CPU-generation issue.
In our company, we developed multiple, highly-complex discrete event simulations, simulating aircraft (including electrics, hydraulics, avionics software and everything related). They are all built with or ported to x86_64. The reasons are mostly due to memory addressing, allowing for larger caches and wider choice of algorithms (e.g. data-centric design, concurrency), graphics content also tends to be huge nowadays. However, optimizations regarding x86_64 instructions themselves, such as AVX, are left to compilers. I never saw code written in assembler or using compiler intrinsics to actually refer to specific x86_64 instructions explicitly.
To summarize, based on my experience, x86_64 CPUs allow for certain optimizations, often sacrificing memory consumption in favor of CPU processing:
Wider choice of algorithms, especially regarding concurrency, where data may need to be laid out in a way favoring parallel processing at the cost of occupied memory
Intermediate results or other processing output may be cached more easily in memory to avoid recomputation or to optimize for temporal or state-related coherence
AVX instructions may help compilers to vectorize more code than with MMX/SSE
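As a hypothetical illustration of that last point (the function name and flags are just examples), a plain loop like the one below is the kind of code a compiler can vectorize automatically when targeting x86_64, where SSE2 is guaranteed and AVX can be enabled with a flag, without any intrinsics or assembly:

    #include <stddef.h>

    /* Scale-and-accumulate over two arrays. With -O2/-O3 (GCC/Clang) or /O2 (MSVC)
       targeting x86_64, the compiler may emit SSE2 for this loop automatically,
       or AVX if built with -mavx (GCC/Clang) or /arch:AVX (MSVC). */
    void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }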
Does 32-bit mean the RAM size should be 4 GB? Or can a computer with, say, 32 GB of RAM also be 32-bit, provided the address space does not exceed 32 bits?
When we say 32-bit Windows or a 64-bit OS, which part of the OS exactly differs between the two? I mean, does some part of the kernel differ? If yes, then which part?
NOTE: this question is not a duplicate. Please don't vote to close.
No, 32-bit does not necessarily refer to the size of the address bus. If the address bus is 32 bits wide, then certainly the maximum RAM in the system is 2^32 bytes, or 4 GB. There have been several examples of 32-bit machines that could exceed 4 GB of RAM, however, by using Physical Address Extension (PAE), which was introduced in the mid-1990s.
Another example where this comes into play is the first IBM PC. It used a 16-bit microprocessor known as the 8088. The 8088 had a 20-bit address bus and as such could address 2^20 bytes (1 MB) of RAM.
When we speak of a microprocessor having a certain number of 'bits', such as a 16-bit microprocessor or a 32-bit microprocessor, we are primarily referring to the basic data unit that the processor can handle at a time. This is determined by the size of the processor registers, which are the areas of the processor used for holding data for calculations and decisions.
Because there is a fundamental difference in how machine code is used to grab and process data in a 32-bit vs. a 64-bit system, all code must be compiled specifically for the machine you want it to run on. This is why there are two versions of many x86 operating systems: often one for 32-bit and one for 64-bit x86. x86 microprocessors have a legacy of backwards compatibility and are therefore able to run in 16-, 32-, or 64-bit modes. This means that you can run 32-bit Windows on a 64-bit processor. If this backwards compatibility hadn't been built in, however, this would not be possible.
So, as far as which part of the kernel differs, the answer is all of it. The same is true for desktop applications that are compiled for 64-bit machines: if they ship two versions, the generated code is different throughout, as the compiler targets one or the other.
The only difference I know of is that the registers of 64-bit and 32-bit processors are 64 and 32 bits wide, respectively. Also, addresses are 64 bits on 64-bit processors. Are there any other differences between the two?
x86_64 has more registers than x86, so more work can be done on the CPU rather than constantly fetching bits from RAM. Also, x86_64 guarantees that the CPU supports at least SSE2, so the compiler knows it can optimize for that.
Those are the key differences, but those differences have many effects - for instance, since addresses are larger, the amount of memory you can effectively access is greater - 32-bit OSes are traditionally limited to around 4GB of memory.
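Regarding the SSE2 guarantee above: code built for x86_64 can use SSE2 unconditionally, whereas a 32-bit x86 build has to detect or guard it. A minimal sketch (the function name is just an example):

    #include <emmintrin.h>  /* SSE2 intrinsics; always available when targeting x86_64 */

    /* Add four pairs of doubles, two at a time, using 128-bit SSE2 registers. */
    void add4(double *dst, const double *a, const double *b)
    {
        __m128d lo = _mm_add_pd(_mm_loadu_pd(a),     _mm_loadu_pd(b));
        __m128d hi = _mm_add_pd(_mm_loadu_pd(a + 2), _mm_loadu_pd(b + 2));
        _mm_storeu_pd(dst,     lo);
        _mm_storeu_pd(dst + 2, hi);
    }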
In a 32-bit machine the maximum size of addressable RAM is 4 GB:
2^32 = 4,294,967,296 bytes, which equals 4 GB.
But in the case of a 64-bit machine this is:
2^64 = 18,446,744,073,709,551,616 bytes, which equals 17,179,869,184 GB.
Physical Address Extension (PAE) is a feature to allow x86 processors to access a physical address space larger than 4 GB. This can go up to 64 GB. To use PAE, the OS must support this feature. All major OSes allow the use of PAE, including Windows.
Hence, memory access can't really be held as grounds for distinction between 32-bit & 64-bit OSes.
On the other hand, almost all the processors coming now into the market are 64-bit capable, so it really depends on your OS, how much memory access it allows.
The main difference between 32-bit processors and 64-bit processors is the speed at which they operate. 64-bit processors can come in dual-core, quad-core, and six-core versions for home computing (with eight-core versions coming soon). Multiple cores allow for increased processing power and faster computer operation. Software programs that require many calculations to function operate faster on the multi-core 64-bit processors, for the most part. It is important to note that 64-bit computers can still use 32-bit based software programs, even when the Windows operating system is a 64-bit version.
Another big difference between 32-bit processors and 64-bit processors is the maximum amount of memory (RAM) that is supported. 32-bit computers support a maximum of 3-4 GB of memory, whereas a 64-bit computer can support memory amounts over 4 GB. This is important for software programs that are used for graphical design, engineering design or video editing, where many calculations are performed to render images, drawings, and video footage. One thing to note is that 3D graphics programs and games do not benefit much, if at all, from switching to a 64-bit computer, unless the program is a 64-bit program.
A 32-bit processor is adequate for any program written for a 32-bit processor. In the case of computer games, you'll get a lot more performance by upgrading the video card instead of getting a 64-bit processor.
In the end, 64-bit processors are becoming more and more commonplace in home computers. Most manufacturers build computers with 64-bit processors due to cheaper prices and because more users are now using 64-bit operating systems and programs. Computer parts retailers are offering fewer and fewer 32-bit processors and soon may not offer any at all.
Extract from: Here.
To speed up my Matlab programs, I got Windows 7 (64-bit) and 64-bit Matlab and installed them on a partition of the hard disk. Unfortunately, I was shocked to see that the execution time of my program is longer with 64-bit Matlab. I do not know what the problem is, given that I have a Core 2 Quad processor and 3 GB of RAM.
In general, 64-bit does not make code faster. It just lets you access more memory. Your code will only speed up if it was memory constrained in a 32-bit process. In Matlab, this would usually cause Out Of Memory errors, not slowdowns. And since you only have 3 GB, you probably weren't hitting the 32-bit limit of 4 GB. So you probably shouldn't expect a speedup. A slowdown is surprising, though.
Are you using object-oriented Matlab, especially the old (pre-MCOS) style? There is a known bug in 64-bit Matlab on Windows that increases the overhead of method dispatch. OO code will run slower in 64-bit Matlab than 32-bit Matlab, with the slowdown increasing with the density of method calls. In my codebase (heavily OO), it's about a 2x slowdown. That's about the magnitude you're seeing.
See Is MATLAB OOP slow or am I doing something wrong?. (It's discussed tangentially there.)
You can still run 32-bit Matlab on 64-bit Windows. (Though it's not officially supported.) This arrangement does not suffer from the method-dispatch slowdown, plus it gets 4 GB of virtual memory instead of the 2 GB it would get under a 32-bit OS. (Probably only useful if you have >= 4 GB of RAM.) If the 32-bit version does run faster on the exact same machine, you should report it as a bug to MathWorks; the more users that mention it, the more likely it is to get fixed.
Matlab has a built-in profiler, a tool that tells you how many times each function is called and how much time it takes to execute. You should use the profiler to find out where the bottleneck is, i.e. which parts of your program take the most time.
If you do this on both the 32-bit and the 64-bit platforms, you may find out why the 64-bit version is slower.