Dependency of a C program on CPU and OS - operating-system

Let's think about a simple C program compiled in Windows.
I can compile the program on an Intel CPU machine and run it on an AMD CPU one (same operating system). So does it mean that the instruction set of the CPU's are the same?
Why doesn't the same program run on a machine with different OS and the same CPU?

Intel and AMD line of processors in general have a big overlap in the instruction set that they implement (e.g. sometimes one the other invent some new things and there is a gap until the other company catches up) - that is why you can run programs on both architectures. The same is the reason you cannot run it on other CPU architectures - they don't have the same instruction set for starters, but there are many things that are different.
Operating systems also have their differences. For example, when you compile a program under Windows, you generally get an .exe file. That .exe has a format that only Windows understands and is very different from the format used by Linux for example.
Also, the support that OS gives is completely different - Windows has different kernel functions that you can call compared to e.g. OpenBSD. Even on more abstract levels, it's incompatible. E.g. Windows uses drive letters such as C:\, D:\ and so on to mark drives, while e.g. under Linux it's one big filesystem where you mount different partitions, e.g. under /media or so.
There are different attempts, such as Wine and Cygwin, to execute programs from one platform on another. Using Wine, you can execute Windows executables on Linux directly, as it tries to emulate what Windows provides (not everything works, though). Cygwin is a different product - you can run Windows programs that work similarly to GNU programs on Linux, but they need to be specially compiled - just giving you a hint that it's just two worlds.
That is why Java and .NET (with Mono support on Linux) try to bring these two together. When you make a Java application, you should be able to run it on Linux with more or less same code - some things might not be the same, but majority is.

They're the same, or at least your program is using only a common subset.
For your second question, a few common reasons include:
different OSes require different formats of executables
different OSes will typically have different functions for the program to use
different OSes use different ways of invoking what they provide

1) Intel CPUs and AMD CPUs are intentionally produced this way. You can not run
a program compiled for, say, SPARC CPU on, say, an ARM CPU.
2) In theory it can. Say, Linux has this Wine thing to emulate Windows.
Many Windows programs run perfectly on Linux under Wine.

Related

Machinecode and hardware

first of all hello to all hope you are good and safe.
well i have some questions about machine code and hardware and operating system.
1- i was search about how is pure machine code and i find somethings in here and net but not enough to answer my questions since im new to low level programming language. so how to write a pure machine code like open just my computer with 0,1 are machine code have any file extensions like assembly and .exe i wana write code just directly get area in ram and talk with processor and do what i writed for example open my computer or open a text file for example. so i wana know how to do it are pure machine code have a file extension like .exe or .asm
2- i know each cpu have it owen machine language somethigns is different on them it could not be a way to all cpu's undrestand our machine code. also i wana know for example we have 2 cpu both of them are x64 or x32 but 1 of them are windows other is linux are machine code of x64 windows will work also on x64 cpu linux?
thank you for give your time to me and read.
for now gathering information
An operating system provides the capability to run programs.  So, one program, like the desktop or command line shell, can ask the operating system to run another program.
When that happens, the operating system creates an environment to run the program called a process, and then loads a program file from disc into the process, and directs the CPU to begin executing that program file starting at its beginning.
The operating system has a loader, whose job is to load the disc-based program file into memory of the process.
The loader knows about certain executable file formats.  Different operating system have different loaders and most likely understand different executable formats.
Machine code is contained in these program files, stored on disc using those file formats.  There are other ways to load machine code into memory, though a program file stored on disc loaded by the loader is the most common approach.
Asm, .asm, is a text file, human readable, for storing program code in assembly language.  To use it, such text file is given as input to a build system, which converts that human readable program code into a program file containing equivalent machine code, for later loading into a process by the operating system.
Not only do different operating systems support different file formats for program files, they also support different ways to interact with the operating system, which goes to their programming model that is described by an Application Binary Interface aka ABI.  All programs need to interact with the operating system for basic services like input, output, mouse, keyboard, etc..  Because ABIs differ between operating systems, the machine code in a program written for one operating system won't necessarily run on a different operating system, even if the processor is exactly the same.
Most disc-based file formats for executable program files contain indicators telling what processor the program will run on, so the same operating system on different processors requires different machine code, and hence usually different executable program files.  (Some file formats support "fat" binaries meaning that the machine code for several different processors is in one program file.)
Operating systems also have features that allow execution of new machine code within an existing process.  That machine code can be generated on the fly as with JTT compilers, or loaded more informally by an application program rather than the operating system loader.  Further, most operating system loaders support dynamically loading additional program file content from executable program files.
So, there's lots of ways to get machine code into the memory of a process for execution — support for machine code is one of the fundamental features of operating systems.
Let's also note that no real program is pure machine code — programs use machine code & data together, so all executable file formats store both machine code and data (and metadata).

How to write a program that works on different OS?

The question I'm asking is a bit unclear, I think it's a bit hard to explain it with only one line. Here's my situation: I have a Bitbucket repertory which is cloned in both an Linux environment and Windows environment. The problem I have is:
1- I need to read and write from files and the paths to the different locations must be changed every time I commit and push. Hence, if I was working on Windows and made a push, when I go back to Linux and need to pull and change the paths I used.
2- I'm running selenium with Python. In order to make it work on my Linux serverless machine, I need to create a virtual display with pyvirtualdisplay library. Hence, some code that needs to be executed on my Linux machine must not be executed on my Windows machine.
So the problem I got is if I work on my Windows machine, I need to comment out the lines that creates the virtual display.
These two problems takes me a lot of time, because everytime I pull in a different machine, I can't directly work on the code, but have to change the code first.
Pretty simple. What you are looking for is to import the 'os' module.
import os
Then you can get the "name" of the current running Operating System by checking: os.name.
You can then base if or case statements, or however you want to manage your code, on what OS is reported.
To be clear, I would wrap the code I want to run only in a certain environment in an if statement that determines which OS is running and runs the appropriate code block based on the result.
if [[ os.name == "nt" ]]; then
;do Windows stuff
else
;do linux stuff
fi
Additional Note! That was psuedocode based on general Bash scripting.
This is Python psuedocode:
if os.name == "nt":
; Do Windows Stuff
else:
; Do Linux Stuff
Your first issue can be solved by handling the paths in a configuration file instead of hard-coding them in the program itself (and possibly using the pathlib or os.path modules for the actual path manipulations), then telling git to ignore the config file. You can then create appropriate configurations on each system, and git won't bother them at all.
Your second issue can be solved by using any of the various methods Python gives you to figure out what OS you're running on, and then using simple conditional statements to match on them. In theory, you could do the same for the paths, but it really is better in the long run to get in the habit of properly separating runtime configuration like that from program logic. Options for this include:
os.name: Contains posix, nt, or java, which identifies what type of OS you're on. java for Jython, nt for Windows,posix` for pretty much everything else. Useful when you just care about certain low-level OS semantics.
sys.platform: Contains a generic name for the underlying OS ABI. win32 for Windows, darwin for macOS, linux for Linux, and the name of the OS for other UNIX variants. This lets you check for particular underlying platforms, and is generally what you should use for conditional code that only runs on one platform. Make sure to always check this with a construct like sys.platform.startswith('X'), as some platforms and Python implementations include version info after the OS name.
platform.system(): Similar to sys.platform, except that the returned string is more user-friendly (Windows for Windows, Linux for Linux, etc), and it returns an empty string if it can't figure out what OS you're on. Useful for displaying the OS to the user, but not for doing conditionals (because it's free-form and may not always return useful information).

Why are there different packages for the same architecture, but different OSes?

My question is rather conceptual. I noticed that there are different packages for the same architecture, like x86-64, but for different OSes. For example, RPM offers different packages for Fedora and OpenSUSE for the same x86-64 architecture: http://www.rpmfind.net/linux/rpm2html/search.php?query=wget - not to mention different packages served up by YUM and APT (for Ubuntu), all for x86-64.
My understanding is that a package contains binary instructions suitable for a given CPU architecture, so that as long as CPU is of that architecture, it should be able to execute those instructions natively. So why do packages built for the same architecture differ for different OSes?
Considering just different Linux distros:
Besides being compiled against different library versions (as Hadi described), the packaging itself and default config files can be different. Maybe one distro wants /etc/wget.conf, while maybe another wants /etc/default/wget.conf, or for those files to have different contents. (I forget if wget specifically has a global config file; some packages definitely do, and not just servers like Exim or Apache.)
Or different distros could enable / disable different sets of compile-time options. (Traditionally set with ./configure --enable-foo --disable-bar before make -j4 && make install).
For wget, choices may include which TLS library to compile against (OpenSSL vs. gnutls), not just which version.
So ABIs (library versions) are important, but there are other reasons why every distro has their own package of things.
Completely different OSes, like Linux vs. Windows vs. OS X, have different executable file formats. ELF vs. PE vs. Mach-O. All three of those formats contain x86-64 machine code, but the metadata is different. (And OS differences mean you'd want the machine code to do different things.
For example, opening a file on Linux or OS X (or any POSIX OS) can be done with an int open(const char *pathname, int flags, mode_t mode); system call. So the same source code works for both those platforms, although it can still compile to different machine code, or actually in this case very similar machine code to call a libc wrapper around the system call (OS X and Linux use the same function calling convention), but with a different symbol name. OS X would compile to a call to _open, but Linux doesn't prepend underscores to symbol names, so the dynamic linker symbol name would be open.
The mode constants for open might be different. e.g. maybe OS X defines O_RDWR as 4, but maybe Linux defines it as 2. This would be an ABI difference: same source compiles to different machine code, where the program and the library agree on what means what.
But Windows isn't a POSIX system. The WinAPI function for opening a file is HFILE WINAPI OpenFile(LPCSTR lpFileName, LPOFSTRUCT lpReOpenBuff, UINT uStyle);
If you want to do anything invented more recently than opening / closing files, especially drawing a GUI, things are even less similar between platforms and you will use different libraries. (Or a cross platform GUI library will use a different back-end on different platforms).
OS X and Linux both have Unix heritage (real or as a clone implementation), so the low-level file stuff is similar.
These packages contain native binaries that require a particular Application Binary Interface (ABI) to run. The CPU architecture is only one part of the ABI. Different Linux distros have different ABIs and therefore the same binary may not be compatible across them. That's why there are different packages for the same architecture, but different OSes. The Linux Standard Base project aims at standardizing the ABIs of Linux distros so that it's easier to build portable packages.

msysgit large installation size

I installed (extracted) msysgit portable (PortableGit-1.9.5-preview20150319.7z)
The compressed archive is 23 MB, but once extracted the contents take up 262 MB. This is mostly due to the git command binaries (under 'libexec\git-core'). Almost all of the binaries are identical, they just have different names.
Why did the developers build the project like this? I suppose they need an executable for each command to support the CLI on windows cmd.exe.
But isn't there a way to avoid having ~100 identical binaries, each 1.5 MB in size (ex: using batch files)?
Why did the developers build the project like this? I suppose they
need an executable for each command to support the CLI on windows
cmd.exe.
Under unixoid OSes, you can have symbolic links to another file that behave exactly like the original file; if you do that for your executable, your executable can look into argv[0] to find out how it was called. That's a very common trick for many programs.
Under Windows, and especially without installers, it's (to my knowledge) impossible to get the same behaviour out of your file system -- there's just no symbolic link equivalent. Especially when you consider that programs that are meant to run from USB drives have to cope with ancient filesystems such as FAT32!
Thus, the executables simply were copied over. Same functionality, more storage. However, on a modern machine that you'd run windows on, you really don't care about 200MB give or take for such a versatile tool such as git.
In conclusion: the developers had no choice here; since windows (though having some posix abstraction layer) has no proper filesystem support for symbolic links, that was the only good way to port this unix-originating program. You either shouldn't care or use an OS that behaves better in that respect. I'd recommend the latter, but OS choices often aren't made freely...

Develop Simple OS under Mac OS X, how to build the boot img from the Mach-O?

I am writing a simple OS under Mac OS X environment. I can build a simple bootloader by nasm. when i develop the more part by C language, i should build them together. The GCC of Mac OS X will compile a Mach-O output format. I want to know how to cat the instruction part of the output object, and link it together with the nasm part.
thanks.
There's a bigger problem which you aren't seeing.
GCC does not generate 16-bit x86 code, only 32-bit or 64-bit. x86 PC bootloaders start execution in the real addressing mode, which is a 16-bit mode for 16-bit code only.
So, even if you manage to link together the C code compiled with gcc and the assembly code compiled with NASM, you won't be able to execute the C code part (any 32-bit code part for that matter) until after you've switched into the 32-bit protected mode, which is not a very easy thing to do.
And you don't want to switch into protected mode in the 512-byte-long boot sector either. BIOS functions cannot be used in protected mode. If you switch too early, you won't be able to load any more stuff from the disk.
The most practical strategy is to split the bootloader into several parts. The 512-byte-long bootsector would load the next part(s) using the BIOS disk I/O functions. And those other parts will either contain the whole OS, or enough code that would load the rest of the OS either by using the same BIOS I/O functions or by using its own disk driver(s) in the real or protected mode.
So, you are doomed to writing 16-bit code in assembly language by hand for the bootsector, no C, no 32-bit.
You can, however, use other C compilers capable of producing 16-bit x86 code for the other parts of the bootloader. There are at least two of such compilers freely available online:
Turbo C++ 1.01 (runs only in DOS, Windows XP or below, VMs with DOS/Windows, e.g. DosBox)
Open Watcom C/C++ 1.9 (runs in DOS, Windows and probably Linux)