I'm currently building object files using swiftc, with:
swiftc -emit-object bar.swift
where bar.swift is something simple like:
class Bar {
    var value: Int

    init(value: Int) {
        self.value = value
    }

    func plusValue(_ value: Int) -> Int {
        return self.value + value
    }
}
When I then move on to linking this against my main object to create an executable, I get the following error:
$ cc -o foobar foo.o bar.o
duplicate symbol '_main' in:
foo.o
bar.o
ld: 1 duplicate symbol for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
This indicates that swiftc is adding a main implementation to the object file, which can be confirmed with:
$ nm bar.o | grep _main
0000000000000000 T _main
As far as I can tell, this added function does very little:
$ otool -tV bar.o
bar.o:
(__TEXT,__text) section
_main:
0000000000000000 pushq %rbp
0000000000000001 movq %rsp, %rbp
0000000000000004 xorl %eax, %eax
0000000000000006 movl %edi, -0x4(%rbp)
0000000000000009 movq %rsi, -0x10(%rbp)
000000000000000d popq %rbp
000000000000000e retq
000000000000000f nop
....snip...
Is there a way to tell swiftc -emit-object to not add this vestigial implementation of main?
Short answer
The command line argument that I was missing was:
-parse-as-library
Long answer
In my quest to find an answer, I resorted to looking at the Swift source code on GitHub, and with a bit of luck found the following compiler invocation in the test suite:
// RUN: %target-build-swift %S/Inputs/CommandLineStressTest/CommandLineStressTest.swift -parse-as-library -force-single-frontend-invocation -module-name CommandLineStressTestSwift -emit-object -o %t/CommandLineStressTestSwift.o
According to swiftc --help, -parse-as-library causes the compiler to:
Parse the input file(s) as libraries, not scripts
This has proven to work for me, and the only difference in exported symbols is the removal of _main:
$ diff -U1 <(swiftc -emit-object -module-name bar -o a.o bar.swift && nm a.o | cut -c18-) <(swiftc -emit-object -parse-as-library -module-name bar -o b.o bar.swift && nm b.o | cut -c18-)
--- /dev/fd/63 2020-03-28 17:13:08.000000000 +1100
+++ /dev/fd/62 2020-03-28 17:13:08.000000000 +1100
@@ -23,3 +23,2 @@
U __objc_empty_cache
-T _main
s _objc_classes
and in the generated assembly the only change is the removal of the _main code:
$ diff -U1 <(swiftc -emit-object -module-name bar -o a.o bar.swift && objdump -d -no-leading-addr -no-show-raw-insn a.o) <(swiftc -emit-object -parse-as-library -module-name bar -o b.o bar.swift && objdump -d -no-leading-addr -no-show-raw-insn b.o)
--- /dev/fd/63 2020-03-28 17:19:03.000000000 +1100
+++ /dev/fd/62 2020-03-28 17:19:03.000000000 +1100
@@ -1,15 +1,5 @@
-a.o: file format Mach-O 64-bit x86-64
+b.o: file format Mach-O 64-bit x86-64
Disassembly of section __TEXT,__text:
-_main:
- pushq %rbp
- movq %rsp, %rbp
- xorl %eax, %eax
- movl %edi, -4(%rbp)
- movq %rsi, -16(%rbp)
- popq %rbp
- retq
- nop
-
_$S3bar3BarC5valueSivg:
Related
So, if I multiply two values:
emacs -batch -eval '(print (* 1252463 -4400000000000))'
It will exceed the most-negative-fixnum range and return a mathematically wrong answer. What will be the difference, at the instruction level, between
the -O2 flag, -O2 -fsanitize=undefined, and -O2 -fwrapv?
In emacs? Probably nothing. The function that is compiled probably looks like this:
int multiply(int x, int y) {
    return x * y;
}
If we compile that and look at the assembly (gcc -S multiply.c && cat multiply.s), we get
multiply:
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl %esi, -8(%rbp)
movl -4(%rbp), %eax
imull -8(%rbp), %eax
popq %rbp
ret
See the imull instruction? It's doing a regular multiply. What if we try gcc -O2 -S multiply.c?
multiply:
movl %edi, %eax
imull %esi, %eax
ret
Well that's certainly removed some code, but it's still doing imull, a regular multiplication.
Let's try to get it to not use imull:
int multiply(int x) {
    return x * 2;
}
With gcc -O2 -S multiply.c, we get
multiply:
leal (%rdi,%rdi), %eax
ret
Instead of computing x * 2 with a multiply, it computed x + x (a single leal), because the add form is cheaper than imull.
Can we get -fwrapv to produce different code? Yes:
int multiply(int x) {
    return x * 2 < 0;
}
With gcc -O2 -S multiply.c, we get
multiply:
movl %edi, %eax
shrl $31, %eax
ret
So it was simplified into extracting the sign bit (a logical shift right by 31), which is the same thing as x < 0. In math, if x * 2 < 0 then x < 0. But in the reality of processors, if x * 2 overflows it may become negative, for example 2,000,000,000 * 2 = -294,967,296.
If you force gcc to take this into account with gcc -O2 -fwrapv -S multiply.c, we get
multiply:
leal (%rdi,%rdi), %eax
shrl $31, %eax
ret
So it optimized x * 2 < 0 to x + x < 0. It might seem strange that -fwrapv is not the default, but C predates any guarantee that signed multiplication overflows in this predictable two's-complement manner.
I have a cross-compiled toolchain for armv7eb. I copied the tools to my target machine and tried to build Perl on the target. During the build, a binary called 'miniperl' crashes with "Illegal instruction".
Can someone help me figure out what I am doing wrong?
PERL VERSION: 5.8.8
GCC: 4.1.3
Please find the crash log below. Does anybody have any suggestions?
sh cflags "optimize='-O'" opmini.o` -DPIC -fPIC -DPERL_EXTERNAL_GLOB opmini.c
CCCMD = cc -DPERL_CORE -c -msoft-float -dynamic -fno-strict-aliasing -pipe -Wdeclaration-after-statement -O -Wall
LD_LIBRARY_PATH=/usr/pkg/src/perl-5.8.8:/lib:/usr/lib:/usr/pkg/lib:/usr/pkg/src/PTHREAD/lib cc -Wl,-rpath,/usr/pkg/lib -Wl,-rpath,/usr/local/lib -L/usr/pkg/lib -L/lib -L/usr/lib -o miniperl miniperlmain.o opmini.o -L. -lperl -lm -lcrypt -lutil -lc -lposix
LD_LIBRARY_PATH=/usr/pkg/src/perl-5.8.8:/lib:/usr/lib:/usr/pkg/lib:/usr/pkg/src/PTHREAD/lib ./miniperl -w -Ilib -MExporter -e '<?>' || make minitest
pid 25563 (miniperl), uid 0: exited on signal 4 (core not dumped, err = 2)
[1] Illegal instruction LD_LIBRARY_PATH=...
cp ext/re/re.pm lib/re.pm
LD_LIBRARY_PATH=/usr/pkg/src/perl-5.8.8:/lib:/usr/lib:/usr/pkg/lib:/usr/pkg/src/PTHREAD/lib ./miniperl -Ilib configpm --heavy=lib/Config_heavy.pl lib/Config.pm
pid 8052 (miniperl), uid 0: exited on signal 4 (core not dumped, err = 2)
[1] Illegal instruction LD_LIBRARY_PATH=...
*** Error code 132
Stop.
make: stopped in /usr/pkg/src/perl-5.8.8
pid 25595 (miniperl), uid 0: exited on signal 4 (core not dumped, err = 2)
*** Error code 1 (ignored)
You may see some irrelevant test failures if you have been unable
to build lib/Config.pm, lib/lib.pm or the Unicode data files.
cd t && (rm -f perl; /bin/ln -s ../miniperl perl) && LD_LIBRARY_PATH=/usr/pkg/src/perl-5.8.8:/lib:/usr/lib:/usr/pkg/lib:/usr/pkg/src/PTHREAD/lib ./perl TEST -minitest base/*.t comp/*.t cmd/*.t run/*.t io/*.t op/*.t uni/*.t </dev/tty
[1] Illegal instruction LD_LIBRARY_PATH=...
*** Error code 132 (ignored)
*** Error code 1 (ignored)
LD_LIBRARY_PATH=/usr/pkg/src/perl-5.8.8:/lib:/usr/lib:/usr/pkg/lib:/usr/pkg/src/PTHREAD/lib ./miniperl -Ilib configpm --heavy=lib/Config_heavy.pl lib/Config.pm
pid 703 (miniperl), uid 0: exited on signal 4 (core not dumped, err = 2)
I read that range-based loops have better performance in some programming languages. Is that the case in Swift? For instance, in a Playground:
func timeDebug(desc: String, function: () -> ()) {
    let start: UInt64 = mach_absolute_time()
    function()
    let duration: UInt64 = mach_absolute_time() - start

    var info: mach_timebase_info = mach_timebase_info(numer: 0, denom: 0)
    mach_timebase_info(&info)

    let total = (duration * UInt64(info.numer) / UInt64(info.denom)) / 1_000
    println("\(desc): \(total) µs.")
}
func loopOne() {
    for i in 0..<4000 {
        println(i)
    }
}

func loopTwo() {
    for var i = 0; i < 4000; i++ {
        println(i)
    }
}
range-based loop
timeDebug("Loop One time"){
loopOne(); // Loop One time: 2075159 µs.
}
normal for loop
timeDebug("Loop Two time"){
loopTwo(); // Loop Two time: 1905956 µs.
}
How do I properly benchmark in Swift?
// Update on the device
First run
Loop Two time: 54 µs.
Loop One time: 482 µs.
Second
Loop Two time: 44 µs.
Loop One time: 382 µs.
Third
Loop Two time: 43 µs.
Loop One time: 419 µs.
Fourth
Loop Two time: 44 µs.
Loop One time: 399 µs.
// Update 2
func printTimeElapsedWhenRunningCode(title: String, operation: () -> ()) {
    let startTime = CFAbsoluteTimeGetCurrent()
    operation()
    let timeElapsed = CFAbsoluteTimeGetCurrent() - startTime
    println("Time elapsed for \(title): \(timeElapsed) s")
}
printTimeElapsedWhenRunningCode("Loop Two time") {
loopTwo(); // Time elapsed for Loop Two time: 4.10079956054688e-05 s
}
printTimeElapsedWhenRunningCode("Loop One time") {
loopOne(); // Time elapsed for Loop One time: 0.000500023365020752 s.
}
You shouldn’t really benchmark in playgrounds since they’re unoptimized. Unless you’re interested in how long things will take when you’re debugging, you should only ever benchmark optimized builds (swiftc -O).
To understand why a range-based loop can be faster, you can look at the assembly generated for the two options:
Range-based
% echo "for i in 0..<4_000 { println(i) }" | swiftc -O -emit-assembly -
; snip opening boiler plate...
LBB0_1:
movq %rbx, -32(%rbp)
; increment i
incq %rbx
movq %r14, %rdi
movq %r15, %rsi
; print (pre-incremented) i
callq __TFSs7printlnU__FQ_T_
; compare i to 4_000
cmpq $4000, %rbx
; loop if not equal
jne LBB0_1
xorl %eax, %eax
addq $8, %rsp
popq %rbx
popq %r14
popq %r15
popq %rbp
retq
.cfi_endproc
C-style for loop
% echo "for var i = 0;i < 4_000;++i { println(i) }" | swiftc -O -emit-assembly -
; snip opening boiler plate...
LBB0_1:
movq %rbx, -32(%rbp)
movq %r14, %rdi
movq %r15, %rsi
; print i
callq __TFSs7printlnU__FQ_T_
; increment i
incq %rbx
; jump if overflow
jo LBB0_4
; compare i to 4_000
cmpq $4000, %rbx
; loop if less than
jl LBB0_1
xorl %eax, %eax
addq $8, %rsp
popq %rbx
popq %r14
popq %r15
popq %rbp
retq
LBB0_4:
; raise illegal instruction due to overflow
ud2
.cfi_endproc
So the reason the C-style loop is slower is because it’s performing an extra operation – checking for overflow. Either Range was written to avoid the overflow check (or do it up front), or the optimizer was more able to eliminate it with the Range version.
If you switch to the overflow-wrapping addition operator &+, you can eliminate this check. This produces near-identical code to the range-based version (the only difference being some immaterial ordering of the code):
% echo "for var i = 0;i < 4_000;i = i &+ 1 { println(i) }" | swiftc -O -emit-assembly -
; snip
LBB0_1:
movq %rbx, -32(%rbp)
movq %r14, %rdi
movq %r15, %rsi
callq __TFSs7printlnU__FQ_T_
incq %rbx
cmpq $4000, %rbx
jne LBB0_1
xorl %eax, %eax
addq $8, %rsp
popq %rbx
popq %r14
popq %r15
popq %rbp
retq
.cfi_endproc
Never Benchmark Unoptimized Builds
If you want to understand why, try looking at the output for the Range-based version of the above, but with no optimization: echo "for var i = 0;i < 4_000;++i { println(i) }" | swiftc -Onone -emit-assembly -. You will see it output a lot more code. That’s because Range used via for…in is an abstraction, a struct used with custom operators and functions returning generators, and does a lot of safety checks and other helpful things. This makes it a lot easier to write/read code. But when you turn on the optimizer, all this disappears and you’re left with very efficient code.
Benchmarking
As to ways to benchmark, this is the code I tend to use, just replacing the array:
import CoreFoundation.CFDate
func timeRun<T>(name: String, f: () -> T) -> String {
    let start = CFAbsoluteTimeGetCurrent()
    let result = f()
    let end = CFAbsoluteTimeGetCurrent()
    let timeStr = toString(Int((end - start) * 1_000_000))
    return "\(name)\t\(timeStr)µs, produced \(result)"
}

let n = 4_000

let runs: [(String, () -> Void)] = [
    ("for in range", {
        for i in 0..<n { println(i) }
    }),
    ("plain ol for", {
        for var i = 0; i < n; ++i { println(i) }
    }),
    ("w/o overflow", {
        for var i = 0; i < n; i = i &+ 1 { println(i) }
    }),
]

println("\n".join(map(runs, timeRun)))
But the results will probably be meaningless, since jitter during println will likely obscure actual measurement. To really benchmark (assuming you don’t just trust the assembly analysis :) you’d need to replace it with something very lightweight.
When creating a UITableViewController, sometimes I only need the indexPath in my function. Is there a performance improvement from using _ to ignore the tableView parameter?
Ex: using this:
override func tableView(_: UITableView, didSelectRowAtIndexPath indexPath: NSIndexPath)
instead of this:
override func tableView(tableView: UITableView, didSelectRowAtIndexPath indexPath: NSIndexPath)
Generally, this falls under the category "micro optimization".
Even if there were a difference, it would probably be negligible
compared to the rest of your program. And chances are great that the
compiler notices the unused parameter and optimizes the code
accordingly. You should decide what parameter name makes the most
sense in your situation.
In this particular case, it does not make any difference at all.
How you name the (internal) method parameter affects only the compiling
phase, but does not change the generated code.
You can verify that easily
yourself. Create a source file "main.swift":
// main.swift
import Swift

func foo(str : String) -> Int {
    return 100
}

func bar(_ : String) -> Int {
    return 100
}

println(foo("a"))
println(bar("b"))
Now compile it and inspect the generated assembly code:
swiftc -O -emit-assembly main.swift
The assembly code for both methods is completely identical:
.private_extern __TF4main3fooFSSSi
.globl __TF4main3fooFSSSi
.align 4, 0x90
__TF4main3fooFSSSi:
pushq %rbp
movq %rsp, %rbp
movq %rdx, %rdi
callq _swift_unknownRelease
movl $100, %eax
popq %rbp
retq
.private_extern __TF4main3barFSSSi
.globl __TF4main3barFSSSi
.align 4, 0x90
__TF4main3barFSSSi:
pushq %rbp
movq %rsp, %rbp
movq %rdx, %rdi
callq _swift_unknownRelease
movl $100, %eax
popq %rbp
retq
In Objective-C, a BOOL's bit pattern could be retrieved by casting it to a UInt8.
e.g.
true => 0x01
false => 0x00
This bit pattern could then be used in further bit manipulation operations.
Now I want to do the same in Swift.
What I got working so far is
UInt8(UInt(boolValue))
but this doesn't look like it is the preferred approach.
I also need the conversion in O(1) without data-dependent branching. So, stuff like the following is not allowed.
boolValue ? 1 : 0
Also, is there some documentation about the way the UInt8 and UInt initializers are implemented? e.g. if the UInt initializer to convert from bool uses data-dependent branching, I can't use it either.
Of course, the fallback is always to use further bitwise operations to avoid the bool value altogether (e.g. Check if a number is non zero using bitwise operators in C).
Does Swift offer an elegant way to access the bit pattern of a Bool / convert it to UInt8, in O(1) without data-dependent branching?
When in doubt, have a look at the generated assembly code :)
func foo(someBool : Bool) -> UInt8 {
    let x = UInt8(UInt(someBool))
    return x
}
compiled with ("-O" = "Compile with optimizations")
xcrun -sdk macosx swiftc -emit-assembly -O main.swift
gives
.globl __TF4main3fooFSbVSs5UInt8
.align 4, 0x90
__TF4main3fooFSbVSs5UInt8:
.cfi_startproc
pushq %rbp
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp4:
.cfi_def_cfa_register %rbp
callq __TFE10FoundationSb19_bridgeToObjectiveCfSbFT_CSo8NSNumber
movq %rax, %rdi
callq __TFE10FoundationSuCfMSuFCSo8NSNumberSu
movzbl %al, %ecx
cmpq %rcx, %rax
jne LBB0_2
popq %rbp
retq
The function names can be demangled with
$ xcrun -sdk macosx swift-demangle __TFE10FoundationSb19_bridgeToObjectiveCfSbFT_CSo8NSNumber __TFE10FoundationSuCfMSuFCSo8NSNumberSu
_TFE10FoundationSb19_bridgeToObjectiveCfSbFT_CSo8NSNumber ---> ext.Foundation.Swift.Bool._bridgeToObjectiveC (Swift.Bool)() -> ObjectiveC.NSNumber
_TFE10FoundationSuCfMSuFCSo8NSNumberSu ---> ext.Foundation.Swift.UInt.init (Swift.UInt.Type)(ObjectiveC.NSNumber) -> Swift.UInt
There is no UInt initializer that takes a Bool argument.
So the smart compiler has used the automatic conversion between Swift
and Foundation types and generated some code like
let x = UInt8(NSNumber(bool: someBool).unsignedLongValue)
Probably not very efficient with two function calls. (And it does not
compile if you only import Swift, without Foundation.)
Now the other method where you assumed data-dependent branching:
func bar(someBool : Bool) -> UInt8 {
    let x = UInt8(someBool ? 1 : 0)
    return x
}
The assembly code is
.globl __TF4main3barFSbVSs5UInt8
.align 4, 0x90
__TF4main3barFSbVSs5UInt8:
pushq %rbp
movq %rsp, %rbp
andb $1, %dil
movb %dil, %al
popq %rbp
retq
No branching, just an "AND" operation with 0x01!
Therefore I do not see a reason not to use this "straightforward" conversion.
You can then profile with Instruments to check if it is a bottleneck for
your app.
Martin R’s answer is more fun :-), but this can be done in a playground.
// first check this is true or you’ll be sorry...
sizeof(Bool) == sizeof(UInt8)
let t = unsafeBitCast(true, UInt8.self) // = 1
let f = unsafeBitCast(false, UInt8.self) // = 0