GC issues compiling a large scala monorepo

Scala version=2.12.15
SBT version=1.4.9
JVM=1.8
GC=ParallelGC (mark-sweep / scavenge)
Heap size=14GB
The monorepo is full of Spark code
Files to compile: 34,013 across 3-4 subprojects
What we've noticed is a gradual increase in old-gen space that doesn't clear out even though GCs are constantly happening, which suggests there are a lot of live objects.
A heap dump was taken.
jmap -histo (3rd column is size in megabytes):
13: 1209208 181.843 [Ljava.lang.Object;
12: 8881966 203.292 xsbti.api.NameHash
11: 9080251 207.83 scala.collection.immutable.HashSet$HashSet1
10: 7052557 215.227 scala.reflect.internal.Trees$Literal
9: 6539689 249.469 scala.reflect.internal.Trees$TypeTree
8: 10487573 320.055 scala.reflect.internal.Symbols$TypeHistory
7: 4839129 332.277 scala.reflect.internal.Symbols$TermSymbol
6: 10500404 400.559 scala.reflect.internal.Trees$Apply
5: 10728133 409.246 scala.reflect.internal.Trees$Ident
4: 30407384 695.97 java.lang.String
3: 18842332 718.778 scala.reflect.internal.Trees$Select
2: 35977908 823.469 scala.collection.immutable.$colon$colon
1: 30412139 1942.91 [C
When we analyzed the retained sizes of various classes, scala.collection.immutable.$colon$colon was retaining around 5GB of heap.
I'm curious to understand how scalac works in order to get to the bottom of this. Worst case, perhaps we really do need 14GB of heap to compile our monorepo.
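For reference, this is roughly how such a histogram can be captured and how the sbt JVM can be configured; the PID placeholder and the flag values below are illustrative assumptions, not settings taken from our build:

# capture a class histogram of live objects from the running sbt/scalac JVM
jmap -histo:live <sbt-pid> | head -n 30

# .jvmopts (read by the standard sbt runner script) -- values are illustrative
-Xmx14G
-XX:+UseParallelGC
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/tmp/sbt-heap.hprof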

Related

Searchkick memory leak

Running rake searchkick:reindex CLASS=Product for an application causes the Rake process to leak memory; after about 15-20 minutes it's bad enough to freeze a Debian system with 16GB of RAM. There are ~3800 "Product" records.
I managed to work around this problem with the following code in a Rake task:
connection = ActiveRecord::Base.connection
# find the highest product id so we can walk the ids one at a time
res = connection.execute('select max(id) from products')
id = res.getvalue(0, 0).to_i
1.upto(id) do |i|
  p = Product.find_by_id(i)
  next unless p # skip ids left by deleted records
  p.reindex     # index a single document instead of bulk-importing everything
end
This is also a little quicker.
Can anyone suggest a means to investigate this memory leak? It would be useful to do so in more detail before considering opening a ticket.
This workaround causes a problem when generating indexes: "Text fields are not optimised for operations that require per-document field data".
That problem can be fixed by first recreating the index without importing, and then running the loop above:
Product.reindex(import: false)
# Rest of the code goes here...
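Putting the two pieces together, here is a batched sketch (an assumption on my part, not something from the original post: ActiveRecord's find_each loads records 1,000 at a time, which should keep the Rake process's memory flat):

Product.reindex(import: false)   # recreate the index without the bulk import
Product.find_each do |product|   # fetches records in batches of 1,000
  product.reindex                # index one document at a time
end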

Swift build time too long when the configuration is 'release'?

I have an open source project, and the number of files in the project is more than 40.
When I build the project with the Debug configuration, the compile time is 2m22s.
I also used BuildTimeAnalyzer; the longest single item is 28ms.
But when I build the project with the Release configuration, it gets stuck on "Compile Swift source files" for more than an hour.
I have no idea what's going on here; please help.
In the DEBUG build, if you add up all the time spent on each function, you get about 7s. The numbers don't quite add up: you spent 142s building the whole thing, yet these functions take only about 7s to compile?
That's because those timings only account for type-checking each function body. The Swift frontend has three flags you can use:
-Xfrontend -debug-time-compilation
-Xfrontend -debug-time-function-bodies
-Xfrontend -debug-time-expression-type-checking
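For example, these can be added under Other Swift Flags in Xcode, or passed as a build-setting override on the command line (a sketch; the project and scheme names are placeholders):

xcodebuild -project MyApp.xcodeproj -scheme MyApp \
  OTHER_SWIFT_FLAGS="-Xfrontend -debug-time-compilation"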
Let's use the first to see the whole picture. Pick one slow file, say Option.swift, and look:
===-------------------------------------------------------------------------===
Swift compilation
===-------------------------------------------------------------------------===
Total Execution Time: 30.5169 seconds (43.6413 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
23.5183 ( 80.1%) 0.7773 ( 67.6%) 24.2957 ( 79.6%) 34.4762 ( 79.0%) LLVM output
3.7312 ( 12.7%) 0.0437 ( 3.8%) 3.7749 ( 12.4%) 5.4192 ( 12.4%) LLVM optimization
1.8563 ( 6.3%) 0.2830 ( 24.6%) 2.1393 ( 7.0%) 3.1800 ( 7.3%) IRGen
0.2026 ( 0.7%) 0.0376 ( 3.3%) 0.2402 ( 0.8%) 0.4666 ( 1.1%) Type checking / Semantic analysis
... <snip> ...
29.3665 (100.0%) 1.1504 (100.0%) 30.5169 (100.0%) 43.6413 (100.0%) Total
Turns out it's not the Swift frontend that is slow, but LLVM! So there is no point looking at type-checking time. We can further check why LLVM is slow using -Xllvm -time-passes, but it won't give us useful information; it just says X86 Assembly / Object Emitter is taking most of the time.
Let's take a step back and check which files take the most time to compile:
Option.swift 30.5169
Toolbox.swift 15.6143
PictorialBarSerie.swift 12.2670
LineSerie.swift 8.9690
ScatterSerie.swift 8.5959
FunnelSerie.swift 8.3299
GaugeSerie.swift 8.2945
...
Half a minute is spent on Option.swift. What's wrong with this file?
You have a huge struct, with 31 members. Compiling that struct alone takes 11 seconds.
You have a huge enum, with 80 variants. Compiling this enum alone takes 7 seconds.
The first problem is easy to fix: use final class instead! The second problem doesn't have a simple fix (I don't see any time improvement with alternatives, e.g. replacing the enum with a class hierarchy). All the other slow files have a similar problem: large structs, large enums.
Simply replacing every struct with a final class is enough to bring the compilation time from "over an hour and still compiling" down to 2.5 minutes.
See also Why Choose Struct Over Class?. Your "struct"s may not qualify as structs.
Note that changing from struct to class does change the semantics of users' code, since classes have reference semantics.
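A minimal sketch of the change (the type and member names below are made up; the real Option.swift has around 31 stored properties):

// Before: a large value type that the Release optimizer chews on for a long time
struct ChartOptions {
    var title: String?
    var legend: String?
    // ... roughly 30 more stored properties ...
}

// After: the same members as a reference type; it compiles much faster here,
// but copies now share state (reference semantics)
final class ChartOptions {
    var title: String?
    var legend: String?
    // ... roughly 30 more stored properties ...
}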
Try this: under Build Settings -> Swift Compiler - Code Generation, for your Release configuration choose SWIFT_OPTIMIZATION_LEVEL = -Owholemodule. Then under Other Swift Flags enter -Onone. Doing this carved big chunks of time off my project's build.
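Expressed as an xcconfig sketch, those are the same two settings described above (verify that this combination makes sense for your project before shipping it):

// Release.xcconfig
SWIFT_OPTIMIZATION_LEVEL = -Owholemodule
OTHER_SWIFT_FLAGS = -Onone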

How to run !dumpheap -dead on the generation 2 across all the heaps?

I have 8 managed GC heaps reported by !eeheap -gc:
0:000> !eeheap -gc
Number of GC Heaps: 8
------------------------------
Heap 0 (00000000009a2c50)
generation 0 starts at 0x00000000d92e3aa0
generation 1 starts at 0x00000000d8cdb128
generation 2 starts at 0x000000007fff1000
ephemeral segment allocation context: none
segment begin allocated size
000000007fff0000 000000007fff1000 00000000d93edab8 0x593fcab8(1497352888)
Large object heap starts at 0x000000047fff1000
segment begin allocated size
000000047fff0000 000000047fff1000 0000000487fabf00 0x7fbaf00(133934848)
00000004e6400000 00000004e6401000 00000004ee3af2f8 0x7fae2f8(133882616)
000000050e400000 000000050e401000 00000005152f8578 0x6ef7578(116356472)
0000000572400000 0000000572401000 00000005756e8ad8 0x32e7ad8(53377752)
Heap Size: Size: 0x73544d00 (1934904576) bytes.
------------------------------
Heap 1 (00000000009ad690)
generation 0 starts at 0x00000001609a9cc8
generation 1 starts at 0x000000016072f780
generation 2 starts at 0x00000000ffff1000
ephemeral segment allocation context: none
segment begin allocated size
00000000ffff0000 00000000ffff1000 0000000161bf8f50 0x61c07f50(1640005456)
Large object heap starts at 0x0000000487ff1000
segment begin allocated size
0000000487ff0000 0000000487ff1000 000000048ffea910 0x7ff9910(134191376)
0000000044b50000 0000000044b51000 000000004cb44978 0x7ff3978(134166904)
000000051e400000 000000051e401000 000000052575aae0 0x7359ae0(120953568)
000000057a400000 000000057a401000 000000057c2e8610 0x1ee7610(32405008)
Heap Size: Size: 0x7ae362c8 (2061722312) bytes.
...
I would like to run the !dumpheap -dead command on gen 2 and the LOH only; however, I am a bit confused about two things:
The output clearly says where gen 2 starts, but it is unclear to me where it ends. For example, for Heap 0 I figure I pass -start 0x000000007fff1000, but what goes into -end? Is it the start of gen 1?
I have 8 heaps, so I guess I have to run !dumpheap -dead 8 times for gen 2. For the LOH, which spans multiple segments per heap, the number of runs is even higher. Is there a way to automate dumping all these dead objects across all the LOHs and gen 2s?

Trying to read a Xcode Instruments .trace file. What is the file format of a .trace file?

I am writing an automated profiling system to profile different GPU-intensive screens in my App. I have been trying to use Xcode Instruments for this, with the 'OpenGL ES Driver' instrument that captures the GPU usage data.
My automated system runs Xcode Instruments from the command line which runs the App, profiles and captures the data, and writes the data to a ".trace" file.
I now want to be able to open the trace file, and read the trace data using my automated profiling system, so that I can inform App developers of how the various parts of the App perform.
I cannot, however, find any way of reading the trace file. It seems to be a package containing various directories, and buried in there is a .zip file which seems to contain some binary data. How is the data in this file parsed?
The Instruments system seems fairly sophisticated, and I've been surprised how hard it has been to access the trace data that it produces.
Does anyone know how to parse the trace file?
I am currently using Xcode 4.6.1.
Alright, so to answer the main question: the data in the .zip archive is a blob that was serialized with the NSArchiver class (such archives have a fairly distinctive header when opened with a hex tool; I used Hex Fiend, and that was the first clue). It's fairly straightforward to read; all you have to do is make a call to NSUnarchiver, at least in theory. Before I go into the details, here is a very simple example app that dumps a few infos: https://github.com/JustSid/Traced
So, the problem with NSArchiver and NSUnarchiver is that you first of all need to have all the classes that were archived, and second of all you have to read the data out in the order in which it was archived. (That was the tricky bit: I used class-dump to dump the interfaces for a few of the required classes and then tried to unarchive the data object by object, looking at what got returned. Luckily, NSArchiver dies with descriptive error messages; if a class is missing, it will tell you its name.) The biggest problem I had was that the Instruments binary and the frameworks it uses don't contain all the classes I needed; in particular, the archive contains serialized data of a class named XRVideoCardRun. My assumption is that the .template file inside the .trace bundle contains a dynamic library with the required class (it's over 300kb in size and contains a lot of blobs; it is, by the way, a binary plist). I was too lazy to extract the binary data out of it and run class-dump against it, and I was lucky enough that most of the data that came out of the archive was consistent with what I was expecting for the superclass, XRRun (which I found in one of the Instruments frameworks), with the exception of an array of dictionaries whose content looked like the sample data.
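The unarchiving call itself is tiny; here is a minimal sketch (the file path is a placeholder, and decodeObject will only succeed once every archived class, e.g. XRVideoCardRun, is available at runtime):

#import <Foundation/Foundation.h>

// Sketch: unarchive the blob extracted from the .trace bundle's zip
NSData *data = [NSData dataWithContentsOfFile:@"/path/to/unzipped/1.run"];
NSUnarchiver *unarchiver = [[NSUnarchiver alloc] initForReadingWithData:data];
id run = [unarchiver decodeObject]; // dies with a descriptive error if a class is missing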
So, the rest was just combining everything together. If you look at the sample app, the most interesting parts should be the XRRun.m and .h files. They contain a bit of documentation and some pieces on how to extract the data from the samples, although you will probably want to replace that with your own logic for your automation. Hope it helps.
The app, thrown against your sample file, outputs this:
Run 1, starting at 24.05.13 17:42:16, running until 24.05.13 17:42:28
Sample 0: FPS: 27 Device: 0% Renderer: 0% Tiler: 0% Timestamp: 1.012740
Sample 1: FPS: 35 Device: 11% Renderer: 10% Tiler: 2% Timestamp: 2.018574
Sample 2: FPS: 34 Device: 33% Renderer: 32% Tiler: 7% Timestamp: 3.026101
Sample 3: FPS: 59 Device: 59% Renderer: 59% Tiler: 16% Timestamp: 4.032030
Sample 4: FPS: 60 Device: 59% Renderer: 58% Tiler: 16% Timestamp: 5.038990
Sample 5: FPS: 59 Device: 59% Renderer: 58% Tiler: 16% Timestamp: 6.046022
Sample 6: FPS: 59 Device: 57% Renderer: 53% Tiler: 17% Timestamp: 7.051187
Sample 7: FPS: 60 Device: 67% Renderer: 66% Tiler: 14% Timestamp: 8.057343
Sample 8: FPS: 59 Device: 64% Renderer: 64% Tiler: 11% Timestamp: 9.064914
Sample 9: FPS: 60 Device: 67% Renderer: 67% Tiler: 11% Timestamp: 10.072592
Sample 10: FPS: 59 Device: 65% Renderer: 65% Tiler: 15% Timestamp: 11.080248
(PS: If the format changes, the app will break as well...)
I am trying to parse the .trace document using the undocumented frameworks shipped with Instruments itself. It is now working with Time Profiler, and it shouldn't be hard to get it working with other instrument templates as well, with a little more reverse-engineering work.
There are quite a few frameworks bundled with Instruments as you can see in /Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/Frameworks.
However we only need to link against these two:
DVTInstrumentsFoundation.framework
InstrumentsPlugIn.framework
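A sketch of what linking against them from the command line might look like (the -F path is the Frameworks directory mentioned above; additional runtime search paths may be needed, which I haven't verified):

clang main.m -o traceutility \
  -framework Foundation \
  -F /Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/Frameworks \
  -framework DVTInstrumentsFoundation -framework InstrumentsPlugIn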
Another thing you should know before starting is that instrument templates are actually plugins in /Applications/Xcode.app/Contents/Applications/Instruments.app/Contents/PlugIns.
For example, SamplerPlugin.xrplugin is for Time Profiler.
The code is short and commented: https://github.com/Qusic/TraceUtility
You may not be able to analyze the .trace file directly with a script, but it can be exported to a CSV file that can be analyzed by a script or put into Excel, Numbers, etc. You may even be able to add the CSV export to your automated testing, depending on how it is done.
.trace is actually a folder; under .trace/instruments_data/, after a few more nested folders, you will find 1.run.zip. Unzip it and you will get 1.run. I'm not sure how to decode that. The best way is to open the .trace with Instruments itself, which will show all the details.

Suppress iOS Console Output "Unloading xxx unused Assets..."

Is there any way to suppress console output in the iPhone player when a new scene is loaded using Application.LoadLevelAdditiveAsync or similar methods?
Unloading 7 Unused Serialized files (Serialized files now loaded: 0 / Dirty serialized files: 0)
Unloading 185 unused Assets to reduce memory usage. Loaded Objects now: 3468. Operation took 377.272217 ms.
System memory in use: 6.7 MB.
Yes, it might not be the most important thing on earth, but it's somewhat annoying when looking for relevant error messages within this noisy output.