In Perl, I generate a huge read-only data-structure once, then fork().
This is to take advantage of copy-on-write (COW) on RSS pages when forking. It works really well, but when a child process exits, it allocates all the shared RAM for itself just before dying.
Is there a way to avoid this useless allocation?
Here is sample Perl code that shows the issue.
#! /usr/bin/perl
my $a = [];
# Allocate 100 MiB
for my $i (1 .. 100000) {
    push @$a, "x" x 1024;
}
# Fork 10 other processes
for my $j (1 .. 10) {
    last unless fork();
}
# Sleep for a while to be able to see the RSS
sleep(5);
In the sample vmstat output, we can see that it first allocates only 100 MiB; then, after the first sleep (when the children exit), much more memory is allocated for a short while, and then all of it is released.
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free   buff  cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0      0 1329660  80596  86936    0    0    21    18  160   25  0  0 100  0  0
 1  0      0 1328048  80596  86936    0    0     0     0 1013   44  0  0 100  0  0
 0  0      0 1223888  80596  86936    0    0     0     0 1028   76 11  5  84  0  0
 0  0      0 1223888  80596  86936    0    0     0     0 1010   40  0  0 100  0  0
 0  0      0 1223888  80596  86936    0    0     0     0 1026   54  0  0 100  0  0
 0  0      0 1223888  80596  86936    0    0     0     0 1006   39  0  0 100  0  0
13  0      0  741156  80596  86936    0    0     0     0 1012   66 13 58  28  0  0
 0  0      0 1329288  80596  86936    0    0     0     0 1032   60  0  0 100  0  0
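To put numbers on the reading above, here is a small Python sketch used purely as a calculator. The values are copied from the "free" column of the vmstat sample, and I am assuming vmstat's default unit (KiB) for the memory columns. It compares the Perl script's string payload with the drops in free memory between samples.

```python
# "free" column of the vmstat sample above, in KiB (assumed: vmstat's default unit).
FREE_KIB = [1329660, 1328048, 1223888, 1223888, 1223888, 1223888, 741156, 1329288]

# Payload pushed by the Perl script: 100000 strings of 1024 bytes each, in MiB.
PAYLOAD_MIB = 100_000 * 1024 / 2**20


def drops_mib(samples):
    """Decrease in free memory between consecutive samples, in MiB (negative = memory freed)."""
    return [round((before - after) / 1024, 1) for before, after in zip(samples, samples[1:])]


if __name__ == "__main__":
    print(f"payload: {PAYLOAD_MIB:.1f} MiB")
    print("drops:", drops_mib(FREE_KIB))
```

With these numbers, the parent's ~97.7 MiB of strings shows up as a ~101.7 MiB drop (presumably Perl's per-scalar overhead accounts for the difference), and the burst at exit is about 471 MiB. That is less than 10 x 100 MiB, probably because the sample only catches the children part-way through exiting.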
Note: this doesn't seem to be specific to a Perl version; I tested 5.8.8, 5.10.1 and 5.14.2, and they all exhibit this behavior.
Update:
As @choroba asked in the comments, I also tried to undef the data structure, but that also seems to trigger the memory touching, since the RAM gets allocated at that point.
You can add the following snippet at the end of the first script.
# Unallocate $a
undef $a;
# Sleep for a while to be able to see the RSS
sleep(5);
Actually, as I found out myself, this behavior is intended, and the answer lies in the Perl documentation of exit():
The exit() function does not always exit immediately.
Likewise any object destructors that need to be called
are called before the real exit.
If this is a problem, you can
call POSIX::_exit($status) to avoid END and destructor processing.
And indeed, adding it at the end of the original code sample does avoid the behavior.
# XXX - To be added just before ending the process
# Use POSIX::_exit($status) to end without allocating copy-on-write RAM
use POSIX;
POSIX::_exit(0);
Note: for this to work, the child also has to exit before the data structure goes out of scope.
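The same distinction exists in Python, shown here only as an analogy (the question itself is about Perl): sys.exit() runs the cleanup handlers registered with atexit before the process ends, while os._exit() ends the process immediately, much like POSIX::_exit. The child program and its "cleanup ran" handler are made up for illustration; the handler stands in for Perl END blocks and destructors. This is a sketch of the exit semantics, not a measurement of the copy-on-write effect.

```python
import subprocess
import sys

# Hypothetical child program: registers a cleanup handler (standing in for a
# Perl END block or destructor), then exits either normally or via os._exit().
CHILD_TEMPLATE = """
import atexit, os, sys
atexit.register(lambda: print("cleanup ran"))
if {fast}:
    os._exit(0)  # like POSIX::_exit: skip atexit handlers entirely
sys.exit(0)      # like Perl's exit(): run cleanup handlers first
"""


def run_child(fast_exit: bool) -> str:
    """Run the child program in a fresh interpreter and return its stdout."""
    code = CHILD_TEMPLATE.format(fast=fast_exit)
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print("sys.exit():", repr(run_child(False)))
    print("os._exit():", repr(run_child(True)))
```

Python's own multiprocessing module also ends its forked children with os._exit().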
I have been struggling with this for several hours now, and after reading many threads about ScriptBlocks, closures, scopes, etc., I still don't see what's wrong with my code.
Let me explain: I have a main script that dynamically generates an HTML page using PSWriteHTML module and ScriptBlocks.
As I have a lot of PSWriteHTML pages to write, I use an ArrayList of ScriptBlocks to generate the code with a different set of values each time (corresponding to different servers); these ScriptBlocks are then executed in a foreach loop.
This is done using the Save-utilizationReport function (I have only kept the relevant code):
function Save-utilizationReport ($currentDate, $navLinksScriptBlock, $htmlScriptBlockArray, $emeaTotalNumberOfCalls, $namTotalNumberOfCalls, $apacTotalNumberOfCalls, $path, $logFilePath) {
    [...]
    # Using Script Blocks, add the pages generated during the analysis to the HTML Report
    foreach ($htmlScriptBlock in $htmlScriptBlockArray) {
        Invoke-Command -ScriptBlock ($htmlScriptBlock)
    }
    [...]
}
The ScriptBlocks are created from sets of values gathered from the servers' logs and added to the ArrayList of ScriptBlocks in the Create-utilizationReportPage function (again, I've kept only the relevant code):
function Create-utilizationReportPage ($matchedLines, $ipAddress, $hostname, $pageId, $utilisationReportTemplate, $htmlScriptBlockArray) {
    # Retrieve the content of the Utilisation Report Template as a RAW string (Here-String)
    $htmlPageCodeBlock = Get-Content $utilisationReportTemplate -Raw
    # Create the Script Block that contains the HTML page
    $htmlPageScriptBlock = {
        # Get the "Total number of calls" information
        $timeArray = $matchedLines.time
        $participantNumberArray = $matchedLines.participantNumber
        [...]
        # Update the page ID in the template
        $htmlPageCodeBlock = $htmlPageCodeBlock -replace '%PAGE_ID%', "$pageId"
        # Update the page header information in the template
        $htmlPageCodeBlock = $htmlPageCodeBlock -replace '%PAGE_HEADER%', "$ipAddress [$hostname]"
        Invoke-Command -ScriptBlock ([scriptblock]::Create($htmlPageCodeBlock))
    }.GetNewClosure()
    # Add the Page's script block to the Script Blocks array
    $htmlScriptBlockArray.Add($htmlPageScriptBlock)
}
These are called in the script below:
$currentDate = Get-CorrectDate $latestFolder
# The global Log file
$logFilePath = "$scriptPath/Logs/logs_$currentDate.txt"
$serversList = "$scriptPath/Config/$configFileName"
# If the Servers list exists, retrieve the Servers list
if (Test-Path $serversList) {
    # Get the data from the file
    [xml]$servers = Get-Content $serversList
    # Select only the Servers information
    $nodes = $servers.SelectNodes("//server")
    try {
        [...]
        # Iterate through the Servers list
        foreach ($node in $nodes) {
            # Get the Server IP Address
            $ipAddress = $node.ip
            # Get the Server Hostname
            $hostname = $node.hostname
            # Get the "Debug Utilization" lines from the logbundle's syslog files
            $matchedLines = [System.Collections.ArrayList]@(Find-UtilizationLinesInLogs $currentDate "$scriptPath\Data\$currentDate\logs_$ipAddress" "host:server: \[USAGE\] : \{`"1`" : ")[-1]
            # Add the Server's participants throughout the day
            switch ($hostname) {
                {$_.Contains("emea")} {
                    # Update EMEA total number of participants
                    Update-ZoneTotalParticipants ([ref]$emeaServer) ([ref]$emeaTotalNumberOfCalls) $matchedLines
                }
                {$_.Contains("nam")} {
                    # Update NAM total number of participants
                    Update-ZoneTotalParticipants ([ref]$namServer) ([ref]$namTotalNumberOfCalls) $matchedLines
                }
                {$_.Contains("apac")} {
                    # Update APAC total number of participants
                    Update-ZoneTotalParticipants ([ref]$apacServer) ([ref]$apacTotalNumberOfCalls) $matchedLines
                }
            }
            # Get the CPU Utilization values lines from the logbundle's sysdebug files
            $cpu = Get-CpuUsage "$scriptPath\Data\$currentDate\logs_$ipAddress\sysdebug"
            # Get the Memory Utilization values lines from the logs' sysdebug files
            $memory = Get-MemoryUsage "$scriptPath\Data\$currentDate\logs_$ipAddress\sysdebug"
            # Export the "Debug Utilization" to a CSV file
            Save-csvUtilizationReport $matchedLines "$scriptPath\Output\$currentDate" "$ipAddress" "$hostname" $logFilePath
            # Draw graphs from the "Debug Utilization" information and then export it to an HTML File
            Create-utilizationReportPage $matchedLines "$ipAddress" "$hostname" $pageId $utilisationReportTemplate $htmlScriptBlockArray
            # Add the new navigation link to the Navigation Links array
            Add-htmlNavLink $navLinksArray "$ipAddress" "$hostname" $pageId $logFilePath
            # Increment the Page ID counter
            $pageId += 1
        }
        # Create a Here-String from the Navigation Links array
        $OFS = ""
        $navLinksCode = @"
$($navLinksArray)
"@
        $OFS = " "
        # Create a script block from the Navigation Links
        $navLinksScriptBlock = [scriptblock]::Create($navLinksCode)
        # Save the daily HTML utilization report
        Save-utilizationReport $currentDate $navLinksScriptBlock $htmlScriptBlockArray $emeaTotalNumberOfCalls $namTotalNumberOfCalls $apacTotalNumberOfCalls "$scriptPath\Output\$currentDate" $logFilePath
    }
    catch {
        Write-Logs $logFilePath "Error: $($_.Exception.Message)"
        exit 1
    }
}
Everything works fine and as expected, except for the first set of values in the first page, which is somehow the sum of all the sets of values.
For example, when I have 3 pages, I can see that the collected values are correct when the Find-UtilizationLinesInLogs function is executed:
matchedLines.participantNumber:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 4 4 4 4 4 12 14 14 15 16 16 16 7 7 7 7 8 14 17 18 19 18 19 19 20 16 16 16 15 7 7 7 7 5 4 4 4 4 4 4 4 4 1 1 0 0 0 3 6 5 5 5 5 9 14 16 18
matchedLines.participantNumber:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 8 9 10 10 10 10 9 9 9 9 9 9 8 8 8 8 8 8 8 8 8 8 9 4 12 15 11 14 14 13 12 12 12 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
matchedLines.participantNumber:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 0 0 0 3 9 10 10 9 8 10 9 9 10 10 10 10 10 11 11 11 11 8 7 8 8 7 7 7 7 7 7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 16 17 16 16
But when the ScriptBlocks are executed using the Invoke-Command in the foreach loop, the first batch of values is systematically the sum of the 3 sets of values while the following ones are correct:
matchedLines.participantNumber inside Create-utilizationReportPage:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 6 6 7 7 5 5 5 8 29 33 34 34 34 36 34 25 26 26 26 27 32 36 37 38 37 35 34 36 32 31 32 26 26 29 25 22 20 18 17 17 17 6 6 6 6 3 3 2 2 2 5 8 7 7 7 10 25 31 32 34
matchedLines.participantNumber inside Create-utilizationReportPage:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 8 9 10 10 10 10 9 9 9 9 9 9 8 8 8 8 8 8 8 8 8 8 9 4 12 15 11 14 14 13 12 12 12 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0
matchedLines.participantNumber inside Create-utilizationReportPage:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 0 0 0 3 9 10 10 9 8 10 9 9 10 10 10 10 10 11 11 11 11 8 7 8 8 7 7 7 7 7 7 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 16 17 16 16
I have tried many things without success so if someone has any hint of what can go wrong, it would be great!
Thanks for your help!
So! I finally found out what was wrong.
I suspected that my problem was related to ArrayList copies, or at least variable copies, so I tried removing the part of the script where I extract the $matchedLines values and copy them into global arrays using references:
# Add the Server's participants throughout the day
switch ($hostname) {
    {$_.Contains("emea")} {
        # Update EMEA total number of participants
        Update-ZoneTotalParticipants ([ref]$emeaServer) ([ref]$emeaTotalNumberOfCalls) $matchedLines
    }
    {$_.Contains("nam")} {
        # Update NAM total number of participants
        Update-ZoneTotalParticipants ([ref]$namServer) ([ref]$namTotalNumberOfCalls) $matchedLines
    }
    {$_.Contains("apac")} {
        # Update APAC total number of participants
        Update-ZoneTotalParticipants ([ref]$apacServer) ([ref]$apacTotalNumberOfCalls) $matchedLines
    }
}
And bingo, this time the values written in the first PSWriteHTML page are the correct ones!
So I focused on the Update-ZoneTotalParticipants function, which does the copying of the values:
function Update-ZoneTotalParticipants ([ref][int]$ServerNumber, [ref][System.Collections.ArrayList]$totalNumberOfCalls, $values) {
    # If this is the first server to be analysed in the zone
    if ($ServerNumber.value -eq 1) {
        # Copy the Utilization lines of the server
        $totalNumberOfCalls.value = $values
        # Increment the zone's server counter
        $ServerNumber.value += 1
    }
    # If this is at least the 2nd server to be analysed in the zone
    elseif ($ServerNumber.value -gt 1) {
        # Parse the server matched lines, get the participantNumber value and add it to the total
        0..($totalNumberOfCalls.value.Count - 1) | ForEach-Object {
            if ($_ -le ($values.Count - 1)) {
                $totalNumberOfCalls.value[$_].participantNumber = [int]($totalNumberOfCalls.value[$_].participantNumber) + [int]($values[$_].participantNumber)
            }
            # If there are fewer objects in the current server matched lines, add a 0 instead
            else {
                $totalNumberOfCalls.value[$_].participantNumber = [int]($totalNumberOfCalls.value[$_].participantNumber) + 0
            }
        }
    }
}
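The only part of the code where $matchedLines is involved is $totalNumberOfCalls.value = $values, so that is almost certainly where the array manipulation goes wrong.
So I dug into ArrayList copies and clones and found out that I was not doing a deep copy of the object. The assignment only copies a reference, so the zone total and the first server's $matchedLines (which the first page's ScriptBlock captured with GetNewClosure()) point to the same objects; adding the other servers' participantNumber values to the total therefore also modifies the first page's data.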
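The aliasing problem can be reproduced outside PowerShell. The Python sketch below is an analogy, not the author's code (make_server and zone_total are hypothetical names): it seeds a zone total from the first server's records in three ways. Plain assignment and a shallow copy both leave the total sharing record objects with the first server, so summing the other servers into it also changes the first server's data; only a deep copy keeps them independent.

```python
import copy


def make_server(counts):
    """Hypothetical stand-in for one server's $matchedLines: a list of records."""
    return [{"participantNumber": n} for n in counts]


def zone_total(servers, mode):
    """Sum participantNumber slot by slot, seeding the total from the first server.

    mode="alias"   -> total IS the first server's list (like `$totalNumberOfCalls.value = $values`)
    mode="shallow" -> new list, but the same record objects (like ArrayList.Clone())
    mode="deep"    -> new list and new records (what the PSSerializer round trip achieves)
    """
    first = servers[0]
    if mode == "alias":
        total = first
    elif mode == "shallow":
        total = list(first)
    elif mode == "deep":
        total = copy.deepcopy(first)
    else:
        raise ValueError(f"unknown mode: {mode}")
    for server in servers[1:]:
        for i, record in enumerate(total):
            # Servers with fewer records contribute 0 for the missing slots.
            extra = server[i]["participantNumber"] if i < len(server) else 0
            record["participantNumber"] += extra
    return total


if __name__ == "__main__":
    for mode in ("alias", "shallow", "deep"):
        servers = [make_server([5, 4]), make_server([1, 8]), make_server([0, 9])]
        totals = [r["participantNumber"] for r in zone_total(servers, mode)]
        first_page = [r["participantNumber"] for r in servers[0]]
        print(f"{mode:8} total={totals} first page={first_page}")
```

Running it shows the first server's values as [6, 21] for alias and shallow but [5, 4] for deep, which mirrors why the PSSerializer round trip fixed the report while a plain assignment (or ArrayList.Clone(), which is also shallow) would not.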
I used Petru Zaharia's solution in this thread to update the function:
# Copy the Utilization lines of the server
$totalNumberOfCalls.value = $values
# replaced with:
# Copy the Utilization lines of the server: serialize and deserialize the data using PSSerializer (deep copy)
$_TempCliXMLString = [System.Management.Automation.PSSerializer]::Serialize($values, [int32]::MaxValue)
$totalNumberOfCalls.value = [System.Management.Automation.PSSerializer]::Deserialize($_TempCliXMLString)
And now everything works as expected.
Thanks guys for your support!
Is it possible to tell whether a process/thread has the PF_NO_SETAFFINITY flag set? I'm running taskset on a series of process ids and some are throwing errors of the following form:
taskset: failed to set pid 30's affinity: Invalid argument
I believe this is because some processes have PF_NO_SETAFFINITY set (see Answer).
Thank you!
Yes: look at the flags field (field 9) of /proc/PID/stat. The bit is defined in <linux/sched.h>:
#define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_allowed */
Look here for details on using /proc:
http://man7.org/linux/man-pages/man5/proc.5.html
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk65143
Example:
ps -eaf
www-data 30084 19962 0 07:09 ? 00:00:00 /usr/sbin/apache2 -k start
...
cat /proc/30084/stat
30084 (apache2) S 19962 19962 19962 0 -1 4194624 554 0 3 0 0 0 0 0 20 0 1 0 298837672 509616128 5510 18446744073709551615 1 1 0 0 0 0 0 16781312 201346799 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The flags are 4194624 (0x400140), so the PF_NO_SETAFFINITY bit (0x04000000) is not set for this process.
Q: Do you mind specifying how you'd write a simple script that outputs true/false based on whether you're allowed to set affinity?
A: I don't feel comfortable providing this without the opportunity to test, but you can try something like this:
# Field 9 of /proc/PID/stat; assumes the process name contains no spaces
flags=$(cut -f 9 -d ' ' /proc/30084/stat)
if (( flags & 0x04000000 )); then echo false; else echo true; fi
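A slightly more defensive sketch, in Python rather than shell: it takes the flags field from the text after the last ')' of the stat line, so process names containing spaces (which would shift cut's field numbering) don't break it. The mask value comes from the <linux/sched.h> excerpt above, and reading /proc of course requires Linux.

```python
import sys

PF_NO_SETAFFINITY = 0x04000000  # value from <linux/sched.h>, as quoted above


def stat_flags(stat_line: str) -> int:
    """Return field 9 ("flags") of a /proc/PID/stat line.

    Field 2 (the command name) is wrapped in parentheses and may itself contain
    spaces or ')', so only the text after the LAST ')' is split.
    """
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field 9 is rest[6].
    return int(rest[6])


def can_set_affinity(flags: int) -> bool:
    """True unless the kernel has marked the task with PF_NO_SETAFFINITY."""
    return not (flags & PF_NO_SETAFFINITY)


def pid_can_set_affinity(pid: int) -> bool:
    """Read /proc/<pid>/stat (Linux only) and check the flag."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return can_set_affinity(stat_flags(stat_file.read()))


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, "true" if pid_can_set_affinity(int(arg)) else "false")
```

If PF_NO_SETAFFINITY is indeed what makes taskset fail, running this with the PID from the error (30) would be expected to print false.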
When our application runs for some time (for example, for hours), SBCL throws a heap exhausted error:
Heap exhausted during garbage collection: 1968 bytes available, 2128 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 0 0 0 0 0 0 0 0 0 0 0 5368709 0 0 0.0000
1: 0 0 0 0 0 0 0 0 0 0 0 5368709 0 0 0.0000
2: 0 0 0 0 0 0 0 0 0 0 0 5368709 0 0 0.0000
3: 101912 101913 0 0 19362 20536 0 0 0 162867456 554752 102714709 0 1 1.4405
4: 130984 131071 0 0 29240 18868 0 0 25 191196152 5854216 128537781 14785 1 0.6442
5: 75511 81013 0 0 16567 17127 92 99 36 132974568 5818392 2000000 16565 0 0.0000
6: 0 0 0 0 7949 1232 0 0 0 37605376 0 2000000 7766 0 0.0000
Total bytes allocated = 524643552
Dynamic-space-size bytes = 536870912
GC control variables:
*GC-INHIBIT* = true
*GC-PENDING* = true
*STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 3281(tid 3067845440):
Heap exhausted, game over.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb>
Any suggestion?
SBCL does not allow you to allocate more than (sb-ext:dynamic-space-size) bytes on the heap. Here you have the default size of 512MB (536870912 bytes), and the Lisp program was already using nearly that amount when it attempted to make another allocation.
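As a sanity check on that reading, the per-generation Alloc column in the report adds up exactly to the "Total bytes allocated" line, and that total is about 97.7% of the dynamic space. A small Python sketch of the arithmetic, with the numbers copied from the report above:

```python
# "Alloc" column of the generation table above, in bytes (generations 0-2 are empty).
ALLOC_BY_GEN = {3: 162867456, 4: 191196152, 5: 132974568, 6: 37605376}
DYNAMIC_SPACE = 536870912  # "Dynamic-space-size bytes" from the report

total = sum(ALLOC_BY_GEN.values())
headroom = DYNAMIC_SPACE - total

print(f"allocated: {total} bytes ({100 * total / DYNAMIC_SPACE:.1f}% of dynamic space)")
print(f"headroom:  {headroom} bytes ({headroom / 2**20:.1f} MiB)")
```

Presumably the remaining ~11.7 MiB was not enough room for the collector to copy live objects into, which would fit the failure happening during garbage collection.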
You could double the amount of heap space available to 1024MB by starting SBCL with --dynamic-space-size 1024. However, as several comments point out, there may be a memory leak, where objects are referenced somehow proportional to the time that the system has been running, so this will offer only a temporary respite.
Calling the standard Common Lisp function (room t) periodically might help debug this.
More advanced code like this http://dwim.hu/darcsweb/darcsweb.cgi?r=HEAD%20hu.dwim.debug;a=headblob;f=/source/path-to-root.lisp#l42 which delves into the SB-VM internal map of allocations could shed more light, and SBCL has a statistical profiler, http://www.sbcl.org/manual/#Statistical-Profiler that supports reporting on allocations too.
I have four files. File 1 (named input_22.txt) is an input file containing two space-delimited columns. The first column is an alphabetically sorted list of ligand codes (a three-character letter/number code for a particular ligand). The second column is the (unsorted) list of PDB codes (Protein Data Bank codes) corresponding to each ligand code.
File 1 (input_22.txt):
803 1cqp
AMH 1b2i
ASC 1f9g
ETS 1cil
MIT 1dwc
TFP 1ctr
VDX 1db1
ZMR 1a4g
File 2 (named SD_2.txt) is an SDF (Structure Data File) containing the fragments of each ligand. A ligand can contain one or more fragments. For instance, here 803 is the ligand code and it has two fragments. Each fragment starts with four dollar signs ($$$$) followed by the ligand code (803 in this example) on the next line. In the 5th line of each fragment (the third line after "$$$$" and "803"), there is a number giving the number of rows in the next block, e.g. 7 in the first fragment and 10 in the second fragment of ligand 803. That next block of rows contains a column (61-62) holding specific numbers that refer to atoms in the fragment. For example, in the first fragment of 803 these numbers are 15, 16, 17, 19, 20, 21 and 22. These numbers need to be matched in file 3.
File 2 (SD_2.txt) looks like:
$$$$
803
SciTegic05101215222D
7 7 0 0 0 0 999 V2000
3.0215 -0.5775 0.0000 C 0 0 0 0 0 0 0 0 0 15 0 0
2.3070 -0.9900 0.0000 C 0 0 0 0 0 0 0 0 0 16 0 0
1.5926 -0.5775 0.0000 C 0 0 0 0 0 0 0 0 0 17 0 0
1.5926 0.2475 0.0000 C 0 0 0 0 0 0 0 0 0 19 0 0
2.3070 0.6600 0.0000 C 0 0 0 0 0 0 0 0 0 20 0 0
2.3070 1.4850 0.0000 O 0 0 0 0 0 0 0 0 0 21 0 0
3.0215 0.2475 0.0000 O 0 0 0 0 0 0 0 0 0 22 0 0
1 2 1 0
1 7 1 0
2 3 1 0
3 4 1 0
4 5 1 0
5 6 2 0
5 7 1 0
M END
> <Name>
803
> <Num_Rings>
1
> <Num_CSP3>
4
> <Fsp3>
0.8
> <Fstereo>
0
$$$$
803
SciTegic05101215222D
10 11 0 0 0 0 999 V2000
-1.7992 -1.7457 0.0000 C 0 0 0 0 0 0 0 0 0 1 0 0
-2.5137 -1.3332 0.0000 C 0 0 0 0 0 0 0 0 0 2 0 0
-2.5137 -0.5082 0.0000 C 0 0 0 0 0 0 0 0 0 3 0 0
-1.7992 -0.0957 0.0000 C 0 0 0 0 0 0 0 0 0 5 0 0
-1.0847 -0.5082 0.0000 C 0 0 0 0 0 0 0 0 0 6 0 0
-0.3702 -0.0957 0.0000 C 0 0 0 0 0 0 0 0 0 7 0 0
0.3442 -0.5082 0.0000 C 0 0 0 0 0 0 0 0 0 8 0 0
0.3442 -1.3332 0.0000 C 0 0 0 0 0 0 0 0 0 9 0 0
-0.3702 -1.7457 0.0000 C 0 0 0 0 0 0 0 0 0 11 0 0
-1.0847 -1.3332 0.0000 C 0 0 0 0 0 0 0 0 0 12 0 0
1 2 1 0
1 10 1 0
2 3 1 0
3 4 1 0
4 5 2 0
5 6 1 0
5 10 1 0
6 7 2 0
7 8 1 0
8 9 1 0
10 9 1 0
M END
> <Name>
803
> <Num_Rings>
2
> <Num_CSP3>
6
> <Fsp3>
0.6
> <Fstereo>
0.1
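A minimal Python sketch of reading File 2 as described above (parse_sd_fragments is a hypothetical name). It assumes the layout shown in the sample: each record starts with a $$$$ line followed by the ligand code, the counts line ends in V2000 and its first field is the atom count, and the number from the "61-62" column (as far as I know, the atom-atom mapping number of a V2000 atom line) is the third-from-last whitespace-separated field. Splitting on whitespace is itself an assumption forced by the sample, whose fixed-width columns were lost; real files with 100+ atoms may need the column positions instead.

```python
from collections import defaultdict


def parse_sd_fragments(text):
    """Map each ligand code in File 2 to its fragments' atom-mapping numbers.

    Assumptions (based on the sample above): every record starts with a "$$$$"
    line followed by the ligand code; the counts line is the next line ending in
    "V2000" and its first field is the atom count; the mapping number is the
    third-from-last whitespace-separated field of each atom line.
    """
    fragments = defaultdict(list)
    lines = text.splitlines()
    i = 0
    while i < len(lines) - 1:
        if lines[i].strip() != "$$$$":
            i += 1
            continue
        ligand = lines[i + 1].strip()
        # Skip the header lines until the counts line (e.g. "7 7 0 0 0 0 999 V2000").
        j = i + 2
        while j < len(lines) and not lines[j].rstrip().endswith("V2000"):
            j += 1
        if j == len(lines):
            break  # no counts line: stop rather than misread the rest
        n_atoms = int(lines[j].split()[0])
        atom_lines = lines[j + 1:j + 1 + n_atoms]
        fragments[ligand].append([int(atom.split()[-3]) for atom in atom_lines])
        i = j + 1 + n_atoms  # continue after the atom block
    return dict(fragments)


if __name__ == "__main__":
    with open("SD_2.txt") as sd_file:  # file name taken from the question
        for ligand, frags in parse_sd_fragments(sd_file.read()).items():
            print(ligand, frags)
```

For the sample above this should give two fragments for 803: [15, 16, 17, 19, 20, 21, 22] and [1, 2, 3, 5, 6, 7, 8, 9, 11, 12].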
File 3 is a CIF (Crystallographic Information File). It can be obtained from the following link: File_3
This file is a collection of individual CIF files for several ligand molecules. Each part of the file starts with data_ligandcode; for our example it is data_803. After 46 lines from the start of each small file in the collection, there is a block that gives structural information about the molecule. The number of rows in this block is not fixed, but the block ends with a hash sign (#). Two columns in this block are important: 53-56 and 62-63. Column 62-63 contains numbers that can be matched with the numbers obtained from file 2, and column 53-56 contains atom names such as C1 (carbon 1). This column can be used to match against file 4.
File 4 is a Grow.out file that contains information about the interaction of each ligand with its target protein. The file name is the PDB code given in file 1 for each ligand. For example, for ligand 803 the PDB code is 1cqp, so the grow.out file is named 1cqp.
In this file, the important rows are those that contain the ligand code (for example 803) and an atom name obtained from columns 53-56 of file 3.
Task: I need a script that reads each ligand code from file 1, goes to file 2, searches for "$$$$" followed by a line with the ligand code, and obtains the numbers from columns 61-62 for each fragment. Next, the script should look up these numbers in columns 62-63 of file 3 and pull out the corresponding atom names from columns 53-56. Finally, it should open file 4 (named after the PDB code) and print the rows containing the ligand code and the atom names obtained from file 3. The output should be written to a file.
I am a biomedical research student and don't have a computer science background, but I have to use Perl for some tasks. For the task above I wrote a script, but it is not working properly and I cannot find the reason. The script I wrote is:
#!/usr/bin/perl
use strict;
use warnings;
use Text::Table;
use Carp qw(croak);
{
    my $a;
    my $b;
    my $input_file = "input_22.txt";
    my @lines = slurp($input_file);
    for my $line (@lines){
        my ($ligandcode, $pdbcode) = split(/\t/, $line);
        my $i=0;
        my $k=0;
        my @array;
        my @array1;
        open (FILE, '<', "SD_2.txt");
        while (<FILE>) {
            my $i=0;
            my $k=0;
            my @array;
            my @array1;
            if ( $_=~/\x24\x24\x24\x24/ . /\n$ligandcode/) {
                my $nextline1 = <FILE>;
                my $nextline2 = <FILE>;
                my $nextline3 = <FILE>;
                my $nextline4= <FILE>;
                my $totalatoms= substr( $nextline4, 1,2);
                print $totalatoms,"\n";
                while ($i<$totalatoms)
                {
                    my $nextlines= <FILE>;
                    my $sub= substr($nextlines, 61, 2);
                    print $sub;
                    $array[$i] = $sub;
                    open (FH, '<', "components.txt");
                    while (my $ship=<FH>) {
                        my $var="data_$ligandcode";
                        if ($ship=~/$var/)
                        {
                            while ($k<=44)
                            {
                                $k++;
                                my $nextline = <FH>;
                            }
                            my $j=0;
                            my $nextline3;
                            do
                            {
                                $nextline3=<FH>;
                                print $nextline3;
                                my $part= substr($nextline3, 62, 2);
                                my $part2= substr($nextline3, 53, 4);
                                $array1[$j] = $part;
                                if ($array1[$j] eq $array[$i])
                                {
                                    print $part2, "\n";
                                    open (GH, '<', "$pdbcode");
                                    open (OH, ">>out_grow.txt");
                                    while (my $grow = <GH>)
                                    {
                                        if ( $grow=~/$ligandcode/){
                                            print OH $grow if $grow=~/$part2/;
                                        }
                                    }
                                    close (GH);
                                    close (OH);
                                }
                                $j++;
                            } while $nextline3 !~/\x23/;
                        }
                    }
                    $i++;
                    close (FH);
                }
            }
        }
        close (FILE);
    }
}
##Slurps a file into a list
sub slurp {
    my ($file) = @_;
    my (@data, @data_chomped);
    open IN, "<", $file or croak "can't open $file\n";
    @data = <IN>;
    for my $line (@data){
        chomp($line);
        push (@data_chomped, $line);
    }
    close IN;
    return (@data_chomped);
}
I want the script to be fast and to work for about 1000 fragments in total if I put a list of 400 molecules in file 1. Kindly help me get this script working. I'll be grateful.
You need to break your code into manageable steps.
Create data-structures from the files
use Slurp;
my @input = map {
    [ split /\s+/, $_, 2 ]
} slurp $input_filename;
# etc
Process each element of input_22.txt, using those data structures.
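Following that suggestion, here is a sketch of the "process each element" step, written in Python rather than Perl purely for illustration. It assumes the earlier steps have already produced, per ligand, the fragment mapping numbers from File 2 and a hypothetical number_to_name dict built from File 3 (that parsing depends on the CIF column layout and is left out), and that ligand codes and atom names appear as whitespace-separated tokens in grow.out. Matching whole tokens also avoids a pitfall of the original /$part2/ test, where C1 would also match C12.

```python
def atom_names_for_fragments(fragments, number_to_name):
    """Translate File 2 mapping numbers into File 3 atom names.

    `fragments` is a list of lists of mapping numbers for one ligand;
    `number_to_name` is a hypothetical dict built from File 3 for that ligand.
    Numbers without a match in File 3 are skipped.
    """
    return {number_to_name[n] for frag in fragments for n in frag if n in number_to_name}


def select_grow_rows(grow_lines, ligand, atom_names):
    """Return the grow.out lines containing the ligand code and at least one atom name.

    Assumes both appear as whitespace-separated tokens, so "C1" does not
    also match "C12".
    """
    wanted = set(atom_names)
    selected = []
    for line in grow_lines:
        tokens = set(line.split())
        if ligand in tokens and tokens & wanted:
            selected.append(line)
    return selected


def process(pairs, fragments_by_ligand, names_by_ligand, read_lines):
    """Run the lookup for every (ligand, pdb_code) pair from File 1.

    `read_lines(pdb_code)` returns the lines of that grow.out file, so each file
    is read once per ligand instead of once per matching atom.
    """
    output = []
    for ligand, pdb_code in pairs:
        names = atom_names_for_fragments(fragments_by_ligand.get(ligand, []),
                                         names_by_ligand.get(ligand, {}))
        if names:
            output.extend(select_grow_rows(read_lines(pdb_code), ligand, names))
    return output
```

A real read_lines could be something like `lambda code: pathlib.Path(code).read_text().splitlines()`, and the returned rows can be written to the output file in one go.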
I really think you should look into PerlMol. After all, half the reason to use Perl is CPAN.
Things you did well
Using 3-arg open
use strict;
use warnings;
Things you shouldn't have done
(Re)defined $a and $b
They are already defined for you.
Reimplemented slurp (poorly)
Read the same file multiple times.
You opened SD_2.txt once for every line of input_22.txt, and components.txt once for every atom.
Defined symbols outside of the scope where you use them.
$i, $k, @array and @array1 are defined twice, but only one of the definitions is being used.
Used open and close without some sort of error checking.
Either open ... or die; or use autodie;
You used bareword filehandles (IN, FILE, etc.).
Instead use open my $FH, ...
Most of those aren't that big of a deal though, for a one-off program.
I have a memcached installation that I want to use in my production environment, but when I ran a couple of tests, it seemed that memcached doesn't free up memory even after it has used up all of its allocated memory. I also logged in and ran a flush_all command, but the objects are still in the cache.
Here is the output from some tests:
memcache-top
memcache-top v0.6 (default port: 11211, color: on, refresh: 3 seconds)
INSTANCE          USAGE   HIT %  CONN  TIME   EVICT/s  READ/s  WRITE/s
127.0.0.1:11211   427.1%  0.0%   18    1.4ms  0.0      244     261.0K
AVERAGE:          427.1%  0.0%   18    1.4ms  0.0      244     261.0K
TOTAL:            4.3MB/ 1.0MB   18    1.4ms  0.0      244     261.0K
memcached-tool 127.0.0.1:11211 display
No  Item_Size  Max_age  Pages  Count  Full?  Evicted  Evict_Time    OOM
 1       560B       4s      1   1872  yes          0           0  15488
 2       704B      32s      1    559  no           0           0      0
 3       880B       4s      1   1191  yes          0           0   1335
 4       1.1K       9s      1    116  no           0           0      0
 5       1.4K      21s      1     14  no           0           0      0
 6       1.7K       4s      1     17  no           0           0      0
 7       2.1K      84s      1     24  no           0           0      0
 8       2.7K     130s      1     60  no           0           0      0
 9       3.3K      25s      1    290  no           0           0      0
10       4.2K       9s      1    194  no           0           0      0
11       5.2K       9s      1    116  no           0           0      0
15      12.7K     816s      1      1  no           0           0      0
16      15.9K     769s      1      5  no           0           0      0
18      24.8K     786s      1      1  no           0           0      0
21      48.5K     816s      1      1  no           0           0      0
memcached-tool 127.0.0.1:11211 stats
127.0.0.1:11211 Field Value
accepting_conns 1
auth_cmds 0
auth_errors 0
bytes 4478060
bytes_read 23964596
bytes_written 546642860
cas_badval 0
cas_hits 0
cas_misses 0
cmd_flush 0
cmd_get 240894
cmd_set 4504
conn_yields 0
connection_structures 21
curr_connections 18
curr_items 4461
decr_hits 0
decr_misses 0
delete_hits 0
delete_misses 0
evictions 0
get_hits 43756
get_misses 197138
incr_hits 0
incr_misses 0
limit_maxbytes 1048576
listen_disabled_num 0
pid 8731
pointer_size 64
reclaimed 0
rusage_system 5.047232
rusage_user 4.311344
threads 4
time 1306247929
total_connections 3092
total_items 4504
uptime 1240
version 1.4.5
-m tells memcached how much RAM to use for item storage (in megabytes). Note carefully that this isn't a global memory limit, so memcached will use a few % more memory than you tell it to. Set this to safe values. Setting it to less than 48 megabytes does not work properly in 1.4.x and earlier. It will still use the memory.
Source: https://github.com/memcached/memcached/wiki/ConfiguringServer#commandline-arguments
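Applying that quote to the stats above: limit_maxbytes is 1048576 bytes, i.e. this instance was started with -m 1, far below the 48 MB the documentation asks for. A small Python sketch of the arithmetic, with the values copied from the memcached-tool output. The slab figure assumes the default 1 MiB page size; as far as I can tell from the 1.4.x slab allocator, every slab class may take its first page even when that exceeds the limit, so 15 active slab classes can hold roughly 15 MiB regardless of -m 1.

```python
# Values copied from the `stats` and `display` output above.
STATS = {"bytes": 4478060, "limit_maxbytes": 1048576, "get_hits": 43756, "cmd_get": 240894}
PAGES_PER_SLAB_CLASS = [1] * 15  # "Pages" column: 15 slab classes, one page each
PAGE_MIB = 1                     # assumption: default 1 MiB slab page size

limit_mib = STATS["limit_maxbytes"] / 2**20
usage_pct = 100 * STATS["bytes"] / STATS["limit_maxbytes"]
slab_mib = sum(PAGES_PER_SLAB_CLASS) * PAGE_MIB
hit_pct = 100 * STATS["get_hits"] / STATS["cmd_get"]

print(f"-m limit:            {limit_mib:.0f} MiB")
print(f"item bytes vs limit: {usage_pct:.1f}%")
print(f"slab pages held:     {slab_mib} MiB")
print(f"lifetime hit rate:   {hit_pct:.1f}%")
```

The 427.1% matches the USAGE column reported by memcache-top, which is consistent with the limit simply being too small rather than memory failing to be freed.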