I am trying to find the min/max/average memory consumed by a particular pod over a time interval.
Currently I am using
sum(container_memory_working_set_bytes{namespace="test", pod="test1", container!="POD", container!=""}) by (container)
Output -> test1 = 9217675264
For reporting purposes, I need to find the min/peak memory used by the pod over a time interval (6h), and the average too.
You can do that with a range vector (add an [interval] to a metric name/selector) and an aggregation-over-time function:
min_over_time(container_memory_usage_bytes{}[6h])
max_over_time(container_memory_usage_bytes{}[6h])
avg_over_time(container_memory_usage_bytes{}[6h])
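If you want to keep the per-container sum from your original query, you can wrap it in a subquery. A sketch, assuming a Prometheus version with subquery support (2.7+) and a 1m resolution step:
max_over_time(
  (sum by (container) (container_memory_working_set_bytes{namespace="test", pod="test1", container!="POD", container!=""}))[6h:1m]
)
min_over_time and avg_over_time work the same way on the subquery.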
I have already installed both plugins but don't know how to use them for pod analysis. I need help with that, as I don't have a programming background. Also, can we use them for batch processing of images, in case I have more than 100 images?
Another approach, using a specially coded ImageJ macro, gives reasonable estimates of the widths and lengths of all pods in the sample image. You can access the macro code from here. Unzip the zip archive and drop the file "plantPodDimensions.ijm" onto the ImageJ main window. Then open the sample image and run the macro. The estimated pod dimensions appear in a table.
Specimen [right to left] Mean Pod Width [cm] Pod Length [cm]
OHiI7_pod-1 0.70±0.11 23.6
OHiI7_pod-2 0.59±0.09 22.3
OHiI7_pod-3 0.64±0.05 20.7
OHiI7_pod-4 0.41±0.04 20.5
OHiI7_pod-5 0.66±0.07 22.9
OHiI7_pod-6 0.68±0.10 24.4
OHiI7_pod-7 0.60±0.07 20.5
Of course, it couldn't be tested whether the macro works as expected for images other than the sample image.
The plugin downloads come with documents that explain how to use them. Batch processing is possible if the starting points of the traces are known or can somehow be determined by additional pre-processing steps (not trivial). Both plugins are macro-recordable. In any case, batch processing will require some macro code.
For the use case in question I would recommend performing the analyses via the GUI, not by batch processing. Coding a suitable macro would take more time than processing 100+ images.
I am using Slurm on a single node (control and compute) and I cannot seem to correctly limit memory. The script calls SBATCH with a small memory value (3G), but I see values in top that exceed 25G. sacct gives me the correct values, and squeue shows the requested memory:
squeue -o "%C %m"
CPUS MIN_MEMORY
2 3G
This is my slurm.conf:
#
SlurmctldHost=schopenhauer
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=999999
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=1
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
#MaxStepCount=40000
#MaxTasksPerNode=128
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
#ProctrackType=proctrack/cgroup
ProctrackType=proctrack/linuxproc
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
#SallocDefaultCommand=
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/var/spool/slurmd
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/affinity
TaskPluginParam=Sched
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300000
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
AccountingStorageLoc=/var/log/slurm/slurm_jobacct.log
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/filetxt
#AccountingStorageUser=
AccountingStoreJobComment=YES
ClusterName=cluster
#DebugFlags=
#JobCompHost=
JobCompLoc=/var/log/slurm/slurm_jobcomp.log
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/filetxt
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=debug5
SlurmdLogFile=/var/log/slurm/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=schopenhauer CPUs=128 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=500000 State=UNKNOWN
PartitionName=short Nodes=schopenhauer Default=YES MaxTime=INFINITE State=UP
Did I misunderstand something? Why does it say minimum memory when I want that value to be both the minimum and the maximum?
EDIT: By setting the required memory to a larger value, I just noticed that this doesn't work as a minimum either, i.e. many tasks were started even though there was enough RAM for only 12 of them (I requested 40G and I have 500G). Is this the same problem?
Slurm controls memory through the Linux cgroup functionality. You need to set TaskPlugin=task/cgroup in slurm.conf (Cf. https://slurm.schedmd.com/cgroups.html) and ConstrainRAMSpace=yes in cgroup.conf (Cf. https://slurm.schedmd.com/cgroup.conf.html). Then the memory requested by jobs with --mem or --mem-per-cpu becomes effectively a hard limit in addition to being a resource request.
The %m format option of squeue gives the memory requested by the job. As a request, it is considered a minimum requirement. But if you configure cgroups it effectively also becomes the maximum.
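A minimal sketch of the relevant configuration (assuming the usual /etc/slurm file locations; adjust for your installation):
# slurm.conf (excerpt)
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf
ConstrainRAMSpace=yes
After changing ProctrackType and TaskPlugin you will typically need to restart slurmd (and slurmctld) for the settings to take effect.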
I don't think Slurm enforces memory or CPU usage by itself. The request is just there as an indication of what you think your job's usage will be. To set a binding memory limit you could use ulimit at the beginning of your script, something like ulimit -v with the limit in kilobytes.
Just know that this will likely cause problems with your program if it actually requires the amount of memory it is using, as it won't finish successfully.
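A rough sketch of that approach in a batch script (my_program is a placeholder; note that ulimit -v limits virtual address space, which is not quite the same thing as resident memory):
#!/bin/bash
#SBATCH --mem=3G
# cap the virtual address space at 3 GiB (ulimit -v takes KiB)
ulimit -v $((3 * 1024 * 1024))
./my_program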
The following space allocation is giving me an SB37 JCL error. The COBOL record size of the output file is 100 bytes and the LRECL is 100 bytes. What do you think is causing this error? I have tried increasing the space to (500,100) and still get the same error.
Code:
//OUTPUT1 DD DSN=A.B.C,DISP=(NEW,CATLG,DELETE),
// DCB=(LRECL=100,BLKSIZE=,RECFM=FBM),
// SPACE=(CYL,(10,5),RLSE)
Try to increase not only the space, but the number of volumes as well.
Include VOL=(,,,#) in your DD, where # is the number of volumes you want to allocate.
Ex: SPACE=(CYL,(10,5),RLSE),VOL=(,,,3) - includes 3 volumes.
Additionally, you can increase the size, but try to stay within reasonable limits :)
The documentation for B37 says the application programmer should respond as indicated for message IEC030I. The documentation for IEC030I says, in part...
Probable user error. For all cases, allocate as many units as volumes
required.
...as noted in another answer. However, be advised that the documentation for the VOL parameter of the DD statement says...
If you omit the volume count or if you specify 1 through 5, the system
allows up to five volumes; if you specify 6 through 20, the system
allows 20 volumes; if you specify a count greater than 20, the system
allows 5 plus a multiple of 15 volumes. You can override the maximum
volume count in data class by using the volume-count subparameter. The
maximum volume count for an SMS-managed mountable tape data set or a
Non-managed tape data set is 255.
...so for DASD allocations you are best served specifying a volume count greater than 5 (at least).
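Putting that together with the DD from the question, a sketch might look like this (the count of 6 is just an illustrative value):
//OUTPUT1 DD DSN=A.B.C,DISP=(NEW,CATLG,DELETE),
//           SPACE=(CYL,(10,5),RLSE),
//           VOL=(,,,6)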
//OUTPUT1 DD DSN=A.B.C,DISP=(NEW,CATLG,DELETE),
// DCB=(LRECL=100,BLKSIZE=,RECFM=FBM),
// SPACE=(CYL,(10,5),RLSE)
Try this instead. Notice that the secondary allocation can take advantage of a large dataset, whereas without that parameter the largest secondary that makes any sense is < 300. Oh, and if indeed this is written by a COBOL program, make sure that the FD says "BLOCK 0"! If it isn't "BLOCK 0" then you might not even need to change your JCL, because the output wasn't really fixed-block with machine control characters; it was merely fixed and unblocked, so the space would almost never be enough. And finally, you may wish to revisit why you have the M in the RECFM to begin with. Notice also that I took out the LRECL, the BLKSIZE and the RECFM: the FD in the COBOL program is all you need, and putting them in the JCL is not only redundant but dangerous, because any change will now have to be made in multiple places.
//OUTPUT1 DD DSN=A.B.C,DISP=(NEW,CATLG,DELETE),
// DSNTYPE=LARGE,UNIT=(SYSALLDA,59),
// SPACE=(CYL,(10,1000),RLSE)
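For reference, a minimal sketch of what the "BLOCK 0" FD might look like (the file and record names are placeholders for whatever your program actually uses):
       FD  OUTPUT-FILE
           BLOCK CONTAINS 0 RECORDS
           RECORDING MODE IS F
           RECORD CONTAINS 100 CHARACTERS.
       01  OUTPUT-RECORD        PIC X(100).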
There is a limit of 65,535 tracks for a dataset on one volume. So if you specify a SPACE that exceeds that limit, the system will simply ignore it.
You can increase this limit to 16,777,215 tracks by adding the DSNTYPE=LARGE parameter.
Or you can specify that your dataset is multi-volume by adding VOL=(,,,3).
You can also use the DATACLAS=xxxx parameter here, but first you need to find a suitable one. The easy way is to contact your local storage team and ask for one. Or, if you are familiar with ISPF navigation, you can enter the ISMF;4 command to open a panel
and use the parameters below before hitting Enter:
CDS Name . . . . . . 'ACTIVE'
Data Class Name . . *
It should produce a list of all available data classes. Find the one that suits you (has a large enough volume count and does not limit primary and secondary space).
I'm trying to simulate in Matlab the traffic of a network of 14 nodes and 21 links. I have a function called "New_Connection" and another called "Close_connection", among others.
For now I have implemented the traffic using a 'for' loop. In each iteration, "New_Connection" is called; it randomly chooses a source node, a destination node and a random duration (currently an integer value). The connection may or may not be established (blocked).
Afterwards, the "Close_connection" function is called, which checks all connection times (stored in an array) and closes a connection once its time reaches 0.
Finally, before the end of the loop, one time unit is subtracted from all established connections except the last one.
What I would like is to run this simulation over a continuous time span (e.g. 1 minute), during which any node can establish a new connection at any instant. For example:
t=0.000134 s ---- Node1 to Node8
t=0.003024 s ---- Node12 to Node11
t=0.003799 s ---- Node6 to Node3
.
.
.
t=59.341432 s ---- Node1 to Node4
And the "Close_Connection" function considers these time to close connections.
I have searched for information on Simulink, SimEvents, Parallel computing, Discrete event simulation... but I can not really understand the functioning.
Thank you very much in advance and apologies for my English.
You don't need to use a complex framework like SimEvents. For simple tasks you can write your own event queue. The following code implements a simple scenario: create a new connection every Uniform(0,10) seconds and release each connection 10 s after it is established.
%max duration of the simulation in seconds
SIMTIME=60;
T=0;
NODES=1:20;
%Constructor for new events. 'command' is a string, 'data' holds the parameters
MAKEEVENT=@(t,c,d)(struct('time',t,'command',c,'data',{d}));
%create the event that ends the simulation
QUEUE(1)=MAKEEVENT(SIMTIME,'ENDSIM',[]);
%create the initial event that produces the first connection
QUEUE(end+1)=MAKEEVENT(0,'PRODUCECONNECTION',[]);
RUN=true;
while RUN
    %pop the event with the smallest time stamp from the queue
    [nT,cevent]=min([QUEUE.time]);
    assert(nT>=T,'event was created for the past')
    T=nT;
    EVENT=QUEUE(cevent);
    QUEUE(cevent)=[];
    fprintf('T=%f\n',T)
    switch (EVENT.command)
        case 'ENDSIM'
            %maybe collect data here
            RUN=false;
        case 'PRODUCECONNECTION'
            %standard producer pattern:
            %schedule the next connection attempt Uniform(0,10) seconds from now
            next=rand*10;
            QUEUE(end+1)=MAKEEVENT(T+next,'PRODUCECONNECTION',[]);
            %pick two distinct random nodes and connect them
            R=randperm(size(NODES,2));
            first=NODES(R(1));
            second=NODES(R(2));
            fprintf('CONNECT NODE %d and %d\n',first,second)
            %the connection will be released 10s from now
            QUEUE(end+1)=MAKEEVENT(T+10,'RELEASECONNECTION',{first,second});
        case 'RELEASECONNECTION'
            first=EVENT.data{1};
            second=EVENT.data{2};
            fprintf('DISCONNECT NODE %d and %d\n',first,second)
    end
end
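If you would rather have the connection attempts arrive like a Poisson process than uniformly every 0-10 s, one small tweak is to draw the inter-arrival time from an exponential distribution (meanInterval is a placeholder for your chosen mean spacing between connections):
%exponential inter-arrival time via inverse-transform sampling
next=-log(rand)*meanInterval;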
I am using the Xen hypervisor. I am trying to get the IO count of the VMs running on top of the Xen hypervisor. Can someone suggest a way or a tool to get the IO count? I tried using xenmon and virt-top: virt-top doesn't give any value and xenmon always shows 0. Any suggestions on how to get the number of read or write calls made by a VM, or the read and write (block IO) bandwidth of a particular VM? Thanks!
You can read this directly from sysfs on most systems. You want to open the following directory:
/sys/devices/xen-backend
And look for directories starting with vbd-
The nomenclature is:
vbd-{domain_id}-{vbd_id}/statistics
Inside, you'll find what you need, which is:
br_req - Number of barrier requests
oo_req - Number of 'out of' requests (no room left in list to service any given request)
rd_req - Number of read requests
rd_sect - Number of sectors read
wr_sect - Number of sectors written
The br_req will be an aggregate count of things like write barriers, aborts, etc.
Note: for this to work, the kernel has to be told to export Xen attributes via sysfs, but most Xen packages have this enabled. Additionally, the location in sysfs might differ with earlier versions of Xen.
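As a rough example of reading one of these counters from the shell (domain id 3 and vbd id 51712 are made-up values; list the directory to see what actually exists on your host):
# list the per-VBD statistics directories
ls /sys/devices/xen-backend/ | grep '^vbd-'
# number of read requests handled by that backend device
cat /sys/devices/xen-backend/vbd-3-51712/statistics/rd_req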
Have you tried xentop?
There is also bwm-ng (check your distro). It shows block utilization per disk (real/virtual). If you know the name of the virtual disk attached to the VM, then you can use bwm-ng to get those stats.