How to sum distinct rows by multiple columns - group-by

I'm struggling to achieve this: sum Capacity per VM, counting each (VM, Disk) row only once even though the table contains exact duplicates.

VM   Disk    Capacity
VM1  Disk 1  100
VM1  Disk 2  40
VM1  Disk 1  100
VM1  Disk 2  40
VM2  Disk 1  45
VM2  Disk 1  45
VM3  Disk 1  30
VM3  Disk 2  30

The result should look like this:

VM   Capacity
VM1  140
VM2  45
VM3  60

Does this help?
SELECT a.column1, SUM(a.column3)
FROM (
    SELECT DISTINCT column1, column2, column3 FROM tableA
) a
GROUP BY column1

Here you go
http://sqlfiddle.com/#!6/285fe
SELECT a.vm, SUM(a.capacity)
FROM (
    SELECT DISTINCT vm, disk, capacity
    FROM TEMP
) a
GROUP BY vm

SELECT a.vm, SUM(capacity) AS capacity
FROM (
    SELECT DISTINCT vm, disk, capacity FROM test
) a
GROUP BY vm
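For comparison, the same dedupe-then-aggregate pattern outside SQL; a minimal pandas sketch built from the sample data above (pandas and the column names are my assumptions, not from the original answers):

import pandas as pd

df = pd.DataFrame({
    "vm":       ["VM1", "VM1", "VM1", "VM1", "VM2", "VM2", "VM3", "VM3"],
    "disk":     ["Disk 1", "Disk 2", "Disk 1", "Disk 2", "Disk 1", "Disk 1", "Disk 1", "Disk 2"],
    "capacity": [100, 40, 100, 40, 45, 45, 30, 30],
})

# Drop exact duplicate rows first (the DISTINCT step), then sum capacity per VM.
result = df.drop_duplicates().groupby("vm", as_index=False)["capacity"].sum()
print(result)  # VM1 140, VM2 45, VM3 60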

Related

pyspark conf and yarn top memory discrepancies

On an EMR cluster's main node, running yarn top reports:
YARN top - 13:27:57, up 0d, 1:34, 1 active users, queue(s): root
NodeManager(s): 6 total, 6 active, 0 unhealthy, 2 decommissioned, 0 lost, 0 rebooted
Queue(s) Applications: 3 running, 8 submitted, 0 pending, 5 completed, 0 killed, 0 failed
Queue(s) Mem(GB): 18 available, 189 allocated, 1555 pending, 0 reserved
Queue(s) VCores: 44 available, 20 allocated, 132 pending, 0 reserved
Queue(s) Containers: 20 allocated, 132 pending, 0 reserved
APPLICATIONID USER TYPE QUEUE PRIOR #CONT #RCONT VCORES RVCORES MEM RMEM VCORESECS MEMSECS %PROGR TIME NAME
application_1663674823778_0002 hadoop spark default 0 10 0 10 0 99G 0G 18754 187254 10.00 00:00:33 PyS
application_1663674823778_0003 hadoop spark default 0 9 0 9 0 88G 0G 9446 84580 10.00 00:00:32 PyS
application_1663674823778_0008 hadoop spark default 0 1 0 1 0 0G 0G 382 334 10.00 00:00:06 PyS
Note that the PySpark apps application_1663674823778_0002 and application_1663674823778_0003 were launched from the main node command line by simply executing pyspark (with no explicit config changes).
However, application_1663674823778_0008 was launched with: pyspark --conf spark.executor.memory=11g --conf spark.driver.memory=12g. Despite this (test) config customization, yarn top shows nothing but 0 for that app's memory values (regular and reserved).
Why is this?
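As a first debugging step (a sketch, not a confirmed diagnosis or part of the original question), it may help to confirm what the session actually received. The shell flags above are equivalent to setting the options programmatically, and either way the running session can be asked for its effective values:

from pyspark.sql import SparkSession

# Programmatic equivalent of:
#   pyspark --conf spark.executor.memory=11g --conf spark.driver.memory=12g
spark = (
    SparkSession.builder
    .config("spark.executor.memory", "11g")
    .config("spark.driver.memory", "12g")
    .getOrCreate()
)

# Print what the session actually received; if these differ from 11g/12g,
# the conf was overridden or never applied.
print(spark.sparkContext.getConf().get("spark.executor.memory"))
print(spark.sparkContext.getConf().get("spark.driver.memory"))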

RookIO Ceph cluster: mon c is low on available space message

I have set up a RookIO 1.4 cluster on Kubernetes 1.18, with 3 nodes and 1TB storage allocated on each of them.
After creating the cluster, when I run ceph status, the cluster status shows HEALTH_WARN with mon c is low on available space.
There is no data stored yet, so why is the status showing low on available space? How do I clear this error?
[root@rook-ceph-tools-6bdcd78654-sfjvl /]# ceph status
  cluster:
    id:     ad42764d-aa28-4da5-a828-2d87205aff08
    health: HEALTH_WARN
            mon c is low on available space

  services:
    mon: 3 daemons, quorum a,b,c (age 37m)
    mgr: a(active, since 36m)
    osd: 3 osds: 3 up (since 37m), 3 in (since 37m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 3.6 TiB / 3.6 TiB avail
    pgs:     1 active+clean
All three nodes have the same size storage:
sdb 8:16 0 1.2T 0 disk
└─ceph--a6cd601d--7584--4b1f--bf82--48c95437f351-osd--data--ae1bc856--8ded--4b1e--8c87--30ca0f0959a3 253:3 0 1.2T 0 lvm
sdb 8:16 0 1.2T 0 disk
└─ceph--ccaf7144--d6a0--441c--bcd5--6a09d056bd7a-osd--data--36a9b28c--7207--400a--936b--edfb3255ce0b 253:3 0 1.2T 0 lvm
sdb 8:16 0 1.2T 0 disk
└─ceph--53e9b8a9--8925--4b21--a6ea--f8e17a322d5c-osd--data--6b1e779c--a18a--4e4d--960e--73ca9473d02f 253:3 0 1.2T 0 lvm
Thanks
SR
This alert is about the monitor's disk space, which normally lives in /var/lib/ceph/mon. That path is on the root filesystem, which is unrelated to your OSDs' block devices. The warning is raised when the path has less than 30% available space (see mon_data_avail_warn, which defaults to 30).
You can change that option to silence the alert, or grow the filesystem so the monitor's RocksDB data has more room.
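For illustration only, the check behind the warning is essentially a free-space percentage test on the mon data path; a minimal Python sketch of that logic, using the path and the 30% default mentioned above (this is not Ceph's actual code):

import shutil

MON_DATA_PATH = "/var/lib/ceph/mon"  # default monitor data location
MON_DATA_AVAIL_WARN = 30             # default threshold, in percent

# Compare available space on the mon's filesystem against the threshold.
usage = shutil.disk_usage(MON_DATA_PATH)
avail_percent = usage.free * 100 / usage.total
if avail_percent < MON_DATA_AVAIL_WARN:
    print(f"HEALTH_WARN: mon data path has only {avail_percent:.1f}% available")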
As Seena explained, it is because the available space is less than 30%. In this case you can compact the mon data with the following command:
ceph tell mon.`hostname -s` compact
There is another way to trigger data compaction for the mon: add this option to ceph.conf and then restart the mon.
[mon]
mon compact on start = true

joblib Parallel running out of memory

I have something like this:
outputs = Parallel(n_jobs=12, verbose=10)(delayed(_process_article)(article, config) for article in data)
Case 1: running on Ubuntu with 80 cores:
CPU(s): 80
Thread(s) per core: 2
Core(s) per socket: 20
Socket(s): 2
There are 90,000 tasks in total. At around 67k tasks it fails and is terminated:
joblib.externals.loky.process_executor.BrokenProcessPool: A process in the executor was terminated abruptly, the pool is not usable anymore.
When I monitor top at around 67k tasks, I see a sharp fall in memory:
top - 11:40:25 up 2 days, 18:35, 4 users, load average: 7.09, 7.56, 7.13
Tasks: 32 total, 3 running, 29 sleeping, 0 stopped, 0 zombie
%Cpu(s): 7.6 us, 2.6 sy, 0.0 ni, 89.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 33554432 total, 40 free, 33520996 used, 33396 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 40 avail Mem
Case 2: Mac with 8 cores
hw.physicalcpu: 4
hw.logicalcpu: 8
But on the Mac it is much, much slower, and surprisingly it does not get killed at 67k.
Additionally, I reduced the parallelism (in case 1) to 2 and 4, and it still fails :(
Why is this happening? Has anyone faced this issue before and has a fix?
Note: when I run only 50,000 tasks it runs fine and does not give any problems.
Thank you!
Got a machine with more memory (128 GB) and that solved the problem!
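If more RAM is not an option, a common alternative (my suggestion, not from the answer above) is to run the 90,000 tasks in chunks so that only one chunk of results is held in memory at a time, persisting each chunk before starting the next; a minimal sketch, reusing _process_article, config and data from the question and assuming the results are picklable:

import pickle
from joblib import Parallel, delayed

CHUNK_SIZE = 5_000  # tune to the available memory

def process_in_chunks(data, config, out_path):
    with open(out_path, "wb") as f:
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            outputs = Parallel(n_jobs=12, verbose=10)(
                delayed(_process_article)(article, config) for article in chunk
            )
            pickle.dump(outputs, f)  # persist, then let the chunk go out of scope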

How can I find out the cluster size of an exfat partition?

How can I find out the cluster size of an exfat partition?
It appears that fsutil only has a command for NTFS partitions.
This is how I do it on Ubuntu with the exfat-utils package installed.
$ sudo dumpexfat /dev/sdb1
Volume label
Volume serial number 0xb631210e
FS version 1.0
Sector size 512
Cluster size 32768
Sectors count 1953523120
Free sectors 1953276800
Clusters count 30520069
Free clusters 30519950
First sector 0
FAT first sector 128
FAT sectors count 238528
First cluster sector 238656
Root directory cluster 120
Volume state 0x0002
FATs count 1
Drive number 0x80
Allocated space 0%
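In the output above, Cluster size 32768 is the answer (sector size 512 x 64 sectors per cluster). If you are on Windows and cannot use dumpexfat, the same value can be read straight from the exFAT boot sector: per the exFAT on-disk layout, byte 108 is BytesPerSectorShift and byte 109 is SectorsPerClusterShift. A minimal Python sketch (not from the original answer; the device path is an example, and raw device access needs admin rights):

# Read the exFAT boot sector and derive the cluster size.
DEVICE = r"\\.\E:"  # example raw partition path; adjust for your system

with open(DEVICE, "rb") as dev:
    vbr = dev.read(512)

if vbr[3:11] != b"EXFAT   ":
    raise ValueError("not an exFAT volume")

bytes_per_sector_shift = vbr[108]     # sector size = 2 ** shift
sectors_per_cluster_shift = vbr[109]  # sectors per cluster = 2 ** shift
cluster_size = 1 << (bytes_per_sector_shift + sectors_per_cluster_shift)
print(f"Cluster size: {cluster_size} bytes")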

How to understand a memory dump from WinDbg?

One of our websites is using about 2 GB of memory, and we are trying to understand why it uses so much (we want to move this site to Azure, and high memory usage means a higher bill from Azure).
I took an IIS dump; in Task Manager I could see it was using about 2.2 GB of memory.
Then I ran !address -summary and this is what I got:
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 913 7fb`2f5ce000 ( 7.981 Tb) 99.76%
<unknown> 4055 4`a49c9000 ( 18.572 Gb) 96.43% 0.23%
Heap 338 0`1dbd1000 ( 475.816 Mb) 2.41% 0.01%
Image 3147 0`0c510000 ( 197.063 Mb) 1.00% 0.00%
Stack 184 0`01d40000 ( 29.250 Mb) 0.15% 0.00%
Other 14 0`001bf000 ( 1.746 Mb) 0.01% 0.00%
TEB 60 0`00078000 ( 480.000 kb) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kb) 0.00% 0.00%
--- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_PRIVATE 2206 4`ba7d2000 ( 18.914 Gb) 98.20% 0.23%
MEM_IMAGE 5522 0`148b0000 ( 328.688 Mb) 1.67% 0.00%
MEM_MAPPED 71 0`019a0000 ( 25.625 Mb) 0.13% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 913 7fb`2f5ce000 ( 7.981 Tb) 99.76%
MEM_RESERVE 2711 4`378f4000 ( 16.868 Gb) 87.58% 0.21%
MEM_COMMIT 5088 0`9912e000 ( 2.392 Gb) 12.42% 0.03%
--- Protect Summary (for commit) - RgnCount ----------- Total Size -------- %ofBusy %ofTotal
PAGE_READWRITE 1544 0`81afb000 ( 2.026 Gb) 10.52% 0.02%
PAGE_EXECUTE_READ 794 0`0f35d000 ( 243.363 Mb) 1.23% 0.00%
PAGE_READONLY 2316 0`05ea8000 ( 94.656 Mb) 0.48% 0.00%
PAGE_EXECUTE_READWRITE 279 0`020f4000 ( 32.953 Mb) 0.17% 0.00%
PAGE_WRITECOPY 92 0`0024f000 ( 2.309 Mb) 0.01% 0.00%
PAGE_READWRITE|PAGE_GUARD 61 0`000e6000 ( 920.000 kb) 0.00% 0.00%
PAGE_EXECUTE 2 0`00005000 ( 20.000 kb) 0.00% 0.00%
--- Largest Region by Usage ----------- Base Address -------- Region Size ----------
Free 5`3fac0000 7f9`59610000 ( 7.974 Tb)
<unknown> 3`06a59000 0`f9067000 ( 3.891 Gb)
Heap 0`0f1c0000 0`00fd0000 ( 15.813 Mb)
Image 7fe`fe767000 0`007ad000 ( 7.676 Mb)
Stack 0`01080000 0`0007b000 ( 492.000 kb)
Other 0`00880000 0`00183000 ( 1.512 Mb)
TEB 7ff`ffe44000 0`00002000 ( 8.000 kb)
PEB 7ff`fffdd000 0`00001000 ( 4.000 kb)
There are lots of things I don't really get:
The web server has 8 GB of memory in total, but the Free section in the usage summary shows 7.9 TB. Why?
<unknown> shows 18.572 GB, but the web server only has 8 GB of memory in total. Why?
Task Manager shows the private working set at about 2.2 GB, but if I add Heap, Image and Stack together it is only around 700 MB. Where is the remaining 1.5 GB, or am I reading the output completely wrong?
Many Thanks
The web server has 8 GB of memory in total, but the Free section in the usage summary shows 7.9 TB. Why?
The 8 GB of RAM is physical memory, i.e. what is plugged into the DDR slots of the machine. The ~8 TB is virtual address space, which can also be backed by the page file.
The virtual address space is 4 GB for a 32-bit process; for 64-bit processes it depends on the exact limits of the OS.
<unknown> shows 18.572 GB, but the web server only has 8 GB of memory in total. Why?
The ~19 GB is the amount of virtual memory used by an <unknown> memory manager, e.g. .NET or direct calls to VirtualAlloc().
Even though 19 GB is more than 8 GB, this does not necessarily mean that memory was swapped to disk; it depends on the state of that memory. Looking at MEM_RESERVE, we see that most of it is not in use yet. Therefore, your application may still have good performance.
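To make the reserve/commit distinction concrete, here is a minimal illustrative sketch (Windows-only, via ctypes; not part of the original answer). Reserving address space consumes almost no physical memory, while committing is what shows up under MEM_COMMIT:

import ctypes
from ctypes import wintypes

MEM_RESERVE = 0x2000
MEM_COMMIT = 0x1000
PAGE_NOACCESS = 0x01
PAGE_READWRITE = 0x04

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.VirtualAlloc.restype = ctypes.c_void_p
kernel32.VirtualAlloc.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                                  wintypes.DWORD, wintypes.DWORD]

# Reserve 1 GB of address space: appears as MEM_RESERVE, uses no RAM yet.
reserved = kernel32.VirtualAlloc(None, 1 << 30, MEM_RESERVE, PAGE_NOACCESS)

# Commit 1 MB inside the reservation: only this part counts as MEM_COMMIT
# and can contribute to the working set once it is actually touched.
committed = kernel32.VirtualAlloc(reserved, 1 << 20, MEM_COMMIT, PAGE_READWRITE)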
Task Manager shows the private working set at about 2.2 GB, but if I add Heap, Image and Stack together it is only around 700 MB. Where is the remaining 1.5 GB, or am I reading the output completely wrong?
The rest is in <unknown>, so the sum is actually more than the 2.2 GB shown by Task Manager. The working set indicates how much physical RAM your process uses. Ideally everything would be in RAM, since RAM is fastest, but RAM is limited and not every application fits. Memory that is used infrequently is therefore swapped to disk, which reduces physical RAM use and thus the working set size.