Brunel visualization - how to select all headers with a regular expression

I'm importing the following CSV file:
import pandas as pd
from numpy import log, abs, sign, sqrt
import brunel
# Read data
DISKAVGRIO = pd.read_csv("../DISKAVGRIO_nmon.csv")
DISKAVGRIO.head(6)
This gives the following table:
Hostname | Date-Time | hdisk1342 | hdisk1340 | hdisk1343 | ...
------------ | ----------------------- | ----------- | ------------- | ----------- | ------
host1 | 12-08-2015 00:56:12 | 0.0 | 0.0 | 0.0 | ...
host1 | 12-08-2015 01:11:13 | 0.0 | 0.0 | 0.0 | ...
host1 | 12-08-2015 01:26:14 | 0.0 | 0.0 | 0.0 | ...
host1 | 12-08-2015 01:41:14 | 0.0 | 0.0 | 0.0 | ...
host1 | 12-08-2015 01:56:14 | 0.0 | 0.4 | 4.2 | ...
host1 | 12-08-2015 02:11:14 | 0.0 | 0.0 | 0.0 | ...
How do I select all fields that start with hdisk?

Brunel does not directly support this, but you can define a Python variable that builds the string you need for the Brunel command and then use that variable with a $ inside the Brunel magic. See this stackoverflow issue for details.
b = "y(hdisk1342)"
%brunel data('DISKAVGRIO') x(Date_Time) $b line
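To cover every column that starts with hdisk rather than a single one, you can build that string from the dataframe's columns before calling the magic. A minimal sketch (the column-matching logic is my assumption, not part of the original answer, and it assumes y(...) accepts a comma-separated list of fields):
import re
# Collect every column whose name starts with "hdisk" and build the y(...) clause
hdisk_cols = [c for c in DISKAVGRIO.columns if re.match(r"^hdisk", c)]
b = "y(" + ", ".join(hdisk_cols) + ")"
%brunel data('DISKAVGRIO') x(Date_Time) $b line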

Related

How to get percentage of one column based off different totals in PostgreSQL

POSTGRESQL: I have a table of three columns and 2000+ rows. The columns are (clothing) item, size, and total(price). I need to calculate a percentage column that reflects the fraction of the total price that should be allocated to each item and size combo.
table example
| item | size | price |
| ---- | ---- | ----- |
|shirt | M | 3.99 |
|pants | S | 2.99 |
|shirt | S | 2.50 |
|shirt | L | 4.25 |
|pants | S | 4.30 |
|shirt | S | 6.50 |
|shirt | M | 2.99 |
|shirt | L | 1.25 |
What I want:
| item | size | price | percentage |
| ---- | ---- | ----- | ---------- |
|shirt | M | 3.99 |57.16 |
|pants | S | 2.99 |41.02 |
|shirt | S | 2.50 |27.78 |
|shirt | L | 4.25 |77.27 |
|pants | S | 4.30 |58.98 |
|shirt | S | 6.50 |72.22 |
|shirt | M | 2.99 |42.83 |
|shirt | L | 1.25 |22.72 |
I don't even know where to start... Thanks
The tables looked normal in the preview but now look quite odd; I hope you can still understand them.
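One common way to do this in PostgreSQL (not from the original post) is a window function, e.g. 100 * price / sum(price) OVER (PARTITION BY item, size). To make the intended arithmetic concrete, here is the same calculation sketched in pandas on the sample rows above:
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "item":  ["shirt", "pants", "shirt", "shirt", "pants", "shirt", "shirt", "shirt"],
    "size":  ["M", "S", "S", "L", "S", "S", "M", "L"],
    "price": [3.99, 2.99, 2.50, 4.25, 4.30, 6.50, 2.99, 1.25],
})

# Each row's share of the total price within its (item, size) group, as a percentage
group_total = df.groupby(["item", "size"])["price"].transform("sum")
df["percentage"] = (100 * df["price"] / group_total).round(2)
print(df)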

pyspark - preprocessing with a kind of "product-join"

I have 2 datasets that I can represent as:
The first dataframe is my raw data. It contains millions of rows and around 6000 areas.
+--------+------+------+-----+-----+
| user | area | time | foo | bar |
+--------+------+------+-----+-----+
| Alice | A | 5 | ... | ... |
| Alice | B | 12 | ... | ... |
| Bob | A | 2 | ... | ... |
| Charly | C | 8 | ... | ... |
+--------+------+------+-----+-----+
This second dataframe is a mapping table. It has around 200 areas (not 5000) for 150 places. Each area can have 1-N places (and a place can have 1-N areas too). It can be represented unpivoted this way:
+------+--------+-------+
| area | place | value |
+------+--------+-------+
| A | placeZ | 0.1 |
| B | placeB | 0.6 |
| B | placeC | 0.4 |
| C | placeA | 0.1 |
| C | placeB | 0.04 |
| D | placeA | 0.4 |
| D | placeC | 0.6 |
| ... | ... | ... |
+------+--------+-------+
or pivoted
+------+--------+--------+--------+-----+
| area | placeA | placeB | placeC | ... |
+------+--------+--------+--------+-----+
| A | 0 | 0 | 0 | ... |
| B | 0 | 0.6 | 0.4 | ... |
| C | 0.1 | 0.04 | 0 | ... |
| D | 0.4 | 0 | 0.6 | ... |
+------+--------+--------+--------+-----+
I would like to create a kind of product-join to have something like:
+--------+--------+--------+--------+-----+--------+
| user | placeA | placeB | placeC | ... | placeZ |
+--------+--------+--------+--------+-----+--------+
| Alice | 0 | 7.2 | 4.8 | 0 | 0.5 | <- 7.2 & 4.8 comes from area B and 0.5 from area A
| Bob | 0 | 0 | 0 | 0 | 0.2 |
| Charly | 0.8 | 0.32 | 0 | 0 | 0 |
+--------+--------+--------+--------+-----+--------+
I see two options so far:
Option 1:
- Perform a left join between the main table and the pivoted one
- Multiply each column by the time (around 150 columns)
- Group by user with a sum
Option 2:
- Perform an outer join between the main table and the unpivoted one
- Multiply the time by value
- Pivot on place
- Group by user with a sum
I don't like the first option because of the number of multiplications involved (the mapping dataframe is quite sparse).
I prefer the second option, but I see two problems:
- If someday the dataset does not have a place represented, that column will not exist and the dataset will have a different shape (hence failing).
- Some other features like foo and bar will be duplicated by the outer join, and I'll have to handle them case by case at the grouping stage (sum or average).
I would like to know if there is something more ready-to-use for this kind of product-join in Spark. I have seen the OneHotEncoder, but it only puts a "1" in each column (so it is even worse than the first solution).
Thanks in advance,
Nicolas
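As a sketch (not from the original post), the second option can be written in PySpark roughly as follows, using toy versions of the two dataframes above; passing the explicit list of places to pivot() keeps the output shape stable even when a place is absent from the data, which addresses the first problem:
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Toy versions of the two dataframes shown above
raw = spark.createDataFrame(
    [("Alice", "A", 5), ("Alice", "B", 12), ("Bob", "A", 2), ("Charly", "C", 8)],
    ["user", "area", "time"])
mapping = spark.createDataFrame(
    [("A", "placeZ", 0.1), ("B", "placeB", 0.6), ("B", "placeC", 0.4),
     ("C", "placeA", 0.1), ("C", "placeB", 0.04)],
    ["area", "place", "value"])

places = ["placeA", "placeB", "placeC", "placeZ"]  # full list of expected places

result = (raw.join(mapping, on="area", how="left")
             .withColumn("weighted", F.col("time") * F.col("value"))
             .groupBy("user")
             .pivot("place", places)   # explicit values -> fixed set of columns
             .sum("weighted")
             .na.fill(0))
result.show()
The foo and bar columns could be carried through the same groupBy by replacing sum("weighted") with an agg() that also applies, say, first() or avg() to them.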

Create a PostgreSQL table from a SAS table

I am creating a PostgreSQL table from a SAS table, shown below. The woe column is numeric and the other columns are character.
+----------+--------------+------+-------+-----+-----+
| variable | new_variable | type | start | end | woe |
+----------+--------------+------+-------+-----+-----+
| A | mi_A | char | 1 | | 1.3 |
| A | mi_A | char | 0 | | 0.6 |
| B | mi_B | char | 1 | | 5.4 |
| B | mi_B | char | 0 | | 0.1 |
| gnd_cd | gnd_cd | char | 3 | | 1.3 |
| gnd_cd | gnd_cd | char | #0 | | 0.6 |
| gnd_cd | gnd_cd | char | 2 | | 5.4 |
| gnd_cd | gnd_cd | char | N | | 0.1 |
| gnd_cd | gnd_cd | char | 1 | | 1.3 |
| gnd_cd | gnd_cd | char | 99 | | 0.6 |
| mar_sign | mar_sign | char | 0 | | 5.4 |
| mar_sign | mar_sign | char | Y | | 0.1 |
| mar_sign | mar_sign | char | N | | 6 |
+----------+--------------+------+-------+-----+-----+
The client shows an error: syntax error at or near "end". I think the error may be caused by the "start" column, but I still don't know why or how to fix it.
My code is simply SQL; tableA is the PostgreSQL table and tableB is the SAS table:
create table schema.tableA from select * from mywork.tableB;
Any advice is appreciated!
SAS is much more flexible in allowing keywords as variable names. In your PostgreSQL code you might need to add double quotes around variable or dataset names to prevent it from thinking you are typing a keyword. So use
select "end" from ...
If you do, then watch out for the case of the names. Without quotes, Postgres defaults to creating/searching for the name in all lowercase. But once you add quotes, the value in the quotes must match the variable name exactly. So if you create a variable named Name without quotes, it will actually be created as name and you can reference it as NAME or NaMe.
But if you create a variable in mixed case using quotes, like "Name", then the uppercase N is part of the variable name and you must use "Name" to reference it. Name without the quotes will not work, since it will look for the lowercase name.
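As a concrete illustration (my assumption about how the statement might be issued, not part of the original answer): run directly against PostgreSQL from Python, the statement would use CREATE TABLE ... AS SELECT and quote the reserved word "end", for example:
import psycopg2

ddl = '''
CREATE TABLE schema.tableA AS
SELECT variable, new_variable, type, start, "end", woe
FROM mywork.tableB;
'''

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
with conn, conn.cursor() as cur:
    cur.execute(ddl)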

Postgres Group/compress result row inside 1 linked row

I have a table structure with two tables like this:
result table: one row with generic info plus a row UUID.
--------------------------
| uuid | name | other |
--------------------------
| result1 | foo | bar |
--------------------------
| result2 | foo2 | bar2 |
--------------------------
criteria_result:
-----------------------------------
| result_uuid | crit_uuid | value |
-----------------------------------
| result1 | crit1 | 7 |
-----------------------------------
| result1 | crit2 | 8 |
-----------------------------------
| result1 | crit3 | 9 |
-----------------------------------
| result1 | crit7 | 4 |
-----------------------------------
| result2 | crit1 | 2 |
-----------------------------------
What I need is one row per result row, but grouping everything from the criteria_result table inside it, e.g.:
----------------------------------------------------
| uuid | name | result_crit |
----------------------------------------------------
| result1 | foo | [
| | crit1 | crit2 | crit3 | crit7 |
| | 7 | 8 | 9 | 4 |]
----------------------------------------------------
| result2 | foo2 | [
| | crit1 |
| | 2 | ]
----------------------------------------------------
Or even
-----------------------------------------
| uuid | name | result_crit |
-----------------------------------------
| result1 | foo | [ | name | value |
| crit1 | 7 |
| crit2 | 8 |
| crit3 | 9 |
| crit7 | 4 | ]
-----------------------------------------
-----------------------------------------
| result2 | foo2 | [ | name | value |
| crit1 | 2 | ]
-----------------------------------------
Anything that returns only one row per result when I export it, but with all criteria of that row/result in a sub-array/object.
SELECT
result.uuid,
result.name,
criteria_result.result_uuid
FROM
public.criteria_result,
public.result
WHERE
result.uuid = criteria_result.result_uuid;
I tried CUBE, GROUP BY, and GROUPING SETS, but I can't seem to get it right or find the answer :/.
Thanks
Note: I do have a recent Postgres 9.5.1.
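A sketch of one way to do this on PostgreSQL 9.5 (not from the original post) is to aggregate each result's criteria into a JSON array with json_agg; run here from Python, with the connection string as an assumption:
import psycopg2

query = """
SELECT r.uuid,
       r.name,
       json_agg(json_build_object('name', c.crit_uuid, 'value', c.value)) AS result_crit
FROM public.result r
JOIN public.criteria_result c ON c.result_uuid = r.uuid
GROUP BY r.uuid, r.name;
"""

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection string
with conn, conn.cursor() as cur:
    cur.execute(query)
    for uuid, name, result_crit in cur.fetchall():
        print(uuid, name, result_crit)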

nova diagnostics in devstack development

Over SSH, when I run this command
nova diagnostics 2ad0dda0-072d-46c4-8689-3c487a452248
I get all the resources in devstack:
+---------------------------+----------------------+
| Property | Value |
+---------------------------+----------------------+
| cpu0_time | 3766640000000 |
| hdd_errors | 18446744073709551615 |
| hdd_read | 111736 |
| hdd_read_req | 73 |
| hdd_write | 0 |
| hdd_write_req | 0 |
| memory | 2097152 |
| memory-actual | 2097152 |
| memory-available | 1922544 |
| memory-major_fault | 2710 |
| memory-minor_fault | 10061504 |
| memory-rss | 509392 |
| memory-swap_in | 0 |
| memory-swap_out | 0 |
| memory-unused | 1079468 |
| tap5a148e0f-b8_rx | 959777 |
| tap5a148e0f-b8_rx_drop | 0 |
| tap5a148e0f-b8_rx_errors | 0 |
| tap5a148e0f-b8_rx_packets | 8758 |
| tap5a148e0f-b8_tx | 48872 |
| tap5a148e0f-b8_tx_drop | 0 |
| tap5a148e0f-b8_tx_errors | 0 |
| tap5a148e0f-b8_tx_packets | 615 |
| vda_errors | 18446744073709551615 |
| vda_read | 597230592 |
| vda_read_req | 31443 |
| vda_write | 164690944 |
| vda_write_req | 18422 |
+---------------------------+----------------------+
How can I get this in the devstack user interface?
Please help.
Thanks in advance.
It's not available in the OpenStack Icehouse/Juno versions, though Juno can be edited to retrieve it in devstack.
I haven't used OpenStack Kilo. In Juno, if your hypervisor is libvirt, vSphere, or XenAPI, then you can retrieve these statistics in the devstack UI. To do this:
For Libvirt
In ceilometer/compute/virt/libvirt/inspector.py, add this:
from oslo.utils import units
from ceilometer.compute.pollsters import util
def inspect_memory_usage(self, instance, duration=None):
    instance_name = util.instance_name(instance)
    domain = self._lookup_by_name(instance_name)
    state = domain.info()[0]
    if state == libvirt.VIR_DOMAIN_SHUTOFF:
        LOG.warn(_('Failed to inspect memory usage of %(instance_name)s, '
                   'domain is in state of SHUTOFF'),
                 {'instance_name': instance_name})
        return
    try:
        memory_stats = domain.memoryStats()
        if (memory_stats and
                memory_stats.get('available') and
                memory_stats.get('unused')):
            memory_used = (memory_stats.get('available') -
                           memory_stats.get('unused'))
            # Stat provided from libvirt is in KB, converting it to MB.
            memory_used = memory_used / units.Ki
            return virt_inspector.MemoryUsageStats(usage=memory_used)
        else:
            LOG.warn(_('Failed to inspect memory usage of '
                       '%(instance_name)s, can not get info from libvirt'),
                     {'instance_name': instance_name})
    # memoryStats might launch an exception if the method
    # is not supported by the underlying hypervisor being
    # used by libvirt
    except libvirt.libvirtError as e:
        LOG.warn(_('Failed to inspect memory usage of %(instance_name)s, '
                   'can not get info from libvirt: %(error)s'),
                 {'instance_name': instance_name, 'error': e})
For more details, you can check the following link:
https://review.openstack.org/#/c/90498/