Creating table with multiple row and column names - matlab

I'm trying to create a table in MATLAB whose rows and columns have several 'levels' of names, for example a column name 'Neutral' that is divided into the sublevels 'M' and 'SD' (see below for an illustration). I have the same problem with rows. Does anyone know whether this is possible in MATLAB, and if so, how?
| Neutral |<- Column name
|----|----|
| M | SD |<- Sublevel of column name
|----|----|
|5.70|2.39|<- Data
|7.37|2.27|<-
| .. | .. |<-
| .. | .. |<-

You can nest table objects, like so:
t = table(table((1:10)', rand(10,1), 'VariableNames', {'M', 'SD'}), ...
'VariableNames', {'Neutral'});
The display looks a little odd, but you can index the nested variables in the way that you might expect, i.e. t.Neutral.M etc.


How to convert nested json to data frame with kdb+

I am trying to get data from cryptostats as shown below, and it returns nested JSON. I want it in a table format. How do I do that?
query:"https://api.cryptostats.community/api/v1/fees/oneDayTotalFees/2023-02-07";
raw:.Q.hg query;
res:.j.k raw;
To get the JSON file, use https://api.cryptostats.community/api/v1/fees/oneDayTotalFees/2023-02-07
To view the JSON in a table format, use https://jsongrid.com/json-grid
The final result should be a kdb+ table that has all the columns from the nested JSON output.
They are all dictionaries
q)distinct type each res[`data]
,99h
But they do not collapse to a table because they do not all have matching keys
q)distinct key each res[`data]
`id`bundle`results`metadata`errors
`id`bundle`results`metadata
Looking at a row where errors is populated we can see it is a dictionary
q)res[`data;0;`errors]
oneDayTotalFees| "Error executing oneDayTotalFees on compound: Date incomplete"
You can create a prototype dictionary with a blank errors key in it and join (,) each piece of data onto it. This results in uniform dictionaries, which are promoted to a table (type 98h):
q)table:(enlist[`errors]!enlist (`$())!()),/:res`data
q)type table
98h
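For readers who do not use q, the prototype idea can be sketched in Python. This is only an illustration under assumptions: the two sample records are a cut-down, hand-written imitation of the API response, and `uniform` and `columns` are hypothetical names.

```python
# Hand-written sample records that imitate the shape of the API data (not real output).
records = [
    {"id": "compound", "bundle": None, "results": {"oneDayTotalFees": None},
     "errors": {"oneDayTotalFees": "Error executing oneDayTotalFees on compound: Date incomplete"}},
    {"id": "swapr-ethereum", "bundle": "swapr", "results": {"oneDayTotalFees": 24.78725}},
]

# Start every row from a fresh prototype with an empty "errors" entry,
# then let the record's own keys override it.
uniform = [{"errors": {}, **rec} for rec in records]

# All rows now share the same keys in the same order, so they can be read as a table.
columns = list(uniform[0].keys())
print(columns)
```

A record that already carried errors keeps them, while a record without the key receives an empty dictionary.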
Row which already had errors is unaffected:
q)table 0
errors | (,`oneDayTotalFees)!,"Error executing oneDayTotalFees..
id | "compound"
bundle | 0n
results | (,`oneDayTotalFees)!,0n
metadata| `source`icon`name`category`description`feeDescription;..
Row which previously did not have errors now has a valid empty dictionary
q)table 1
errors | (`symbol$())!()
id | "swapr-ethereum"
bundle | "swapr"
results | (,`oneDayTotalFees)!,24.78725
metadata| `category`name`icon`bundle`blockchain`description`feeDescription..
https://kx.com/blog/kdb-q-insights-parsing-json-files/
https://code.kx.com/q/ref/join/
https://code.kx.com/q/kb/faq/#construction
https://code.kx.com/q/basics/datatypes/
https://code.kx.com/q/ref/maps/#each-left-and-each-right
If you want to explore nested objects, you can index at depth (see the blog post linked above). If you have many sparse keys, leaving the data nested like this is efficient for storage:
q)select tokenSymbol:metadata[::;`tokenSymbol] from table where not ""~/:metadata[::;`tokenSymbol]
tokenSymbol
-----------
"HNY"
If you do wish to explode a nested field, you can run something similar to:
q)table:table,'{flip c!flip table[`metadata]#\:(c:distinct raze key each table[`metadata])}[]
q)meta table
c | t f a
----------------| -----
errors |
id | C
bundle | C
results |
metadata |
source | C
icon | C
name | C
category | C
description | C
feeDescription | C
blockchain | C
website | C
tokenTicker | C
tokenCoingecko | C
protocolLaunch | C
tokenLaunch | C
adapter | C
subtitle | C
events | C
shortName | C
protocolShutdown| C
tokenSymbol | C
subcategory | C
tokenticker | C
tokencoingecko | C
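The same "explode" step can be sketched in Python, as an illustration rather than a translation of the q code. The function name `explode`, the sample rows and the choice of an empty string as the fill value are assumptions made for this example.

```python
def explode(rows, field):
    """Lift every key found in rows[i][field] into a top-level column."""
    # Ordered union of the nested keys across all rows.
    keys = []
    for row in rows:
        for key in row[field]:
            if key not in keys:
                keys.append(key)
    # Missing nested keys are filled with "" (an assumption; pick a fill value
    # that matches the column type). A nested key that has the same name as an
    # existing top-level key would overwrite it.
    return [{**row, **{k: row[field].get(k, "") for k in keys}} for row in rows]


# Hypothetical sample rows.
rows = [
    {"id": "a", "metadata": {"name": "A", "source": "x"}},
    {"id": "b", "metadata": {"name": "B", "tokenSymbol": "HNY"}},
]
exploded = explode(rows, "metadata")
print(list(exploded[0].keys()))
```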
Care needs to be taken when filling in nulls and keeping the type of data in each column consistent. In this dataset the events tag inside metadata is tabular data:
q)select distinct type each events from table
events
------
10
98
0
This would need to be cleaned with something similar to:
q)table:update events:count[i]#enlist ([] date:();description:()) from table where not 98h=type each events
The data returned from the API contains dictionaries with two distinct sets of keys:
q)distinct key each res`data
`id`bundle`results`metadata`errors
`id`bundle`results`metadata
One simple way to convert this to a table is to enlist each dictionary first, converting them to tables, then joining with uj:
q)(uj/)enlist each res`data
id bundle results metadata ..
-----------------------------------------------------------------------------..
"compound" 0n (,`oneDayTotalFees)!,0n `source`i..
"swapr-ethereum" "swapr" (,`oneDayTotalFees)!,24.78725 `category..
...
This works as uj generalises the join operator ,, allowing different schemas with common elements to be combined.
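As a rough Python analogy of what a union join does with records that have different keys, the sketch below collects the ordered union of keys and pads missing values with `None`. The name `union_join` and the sample records are hypothetical; this is not how kdb+ implements uj.

```python
def union_join(records):
    """Return (columns, rows) where columns is the ordered union of all keys."""
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    # Records that lack a column get None in that position.
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


# Hypothetical records with two distinct key sets.
records = [
    {"id": "compound", "bundle": None, "results": None, "metadata": {}, "errors": {"e": "x"}},
    {"id": "swapr-ethereum", "bundle": "swapr", "results": 24.78725, "metadata": {}},
]
columns, rows = union_join(records)
print(columns)
```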

how to split a column containing pipe-separated string into two columns in scala

I have two pipe-separated columns in a table,
like this:
column_name
name-MARYAM BEGUM | MOHD AIJAZUR RAHMAN
fathers_name-AIJAZUR RAHMAN | MOHD HABEEB SAB
When I use the explode and split functions, this produces 4 rows, but I want two rows, like:
name fathers name
|SYED YOUSUF |JANI MIYA |
| MOHAMMED MUBEEN UL ALI | MOHAMMED SHAFI UL ALI|
You can use withColumn to create a new column from existing column values. You can extract data from a column using org.apache.spark.sql.functions.regexp_extract. You can also use a combination of the org.apache.spark.sql.functions.substring and org.apache.spark.sql.functions.instr functions, among others. Please take a look at all the functions available.
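// Let's say "column_name" holds the concatenated data.
// withColumn takes the new column's name as a String and returns a new DataFrame,
// so the calls are chained rather than issued as separate statements.
df.withColumn("name", regexp_extract($"column_name", <expression to extract name>, <group index>))
  .withColumn("father_name", regexp_extract($"column_name", <expression to extract fathers name>, <group index>))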
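To make the desired reshaping concrete, here is a small Python sketch of the logic (not Spark code). It assumes each input line has the form `label-value1 | value2`, which is a guess based on the sample in the question; `parse`, `columns` and `rows` are hypothetical names.

```python
import re

raw = [
    "name-MARYAM BEGUM | MOHD AIJAZUR RAHMAN",
    "fathers_name-AIJAZUR RAHMAN | MOHD HABEEB SAB",
]


def parse(line):
    """Split 'label-value1 | value2' into the label and a list of trimmed values."""
    match = re.fullmatch(r"(\w+)-(.*)", line)
    label, values = match.group(1), match.group(2)
    return label, [value.strip() for value in values.split("|")]


columns = dict(parse(line) for line in raw)
# Pair the i-th name with the i-th father's name instead of taking every combination.
rows = list(zip(columns["name"], columns["fathers_name"]))
print(rows)
```

Pairing by position gives two rows, whereas exploding each column independently gives the four-row cross product described in the question.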

Choosing data structure/storage solution for complex geo queries

I have a dataset of entities with their type and lat/long. Like this:
Name Type Lat Long
House1 Big 1 2
House11 Bigger 2 2
House12 Biggest 3 2
House13 Small 4 2
House14 Medium 5 2
So these are houses with their type and location. Now I need to answer queries like: "Find all house of type Big which have a Small and a Medium house in its 10km radius"
What kind of data structure/storage solution would be right here? I looked at Elasticsearch and Redis, but it looks like I would need to iterate over all the houses of the given type (Big for the sample query above) to answer this.
It's perfectly feasible directly from PostgreSQL with PostGIS.
Considering your table structure ...
CREATE TEMPORARY TABLE t (name TEXT, type TEXT, geom GEOGRAPHY);
... and your test data ...
INSERT INTO t VALUES ('House1','Big', ST_MakePoint(1,2));
INSERT INTO t VALUES ('House11','Bigger', ST_MakePoint(2,2));
INSERT INTO t VALUES ('House12','Biggest', ST_MakePoint(3,2));
INSERT INTO t VALUES ('House13','Small', ST_MakePoint(4,2));
INSERT INTO t VALUES ('House14','Medium', ST_MakePoint(5,2));
(Note: it makes no sense here to split lat and long into different columns. PostGIS can store both in a single GEOGRAPHY or GEOMETRY column. Keep in mind that ST_MakePoint takes x (longitude) first and y (latitude) second. See ST_MakePoint for more details.)
"Find all house of type Big which have a Small and a Medium house in
its 10km radius"
Try something like this using ST_Distance:
WITH j AS (SELECT * FROM t WHERE type = 'Big')
SELECT
j.name,j.type,
ST_Distance(j.geom,t.geom) AS distance,
t.name, t.type
FROM j,t
WHERE
ST_Distance(j.geom,t.geom) > 10000 AND
t.type IN ('Small','Medium');
name | type | distance | name | type
--------+------+-----------------+---------+--------
House1 | Big | 333756.3481116 | House13 | Small
House1 | Big | 445008.41595616 | House14 | Medium
(2 rows)
(This query returns records that are more than 10,000 meters away from the Big type house. Just adapt the first condition in the WHERE clause to your needs.)
EDIT: Query based on the comments.
WITH j AS (SELECT *, ARRAY(SELECT DISTINCT t2.type
FROM t t2
WHERE t2.type IN ('Small','Medium') AND
ST_Distance(t2.geom,t1.geom) < 100000
) AS nearHouseType
FROM t t1 WHERE type = 'Big')
SELECT *
FROM j
WHERE j.nearHouseType @> '{Medium, Small}'::TEXT[];
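To spell out the predicate the query is meant to express, here is a brute-force Python sketch that uses the haversine formula on the question's sample data. It is only an illustration under assumptions: it uses no spatial index, it approximates the Earth as a sphere of radius 6371 km, and `haversine_km` and `find_houses` are hypothetical names.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat = p2 - p1
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# (name, type, lat, long) rows from the question.
houses = [
    ("House1", "Big", 1, 2),
    ("House11", "Bigger", 2, 2),
    ("House12", "Biggest", 3, 2),
    ("House13", "Small", 4, 2),
    ("House14", "Medium", 5, 2),
]


def find_houses(houses, target_type, required_types, radius_km):
    """Names of target_type houses that have every required type within radius_km."""
    found = []
    for name, kind, lat, lon in houses:
        if kind != target_type:
            continue
        nearby_types = {
            other_kind
            for other_name, other_kind, other_lat, other_lon in houses
            if other_name != name
            and haversine_km(lat, lon, other_lat, other_lon) <= radius_km
        }
        if set(required_types) <= nearby_types:
            found.append(name)
    return found


print(find_houses(houses, "Big", {"Small", "Medium"}, 10))
print(find_houses(houses, "Big", {"Small", "Medium"}, 500))
```

With the sample coordinates the houses are whole degrees apart, so a 10 km radius finds nothing and only a much larger radius matches House1.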

populate cells defined by dates period

Sorry if this post is a duplicate. I could not find anything similar, and I am a bit stuck on the approach.
I am trying to populate cells in one sheet depending on date in rows of a different sheet, like these:
Sheet1 - entry sheet
ID | Name | Start date | End date
10 | Mike | 1.06.2016 | 2.06.2016
13 | Dido | 1.06.2016 | 5.06.2016
8 | Rene | 2.06.2016 | 20.06.2016
Sheet2 - report sheet
ids/dates | 1.06.2016 | 2.06.2016 | 3.06.2016 | date+1
8 | | Rene | Rene | Rene
10 | Mike | Mike | |
13 | Dido | Dido | Dido | Dido
The Name cells in Sheet2 are to be populated based on the ID, Start date, and End date columns of Sheet1. The position of each populated cell in Sheet2 is defined by the ID column and the date row, which must match the corresponding values in Sheet1.
This report can be built with a single formula. Please check this Example File.
Assumptions
Suppose, you have Sheet1 with data:
Col A: ID
Col B: Name
Col C: Start date
Col D: End Date
Case 1. IDs are unique.
Go to Sheet2 and paste this formula in it:
={{"ids/dates";filter(Sheet1!A2:A,Sheet1!A2:A<>"")},{ArrayFormula(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1));ArrayFormula(if(--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)>=filter(Sheet1!C2:C,Sheet1!C2:C<>0))*--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)<=filter(Sheet1!D2:D,Sheet1!C2:C<>0))=1,VLOOKUP(FILTER(Sheet1!A2:A,Sheet1!A2:A<>""),Sheet1!A:B,2,0),""))}}
That's all. The report will expand automatically when new data arrives on Sheet1. The report will return an error if the data on Sheet1 is incomplete (missing names or dates).
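The logic of the report for the unique-ID case can be sketched in Python, which may be easier to follow than the nested formula. This is an illustration under assumptions rather than a translation of the formula: the date columns run from the earliest start date to the latest end date inclusive, the dates are read as day.month.year, and `build_report` is a hypothetical name.

```python
from datetime import date, timedelta

# (ID, Name, Start date, End date) rows from Sheet1 in the question.
entries = [
    (10, "Mike", date(2016, 6, 1), date(2016, 6, 2)),
    (13, "Dido", date(2016, 6, 1), date(2016, 6, 5)),
    (8, "Rene", date(2016, 6, 2), date(2016, 6, 20)),
]


def build_report(entries):
    """Return a grid: a header row of dates, then one row per ID."""
    first = min(start for _, _, start, _ in entries)
    last = max(end for _, _, _, end in entries)
    days = [first + timedelta(days=n) for n in range((last - first).days + 1)]
    header = ["ids/dates"] + days
    # A cell holds the name when the column's date falls inside the entry's range.
    rows = [
        [id_] + [name if start <= day <= end else "" for day in days]
        for id_, name, start, end in sorted(entries)
    ]
    return [header] + rows


report = build_report(entries)
for row in report[1:]:
    print(row[:4])
```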
Case 2. IDs are NOT unique.
This solution works when IDs are not unique; rows with the same ID are grouped together. In this case each ID still belongs to exactly one person.
The formula is a bit longer:
={{"ids/dates";sort(UNIQUE(filter(Sheet1!A2:A,Sheet1!A2:A<>"")))},{ArrayFormula(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1));ArrayFormula(if(QUERY(QUERY({filter(Sheet1!A2:A,Sheet1!A2:A<>""),ArrayFormula((--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)>=filter(Sheet1!C2:C,Sheet1!C2:C<>0))*--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)<=filter(Sheet1!D2:D,Sheet1!C2:C<>0))))},"select Col1, sum(Col"&JOIN("), sum(Col",ArrayFormula(COLUMN(OFFSET(B2,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))))&") group by Col1"),"Select Col"&JOIN(", Col",ArrayFormula(COLUMN(OFFSET(B2,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))))&" where Col1>0",0)=1,VLOOKUP(sort(UNIQUE(filter(Sheet1!A2:A,Sheet1!A2:A<>""))),Sheet1!A:B,2,0),""))}}
See example here.
Case 3. IDs are NOT unique. One ID <> one name
Here's a working example, please check it. This case is the hardest one: multiple IDs can refer to multiple names. The final formula:
={{"ids/dates",ArrayFormula(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1))};{sort(UNIQUE(FILTER(Sheet1!A2:A,Sheet1!A2:A<>""))),ArrayFormula(IFERROR(VLOOKUP(QUERY(QUERY({FILTER(Sheet1!A2:B,Sheet1!A2:A<>""),ArrayFormula(--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)>=filter(Sheet1!C2:C,Sheet1!C2:C<>0))*--(add(MIN(Sheet1!C:D),COLUMN(OFFSET(A1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))-1)<=filter(Sheet1!D2:D,Sheet1!C2:C<>0))*row(OFFSET(A1,,,rows(FILTER(Sheet1!A2:B,Sheet1!A2:A<>"")))))},"select Col1, sum(Col"&JOIN("), sum(Col",ArrayFormula(COLUMN(OFFSET(C1,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))))&") group by Col1"),"Select Col"&JOIN(", Col",ArrayFormula(COLUMN(OFFSET(B2,,,1,MAX(Sheet1!C:D)-MIN(Sheet1!C:D)))))&" where Col1>0",0),{ArrayFormula(row(OFFSET(A1,,,rows(FILTER(Sheet1!A2:B,Sheet1!A2:A<>""))))),FILTER(Sheet1!A2:B,Sheet1!A2:A<>"")},3,0)))}}
The formula will not work correctly if two date ranges for the same ID overlap:
102 Mike 6/21/2016 6/27/2016
102 Mike 6/11/2016 6/22/2016

Search inside full search column using certain letters

I want to search inside a full-text search column using only a few letters. For example:
select "Name","Country","_score" from datatable where match("Country", 'China');
This returns many rows, which is fine. My question is how I can search with, for example:
select "Name","Country","_score" from datatable where match("Country", 'Ch');
I want to see China, Chile, etc.
I think that the match_type phrase_prefix could be the answer, but I don't know the correct syntax.
The match predicate supports different match types through the clause using match_type [with (match_parameter = [value])].
So in your example using the phrase_prefix match type:
select "Name","Country","_score" from datatable where match("Country", 'Ch') using phrase_prefix;
gives you your desired results.
See the match predicate documentation: https://crate.io/docs/en/latest/sql/fulltext.html?#match-predicate
If you just need to match the beginning of a string column, you don't need a fulltext analyzed column. You can use the LIKE operator instead, e.g.:
cr> create table names_table (name string, country string);
CREATE OK (0.840 sec)
cr> insert into names_table (name, country) values ('foo', 'China'), ('bar','Chile'), ('foobar', 'Austria');
INSERT OK, 3 rows affected (0.049 sec)
cr> select * from names_table where country like 'Ch%';
+---------+------+
| country | name |
+---------+------+
| Chile | bar |
| China | foo |
+---------+------+
SELECT 2 rows in set (0.037 sec)
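The prefix pattern can also be tried out locally with Python's built-in sqlite3 module, as sketched below. Note the assumption: this runs against SQLite, not CrateDB, so it only illustrates the `LIKE 'Ch%'` pattern itself. SQLite's LIKE is case-insensitive for ASCII letters by default, which may differ from the database used above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption: not CrateDB).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names_table (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO names_table (name, country) VALUES (?, ?)",
    [("foo", "China"), ("bar", "Chile"), ("foobar", "Austria")],
)

# '%' matches any sequence of characters, so 'Ch%' matches values starting with "Ch".
rows = conn.execute(
    "SELECT country, name FROM names_table WHERE country LIKE ? ORDER BY country",
    ("Ch%",),
).fetchall()
conn.close()
print(rows)
```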