How can I count the instances of a phrase in a field using OrientDB?
Let's say I have the following data:
Display_title | Description
My_data_12 | The quick brown fox jumps over the lazy dog
My_data_13 | The one
How can I count the number of instances of 'the', with output similar to this:
Display_title | Count
My_data_12 | 2
My_data_13 | 1
Using the occurrences function from Vitim.us's answer here, you can call it from Studio with:
select title, occurrences(@this.description, "the") from v
Note that this is case-sensitive, so with your example you'll get:
title | occurrences
My_data_12 | 1
My_data_13 | 0
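To see why the case-sensitive count differs from the expected output, here is a small Python sketch of the same counting logic. It is not the OrientDB function itself: the occurrences helper below is a hypothetical stand-in that counts non-overlapping substring matches, with an optional case-insensitive mode that lowercases both sides first.

```python
def occurrences(text: str, sub: str, case_sensitive: bool = True) -> int:
    """Count non-overlapping substring occurrences of sub in text."""
    if not sub:
        return 0
    if not case_sensitive:
        # Lowercase both sides so "The" and "the" are counted alike.
        text, sub = text.lower(), sub.lower()
    return text.count(sub)


rows = [
    ("My_data_12", "The quick brown fox jumps over the lazy dog"),
    ("My_data_13", "The one"),
]

for title, description in rows:
    # Prints the case-sensitive count, then the case-insensitive count.
    print(title, occurrences(description, "the"), occurrences(description, "the", case_sensitive=False))
```

The case-insensitive mode gives 2 and 1, matching the question's expected output. Because this is plain substring matching, words such as "other" or "theme" would also be counted. In OrientDB, lowercasing the field before passing it to the function (for example with a toLowerCase() method, if your version supports it) should have the same effect.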
For a PostgreSQL table, suppose the following data is in table A:
key_path | key | value
--------------------------------------
foo[1]__scrog | scrog | apple
foo[2]__scrog | scrog | orange
bar | bar | peach
baz[1]__biscuit | biscuit | watermelon
The goal is to group rows whose key_path values are identical except for an incrementing number.
For context, key_path is a JSON key path and key is the leaf key. The desired outcome would be:
key_path_group | key | values
------------------------------------------------------------
[foo[1]__scrog, foo[2]__scrog] | scrog | [apple, orange]
bar | bar | peach
[baz[1]__biscuit] | biscuit | [watermelon]
Note also that for key_path = baz[1]__biscuit, even though there is only a single incrementing value, it is still cast to an array of length 1.
Any tips or suggestions much appreciated!
May have answered my own question (sometimes just typing it out helps). The following gets very close to, if not exactly, what I'm looking for:
select
regexp_replace(key_path, '(.*)\[(\d+)\](.*)', '\1[x]\3') as key_path_group,
key,
jsonb_agg(value) as values
from A
group by key_path_group, key;
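The grouping rule can be sketched in Python to check it against the desired output. This is an illustration, not the SQL: it reuses the same regular expression, groups on the rewritten path plus key, and (as an assumption about the intended output) keeps rows without any [n] index as scalars while indexed groups always become arrays, even with one element. In the SQL, the list of original paths would correspond to additionally aggregating key_path, for example with jsonb_agg(key_path).

```python
import re

# Same pattern as the SQL; the greedy (.*) means the LAST [n] index is replaced.
INDEX = re.compile(r"(.*)\[(\d+)\](.*)")

ROWS = [
    ("foo[1]__scrog", "scrog", "apple"),
    ("foo[2]__scrog", "scrog", "orange"),
    ("bar", "bar", "peach"),
    ("baz[1]__biscuit", "biscuit", "watermelon"),
]


def group_key_paths(rows):
    groups = {}  # (key_path_group, key) -> (paths, values); dicts keep insertion order
    for key_path, key, value in rows:
        group = INDEX.sub(r"\1[x]\3", key_path, count=1)
        paths, values = groups.setdefault((group, key), ([], []))
        paths.append(key_path)
        values.append(value)

    result = []
    for (_group, key), (paths, values) in groups.items():
        indexed = INDEX.fullmatch(paths[0]) is not None
        if indexed or len(paths) > 1:
            result.append((paths, key, values))  # array output, even for one element
        else:
            result.append((paths[0], key, values[0]))  # un-indexed single row stays scalar
    return result


for row in group_key_paths(ROWS):
    print(row)
```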
I have the following sample data (items) with a recursive parent/child structure. For the sake of simplicity I limited the sample to 2 levels; in practice the hierarchy can grow quite deep.
+----+--------------------------+----------+------------------+-------+
| ID | Item - Name | ParentID | MaterializedPath | Color |
+----+--------------------------+----------+------------------+-------+
| 1 | Parent 1 | null | 1 | green |
| 2 | Parent 2 | null | 2 | green |
| 4 | Parent 2 Child 1 | 2 | 2.4 | orange|
| 6 | Parent 2 Child 1 Child 1 | 4 | 2.4.6 | red |
| 7 | Parent 2 Child 1 Child 2 | 4 | 2.4.7 | orange|
| 3 | Parent 1 Child 1 | 1 | 1.3 | orange|
| 5 | Parent 1 Child 1 Child | 3 | 1.3.5 | red |
+----+--------------------------+----------+------------------+-------+
I need to get, via SQL, all children that are not orange, for a given starting ID.
For example, with starting ID=1 the result should be 1 and 1.3.5. When starting with ID=4, the result should be 2.4.6.
I read a little and found that a CTE should be used. I tried the following simplified definition:
WITH w1( id, parent_item_id) AS
( SELECT
i.id,
i.parent_item_id
FROM
item i
WHERE
id = 4
UNION ALL
SELECT
i.id,
i.parent_item_id
FROM
item, JOIN w1 ON i.parent_item_id = w1.id
);
However, this won't even execute as an SQL statement. I have several questions about this:
Can a CTE be used with Hibernate?
Is there a way to get the result via SQL queries (more or less as a recursive pattern)?
I'm somewhat lost combining the recursive pattern with selecting by color for the end result.
Your query is invalid for the following reasons:
As documented in the manual, a recursive CTE requires the RECURSIVE keyword.
Your JOIN syntax is wrong. You need to remove the comma and give the item table an alias.
If you need the color column, just add it to both SELECTs inside the CTE and filter the rows in the final SELECT.
If that is changed, the following works fine:
WITH recursive w1 (id, parent_item_id, color) AS
(
SELECT i.id,
i.parent_item_id,
i.color
FROM item i
WHERE id = 4
UNION ALL
SELECT i.id,
i.parent_item_id,
i.color
FROM item i --<< missing alias
JOIN w1 ON i.parent_item_id = w1.id
)
select *
from w1
where color <> 'orange'
Note that the column list for the CTE definition is optional, so you can just write with recursive w1 as ....
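To make the evaluation order of the recursive CTE concrete, here is a Python sketch (not SQL) that walks the question's sample table the same way: the anchor member selects the starting row, the recursive member repeatedly adds rows whose parent is already in the working set, and the final SELECT filters out orange rows. The function name non_orange_subtree is hypothetical.

```python
from collections import deque

# (id, parent_id, materialized_path, color) from the question's sample table
ITEMS = [
    (1, None, "1", "green"),
    (2, None, "2", "green"),
    (4, 2, "2.4", "orange"),
    (6, 4, "2.4.6", "red"),
    (7, 4, "2.4.7", "orange"),
    (3, 1, "1.3", "orange"),
    (5, 3, "1.3.5", "red"),
]


def non_orange_subtree(start_id):
    by_id = {row[0]: row for row in ITEMS}
    children = {}
    for row in ITEMS:
        children.setdefault(row[1], []).append(row)

    result = []
    queue = deque([by_id[start_id]])  # anchor member: WHERE id = start_id
    while queue:  # recursive member: JOIN w1 ON i.parent_item_id = w1.id
        row = queue.popleft()
        if row[3] != "orange":  # final SELECT: WHERE color <> 'orange'
            result.append(row[2])
        queue.extend(children.get(row[0], []))
    return sorted(result)


print(non_orange_subtree(1))
print(non_orange_subtree(4))
```

Note that the filter is applied only to the output, not to the traversal: an orange node such as 1.3 is skipped in the result but its children are still visited, which is exactly what filtering in the outer SELECT (rather than inside the CTE) achieves.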
I have a column with values counting occurrences.
I am trying to continue the series in Power Query.
I am thus trying to add 1 to the max of the given column.
The ID column has rows with letter tags: AB or BE. Each tag is followed by a number, and specific numeric ranges are associated with each tag. For both AB and BE, the numbers range first from 0000 to 3000 and then from 3001 to 6000.
I thus have the following possibilities:
From AB0000 to AB3000
From AB3001 to AB6000
From BE0000 to BE3000
From BE3001 to BE6000
Each category matches a specific item in my Geography column, from the other workbook:
From AB0000 to AB3000, it is ItalyZ
From AB3001 to AB6000, it is ItalyB
From BE0000 to BE3000, it is UKY
From BE3001 to BE6000, it is UKM
I am thus trying to find the highest number associated with the first AB category, the second AB category, the first BE category, and the second BE category.
My issue is that for some values, there is simply "nothing" yet in the source file.
This means, for example, that there is no occurrence of UKM yet.
Here is an example with no UKM or UKY:
|------------------|---------------------|
| Max | Geography |
|------------------|---------------------|
| 0562 | ItalyZ |
|------------------|---------------------|
| 0563 | ItalyZ |
|------------------|---------------------|
Hence, I have the following result:
|------------------|---------------------|
| Increment | Place |
|------------------|---------------------|
| 0564 | ItalyZ |
|------------------|---------------------|
| 0565 | ItalyZ |
|------------------|---------------------|
| 0565 | ItalyZ |
|------------------|---------------------|
| null | UKM |
|------------------|---------------------|
Here is the Power Query code I used:
let
Source = #table({"Prefix", "Seq_Start", "Seq_End","GeoLocation"},{{"AB",0,2999,"ItalyZ"},{"AB",3000,6000,"ItalyB"},{"BC",0,299,"UKY"},{"BC",3000,6000,"UKM"}}),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Seq_Start", Int64.Type}, {"Seq_End", Int64.Type}}),
#"Merged Queries" = Table.NestedJoin(#"Changed Type", {"Prefix"}, HighestID, {"Prefix"}, "HighestID", JoinKind.LeftOuter),
#"Expanded HighestID" = Table.ExpandTableColumn(#"Merged Queries", "HighestID", {"Number"}, {"Number"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded HighestID", each [Number] >= [Seq_Start] and [Number] <= [Seq_End]),
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"Prefix", "Seq_Start", "Seq_End", "GeoLocation"}, {{"NextSeq", each List.Max([Number]) + 1, type number}})
in
#"Grouped Rows"
I would like to know how I could ensure that for the first occurrence of a value I get "0000" (or 0) rather than "null", and so on for the next occurrences.
For example, if I had no previous occurrences of UKM, for some reason the end result is as follows:
|------------------|---------------------|
| Increment | Place |
|------------------|---------------------|
| 1 | UKM |
|------------------|---------------------|
| 2 | UKM |
|------------------|---------------------|
This is not ideal, because UKM should start at 3000. Because I had no values recorded before, it starts with "null" and then 1, 2... rather than 3001 and 3002.
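The desired rule (next number = highest existing number in the range plus 1, or the start of the range when nothing exists yet) can be sketched in Python. This does not reproduce the M code; it only illustrates the fallback logic. The ranges follow the question's #table, with two assumptions: the prefix is written as BE (the prose uses BE, the #table uses "BC") and the first BE range ends at 2999 (the #table has 299). The helper name next_sequence is hypothetical.

```python
# (prefix, seq_start, seq_end, geo), adapted from the question's #table (see assumptions above)
RANGES = [
    ("AB", 0, 2999, "ItalyZ"),
    ("AB", 3000, 6000, "ItalyB"),
    ("BE", 0, 2999, "UKY"),
    ("BE", 3000, 6000, "UKM"),
]

# (prefix, number) pairs already present: the two ItalyZ rows from the example
EXISTING = [("AB", 562), ("AB", 563)]


def next_sequence(ranges, existing):
    result = []
    for prefix, start, end, geo in ranges:
        in_range = [n for p, n in existing if p == prefix and start <= n <= end]
        # Empty range: fall back to the range start instead of producing null.
        nxt = max(in_range) + 1 if in_range else start
        result.append((geo, f"{nxt:04d}"))
    return result


for geo, number in next_sequence(RANGES, EXISTING):
    print(number, geo)
```

The important design point is that every range must survive to the aggregation step even when it has no matching IDs; the sketch gets that for free by iterating over the ranges, whereas filtering joined rows before grouping can drop empty ranges entirely.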
I have a file with [start] and [end] dates in Tableau and would like to create a calculated field that counts, on a rolling basis, the number of rows that occur between [start] and [end] for each [person]. The data looks like this:
| Start | End | Person
|1/1/2019 |1/7/2019 | A
|1/3/2019 |1/9/2019 | A
|1/8/2019 |1/15/2019| A
|1/1/2019 |1/7/2019 | B
I'd like to create a calculated field [count] with results like so:
| Start | End | Person | Count
|1/1/2019 |1/7/2019 | A | 1
|1/3/2019 |1/9/2019 | A | 2
|1/8/2019 |1/15/2019| A | 2
|1/1/2019 |1/7/2019 | B | 1
EDITED: A good analogy for what [count] represents is: "how many videos does each person have rented at the same time, as of that moment?" For person A's 1st row, count is 1, with 1 item rented. As of row 2, person A has 2 items rented. But for the 3rd row, [count] = 2, since the video rented in the first row is no longer rented.
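The definition of [count] from the rental analogy can be checked with a short Python sketch. It is not a Tableau calculation: for each row it counts the rows of the same person whose interval contains that row's start date, assuming the end date is inclusive and the dates are M/D/YYYY.

```python
from datetime import date

# (start, end, person) from the question's sample data
ROWS = [
    (date(2019, 1, 1), date(2019, 1, 7), "A"),
    (date(2019, 1, 3), date(2019, 1, 9), "A"),
    (date(2019, 1, 8), date(2019, 1, 15), "A"),
    (date(2019, 1, 1), date(2019, 1, 7), "B"),
]


def concurrent_counts(rows):
    counts = []
    for start, _end, person in rows:
        # Rentals of the same person that are active on this row's start date
        # (the row itself always counts, since start <= start <= end).
        active = sum(1 for s, e, p in rows if p == person and s <= start <= e)
        counts.append(active)
    return counts


print(concurrent_counts(ROWS))
```

This reproduces the expected column (1, 2, 2, 1). It compares every row with every other row of the same person, which is fine for illustrating the rule but is quadratic in the number of rows per person.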
I am quite new to OrientDB and have been stuck on a problem for days now:
I have two classes. "PAGES" is holding information about pages, "CHECKS" contains information about checks on these pages.
They are connected by a 1:n LINKSET called page2chck.
It looks like this:
Class PAGES
+----+---------+-------------------------------+
| Id | Title   | Url                           |
+----+---------+-------------------------------+
| 30 | Blahbla | http://www.test.com/test.html |
+----+---------+-------------------------------+
| 40 | sometxt | http://www.foo.org/dummy.html |
+----+---------+-------------------------------+
Class CHECKS
+---------------------+---------+
| Lastcheck           | Status  |
+---------------------+---------+
| 2016-02-01 23:58:12 | OK      |
+---------------------+---------+
| 2016-02-02 22:04:24 | OK      |
+---------------------+---------+
| 2016-02-02 23:57:55 | ERR     |
+---------------------+---------+
| 2016-02-01 23:59:01 | OK      |
+---------------------+---------+
I created a linkset like this
CREATE LINK page2chck TYPE LINKSET FROM CHECKS.CH_PID to PAGES.Id INVERSE
Now I want to retrieve all pages that do not have a check after 2016-02-03, and show the date they were last checked along with the status.
What I tried was:
select Title, page2chck.Lastcheck, page2chck.Status from PAGES
where date.asLong(page2chck.Lastcheck) < 1454540400
But it returns an empty result
However, to test the integrity of the relation I ran
select Title from PAGES where page2chck.CH_PID=30
which correctly returns "BlahBlah"
So I tried
select page2chck.Lastcheck, page2chck.Status, Title from PAGES
where page2chck.CH_PID=30
which returned
# |@CLASS|page2chck|page2chck|Title
----+------+---------+---------+---------------------------------
0 |null |[441] |[441] |BlahBlah
So basically I have two problems here:
How can I run a comparison on the date of a linked class, and
how can I show the fields of this class?
CREATE LINK page2chck TYPE LINKSET FROM CHECKS.CH_PID to PAGES.Id INVERSE
You can use this query
SELECT Title, $checks[0].Lastcheck as Lastcheck , $checks[0].Status as Status FROM PAGES
let $a = (select EXPAND(page2chck) from $parent.$current),
$checks= ( select Lastcheck, Status from $a where Lastcheck in
( select max(Lastcheck) from $a where Lastcheck < DATE("2016-02-03 00:00:00")))
If you want to retrieve all pages that do not have a check after 2016-02-03, you can use this query:
select from (SELECT Title, $checks[0].Lastcheck as Lastcheck , $checks[0].Status as Status FROM PAGES
let $a = ( select EXPAND(page2chck) from $parent.$current),
$checks= ( select Lastcheck, Status from $a where Lastcheck in ( select max(Lastcheck) from $a))
) where Lastcheck < DATE("2016-02-03 00:00:00")
Hope it helps.
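The logic of that last query (take each page's most recent check, then keep the page only if that check is before the cutoff) can be sketched in Python. The assignment of checks to pages below is HYPOTHETICAL: the question's CHECKS table does not show CH_PID, so only the pairing of 2016-02-01 23:58:12 with page 30 and 2016-02-01 23:59:01 with page 40 is borrowed from the later answer's output. The function name pages_not_checked_since is also hypothetical.

```python
from datetime import datetime

# HYPOTHETICAL (page_id, lastcheck, status) rows; CH_PID is not shown in the question.
CHECKS = [
    (30, datetime(2016, 2, 1, 23, 58, 12), "OK"),
    (30, datetime(2016, 2, 2, 22, 4, 24), "OK"),
    (40, datetime(2016, 2, 2, 23, 57, 55), "ERR"),
    (40, datetime(2016, 2, 1, 23, 59, 1), "OK"),
]
PAGES = {30: "Blahbla", 40: "sometxt"}


def pages_not_checked_since(cutoff):
    # Like $checks: the check with max(Lastcheck) for each page.
    latest = {}
    for page_id, lastcheck, status in CHECKS:
        if page_id not in latest or lastcheck > latest[page_id][0]:
            latest[page_id] = (lastcheck, status)
    # Like the outer WHERE: keep pages whose latest check is before the cutoff.
    return [(PAGES[pid], lc, st) for pid, (lc, st) in latest.items() if lc < cutoff]


for row in pages_not_checked_since(datetime(2016, 2, 3)):
    print(row)
```

Pages with no checks at all never appear in the result of this sketch; whether the OrientDB query includes them depends on how it compares a missing Lastcheck with the cutoff date.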
I think I've found one of the problems in your query
select Title, page2chck.Lastcheck, page2chck.Status from PAGES where date.asLong(page2chck.Lastcheck) < 1454540400
1454540400 is read as milliseconds since the epoch, so it means 1970-01-17 21:02:20, as you can verify with:
select DATE(1454540400)
----+------+-------------------
# |@CLASS|DATE
----+------+-------------------
0 |null |1970-01-17 21:02:20
----+------+-------------------
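The literal 1454540400 looks like a Unix timestamp in seconds, while the DATE() output above implies it is read as milliseconds. This Python sketch (not OrientDB) shows both readings and computes a millisecond cutoff for 2016-02-03. It assumes UTC; the 21:02:20 shown above is presumably the same instant displayed in a UTC+1 server time zone.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
VALUE = 1454540400  # the literal from the question's WHERE clause

as_millis = EPOCH + timedelta(milliseconds=VALUE)  # the reading DATE() appears to use
as_seconds = EPOCH + timedelta(seconds=VALUE)  # what the literal probably meant

# Milliseconds since the epoch for 2016-02-03 00:00:00 UTC (assumption: UTC).
cutoff_ms = (datetime(2016, 2, 3, tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)

print(as_millis)   # 1970-01-17 20:02:20.400000+00:00
print(as_seconds)  # 2016-02-03 23:00:00+00:00
print(cutoff_ms)   # 1454457600000
```

Read as seconds, the original value is 2016-02-03 23:00 UTC, i.e. midnight on 2016-02-04 in UTC+1, which suggests the question's comparison was off by a factor of 1000 rather than by date.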
BTW, you could create the LINKSET without INVERSE:
CREATE LINK chck2page TYPE LINKSET FROM CHECKS.CH_PID to PAGES.Id
and query like this:
orientdb {db=pages_checks}> select chck2page.title, lastcheck, status from CHECKS where lastcheck < DATE("2016-02-01 23:59:10")
----+------+---------+-------------------+------
# |@CLASS|chck2page|lastcheck |status
----+------+---------+-------------------+------
0 |null |blablabla|2016-02-01 23:58:12|OK
1 |null |foo |2016-02-01 23:59:01|OK
----+------+---------+-------------------+------