Sphinx GROUP BY across both an RT index and a plain index

I need to run a GROUP BY across both an RT index and a plain index.
For example:
I have 4 records with different document ids in the plain index, for persons of the same age.
I also have 2 records in the RT index with the same document ids and the same age.
When I run GROUP BY over the two indexes combined, I want a document id that exists in both indexes to be counted only once. Instead, the group count I get back includes every occurrence of the duplicated document ids.
RT index data:
+-----------+------+
| id | age |
+-----------+------+
| 1 | 47 |
| 123455 | 47 |
| 123456 | 127 |
| 123457 | 55 |
| 101100063 | 51 |
Plain index data:
+-----------+------+
| id | age |
+-----------+------+
| 123455 | 47 |
| 101100061 | 47 |
| 111123456 | 127 |
| 156123457 | 55 |
| 101100063 | 51 |
After grouping by age across both indexes, I need the following result, where the count skips duplicate document ids:
+-----------+------+----------+
| id | age | Count |
+-----------+------+----------+
| 123455 | 47 | 3 |
| 101100061 | 127 | 2 |
| 111123456 | 55 | 1 |
| 156123457 | 51 | 1 |

You should be able to use COUNT(DISTINCT id) rather than just COUNT(*) to get the count (assuming you are using SphinxQL, of course).
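To make the difference between the two counts concrete, here is a minimal Python sketch that models what a distinct count per age does over the sample rows from the question. It is not SphinxQL; the function name count_distinct_ids_by_age is made up for illustration, and the two lists simply copy the RT and plain index tables above.

```python
from collections import defaultdict

# (id, age) rows copied from the question's two sample tables.
rt_index = [(1, 47), (123455, 47), (123456, 127), (123457, 55), (101100063, 51)]
plain_index = [(123455, 47), (101100061, 47), (111123456, 127),
               (156123457, 55), (101100063, 51)]


def count_distinct_ids_by_age(*indexes):
    """Count each document id once per age, even if it is in several indexes."""
    ids_by_age = defaultdict(set)
    for index in indexes:
        for doc_id, age in index:
            ids_by_age[age].add(doc_id)  # a set ignores repeated ids
    return {age: len(ids) for age, ids in ids_by_age.items()}


def count_rows_by_age(*indexes):
    """Plain row count per age, the analogue of COUNT(*)."""
    rows_by_age = defaultdict(int)
    for index in indexes:
        for _doc_id, age in index:
            rows_by_age[age] += 1
    return dict(rows_by_age)


distinct_counts = count_distinct_ids_by_age(rt_index, plain_index)
row_counts = count_rows_by_age(rt_index, plain_index)
```

With these rows, age 47 has four rows but only three distinct ids, because id 123455 is present in both indexes.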

Filtering out hierarchical data

I need help with a problem I am facing processing hierarchical data.
Schema of the tables that maintain hierarchical data:
Category table:
| ID | Label |
Mapping table:
| ID | QualifierID | ItemID | ParentID |
Step 1: Wrote a simple self-join query to transform the above mappings:
WITH category_masterlist AS (
SELECT id,
label
FROM Category
)
select id, id as itemid, label, NULL as parentId from [Category] where categoryLevel = 1
UNION
select itemid as id, itemId, (select label from category_masterlist where id = cm.itemid) Label, parentId
from [CategoryMapping] cm
Step 2: Wrote a self-join query using common table expression to return mapping data as follows:
WITH CategoryCTE(ParentID, ID, Label, CategoryLevel) AS
(
SELECT ParentID, ItemID, Label, 0 AS CategoryLevel
FROM [view_TreeviewCategoryMapping]
WHERE ParentID IS NULL
UNION ALL
SELECT e.ParentID, e.ItemID, e.Label, CategoryLevel + 1
FROM [view_TreeviewCategoryMapping] AS e
INNER JOIN CategoryCTE AS d
ON e.ParentID = d.ID
)
SELECT distinct ParentID, ID, Label, CategoryLevel
FROM CategoryCTE
| ID | Label | ParentID | CategoryLevel |
--------------------------------------------------------------------------------
| 90 | Satellite | NULL | 0 |
| 91 | Concrete | NULL | 0 |
| 92 | ETC | NULL | 0 |
| 93 | Chisel | NULL | 0 |
| 94 | Steel | NULL | 0 |
| 96 | Wood | NULL | 0 |
| 97 | MIC Systems | 90 | 1 |
| 97 | MIC Systems | 91 | 1 |
| 99 | Foundations | 91 | 1 |
| 100 | Down Systems | 91 | 1 |
| 101 | Side Systems | 91 | 1 |
| 102 | Systems | 91 | 1 |
| 98 | DWG | 92 | 1 |
| 97 | MIC Systems | 93 | 1 |
| 97 | MIC Systems | 94 | 1 |
| 99 | Foundations | 94 | 1 |
| 100 | Down Systems | 94 | 1 |
| 101 | Side Systems | 94 | 1 |
| 102 | Systems | 94 | 1 |
| 97 | MIC Systems | 95 | 1 |
| 98 | DWG | 95 | 1 |
| 102 | Systems | 95 | 1 |
| 103 | Project Management| 95 | 1 |
| 104 | Software | 95 | 1 |
| 99 | Foundations | 96 | 1 |
| 119 | Fronts | 97 | 2 |
| 121 | Technology | 98 | 2 |
| 112 | Root Systems | 98 | 2 |
| 112 | Root Systems | 99 | 2 |
| 137 | Closed Systems | 112 | 3 |
| 203 | Support | 121 | 3 |
Step 3: I would like to filter the above results so that only categories that are mapped completely are returned. A completed mapping is one that has children at level 3. For example, below is what I am looking for based on the above result set:
| ID | Label | ParentID | CategoryLevel |
--------------------------------------------------------------------------------
| 96 | Wood | NULL | 0 |
| 92 | ETC | NULL | 0 |
| 98 | DWG | 92 | 1 |
| 99 | Foundations | 96 | 1 |
| 121 | Technology | 98 | 2 |
| 112 | Root Systems | 98 | 2 |
| 112 | Root Systems | 99 | 2 |
| 137 | Closed Systems | 112 | 3 |
| 203 | Support | 121 | 3 |
Step 4: Ultimately, end user should be presented with a tree view control as follows:
Root
|
|---Wood
| |---Foundations
| |---Root Systems
| |---Closed Systems
|
|---ETC
| |---DWG
| |---Technology
| |---Support
| |---Root Systems
| |---Closed Systems
Please note, a category can have multiple parents. For example, Root Systems has two parents - DWG and Foundations. Did I get the schema correct for category and mapping table especially for the case when a category can have multiple parents?
How can I filter out categories that are not mapped completely from Step 2 to Step 3? That is the hurdle I am unable to cross. Any pointers? I can filter them out at the application level but would really love to filter them out at database level.
I am open to suggestions and recommendations that will help me achieve my goal. I also want a confirmation that the schema I am using is the most efficient one.
Thank you!
Here is a working option that uses the hierarchyid data type.
The nesting is optional and really just for illustration.
Example
Declare @Top int = null --<< Sets top of Hier Try 94
;with cteP as (
Select ID
,ParentID
,Label
,HierID = convert(hierarchyid,concat('/',ID,'/'))
From YourTable
Where IsNull(@Top,-1) = case when @Top is null then isnull(ParentID ,-1) else ID end
Union All
Select ID = r.ID
,Pt = r.ParentID
,Label = r.Label
,HierID = convert(hierarchyid,concat(p.HierID.ToString(),r.ID,'/'))
From YourTable r
Join cteP p on r.ParentID = p.ID)
Select Lvl = HierID.GetLevel()
,ID
,ParentID
,Label = replicate('|----',HierID.GetLevel()-1) + Label -- Nesting Optional ... For Presentation
,HierID_String = HierID.ToString()
From cteP A
Order By A.HierID
Results
Now if @Top was set to 94

Facet a multi-value (MVA) type field in Sphinx

I executed the query below in Sphinx:
select MVA_FIELD from mySphinxIndex facet MVA_FIELD order by count(*) desc;
What I got looks like this:
+----------------------------+----------+
| MVA_FIELD | count(*) |
+----------------------------+----------+
| | 664 |
| 0 | 536 |
| 13 | 439 |
| 4,13 | 8 |
| 19,13 | 8 |
| 18,13,20 | 8 |
| 8,17,18 | 8 |
| 8,18,13 | 8 |
| 8,15,18 | 8 |
| 8,13,20 | 7 |
| 17,13 | 7 |
| 18,19,20 | 7 |
| 8,17 | 7 |
| 13,17,19 | 7 |
| 11,6 | 7 |
| 6,11,13 | 7 |
| 15,18 | 7 |
| 11,13,20 | 7 |
| 11,13,17 | 7 |
| 6,18,19 | 6 |
| 7,20 | 6 |
| 8,11,13 | 6 |
| 13,17,20 | 6 |
I want to get the count of each id in MVA_FIELD. For example, I just want the count of 0, 4, 13, ... each id separately. How can I achieve this?
Honestly, I don't know how to do it with the FACET syntactic sugar, but with a normal GROUP BY query you would just use the GROUPBY() function when grouping by an MVA attribute:
SELECT GROUPBY() AS value,COUNT(*) FROM mySphinxIndex GROUP BY MVA_FIELD ORDER BY COUNT(*) DESC;
From the docs:
A special GROUPBY() function is also supported. It returns the GROUP BY key. That is particularly useful when grouping by an MVA value, in order to pick the specific value that was used to create the current group.
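As a rough illustration of what per-value grouping means, the following Python sketch counts how many documents contain each individual MVA value, instead of counting each distinct combination of values. It only models the idea and does not call Sphinx; the function name count_mva_values and the sample documents are hypothetical, loosely shaped like the facet output in the question.

```python
from collections import Counter


def count_mva_values(documents):
    """Count, for every single value, how many documents carry it."""
    counts = Counter()
    for values in documents:
        # set() so a value repeated inside one document is counted once
        counts.update(set(values))
    return counts


# Hypothetical documents; each inner list is one document's MVA values.
docs = [[], [0], [13], [4, 13], [19, 13], [8, 17, 18]]
value_counts = count_mva_values(docs)

# Sort by count descending, mirroring ORDER BY COUNT(*) DESC.
ranked = sorted(value_counts.items(), key=lambda item: (-item[1], item[0]))
```

In this sketch a document with an empty value list contributes to no group, and the value 13 is counted once for each of the three documents that contain it.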

Split postgres records into groups based on time fields

I have a table with records that look like this:
| id | coord-x | coord-y | time |
---------------------------------
| 1 | 0 | 0 | 123 |
| 1 | 0 | 1 | 124 |
| 1 | 0 | 3 | 125 |
The time column represents a time in milliseconds. What I want to do is find all coord-x, coord-y as a set of points for a given timeframe for a given id. For any given id there is a unique coord-x, coord-y, and time.
What I need to do however is group these points as long as they're n milliseconds apart. So if I have this:
| id | coord-x | coord-y | time |
---------------------------------
| 1 | 0 | 0 | 123 |
| 1 | 0 | 1 | 124 |
| 1 | 0 | 3 | 125 |
| 1 | 0 | 6 | 140 |
| 1 | 0 | 7 | 141 |
I would want a result similar to this:
| id | points | start-time | end-time |
| 1 | (0,0), (0,1), (0,3) | 123 | 125 |
| 1 | (0,6), (0,7) | 140 | 141 |
I do have PostGIS installed on my database. The times I posted above are not representative; I kept them small just as a sample. The time is just a millisecond timestamp.
The tricky part is picking the expression inside your GROUP BY. If n = 5, you can do something like time / 5. To match the example exactly, the query below uses (time - 3) / 5. Once you group it, you can aggregate them into an array with array_agg.
SELECT
array_agg(("coord-x", "coord-y")) as points,
min(time) AS time_start,
max(time) AS time_end
FROM "<your_table>"
WHERE id = 1
GROUP BY (time - 3) / 5
Here is the output
+---------------------------+--------------+------------+
| points | time_start | time_end |
|---------------------------+--------------+------------|
| {"(0,0)","(0,1)","(0,3)"} | 123 | 125 |
| {"(0,6)","(0,7)"} | 140 | 141 |
+---------------------------+--------------+------------+
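Fixed-width buckets like the one above depend on where the bucket boundaries fall, so two points only 1 ms apart can still land in different groups. If "n milliseconds apart" is read as "each point is at most n ms after the previous one", a gap-based grouping is an alternative. The Python sketch below shows that idea on the sample rows; it is a different technique from the query in the answer, and the function name group_by_gap and the max_gap parameter are made up for illustration. In SQL the same idea is usually expressed with the lag() window function.

```python
def group_by_gap(rows, max_gap):
    """Start a new group whenever the gap to the previous point exceeds max_gap.

    rows is an iterable of (x, y, time) tuples for a single id.
    """
    groups = []
    for x, y, t in sorted(rows, key=lambda row: row[2]):
        if groups and t - groups[-1]["end"] <= max_gap:
            # Close enough to the previous point: extend the current group.
            groups[-1]["points"].append((x, y))
            groups[-1]["end"] = t
        else:
            # First point, or the gap is too large: open a new group.
            groups.append({"points": [(x, y)], "start": t, "end": t})
    return groups


# (coord-x, coord-y, time) rows from the question, all for id 1.
rows = [(0, 0, 123), (0, 1, 124), (0, 3, 125), (0, 6, 140), (0, 7, 141)]
groups = group_by_gap(rows, max_gap=5)
```

With max_gap=5 the sample rows fall into two groups, 123 to 125 and 140 to 141, matching the result the question asks for.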

Comparing Subqueries

I have two subqueries. Here is the output of subquery A....
id | date_lat_lng | stat_total | rnum
-------+--------------------+------------+------
16820 | 2016_10_05_10_3802 | 9 | 2
15701 | 2016_10_05_10_3802 | 9 | 3
16821 | 2016_10_05_11_3802 | 16 | 2
17861 | 2016_10_05_11_3802 | 16 | 3
16840 | 2016_10_05_12_3683 | 42 | 2
17831 | 2016_10_05_12_3767 | 0 | 2
17862 | 2016_10_05_12_3802 | 11 | 2
17888 | 2016_10_05_13_3683 | 35 | 2
17833 | 2016_10_05_13_3767 | 24 | 2
16823 | 2016_10_05_13_3802 | 24 | 2
and subquery B, in which date_lat_lng and stat_total have values in common with subquery A, but id does not.
id | date_lat_lng | stat_total | rnum
-------+--------------------+------------+------
17860 | 2016_10_05_10_3802 | 9 | 1
15702 | 2016_10_05_11_3802 | 16 | 1
17887 | 2016_10_05_12_3683 | 42 | 1
15630 | 2016_10_05_12_3767 | 20 | 1
16822 | 2016_10_05_12_3802 | 20 | 1
16841 | 2016_10_05_13_3683 | 35 | 1
15632 | 2016_10_05_13_3767 | 23 | 1
17863 | 2016_10_05_13_3802 | 3 | 1
16842 | 2016_10_05_14_3683 | 32 | 1
15633 | 2016_10_05_14_3767 | 12 | 1
Both subquery A and B pull data from the same table. I want to delete the rows in that table that share the same ID as subquery A but only where date_lat_lng and stat_total have a shared match in subquery B.
Effectively I need:
DELETE FROM table WHERE
id IN
(SELECT id FROM (subqueryA) WHERE
subqueryA.date_lat_lng=subqueryB.date_lat_lng
AND subqueryA.stat_total=subqueryB.stat_total)
Except I'm not sure where to place subquery B, or if I need an entirely different structure.
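Something like this. The join uses only date_lat_lng and stat_total, because the ids differ between the two subqueries:
DELETE FROM table WHERE
id IN (
SELECT DISTINCT a.id
FROM (subqueryA) AS a
JOIN (subqueryB) AS b
USING (date_lat_lng, stat_total)
)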
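Since id is not shared between the two subqueries, the match has to be made on date_lat_lng and stat_total alone. Here is a minimal Python sketch of that matching over the sample rows from the question, to show which ids would be selected for deletion. The function name ids_to_delete is hypothetical and the rnum column is left out because it plays no part in the match.

```python
def ids_to_delete(rows_a, rows_b):
    """Return ids from A whose (date_lat_lng, stat_total) pair also occurs in B."""
    pairs_in_b = {(key, total) for _row_id, key, total in rows_b}
    return {row_id for row_id, key, total in rows_a if (key, total) in pairs_in_b}


# (id, date_lat_lng, stat_total) rows copied from the question.
subquery_a = [
    (16820, "2016_10_05_10_3802", 9), (15701, "2016_10_05_10_3802", 9),
    (16821, "2016_10_05_11_3802", 16), (17861, "2016_10_05_11_3802", 16),
    (16840, "2016_10_05_12_3683", 42), (17831, "2016_10_05_12_3767", 0),
    (17862, "2016_10_05_12_3802", 11), (17888, "2016_10_05_13_3683", 35),
    (17833, "2016_10_05_13_3767", 24), (16823, "2016_10_05_13_3802", 24),
]
subquery_b = [
    (17860, "2016_10_05_10_3802", 9), (15702, "2016_10_05_11_3802", 16),
    (17887, "2016_10_05_12_3683", 42), (15630, "2016_10_05_12_3767", 20),
    (16822, "2016_10_05_12_3802", 20), (16841, "2016_10_05_13_3683", 35),
    (15632, "2016_10_05_13_3767", 23), (17863, "2016_10_05_13_3802", 3),
    (16842, "2016_10_05_14_3683", 32), (15633, "2016_10_05_14_3767", 12),
]

doomed = ids_to_delete(subquery_a, subquery_b)
```

Rows such as id 17831 are kept, because their stat_total (0) does not match the value in subquery B for the same date_lat_lng (20).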

In PostgreSQL, how do you aggregate based on a time range

For example, say I have a database table of transactions done over the counter, and I would like to find out whether there was any period that counts as extremely busy (more than 10 transactions processed in the span of 10 minutes). How would I go about querying it? Could I aggregate based on a time range and count the number of transaction ids within each range?
Adding example to clarify my input and desired output:
+----+--------------------+
| Id | register_timestamp |
+----+--------------------+
| 25 | 08:10:50 |
| 26 | 09:07:36 |
| 27 | 09:08:06 |
| 28 | 09:08:35 |
| 29 | 09:12:08 |
| 30 | 09:12:18 |
| 31 | 09:12:44 |
| 32 | 09:15:29 |
| 33 | 09:15:47 |
| 34 | 09:18:13 |
| 35 | 09:18:42 |
| 36 | 09:20:33 |
| 37 | 09:20:36 |
| 38 | 09:21:04 |
| 39 | 09:21:53 |
| 40 | 09:22:23 |
| 41 | 09:22:42 |
| 42 | 09:22:51 |
| 43 | 09:28:14 |
+----+--------------------+
Desired output would be something like:
+-------+----------+
| Count | Min |
+-------+----------+
| 1 | 08:10:50 |
| 3 | 09:07:36 |
| 7 | 09:12:08 |
| 8 | 09:20:33 |
+-------+----------+
How about this:
SELECT time
FROM (
SELECT count(*) AS c, min(time) AS time
FROM transactions
GROUP BY floor(extract(epoch from time)/600)
) AS buckets
WHERE c > 10;
This finds all ten-minute intervals (fixed, clock-aligned intervals, not sliding windows) in which more than ten transactions occurred. It assumes that the table is called transactions and that it has a column called time where the timestamp is stored.
Thanks to redneb, I ended up with the following query:
SELECT count(*) AS c, min(register_timestamp) AS register_timestamp
FROM trak_participants_data
GROUP BY floor(extract(epoch from register_timestamp)/600)
order by register_timestamp
It works closely enough for me to be able to tell which time chunks are the busiest for the counter.
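The bucketing that floor(extract(epoch from ...)/600) performs can be checked against the sample rows with a short Python sketch. It is only a model of the grouping, not a replacement for the query; the helper names seconds_since_midnight and bucket_counts are made up, and seconds since midnight stand in for the epoch, which gives the same ten-minute boundaries as long as the timestamps are in UTC.

```python
from collections import defaultdict


def seconds_since_midnight(hhmmss):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def bucket_counts(timestamps, width_seconds=600):
    """Group timestamps into fixed-width buckets; return (count, earliest) pairs."""
    buckets = defaultdict(list)
    for ts in timestamps:
        buckets[seconds_since_midnight(ts) // width_seconds].append(ts)
    # Zero-padded 'HH:MM:SS' strings sort chronologically, so min() is the earliest.
    return [(len(members), min(members)) for _, members in sorted(buckets.items())]


# register_timestamp values copied from the question (ids 25 to 43).
register_timestamps = [
    "08:10:50", "09:07:36", "09:08:06", "09:08:35", "09:12:08", "09:12:18",
    "09:12:44", "09:15:29", "09:15:47", "09:18:13", "09:18:42", "09:20:33",
    "09:20:36", "09:21:04", "09:21:53", "09:22:23", "09:22:42", "09:22:51",
    "09:28:14",
]

result = bucket_counts(register_timestamps)
busy = [(count, start) for count, start in result if count > 10]
```

For the sample rows this reproduces the desired output table (counts 1, 3, 7 and 8). None of the buckets exceeds ten transactions, so a c > 10 filter would return no rows for this particular sample.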