Select single value from Sphinx MVA

I'm currently using Sphinx MVAs (multi-value attributes) for indexer performance reasons; each MVA holds only a single value. I'm basically using the MVAs in the same way as a sql_joined_field (I can't use sql_joined_field because you cannot filter by joined values).
I want to be able to sort by the value of the MVA. According to the Sphinx docs you cannot do this directly; however, you can sort by derived values in the select list (e.g. MAX(price) AS sort_field or GROUP_CONCAT(tag) AS sort_field).
Is there a way to select a single value from the MVA (or possibly to concatenate all values in the MVA)?

OK, it appears that you can sort by an MVA without getting an error:
sphinxQL>select id,bucket_id from gi_stemmed where match('bridge') order by bucket_id desc;
+---------+-----------+
| id | bucket_id |
+---------+-----------+
| 4135611 | 492 |
| 4135609 | 492 |
| 4132078 | 492 |
| 4130626 | 492 |
| 4117904 | 492 |
| 4114632 | 490 |
| 4087884 | 490 |
| 4087786 | 490 |
| 4087767 | 490 |
| 4087010 | 490 |
| 4086927 | 490 |
| 4086920 | 490 |
| 4086125 | 490 |
| 4083465 | 761 |
| 4081812 | 491 |
| 4081713 | 490 |
| 4065533 | 490 |
| 4065427 | 490 |
| 4065338 | 490 |
| 4065321 | 490 |
+---------+-----------+
Server version: 2.2.1-dev (r4133)
That is, there is no error, but it doesn't work completely: a few results are out of order (see the row with bucket_id 761, about two-thirds of the way down the example above).
However, there is a GREATEST() function, which works like the MAX() in your question:
sphinxQL>select id,bucket_id,greatest(bucket_id) as two from gi_stemmed where match('bridge road') order by two desc;
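To illustrate what sorting by GREATEST() of an MVA amounts to, here is a minimal Python sketch. The document ids and values are a small subset of the output above; the `greatest` helper and its treatment of an empty MVA as 0 are assumptions made for the illustration, not documented Sphinx behaviour.

```python
# Documents from the example output: id -> list of MVA values (bucket_id).
docs = {
    4135611: [492],
    4083465: [761],
    4081812: [491],
    4114632: [490],
}

def greatest(values):
    """Largest value in the MVA; 0 for an empty MVA (an assumption of this sketch)."""
    return max(values, default=0)

# Roughly: SELECT id, GREATEST(bucket_id) AS two ... ORDER BY two DESC
ranked = sorted(docs, key=lambda doc_id: greatest(docs[doc_id]), reverse=True)
print(ranked)
```

Because the sort key is a single derived number per document, the ordering is well defined even when an MVA holds several values.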

You can sort by MVAs:
sphinxQL>select id,bucket_id from gi_stemmed order by bucket_id desc;
+---------+-----------+
| id | bucket_id |
+---------+-----------+
| 4138739 | 492 |
| 4138708 | 492 |
| 4138671 | 492 |
| 4138663 | 492 |
| 4138661 | 492 |
| 4138615 | 492 |
bucket_id is an MVA (used for a similar reason to yours):
sphinxQL>describe gi_stemmed like 'bucket_id';
+-----------+------+
| Field | Type |
+-----------+------+
| bucket_id | mva |
+-----------+------+
Server version: 2.2.1-dev (r4133)

Related

Facet a multi-value (MVA) type field in Sphinx

I executed the query below in Sphinx:
select MVA_FIELD from mySphinxIndex facet MVA_FIELD order by count(*) desc;
What I got looks like this:
+----------------------------+----------+
| MVA_FIELD | count(*) |
+----------------------------+----------+
| | 664 |
| 0 | 536 |
| 13 | 439 |
| 4,13 | 8 |
| 19,13 | 8 |
| 18,13,20 | 8 |
| 8,17,18 | 8 |
| 8,18,13 | 8 |
| 8,15,18 | 8 |
| 8,13,20 | 7 |
| 17,13 | 7 |
| 18,19,20 | 7 |
| 8,17 | 7 |
| 13,17,19 | 7 |
| 11,6 | 7 |
| 6,11,13 | 7 |
| 15,18 | 7 |
| 11,13,20 | 7 |
| 11,13,17 | 7 |
| 6,18,19 | 6 |
| 7,20 | 6 |
| 8,11,13 | 6 |
| 13,17,20 | 6 |
I want to get the count of each id in MVA_FIELD. For example, I just want the count of 0, 4, 13, ... with each id counted separately. How can I achieve this?
Honestly, I don't know how to do it with the FACET syntactic sugar, but with a normal GROUP BY query you would just use the GROUPBY() function when grouping by an MVA attribute:
SELECT GROUPBY() AS value,COUNT(*) FROM mySphinxIndex GROUP BY MVA_FIELD ORDER BY COUNT(*) DESC;
From the docs:
A special GROUPBY() function is also supported. It returns the GROUP BY key. That is particularly useful when grouping by an MVA value, in order to pick the specific value that was used to create the current group.
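As a rough illustration of what grouping by an MVA does, the following Python sketch counts each individual value rather than each combination of values. The `rows` list is hypothetical sample data, not the asker's index, and documents with an empty MVA simply contribute nothing in this sketch.

```python
from collections import Counter

# Hypothetical rows: each entry is the MVA_FIELD value list of one document.
rows = [[], [0], [13], [4, 13], [19, 13], [18, 13, 20]]

# Grouping by an MVA puts a document into one group per value it holds,
# so a document with [4, 13] is counted under both 4 and 13.
counts = Counter(value for mva in rows for value in mva)

# Roughly: ORDER BY COUNT(*) DESC
for value, count in counts.most_common():
    print(value, count)
```

The key of each `Counter` entry plays the role of GROUPBY(): it is the single value that defined the group, not the whole list.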

Why does PostgreSQL return one row per table row for a count request?

I am using PostgreSQL 9.4.
I have this table:
Column | Type | Modifiers
---------+-----------------------+---------------
noemp | integer | not null
nomemp | character varying(15) |
emploi | character varying(14) |
mgr | integer |
dateemb | date |
sal | real |
comm | real |
nodept | integer |
It contains these values:
noemp | nomemp | emploi | mgr | dateemb | sal | comm | nodept
-------+-----------+----------------+------+------------+------+------+--------
7369 | SERGE | FONCTIONNAIRE | 7902 | 1980-12-07 | 800 | | 20
7499 | BRAHIM | VENDEUR | 7698 | 1981-02-20 | 1600 | 300 | 30
7521 | NASSIMA | VENDEUR | 7698 | 1981-02-22 | 1250 | 500 | 30
7566 | LUCIE | GESTIONNAIRE | 7839 | 1981-04-02 | 2975 | | 20
7654 | MARTIN | VENDEUR | 7698 | 1981-09-28 | 1250 | 1400 | 30
7698 | BENJAMIN | GESTIONNAIRE | 7839 | 1981-05-01 | 2850 | | 30
7782 | DAYANE | GESTIONNAIRE | 7839 | 1981-06-09 | 2450 | | 10
7788 | ARIJ | ANALYSTE | 7566 | 1982-12-09 | 3000 | | 20
7839 | MAYAR | PRESIDENT | | 1981-11-17 | 5000 | | 10
7844 | ROI | VENDEUR | 7698 | 1981-09-08 | 1500 | 0 | 30
7876 | VIRGINIE | FONCTIONNAIRE | 7788 | 0983-01-12 | 1100 | | 20
7902 | ASMA | ANALYSTE | 7566 | 1981-12-03 | 3000 | | 20
7934 | SIMONE | FONCTIONNAIRE | 7782 | 1982-01-23 | 1300 | | 10
7900 | LYNA | FONCTIONNAIRE | 7698 | 1981-12-03 | 950 | | 30
(14 rows)
When I create a function to count the rows whose "nodept" equals a given value, like this one:
CREATE OR REPLACE FUNCTION depcount(integer)RETURNS integer AS $$
DECLARE
somme integer;
BEGIN
SELECT DISTINCT(COUNT(*)) FROM EMP WHERE nodept=$1 INTO somme ;
RETURN somme;
END$$
LANGUAGE plpgsql;
and call it with SELECT depcount(30) FROM EMP;
I get this answer:
----------
6
6
6
6
6
6
6
6
6
6
6
6
6
6
(14 rows)
That is 14 results, whereas I should normally get only one.
I should add that I'm doing this for a course and can't change the PostgreSQL version, which must be 9.4.
Do you have any idea why I get 14 results instead of one?
Thank you.
You're executing the function once per row, running the SELECT COUNT(*) 14 times and getting the result once for each row.
You probably want SELECT depcount(30) (without a FROM clause), to run the function only once.
On a side note, using a function for this sort of query is a bit overkill in most cases, in my opinion. You also don't need to use plpgsql; language sql would be enough here (though your real function may be more complicated than the one in your example). Using DISTINCT(COUNT(*)) doesn't really make sense either, since COUNT(*) without GROUP BY already returns a single row.
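The once-per-row behaviour can be mimicked in a few lines of Python. This is only a sketch: `emp_nodept` lists the nodept column of the 14 rows shown in the question, and `depcount` is a plain Python stand-in for the plpgsql function.

```python
# The nodept column of the 14 EMP rows from the question, in table order.
emp_nodept = [20, 30, 30, 20, 30, 30, 10, 20, 10, 30, 20, 20, 10, 30]

def depcount(nodept):
    """Count the employees in one department, like the plpgsql function."""
    return sum(1 for d in emp_nodept if d == nodept)

# SELECT depcount(30) FROM EMP; evaluates the expression once per row of EMP.
per_row = [depcount(30) for _ in emp_nodept]

# SELECT depcount(30); evaluates it exactly once.
once = depcount(30)

print(len(per_row), once)
```

The FROM clause decides how many output rows there are; the function only decides what value each of those rows carries.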

How to group an MVA field for faceting in Sphinx

I have an index in which some rows are duplicates: all fields are identical except for latitude, longitude and id (the id field is not a real ID; it is just generated with row_number() OVER () AS id).
Here is an example:
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy;
+------+------------+---------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+---------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 2 | 916 | 17,283,288 | 0.973178 | 0.743566 |
| 3 | 915 | 17,288 | 0.973178 | 0.743566 |
| 4 | 914 | 30,482 | 0.973178 | 0.743566 |
| 5 | 919 | 15,243 | 0.825153 | 0.692837 |
| 6 | 919 | 15,243 | 0.825162 | 0.692828 |
| 7 | 918 | 8,154 | 0.825153 | 0.692837 |
| 8 | 918 | 8,154 | 0.825162 | 0.692828 |
| 9 | 920 | 17,283,288 | 0.958914 | 1.282161 |
| 10 | 920 | 17,283,288 | 0.958915 | 1.282215 |
| 11 | 924 | 12,208 | 0.97333 | 0.658246 |
| 12 | 924 | 12,208 | 0.973336 | 0.658237 |
| 13 | 923 | 21,365 | 0.97333 | 0.658246 |
| 14 | 923 | 21,365 | 0.973336 | 0.658237 |
| 15 | 922 | 20,359 | 0.97333 | 0.658246 |
| 16 | 922 | 20,359 | 0.973336 | 0.658237 |
| 17 | 921 | 19,346 | 0.97333 | 0.658246 |
| 18 | 921 | 19,346 | 0.973336 | 0.658237 |
| 19 | 926 | 12,17,208,292 | 0.88396 | 2.389868 |
| 20 | 925 | 12,208 | 0.88396 | 2.389868 |
+------+------------+---------------+----------+-----------+
20 rows in set (0.00 sec)
Now I want to group the data by vacancy_id:
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy group by vacancy_id;
+------+------------+---------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+---------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 2 | 916 | 17,283,288 | 0.973178 | 0.743566 |
| 3 | 915 | 17,288 | 0.973178 | 0.743566 |
| 4 | 914 | 30,482 | 0.973178 | 0.743566 |
| 5 | 919 | 15,243 | 0.825153 | 0.692837 |
| 7 | 918 | 8,154 | 0.825153 | 0.692837 |
| 9 | 920 | 17,283,288 | 0.958914 | 1.282161 |
| 11 | 924 | 12,208 | 0.97333 | 0.658246 |
| 13 | 923 | 21,365 | 0.97333 | 0.658246 |
| 15 | 922 | 20,359 | 0.97333 | 0.658246 |
| 17 | 921 | 19,346 | 0.97333 | 0.658246 |
| 19 | 926 | 12,17,208,292 | 0.88396 | 2.389868 |
| 20 | 925 | 12,208 | 0.88396 | 2.389868 |
| 21 | 961 | 4,105 | 0.959217 | 1.280721 |
| 23 | 960 | 8,155 | 0.959217 | 1.280721 |
| 25 | 959 | 12,208 | 0.959217 | 1.280721 |
| 27 | 928 | 1,60 | 0.963734 | 1.070297 |
| 29 | 927 | 32,513 | 0.963734 | 1.070297 |
| 31 | 929 | 6,140 | 0.786553 | 0.678649 |
| 33 | 932 | 1,40,46 | 0.824627 | 0.694182 |
+------+------------+---------------+----------+-----------+
20 rows in set (0.00 sec)
The result is awesome! But the problem begins when I want to get facets for the grouped data:
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy where prof_area_ids=199 group by vacancy_id facet prof_area_ids;
+------+------------+-----------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+-----------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 191 | 1004 | 11,196,199 | 0.925335 | 2.768874 |
| 313 | 1072 | 1,11,60,197,199 | 0.963968 | 1.070624 |
| 318 | 1136 | 11,196,199 | 0.96071 | 1.448998 |
| 374 | 1097 | 11,199 | 0.785255 | 0.678504 |
+------+------------+-----------------+----------+-----------+
5 rows in set (0.00 sec)
+---------------+----------+
| prof_area_ids | count(*) |
+---------------+----------+
| 202 | 1 |
| 199 | 12 |
| 11 | 12 |
| 196 | 5 |
| 197 | 3 |
| 60 | 3 |
| 1 | 3 |
+---------------+----------+
7 rows in set (0.02 sec)
The faceted result is incorrect: the count for prof_area_ids=199 should be 5, not 12. So how can I group the field for faceting?
Additionally:
I found http://sphinxsearch.com/blog/2013/06/21/faceted-search-with-sphinx/, but it only says "If you have a MVA facet, you need to use the GROUPBY() function which returns the actual value on which the grouping was made." and gives no example.
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude,GROUPBY() as selected,COUNT(*) from jobVacancy where prof_area_ids=199 group by vacancy_id facet prof_area_ids;
+------+------------+-----------------+----------+-----------+----------+----------+
| id | vacancy_id | prof_area_ids | latitude | longitude | selected | count(*) |
+------+------------+-----------------+----------+-----------+----------+----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 | 917 | 1 |
| 191 | 1004 | 11,196,199 | 0.925335 | 2.768874 | 1004 | 2 |
| 313 | 1072 | 1,11,60,197,199 | 0.963968 | 1.070624 | 1072 | 3 |
| 318 | 1136 | 11,196,199 | 0.96071 | 1.448998 | 1136 | 3 |
| 374 | 1097 | 11,199 | 0.785255 | 0.678504 | 1097 | 3 |
+------+------------+-----------------+----------+-----------+----------+----------+
5 rows in set (0.00 sec)
+---------------+----------+
| prof_area_ids | count(*) |
+---------------+----------+
| 202 | 1 |
| 199 | 12 |
| 11 | 12 |
| 196 | 5 |
| 197 | 3 |
| 60 | 3 |
| 1 | 3 |
+---------------+----------+
7 rows in set (0.02 sec)
The faceted result is wrong here too.
It seems you effectively want COUNT(DISTINCT vacancy_id) on the FACET rather than the default COUNT(*), but alas it turns out that
... FACET prof_area_ids,COUNT(DISTINCT vacancy_id) AS vacancies BY prof_area_ids
doesn't work. The part before BY only supports attributes, not custom functions.
So you will just have to write it out the long way, with full queries:
select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy
where prof_area_ids=199 group by vacancy_id;
SELECT GROUPBY() AS prof_area_id, COUNT(DISTINCT vacancy_id) FROM jobVacancy
WHERE prof_area_ids=199 GROUP BY prof_area_ids;
Same results, just slightly more verbose. That is, rather than using the FACET shorthand, write it
out in full, as multiple separate queries.
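The difference between COUNT(*) and COUNT(DISTINCT vacancy_id) per MVA value can be sketched in Python as follows. The rows are hypothetical (a handful of vacancies with duplicated rows, loosely modelled on the question), so the numbers are not the asker's real counts.

```python
from collections import defaultdict

# Hypothetical rows (vacancy_id, prof_area_ids); duplicates differ only by location.
rows = [
    (917, [11, 199, 202]),
    (1004, [11, 196, 199]),
    (1004, [11, 196, 199]),
    (1097, [11, 199]),
    (1097, [11, 199]),
    (1097, [11, 199]),
]

row_counts = defaultdict(int)   # what a COUNT(*) per value reports
vacancies = defaultdict(set)    # basis for COUNT(DISTINCT vacancy_id)

for vacancy_id, prof_area_ids in rows:
    if 199 not in prof_area_ids:    # WHERE prof_area_ids=199
        continue
    for area in prof_area_ids:      # one group per MVA value
        row_counts[area] += 1
        vacancies[area].add(vacancy_id)

distinct_counts = {area: len(ids) for area, ids in vacancies.items()}
print(dict(row_counts), distinct_counts)
```

In this sample, value 199 is counted 6 times by row but belongs to only 3 distinct vacancies, which is the same kind of gap as 12 versus 5 in the question.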
Quoting the question: "The faceted result is incorrect: the count for prof_area_ids=199 should be 5, not 12. So how can I group the field for faceting?"
It looks like you misunderstand how FACET works. You seem to think it takes the main query's result as its base, but it actually just does another grouping of its own. For example:
mysql> select g, t from idx_mva where t = 11 group by g facet t;
+------+----------+
| g | t |
+------+----------+
| 1 | 11,12 |
| 2 | 11,13,15 |
| 3 | 9,11 |
| 5 | 11,12,15 |
+------+----------+
4 rows in set (0.00 sec)
+------+----------+
| t | count(*) |
+------+----------+
| 12 | 2 |
| 11 | 6 |
| 15 | 4 |
| 13 | 1 |
| 9 | 1 |
| 3 | 1 |
+------+----------+
6 rows in set (0.00 sec)
For t=11 you can see that, as in your case, it is found 4 times in the first query's result, but its count in the FACET result is 6. This is because it actually occurs 6 times in the index:
mysql> select * from idx_mva where t = 11;
+------+------+----------+
| id | g | t |
+------+------+----------+
| 2 | 1 | 11,12 |
| 3 | 1 | 11,15 |
| 3 | 2 | 11,13,15 |
| 6 | 3 | 9,11 |
| 8 | 5 | 11,12,15 |
| 11 | 2 | 3,11,15 |
+------+------+----------+
6 rows in set, 1 warning (0.00 sec)
and it appears only 4 times in the first case because the value of t is returned only once for each group. You can use group_concat() to see more values from the same group:
mysql> select g, group_concat(to_string(t)) from idx_mva where t = 11 group by g facet t;
+------+----------------------------+
| g | group_concat(to_string(t)) |
+------+----------------------------+
| 1 | 11,12,11,15 |
| 2 | 11,13,15,3,11,15 |
| 3 | 9,11 |
| 5 | 11,12,15 |
+------+----------------------------+
4 rows in set (0.00 sec)
+------+----------+
| t | count(*) |
+------+----------+
| 12 | 2 |
| 11 | 6 |
| 15 | 4 |
| 13 | 1 |
| 9 | 1 |
| 3 | 1 |
+------+----------+
6 rows in set (0.00 sec)
If you want to learn more about faceting, here's an interactive course on it: https://play.manticoresearch.com/faceting/

Get data based on latest date

Based on the dataset below, I'm trying to get the latest cost based on the latest report date.
For example, when the report date equals the forecast date (the column headers), pick the value as of that report date, which can be achieved with this formula:
IF [Report Date]=[Forecast Date] THEN [Forecasted Cost] END
but I also want to get the subsequent values as of the latest report date, i.e. 2/15/2019. How do I achieve this?
DESIRED OUTPUT
+------------+-----------+-----------+------------+------------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| | 8/15/2018 | 9/15/2018 | 10/15/2018 | 11/15/2018 | 12/15/2018 | 1/15/2019 | 2/15/2019 | 3/15/2019 | 4/15/2019 | 5/15/2019 | 6/15/2019 | 7/15/2019 | 8/15/2019 |
+------------+-----------+-----------+------------+------------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Final Cost | 646.00 | 646.00 | 620.00 | 620.00 | 550.00 | 445.00 | 361.00 | 332.50 | 315.40 | 296.40 | 290.70 | 285.00 | 279.30 |
+------------+-----------+-----------+------------+------------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
DATASET
+------+-------------+-----------+-----------+------------+------------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Item | Report Date | 8/15/2018 | 9/15/2018 | 10/15/2018 | 11/15/2018 | 12/15/2018 | 1/15/2019 | 2/15/2019 | 3/15/2019 | 4/15/2019 | 5/15/2019 | 6/15/2019 | 7/15/2019 | 8/15/2019 |
+------+-------------+-----------+-----------+------------+------------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| 4124 | 8/15/2018 | 646.00 | 646.00 | 658.00 | 658.00 | 658.00 | 658.00 | 658.00 | | | | | | |
| 4124 | 9/15/2018 | | 646 | 626 | 626 | 626 | 622 | 622 | 622 | | | | | |
| 4124 | 10/15/2018 | | | 620 | 620 | 620 | 585 | 585 | 585 | 555 | | | | |
| 4124 | 11/15/2018 | | | | 620 | 620 | 610 | 595 | 554.5 | 543.38 | 535.35 | | | |
| 4124 | 12/15/2018 | | | | | 550 | 535 | 505 | 490 | 490 | 490 | 490 | | |
| 4124 | 1/15/2019 | | | | | | 445 | 430 | 420 | 410 | 400 | 390 | 384 | |
| 4124 | 2/15/2019 | | | | | | | 361 | 332.5 | 315.4 | 296.4 | 290.7 | 285 | 279.3 |
+------+-------------+-----------+-----------+------------+------------+------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
First of all, you need to transpose your dataset so that it has 4 columns: "Item", "Report Date", "Forecast Date" and "Forecast Cost". Then create a filter "forecast date >= report date" and show the values by forecast date.
Now you will have multiple values for each forecast date. If you only want the latest value, you can use the table calculation window_min(date diff).
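The same logic can be sketched outside Tableau in Python. This is an illustration only: it uses a small subset of the dataset already in the transposed (long) form, and instead of window_min on the date difference it keeps the row with the latest report date per forecast date, which selects the same row.

```python
from datetime import date

# Long ("transposed") form: (report_date, forecast_date, forecasted_cost).
# A small subset of the dataset above.
rows = [
    (date(2018, 12, 15), date(2019, 1, 15), 535.0),
    (date(2018, 12, 15), date(2019, 2, 15), 505.0),
    (date(2018, 12, 15), date(2019, 3, 15), 490.0),
    (date(2019, 1, 15), date(2019, 1, 15), 445.0),
    (date(2019, 1, 15), date(2019, 2, 15), 430.0),
    (date(2019, 1, 15), date(2019, 3, 15), 420.0),
    (date(2019, 2, 15), date(2019, 2, 15), 361.0),
    (date(2019, 2, 15), date(2019, 3, 15), 332.5),
]

final_cost = {}   # forecast_date -> (report_date, cost)
for report, forecast, cost in rows:
    if forecast < report:           # filter: forecast date >= report date
        continue
    best = final_cost.get(forecast)
    if best is None or report > best[0]:
        final_cost[forecast] = (report, cost)

for forecast in sorted(final_cost):
    print(forecast.isoformat(), final_cost[forecast][1])
```

For these rows the result is 445 for 1/15/2019, 361 for 2/15/2019 and 332.5 for 3/15/2019, in line with the desired output in the question.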

Create Calculated Pivot from Several Query Results in PostgreSQL

I have a question about how to make a calculated pivot table from several query results in PostgreSQL. I've managed to produce three query results, but I have no idea how to combine and calculate all the data into a single table. I've tried to google it, but most of the questions I found are about making a pivot table from a single table, which I'm able to do using sum, case, and group by. Here's a simplified version of my query results.
Result from query 1, which contains gross values
| city | code | gross |
|-------|------|--------|
| city1 | 21 | 194793 |
| city1 | 25 | 139241 |
| city1 | 28 | 231365 |
| city2 | 21 | 282025 |
| city2 | 25 | 334458 |
| city2 | 28 | 410852 |
| city3 | 21 | 109237 |
Result from query 2, which contains positive adjustments
| city | code | adj_pos |
|-------|------|---------|
| city1 | 21 | 16259 |
| city1 | 25 | 13634 |
| city1 | 28 | 45854 |
| city2 | 25 | 18060 |
| city2 | 28 | 18220 |
Result from query 3, which contains negative adjustments
| city | code | adj_neg |
|-------|------|---------|
| city1 | 25 | 23364 |
| city2 | 21 | 27478 |
| city2 | 25 | 23474 |
And what I want to do is to create something like this
| city | 21_gross | 25_gross | 28_gross | 21_pos | 25_pos | 28_pos | 21_neg | 25_neg | 28_neg |
|-------|----------|----------|----------|--------|--------|--------|--------|--------|--------|
| city1 | 194793 | 139241 | 231365 | 16259 | 13634 | 45854 | | 23364 | |
| city2 | 282025 | 334458 | 410852 | | 18060 | 18220 | 27478 | 23474 | |
| city3 | 109237 | | | | | | | | |
or possibly the final calculation, which is gross + positive adjustment -
negative adjustment for each city and each code, like this
| city | 21_nett | 25_nett | 28_nett |
|-------|---------|---------|---------|
| city1 | 211052 | 129511 | 277219 |
| city2 | 254547 | 329044 | 429072 |
| city3 | 109237 | 0 | 0 |
Any suggestion will be appreciated. Thank you!
I think the best you can achieve is to get the pivoting part as JSON - http://sqlfiddle.com/#!17/b7d64/23:
select
    city,
    json_object_agg(
        code,
        coalesce(gross,0) + coalesce(adj_pos,0) - coalesce(adj_neg,0)
    ) as js
from q1
left join q2 using (city,code)
left join q3 using (city,code)
group by city
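To check the arithmetic behind that query, here is a small Python sketch that combines the three result sets from the question in the same way. The dictionary layout is an assumption made for the illustration; `dict.get(..., 0)` stands in for COALESCE(..., 0) after a LEFT JOIN.

```python
from collections import defaultdict

# The three query results from the question, keyed by (city, code).
gross = {("city1", 21): 194793, ("city1", 25): 139241, ("city1", 28): 231365,
         ("city2", 21): 282025, ("city2", 25): 334458, ("city2", 28): 410852,
         ("city3", 21): 109237}
adj_pos = {("city1", 21): 16259, ("city1", 25): 13634, ("city1", 28): 45854,
           ("city2", 25): 18060, ("city2", 28): 18220}
adj_neg = {("city1", 25): 23364, ("city2", 21): 27478, ("city2", 25): 23474}

nett = defaultdict(dict)   # city -> {code: nett value}, like one JSON object per city
for (city, code), g in gross.items():
    # Missing adjustments count as 0, as COALESCE does in the SQL version.
    nett[city][code] = g + adj_pos.get((city, code), 0) - adj_neg.get((city, code), 0)

for city in sorted(nett):
    print(city, nett[city])
```

As in the SQL version, a city only gets keys for the codes it has in the first result set, so city3 ends up with a single entry for code 21 rather than zeros for 25 and 28.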