Sphinx calculation issue with large numbers - sphinx

I have connected a MySQL client to a Sphinx server.
When I issue this query:
select 20130919.0+(15/4),15/4 from [INDEX] limit 1;
I get the following result
+------+--------+-------------------+----------+
| id | weight | 20130919.0+(15/4) | 15/4 |
+------+--------+-------------------+----------+
| 7414 | 1 | 20130924.000000 | 3.750000 |
+------+--------+-------------------+----------+
Note that 15/4 returns 3.75, but when it is added to 20130919.0 the result is wrong.
In another case, when I write the following query:
select 2222+15/4,15/4 from [INDEX] limit 1;
it returns the correct result:
+------+--------+-------------+----------+
| id | weight | 2222+15/4 | 15/4 |
+------+--------+-------------+----------+
| 7414 | 1 | 2225.750000 | 3.750000 |
+------+--------+-------------+----------+
Similarly, in the previous case the third column should have the value 20130922.75. I thought the problem was that Sphinx returns a rounded number, but in that case it should have been 20130923.000, not 20130924.000.
What I want is for it to return a correct floating-point number, but it is behaving strangely. I hope someone here can explain this behaviour.

Sphinx mostly does single-precision float maths:
http://en.wikipedia.org/wiki/Single-precision_floating-point_format
A single-precision float has only 24 bits of significand precision (23 stored bits plus an implicit leading bit; 8 of the 32 bits go to the exponent). That is enough to store approximately 7 decimal digits exactly, and you have 8.
There is a double() function, but I haven't tested it.
Edit: Actually no, double() won't help.
sphinxQL>select double(20130919.0)+(15/4),15/4 from sample2 limit 1;
+---------------------------+----------+
| double(20130919.0)+(15/4) | 15/4 |
+---------------------------+----------+
| 20130924.000000 | 3.750000 |
+---------------------------+----------+
1 row in set (0.03 sec)
sphinxQL>select double(20130919.0+(15/4)),15/4 from sample2 limit 1;
+---------------------------+----------+
| double(20130919.0+(15/4)) | 15/4 |
+---------------------------+----------+
| 20130924.000000 | 3.750000 |
+---------------------------+----------+
1 row in set (0.02 sec)
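To see where 20130924 comes from, here is a small Python sketch that emulates single-precision rounding with the standard struct module. It assumes the literal and the sum are both held as IEEE-754 float32 values with round-to-nearest-even; that is an assumption about Sphinx internals, not something taken from its source. Between 2^24 and 2^25 the gap between adjacent float32 values is 2, so 20130919.0 is stored as 20130920.0, and 20130920 + 3.75 = 20130923.75 then rounds to 20130924.0. This is also why simply rounding 20130922.75 (which would give 20130923) does not match what you see.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


literal = f32(20130919.0)        # 8 significant digits do not fit: stored as 20130920.0
quotient = f32(15 / 4)           # 3.75 is exactly representable
total = f32(literal + quotient)  # 20130923.75 is rounded back to float32

print(literal)                       # 20130920.0
print(total)                         # 20130924.0
print(f32(f32(2222.0) + quotient))   # 2225.75, small enough to be exact
```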

Related

Output of Show Meta in SphinxQL

I am trying to check whether my config has issues or whether I am misunderstanding SHOW META.
If I add a regex to the config:
regexp_filter=NY=>New York
and then do a SphinxQL search for 'NY'
SELECT * FROM Index WHERE MATCH('NY')
followed by SHOW META,
it should show keyword1=New and keyword2=York, not NY. Is that correct?
And if it does not, does that mean my config is not working as intended?
Yes, that is correct. When you run MATCH('NY') with an NY=>New York regexp_filter, Sphinx first converts NY into New York and only then starts searching, i.e. it forgets about NY completely. The same happens during indexing: it first prepares the tokens, then indexes them, forgetting the original text.
To demonstrate (this is in Manticore, a fork of Sphinx, but it processes regexp_filter and applies it to searching the same way Sphinx does):
mysql> create table t(f text) regexp_filter='NY=>New York';
Query OK, 0 rows affected (0.01 sec)
mysql> insert into t values(0, 'I low New York');
Query OK, 1 row affected (0.01 sec)
mysql> select * from t where match('NY');
+---------------------+----------------+
| id | f |
+---------------------+----------------+
| 2810862456614682625 | I low New York |
+---------------------+----------------+
1 row in set (0.01 sec)
mysql> show meta;
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| total | 1 |
| total_found | 1 |
| time | 0.000 |
| keyword[0] | new |
| docs[0] | 1 |
| hits[0] | 1 |
| keyword[1] | york |
| docs[1] | 1 |
| hits[1] | 1 |
+---------------+-------+
9 rows in set (0.00 sec)
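The order of operations can be modelled roughly in Python. This is only a sketch: Python's re module stands in for the regex engine, and the tokenizer is a simplified assumption (split on word characters, then lowercase, which is why SHOW META reports new and york rather than New and York); the real behaviour depends on settings such as charset_table. REGEXP_FILTERS and tokenize() are hypothetical names.

```python
import re

# Hypothetical model of the single rule: regexp_filter = NY => New York
REGEXP_FILTERS = [(re.compile(r"NY"), "New York")]


def tokenize(text):
    """Apply the regexp filters first, then split into lowercased keywords."""
    for pattern, replacement in REGEXP_FILTERS:
        text = pattern.sub(replacement, text)
    return [token.lower() for token in re.findall(r"\w+", text)]


doc_tokens = tokenize("I low New York")   # the same pipeline runs at indexing time
query_tokens = tokenize("NY")             # ...and at query time

print(query_tokens)                                  # ['new', 'york']
print(all(t in doc_tokens for t in query_tokens))    # True
```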

Create calculated fields in crosstab

I have a crosstab between supplier and order acceptance status, containing the max value of a number.
I need to create a formula like :
IF ACCEPTED > MISSING THEN "GOOD" ELSE "BAD"
Can you help with the syntax?
This is what I get using the suggested formula, compared with what I should get:
ORDER | ACCEP | MISSING | SHOULDBE | NOW |
-------------------------------------------------------------------------------
61010 | 6 | 0 | GOOD | GOOD | FORMULAOK
61011 | 3 | 12 | BAD | BAD | FORMULAOK
63239 | 9 | 11 | BAD | BAD | FORMULAOK
66749 | 0 | | BAD | GOOD | FORMULAnotOK
76824 | 2 | 1 | GOOD | BAD | FORMULAnotOK
Use INT() to convert each Boolean condition into a number (1 or 0) and SUM() to add those numbers up into counts. Then it's just a comparison:
IF SUM(INT([ACCEPTANCESTATUS]="ACCEPTED")) > SUM(INT([ACCEPTANCESTATUS]="MISSING"))
THEN "GOOD" ELSE "BAD" END
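The same Boolean-to-number trick, sketched in Python to show what the formula computes. The list of statuses is made-up sample data for one supplier, and verdict() is a hypothetical helper, not part of Tableau.

```python
# Hypothetical acceptance statuses for one supplier (one entry per order line)
statuses = ["ACCEPTED", "MISSING", "ACCEPTED", "ACCEPTED", "MISSING"]


def verdict(statuses):
    # int(True) == 1 and int(False) == 0, so summing the converted Booleans counts matches
    accepted = sum(int(s == "ACCEPTED") for s in statuses)
    missing = sum(int(s == "MISSING") for s in statuses)
    return "GOOD" if accepted > missing else "BAD"


print(verdict(statuses))  # GOOD (3 accepted vs 2 missing)
```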
Your data is already pivoted, so you can write a calculated field exactly as you framed your pseudocode:
IF SUM([ACCEP]) > SUM([MISSING])
THEN "GOOD"
ELSE "BAD"
END
You can remove the SUM() calls if you want the value calculated for every row (rather than aggregated).

PostgreSQL simple count query

I'm trying to scale this down so the answer is simple. I can probably extrapolate from the answers here to a bigger data set.
Given the following table:
+------+-----+
| name | age |
+------+-----+
| a | 5 |
| b | 7 |
| c | 8 |
| d | 8 |
| e | 10 |
+------+-----+
I want to make a table that shows the count of people whose age is equal to or greater than x. For instance, the table above would produce:
+--------------+-------+
| at least age | count |
+--------------+-------+
| 5 | 5 |
| 6 | 4 |
| 7 | 4 |
| 8 | 3 |
| 9 | 1 |
| 10 | 1 |
+--------------+-------+
Is there a single query that can accomplish this task? Obviously, it is easy to write a simple function for it, but I'm hoping to be able to do this quickly with one query.
Thanks!
Yes, what you're looking for is a window function.
with cte_age_count as (
  select age,
         count(*) as c_star
  from people
  group by age)
-- suffix sum: number of people with this age or older
select age as at_least_age,
       sum(c_star) over (order by age
                         range between current row
                               and unbounded following) as count
from cte_age_count
order by age;
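Not syntax-checked, so let me know if it works! Note that this only returns ages that actually occur in the table (5, 7, 8 and 10); to also get rows for 6 and 9 you would need to join against a generated range of ages, e.g. with generate_series().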
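The logic of the window query, including the ages that have no rows, can be sketched in Python against the sample table from the question. at_least_counts() is a hypothetical helper: Counter plays the role of the GROUP BY/count(*) step, and walking from the oldest age downwards accumulates the same suffix sum that "range between current row and unbounded following" computes.

```python
from collections import Counter

ages = [5, 7, 8, 8, 10]  # the sample "people" table from the question


def at_least_counts(ages):
    """Return (x, number of ages >= x) for every x from min(ages) to max(ages)."""
    per_age = Counter(ages)  # like GROUP BY age with count(*)
    result = []
    running = 0
    # Walk from the oldest age down, accumulating a suffix sum
    for x in range(max(ages), min(ages) - 1, -1):
        running += per_age[x]  # Counter returns 0 for ages nobody has
        result.append((x, running))
    return sorted(result)


for x, n in at_least_counts(ages):
    print(x, n)
```

Its output matches the expected table in the question, including the rows for 6 and 9.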

Crosstabs Crystal Reports Null value vs zero

I'm creating a crosstab report showing the survey history for gopher tortoise (if you must know what that is) monitoring stations. Not all stations are monitored in a given survey, and sometimes when we monitor we don't find any tortoises and thus record a 0, which is a valid result.
In the crosstab when the station isn't used I would like it to say "N/A" or some other equivalent, but when it's a zero I want it to stay as zero.
I've found plenty on how to change a null to a zero, but nothing on how to keep the zero and somehow flag the null.
Below is what the crosstab should look like. You'll see that the 0 in Station4 on 1/1/2004 is "real" (meaning we didn't find any) but all of the N/A's are when we didn't use the station.
Survey Dates
| | 1/1/2000 | 1/1/2002 | 1/1/2004 | 1/1/2006 |
|----------|----------|----------|----------|----------|
| Station1 | 9 | 5 | N/A | N/A |
| Station2 | 5 | 7 | 2 | 6 |
| Station3 | N/A | N/A | 6 | 9 |
| Station4 | 10 | 9 | 0 | 11 |
This is what the Oracle table looks like for the 1/1/2000 survey, as an example:
| SurveyID | StationID | Number |
|----------|-----------|--------|
| 1 | 1 | 9 |
| 1 | 2 | 5 |
| 1 | 4 | 6 |
So, basically, how do I keep the zeros and put some text in the nulls in a CR crosstab?
Thanks!
Because CR doesn't differentiate between nulls and actual zeros in the crosstab, you can try replacing actual zero values with a placeholder so you can tell the difference. Note that this solution will only work if you are trying to display the values and not do any aggregate calculations.
First, create a formula that will replace the zeros with a placeholder value. In this case, I'm using -1 since that number should never appear in the database.
//{#Survey Num}
local numbervar totalSurvey;
totalSurvey:={Table.ActiveBurrows} + {Table.InactiveBurrows};
if totalSurvey=0 then -1 else totalSurvey
Use this formula to create your crosstab. Now you need to set a display string so that everything appears correctly. Right-click one of your crosstab cells → hit "Format Field" → select the "Common" tab → then create a "Display String" formula. That formula should be something like:
if currentfieldvalue=-1 then "0" else if currentfieldvalue=0 then "N/A" else totext(currentfieldvalue,0,'')
Now you're basically just printing the real values over top of the placeholders.
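Here is a Python model of the placeholder idea, just to make the two formulas concrete. It assumes, as the answer does, that an empty crosstab cell (station not used) summarizes to 0, and that each cell holds at most one record; survey_num(), crosstab_cell() and display_string() are hypothetical stand-ins for the {#Survey Num} formula, the crosstab summary and the Display String formula.

```python
PLACEHOLDER = -1  # marks a surveyed station where nothing was found


def survey_num(active: int, inactive: int) -> int:
    """Mirror of {#Survey Num}: a real zero becomes the placeholder."""
    total = active + inactive
    return PLACEHOLDER if total == 0 else total


def crosstab_cell(values) -> int:
    """Summarize one station/date cell; with no records the sum is 0 (the 'null' case)."""
    return sum(values)


def display_string(value: int) -> str:
    """Mirror of the Display String formula."""
    if value == PLACEHOLDER:
        return "0"
    if value == 0:
        return "N/A"
    return str(value)


print(display_string(crosstab_cell([survey_num(5, 4)])))  # 9
print(display_string(crosstab_cell([survey_num(0, 0)])))  # 0
print(display_string(crosstab_cell([])))                  # N/A
```

The model also shows why the trick is display-only: two zero records in one cell would sum to -2 and no longer match the placeholder.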

Join column with timestamps where value is maximum

I have a table that looks like
+-------+-----------+
| value | timestamp |
+-------+-----------+
and I'm trying to build a query that gives a result like
+-------+-----------+------------+------------------------+
| value | timestamp | MAX(value) | timestamp of max value |
+-------+-----------+------------+------------------------+
so that the result looks like
+---+----------+---+----------+
| 1 | 1.2.1001 | 3 | 1.1.1000 |
| 2 | 5.5.1021 | 3 | 1.1.1000 |
| 3 | 1.1.1000 | 3 | 1.1.1000 |
+---+----------+---+----------+
but I got stuck on joining the column with the corresponding timestamps.
Any hints or suggestions?
Thanks in advance!
For further information (if that helps):
In the real project the max values are grouped by month and day (with a GROUP BY clause, which works, by the way), but somehow I got stuck on joining the timestamps for the max values.
EDIT
Cross joins are a good idea, but I want the results grouped by month, e.g.:
+---+----------+---+----------+
| 1 | 1.1.1101 | 6 | 1.1.1300 |
| 2 | 2.6.1021 | 5 | 5.6.1000 |
| 3 | 1.1.1200 | 6 | 1.1.1300 |
| 4 | 1.1.1040 | 6 | 1.1.1300 |
| 5 | 5.6.1000 | 5 | 5.6.1000 |
| 6 | 1.1.1300 | 6 | 1.1.1300 |
+---+----------+---+----------+
EDIT 2
I've added a fiddle with some sample data and an example of the current query.
http://sqlfiddle.com/#!1/efa42/1
How to add the corresponding timestamp to the maximum?
Try a cross join between the whole table and a subquery that returns a single row: the max value together with its timestamp, <3;"1000-01-01"> for example.
SELECT col_value,col_timestamp,max_col_value, col_timestamp_of_max_value FROM table1
cross join
(
select max(col_value) max_col_value ,col_timestamp col_timestamp_of_max_value from table1
group by col_timestamp
order by max_col_value desc
limit 1
) A --One row that represents the time_stamp of the max value, ie: <3;"1000-01-01">
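Since you're using PostgreSQL, use window functions:
select *, max(value) over () as max_value, first_value("timestamp") over (order by value desc) as timestamp_of_max_value
from your_table;
That gives you, on every row, the maximum over all values and the timestamp of the row that holds it (max("timestamp") over () would instead return the latest timestamp, which is not the same thing). For the per-month grouping in your edit, add partition by extract(month from "timestamp") inside both over () clauses; your_table stands for your actual table name.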
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
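The per-month result from the edit can be reproduced with a short Python sketch, which may help check whatever SQL you end up with. It assumes the sample dates are written day.month.year (that reading is what makes the sample groups line up) and that "grouped by month" means month of the year regardless of year; with_monthly_max() is a hypothetical helper.

```python
from datetime import date

# Sample rows from the edit, assuming d.m.yyyy dates
rows = [
    (1, date(1101, 1, 1)),
    (2, date(1021, 6, 2)),
    (3, date(1200, 1, 1)),
    (4, date(1040, 1, 1)),
    (5, date(1000, 6, 5)),
    (6, date(1300, 1, 1)),
]


def with_monthly_max(rows):
    """Append (max value, timestamp of that value) for the row's month to each row."""
    best = {}  # month -> (max value, timestamp of max value)
    for value, ts in rows:
        if ts.month not in best or value > best[ts.month][0]:
            best[ts.month] = (value, ts)
    return [(value, ts, *best[ts.month]) for value, ts in rows]


for row in with_monthly_max(rows):
    print(row)
```

Rows from January get (6, 1300-01-01) and rows from June get (5, 1000-06-05), matching the table in the edit.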